[English original] Stable Diffusion 3 Technical Report (English).docx

Uploaded by: p** Document ID: 1002977 Upload date: 2024-06-15 Format: DOCX Pages: 30 Size: 934.01 KB

Scaling Rectified Flow Transformers for High-Resolution Image Synthesis

Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek, Robin Rombach*

Stability AI

*Equal contribution. stability.ai

Figure 1. High-resolution samples from our 8B rectified flow model, showcasing its capabilities in typography, precise prompt following and spatial reasoning, attention to fine details, and high image quality across a wide variety of styles.

Abstract

Diffusion models create data from noise by inverting the forward paths of data towards noise and have emerged as a powerful generative modeling technique for high-dimensional, perceptual data such as images and videos. Rectified flow is a recent generative model formulation that connects data and noise in a straight line. Despite its better theoretical properties and conceptual simplicity, it is not yet decisively established as standard practice. In this work, we improve existing noise sampling techniques for training rectified flow models by biasing them towards perceptually relevant scales. Through a large-scale study, we demonstrate the superior performance of this approach compared to established diffusion formulations for high-resolution text-to-image synthesis. Additionally, we present a novel transformer-based architecture for text-to-image generation that uses separate weights for the two modalities and enables a bidirectional flow of information between image and text tokens, improving text comprehension, typography, and human preference ratings. We demonstrate that this architecture follows predictable scaling trends and correlates lower validation loss to improved text-to-image synthesis as measured by various metrics and human evaluations. Our largest models outperform state-of-the-art models, and we will make our experimental data, code, and model weights publicly available.

1. Introduction

Diffusion models create data from noise (Song et al., 2020). They are trained to invert forward paths of data towards random noise and, thus, in conjunction with approximation and generalization properties of neural networks, can be used to generate new data points that are not present in the training data but follow the distribution of the training data (Sohl-Dickstein et al., 2015; Song & Ermon, 2020). This generative modeling technique has proven to be very effective for modeling high-dimensional, perceptual data such as images (Ho et al., 2020). In recent years, diffusion models have become the de-facto approach for generating high-resolution images and videos from natural language inputs with impressive generalization capabilities (Saharia et al., 2022b; Ramesh et al., 2022; Rombach et al., 2022; Podell et al., 2023; Dai et al., 2023; Esser et al., 2023; Blattmann et al., 2023b; Betker et al., 2023; Blattmann et al., 2023a; Singer et al., 2022). Due to their iterative nature and the associated computational costs, as well as the long sampling times during inference, research on formulations for more efficient training and/or faster sampling of these models has increased (Karras et al., 2023; Liu et al., 2022).

While specifying a forward path from data to noise leads to efficient training, it also raises the question of which path to choose. This choice can have important implications for sampling. For example, a forward process that fails to remove all noise from the data can lead to a discrepancy in training and test distribution and result in artifacts such as gray image samples (Lin et al., 2024). Importantly, the choice of the forward process also influences the learned backward process and, thus, the sampling efficiency. While curved paths require many integration steps to simulate the process, a straight path could be simulated with a single step and is less prone to error accumulation. Since each step corresponds to an evaluation of the neural network, this has a direct impact on the sampling speed.

A particular choice for the forward path is a so-called Rectified Flow (Liu et al., 2022; Albergo & Vanden-Eijnden, 2022; Lipman et al., 2023), which connects data and noise on a straight line. Although this model class has better theoretical properties, it has not yet become decisively established in practice. So far, some advantages have been empirically demonstrated in small and medium-sized experiments (Ma et al., 2024), but these are mostly limited to class-conditional models. In this work, we change this by introducing a re-weighting of the noise scales in rectified flow models, similar to noise-predictive diffusion models (Ho et al., 2020). Through a large-scale study, we compare our new formulation to existing diffusion formulations and demonstrate its benefits.

We show that the widely used approach for text-to-image synthesis, where a fixed text representation is fed directly into the model (e.g., via cross-attention (Vaswani et al., 2017; Rombach et al., 2022)), is not ideal, and present a new architecture that incorporates learnable streams for both image and text tokens, which enables a two-way flow of information between them. We combine this with our improved rectified flow formulation and investigate its scalability. We demonstrate a predictable scaling trend in the validation loss and show that a lower validation loss correlates strongly with improved automatic and human evaluations.

Our largest models outperform state-of-the-art open models such as SDXL (Podell et al., 2023), SDXL-Turbo (Sauer et al., 2023), Pixart-α (Chen et al., 2023), and closed-source models such as DALL-E 3 (Betker et al., 2023) both in quantitative evaluation
