Post
The Stable Diffusion 3 research paper broken down, including some overlooked details!
Model
2 base model variants mentioned: 2B and 8B sizes
New architecture at all abstraction levels:
- 🔽 UNet; ⬆️ Multimodal Diffusion Transformer, bye cross attention
- Rectified flows for the diffusion process
- 🧩 Still a Latent Diffusion Model
3 text encoders: 2 CLIPs and one T5-XXL; plug-and-play: removing the larger one (T5-XXL) maintains competitiveness
Dataset was deduplicated with SSCD, which helped reduce memorization (no more details about the dataset tho)
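For the rectified flows bullet, here is a toy sketch of the idea as I read it: the forward process is a straight line between the data latent and Gaussian noise, and the network is trained to predict the constant velocity along that line. The function names and the 3-element "latent" are made up for illustration; this is not the paper's code, and it leaves out details such as how timesteps are sampled during training.

```python
import random


def rectified_flow_sample(x0, noise, t):
    """Point on the straight path from data (t=0) to noise (t=1)."""
    return [(1.0 - t) * d + t * n for d, n in zip(x0, noise)]


def velocity_target(x0, noise):
    """Constant velocity along that path: dz_t/dt = noise - x0."""
    return [n - d for d, n in zip(x0, noise)]


def euler_sample(velocity_fn, noise, steps=4):
    """Integrate from t=1 (noise) back to t=0 (data) with Euler steps."""
    z = list(noise)
    dt = 1.0 / steps
    for i in range(steps):
        t = 1.0 - i * dt
        v = velocity_fn(z, t)
        z = [zi - dt * vi for zi, vi in zip(z, v)]
    return z


# Toy stand-in for a latent; a real model works on image latents.
x0 = [1.0, -2.0, 0.5]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in x0]

z_half = rectified_flow_sample(x0, noise, 0.5)  # halfway between data and noise
target = velocity_target(x0, noise)             # what the network would regress to

# With an oracle velocity (assumption: a perfectly trained model),
# a handful of Euler steps walks the noise straight back to the data.
recovered = euler_sample(lambda z, t: target, noise, steps=4)
```

Because the path is a straight line, the oracle case recovers the data in very few steps, which is the intuition for why straighter paths can make sampling cheaper.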
Variants
A DPO fine-tuned model showed great improvement in prompt understanding and aesthetics
An Instruct Edit 2B model was trained, and learned how to do text replacement
Results
State of the art in automated evals for composition and prompt understanding
Best win rate in human preference evaluation for prompt understanding, aesthetics and typography (missing some details on how many participants and the design of the experiment)
Paper: https://stabilityai-public-packages.s3.us-west-2.amazonaws.com/Stable+Diffusion+3+Paper.pdf