Stable Diffusion 3 – next-gen AI image generator
Stability AI, a prominent player in the field of artificial intelligence, has announced the release of Stable Diffusion 3 (SD3), the latest iteration in its line of open-weights image-synthesis models.
The Stable Diffusion family of models, including versions 1.4, 1.5, 2.0, 2.1, XL, XL Turbo, and now 3, has consistently pushed the boundaries of what AI can achieve in image generation. With SD3, Stability AI aims to provide a more open alternative to proprietary models like OpenAI’s DALL-E 3, while acknowledging the challenges of copyrighted training data, bias, and potential misuse.
Unlike its predecessors, SD3 is a family of models ranging in size from 800 million to 8 billion parameters, allowing it to run on hardware from smartphones to servers. This spread of model sizes lets SD3 meet different computational budgets while retaining the capability to generate complex, realistic images.
Stability AI CEO Emad Mostaque highlighted the technical advancements underpinning SD3, stating, "This uses a new type of diffusion transformer (similar to Sora) combined with flow matching and other improvements. This takes advantage of transformer improvements and can not only scale further but accept multimodal inputs."
A "flow matching" technique ensures a smooth transition from random noise to structured images, thereby enhancing the model's ability to generate visually coherent outputs. And with its diffusion transformer architecture, SD3 adopts a novel approach to image synthesis, drawing inspiration from transformers known for their prowess in handling patterns and sequences. This innovative methodology not only facilitates efficient scaling but also yields higher-quality image outputs.
One of the standout features of SD3 is its ability to render legible text within generated images, a task that has historically tripped up image-synthesis models. Early examples suggest that SD3 follows text prompts faithfully and spells out requested words correctly, a capability previously seen mostly in proprietary commercial models.
In addition to Stable Diffusion 3, Stability AI has been actively exploring other image-synthesis architectures, including the recently announced Stable Cascade, which employs a three-stage process for text-to-image synthesis. With each release, the company continues to stake out its position in open, AI-driven image generation.
While Stable Diffusion 3 is not yet publicly available, Stability AI has opened a waitlist for an early preview. The company has reiterated its commitment to making SD3 freely available for download and local deployment once testing is complete, emphasizing the importance of community feedback in refining the model's performance and safety.
Join the waitlist for Stable Diffusion 3 and explore the limitless potential of AI-generated art.