This is a StableDiffusion3 model uploaded using the KerasHub library; it can be used with the JAX, TensorFlow, and PyTorch backends. Model config:

  • name: stable_diffusion_3.5_medium_backbone
  • trainable: True
  • dtype: bfloat16 (keras DTypePolicy)
  • mmdit_patch_size: 2
  • mmdit_hidden_dim: 1536
  • mmdit_num_layers: 24
  • mmdit_num_heads: 24
  • mmdit_position_size: 384
  • mmdit_qk_norm: rms_norm
  • mmdit_dual_attention_indices: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
  • vae: VAEBackbone (keras_hub.src.models.vae.vae_backbone)
      • name: vae
      • trainable: True
      • dtype: bfloat16 (keras DTypePolicy)
      • encoder_num_filters: [128, 256, 512, 512]
      • encoder_num_blocks: [2, 2, 2, 2]
      • decoder_num_filters: [512, 512, 256, 128]
      • decoder_num_blocks: [3, 3, 3, 3]
      • sampler_method: sample
      • input_channels: 3
      • sample_channels: 32
      • output_channels: 3
      • scale: 1.5305
      • shift: 0.0609
  • clip_l: CLIPTextEncoder (keras_hub.src.models.clip.clip_text_encoder)
      • name: clip_l
      • trainable: True
      • dtype: float16 (keras DTypePolicy)
      • vocabulary_size: 49408
      • embedding_dim: 768
      • hidden_dim: 768
      • num_layers: 12
      • num_heads: 12
      • intermediate_dim: 3072
      • intermediate_activation: quick_gelu
      • intermediate_output_index: 10
      • max_sequence_length: 77
  • clip_g: CLIPTextEncoder (keras_hub.src.models.clip.clip_text_encoder)
      • name: clip_g
      • trainable: True
      • dtype: float16 (keras DTypePolicy)
      • vocabulary_size: 49408
      • embedding_dim: 1280
      • hidden_dim: 1280
      • num_layers: 32
      • num_heads: 20
      • intermediate_dim: 5120
      • intermediate_activation: gelu
      • intermediate_output_index: 30
      • max_sequence_length: 77
  • t5: None
  • latent_channels: 16
  • output_channels: 3
  • num_train_timesteps: 1000
  • shift: 3.0
  • image_shape: [1024, 1024, 3]

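A minimal usage sketch, assuming the standard KerasHub text-to-image API: the `hf://<user>/<repo>` handle below is a placeholder for this repository's actual path, and the sampling arguments are illustrative rather than values taken from this card.

```python
import os

# Pick a Keras backend before importing keras_hub; any of the three works.
os.environ["KERAS_BACKEND"] = "jax"  # or "tensorflow", "torch"

import keras_hub

# Placeholder handle: substitute this repository's actual "hf://<user>/<repo>" path.
text_to_image = keras_hub.models.StableDiffusion3TextToImage.from_preset(
    "hf://<user>/<repo>"
)

# num_steps and guidance_scale are optional overrides; the values here are illustrative.
image = text_to_image.generate(
    "a photograph of an astronaut riding a horse",
    num_steps=28,
    guidance_scale=7.0,
)
```

The preset stores image_shape [1024, 1024, 3] and bfloat16/float16 dtype policies, so generation at full resolution needs a reasonably large accelerator; passing a smaller image_shape (e.g. (512, 512, 3)) to from_preset should allow lower-resolution generation.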
This model card has been generated automatically and should be completed by the model author. See Model Cards documentation for more information.
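As a starting point for that documentation, here is a hedged sketch of re-loading only the backbone and checking that its configuration matches the list above; again, the preset handle is a placeholder.

```python
import keras_hub

# Placeholder handle: substitute this repository's "hf://<user>/<repo>" path.
backbone = keras_hub.models.StableDiffusion3Backbone.from_preset(
    "hf://<user>/<repo>"
)

# Fields such as mmdit_hidden_dim, mmdit_num_layers, latent_channels and
# image_shape in this dict should correspond to the config listed above.
print(backbone.get_config())
```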
