Pretrained models for the paper Scaling up Masked Diffusion Models on Text
Scaling law experiments: we provide all pretrained models in the ar_safetensors and mdm_safetensors folders. For instance, the checkpoint mdm-1028M-1600e18.safetensors is an MDM with 1,028 million non-embedding parameters trained with 1,600e18 FLOPs. Similarly, the checkpoint mdm-170M-100e18-rsl-0.01.safetensors is an MDM with 170 million non-embedding parameters trained with 100e18 FLOPs, with 1% of the dataset subjected to random sequence lengths during pretraining.
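The naming convention described above can be decoded programmatically, which may help when sweeping over many checkpoints. The following is a minimal sketch and not part of the released code: the helper name `parse_checkpoint_name` and the `CheckpointInfo` type are hypothetical, and the pattern assumes that files in ar_safetensors use an `ar-` prefix with the same layout as the two MDM examples, which this card does not state.

```python
import re
from typing import NamedTuple, Optional


class CheckpointInfo(NamedTuple):
    family: str                    # "mdm" or "ar" ("ar" prefix is an assumption)
    params_millions: int           # non-embedding parameters, in millions
    train_flops: float             # training FLOPs, e.g. 1600e18
    rsl_fraction: Optional[float]  # fraction of data with random sequence lengths


# Pattern inferred from the two example names on this card only.
_NAME_RE = re.compile(
    r"^(?P<family>ar|mdm)"
    r"-(?P<params>\d+)M"
    r"-(?P<flops>\d+(?:\.\d+)?e\d+)"
    r"(?:-rsl-(?P<rsl>\d+(?:\.\d+)?))?"
    r"\.(?:safetensors|pth)$"
)


def parse_checkpoint_name(name: str) -> CheckpointInfo:
    """Split a checkpoint file name into its documented components."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized checkpoint name: {name!r}")
    rsl = match.group("rsl")
    return CheckpointInfo(
        family=match.group("family"),
        params_millions=int(match.group("params")),
        train_flops=float(match.group("flops")),
        rsl_fraction=float(rsl) if rsl is not None else None,
    )


if __name__ == "__main__":
    print(parse_checkpoint_name("mdm-1028M-1600e18.safetensors"))
    print(parse_checkpoint_name("mdm-170M-100e18-rsl-0.01.safetensors"))
```

Names that do not follow the inferred pattern raise a `ValueError` rather than being guessed at, so any checkpoint with a different layout is surfaced immediately.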
Math reasoning: please see the gsm8k_safetensors folder.
Conditional generation: please see the sharegpt_safetensors folder.
Reverse curse: please see the reverse_safetensors folder.
For all models, we provide checkpoints in both .pth and .safetensors formats.
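A .safetensors file can be inspected without loading any weights, because the format begins with an 8-byte little-endian header length followed by a JSON header listing each tensor's dtype and shape. The sketch below uses only the Python standard library; the helper names `read_safetensors_header` and `count_parameters` are hypothetical. It counts all tensors in the file, so the total may differ from the non-embedding parameter count encoded in the checkpoint names.

```python
import json
import struct
from math import prod


def read_safetensors_header(path):
    """Return the tensor table of a .safetensors file without reading weights."""
    with open(path, "rb") as f:
        # The first 8 bytes hold the JSON header size as an unsigned 64-bit
        # little-endian integer.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # Optional free-form metadata is stored under this reserved key.
    header.pop("__metadata__", None)
    return header


def count_parameters(header):
    """Sum the element counts of every tensor listed in the header."""
    return sum(prod(entry["shape"]) for entry in header.values())


if __name__ == "__main__":
    import sys

    table = read_safetensors_header(sys.argv[1])
    for name, entry in table.items():
        print(name, entry["dtype"], entry["shape"])
    print("total parameters:", count_parameters(table))
```

Run it with the path to a downloaded checkpoint as the only argument.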