
Pretrained models for the paper "Scaling up Masked Diffusion Models on Text".

Scaling law experiments: We provide all pretrained models in the ar_safetensors and mdm_safetensors folders. For instance, the checkpoint mdm-1028M-1600e18.safetensors is a masked diffusion model (MDM) with 1,028 million non-embedding parameters, trained with 1,600e18 FLOPs. Similarly, the checkpoint mdm-170M-100e18-rsl-0.01.safetensors is an MDM with 170 million non-embedding parameters, trained with 100e18 FLOPs, where 1% of the pretraining data was assigned random sequence lengths.
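
The naming convention above can be read programmatically. The following is a minimal sketch, not part of the released code: the function name `parse_checkpoint_name`, the `CheckpointInfo` tuple, and the regular expression are illustrative assumptions inferred only from the two example file names, and other files in the repository may follow a different pattern.

```python
import re
from typing import NamedTuple, Optional


class CheckpointInfo(NamedTuple):
    """Fields recovered from a checkpoint file name (hypothetical helper type)."""
    family: str                    # leading prefix, e.g. "mdm"
    params_millions: int           # non-embedding parameters, in millions
    train_flops: float             # training compute in FLOPs
    rsl_fraction: Optional[float]  # fraction of data with random sequence lengths, if given


# Assumed pattern: <family>-<N>M-<flops>[-rsl-<fraction>].<ext>
# Only the two names quoted in this card were used to derive it.
_NAME_PATTERN = re.compile(
    r"^(?P<family>[a-z]+)-(?P<params>\d+)M-(?P<flops>\d+e\d+)"
    r"(?:-rsl-(?P<rsl>\d*\.?\d+))?\.(?:safetensors|pth)$"
)


def parse_checkpoint_name(name: str) -> CheckpointInfo:
    """Split a checkpoint file name into its documented components."""
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized checkpoint name: {name!r}")
    rsl = match.group("rsl")
    return CheckpointInfo(
        family=match.group("family"),
        params_millions=int(match.group("params")),
        train_flops=float(match.group("flops")),
        rsl_fraction=float(rsl) if rsl is not None else None,
    )


if __name__ == "__main__":
    for file_name in (
        "mdm-1028M-1600e18.safetensors",
        "mdm-170M-100e18-rsl-0.01.safetensors",
    ):
        print(parse_checkpoint_name(file_name))
```

A name that does not fit the assumed pattern raises `ValueError` rather than being guessed at, which makes it easy to spot files that use a different scheme.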

Math reasoning: please see the gsm8k_safetensors folder.

Conditional generation: please see the sharegpt_safetensors folder.

Reverse curse: please see the reverse_safetensors folder.

All models are provided in both .pth and .safetensors formats.
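
To check what a downloaded .safetensors file contains without loading any weights, the file header can be read directly. The sketch below relies only on the published safetensors layout (an 8-byte little-endian header length followed by a JSON header). The helper names `read_safetensors_header` and `count_parameters` and the local path in the usage line are hypothetical, and the tensor names stored in these checkpoints are not documented here.

```python
import json
import struct


def read_safetensors_header(path):
    """Return {tensor_name: {"dtype", "shape", "data_offsets"}} for a .safetensors file.

    Only the header is read, so this is cheap even for large checkpoints.
    """
    with open(path, "rb") as f:
        # First 8 bytes: unsigned little-endian 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    header.pop("__metadata__", None)
    return header


def count_parameters(header):
    """Sum the element counts of all tensors listed in the header."""
    total = 0
    for entry in header.values():
        n = 1
        for dim in entry["shape"]:
            n *= dim
        total += n
    return total


if __name__ == "__main__":
    import sys

    # Usage (path is an assumption): python inspect.py mdm-170M-100e18.safetensors
    hdr = read_safetensors_header(sys.argv[1])
    for name, entry in sorted(hdr.items()):
        print(name, entry["dtype"], entry["shape"])
    print("total parameters:", count_parameters(hdr))
```

The total printed here counts every stored tensor, including any embedding weights, so it will generally be larger than the non-embedding parameter count encoded in the file name.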
