---
license: mit
datasets:
- normster/RealGuardrails
base_model:
- Qwen/Qwen2.5-7B
- normster/RealGuardrails-Qwen2.5-7B-SFT
library_name: transformers
---

# RealGuardrails Models

This model was trained on the [RealGuardrails](https://huggingface.co/datasets/normster/RealGuardrails) dataset, an instruction-tuning dataset focused on improving system prompt adherence and precedence. It was first trained via SFT on the `systemmix` split (150K examples) using our custom training library [torchllms](https://github.com/normster/torchllms), yielding [normster/RealGuardrails-Qwen2.5-7B-SFT](https://huggingface.co/normster/RealGuardrails-Qwen2.5-7B-SFT). It was then trained via DPO on the `preferencemix` split (30K examples) and converted back to a `transformers`-compatible checkpoint.

## Training Hyperparameters

| Name | Value |
| :--- | :--- |
| DPO beta | 0.01 |
| optimizer | AdamW |
| batch size | 128 |
| learning rate | 1e-5 |
| lr scheduler | cosine with 50 warmup steps |
| betas | (0.9, 0.999) |
| eps | 1e-8 |
| weight decay | 0 |
| epochs | 1 |
| max grad norm | 1.0 |
| precision | bf16 |
| max length | 4096 |
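
For reference, the DPO preference stage listed above optimizes the standard DPO objective with the beta value from the table. A minimal per-example sketch of that loss (variable names are illustrative, not taken from torchllms):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.01):
    """DPO loss for a single preference pair.

    Each argument is the summed log-probability of the chosen or
    rejected response under the policy or the frozen reference model.
    """
    # Implicit reward margin: how much more the policy prefers the
    # chosen response over the rejected one, relative to the reference.
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp)
    # -log(sigmoid(beta * margin)), computed stably as log1p(exp(-x))
    return math.log1p(math.exp(-beta * margin))

# A positive margin pushes the loss below log(2) ~= 0.693;
# a zero margin gives exactly log(2).
print(dpo_loss(-10.0, -20.0, -12.0, -18.0))
```

With the small beta of 0.01 used here, the loss is only weakly sensitive to the margin, keeping the policy close to the SFT reference model.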