---
license: llama3.2
datasets:
- open-thoughts/OpenThoughts-114k
- Jiayi-Pan/Countdown-Tasks-3to4
- FreedomIntelligence/medical-o1-verifiable-problem
base_model:
- meditsolutions/Llama-3.2-SUN-2.5B-chat
---

# mkurman/Llama-3.2-MedIT-SUN-2.5B-BT-GPRO

**Important Notice:**

This model is provided strictly for research purposes and is not intended for production use. It should not be considered a validated source of medical or professional advice. Use only in controlled experimental settings.


---

## Model Overview

mkurman/Llama-3.2-MedIT-SUN-2.5B-BT-GPRO is a fine-tuned variant of meditsolutions/Llama-3.2-SUN-2.5B-chat, adapted for research on natural language understanding and reasoning. It was trained with a multi-stage pipeline that combines Blurred Thoughts Supervised Fine-Tuning (BT-SFT) and Group Relative Policy Optimization (GRPO), with the aim of improving step-by-step reasoning on tasks such as arithmetic puzzles and verifiable medical problems.


---

## Training Procedure

The model was developed through the following sequential steps:

1. **Initial Blurred Thoughts Supervised Fine-Tuning (BT-SFT):**
   - **Base Model:** meditsolutions/Llama-3.2-SUN-2.5B-chat
   - **Parameters:** 2600 steps, batch size 2, gradient accumulation steps 16, learning rate 1e-6
   - **Dataset:** open-thoughts/OpenThoughts-114k
   - **Details:** For further information on BT-SFT, see the [detailed post](https://huggingface.co/posts/mkurman/496852395740108) and the [GitHub repository](https://github.com/mkurman/blurred-thoughts-SFT).

2. **Group Relative Policy Optimization (GRPO) Stage 1:**
   - **Dataset:** Jiayi-Pan/Countdown-Tasks-3to4
   - **Training:** 500 steps

3. **Group Relative Policy Optimization (GRPO) Stage 2:**
   - **Dataset:** FreedomIntelligence/medical-o1-verifiable-problem
   - **Training:** 50 steps

4. **Final BT-SFT Stage:**
   - **Parameters:** Same settings as the initial BT-SFT stage, run for an additional 400 steps

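
Both GRPO stages train against rewards that can be checked automatically. The step that gives GRPO its name is scoring each sampled completion relative to the other completions drawn for the same prompt. The Python below is a minimal sketch of that group-relative advantage, assuming binary correctness rewards and a small epsilon for numerical stability. The function name and the epsilon value are illustrative assumptions; this is not the training code used for this model.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Standardize rewards within one group of completions for the same prompt.

    Completions above the group average get a positive advantage and those
    below it get a negative one. `eps` (an illustrative value) avoids division
    by zero when every completion in the group received the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group: four completions for one prompt, rewarded 1.0 if verified correct.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 3) for a in advantages])  # prints [1.0, -1.0, -1.0, 1.0]
```

When every completion in a group earns the same reward, all advantages are zero, so prompts the model always solves (or never solves) add no reward-driven signal in this formulation.
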

---

## Datasets Utilized

- **open-thoughts/OpenThoughts-114k:**
  A dataset of roughly 114k step-by-step reasoning traces covering math, science, code, and puzzles, used during the initial supervised fine-tuning.

- **Jiayi-Pan/Countdown-Tasks-3to4:**
  A dataset of Countdown arithmetic puzzles, each giving three or four numbers and a target value to reach with basic arithmetic. Because answers can be checked automatically, it supports training on structured problem-solving.

- **FreedomIntelligence/medical-o1-verifiable-problem:**
  A dataset of medical problems paired with reference answers, curated so that the model's answers to medical questions can be verified.

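
What makes datasets like these usable for GRPO is that an answer can be checked mechanically. For Countdown, a proposed equation can be verified by evaluating it. The Python below is a minimal sketch of such a rule-based reward, assuming that each given number must be used exactly once, that only `+`, `-`, `*` and `/` are allowed, and that the reward is either 1.0 or 0.0. The function names are hypothetical, and the card does not state which reward function was actually used. The sketch parses with `ast` rather than calling `eval`, so model output is never executed.

```python
import ast
import operator

# The four arithmetic operators assumed to be allowed in Countdown answers.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _evaluate(node: ast.AST, used: list[int]) -> float:
    """Evaluate a parsed arithmetic expression, recording each integer literal."""
    if isinstance(node, ast.Expression):
        return _evaluate(node.body, used)
    if isinstance(node, ast.Constant) and type(node.value) is int:
        used.append(node.value)
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _evaluate(node.left, used)
        right = _evaluate(node.right, used)
        return _OPS[type(node.op)](left, right)
    # Names, calls, unary minus, powers, etc. are all rejected.
    raise ValueError(f"disallowed syntax: {type(node).__name__}")


def countdown_reward(equation: str, nums: list[int], target: int) -> float:
    """Return 1.0 if `equation` uses every number in `nums` exactly once and
    evaluates to `target`; return 0.0 otherwise."""
    used: list[int] = []
    try:
        value = _evaluate(ast.parse(equation, mode="eval"), used)
    except (SyntaxError, ValueError, ZeroDivisionError, RecursionError):
        return 0.0
    if sorted(used) != sorted(nums):
        return 0.0
    return 1.0 if abs(value - target) < 1e-6 else 0.0


print(countdown_reward("3 * 5 + 7", [3, 5, 7], 22))  # 1.0
print(countdown_reward("5 * 5 - 3", [3, 5, 7], 22))  # 0.0: reuses 5, omits 7
```

The second call shows why the number check matters: `5 * 5 - 3` does evaluate to 22, but it reuses 5 and never uses 7, so a reward based only on the final value would accept an invalid answer.
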

---

## Intended Use

- **Research and Experimental Applications:**
  This model is intended for academic research and exploratory projects. It is suited to investigating advanced fine-tuning methods and evaluating performance in task-oriented conversational scenarios.

- **Controlled Environments:**
  Users should deploy this model only within controlled experimental frameworks where rigorous evaluation and proper safety guardrails are in place.


---

## Limitations and Ethical Considerations

- **Not for Clinical or Production Use:**
  The model’s outputs have not been validated for clinical accuracy or professional decision-making. It must not be used as a primary source for medical, legal, or safety-critical information.

- **Safety and Guardrails:**
  All users must implement appropriate safety measures and validation protocols. The model may produce biased or inaccurate results and should be used with caution.

- **Experimental Nature:**
  Given its research-oriented design, the model’s performance can vary widely based on input and context. It is essential to perform thorough testing and validation before drawing any conclusions from its outputs.


---

## License

This model is released under the Llama 3.2 license. Users must adhere to the terms specified in the license when utilizing this model.

---

## Final Notice

All outputs from **mkurman/Llama-3.2-MedIT-SUN-2.5B-BT-GPRO** are intended solely for research purposes. This model is not a comprehensive knowledge source and should not be used as a substitute for professional advice or decision-making. Ensure that all necessary guardrails and safety protocols are in place when conducting any experiments with this model.