---
base_model: unsloth/mistral-small-24b-instruct-2501-unsloth-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- mistral
- trl
- grpo
license: apache-2.0
language:
- en
datasets:
- qingy2024/NuminaMath-282k-GRPO
---

# Step Recap 24B Instruct

This model was fine-tuned from Mistral Small 24B Instruct with GRPO (using QLoRA) to follow a thinking pattern of step, recap, answer.

**Example: Which is bigger? 9.9 or 9.11?**

```xml
To determine which number is bigger between 9.11 and 9.9, we can compare them directly.

I've identified the numbers to compare: 9.11 and 9.9. The next step is to compare these numbers to determine which is larger.

Comparing the two numbers:
- 9.11 has a tenths digit of 1 and a hundredths digit of 1.
- 9.9 has a tenths digit of 9 and no hundredths digit (which is equivalent to 0 hundredths).
Since 9 (tenths digit of 9.9) is greater than 1 (tenths digit of 9.11), 9.9 is larger than 9.11.

I have compared the numbers 9.11 and 9.9 and determined that 9.9 is the larger number. There are no further steps needed to solve this problem.

9.9
```

[Unsloth](https://github.com/unslothai/unsloth)
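The digit-by-digit comparison in the example output can be checked independently. The snippet below is a minimal sketch in plain Python and is not part of the model or its training code; the helper names `compare_decimals` and `fractional_digits` are hypothetical and exist only for this illustration.

```python
from decimal import Decimal


def compare_decimals(a: str, b: str) -> str:
    """Return whichever decimal string represents the larger value."""
    return a if Decimal(a) > Decimal(b) else b


def fractional_digits(value: str, places: int) -> list[int]:
    """Return the digits after the decimal point, zero-padded to `places`."""
    _, _, frac = value.partition(".")
    return [int(d) for d in frac.ljust(places, "0")[:places]]


# Tenths and hundredths digits, as described in the example output.
digits_9_9 = fractional_digits("9.9", 2)    # tenths 9, hundredths 0
digits_9_11 = fractional_digits("9.11", 2)  # tenths 1, hundredths 1

larger = compare_decimals("9.9", "9.11")
print(larger)
```

Using `Decimal` on the string forms avoids binary floating-point rounding, so the check mirrors the place-value reasoning in the model's answer.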