---
license: apache-2.0
tags:
- merge
- mergekit
- lazymergekit
- gordicaleksa/YugoGPT
- mlabonne/AlphaMonarch-7B
model-index:
- name: Tito-7B-slerp
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 68.09
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Stopwolf/Tito-7B-slerp
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 86.38
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Stopwolf/Tito-7B-slerp
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 64.01
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Stopwolf/Tito-7B-slerp
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 57.01
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Stopwolf/Tito-7B-slerp
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 81.69
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Stopwolf/Tito-7B-slerp
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 63.61
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Stopwolf/Tito-7B-slerp
      name: Open LLM Leaderboard
---

# Tito-7B-slerp

Tito-7B-slerp is a merge of the following models using [mergekit](https://github.com/cg123/mergekit):
* [gordicaleksa/YugoGPT](https://huggingface.co/gordicaleksa/YugoGPT)
* [mlabonne/AlphaMonarch-7B](https://huggingface.co/mlabonne/AlphaMonarch-7B)

## 🧩 Configuration

```yaml
slices:
  - sources:
      - model: gordicaleksa/YugoGPT
        layer_range: [0, 32]
      - model: mlabonne/AlphaMonarch-7B
        layer_range: [0, 32]
merge_method: slerp
base_model: mlabonne/AlphaMonarch-7B
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.6
dtype: bfloat16
```
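## 💻 Usage

A minimal usage sketch with 🤗 Transformers (illustrative, not part of the original card; the sampling parameters are arbitrary defaults, and it assumes the merged tokenizer ships a chat template inherited from the AlphaMonarch-7B base):

```python
# pip install transformers accelerate
from transformers import AutoTokenizer
import transformers
import torch

model = "Stopwolf/Tito-7B-slerp"
messages = [{"role": "user", "content": "Ko je bio Josip Broz Tito?"}]

# Format the conversation with the tokenizer's chat template
# (assumed to be inherited from the base model).
tokenizer = AutoTokenizer.from_pretrained(model)
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.bfloat16,  # matches the merge dtype above
    device_map="auto",
)

outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```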
## Results

Evaluations on the Serbian LLM eval suite (or rather, of performance in and knowledge of Serbian):

| | ARC-E | ARC-C | Hellaswag | BoolQ | Winogrande | OpenbookQA | PiQA | NQ Open | TriviaQA | Avg. |
|-----------|-------|-------|-----------|-------|------------|------------|-------|---------|----------|-------|
| [Zamfir-7B](https://huggingface.co/Stopwolf/Zamfir-7B-slerp) | 51.85 | 32.25 | 46.03 | 75.59 | 62.59 | 26.00 | 66.81 | 16.09 | 36.11 | 45.92 |
| [Mustra-7B](https://huggingface.co/Stopwolf/Mustra-7B-Instruct-v0.1) | 52.95 | 33.70 | 45.89 | **77.55** | 64.17 | **30.60** | 67.25 | 15.40 | 34.84 | 46.93 |
| [Tito-7B](https://huggingface.co/Stopwolf/Tito-7B-slerp) | 55.43 | **34.73** | 48.19 | 77.37 | **65.27** | 30.00 | 67.30 | **16.70** | 35.38 | **47.82** |
| [YugoGPT](https://huggingface.co/gordicaleksa/YugoGPT) | **57.79** | **34.73** | **49.89** | 69.45 | 64.56 | 28.20 | **72.03** | 15.82 | **36.14** | 47.62 |

All benchmarks were run 0-shot, with the exception of NQ Open and TriviaQA, which were run 5-shot in order to be comparable to the Mistral paper.

If we try to replicate the Open LLM Leaderboard results on the available Serbian datasets (using the appropriate number of shots instead of 0), we get:

| | ARC | Hellaswag | Winogrande | TruthfulQA | Avg. |
|---------|-------|-----------|------------|------------|-------|
| Tito-7B | 47.27 | - | 69.93 | **57.48** | 58.23 |
| [Perucac-7B](https://huggingface.co/Stopwolf/Perucac-7B-slerp) | **49.74** | - | **71.98** | 56.03 | **59.25** |
| YugoGPT | 44.03 | - | 70.64 | 48.06 | 54.24 |
| Llama3-8B | 42.24 | - | 61.25 | 51.08 | 51.52 |
| SambaLingo | 37.88 | - | 61.48 | 47.23 | 48.86 |

Note that YugoGPT, Llama3 and SambaLingo are all base models, unlike Tito and Perucac.

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Stopwolf__Tito-7B-slerp).

| Metric                           | Tito  | YugoGPT |
|----------------------------------|------:|--------:|
| Avg.                             | 70.13 |   57.34 |
| AI2 Reasoning Challenge (25-Shot)| 68.09 |   58.10 |
| HellaSwag (10-Shot)              | 86.38 |   81.44 |
| MMLU (5-Shot)                    | 64.01 |   60.68 |
| TruthfulQA (0-shot)              | 57.01 |   36.60 |
| Winogrande (5-shot)              | 81.69 |   76.56 |
| GSM8k (5-shot)                   | 63.61 |   30.70 |
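As a sketch of how one of the rows above could be re-run locally with EleutherAI's lm-evaluation-harness (the harness behind the Open LLM Leaderboard). The task id and the `simple_evaluate` call follow the harness's v0.4 Python API and are not taken from this card; the leaderboard pins a specific harness version, so scores may differ slightly:

```python
# pip install lm-eval
# Sketch: re-running the ARC-Challenge (25-shot) evaluation locally.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Stopwolf/Tito-7B-slerp,dtype=bfloat16",
    tasks=["arc_challenge"],  # assumed harness task id for AI2 ARC-Challenge
    num_fewshot=25,
    batch_size=8,
)
print(results["results"]["arc_challenge"])  # dict of metrics, incl. acc_norm
```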