---
license: apache-2.0
tags:
- moe
- frankenmoe
- merge
- mergekit
- lazymergekit
- Locutusque/TinyMistral-248M-v2
- Locutusque/TinyMistral-248M-v2.5
- Locutusque/TinyMistral-248M-v2.5-Instruct
- jtatman/tinymistral-v2-pycoder-instruct-248m
- Felladrin/TinyMistral-248M-SFT-v4
- Locutusque/TinyMistral-248M-v2-Instruct
base_model:
- Locutusque/TinyMistral-248M-v2
- Locutusque/TinyMistral-248M-v2.5
- Locutusque/TinyMistral-248M-v2.5-Instruct
- jtatman/tinymistral-v2-pycoder-instruct-248m
- Felladrin/TinyMistral-248M-SFT-v4
- Locutusque/TinyMistral-248M-v2-Instruct
inference:
  parameters:
    do_sample: true
    temperature: 0.2
    top_p: 0.14
    top_k: 12
    max_new_tokens: 250
    repetition_penalty: 1.15
widget:
- text: |
    <|im_start|>user
    Write me a Python program that calculates the factorial of n. <|im_end|>
    <|im_start|>assistant
- text: >-
    An emerging clinical approach to treat substance abuse disorders involves a
    form of cognitive-behavioral therapy whereby addicts learn to reduce their
    reactivity to drug-paired stimuli through cue-exposure or extinction
    training. It is, however,
datasets:
- nampdn-ai/mini-peS2o
---

# TinyMistral-6x248M

TinyMistral-6x248M is a Mixture of Experts (MoE) made with the following models using [LazyMergekit](https://colab.research.google.com/drive/1obulZ1ROXHjYLn6PPZJwRR6GzgQogxxb?usp=sharing):
* [Locutusque/TinyMistral-248M-v2](https://huggingface.co/Locutusque/TinyMistral-248M-v2)
* [Locutusque/TinyMistral-248M-v2.5](https://huggingface.co/Locutusque/TinyMistral-248M-v2.5)
* [Locutusque/TinyMistral-248M-v2.5-Instruct](https://huggingface.co/Locutusque/TinyMistral-248M-v2.5-Instruct)
* [jtatman/tinymistral-v2-pycoder-instruct-248m](https://huggingface.co/jtatman/tinymistral-v2-pycoder-instruct-248m)
* [Felladrin/TinyMistral-248M-SFT-v4](https://huggingface.co/Felladrin/TinyMistral-248M-SFT-v4)
* [Locutusque/TinyMistral-248M-v2-Instruct](https://huggingface.co/Locutusque/TinyMistral-248M-v2-Instruct)

The resulting model was then pre-trained on 600,000 examples from the nampdn-ai/mini-peS2o dataset.

We don't recommend using the Inference API, as the model shows serious performance degradation when served through it.

### Recommended inference parameters

```yaml
do_sample: true
temperature: 0.2
top_p: 0.14
top_k: 12
repetition_penalty: 1.15
```
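
As a minimal sketch, the settings above can be kept in one place and passed to a generation call as keyword arguments. The helper name `generation_kwargs` is hypothetical, and the default `max_new_tokens=250` is taken from the inference parameters in this card's metadata.

```python
# Recommended sampling settings, copied from this card.
RECOMMENDED_GENERATION_KWARGS = {
    "do_sample": True,
    "temperature": 0.2,
    "top_p": 0.14,
    "top_k": 12,
    "repetition_penalty": 1.15,
}


def generation_kwargs(max_new_tokens=250, **overrides):
    """Return the recommended settings, optionally overridden (hypothetical helper)."""
    kwargs = dict(RECOMMENDED_GENERATION_KWARGS, max_new_tokens=max_new_tokens)
    kwargs.update(overrides)
    return kwargs


# With a transformers text-generation pipeline, the result could be used as:
#   outputs = pipeline(prompt, **generation_kwargs())
print(generation_kwargs())
```

Overrides are applied last, so `generation_kwargs(temperature=0.5)` changes only the temperature and leaves the other recommended values in place.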

## 🧩 Configuration

```yaml
base_model: Locutusque/TinyMistral-248M-v2.5
experts:
  - source_model: Locutusque/TinyMistral-248M-v2
    positive_prompts:
      - "An emerging trend in global economics is"
      - "TITLE: The Next Generation of Internet Connectivity"
      - "begin a comprehensive analysis on the sociopolitical effects of"
    negative_prompts:
      - "Code a simple"
      - "Explain the Krebs cycle in detail"
      - "Compose a sonnet about"

  - source_model: Locutusque/TinyMistral-248M-v2.5
    positive_prompts:
      - "Advanced C++ memory management techniques"
      - "C# asynchronous programming best practices"
      - "AI's role in predictive analytics"
      - "textbook review on machine learning algorithms"
      - "## Exercise: Design a C# interface for a CRM system"
      - "## Solution: Optimize an AI-powered recommendation engine"
    negative_prompts:
      - "Narrate the story of"
      - "The ethical considerations in"
      - "Review the latest art exhibition by"
  
  - source_model: Locutusque/TinyMistral-248M-v2.5-Instruct
    positive_prompts:
      - "What is the chemical formula for photosynthesis?"
      - "Identification of a new mineral found on Mars"
      - "physics: Explaining the concept of relativity"
      - "Solve for x using differential equations:"
      - "history: Analyze the causes of the French Revolution"
    negative_prompts:
      - "Devise a business plan for"
      - "The evolution of culinary arts"
      - "Orchestrate a piece for a string quartet"
  
  - source_model: jtatman/tinymistral-v2-pycoder-instruct-248m
    positive_prompts:
      - "Write a Python program for facial recognition"
      - "Explain dynamic typing in programming languages"
      - "algorithm development for efficient data sorting"
    negative_prompts:
      - "Who was the first Emperor of Rome?"
      - "Discuss the political dynamics in"
      - "Provide a proof for Fermat's Last Theorem"
      - "physics: The principles of thermodynamics"
  
  - source_model: Felladrin/TinyMistral-248M-SFT-v4
    positive_prompts:
      - "Escreba sobre a influência da música no Brasil"
      - "Voici un guide pour les voyageurs en France"
      - "Para entender la política de México, se debe considerar"
      - "Cuales son los efectos de la globalización en Argentina"
      - "Welche gesellschaftlichen Veränderungen gibt es in Deutschland"
      - "If you had to imagine a utopian city, what would be its core values?"
    negative_prompts:
      - "Calculate the integral of"
      - "Describe the process of cell division"
      - "Review the latest advancements in quantum computing"

  - source_model: Locutusque/TinyMistral-248M-v2-Instruct
    positive_prompts:
      - "Write an essay on the evolution of international trade laws"
      - "What are the key components of a sustainable urban ecosystem?"
      - "instruct on effective negotiation techniques in diplomacy"
      - "How does cognitive bias affect decision making in high-pressure environments?"
      - "Identify the architectural significance of the Sydney Opera House"
    negative_prompts:
      - "Develop a script to automate"
      - "Understanding inheritance in object-oriented programming"
      - "philosophy of existentialism in contemporary society"
```
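
To give a feel for what the experts in this configuration are used for at inference time, here is a rough sketch of Mixtral-style routing in plain Python. It assumes a router that scores every expert for each token, keeps the top `k`, and normalises their weights with a softmax. The value `k=2` and the gate logits are illustrative assumptions, not values taken from this model.

```python
import math


def route_top_k(gate_logits, k=2):
    """Pick the k highest-scoring experts and softmax-normalise their weights."""
    # Indices of the k largest logits (ties keep their original order).
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    # Softmax over the selected logits only, shifted by the max for stability.
    m = max(gate_logits[i] for i in top)
    exps = [math.exp(gate_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Made-up gate logits for a single token, one score per expert (six experts).
logits = [0.1, 2.0, -1.0, 2.0, 0.3, -0.5]
for expert, weight in route_top_k(logits):
    print(f"expert {expert}: weight {weight:.2f}")
```

The token's output would then be the weighted sum of the selected experts' outputs. The positive and negative prompts in the configuration are meant to steer which kinds of input each expert's gate scores highly.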

## 💻 Usage

```python
# In a notebook, install the dependencies first:
# !pip install -qU transformers bitsandbytes accelerate

import torch
import transformers

model = "M4-ai/TinyMistral-6x248M"

# Build a text-generation pipeline; the tokenizer is loaded from the same repo.
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    model_kwargs={"torch_dtype": torch.float16, "load_in_4bit": True},
)

# Format the conversation with the tokenizer's chat template.
messages = [{"role": "user", "content": "Explain what a Mixture of Experts is in less than 100 words."}]
prompt = pipeline.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Sampling settings follow the recommended inference parameters in this card.
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.2, top_k=12, top_p=0.14, repetition_penalty=1.15)
print(outputs[0]["generated_text"])
```
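
The widget example in this card's metadata uses ChatML-style `<|im_start|>` and `<|im_end|>` markers. If you need to build a prompt without the tokenizer's chat template, something like the following sketch may work. The helper `build_chatml_prompt` is hypothetical, and the exact whitespace around the markers is an assumption based on the common ChatML layout rather than something confirmed for this model.

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Format a list of {"role", "content"} dicts as a ChatML-style prompt string."""
    parts = []
    for message in messages:
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Leave the assistant turn open so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "user", "content": "Write me a Python program that calculates the factorial of n."}
]
print(build_chatml_prompt(messages))
```

Where the tokenizer ships a chat template, `apply_chat_template` remains the safer choice, since it reflects whatever format the tokenizer was actually configured with.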