---
language:
- en
license: mit
tags:
- text-generation-inference
- transformers
- unsloth
- mixture of experts
- jamba
datasets:
- Severian/Internal-Knowledge-Map
base_model: ai21labs/Jamba-v0.1
pipeline_tag: text-generation
---

<img src="https://cdn-uploads.huggingface.co/production/uploads/64740cf7485a7c8e1bd51ac9/SwdXRoyi08neRiI8pJrYI.webp" width="500" height="500">

# Jamba-Nexus-IKM-v1

## This model was trained for 6.3 epochs (2 hrs / ~3,700 steps) using Unsloth on the Internal Knowledge Map dataset.

---
### *I haven't had the chance to properly test this model, so it may or may not work. Some outputs are fine; others are wonky. A new version is training right now.*

Since this is a base model, the IKM dataset greatly affects the output. The IKM dataset is purely Markdown-based, so other prompt formats are hit or miss.

```
{System}
### Prompt:
{User}
### Response:

```
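
The template above can be filled in programmatically. The following is a minimal sketch, not part of the original training code: the helper name `build_prompt`, the example system message, and the exact newline placement between sections are assumptions made for illustration.

```python
# Hypothetical helper that fills the Markdown-style template shown above.
# The newline placement is an assumption; adjust it if outputs look off.
PROMPT_TEMPLATE = "{system}\n### Prompt:\n{user}\n### Response:\n"


def build_prompt(system: str, user: str) -> str:
    """Return a single prompt string in the System / Prompt / Response layout."""
    return PROMPT_TEMPLATE.format(system=system.strip(), user=user.strip())


# The system message here is only an illustrative placeholder.
prompt = build_prompt(
    "You are a helpful assistant.",
    "How could we use cheese to reignite the sun?",
)
print(prompt)
```

The resulting string can be passed to the tokenizer in place of a hand-written prompt.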

---

## Inference

```py
# Notebook cell. The version specifiers are quoted so the shell does not
# treat ">" as an output redirect.
!pip install -qqq "transformers>=4.39.0" mamba-ssm "causal-conv1d>=1.2.0" accelerate bitsandbytes --progress-bar off
!pip install flash-attn --no-build-isolation

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Load model in 4-bit precision, leaving the Mamba layers unquantized
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    llm_int8_skip_modules=["mamba"]
)
model = AutoModelForCausalLM.from_pretrained(
    "Severian/Jamba-Nexus-IKM-v1",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    quantization_config=quantization_config
)
tokenizer = AutoTokenizer.from_pretrained("Severian/Jamba-Nexus-IKM-v1")

# Tokenize input and move it to the model's device
prompt = """How could we use cheese to reignite the sun? Answer:"""
input_ids = tokenizer(
    prompt,
    return_tensors='pt'
).to(model.device)["input_ids"]

# Generate answer
outputs = model.generate(input_ids, max_new_tokens=216)

# Print output (the decoded text includes the prompt)
print(tokenizer.batch_decode(outputs))
```

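For decoder-only models, `generate` returns the prompt tokens followed by the new tokens, so the decoded string printed above starts with the prompt. The sketch below shows one way to keep only the continuation by working on the decoded text. The helper name `strip_prompt` and the `<bos>` marker in the sample string are hypothetical, and the sample text stands in for real model output.

```python
def strip_prompt(decoded: str, prompt: str) -> str:
    """Return the text that follows the first occurrence of the prompt.

    If the prompt is not found, the decoded text is returned unchanged
    (apart from surrounding whitespace).
    """
    start = decoded.find(prompt)
    if start == -1:
        return decoded.strip()
    return decoded[start + len(prompt):].strip()


# Placeholder string standing in for tokenizer.batch_decode(outputs)[0];
# "<bos>" is a made-up special-token marker used only for this example.
prompt = "How could we use cheese to reignite the sun? Answer:"
decoded = "<bos>" + prompt + " (model text here)"
print(strip_prompt(decoded, prompt))
```

An alternative that avoids string matching is to slice the output tensor at `input_ids.shape[1]` before decoding, which keeps only the newly generated tokens.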

```
[3731/5850 3:38:52 < 2:04:22, 0.28 it/s, Epoch 6.37/10]
Step Training Loss
1 10.109800
2 9.924600
3 9.919700
4 9.919100
5 9.917400
6 9.895900
7 9.891700
8 9.893500
9 9.917200
10 9.918800
11 10.056100
12 9.916200
13 9.911200
14 9.884300
15 9.909800
16 9.883800
17 9.883800
18 9.878300
19 9.904400
20 9.976400
21 10.061600
22 10.063300
23 9.876200
24 9.890900
25 9.873100
26 9.893700
27 9.869400
28 9.867100
29 9.863400
30 9.910400
31 9.882300
32 9.884100
33 10.023100
34 9.883500
35 9.854800
36 9.847400
37 9.851400
38 9.879200
39 9.845300
40 9.845700
41 9.876800
42 9.844600
43 9.848000
44 9.851900
45 10.038100
46 9.865000
47 9.845400
48 9.838900
49 9.860100
50 9.842500
51 9.830200
52 10.144100
53 9.825600
54 9.832000
55 9.835000
56 9.850900
57 9.990500
58 10.020100
59 10.014500
60 9.849600
61 9.877500
62 9.819900
63 9.818800
64 9.987100
65 9.952300
66 9.861900
67 9.814100
68 9.840600
69 9.809600
70 9.809600
71 9.976200
72 9.810600
73 9.805900
74 9.829400
75 9.830300
76 9.831500
77 9.802800
78 9.798200
79 9.824900
80 9.795100
81 9.794400
82 9.801200
83 9.794000
84 9.820400
85 9.790100
86 9.840400
87 9.809500
88 9.860000
89 9.807000
90 9.948200
91 9.779500
92 9.781800
93 9.802700
94 9.827700
95 9.798000
96 9.825900
97 9.966000
98 9.773000
99 9.775400
100 9.764400
101 9.766000
102 9.817500
103 9.795200
104 9.757900
105 9.753000
106 9.758200
107 9.753000
108 9.751700
109 9.784200
110 9.749700
111 9.748200
112 9.746200
113 9.797200
114 9.747000
115 9.913200
116 9.739100
117 9.769800
118 9.764500
119 9.736900
120 9.760500
121 9.795500
122 9.935300
123 10.079200
124 9.727200
125 9.732400
126 9.755800
127 9.755500
128 9.758900
129 9.732800
130 9.749600
131 9.922100
132 9.719800
133 9.716600
134 9.721900
135 9.718100
136 9.746300
137 9.868900
138 9.740800
139 9.715600
140 9.711000
141 9.744000
142 9.705100
143 9.734300
144 9.881400
145 9.764000
146 9.699800
147 9.855700
148 9.705600
149 9.903000
150 9.697000
151 9.732500
152 9.695000
153 9.901200
154 9.865600
155 9.686900
156 9.890300
157 9.714300
158 9.683900
159 9.856900
160 10.032500
161 9.677200
162 9.683600
163 9.679800
164 9.670600
165 9.698900
166 9.763100
167 9.669600
168 9.713800
169 9.699100
170 9.869700
171 9.844000
172 9.697700
173 9.667200
174 9.692600
175 9.670400
176 9.664200
177 9.689400
178 9.667900
179 9.685200
180 9.664700
181 9.861600
182 9.653600
183 9.652500
184 9.652700
185 9.643500
186 9.675400
187 9.685200
188 9.648800
189 9.671700
190 9.656900
191 9.734500
192 9.637900
193 9.635800
194 9.681400
195 9.669400
196 9.635200
197 9.667900
198 9.662100
199 9.809700
200 9.627500
201 9.691600
202 9.657200
203 9.689900
204 9.633700
205 9.624900
206 9.621900
207 9.655200
208 9.620300
209 9.619600
210 9.616800
211 9.614600
212 9.646700
213 9.612400
214 9.676200
215 9.672100
216 9.788300
217 9.611000
218 9.613900
219 9.632700
220 9.785800
221 9.595400
222 9.599600
223 9.627600
224 9.631600
225 9.627400
226 9.637000
227 9.626000
228 9.600800
229 9.658900
230 9.584400
231 9.621600
232 9.583600
233 9.582800
234 9.613900
235 9.580700
236 9.580600
237 9.580800
238 9.581300
239 9.788600
240 9.574100
241 9.580500
242 9.783500
243 9.574300
244 9.785300
245 9.599800
246 9.565500
247 9.563900
248 9.592900
249 9.592700
250 9.592200
251 9.573000
252 9.769800
253 9.913400
254 9.553100
255 9.549500
256 9.616300
257 9.566200
258 9.766200
259 9.592900
260 9.547900
261 9.576800
262 9.543000
263 9.543600
264 9.978600
265 9.570100
266 9.570400
267 9.716600
268 9.529900
269 9.579200
270 9.545500
271 9.531600
272 9.555500
273 9.559900
274 9.524000
275 9.889300
276 9.553700
277 9.534400
278 9.566800
279 9.518700
280 9.510600
281 9.528800
282 9.545800
283 9.693700
284 9.507500
285 9.511300
286 9.500100

3509 6.093600
3510 6.874700
3511 6.239500
3512 6.262400
3513 6.262000
3514 6.093200
3515 6.095400
3516 6.429600
3517 6.090800
3518 6.548000
3519 6.237100
3520 6.237000
3521 6.088900
3522 6.279700
3523 7.310300
3524 6.695300
3525 6.243000
3526 6.087100
3527 6.697000
3528 6.412400
3529 6.087100
3530 6.087000
3531 6.227500
3532 6.085900
3533 6.376200
3534 6.231600
3535 6.080500
3536 6.079100
3537 6.082800
3538 6.535800
3539 6.082300
3540 6.081300
3541 6.080600
3542 6.437900
3543 6.071800
3544 6.072500
3545 6.078300
3546 6.076700
3547 6.226500
3548 6.081000
3549 6.071000
3550 6.066900
3551 6.370600
3552 6.077900
3553 6.854100
3554 6.077300
3555 6.265500
3556 6.065600
3557 6.389000
3558 6.072500
3559 6.522500
3560 6.072400
3561 6.216900
3562 6.213700
3563 6.067200
3564 6.696500
3565 6.237500
3566 6.935300
3567 6.213700
3568 6.236400
3569 6.061000
3570 7.399200
3571 6.249000
3572 6.235700
3573 6.059400
3574 6.238300
3575 6.058600
3576 6.064600
3577 6.063100
3578 6.220400
3579 6.071700
3580 6.249400
3581 6.708400
3582 6.060400
3583 6.062800
3584 6.358300
3585 6.057700
3586 6.053700
3587 6.251000
3588 6.513700
3589 6.208500
3590 7.053200
3591 6.048200
3592 6.230400
3593 6.201200
3594 7.549800
3595 6.058900
3596 6.207100
3597 6.206900
3598 6.042500
3599 6.189200
3600 6.354800
3601 6.219600
3602 6.238400
3603 6.206500
3604 7.172000
3605 6.040700
3606 6.215000
3607 6.216300
3608 6.045200
3609 7.134800
3610 6.230800
3611 6.037500
3612 6.499700
3613 6.791900
3614 6.034000
3615 6.957900
3616 6.180000
3617 6.041000
3618 6.642900
3619 6.651100
3620 6.225300
3621 6.034700
3622 6.510700
3623 6.227100
3624 6.208200
3625 6.336000
3626 6.027800
3627 6.489200
3628 6.591400
3629 6.030200
3630 6.796800
3631 6.027400
3632 6.374700
3633 6.032100
3634 6.025900
3635 6.369400
3636 6.634500
3637 6.481200
3638 6.220300
3639 6.217200
3640 6.025200
3641 6.016900
3642 6.491400
3643 6.025600
3644 6.483400
3645 6.478600
3646 6.387600
3647 6.168300
3648 6.654600
3649 6.809700
3650 6.193000
3651 6.194500
3652 6.349200
3653 6.172500
3654 6.174200
3655 6.014800
3656 6.626400
3657 6.011500
3658 6.162000
3659 6.504300
3660 7.084900
3661 6.622300
3662 6.470700
3663 6.011600
3664 6.188300
3665 6.198700
3666 6.009900
3667 6.644700
3668 6.185000
3669 6.008600
3670 6.005900
3671 6.009200
3672 6.614900
3673 6.198300
3674 6.933100
3675 6.171800
3676 6.147500
3677 6.464300
3678 6.009500
3679 6.371400
3680 6.162100
3681 5.998900
3682 6.645100
3683 6.192900
3684 6.813800
3685 6.331100
3686 6.832200
3687 6.480900
3688 5.993200
3689 6.156100
3690 6.172600
3691 6.185400
3692 5.999600
3693 6.151900
3694 6.187100
3695 6.459900
3696 5.993100
3697 5.989900
3698 6.348300
3699 5.992500
3700 5.995900
3701 5.994900
3702 5.984900
3703 6.161600
3704 6.170100
3705 6.507000
3706 5.989200
3707 6.138800
3708 6.890600
3709 5.984500
3710 6.157900
3711 5.991600
3712 5.992200
3713 6.135400
3714 6.133900
3715 6.164000
3716 5.988100
3717 6.351000
3718 5.981300
3719 5.981000
3720 7.087300
3721 6.135400
3722 6.280900
3723 5.982800
3724 5.983800
3725 6.350100
3726 6.618500
3727 6.600100
3728 6.440600
3729 5.973800
```
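
Because individual steps in the log above are noisy, averaging over a range of steps gives a clearer picture than any single row. The following is a small sketch for parsing rows of the form `step loss` and averaging a step range. The helper names `parse_loss_log` and `mean_loss` are hypothetical, and the sample below contains only the first and last three rows of the log rather than the full run.

```python
import re

# Matches rows such as "3729 5.973800"; the progress header and the
# "Step Training Loss" line do not match and are skipped.
ROW = re.compile(r"^\s*(\d+)\s+(\d+\.\d+)")


def parse_loss_log(text):
    """Return a list of (step, loss) pairs found in the log text."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            rows.append((int(match.group(1)), float(match.group(2))))
    return rows


def mean_loss(rows, first_step, last_step):
    """Average the loss over an inclusive range of steps."""
    window = [loss for step, loss in rows if first_step <= step <= last_step]
    if not window:
        raise ValueError("no rows in the requested step range")
    return sum(window) / len(window)


# Excerpt copied from the log above (first three and last three rows).
sample = """
Step Training Loss
1 10.109800
2 9.924600
3 9.919700
3727 6.600100
3728 6.440600
3729 5.973800
"""
rows = parse_loss_log(sample)
print("mean loss, steps 1-3:", mean_loss(rows, 1, 3))
print("mean loss, steps 3727-3729:", mean_loss(rows, 3727, 3729))
```

Pasting the full log into `sample` lets the same functions compare any two windows of the run.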