metadata
language:
  - en
license: mit
tags:
  - text-generation-inference
  - transformers
  - unsloth
  - mixture of experts
  - jamba
datasets:
  - Severian/Internal-Knowledge-Map
base_model: ai21labs/Jamba-v0.1
pipeline_tag: text-generation

Jamba-Nexus-IKM-v1

This model was trained for 6.3 epochs (about 2 hours, ~3,700 steps) using Unsloth on the Internal Knowledge Map (IKM) dataset.


I haven't had the chance to test this model thoroughly, so it may or may not work well. Some outputs are fine; others are wonky. A new version is training right now.

Because this is a base model, the IKM dataset greatly affects the output. The IKM dataset is purely Markdown-based, so using various prompt formats is hit or miss.

{System}
### Prompt:
{User}
### Response:
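
As a rough illustration of the template above, the helper below fills in the two placeholders. The function name `build_prompt` and the trailing newline after `### Response:` are assumptions made for this sketch, not something the training setup is known to require.

```python
def build_prompt(system: str, user: str) -> str:
    """Fill the {System} and {User} slots of the Markdown-style template."""
    # Hypothetical helper: the layout is copied from the template above;
    # the trailing newline after "### Response:" is an assumption.
    return f"{system}\n### Prompt:\n{user}\n### Response:\n"


prompt = build_prompt(
    "You are a helpful assistant.",
    "How could we use cheese to reignite the sun?",
)
print(prompt)
```

The resulting string can be passed to the tokenizer in place of a hand-written prompt.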

Inference

!pip install -qqq "transformers>=4.39.0" mamba-ssm "causal-conv1d>=1.2.0" accelerate bitsandbytes --progress-bar off
!pip install flash-attn --no-build-isolation

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Load model in 4-bit precision
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    llm_int8_skip_modules=["mamba"]
)
model = AutoModelForCausalLM.from_pretrained(
    "Severian/Jamba-Nexus-IKM-v1",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    quantization_config=quantization_config
)
tokenizer = AutoTokenizer.from_pretrained("Severian/Jamba-Nexus-IKM-v1")

# Tokenize input
prompt = """How could we use cheese to reignite the sun? Answer:"""
input_ids = tokenizer(
    prompt,
    return_tensors='pt'
).to(model.device)["input_ids"]

# Generate answer
outputs = model.generate(input_ids, max_new_tokens=216)

# Print output
print(tokenizer.batch_decode(outputs))
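
The decoded output contains the prompt followed by the continuation, so when a prompt ends with `### Response:` it can help to keep only the text after that marker. The helper below is a minimal sketch: `extract_response` is a hypothetical name, and the sample string is a stand-in, not real model output.

```python
def extract_response(decoded: str, marker: str = "### Response:") -> str:
    """Return the text after the first response marker, or the whole string if absent."""
    head, sep, tail = decoded.partition(marker)
    # If the marker is missing (e.g. a different prompt format), fall back to the full text.
    return tail.strip() if sep else decoded.strip()


# Stand-in for one element of tokenizer.batch_decode(outputs); not real model output.
decoded = "System text\n### Prompt:\nQuestion?\n### Response:\nAn answer."
print(extract_response(decoded))
```

Splitting on the first marker is a design choice for this sketch; a base model may emit further `### Prompt:` / `### Response:` turns on its own, which this helper does not trim.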

Training Log

[3731/5850 3:38:52 < 2:04:22, 0.28 it/s, Epoch 6.37/10]
Step	Training Loss
1	10.109800
2	9.924600
3	9.919700
4	9.919100
5	9.917400
6	9.895900
7	9.891700
8	9.893500
9	9.917200
10	9.918800
11	10.056100
12	9.916200
13	9.911200
14	9.884300
15	9.909800
16	9.883800
17	9.883800
18	9.878300
19	9.904400
20	9.976400
21	10.061600
22	10.063300
23	9.876200
24	9.890900
25	9.873100
26	9.893700
27	9.869400
28	9.867100
29	9.863400
30	9.910400
31	9.882300
32	9.884100
33	10.023100
34	9.883500
35	9.854800
36	9.847400
37	9.851400
38	9.879200
39	9.845300
40	9.845700
41	9.876800
42	9.844600
43	9.848000
44	9.851900
45	10.038100
46	9.865000
47	9.845400
48	9.838900
49	9.860100
50	9.842500
51	9.830200
52	10.144100
53	9.825600
54	9.832000
55	9.835000
56	9.850900
57	9.990500
58	10.020100
59	10.014500
60	9.849600
61	9.877500
62	9.819900
63	9.818800
64	9.987100
65	9.952300
66	9.861900
67	9.814100
68	9.840600
69	9.809600
70	9.809600
71	9.976200
72	9.810600
73	9.805900
74	9.829400
75	9.830300
76	9.831500
77	9.802800
78	9.798200
79	9.824900
80	9.795100
81	9.794400
82	9.801200
83	9.794000
84	9.820400
85	9.790100
86	9.840400
87	9.809500
88	9.860000
89	9.807000
90	9.948200
91	9.779500
92	9.781800
93	9.802700
94	9.827700
95	9.798000
96	9.825900
97	9.966000
98	9.773000
99	9.775400
100	9.764400
101	9.766000
102	9.817500
103	9.795200
104	9.757900
105	9.753000
106	9.758200
107	9.753000
108	9.751700
109	9.784200
110	9.749700
111	9.748200
112	9.746200
113	9.797200
114	9.747000
115	9.913200
116	9.739100
117	9.769800
118	9.764500
119	9.736900
120	9.760500
121	9.795500
122	9.935300
123	10.079200
124	9.727200
125	9.732400
126	9.755800
127	9.755500
128	9.758900
129	9.732800
130	9.749600
131	9.922100
132	9.719800
133	9.716600
134	9.721900
135	9.718100
136	9.746300
137	9.868900
138	9.740800
139	9.715600
140	9.711000
141	9.744000
142	9.705100
143	9.734300
144	9.881400
145	9.764000
146	9.699800
147	9.855700
148	9.705600
149	9.903000
150	9.697000
151	9.732500
152	9.695000
153	9.901200
154	9.865600
155	9.686900
156	9.890300
157	9.714300
158	9.683900
159	9.856900
160	10.032500
161	9.677200
162	9.683600
163	9.679800
164	9.670600
165	9.698900
166	9.763100
167	9.669600
168	9.713800
169	9.699100
170	9.869700
171	9.844000
172	9.697700
173	9.667200
174	9.692600
175	9.670400
176	9.664200
177	9.689400
178	9.667900
179	9.685200
180	9.664700
181	9.861600
182	9.653600
183	9.652500
184	9.652700
185	9.643500
186	9.675400
187	9.685200
188	9.648800
189	9.671700
190	9.656900
191	9.734500
192	9.637900
193	9.635800
194	9.681400
195	9.669400
196	9.635200
197	9.667900
198	9.662100
199	9.809700
200	9.627500
201	9.691600
202	9.657200
203	9.689900
204	9.633700
205	9.624900
206	9.621900
207	9.655200
208	9.620300
209	9.619600
210	9.616800
211	9.614600
212	9.646700
213	9.612400
214	9.676200
215	9.672100
216	9.788300
217	9.611000
218	9.613900
219	9.632700
220	9.785800
221	9.595400
222	9.599600
223	9.627600
224	9.631600
225	9.627400
226	9.637000
227	9.626000
228	9.600800
229	9.658900
230	9.584400
231	9.621600
232	9.583600
233	9.582800
234	9.613900
235	9.580700
236	9.580600
237	9.580800
238	9.581300
239	9.788600
240	9.574100
241	9.580500
242	9.783500
243	9.574300
244	9.785300
245	9.599800
246	9.565500
247	9.563900
248	9.592900
249	9.592700
250	9.592200
251	9.573000
252	9.769800
253	9.913400
254	9.553100
255	9.549500
256	9.616300
257	9.566200
258	9.766200
259	9.592900
260	9.547900
261	9.576800
262	9.543000
263	9.543600
264	9.978600
265	9.570100
266	9.570400
267	9.716600
268	9.529900
269	9.579200
270	9.545500
271	9.531600
272	9.555500
273	9.559900
274	9.524000
275	9.889300
276	9.553700
277	9.534400
278	9.566800
279	9.518700
280	9.510600
281	9.528800
282	9.545800
283	9.693700
284	9.507500
285	9.511300
286	9.500100
...
3509	6.093600
3510	6.874700
3511	6.239500
3512	6.262400
3513	6.262000
3514	6.093200
3515	6.095400
3516	6.429600
3517	6.090800
3518	6.548000
3519	6.237100
3520	6.237000
3521	6.088900
3522	6.279700
3523	7.310300
3524	6.695300
3525	6.243000
3526	6.087100
3527	6.697000
3528	6.412400
3529	6.087100
3530	6.087000
3531	6.227500
3532	6.085900
3533	6.376200
3534	6.231600
3535	6.080500
3536	6.079100
3537	6.082800
3538	6.535800
3539	6.082300
3540	6.081300
3541	6.080600
3542	6.437900
3543	6.071800
3544	6.072500
3545	6.078300
3546	6.076700
3547	6.226500
3548	6.081000
3549	6.071000
3550	6.066900
3551	6.370600
3552	6.077900
3553	6.854100
3554	6.077300
3555	6.265500
3556	6.065600
3557	6.389000
3558	6.072500
3559	6.522500
3560	6.072400
3561	6.216900
3562	6.213700
3563	6.067200
3564	6.696500
3565	6.237500
3566	6.935300
3567	6.213700
3568	6.236400
3569	6.061000
3570	7.399200
3571	6.249000
3572	6.235700
3573	6.059400
3574	6.238300
3575	6.058600
3576	6.064600
3577	6.063100
3578	6.220400
3579	6.071700
3580	6.249400
3581	6.708400
3582	6.060400
3583	6.062800
3584	6.358300
3585	6.057700
3586	6.053700
3587	6.251000
3588	6.513700
3589	6.208500
3590	7.053200
3591	6.048200
3592	6.230400
3593	6.201200
3594	7.549800
3595	6.058900
3596	6.207100
3597	6.206900
3598	6.042500
3599	6.189200
3600	6.354800
3601	6.219600
3602	6.238400
3603	6.206500
3604	7.172000
3605	6.040700
3606	6.215000
3607	6.216300
3608	6.045200
3609	7.134800
3610	6.230800
3611	6.037500
3612	6.499700
3613	6.791900
3614	6.034000
3615	6.957900
3616	6.180000
3617	6.041000
3618	6.642900
3619	6.651100
3620	6.225300
3621	6.034700
3622	6.510700
3623	6.227100
3624	6.208200
3625	6.336000
3626	6.027800
3627	6.489200
3628	6.591400
3629	6.030200
3630	6.796800
3631	6.027400
3632	6.374700
3633	6.032100
3634	6.025900
3635	6.369400
3636	6.634500
3637	6.481200
3638	6.220300
3639	6.217200
3640	6.025200
3641	6.016900
3642	6.491400
3643	6.025600
3644	6.483400
3645	6.478600
3646	6.387600
3647	6.168300
3648	6.654600
3649	6.809700
3650	6.193000
3651	6.194500
3652	6.349200
3653	6.172500
3654	6.174200
3655	6.014800
3656	6.626400
3657	6.011500
3658	6.162000
3659	6.504300
3660	7.084900
3661	6.622300
3662	6.470700
3663	6.011600
3664	6.188300
3665	6.198700
3666	6.009900
3667	6.644700
3668	6.185000
3669	6.008600
3670	6.005900
3671	6.009200
3672	6.614900
3673	6.198300
3674	6.933100
3675	6.171800
3676	6.147500
3677	6.464300
3678	6.009500
3679	6.371400
3680	6.162100
3681	5.998900
3682	6.645100
3683	6.192900
3684	6.813800
3685	6.331100
3686	6.832200
3687	6.480900
3688	5.993200
3689	6.156100
3690	6.172600
3691	6.185400
3692	5.999600
3693	6.151900
3694	6.187100
3695	6.459900
3696	5.993100
3697	5.989900
3698	6.348300
3699	5.992500
3700	5.995900
3701	5.994900
3702	5.984900
3703	6.161600
3704	6.170100
3705	6.507000
3706	5.989200
3707	6.138800
3708	6.890600
3709	5.984500
3710	6.157900
3711	5.991600
3712	5.992200
3713	6.135400
3714	6.133900
3715	6.164000
3716	5.988100
3717	6.351000
3718	5.981300
3719	5.981000
3720	7.087300
3721	6.135400
3722	6.280900
3723	5.982800
3724	5.983800
3725	6.350100
3726	6.618500
3727	6.600100
3728	6.440600
3729	5.973800
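
The per-step losses above are noisy, with frequent spikes, so a smoothed view makes the trend easier to read. The sketch below parses rows in the same `Step<TAB>Training Loss` layout and computes a trailing moving average. The function names and the window size are illustrative assumptions, and the sample uses only the first five rows of the log.

```python
def parse_loss_log(text: str) -> list[tuple[int, float]]:
    """Parse 'step<whitespace>loss' rows, skipping headers and blank lines."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Keep only rows of the form "<integer step> <loss>".
        if len(parts) == 2 and parts[0].isdigit():
            rows.append((int(parts[0]), float(parts[1])))
    return rows


def moving_average(values: list[float], window: int) -> list[float]:
    """Trailing moving average; early entries average over what is available."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


# First five rows of the log above, used as sample input.
sample = """Step\tTraining Loss
1\t10.109800
2\t9.924600
3\t9.919700
4\t9.919100
5\t9.917400
"""
rows = parse_loss_log(sample)
smoothed = moving_average([loss for _, loss in rows], window=3)
for (step, loss), avg in zip(rows, smoothed):
    print(f"{step}\t{loss:.4f}\t{avg:.4f}")
```

A larger window (for example, a few hundred steps) would be more appropriate for the full log; the value 3 is used here only to keep the sample small.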