---
license: apache-2.0
base_model: pszemraj/jamba-900M-v0.13-KIx2
tags:
- textbook
- '16384'
- long document
metrics:
- accuracy
language:
- en
inference: false
---

# BEE-spoke-data/Jamba-900M-doc-writer

> To test it out, try [this notebook](https://colab.research.google.com/gist/pszemraj/28985fdbbb2460f8375d2d84b8babe9a/jamba-test-sandbox.ipynb).

This model produces long, surprisingly coherent output that continues a given input text; see an example [here](https://gist.github.com/pszemraj/b7c7ac65e56365cf5eab69622f16b356), a generated textbook about underwater city design.


![image/png](https://cdn-uploads.huggingface.co/production/uploads/60bccec062080d33f875cd0c/wWCnoAQ1NSoa3k4w3xvP9.png)


Thanks to the Jamba architecture, it needs relatively little VRAM during generation: about 2.5 GB to generate 12,288 tokens.
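
For reference, here is a minimal generation sketch. It assumes the model loads through the standard `transformers` causal-LM API (the notebook linked above is the authoritative example); the sampling settings are illustrative, not recommended values.

```python
# Minimal long-form generation sketch; settings are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BEE-spoke-data/Jamba-900M-doc-writer"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 to keep memory usage low
    device_map="auto",
)

prompt = "Introduction\n\nDesigning habitable structures beneath the ocean surface requires"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# The card reports ~2.5 GB VRAM for 12,288 generated tokens;
# max_new_tokens here is a smaller, illustrative value.
outputs = model.generate(
    **inputs,
    max_new_tokens=2048,
    do_sample=True,
    temperature=0.8,
    top_p=0.95,
    repetition_penalty=1.1,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```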

## Model description

This model is a fine-tuned version of [pszemraj/jamba-900M-v0.13-KIx2](https://huggingface.co/pszemraj/jamba-900M-v0.13-KIx2) on some textbook data.

It achieves the following results on the evaluation set:
- Loss: 3.0200
- Accuracy: 0.4544
- Num Input Tokens Seen: 4940890112

## Intended uses & limitations

- Long-context generation
- It requires a rather long prompt (e.g., an 'Introduction' section) to be coaxed into consistently producing long, textbook-like text; see the prompt sketch after this list
- The model itself is small (hidden size 1024), so its reasoning, knowledge, and so on are limited, but still impressive for the size
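
For example, a prompt along these lines tends to work better than a single short sentence. This is only a sketch: the title-plus-introduction structure is an assumption about what a "rather long prompt" looks like, not an official template.

```python
# Hypothetical textbook-style prompt: a title, an "Introduction" heading,
# and a few scene-setting paragraphs before the chapter the model should write.
prompt = (
    "Underwater City Design\n\n"
    "Introduction\n\n"
    "The engineering of permanent underwater settlements combines structural "
    "mechanics, life-support systems, and materials science. This textbook "
    "surveys the pressure-vessel geometries, habitat layouts, and logistics "
    "needed to sustain a population beneath the ocean surface.\n\n"
    "Chapter 1: Pressure and Structure\n\n"
)
```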

---