---
license: apache-2.0
base_model: pszemraj/jamba-900M-v0.13-KIx2
tags:
- textbook
- '16384'
- long document
metrics:
- accuracy
language:
- en
inference: false
---

# BEE-spoke-data/Jamba-900M-doc-writer

> To test it out, try [this notebook](https://colab.research.google.com/gist/pszemraj/28985fdbbb2460f8375d2d84b8babe9a/jamba-test-sandbox.ipynb).

This model produces long, surprisingly coherent output that extends an input text; see [this example](https://gist.github.com/pszemraj/b7c7ac65e56365cf5eab69622f16b356), a generated textbook about underwater city design.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/60bccec062080d33f875cd0c/wWCnoAQ1NSoa3k4w3xvP9.png)

Thanks to the Jamba architecture, the model needs relatively little VRAM while generating: roughly 2.5 GB of VRAM to generate 12,288 tokens.
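
For a quick local test outside the notebook, a minimal sketch along these lines should work, assuming a recent `transformers` release with Jamba support; the prompt and generation settings below are illustrative assumptions, not recommended values from the authors.

```python
# Minimal sketch, assuming a recent `transformers` with Jamba support (and optionally a GPU).
# The prompt and generation settings are illustrative, not the authors' recommendations.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "BEE-spoke-data/Jamba-900M-doc-writer"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float32,  # ~900M params, so full precision still fits in a few GB
    device_map="auto",
)

prompt = "Introduction\n\nDesigning habitable structures on the ocean floor requires"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output_ids = model.generate(
    **inputs,
    max_new_tokens=2048,     # raise toward 12,288 for full-length documents
    do_sample=True,
    temperature=0.8,
    repetition_penalty=1.1,
    no_repeat_ngram_size=4,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```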

## Model description

This model is a fine-tuned version of [pszemraj/jamba-900M-v0.13-KIx2](https://huggingface.co/pszemraj/jamba-900M-v0.13-KIx2) on textbook data.

It achieves the following results on the evaluation set:

- Loss: 3.0200
- Accuracy: 0.4544
- Num input tokens seen: 4,940,890,112

## Intended Uses & Limitations

- Long-context generation.
- It requires a fairly long prompt (e.g. an 'Introduction' section) to be coaxed into consistently producing long, textbook-like text; see the sketch after this list.
- The model itself is small (hidden size 1024), so its reasoning and knowledge are limited, but still impressive for the size.
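
To make the 'long prompt' point concrete, a prompt typically needs to look like the opening of a document rather than a one-line instruction; the example below is an assumed illustration, not an official prompt from the authors.

```python
# Hypothetical prompt shape: a title plus a paragraph or two of lead-in text,
# which the model then continues in the same textbook-like register.
prompt = (
    "Introduction to Underwater City Design\n\n"
    "The idea of building permanent settlements beneath the sea has moved from "
    "science fiction toward serious engineering study. This chapter surveys the "
    "structural, environmental, and logistical constraints that any underwater "
    "city must satisfy, and outlines the design principles used in later chapters.\n\n"
)
```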

---