Commit 063f41d (theuerc, verified): Update README.md

---
license: apache-2.0
language:
- en
---
<div align="center">

# TinyLlama-1.1B
</div>

We used this version of TinyLlama as a base model:
https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0

The goal was to improve performance on basic algebra (e.g. solving systems of linear equations).
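
For context, ALG-514 problems are word problems that reduce to small systems of linear equations. A minimal plain-Python illustration of the underlying math (the example system is made up, not taken from the dataset):

```python
from fractions import Fraction

def solve_2x2(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 by Cramer's rule."""
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("system is singular")
    x = Fraction(c1 * b2 - c2 * b1, det)
    y = Fraction(a1 * c2 - a2 * c1, det)
    return x, y

# Example: 2x + 3y = 13 and x - y = -1  ->  x = 2, y = 3
print(solve_2x2(2, 3, 13, 1, -1, -1))
```

A model that solves such problems only needs to set up the two equations correctly; the arithmetic itself is trivial, which is why code-execution pipelines like NeMo Skills work well here.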

The base model was fine-tuned on 8k rows of synthetic solution data generated by [OpenMath-Mistral-7B-v0.1-hf](https://huggingface.co/nvidia/OpenMath-Mistral-7B-v0.1-hf) on [ALG-514](https://paperswithcode.com/sota/math-word-problem-solving-on-alg514).

We used the [NeMo Skills](https://github.com/Kipok/NeMo-Skills) pipeline for inference with code execution and for generating the synthetic data. HuggingFace's SFTTrainer was used for fine-tuning, as we found the NeMo Skills training pipeline too buggy to rely on. Fine-tuning took 30 minutes on a single RTX 3090.
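
A rough sketch of what the SFTTrainer setup looks like. The dataset filename, column names, prompt template, and hyperparameters below are all illustrative assumptions (and exact `trl` argument names vary by version) — this is not the actual training script:

```python
def format_example(question: str, solution: str) -> str:
    # Hypothetical chat-style template; the actual prompt format used
    # for training is not documented in this card.
    return f"<|user|>\n{question}\n<|assistant|>\n{solution}"

def main():
    # Heavy imports are kept inside main() so the formatting helper
    # above can be used without a GPU or the training libraries.
    from datasets import load_dataset
    from trl import SFTConfig, SFTTrainer

    # "alg514_synthetic.jsonl" stands in for the 8k-row synthetic
    # solution file; "question"/"solution" column names are assumptions.
    ds = load_dataset("json", data_files="alg514_synthetic.jsonl")["train"]
    ds = ds.map(lambda r: {"text": format_example(r["question"], r["solution"])})

    trainer = SFTTrainer(
        model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
        train_dataset=ds,
        args=SFTConfig(output_dir="tinyllama-alg514", num_train_epochs=3),
    )
    trainer.train()

if __name__ == "__main__":
    main()
```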

Notes from previous model cards:
> We adopted exactly the same architecture and tokenizer as Llama 2. This means TinyLlama can be plugged and played in many open-source projects built upon Llama. Besides, TinyLlama is compact with only 1.1B parameters. This compactness allows it to cater to a multitude of applications demanding a restricted computation and memory footprint.

#### Eval

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64388bdd43d932c4623e4983/H07dGzwOfzcvP1GFA1GUq.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64388bdd43d932c4623e4983/Qr7rvIms3AL67jltHBXnr.png)

Note that checkpoint-0 is the base model and checkpoint-mistral is OpenMath-Mistral-7B-v0.1-hf.

The performance is _not good_™, but this model could be used to quickly generate synthetic data, as its coverage is decent. The uploaded model is checkpoint-2.6k.
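
As a sketch of that synthetic-data use case, one could sample several candidate solutions per problem and keep only those whose answers verify. The prompt template and generation settings below are assumptions, and the model id is a placeholder for this repo:

```python
def build_prompt(problem: str) -> str:
    # Hypothetical prompt wrapper; the exact format the fine-tuned
    # checkpoint expects is not documented in this card.
    return f"<|user|>\n{problem}\n<|assistant|>\n"

def main():
    # Heavy imports stay inside main() so the prompt helper is
    # importable without torch/transformers installed.
    from transformers import pipeline

    # Replace "path/to/this-model" with this repository's model id.
    gen = pipeline("text-generation", model="path/to/this-model")
    candidates = gen(
        build_prompt("Solve: 2x + 3y = 13 and x - y = -1."),
        max_new_tokens=256,
        do_sample=True,
        temperature=0.7,
        num_return_sequences=4,
    )
    # With decent coverage, at least one sampled solution per problem
    # tends to be correct; filter by checking the final answer.
    for cand in candidates:
        print(cand["generated_text"])

if __name__ == "__main__":
    main()
```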