ETH-HELIOS-AI committed
Commit 21c3e0b · verified · 1 Parent(s): 280c8aa

Update README.md

Files changed (1):
  1. README.md +50 -3

README.md CHANGED
# helios-314b-alpha

This repository contains JAX example code for loading and running the Helios-314B-Alpha open-weights model.

The Helios-314B-Alpha model is a fine-tuned version of the open-weights Grok-1 model released by xAI.
We have fine-tuned it to perform well on crypto-related queries.

Make sure to download the checkpoint and place the `ckpt-0` directory in `checkpoints` - see [Downloading the weights](#downloading-the-weights).

Then, run

```shell
pip install -r requirements.txt
python run.py
```

to test the code.

The script loads the checkpoint and samples from the model on a test input.

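Sampling begins from a tokenized prompt; the Model Specifications section below lists a SentencePiece tokenizer with a 131,072-token vocabulary. As a rough sketch of that step only (not this repository's code), the snippet below encodes and decodes a test prompt with the `sentencepiece` library; the `checkpoints/tokenizer.model` path is an assumption, since this README does not say where the tokenizer file lives.

```python
import sentencepiece as spm

# Assumed location of the tokenizer file; adjust to wherever it sits in your download.
TOKENIZER_PATH = "checkpoints/tokenizer.model"

sp = spm.SentencePieceProcessor(model_file=TOKENIZER_PATH)

prompt = "Explain how proof-of-stake differs from proof-of-work."
token_ids = sp.encode(prompt, out_type=int)  # text -> list of token ids
print(f"{len(token_ids)} tokens out of a {sp.vocab_size()}-entry vocabulary")
print("round trip:", sp.decode(token_ids))   # ids -> text
```
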
Due to the large size of the model (314B parameters), a machine with enough GPU memory is required to test the model with the example code.
The MoE layer in this repository is not implemented efficiently; that implementation was chosen deliberately so the model's correctness can be validated without custom kernels.

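To make that trade-off concrete, here is a small, self-contained JAX sketch (not the repository's implementation) of the naive approach: every one of the 8 experts runs on every token, and the outputs are then mixed using top-2 router weights, matching the 8-expert / 2-per-token configuration listed below. A fused kernel would instead route each token to only its 2 selected experts.

```python
import jax
import jax.numpy as jnp

NUM_EXPERTS, TOP_K = 8, 2   # 8 experts, 2 used per token (see Model Specifications)
D_MODEL, D_FF = 64, 256     # toy sizes; the real model uses a 6,144-dim embedding

def naive_moe(params, x):
    """x: [tokens, D_MODEL]. Evaluates *all* experts densely, then mixes the top-2."""
    logits = x @ params["router"]                       # [tokens, NUM_EXPERTS]
    top_vals, top_idx = jax.lax.top_k(logits, TOP_K)    # both [tokens, TOP_K]
    gates = jax.nn.softmax(top_vals, axis=-1)           # normalize over the chosen 2

    # The inefficient part: every expert MLP is applied to every token.
    def expert(w, h):
        return jax.nn.gelu(h @ w["w_in"]) @ w["w_out"]  # [tokens, D_MODEL]
    all_out = jnp.stack([expert(w, x) for w in params["experts"]], axis=1)  # [tokens, E, D]

    # Keep only the 2 selected expert outputs per token and combine them with the gates.
    picked = jnp.take_along_axis(all_out, top_idx[..., None], axis=1)       # [tokens, 2, D]
    return jnp.sum(gates[..., None] * picked, axis=1)                       # [tokens, D]

key = jax.random.PRNGKey(0)
keys = jax.random.split(key, 2 * NUM_EXPERTS + 2)
params = {
    "router": 0.02 * jax.random.normal(keys[0], (D_MODEL, NUM_EXPERTS)),
    "experts": [
        {"w_in": 0.02 * jax.random.normal(keys[2 * i + 1], (D_MODEL, D_FF)),
         "w_out": 0.02 * jax.random.normal(keys[2 * i + 2], (D_FF, D_MODEL))}
        for i in range(NUM_EXPERTS)
    ],
}
tokens = jax.random.normal(keys[-1], (16, D_MODEL))
print(naive_moe(params, tokens).shape)  # (16, 64)
```
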
# Model Specifications

Helios is currently designed with the following specifications:

- **Parameters:** 314B
- **Architecture:** Mixture of 8 Experts (MoE)
- **Expert Utilization:** 2 experts used per token
- **Layers:** 64
- **Attention Heads:** 48 for queries, 8 for keys/values (shapes illustrated in the sketch after this list)
- **Embedding Size:** 6,144
- **Tokenization:** SentencePiece tokenizer with 131,072 tokens
- **Additional Features:**
  - Rotary embeddings (RoPE)
  - Supports activation sharding and 8-bit quantization
- **Maximum Sequence Length (context):** 8,192 tokens

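The attention numbers above fix the tensor shapes: with a 6,144-dim embedding and 48 query heads, each head is 6144 / 48 = 128 wide, and the 8 key/value heads are shared across query heads. The snippet below is an illustrative JAX sketch of those shapes with a standard rotate-half RoPE applied to queries and keys; it is not the repository's implementation, whose exact RoPE variant this README does not specify.

```python
import jax
import jax.numpy as jnp

D_MODEL, N_Q_HEADS, N_KV_HEADS = 6144, 48, 8
HEAD_DIM = D_MODEL // N_Q_HEADS   # 128
SEQ_LEN = 16                      # toy length; the model's context goes up to 8,192

def rope(x, base=10000.0):
    """Rotary position embeddings for x of shape [seq, heads, head_dim]."""
    seq, _, dim = x.shape
    half = dim // 2
    freqs = base ** (-jnp.arange(half) / half)           # [half]
    angles = jnp.arange(seq)[:, None] * freqs[None, :]   # [seq, half]
    cos, sin = jnp.cos(angles)[:, None, :], jnp.sin(angles)[:, None, :]
    x1, x2 = x[..., :half], x[..., half:]
    return jnp.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

key_q, key_k = jax.random.split(jax.random.PRNGKey(0))
q = jax.random.normal(key_q, (SEQ_LEN, N_Q_HEADS, HEAD_DIM))   # 48 query heads
k = jax.random.normal(key_k, (SEQ_LEN, N_KV_HEADS, HEAD_DIM))  # only 8 key/value heads

q_rot, k_rot = rope(q), rope(k)
# Each of the 8 KV heads is attended to by 48 // 8 = 6 query heads.
print(q_rot.shape, k_rot.shape)  # (16, 48, 128) (16, 8, 128)
```
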
# Downloading the weights

```shell
git clone https://github.com/ETH-HELIOS-AI/helios-314b-alpha && cd helios-314b-alpha
pip install "huggingface_hub[hf_transfer]"
huggingface-cli download ETH-HELIOS-AI/helios-314b-alpha --repo-type model --include "ckpt-0/*" --local-dir checkpoints --local-dir-use-symlinks False
```

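If you would rather drive the download from Python, the same transfer can be expressed with `huggingface_hub`'s `snapshot_download`; the sketch below mirrors the CLI call above (same repo id, same `ckpt-0/*` filter, same `checkpoints` target directory) and is an alternative, not an extra required step.

```python
from huggingface_hub import snapshot_download

# Mirrors the huggingface-cli invocation above: fetch only ckpt-0/* into ./checkpoints.
snapshot_download(
    repo_id="ETH-HELIOS-AI/helios-314b-alpha",
    repo_type="model",
    allow_patterns=["ckpt-0/*"],
    local_dir="checkpoints",
)
```
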
# License

The code and weights for the Helios-314B-Alpha model are licensed under the Apache 2.0 open source license.