---
license: apache-2.0
---

# BitLlama Micro (experimental, untrained)

This repository contains the modeling code for a 1.58-bit Llama model, following the BitNet b1.58 reference paper: https://github.com/microsoft/unilm/blob/master/bitnet/The-Era-of-1-bit-LLMs__Training_Tips_Code_FAQ.pdf
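
In BitNet b1.58, the weights of each linear layer are constrained to the ternary values {-1, 0, +1} via absmean quantization: the weight matrix is scaled by its mean absolute value, rounded, and clipped. A minimal sketch of that quantization step (the function below is illustrative and not part of this repository's API):

```python
import torch

def absmean_ternary_quantize(w: torch.Tensor, eps: float = 1e-5):
    """Quantize a weight matrix to {-1, 0, +1} via absmean scaling (BitNet b1.58)."""
    gamma = w.abs().mean()                           # absmean scale of the matrix
    w_q = (w / (gamma + eps)).round().clamp_(-1, 1)  # RoundClip to ternary values
    return w_q, gamma                                # gamma is kept to rescale outputs
```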

For more details see: https://github.com/bjoernpl/bitllama

The model was initialized with the following config:

```python
from transformers.models.bitllama import BitLlamaForCausalLM, LlamaConfig

model_config = LlamaConfig(
    bos_token_id=1,
    eos_token_id=2,
    hidden_act="silu",
    hidden_size=512,
    initializer_range=0.02,
    intermediate_size=1365,
    max_position_embeddings=32000,
    num_attention_heads=8,
    num_hidden_layers=12,
    num_key_value_heads=4,
    pretraining_tp=1,
    rms_norm_eps=1e-05,
    rope_scaling=None,
    tie_word_embeddings=True,
    use_cache=True,
    vocab_size=32000,
)

# Instantiate with randomly initialized (untrained) weights and upload to the Hub.
model = BitLlamaForCausalLM._from_config(model_config)
model.push_to_hub("bjoernp/micro-bitllama")
```
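
Since the weights are randomly initialized, generation output will be meaningless until the model is trained. Loading it also requires the custom BitLlama modeling code to be importable, e.g. by installing the transformers fork from the repository above. A sketch under that assumption:

```python
from transformers.models.bitllama import BitLlamaForCausalLM

# Downloads the untrained weights from the Hub; assumes the bitllama fork is installed.
model = BitLlamaForCausalLM.from_pretrained("bjoernp/micro-bitllama")
```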