---
license: mit
datasets:
- wikipedia
---
# BitLinear-phi-1.5

BitLinear-phi-1.5 is a model trained using a partial implementation of the method described in [The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits](https://arxiv.org/abs/2402.17764).

Our BitLinear layer applies 1-bit quantization to the weights only; all other computations described in the paper are discarded.

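For reference, here is a minimal sketch of what such a weight-only BitLinear layer could look like, assuming sign-based binarization scaled by the mean absolute weight and a straight-through estimator for gradients; the actual layer in this repo may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BitLinear(nn.Linear):
    """Hypothetical sketch: nn.Linear with 1-bit weight quantization only."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        # Per-tensor scale: mean absolute value of the latent fp weights.
        scale = w.abs().mean()
        # Binarize weights to {-1, +1} and rescale.
        w_q = torch.sign(w) * scale
        # Straight-through estimator: forward uses w_q, gradients flow to w.
        w_q = w + (w_q - w).detach()
        return F.linear(x, w_q, self.bias)
```
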
The model structure is from [phi-1.5](https://huggingface.co/microsoft/phi-1_5), with all linear layers except lm_head replaced with our custom BitLinear layer.

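The swap itself can be done generically. As a hypothetical illustration (using the BitLinear sketch above, not the repo's actual replace_hf.replace_linear_in_hf), recursively replacing every nn.Linear except lm_head might look like:

```python
import torch.nn as nn


def swap_linear_for_bitlinear(module: nn.Module, skip=("lm_head",)) -> None:
    # Recursively walk child modules, replacing nn.Linear with BitLinear.
    for name, child in module.named_children():
        if isinstance(child, nn.Linear) and name not in skip:
            bit = BitLinear(child.in_features, child.out_features,
                            bias=child.bias is not None)
            # Reuse the existing parameters (cf. keep_param=True below).
            bit.weight = child.weight
            bit.bias = child.bias
            setattr(module, name, bit)
        else:
            swap_linear_for_bitlinear(child, skip)
```
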
It was trained on a small subset of the [wikipedia](https://huggingface.co/datasets/wikipedia) dataset, for research validation purposes only.

```python
from datasets import load_dataset

dataset = load_dataset("wikipedia", "20220301.en")
dataset = dataset['train'].select(range(int(1e5)))
```

The model was trained on a single RTX 3090 (24 GB) for 16 hours.

### For training code, check --placeholder--.

The training code should be compatible with most of the LLMs on Hugging Face, but you have to train from scratch.

Starting from pretrained model weights will not work due to gradient explosion.

## Sample inference code

```python
|
32 |
+
import torch
|
33 |
+
from replace_hf import replace_linear_in_hf
|
34 |
+
from transformers import AutoModelForCausalLM, AutoTokenizer
|
35 |
+
|
36 |
+
|
37 |
+
def quick_test(model, tokenizer, prompt: str):
|
38 |
+
# Encode the inputs
|
39 |
+
inputs = tokenizer.encode(prompt, return_tensors="pt")
|
40 |
+
|
41 |
+
# Generate outputs
|
42 |
+
outputs = model.generate(inputs, max_length=64)
|
43 |
+
|
44 |
+
# Decode and print the outputs
|
45 |
+
print(tokenizer.decode(outputs[0]))
|
46 |
+
|
47 |
+
|
48 |
+
torch.set_default_device("cuda")
|
49 |
+
|
50 |
+
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1_5", trust_remote_code=True)
|
51 |
+
model = AutoModelForCausalLM.from_pretrained("Mrw33554432/bitLinear-phi-1.5", trust_remote_code=True)
|
52 |
+
tokenizer.pad_token = tokenizer.eos_token
|
53 |
+
|
54 |
+
print(model)
|
55 |
+
# Replace Linear layers with BitLinear
|
56 |
+
replace_linear_in_hf(model, keep_param=True)
|
57 |
+
print(model)
|
58 |
+
|
59 |
+
quick_test(model, tokenizer, prompt="Tom is the")
|
60 |
+
```
|
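
Here, keep_param=True presumably keeps the checkpoint's trained weights when the linear layers are swapped for BitLinear, and the two print(model) calls let you verify that every Linear except lm_head was replaced; see the replace_hf module shipped with the training code for the exact behavior.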