---
license: mit
datasets:
- wikipedia
---
# BitLinear-phi-1.5

BitLinear-phi-1.5 is a model trained with a partial implementation of the method described in [The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits](https://arxiv.org/abs/2402.17764).

Our BitLinear layer applies only 1-bit quantization to the weights; all other computations described in the paper are discarded.

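The layer definition itself is not included in this README. As a rough sketch (illustrative, not the repository's exact code), a weight-only BitLinear of this kind might binarize the weights with a per-tensor scale and rely on a straight-through estimator so the full-precision weights still receive gradients:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BitLinear(nn.Linear):
    """Sketch of a weight-only 1-bit linear layer (illustrative, not the repo's code)."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Binarize weights to {-1, +1}, rescaled by their mean magnitude
        scale = self.weight.abs().mean()
        w_bin = torch.sign(self.weight) * scale
        # Straight-through estimator: the forward pass uses the binarized
        # weights, the backward pass sends gradients to the full-precision
        # weights unchanged
        w = self.weight + (w_bin - self.weight).detach()
        return F.linear(x, w, self.bias)
```
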
The model structure is from [phi-1.5](https://huggingface.co/microsoft/phi-1_5), with every linear layer except lm_head replaced by our custom BitLinear layer.

It was trained on a small subset of the [wikipedia](https://huggingface.co/datasets/wikipedia) dataset, for research validation purposes only.

```python
from datasets import load_dataset

# Use the first 100k articles of the English Wikipedia snapshot
dataset = load_dataset("wikipedia", "20220301.en")
dataset = dataset["train"].select(range(int(1e5)))
```

The model was trained on a single RTX 3090 (24 GB) for 16 hours.

### For training code, check --placeholder--.

The training code should be compatible with most LLMs on Hugging Face, but you have to train from scratch.

Using pretrained model weights will not work due to gradient explosion.

## Sample inference code

```python
import torch
from replace_hf import replace_linear_in_hf
from transformers import AutoModelForCausalLM, AutoTokenizer


def quick_test(model, tokenizer, prompt: str):
    # Encode the inputs
    inputs = tokenizer.encode(prompt, return_tensors="pt")

    # Generate outputs
    outputs = model.generate(inputs, max_length=64)

    # Decode and print the outputs
    print(tokenizer.decode(outputs[0]))


torch.set_default_device("cuda")

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1_5", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Mrw33554432/bitLinear-phi-1.5", trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token

print(model)
# Replace Linear layers with BitLinear
replace_linear_in_hf(model, keep_param=True)
print(model)

quick_test(model, tokenizer, prompt="Tom is the")
```
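
The `replace_hf` module ships with the training code and is not reproduced in this README. A minimal sketch of what `replace_linear_in_hf` might do, assuming it recursively swaps every `nn.Linear` except `lm_head` for a BitLinear layer (using the illustrative `BitLinear` sketched above; the meaning of `keep_param=True` is assumed here to be "copy the existing weights"):

```python
import torch.nn as nn


def replace_linear_in_hf(model: nn.Module, keep_param: bool = True) -> None:
    """Sketch: recursively swap nn.Linear modules for BitLinear, skipping lm_head."""
    for name, child in model.named_children():
        if isinstance(child, nn.Linear) and name != "lm_head":
            new_layer = BitLinear(child.in_features, child.out_features,
                                  bias=child.bias is not None)
            if keep_param:
                # Carry the loaded checkpoint's weights into the new layer
                new_layer.load_state_dict(child.state_dict())
            setattr(model, name, new_layer)
        else:
            replace_linear_in_hf(child, keep_param)
```

Under that reading, `keep_param=True` at inference time simply carries the checkpoint's already-trained weights into the BitLinear layers; the gradient-explosion caveat above concerns training from pretrained full-precision weights, not loading this checkpoint.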