---
license: mit
datasets:
- wikipedia
---
|
# BitLinear-phi-1.5 |
|
|
|
BitLinear-phi-1.5 is a model trained in part using the method described in [The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits](https://arxiv.org/abs/2402.17764).
|
|
|
Our BitLinear layer applies 1-bit quantization to the weights only; all other computations described in the paper are discarded.
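
For illustration, here is a minimal sketch of what such a layer could look like, assuming sign-based weight binarization with a mean-absolute-value scale and a straight-through estimator; the actual implementation in this repository may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BitLinear(nn.Linear):
    """Illustrative sketch: nn.Linear whose weights are binarized to {-1, +1} in the forward pass."""

    def forward(self, x):
        w = self.weight
        # Scale by the mean absolute value so the binarized matrix
        # roughly preserves the magnitude of the latent weights.
        scale = w.abs().mean()
        w_bin = torch.sign(w) * scale
        # Straight-through estimator: binarized weights in the forward pass,
        # full-precision gradients for the latent weights in the backward pass.
        w_q = w + (w_bin - w).detach()
        return F.linear(x, w_q, self.bias)
```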
|
|
|
The model architecture is that of [phi-1.5](https://huggingface.co/microsoft/phi-1_5), with every linear layer except lm_head replaced by our custom BitLinear layer.
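
The replacement helper used in the inference code below (`replace_linear_in_hf` from `replace_hf`) is not reproduced in this card; a plausible sketch, assuming the BitLinear class above, is a recursive module swap like the following.

```python
import torch.nn as nn


def replace_linear_in_hf(module: nn.Module, keep_param: bool = True):
    """Illustrative sketch: recursively replace every nn.Linear except lm_head with BitLinear."""
    for name, child in module.named_children():
        if isinstance(child, nn.Linear) and name != "lm_head":
            bit = BitLinear(child.in_features, child.out_features,
                            bias=child.bias is not None)
            if keep_param:
                # Carry over the existing parameters as the latent weights.
                bit.weight = child.weight
                if child.bias is not None:
                    bit.bias = child.bias
            setattr(module, name, bit)
        else:
            replace_linear_in_hf(child, keep_param)
```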
|
|
|
It was trained on a small subset of the [wikipedia dataset](https://huggingface.co/datasets/wikipedia), for research validation purposes only.
|
|
|
```python
from datasets import load_dataset

dataset = load_dataset("wikipedia", "20220301.en")
dataset = dataset["train"].select(range(int(1e5)))
```
|
The model was trained on a single RTX 3090 (24 GB) for 16 hours.
|
|
|
### For the training code, see --placeholder--.
|
|
|
The training code should be compatible with most LLMs on Hugging Face, but you have to train from scratch.
|
|
|
Starting from pretrained model weights will not work due to gradient explosion.
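
In practice, that means building the model from its config with randomly initialized weights rather than loading the pretrained checkpoint. A minimal sketch of that setup (the config name and workflow are assumptions, not the repository's exact training script):

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Build phi-1.5 with randomly initialized weights instead of loading
# the pretrained checkpoint, then swap in BitLinear layers before training.
config = AutoConfig.from_pretrained("microsoft/phi-1_5", trust_remote_code=True)
model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)
replace_linear_in_hf(model, keep_param=True)
```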
|
|
|
## Sample inference code |
|
|
|
|
|
```python
import torch
from replace_hf import replace_linear_in_hf
from transformers import AutoModelForCausalLM, AutoTokenizer


def quick_test(model, tokenizer, prompt: str):
    # Encode the prompt
    inputs = tokenizer.encode(prompt, return_tensors="pt")

    # Generate a completion of up to 64 tokens
    outputs = model.generate(inputs, max_length=64)

    # Decode and print the output
    print(tokenizer.decode(outputs[0]))


torch.set_default_device("cuda")

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1_5", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Mrw33554432/bitLinear-phi-1.5", trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token

print(model)
# Replace Linear layers with BitLinear
replace_linear_in_hf(model, keep_param=True)
print(model)

quick_test(model, tokenizer, prompt="Tom is the")
```