---
license: mit
datasets:
- wikipedia
---
# BitLinear-phi-1.5

BitLinear-phi-1.5 is a model trained using a partial implementation of the method described in [The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits](https://arxiv.org/abs/2402.17764).

Our BitLinear layer applies 1-bit quantization to the weights only; all other computations described in the paper are discarded.

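For reference, here is a minimal sketch of what such a weight-only BitLinear layer could look like, assuming sign-based binarization scaled by the mean absolute weight and a straight-through estimator for gradients; the actual layer in this repo may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BitLinear(nn.Linear):
    """Hypothetical sketch: nn.Linear with 1-bit weight quantization only."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        # Per-tensor scale: mean absolute value of the latent fp weights.
        scale = w.abs().mean()
        # Binarize weights to {-1, +1} and rescale.
        w_q = torch.sign(w) * scale
        # Straight-through estimator: forward uses w_q, gradients flow to w.
        w_q = w + (w_q - w).detach()
        return F.linear(x, w_q, self.bias)
```
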
The model structure is from [phi-1.5](https://huggingface.co/microsoft/phi-1_5), with all linear layers except lm_head replaced with our custom BitLinear layer.

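The swap itself can be done generically. As a hypothetical illustration (using the BitLinear sketch above, not the repo's actual replace_hf.replace_linear_in_hf), recursively replacing every nn.Linear except lm_head might look like:

```python
import torch.nn as nn


def swap_linear_for_bitlinear(module: nn.Module, skip=("lm_head",)) -> None:
    # Recursively walk child modules, replacing nn.Linear with BitLinear.
    for name, child in module.named_children():
        if isinstance(child, nn.Linear) and name not in skip:
            bit = BitLinear(child.in_features, child.out_features,
                            bias=child.bias is not None)
            # Reuse the existing parameters (cf. keep_param=True below).
            bit.weight = child.weight
            bit.bias = child.bias
            setattr(module, name, bit)
        else:
            swap_linear_for_bitlinear(child, skip)
```
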
It was trained on a small subset of the [wikipedia](https://huggingface.co/datasets/wikipedia) dataset, for research validation purposes only.

```python
from datasets import load_dataset

dataset = load_dataset("wikipedia", "20220301.en")
dataset = dataset['train'].select(range(int(1e5)))
```

The model was trained on a single RTX 3090 (24 GB) for 16 hours.

### For training code, check --placeholder--.

The training code should be compatible with most of the LLMs on Hugging Face, but you have to train from scratch.

Starting from pretrained model weights will not work due to gradient explosion.

## Sample inference code

```python
|
32 |
+
import torch
|
33 |
+
from replace_hf import replace_linear_in_hf
|
34 |
+
from transformers import AutoModelForCausalLM, AutoTokenizer
|
35 |
+
|
36 |
+
|
37 |
+
def quick_test(model, tokenizer, prompt: str):
|
38 |
+
# Encode the inputs
|
39 |
+
inputs = tokenizer.encode(prompt, return_tensors="pt")
|
40 |
+
|
41 |
+
# Generate outputs
|
42 |
+
outputs = model.generate(inputs, max_length=64)
|
43 |
+
|
44 |
+
# Decode and print the outputs
|
45 |
+
print(tokenizer.decode(outputs[0]))
|
46 |
+
|
47 |
+
|
48 |
+
torch.set_default_device("cuda")
|
49 |
+
|
50 |
+
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1_5", trust_remote_code=True)
|
51 |
+
model = AutoModelForCausalLM.from_pretrained("Mrw33554432/bitLinear-phi-1.5", trust_remote_code=True)
|
52 |
+
tokenizer.pad_token = tokenizer.eos_token
|
53 |
+
|
54 |
+
print(model)
|
55 |
+
# Replace Linear layers with BitLinear
|
56 |
+
replace_linear_in_hf(model, keep_param=True)
|
57 |
+
print(model)
|
58 |
+
|
59 |
+
quick_test(model, tokenizer, prompt="Tom is the")
|
60 |
+
```
|
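
Here, keep_param=True presumably keeps the checkpoint's trained weights when the linear layers are swapped for BitLinear, and the two print(model) calls let you verify that every Linear except lm_head was replaced; see the replace_hf module shipped with the training code for the exact behavior.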