---
tags:
- text-generation
- 8bit
- 8-bit
- quantization
- compression
inference: false
license: apache-2.0
---
# ethzanalytics/gpt-j-6B-8bit-sharded
This is a version of [hivemind/gpt-j-6B-8bit](https://huggingface.co/hivemind/gpt-j-6B-8bit) sharded for low-RAM loading, e.g., on free Colab runtimes :)
- shards are <= 1000 MB each
- a demo notebook showing how to use it is here
Please refer to the original model card for [hivemind/gpt-j-6B-8bit](https://huggingface.co/hivemind/gpt-j-6B-8bit) for full details.
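For background: checkpoints can be sharded to a maximum file size with `transformers`' own `save_pretrained(..., max_shard_size=...)`. A minimal, illustrative sketch (the `gpt2` checkpoint and output path are stand-ins, not how this repo was necessarily built):

```python
from transformers import AutoModelForCausalLM

# Illustrative only: re-save any transformers model in shards of <= 1000 MB.
# "gpt2" is a stand-in checkpoint; the output directory is arbitrary.
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.save_pretrained("./sharded-checkpoint", max_shard_size="1000MB")
```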
## Usage
**NOTE:** _prior_ to loading the model, you need to "patch" the `transformers` GPT-J classes so that the 8-bit weights can be loaded. See the original model card above for how to do this; a condensed sketch is also included below.
Install `transformers`, `accelerate`, and `bitsandbytes` if needed:

```sh
pip install transformers accelerate bitsandbytes
```
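For reference, the patch works by monkey-patching `transformers`' GPT-J module so that its linear layers store blockwise-quantized uint8 weights and dequantize them on the fly with `bitsandbytes`. Below is a condensed, approximate sketch of the code on the original model card; the full version there also converts the embeddings and `lm_head`, so use that code verbatim in practice (exact behavior depends on your `transformers`/`bitsandbytes` versions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import transformers
from bitsandbytes.functional import dequantize_blockwise


class FrozenBNBLinear(nn.Module):
    """Linear layer whose weight is stored as blockwise-quantized uint8."""

    def __init__(self, weight, absmax, code, bias=None):
        super().__init__()
        self.register_buffer("weight", weight.requires_grad_(False))
        self.register_buffer("absmax", absmax.requires_grad_(False))
        self.register_buffer("code", code.requires_grad_(False))
        self.bias = bias

    def forward(self, input):
        # Dequantize the uint8 weight to float on the fly, then apply it.
        weight = dequantize_blockwise(self.weight, absmax=self.absmax, code=self.code)
        return F.linear(input, weight, self.bias)


def convert_to_int8(model):
    """Swap every nn.Linear for an empty FrozenBNBLinear; from_pretrained
    later fills the uint8 buffers from the checkpoint."""
    for module in list(model.modules()):
        for name, child in module.named_children():
            if isinstance(child, nn.Linear):
                setattr(module, name, FrozenBNBLinear(
                    weight=torch.zeros(child.out_features, child.in_features, dtype=torch.uint8),
                    absmax=torch.zeros((child.weight.numel() - 1) // 4096 + 1),
                    code=torch.zeros(256),
                    bias=child.bias,
                ))


class GPTJBlock(transformers.models.gptj.modeling_gptj.GPTJBlock):
    # *args/**kwargs added so the sketch tolerates signature changes
    # across transformers versions.
    def __init__(self, config, *args, **kwargs):
        super().__init__(config, *args, **kwargs)
        convert_to_int8(self.attn)
        convert_to_int8(self.mlp)


# Monkey-patch so that from_pretrained builds GPT-J blocks with 8-bit linears.
transformers.models.gptj.modeling_gptj.GPTJBlock = GPTJBlock
```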
Patch the model, then load it with `device_map="auto"`:
```python
import transformers
from transformers import AutoTokenizer

"""
CODE TO PATCH GPTJForCausalLM GOES HERE
(i.e., the patch from the original model card, which defines the
GPTJForCausalLM class used below)
"""

tokenizer = AutoTokenizer.from_pretrained("ethzanalytics/gpt-j-6B-8bit-sharded")

model = GPTJForCausalLM.from_pretrained(
    "ethzanalytics/gpt-j-6B-8bit-sharded",
    device_map="auto",
)
```
Take a look at the notebook for details.
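Once loaded, the patched model generates like any other causal LM; a minimal sketch (the prompt and decoding settings here are arbitrary):

```python
prompt = "In one sentence, GPT-J is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sample a short continuation; adjust the decoding parameters to taste.
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```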