Model Card for Alpaca Cerebras-6.7B LoRA
This repository contains the adapter weights for the Cerebras-GPT-6.7B model finetuned on the cleaned version of the Alpaca dataset, following github.com/tloen/alpaca-lora. The code used for finetuning is available in our fork: github.com/bjoernpl/cerebras-lora.
Model Details
Model Description
Copied from the cerebras/Cerebras-GPT-6.7B model card:
The Cerebras-GPT family is released to facilitate research into LLM scaling laws using open architectures and data sets, and to demonstrate the simplicity and scalability of training LLMs on the Cerebras software and hardware stack. All Cerebras-GPT models are available on Hugging Face.
The family includes 111M, 256M, 590M, 1.3B, 2.7B, 6.7B, and 13B models.
All models in the Cerebras-GPT family have been trained in accordance with Chinchilla scaling laws (20 tokens per model parameter) which is compute-optimal.
These models were trained on the Andromeda AI supercomputer, comprising 16 CS-2 wafer-scale systems. Cerebras' weight streaming technology simplifies the training of LLMs by disaggregating compute from model storage. This allowed for efficient scaling of training across nodes using simple data parallelism.
Cerebras systems for pre-training and fine tuning are available in the cloud via the Cerebras Model Studio. Cerebras CS-2 compatible checkpoints are available in Cerebras Model Zoo.
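For a concrete sense of the 20-tokens-per-parameter budget mentioned above, the short sketch below (not part of the original Cerebras card) derives the approximate compute-optimal token count for each model size in the family:

# Rough sketch: compute-optimal token budgets under the ~20 tokens/parameter rule.
# The per-model token counts are derived here for illustration, not quoted from the card.
model_sizes = {"111M": 111e6, "256M": 256e6, "590M": 590e6,
               "1.3B": 1.3e9, "2.7B": 2.7e9, "6.7B": 6.7e9, "13B": 13e9}
for name, params in model_sizes.items():
    tokens = 20 * params  # Chinchilla-style budget
    print(f"{name}: ~{tokens / 1e9:.0f}B training tokens")
# e.g. the 6.7B base model corresponds to roughly 134B training tokens.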
- Developed by: Cerebras Systems; finetuned by Björn P.
- License: Apache 2.0
- Model type: Transformer-based Language Model
- Architecture: GPT-3 style architecture with LoRA adapter (a configuration sketch follows this list)
- Data set: The Pile
- Tokenizer: Byte Pair Encoding
- Vocabulary Size: 50257
- Sequence Length: 2048
- Optimizer: AdamW, (β1, β2) = (0.9, 0.95), adam_eps = 1e−8 (1e−9 for larger models)
- Positional Encoding: Learned
- Language: English
- Learn more: see the Dense Scaling Laws Paper for the training procedure, config files, and usage details.
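The exact LoRA hyperparameters used for this checkpoint are not stated in this card. The snippet below is a minimal, hypothetical sketch of how such an adapter can be attached to the base model with the PEFT library; the r, lora_alpha, lora_dropout, and target_modules values are illustrative assumptions, not the configuration used to train bjoernp/alpaca-cerebras-6.7B.

# Hypothetical LoRA setup sketch; hyperparameter values are assumptions.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("cerebras/Cerebras-GPT-6.7B")
lora_config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices (assumed)
    lora_alpha=16,              # scaling factor (assumed)
    lora_dropout=0.05,          # dropout on the LoRA layers (assumed)
    target_modules=["c_attn"],  # attention projection in the GPT-2-style blocks (assumed)
    bias="none",
    task_type="CAUSAL_LM",
)
peft_model = get_peft_model(base, lora_config)
peft_model.print_trainable_parameters()  # only the adapter weights are trainable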
Quickstart
See github.com/bjoernpl/cerebras-lora for a Gradio demo and more code.
This model can be loaded using the AutoModelForCausalLM functionality together with the PEFT library:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Load the base model in 8-bit (requires the bitsandbytes package and a CUDA GPU)
tokenizer = AutoTokenizer.from_pretrained("cerebras/Cerebras-GPT-6.7B")
model = AutoModelForCausalLM.from_pretrained("cerebras/Cerebras-GPT-6.7B", torch_dtype=torch.float16, device_map='auto', load_in_8bit=True)
# Apply the LoRA adapter weights from this repository
model = PeftModel.from_pretrained(model, "bjoernp/alpaca-cerebras-6.7B", torch_dtype=torch.float16, device_map='auto')

text = "Generative AI is "
It can then be used with Hugging Face pipelines:
from transformers import pipeline
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
generated_text = pipe(text, max_length=50, do_sample=False, no_repeat_ngram_size=2)[0]
print(generated_text['generated_text'])
or with model.generate():
# Move the inputs to the same device as the model before generating
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, num_beams=5,
                         max_new_tokens=50, early_stopping=True,
                         no_repeat_ngram_size=2)
text_output = tokenizer.batch_decode(outputs, skip_special_tokens=True)
print(text_output[0])
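Because the adapter was trained on Alpaca-style instruction data, prompts usually work best when wrapped in the Alpaca instruction template. The sketch below follows the template format used in the upstream tloen/alpaca-lora project; treat it as an assumption about the expected prompt format rather than something specified in this card.

# Alpaca-style prompt template (assumed from the upstream alpaca-lora project).
def build_prompt(instruction: str) -> str:
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n### Response:\n"
    )

prompt = build_prompt("Explain what generative AI is in one sentence.")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])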
Environmental Impact
Experiments were conducted on private infrastructure with a carbon efficiency of 0.432 kgCO2eq/kWh. A cumulative 5 hours of computation was performed on an RTX 3090 Ti (TDP of 450 W).
Total emissions are estimated at 0.97 kgCO2eq, of which 0 percent was directly offset.
Carbon emissions were estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: RTX 3090Ti
- Hours used: 5
- Carbon Emitted: 0.97 kgCO2eq
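As a sanity check on the figure above, the estimate follows directly from power draw, runtime, and grid carbon intensity; a minimal sketch of that arithmetic:

# Reproduce the emissions estimate: energy (kWh) x carbon intensity (kgCO2eq/kWh).
tdp_kw = 450 / 1000          # RTX 3090 Ti TDP in kW
hours = 5                    # total compute time
carbon_intensity = 0.432     # kgCO2eq per kWh, per the card
emissions = tdp_kw * hours * carbon_intensity
print(f"{emissions:.2f} kgCO2eq")  # ~0.97 kgCO2eq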