license: apache-2.0
tags:
- snowflake
- arctic
Model Details
Arctic is a Dense-MoE Hybrid transformer architecture pre-trained from scratch by the Snowflake AI Research Team. We are releasing model checkpoints for both the base and instruct-tuned versions of Arctic under an Apache-2.0 license. This means you can use them freely in your own research, prototypes, and products.
Model developers: Snowflake
License: Apache-2.0
Input: Models input text only.
Output: Models generate text and code only.
Model Release Date: April 24, 2024.
Model Architecture
Arctic combines a 10B dense transformer model with a residual 128x3.66B MoE MLP, resulting in 480B total parameters and 17B active parameters, with the active experts chosen using top-2 gating. For more details about Arctic's model architecture, please see our cookbook.
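The parameter counts quoted above can be checked with a little arithmetic. The sketch below recomputes the total and active parameter counts from the stated sizes and then shows a toy top-2 router. The softmax normalization over the two selected experts and the function names are assumptions made for illustration; they are not taken from Arctic's implementation.

```python
import math

# Figures from the architecture description, in billions of parameters.
DENSE_PARAMS_B = 10.0    # dense transformer
NUM_EXPERTS = 128        # experts in the residual MoE MLP
EXPERT_PARAMS_B = 3.66   # parameters per expert
TOP_K = 2                # experts selected per token (top-2 gating)


def total_params_b():
    """All parameters: the dense model plus every expert."""
    return DENSE_PARAMS_B + NUM_EXPERTS * EXPERT_PARAMS_B


def active_params_b():
    """Parameters used per token: the dense model plus the top-k experts."""
    return DENSE_PARAMS_B + TOP_K * EXPERT_PARAMS_B


def top_k_gate(router_logits, k=TOP_K):
    """Toy router: pick the k highest-scoring experts and softmax their scores.

    Hypothetical helper; the exact normalization Arctic uses is an assumption.
    """
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    chosen = order[:k]
    peak = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    norm = sum(exps)
    return [(i, e / norm) for i, e in zip(chosen, exps)]


print(f"total  ~ {total_params_b():.2f}B")   # 478.48B, quoted as 480B
print(f"active ~ {active_params_b():.2f}B")  # 17.32B, quoted as 17B
print(top_k_gate([0.1, 2.0, -1.0, 1.5]))     # (expert index, weight) pairs
```

The computed values round to the 480B and 17B figures in the text. Only the two selected experts contribute to the per-token cost, which is why the active count is so much smaller than the total.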
Usage
As of 4/24/2024, we are actively working with the maintainers of transformers to include the Arctic model implementation. Until this support is released, please follow these instructions to get the required dependencies for using Arctic:
pip install git+https://github.com/Snowflake-Labs/transformers.git
Arctic leverages several features from DeepSpeed, so you will need to install the latest version of DeepSpeed to get all of the required features:
pip install "deepspeed>=0.14.2"
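After installing, it can be useful to confirm that the environment actually satisfies the version requirement before loading a model of this size. The following is a minimal sketch using only the standard library. The helper names are hypothetical, and the parser handles only plain dotted numeric versions, ignoring suffixes such as local build tags.

```python
import re
from importlib import metadata


def parse_version(text):
    """Turn a string like '1.2.10+cu118' into (1, 2, 10); suffixes are ignored."""
    match = re.match(r"\d+(\.\d+)*", text)
    return tuple(int(p) for p in match.group(0).split(".")) if match else ()


def meets_minimum(installed, minimum):
    """Compare dotted versions numerically, padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_at_least(package, minimum):
    """Return True if `package` is installed and at least version `minimum`."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return False
    return meets_minimum(installed, minimum)


# Numeric comparison matters: as plain strings, "1.2.10" would sort before "1.2.9".
print(meets_minimum("1.2.10", "1.2.9"))  # True
```

To use it, call `installed_at_least("deepspeed", minimum)` with the minimum version named in the pip command above.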
Inference
To get the best performance with Arctic, we highly recommend using TRT-LLM or vLLM for inference. However, you can also use transformers to load the model for text generation. Due to the model size, we recommend using a single 8xH100 instance from your favorite cloud provider, such as AWS p5.48xlarge or Azure ND96isr_H100_v5.
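To see why a full 8xH100 node is recommended, here is a rough weights-only memory estimate. It assumes 80 GB of memory per H100 (not stated in this card) and uses the 480B total parameter count from the Model Architecture section. Activations, the KV cache, and framework overhead are ignored, so real requirements will be higher.

```python
TOTAL_PARAMS = 480e9   # total parameters, from the Model Architecture section
GPUS_PER_NODE = 8      # the "8xH100" instance recommended above
GPU_MEMORY_GB = 80     # assumption: 80 GB per H100

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "8-bit": 1}


def weight_footprint_gb(num_params, bytes_per_param):
    """Memory needed for the weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


node_gb = GPUS_PER_NODE * GPU_MEMORY_GB
for dtype, nbytes in BYTES_PER_PARAM.items():
    need = weight_footprint_gb(TOTAL_PARAMS, nbytes)
    verdict = "fits" if need <= node_gb else "does not fit"
    print(f"{dtype:>9}: {need:6.0f} GB of weights, {verdict} in {node_gb} GB")
```

Under these assumptions, bfloat16 weights alone (about 960 GB) exceed the node's 640 GB of aggregate GPU memory. Loading in bfloat16 with `device_map="auto"` would therefore need to place part of the model on CPU or disk, unless the weights are quantized to a lower precision.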
In addition, if you would like to access Arctic via API, we have collaborated with several inference API providers to host Arctic, such as AWS, Microsoft Azure, NVIDIA Foundry, Lamini, Perplexity, Replicate, and Together.
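import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model; device_map="auto" spreads the weights across the available devices.
tokenizer = AutoTokenizer.from_pretrained("snowflake/arctic")
model = AutoModelForCausalLM.from_pretrained("snowflake/arctic", device_map="auto", torch_dtype=torch.bfloat16)

# Tokenize the prompt and move the resulting tensors to the GPU.
input_text = "Hello my name is "
inputs = tokenizer(input_text, return_tensors="pt").to("cuda")

# Generate up to 20 new tokens, then decode the full sequence (prompt plus continuation).
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0]))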
Fine-Tuning
TODO: add link and extra details about fine-tuning scripts
Metrics
TODO: add summary of metrics here, we don't necessarily need to compare to others but we can if we want
Training Data
TODO: add short description and links to training data related cookbook(s)