---
license: apache-2.0
tags:
- snowflake
- arctic
- moe
---

## Model Details

Arctic is a Dense-MoE Hybrid transformer architecture pre-trained from scratch by the Snowflake AI 
Research Team. We are releasing model checkpoints for both the base and instruct-tuned versions of 
Arctic under an Apache-2.0 license. This means you can use them freely in your own research, 
prototypes, and products.

* [Arctic-Base](link-here)
* [Arctic-Instruct](link-to-instruct)

**Model developers** Snowflake

**License** Apache-2.0

**Input** Models take text input only.

**Output** Models generate text and code only.

**Model Release Date** April 24th, 2024.

## Model Architecture

Arctic combines a 10B dense transformer model with a residual 128x3.66B MoE MLP, resulting in 480B 
total and 17B active parameters, with the active experts chosen using top-2 gating. For more details
about Arctic's model architecture, please see our cookbook.
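The parameter counts above follow from simple arithmetic: 10B + 128 × 3.66B ≈ 478B stored (reported as 480B), and 10B + 2 × 3.66B ≈ 17B active per token. The sketch below checks those totals and then shows a generic top-2 routing step in NumPy. It is an illustration, not Arctic's implementation: the linear maps standing in for MLPs, the softmax over only the two selected experts, and summing the MoE output with the dense output are all assumptions, and `residual_top2_moe` is a hypothetical helper.

```python
import numpy as np

# --- Parameter arithmetic from the figures quoted above ---
DENSE_PARAMS = 10e9      # dense transformer trunk
NUM_EXPERTS = 128        # experts in the residual MoE MLP
EXPERT_PARAMS = 3.66e9   # parameters per expert
TOP_K = 2                # experts selected per token by the gate

total = DENSE_PARAMS + NUM_EXPERTS * EXPERT_PARAMS  # every expert is stored
active = DENSE_PARAMS + TOP_K * EXPERT_PARAMS       # only the top-k experts run per token
print(f"total ~ {total / 1e9:.1f}B, active ~ {active / 1e9:.1f}B")  # total ~ 478.5B, active ~ 17.3B


# --- Toy residual top-2 MoE layer (illustrative only; linear maps stand in for MLPs) ---
def residual_top2_moe(x, dense_w, gate_w, expert_ws, k=TOP_K):
    """Return dense(x) plus a gate-weighted sum of the k highest-scoring experts applied to x.

    x: (d,) token vector; dense_w: (d, d); gate_w: (d, E); expert_ws: (E, d, d).
    """
    dense_out = x @ dense_w
    logits = x @ gate_w
    top = np.argsort(logits)[-k:]                 # indices of the k largest gate logits
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()                                  # softmax over the selected experts only (assumption)
    moe_out = sum(wi * (x @ expert_ws[i]) for wi, i in zip(w, top))
    return dense_out + moe_out                    # MoE branch added to the dense branch (assumption)


rng = np.random.default_rng(0)
d, n_exp = 8, 16
x = rng.standard_normal(d)
out = residual_top2_moe(
    x,
    rng.standard_normal((d, d)),
    rng.standard_normal((d, n_exp)),
    rng.standard_normal((n_exp, d, d)),
)
print(out.shape)  # (8,)
```

Only `k` of the `E` expert matrices are touched per token, which is why the active parameter count stays close to the dense trunk even though all 128 experts must be stored.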


## Usage

As of 4/24/2024, we are actively working with the maintainers of `transformers` to include the Arctic 
model implementation. Until this support is released, please follow these instructions to install the 
required dependencies for using Arctic:

```bash
pip install git+https://github.com/Snowflake-Labs/transformers.git
```

Arctic leverages several features from [DeepSpeed](https://github.com/microsoft/DeepSpeed), so you will need to 
install a recent version of DeepSpeed (0.15.0 or later) to get all of the required features:

```bash
pip install "deepspeed>=0.15.0"
```
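Because the required DeepSpeed features depend on the installed version, a quick pre-flight check can catch a stale install before loading the model. The following is a minimal sketch using only the standard library (`importlib.metadata.version`); the helpers `release_tuple` and `deepspeed_ok` are hypothetical, and the simplified parser ignores pre-release tags such as `rc1`, so `packaging.version.Version` is the more exact choice when `packaging` is available.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MIN_DEEPSPEED = (0, 15, 0)  # mirrors the "deepspeed>=0.15.0" requirement above


def release_tuple(ver: str, width: int = 3) -> tuple:
    """Parse the leading numeric release of a version string, e.g. '0.15.1+cu121' -> (0, 15, 1).

    Simplified on purpose: pre-release/local suffixes such as 'rc1' or '+cu121' are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", ver)
    if match is None:
        raise ValueError(f"unrecognized version string: {ver!r}")
    nums = [int(p) for p in match.group(0).split(".")]
    nums += [0] * (width - len(nums))  # pad '0.15' to (0, 15, 0) so tuple comparison is fair
    return tuple(nums)


def deepspeed_ok(minimum: tuple = MIN_DEEPSPEED) -> bool:
    """True if an installed deepspeed distribution meets the minimum release."""
    try:
        installed = version("deepspeed")
    except PackageNotFoundError:
        return False
    return release_tuple(installed) >= minimum


print("deepspeed >= 0.15.0:", deepspeed_ok())
```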

### Inference

To get the best performance with Arctic, we highly recommend using TRT-LLM or vLLM for inference. However, you 
can also use `transformers` to load the model for text generation. Due to the model size, we recommend using a
single 8xH100 instance from your favorite cloud provider, such as AWS [p5.48xlarge](https://aws.amazon.com/ec2/instance-types/p5/) 
or Azure [ND96isr_H100_v5](https://learn.microsoft.com/en-us/azure/virtual-machines/nd-h100-v5-series).
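As a rough sanity check on hardware sizing, the weight footprint is the total parameter count times the bytes stored per parameter. The sketch below assumes 80 GB of HBM per H100 (8 GPUs, 640 GB per node, as on the instance types listed above) and counts weights only, ignoring KV cache, activations, and framework overhead, so real requirements are higher. Under these assumptions, 16-bit weights for all 480B parameters exceed a single node's memory, which suggests reduced-precision weights (or more than one node) in practice. `weight_memory_gb` is a hypothetical helper.

```python
TOTAL_PARAMS = 480e9          # total parameter count quoted above
GPUS, HBM_PER_GPU_GB = 8, 80  # assumption: 8x H100 80GB per node


def weight_memory_gb(params: float, bits_per_param: float) -> float:
    """Memory for the weights alone, in GB (1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9


node_gb = GPUS * HBM_PER_GPU_GB
for label, bits in [("bf16", 16), ("fp8", 8), ("6-bit", 6)]:
    need = weight_memory_gb(TOTAL_PARAMS, bits)
    print(f"{label:>5}: {need:6.0f} GB of weights vs {node_gb} GB HBM -> fits: {need < node_gb}")
```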

In addition, if you would like to access Arctic via an API, we have collaborated with several inference API 
providers to host Arctic, including AWS, Microsoft Azure, NVIDIA Foundry, Lamini, Perplexity, Replicate, and Together.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer, and let device_map="auto" place the bfloat16 weights across the available devices.
tokenizer = AutoTokenizer.from_pretrained("snowflake/arctic")
model = AutoModelForCausalLM.from_pretrained(
    "snowflake/arctic",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

input_text = "Hello my name is"
# tokenizer(...) returns a BatchEncoding (input_ids + attention_mask); move it to the first GPU.
inputs = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### Fine-Tuning

TODO: add link and extra details about fine-tuning scripts

## Metrics

TODO: add summary of metrics here, we don't necessarily need to compare to others but we can if we want

## Training Data

TODO: add short description and links to training data related cookbook(s)