Commit beb6d6b (verified) by yxue-jamandtea · Parent: 13441db

Update README.md

Files changed (1): README.md (+103 −3)
---
license: llama3
language:
- en
pipeline_tag: text-generation
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Llama-70B
- meta-llama/Llama-3.3-70B-Instruct
tags:
- chat
library_name: transformers
---

# Model Overview

- **Model Optimizations:**
  - **Weight quantization:** FP8
  - **Activation quantization:** FP8
- **Release Date:** 1/28/2025

Quantized version of [deepseek-ai/DeepSeek-R1-Distill-Llama-70B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B) to the FP8 data type, ready for inference with SGLang >= 0.3 or vLLM >= 0.5.2.
This optimization reduces the number of bits per parameter from 16 to 8, reducing disk size and GPU memory requirements by approximately 50%. Only the weights and activations of the linear operators within transformer blocks are quantized.
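
The ~50% figure follows directly from the bit widths. A back-of-the-envelope sketch (illustrative only: `70e9` is an approximate parameter count, and embedding/KV-cache overheads are ignored):

```python
# Rough weight-memory comparison for a ~70B-parameter model.
params = 70e9
bf16_gb = params * 2 / 1e9  # 16 bits = 2 bytes per parameter
fp8_gb = params * 1 / 1e9   # 8 bits = 1 byte per parameter

print(f"BF16 weights: ~{bf16_gb:.0f} GB")  # ~140 GB
print(f"FP8 weights:  ~{fp8_gb:.0f} GB")   # ~70 GB
print(f"Savings: {1 - fp8_gb / bf16_gb:.0%}")
```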

## License

We note that [deepseek-ai/DeepSeek-R1-Distill-Llama-70B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B) is released under the MIT license; however, the original Llama model is licensed under the Llama 3 license. For consistency, we adopt the Llama 3 license for this model.

## Deployment

### Use with SGLang

```bash
python -m sglang.launch_server --model-path JamAndTeaStudios/DeepSeek-R1-Distill-Llama-70B-FP8-Dynamic \
  --port 30000 --host 0.0.0.0
```
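
SGLang's launch server exposes an OpenAI-compatible chat-completions endpoint. A minimal client sketch using only the Python standard library (the host/port match the launch command above; the sampling parameters are illustrative):

```python
import json
import urllib.request

def build_chat_request(prompt: str) -> dict:
    # Standard OpenAI-style chat-completions payload, served by SGLang
    # at /v1/chat/completions.
    return {
        "model": "JamAndTeaStudios/DeepSeek-R1-Distill-Llama-70B-FP8-Dynamic",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
        "temperature": 0.6,
    }

payload = build_chat_request("Why is FP8 quantization useful?")
req = urllib.request.Request(
    "http://localhost:30000/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# Uncomment once the server is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```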

## Creation

This model was created with [llm-compressor](https://github.com/vllm-project/llm-compressor) by running the code snippet below.

<details>
<summary>Model Creation Code</summary>

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import oneshot

MODEL_ID = "deepseek-ai/DeepSeek-R1-Distill-Llama-70B"

# 1) Load model.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, device_map="auto", torch_dtype="auto"
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# 2) Configure the quantization algorithm and scheme.
# In this case, we:
#   * quantize the weights to fp8 with a per-channel scheme via PTQ
#   * quantize the activations to fp8 with dynamic per-token scales
recipe = QuantizationModifier(
    targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"]
)

# 3) Apply quantization and save in compressed-tensors format.
OUTPUT_DIR = MODEL_ID.split("/")[1] + "-FP8-Dynamic"
oneshot(
    model=model,
    recipe=recipe,
    tokenizer=tokenizer,
    output_dir=OUTPUT_DIR,
)

# Confirm generations of the quantized model look sane.
print("========== SAMPLE GENERATION ==============")
input_ids = tokenizer("Hello my name is", return_tensors="pt").input_ids.to("cuda")
output = model.generate(input_ids, max_new_tokens=20)
print(tokenizer.decode(output[0]))
print("==========================================")
```
</details>

## Evaluation

TBA

## Play Retail Mage

![image/png](https://cdn-uploads.huggingface.co/production/uploads/64f908994110f1806f2c356a/vsWXpQqgHIqN4f4BM-RfS.png)

[Retail Mage (Steam)](https://store.steampowered.com/app/3224380/Retail_Mage/) is an immersive sim that uses online LLM inference in almost all of its gameplay features!

### Reviews

> “A true to life experience detailing how customer service really works.”
> 10/10 – kpolupo

> “I enjoyed how many things were flammable in the store.”
> 5/5 – mr_srsbsns

> “I've only known that talking little crow plushie in MageMart for a day and a half but if anything happened to him I would petrify everyone in this store and then myself.”
> 7/7 – neondenki