danielhanchen
committed on
Upload LlamaForCausalLM
README.md
CHANGED
@@ -1,212 +1,199 @@
---
- base_model: deepseek-ai/DeepSeek-R1-Distill-Llama-70B
- language:
- - en
- license: llama3.3
library_name: transformers
- tags:
- - deepseek
- - unsloth
- - transformers
- - llama
- - llama-3
- - meta
---

- # Finetune LLMs 2-5x faster with 70% less memory via Unsloth!
- We have a free Google Colab Tesla T4 notebook for Llama 3.1 (8B) here: https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-Alpaca.ipynb

- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/Discord%20button.png" width="200"/>](https://discord.gg/unsloth)
- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

- | Unsloth supports | Free Notebooks | Performance | Memory use |
- |-----------------|--------------------------------------------------------------------------------------------------------------------------|-------------|----------|
- | **Llama-3.2 (3B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(1B_and_3B)-Conversational.ipynb) | 2.4x faster | 58% less |
- | **Llama-3.2 (11B vision)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb) | 2x faster | 60% less |
- | **Qwen2 VL (7B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen2_VL_(7B)-Vision.ipynb) | 1.8x faster | 60% less |
- | **Qwen2.5 (7B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Qwen2.5_(7B)-Alpaca.ipynb) | 2x faster | 60% less |
- | **Llama-3.1 (8B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-Alpaca.ipynb) | 2.4x faster | 58% less |
- | **Phi-3.5 (mini)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Phi_3.5_Mini-Conversational.ipynb) | 2x faster | 50% less |
- | **Gemma 2 (9B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Gemma2_(9B)-Alpaca.ipynb) | 2.4x faster | 58% less |
- | **Mistral (7B)** | [▶️ Start on Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Mistral_v0.3_(7B)-Conversational.ipynb) | 2.2x faster | 62% less |

- - This [text completion notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Mistral_(7B)-Text_Completion.ipynb) is for raw text. This [DPO notebook](https://colab.research.google.com/drive/15vttTpzzVXv_tJwEk-hIcQ0S9FcEWvwP?usp=sharing) replicates Zephyr.
- - \* Kaggle has 2x T4s, but we use 1. Due to overhead, 1x T4 is 5x faster.
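
The free notebooks above cover the full fine-tuning workflow. As a rough, untested sketch of the same idea in script form (the repo id, sequence length, and LoRA settings below are illustrative assumptions, not values taken from this card), loading the model with Unsloth might look like:

```python
# Hypothetical sketch: load the distilled checkpoint with Unsloth and attach LoRA adapters.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/DeepSeek-R1-Distill-Llama-70B",  # assumed repo id
    max_seq_length=2048,   # illustrative context length
    load_in_4bit=True,     # 4-bit loading to reduce memory use
)

# Train only a small set of adapter weights rather than all 70B parameters.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```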

- A huge thank you to the DeepSeek team for creating and releasing these models.

- DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning.
- With RL, DeepSeek-R1-Zero naturally developed numerous powerful and interesting reasoning behaviors.
- However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance,
- we introduce DeepSeek-R1, which incorporates cold-start data before RL.
- DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks.
- To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen. DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.

- <img width="80%" src="figures/benchmark.jpg">
- </p>

- We believe the pipeline will benefit the industry by creating better models.

- - Using the reasoning data generated by DeepSeek-R1, we fine-tuned several dense models that are widely used in the research community. The evaluation results demonstrate that the distilled smaller dense models perform exceptionally well on benchmarks. We open-source distilled 1.5B, 7B, 8B, 14B, 32B, and 70B checkpoints based on the Qwen2.5 and Llama3 series to the community.

- ### DeepSeek-R1 Models

- | **Model** | **#Total Params** | **#Activated Params** | **Context Length** | **Download** |
- | :------------: | :------------: | :------------: | :------------: | :------------: |
- | DeepSeek-R1-Zero | 671B | 37B | 128K | [🤗 HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-R1-Zero) |
- | DeepSeek-R1 | 671B | 37B | 128K | [🤗 HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-R1) |

- </div>

- DeepSeek-R1-Zero & DeepSeek-R1 are trained based on DeepSeek-V3-Base.
- For more details regarding the model architecture, please refer to the [DeepSeek-V3](https://github.com/deepseek-ai/DeepSeek-V3) repository.

- | **Model** | **Base Model** | **Download** |
- | :------------: | :------------: | :------------: |
- | DeepSeek-R1-Distill-Qwen-1.5B | [Qwen2.5-Math-1.5B](https://huggingface.co/Qwen/Qwen2.5-Math-1.5B) | [🤗 HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) |
- | DeepSeek-R1-Distill-Qwen-7B | [Qwen2.5-Math-7B](https://huggingface.co/Qwen/Qwen2.5-Math-7B) | [🤗 HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) |
- | DeepSeek-R1-Distill-Llama-8B | [Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B) | [🤗 HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) |
- | DeepSeek-R1-Distill-Qwen-14B | [Qwen2.5-14B](https://huggingface.co/Qwen/Qwen2.5-14B) | [🤗 HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B) |
- | DeepSeek-R1-Distill-Qwen-32B | [Qwen2.5-32B](https://huggingface.co/Qwen/Qwen2.5-32B) | [🤗 HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) |
- | DeepSeek-R1-Distill-Llama-70B | [Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) | [🤗 HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B) |

- We slightly changed their configs and tokenizers. Please use our settings to run these models.
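
Any of the checkpoints listed above can also be fetched programmatically from the Hub; the snippet below is a minimal sketch, with the repo id taken from one of the rows above purely for illustration:

```python
# Hypothetical download sketch: fetch one of the distilled checkpoints listed above.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="deepseek-ai/DeepSeek-R1-Distill-Llama-70B")
print(f"Model files downloaded to: {local_dir}")
```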

- | Category | Benchmark (Metric) | Claude-3.5-Sonnet-1022 | GPT-4o 0513 | DeepSeek V3 | OpenAI o1-mini | OpenAI o1-1217 | DeepSeek R1 |
- |----------|-------------------|----------------------|------------|--------------|----------------|------------|--------------|
- | | Architecture | - | - | MoE | - | - | MoE |
- | | # Activated Params | - | - | 37B | - | - | 37B |
- | | # Total Params | - | - | 671B | - | - | 671B |
- | English | MMLU (Pass@1) | 88.3 | 87.2 | 88.5 | 85.2 | **91.8** | 90.8 |
- | | MMLU-Redux (EM) | 88.9 | 88.0 | 89.1 | 86.7 | - | **92.9** |
- | | MMLU-Pro (EM) | 78.0 | 72.6 | 75.9 | 80.3 | - | **84.0** |
- | | DROP (3-shot F1) | 88.3 | 83.7 | 91.6 | 83.9 | 90.2 | **92.2** |
- | | IF-Eval (Prompt Strict) | **86.5** | 84.3 | 86.1 | 84.8 | - | 83.3 |
- | | GPQA-Diamond (Pass@1) | 65.0 | 49.9 | 59.1 | 60.0 | **75.7** | 71.5 |
- | | SimpleQA (Correct) | 28.4 | 38.2 | 24.9 | 7.0 | **47.0** | 30.1 |
- | | FRAMES (Acc.) | 72.5 | 80.5 | 73.3 | 76.9 | - | **82.5** |
- | | AlpacaEval2.0 (LC-winrate) | 52.0 | 51.1 | 70.0 | 57.8 | - | **87.6** |
- | | ArenaHard (GPT-4-1106) | 85.2 | 80.4 | 85.5 | 92.0 | - | **92.3** |
- | Code | LiveCodeBench (Pass@1-COT) | 33.8 | 34.2 | - | 53.8 | 63.4 | **65.9** |
- | | Codeforces (Percentile) | 20.3 | 23.6 | 58.7 | 93.4 | **96.6** | 96.3 |
- | | Codeforces (Rating) | 717 | 759 | 1134 | 1820 | **2061** | 2029 |
- | | SWE Verified (Resolved) | **50.8** | 38.8 | 42.0 | 41.6 | 48.9 | 49.2 |
- | | Aider-Polyglot (Acc.) | 45.3 | 16.0 | 49.6 | 32.9 | **61.7** | 53.3 |
- | Math | AIME 2024 (Pass@1) | 16.0 | 9.3 | 39.2 | 63.6 | 79.2 | **79.8** |
- | | MATH-500 (Pass@1) | 78.3 | 74.6 | 90.2 | 90.0 | 96.4 | **97.3** |
- | | CNMO 2024 (Pass@1) | 13.1 | 10.8 | 43.2 | 67.6 | - | **78.8** |
- | Chinese | CLUEWSC (EM) | 85.4 | 87.9 | 90.9 | 89.9 | - | **92.8** |
- | | C-Eval (EM) | 76.7 | 76.0 | 86.5 | 68.9 | - | **91.8** |
- | | C-SimpleQA (Correct) | 55.4 | 58.7 | **68.0** | 40.3 | - | 63.7 |

- | Model | AIME 2024 pass@1 | AIME 2024 cons@64 | MATH-500 pass@1 | GPQA Diamond pass@1 | LiveCodeBench pass@1 | CodeForces rating |
- |------------------------------------------|------------------|-------------------|-----------------|----------------------|----------------------|-------------------|
- | GPT-4o-0513 | 9.3 | 13.4 | 74.6 | 49.9 | 32.9 | 759 |
- | Claude-3.5-Sonnet-1022 | 16.0 | 26.7 | 78.3 | 65.0 | 38.9 | 717 |
- | o1-mini | 63.6 | 80.0 | 90.0 | 60.0 | 53.8 | **1820** |
- | QwQ-32B-Preview | 44.0 | 60.0 | 90.6 | 54.5 | 41.9 | 1316 |
- | DeepSeek-R1-Distill-Qwen-1.5B | 28.9 | 52.7 | 83.9 | 33.8 | 16.9 | 954 |
- | DeepSeek-R1-Distill-Qwen-7B | 55.5 | 83.3 | 92.8 | 49.1 | 37.6 | 1189 |
- | DeepSeek-R1-Distill-Qwen-14B | 69.7 | 80.0 | 93.9 | 59.1 | 53.1 | 1481 |
- | DeepSeek-R1-Distill-Qwen-32B | **72.6** | 83.3 | 94.3 | 62.1 | 57.2 | 1691 |
- | DeepSeek-R1-Distill-Llama-8B | 50.4 | 80.0 | 89.1 | 49.0 | 39.6 | 1205 |
- | DeepSeek-R1-Distill-Llama-70B | 70.0 | **86.7** | **94.5** | **65.2** | **57.5** | 1633 |

- You can chat with DeepSeek-R1 on DeepSeek's official website, [chat.deepseek.com](https://chat.deepseek.com), by switching on the "DeepThink" button.

- For instance, you can easily start a service using [vLLM](https://github.com/vllm-project/vllm):
- ```shell
- vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --tensor-parallel-size 2 --max-model-len 32768 --enforce-eager
- ```
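
Once a server like the one above is running, clients can talk to vLLM's OpenAI-compatible endpoint. The snippet below is a minimal sketch of such a client call; the host, port, sampling settings, and the `openai` client dependency are illustrative assumptions rather than anything documented in this card:

```python
# Hypothetical client for the vLLM server started above (default host/port assumed).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
    messages=[{"role": "user", "content": "How many primes are there below 30? Reason step by step."}],
    temperature=0.6,   # illustrative sampling settings
    max_tokens=1024,
)
print(response.choices[0].message.content)
```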

- ## License
- This code repository and the model weights are licensed under the [MIT License](https://github.com/deepseek-ai/DeepSeek-R1/blob/main/LICENSE).
- The DeepSeek-R1 series supports commercial use and allows any modifications and derivative works, including, but not limited to, distillation for training other LLMs. Please note that:
- - DeepSeek-R1-Distill-Qwen-1.5B, DeepSeek-R1-Distill-Qwen-7B, DeepSeek-R1-Distill-Qwen-14B and DeepSeek-R1-Distill-Qwen-32B are derived from the [Qwen-2.5 series](https://github.com/QwenLM/Qwen2.5), which are originally licensed under the [Apache 2.0 License](https://huggingface.co/Qwen/Qwen2.5-1.5B/blob/main/LICENSE) and are now fine-tuned with 800k samples curated with DeepSeek-R1.
- - DeepSeek-R1-Distill-Llama-8B is derived from Llama3.1-8B-Base and is originally licensed under the [llama3.1 license](https://huggingface.co/meta-llama/Llama-3.1-8B/blob/main/LICENSE).
- - DeepSeek-R1-Distill-Llama-70B is derived from Llama3.3-70B-Instruct and is originally licensed under the [llama3.3 license](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct/blob/main/LICENSE).

- If you have any questions, please raise an issue or contact us at [[email protected]]([email protected]).
---
library_name: transformers
+ tags: []
---

+ # Model Card for Model ID

+ <!-- Provide a quick summary of what the model is/does. -->

+ ## Model Details

+ ### Model Description

+ <!-- Provide a longer summary of what this model is. -->

+ This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.

+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]

+ ### Model Sources [optional]

+ <!-- Provide the basic links for the model. -->

+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]

+ ## Uses

+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

+ ### Direct Use

+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

+ [More Information Needed]

+ ### Downstream Use [optional]

+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

+ [More Information Needed]

+ ### Out-of-Scope Use

+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

+ [More Information Needed]

+ ## Bias, Risks, and Limitations

+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->

+ [More Information Needed]

+ ### Recommendations

+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

+ Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.

+ ## How to Get Started with the Model

+ Use the code below to get started with the model.

+ [More Information Needed]
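
Until the card author fills this in, a minimal sketch along the following lines should work for a standard Llama-architecture causal LM; the repo id, dtype, and generation settings are assumptions for illustration rather than documented values:

```python
# Hypothetical quick-start sketch for this checkpoint (repo id assumed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "unsloth/DeepSeek-R1-Distill-Llama-70B"  # assumption: replace with the actual repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Explain the quadratic formula step by step."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=512, do_sample=True, temperature=0.6)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```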

+ ## Training Details

+ ### Training Data

+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

+ [More Information Needed]

+ ### Training Procedure

+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

+ #### Preprocessing [optional]

+ [More Information Needed]

+ #### Training Hyperparameters

+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

+ #### Speeds, Sizes, Times [optional]

+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

+ [More Information Needed]

+ ## Evaluation

+ <!-- This section describes the evaluation protocols and provides the results. -->

+ ### Testing Data, Factors & Metrics

+ #### Testing Data

+ <!-- This should link to a Dataset Card if possible. -->

+ [More Information Needed]

+ #### Factors

+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

+ [More Information Needed]

+ #### Metrics

+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->

+ [More Information Needed]

+ ### Results

+ [More Information Needed]

+ #### Summary

+ ## Model Examination [optional]

+ <!-- Relevant interpretability work for the model goes here -->

+ [More Information Needed]

+ ## Environmental Impact

+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]

+ ## Technical Specifications [optional]

+ ### Model Architecture and Objective

+ [More Information Needed]

+ ### Compute Infrastructure

+ [More Information Needed]

+ #### Hardware

+ [More Information Needed]

+ #### Software

+ [More Information Needed]

+ ## Citation [optional]

+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

+ **BibTeX:**

+ [More Information Needed]

+ **APA:**

+ [More Information Needed]

+ ## Glossary [optional]

+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

+ [More Information Needed]

+ ## More Information [optional]

+ [More Information Needed]

+ ## Model Card Authors [optional]

+ [More Information Needed]

+ ## Model Card Contact

+ [More Information Needed]