---
license: mit
base_model:
- deepseek-ai/DeepSeek-R1
---
# Lightweight DeepSeek R1 (2-Hidden-Layer Version with Smaller Dimensions)

This project was created using the official **DeepSeek R1** model script (`modeling_deepseek.py`) from [Hugging Face](https://huggingface.co/deepseek-ai/DeepSeek-R1/blob/main/modeling_deepseek.py). It implements a **2-layer version** of DeepSeek R1 with randomly initialized weights and smaller dimensions.

## Purpose
The purpose of these weights is to provide a lightweight implementation for researchers who want to study the model architecture and run it locally and quickly.

The original **DeepSeek R1** model requires an **8x H200 GPU** setup and runs on the **vLLM/SGLang** frameworks, making it difficult to deploy on standard hardware.

## Model Structure
The two hidden layers consist of the following (a quick verification sketch follows the list):
- **Layer 0: MLA + dense MLP**
- **Layer 1: MLA + MoE (Mixture of Experts) MLP**
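
You can confirm this layout once the model has been loaded (see the Usage section below). This is a minimal sketch that assumes the module layout of the official `modeling_deepseek.py`, where each decoder layer exposes `self_attn` and `mlp` submodules:
```python
# Minimal sketch: print the attention and MLP module type of each decoder layer.
# Assumes `model` was loaded as in the Usage section below and that the
# architecture follows the official modeling_deepseek.py module layout.
for i, layer in enumerate(model.model.layers):
    print(i, type(layer.self_attn).__name__, type(layer.mlp).__name__)
# Expected: layer 0 holds a dense MLP module, layer 1 a MoE module.
```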

The configuration differences between this model and the original **DeepSeek R1** are shown below:
```json
{
	"first_k_dense_replace": 1,
	"intermediate_size": 1024,
	"n_routed_experts": 64,
	"num_experts_per_tok": 4,
	"moe_intermediate_size": 128,
	"num_hidden_layers": 2,
	"num_nextn_predict_layers": 0
}
```
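
The linked creation script (see "More Info" below) produces these weights; a minimal sketch of the idea, assuming `trust_remote_code=True` is needed to pull in the repository's custom DeepSeek modeling code, looks like this:
```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM

# Start from the original DeepSeek R1 configuration and shrink it
# to the values listed in the JSON above.
config = AutoConfig.from_pretrained("deepseek-ai/DeepSeek-R1", trust_remote_code=True)
config.num_hidden_layers = 2            # keep only 2 decoder layers
config.first_k_dense_replace = 1        # layer 0 dense MLP, layer 1 MoE
config.intermediate_size = 1024         # dense MLP width
config.n_routed_experts = 64            # number of routed experts
config.num_experts_per_tok = 4          # active experts per token
config.moe_intermediate_size = 128      # per-expert MLP width
config.num_nextn_predict_layers = 0     # drop the multi-token-prediction layers

# Build randomly initialized weights from the shrunken config and save them.
model = AutoModelForCausalLM.from_config(config, trust_remote_code=True).to(torch.bfloat16)
model.save_pretrained("DeepSeek-R1-Small-2layers")
```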

## Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the small 2-layer model and its tokenizer.
model = AutoModelForCausalLM.from_pretrained('silence09/DeepSeek-R1-Small-2layers', torch_dtype=torch.bfloat16).cuda()
tokenizer = AutoTokenizer.from_pretrained('silence09/DeepSeek-R1-Small-2layers')

# Build a single-turn chat prompt.
prompt = "Who are you?"
messages = [{"role": "user", "content": prompt}]
prompt_tokens = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

# Greedy generation; strip the prompt tokens from the output before decoding.
generated_ids = model.generate(prompt_tokens, max_new_tokens=100, do_sample=False)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(prompt_tokens, generated_ids)
]
completion = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(completion)

# Keep the assistant reply so the conversation can be continued.
messages.append({"role": "assistant", "content": completion})

```
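
Because the weights are randomly initialized, the generated text will not be meaningful; the snippet only exercises the full pipeline. Since the example ends by appending the assistant reply to `messages`, a follow-up turn can reuse the same pattern (a minimal sketch with a made-up second prompt):
```python
# Hypothetical second turn, continuing the `messages` list built above.
messages.append({"role": "user", "content": "What can you do?"})
prompt_tokens = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
generated_ids = model.generate(prompt_tokens, max_new_tokens=100, do_sample=False)
completion = tokenizer.batch_decode([generated_ids[0][prompt_tokens.shape[1]:]], skip_special_tokens=True)[0]
print(completion)
messages.append({"role": "assistant", "content": completion})
```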

## More Info
The model was created using the Python script available at [this repository](https://github.com/silencelamb/naked_llama/blob/main/hf_example/create_deepseek_r1_small_2layers.py).