---
license: other
license_name: qwen
license_link: https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/blob/main/LICENSE
datasets:
- rubenroy/GammaCorpus-v2-5m
- rubenroy/GammaCorpus-CoT-Math-170k
- rubenroy/GammaCorpus-Fact-QA-450k
language:
- en
base_model:
- Qwen/Qwen2.5-72B-Instruct
pipeline_tag: text-generation
tags:
- qwen2
- chat
- conversational
- gilgamesh
- gammacorpus
library_name: transformers
---

# 🔥 Gilgamesh 72B 🔥

> [!NOTE]
> Gilgamesh (GGM) 72B is a fine-tune of Alibaba's **Qwen 2.5 72B Instruct** model.

![Gilgamesh AI Art](https://cdn.ruben-roy.com/AI/Gilgamesh/img/art.png)

## Model Details
- **Developed by:** [Ruben Roy](https://huggingface.co/rubenroy)
- **Funded by:** [The Ovantage Society](https://huggingface.co/Ovantage)
- **License:** Qwen
- **Base Model:** [Qwen/Qwen2.5-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct)
- **Type:** Causal Language Model
- **Architecture:** transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
- **Number of Parameters:** 72.7B
- **Number of Parameters (Non-Embedding):** 70.0B
- **Number of Layers:** 80
- **Number of Attention Heads (GQA):** 64 for Q and 8 for KV

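These figures are exposed on the model's configuration, so they can be checked locally without downloading the weights. A minimal sketch using `AutoConfig`; the attribute names follow the standard Qwen2 config in Transformers and are assumed to apply to this checkpoint:

```python
from transformers import AutoConfig

# Fetches only config.json; no model weights are downloaded
config = AutoConfig.from_pretrained("rubenroy/Gilgamesh-72B")

print(config.num_hidden_layers)     # layers (expected: 80)
print(config.num_attention_heads)   # query heads (expected: 64)
print(config.num_key_value_heads)   # key/value heads for GQA (expected: 8)
```
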
> [!IMPORTANT]
> Qwen is licensed under the Qwen LICENSE AGREEMENT, Copyright (c) Alibaba Cloud. All Rights Reserved.

## Datasets used

Gilgamesh 72B was trained on a mixture of specialised datasets designed for factual accuracy, mathematical capabilities and reasoning. The datasets used include:

- **[GammaCorpus-v2-5m](https://huggingface.co/datasets/rubenroy/GammaCorpus-v2-5m)**: A large 5 million line general-purpose dataset covering many topics to enhance broad knowledge and conversational abilities.
- **[GammaCorpus-CoT-Math-170k](https://huggingface.co/datasets/rubenroy/GammaCorpus-CoT-Math-170k)**: A dataset focused on Chain-of-Thought (CoT) reasoning in mathematics made to help the model improve step-by-step problem-solving.
- **[GammaCorpus-Fact-QA-450k](https://huggingface.co/datasets/rubenroy/GammaCorpus-Fact-QA-450k)**: A dataset of factual question-answer pairs used to reinforce important, up-to-date knowledge.

These datasets were all built and curated by me; thanks to my fellow team members at [Ovantage Labs](https://huggingface.co/Ovantage) for assisting with their creation and curation.
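
All three datasets are public on the Hugging Face Hub, so the training mixture can be inspected directly with the `datasets` library. A minimal sketch; the `train` split name and streaming mode are assumptions, not guaranteed by the dataset cards:

```python
from datasets import load_dataset

# Stream the 5M-line general-purpose corpus instead of downloading it all at once
ds = load_dataset("rubenroy/GammaCorpus-v2-5m", split="train", streaming=True)

# Peek at the first record to see the conversation format
print(next(iter(ds)))
```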

## Usage

You can try Gilgamesh 72B with the following example using the Transformers library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "rubenroy/Gilgamesh-72B"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "What are some largely unsolved questions in philosophy that still affect our lives today?"

messages = [
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=2048
)

generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
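
The snippet stores the decoded reply in `response`, so the usual next step is simply to print it. If you want more varied outputs, you can also enable sampling in `generate`; the values below are illustrative, not settings recommended for this model:

```python
# Show the decoded reply produced by the snippet above
print(response)

# Optional: re-generate with sampling (illustrative settings, not official recommendations)
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=2048,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
```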

## License
This model is released under the Qwen License Agreement by Alibaba Cloud. See the [LICENSE file](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/blob/main/LICENSE) for more information.

## Special Thanks
A huge thanks to my fellow team members at [Ovantage Labs](https://huggingface.co/Ovantage) for providing the H100s that made this training possible.