---
license: other
license_name: qwen
license_link: https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/blob/main/LICENSE
datasets:
- rubenroy/GammaCorpus-v2-5m
- rubenroy/GammaCorpus-CoT-Math-170k
- rubenroy/GammaCorpus-Fact-QA-450k
language:
- en
base_model:
- Qwen/Qwen2.5-72B-Instruct
pipeline_tag: text-generation
tags:
- qwen2
- chat
- conversational
- gilgamesh
- gammacorpus
library_name: transformers
---
# 🔥 Gilgamesh 72B 🔥
> [!NOTE]
> Gilgamesh (GGM) 72B is a finetune of Alibaba's **Qwen 2.5 72B Instruct** model.

## Model Details
- **Developed by:** [Ruben Roy](https://huggingface.co/rubenroy)
- **Funded by:** [The Ovantage Society](https://huggingface.co/Ovantage)
- **License:** Qwen
- **Base Model:** [Qwen/Qwen2.5-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct)
- **Type:** Causal Language Models
- **Architecture:** transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
- **Number of Parameters:** 72.7B
- **Number of Parameters (Non-Embedding):** 70.0B
- **Number of Layers:** 80
- **Number of Attention Heads (GQA):** 64 for Q and 8 for KV
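You can sanity-check these figures without downloading the weights by loading only the model configuration. This is a minimal sketch assuming the repository's `config.json` uses the standard Qwen2-style field names (`num_hidden_layers`, `num_attention_heads`, `num_key_value_heads`):

```python
from transformers import AutoConfig

# Loads only config.json, not the model weights
config = AutoConfig.from_pretrained("rubenroy/Gilgamesh-72B")

print(config.num_hidden_layers)     # expected: 80
print(config.num_attention_heads)   # expected: 64 (query heads)
print(config.num_key_value_heads)   # expected: 8 (KV heads, GQA)
```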
> [!IMPORTANT]
> Qwen is licensed under the Qwen LICENSE AGREEMENT, Copyright (c) Alibaba Cloud. All Rights Reserved.
## Datasets used
Gilgamesh 72B was trained on a mixture of specialised datasets designed for factual accuracy, mathematical capabilities and reasoning. The datasets used include:
- **[GammaCorpus-v2-5m](https://huggingface.co/datasets/rubenroy/GammaCorpus-v2-5m)**: A large 5 million line general-purpose dataset covering many topics to enhance broad knowledge and conversational abilities.
- **[GammaCorpus-CoT-Math-170k](https://huggingface.co/datasets/rubenroy/GammaCorpus-CoT-Math-170k)**: A dataset focused on Chain-of-Thought (CoT) reasoning in mathematics, designed to improve the model's step-by-step problem-solving.
- **[GammaCorpus-Fact-QA-450k](https://huggingface.co/datasets/rubenroy/GammaCorpus-Fact-QA-450k)**: A dataset of factual question-answer pairs used to reinforce important current knowledge.

All of these datasets were built and curated by me; my thanks go to my fellow team members at [Ovantage Labs](https://huggingface.co/Ovantage) for their help with their creation and curation.
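If you want to inspect the training data yourself, all three datasets are public on the Hugging Face Hub. The sketch below assumes the standard `datasets` library and a `train` split; the column names differ per dataset, so check each dataset card before relying on them.

```python
from datasets import load_dataset

# Stream the large general-purpose corpus instead of downloading it fully
gc_v2 = load_dataset("rubenroy/GammaCorpus-v2-5m", split="train", streaming=True)
print(next(iter(gc_v2)))  # print one example to see the schema

# The smaller CoT maths and fact-QA sets can be loaded normally
cot_math = load_dataset("rubenroy/GammaCorpus-CoT-Math-170k", split="train")
fact_qa = load_dataset("rubenroy/GammaCorpus-Fact-QA-450k", split="train")
print(cot_math[0])
print(fact_qa[0])
```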
## Usage
You can try out Gilgamesh 72B with the following example using the Transformers library:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "rubenroy/Gilgamesh-72B"

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Build a chat-formatted prompt
prompt = "What are some largely unsolved questions in philosophy that still affect our lives today?"
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate a response
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=2048
)

# Strip the prompt tokens and decode only the newly generated text
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
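For interactive use, you may prefer to stream tokens as they are generated rather than waiting for the full completion. This is a minimal variant of the example above using Transformers' `TextStreamer`; it reuses the `model`, `tokenizer`, and `model_inputs` objects defined there.

```python
from transformers import TextStreamer

# Print tokens to stdout as they are generated, skipping the echoed prompt
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

model.generate(
    **model_inputs,
    max_new_tokens=2048,
    streamer=streamer
)
```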
## License
This model follows the Qwen License Agreement by Alibaba Cloud. See the [LICENSE file](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/blob/main/LICENSE) for more information.
## Special Thanks
A huge thanks to my fellow team members at [Ovantage Labs](https://huggingface.co/Ovantage) for providing the H100s that made this training possible. |