Update README.md

README.md

---
license: other
license_name: qwen
license_link: https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/blob/main/LICENSE
datasets:
- rubenroy/GammaCorpus-v2-5m
- rubenroy/GammaCorpus-CoT-Math-170k
…
---

# Gilgamesh 72B

> [!NOTE]
> Built on Qwen 2.5 72B Instruct

## Overview
Gilgamesh (GGM) 72B is a heavy fine-tune of Alibaba's **Qwen 2.5 72B Instruct** model.

## Model Details
- **Developed by:** [Ruben Roy](https://huggingface.co/rubenroy)
- **Funded by:** [The Ovantage Society](https://huggingface.co/Ovantage)
- **License:** Qwen
- **Base Model:** [Qwen/Qwen2.5-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct)
- **Type:** Causal Language Model
- **Architecture:** Transformer with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
- **Number of Parameters:** 72.7B
- **Number of Parameters (Non-Embedding):** 70.0B
- **Number of Layers:** 80
- **Number of Attention Heads (GQA):** 64 for Q and 8 for KV (see the config check below)

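These figures mirror the base Qwen 2.5 72B Instruct configuration. As a quick sanity check, you can read them straight from the published model config without downloading any weights (a minimal sketch, assuming the repo exposes the standard Qwen2 config fields used by Transformers):

```python
from transformers import AutoConfig

# Fetches only config.json; no model weights are downloaded.
config = AutoConfig.from_pretrained("rubenroy/Gilgamesh-72B")

print(config.num_hidden_layers)    # expected: 80
print(config.num_attention_heads)  # expected: 64 query heads
print(config.num_key_value_heads)  # expected: 8 KV heads (GQA)
```
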
> [!IMPORTANT]
> Qwen is licensed under the Qwen LICENSE AGREEMENT, Copyright (c) Alibaba Cloud. All Rights Reserved.

## Datasets used

Gilgamesh 72B was trained on a mixture of specialised datasets designed for factual accuracy, mathematical capability, and reasoning. The datasets used include:

- **[GammaCorpus-v2-5m](https://huggingface.co/datasets/rubenroy/GammaCorpus-v2-5m)**: A large 5-million-line general-purpose dataset covering many topics to enhance broad knowledge and conversational abilities.
- **[GammaCorpus-CoT-Math-170k](https://huggingface.co/datasets/rubenroy/GammaCorpus-CoT-Math-170k)**: A dataset focused on Chain-of-Thought (CoT) reasoning in mathematics, helping the model improve step-by-step problem-solving. It's also worth noting that some models trained on this dataset may experience a minor increase in coding performance.
- **[GammaCorpus-Fact-QA-450k](https://huggingface.co/datasets/rubenroy/GammaCorpus-Fact-QA-450k)**: A dataset of factual question-answer pairs for reinforcing important current knowledge.

These datasets were all built and curated by me; however, I thank my fellow team members at [Ovantage Labs](https://huggingface.co/Ovantage) for assisting me in their creation and curation.

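If you want to inspect any of these datasets yourself, they can be streamed from the Hub with the `datasets` library (a minimal sketch; the `train` split name is an assumption):

```python
from datasets import load_dataset

# Stream the CoT math dataset without downloading it in full.
ds = load_dataset("rubenroy/GammaCorpus-CoT-Math-170k", split="train", streaming=True)

# Peek at the first example.
print(next(iter(ds)))
```
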
## Usage

You can try out Gilgamesh 72B with the following example using the Transformers library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "rubenroy/Gilgamesh-72B"

# Load the model in its native precision, sharded across available devices.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "What are some largely unsolved questions in philosophy that still affect our lives today?"

messages = [
    {"role": "user", "content": prompt}
]

# Format the conversation with the model's chat template.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=2048
)

# Strip the prompt tokens so only the newly generated text is decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
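
Since the example allows up to 2048 new tokens, long answers can take a while to complete; if you would rather watch tokens appear as they are generated, Transformers' built-in `TextStreamer` can be passed to `generate` (a minimal sketch reusing `model`, `tokenizer`, and `model_inputs` from above):

```python
from transformers import TextStreamer

# Prints decoded tokens to stdout as they are generated.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

model.generate(
    **model_inputs,
    max_new_tokens=2048,
    streamer=streamer
)
```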

## Special Thanks
I would like to thank my fellow team members at [Ovantage Labs](https://huggingface.co/Ovantage) for providing the H100s used to train this model, and the Qwen team for providing such a powerful base model.