rubenroy committed
Commit ebdf7a5 · verified · 1 Parent(s): dc364ff

Update README.md

Files changed (1):
  1. README.md +72 -5
README.md CHANGED
@@ -1,5 +1,7 @@
  ---
  license: other
  datasets:
  - rubenroy/GammaCorpus-v2-5m
  - rubenroy/GammaCorpus-CoT-Math-170k
@@ -19,15 +21,80 @@ tags:

  # Gilgamesh 72B

- The Gilgamesh 72B model was fine-tuned off of Qwen 2.5 72B Instruct. Built with Qwen.

  ![GIlgamesh AI Art](https://cdn.ruben-roy.com/AI/Gilgamesh/img/art.png)

  ## Model Details
-
- ### Model Description
-
  - **Developed by:** [Ruben Roy](https://huggingface.co/rubenroy)
  - **Funded by:** [The Ovantage Society](https://huggingface.co/Ovantage)
  - **License:** Qwen
- - **Finetuned from model:** [Qwen/Qwen2.5-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct)

---
license: other
license_name: qwen
license_link: https://huggingface.co/Qwen/Qwen2.5-72B-Instruct/blob/main/LICENSE
datasets:
- rubenroy/GammaCorpus-v2-5m
- rubenroy/GammaCorpus-CoT-Math-170k

# Gilgamesh 72B

> [!NOTE]
> Built on Qwen 2.5 72B Instruct

## Overview

Gilgamesh (GGM) 72B is a heavy fine-tune of Alibaba's **Qwen 2.5 72B Instruct** model.

![Gilgamesh AI Art](https://cdn.ruben-roy.com/AI/Gilgamesh/img/art.png)

## Model Details

- **Developed by:** [Ruben Roy](https://huggingface.co/rubenroy)
- **Funded by:** [The Ovantage Society](https://huggingface.co/Ovantage)
- **License:** Qwen
- **Base Model:** [Qwen/Qwen2.5-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct)
- **Type:** Causal Language Model
- **Architecture:** transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
- **Number of Parameters:** 72.7B
- **Number of Parameters (Non-Embedding):** 70.0B
- **Number of Layers:** 80
- **Number of Attention Heads (GQA):** 64 for Q and 8 for KV

> [!IMPORTANT]
> Qwen is licensed under the Qwen LICENSE AGREEMENT, Copyright (c) Alibaba Cloud. All Rights Reserved.
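
As a quick sanity check, the layer and attention-head figures listed above can be read straight from the model configuration with the Transformers library. This is a minimal sketch; it assumes the repository ships a standard Qwen2-style `config.json`, so the exact field values should be confirmed against the published files.

```python
from transformers import AutoConfig

# Fetch only the configuration (no weights are downloaded).
config = AutoConfig.from_pretrained("rubenroy/Gilgamesh-72B")

# These fields correspond to the spec list above for a Qwen2-style config.
print(config.num_hidden_layers)     # expected: 80
print(config.num_attention_heads)   # expected: 64 query heads
print(config.num_key_value_heads)   # expected: 8 KV heads (GQA)
```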

## Datasets used

Gilgamesh 72B was trained on a mixture of specialised datasets designed for factual accuracy, mathematical capability, and reasoning. The datasets used include:

- **[GammaCorpus-v2-5m](https://huggingface.co/datasets/rubenroy/GammaCorpus-v2-5m)**: A large, 5-million-line general-purpose dataset covering many topics to enhance broad knowledge and conversational ability.
- **[GammaCorpus-CoT-Math-170k](https://huggingface.co/datasets/rubenroy/GammaCorpus-CoT-Math-170k)**: A dataset focused on Chain-of-Thought (CoT) reasoning in mathematics, helping the model improve step-by-step problem-solving. It's also worth noting that models trained on this dataset may see a minor increase in coding performance.
- **[GammaCorpus-Fact-QA-450k](https://huggingface.co/datasets/rubenroy/GammaCorpus-Fact-QA-450k)**: A dataset of factual question-answer pairs that reinforces important current knowledge.

These datasets were all built and curated by me; I thank my fellow team members at [Ovantage Labs](https://huggingface.co/Ovantage) for assisting in their creation and curation.
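
If you want to inspect the corpora themselves, they can be pulled from the Hub with the `datasets` library. This is a minimal sketch, assuming each repository exposes a default configuration with a standard `train` split; column names differ between the datasets, so rows are printed as raw dicts.

```python
from datasets import load_dataset

# Stream one example from each corpus to inspect its structure
# without downloading the full dataset.
for repo in [
    "rubenroy/GammaCorpus-v2-5m",
    "rubenroy/GammaCorpus-CoT-Math-170k",
    "rubenroy/GammaCorpus-Fact-QA-450k",
]:
    ds = load_dataset(repo, split="train", streaming=True)
    print(repo, next(iter(ds)))
```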

## Usage

You can try out Gilgamesh 72B with the following example using the Transformers library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "rubenroy/Gilgamesh-72B"

# Load the model and tokenizer; device_map="auto" spreads the weights across available GPUs
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "What are some largely unsolved questions in philosophy that still affect our lives today?"

messages = [
    {"role": "user", "content": prompt}
]

# Build the chat-formatted prompt and tokenize it
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate a response
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=2048
)

# Strip the prompt tokens from the output before decoding
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
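
If you prefer to see tokens as they are produced rather than waiting for the full completion, Transformers' `TextStreamer` can be passed to `generate`. This optional sketch reuses the `model`, `tokenizer`, and `model_inputs` objects from the example above.

```python
from transformers import TextStreamer

# Print decoded tokens to stdout as they are generated, skipping the prompt
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

model.generate(
    **model_inputs,
    max_new_tokens=2048,
    streamer=streamer
)
```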

## Special Thanks

I would like to thank my fellow team members at [Ovantage Labs](https://huggingface.co/Ovantage) for providing the H100s used to train this model, and the Qwen Team for providing such a powerful base model.