lucifertrj Chaitanya890 committed on
Commit
ca09642
·
verified ·
1 Parent(s): 866e00c

Update README.md (#3)


- Update README.md (d0d452eb0b473aee0368e5e04b9c9ff250101c57)


Co-authored-by: Chaitanya Singhal <[email protected]>

Files changed (1)
  1. README.md +160 -0
README.md CHANGED
@@ -1,3 +1,163 @@
1
  ---
2
  license: apache-2.0
3
  ---
4
+
5
+ <p align="center" style="font-size:34px;"><b>Buddhi 7B</b></p>
6
+
7
+ # Buddhi-7B vLLM Inference: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/11_8W8FpKK-856QdRVJLyzbu9g-DMxNfg?usp=sharing)
8
+
9
+ # Model Description
10
+
11
+ <!-- Provide a quick summary of what the model is/does. -->
12
+
13
+ Buddhi is a general-purpose chat model fine-tuned from Mistral 7B Instruct and optimised to handle a context length of up to 128,000 tokens using the YaRN [(Yet another RoPE extensioN)](https://arxiv.org/abs/2309.00071) technique. The extended context lets Buddhi keep track of long documents and conversations, which suits tasks such as document summarization, long-form narrative generation, and question answering over long inputs.
14
+
15
+ ## Architecture
16
+
17
+ ### Hardware requirements:
18
+ > For 128k Context Length
19
+ > - 80GB VRAM - A100 Preferred
20
+
21
+ > For 32k Context Length
22
+ > - 40GB VRAM - A100 Preferred
23
+
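A rough way to see why the longer context needs more memory is to estimate the key/value cache for a single sequence. The sketch below is a back-of-the-envelope estimate, not a measurement: it assumes Buddhi keeps the published Mistral 7B layout (32 layers, 8 key/value heads, head dimension 128) and a 16-bit cache. The function name `kv_cache_bytes` is hypothetical.

```python
def kv_cache_bytes(context_len, n_layers=32, n_kv_heads=8, head_dim=128, bytes_per_value=2):
    """Bytes of key/value cache for one sequence of `context_len` tokens.

    Defaults are assumptions taken from the Mistral 7B configuration,
    with a 16-bit (2-byte) cache; they are not stated in this README.
    """
    # Factor 2: one key vector and one value vector per layer, KV head, and token.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * context_len


GIB = 1024 ** 3
for context_len in (32_768, 131_072):
    print(f"{context_len:>7} tokens: {kv_cache_bytes(context_len) / GIB:.1f} GiB")
```

Under these assumptions the cache alone grows from about 4 GiB at 32k tokens to about 16 GiB at 128k tokens. Model weights, activations, and any memory the inference engine reserves come on top of that, so the figures above are a lower bound rather than a full budget.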
24
+ ### vLLM - For Faster Inference
25
+
26
+ #### Installation
27
+
28
+ ```
29
+ !pip install vllm
30
+ !pip install flash_attn # If Flash Attention 2 is supported by your System
31
+ ```
32
+ See the [Flash Attention 2](https://github.com/Dao-AILab/flash-attention) GitHub repository for installation instructions.
33
+
34
+ **Implementation**:
35
+
36
+ ```python
+ from vllm import LLM, SamplingParams
+
+ # Load the model with the full 128K-token context window.
+ llm = LLM(
+     model='aiplanet/Buddhi-128K-Chat',
+     gpu_memory_utilization=0.99,
+     max_model_len=131072
+ )
+
+ # Each prompt wraps the instruction in [INST] ... [/INST].
+ prompts = [
+     """<s> [INST] Please tell me a joke. [/INST] """,
+     """<s> [INST] What is Machine Learning? [/INST] """
+ ]
+
+ sampling_params = SamplingParams(
+     temperature=0.8,
+     top_p=0.95,
+     max_tokens=1000
+ )
+
+ outputs = llm.generate(prompts, sampling_params)
+
+ # Print the first completion for each prompt.
+ for output in outputs:
+     prompt = output.prompt
+     generated_text = output.outputs[0].text
+     print(generated_text)
+     print("\n\n")
+ ```
64
+
65
+ ### Transformers - Basic Implementation
66
+
67
+ ```python
+ import torch
+ import transformers
+ from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+
+ # 4-bit NF4 quantization with bfloat16 compute to reduce GPU memory use.
+ bnb_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_use_double_quant=True,
+     bnb_4bit_quant_type="nf4",
+     bnb_4bit_compute_dtype=torch.bfloat16
+ )
+
+ model_name = "aiplanet/Buddhi-128K-Chat"
+
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     quantization_config=bnb_config,
+     device_map="sequential",
+     trust_remote_code=True
+ )
+
+ # The tokenizer is loaded from the same repository as the model.
+ tokenizer = AutoTokenizer.from_pretrained(
+     model_name,
+     trust_remote_code=True
+ )
+
+ prompt = "<s> [INST] Please tell me a small joke. [/INST] "
+
+ tokens = tokenizer(prompt, return_tensors="pt").to("cuda")
+ outputs = model.generate(
+     **tokens,
+     max_new_tokens=100,
+     do_sample=True,
+     top_p=0.95,
+     temperature=0.8,
+ )
+
+ # Decode the full sequence, then drop the prompt prefix before printing.
+ decoded_output = tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0]
+ print(f"Output:\n{decoded_output[len(prompt):]}")
+ ```
107
+
108
+ Output
109
+
110
+ ```
111
+ Output:
112
+ Why don't scientists trust atoms?
113
+
114
+ Because they make up everything.
115
+ ```
116
+
117
+ ## Prompt Template for Buddhi-128K-Chat
118
+
119
+ To leverage instruction fine-tuning, surround your prompt with the `[INST]` and `[/INST]` tokens. Only the very first instruction should begin with the beginning-of-sentence token (`<s>`); subsequent instructions should not. Each assistant response ends with the end-of-sentence token (`</s>`).
120
+
121
+ ```
122
+ "<s>[INST] What is your favourite condiment? [/INST]"
123
+ "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!</s> "
124
+ "[INST] Do you have mayonnaise recipes? [/INST]"
125
+
126
+ ```
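
The template above can be assembled programmatically. The following is a minimal sketch, assuming the conversation is held as a list of `(user_message, assistant_reply)` pairs; the helper name `build_prompt` and the shortened reply text are illustrative and not part of the model's tooling.

```python
def build_prompt(turns):
    """Build a multi-turn prompt string in the [INST] ... [/INST] format.

    `turns` is a list of (user_message, assistant_reply) pairs. The reply of
    the final pair may be None when the model is expected to generate it.
    """
    # The beginning-of-sentence token appears once, before the first instruction.
    parts = ["<s>"]
    for user_message, assistant_reply in turns:
        parts.append(f"[INST] {user_message} [/INST]")
        if assistant_reply is not None:
            # Every completed assistant reply is closed with the end-of-sentence token.
            parts.append(f"{assistant_reply}</s> ")
    return "".join(parts)


prompt = build_prompt([
    ("What is your favourite condiment?", "Lemon juice."),
    ("Do you have mayonnaise recipes?", None),
])
print(prompt)
```

Some tokenizers prepend the beginning-of-sentence token on their own; if yours does, it may be preferable to drop the literal `<s>` from the string to avoid a duplicate.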
127
+ ## 🔗 Key Features:
128
+
129
+ 🎯 Precision and Efficiency: The model is tailored for accuracy, ensuring your code is not just functional but also efficient.
130
+
131
+ ✨ Unleash Creativity: Whether you're a novice or an expert coder, Panda-Coder is here to support your coding journey, offering creative solutions to your programming challenges.
132
+
133
+ 📚 Evol Instruct Code: It's built on the robust Evol Instruct Code 80k-v1 dataset, guaranteeing top-notch code generation.
134
+
135
+ 📢 What's Next?: We believe in continuous improvement and are excited to announce that in our next release, Panda-Coder will be enhanced with a custom dataset. This dataset will not only expand the language support but also include hardware programming languages like MATLAB, Embedded C, and Verilog. 🧰💡
136
+
137
+
138
+ ## Get in Touch
139
+
140
+ You can schedule a 1:1 meeting with our DevRel & Community Team to get started with AI Planet Open Source LLMs and GenAI Stack. Schedule the call here: [https://calendly.com/jaintarun](https://calendly.com/jaintarun)
141
+
142
+ Stay tuned for more updates and be a part of the coding evolution. Join us on this exciting journey as we make AI accessible to all at AI Planet!
143
+
144
+
145
+ ### Framework versions
146
+
147
+ - Transformers 4.39.2
148
+ - PyTorch 2.2.1+cu121
149
+ - Datasets 2.18.0
150
+ - Accelerate 0.27.2
151
+ - flash_attn 2.5.6
152
+
153
+ ### Citation
154
+
155
+ ```
156
+ @misc {Chaitanya890,
157
+ author = { {Chaitanya Singhal} },
158
+ title = { Buddhi-128k-Chat by AI Planet},
159
+ year = 2024,
160
+ url = { https://huggingface.co/aiplanet/Buddhi-128K-Chat },
161
+ publisher = { Hugging Face }
162
+ }
163
+ ```