Update README.md (#3)
- Update README.md (d0d452eb0b473aee0368e5e04b9c9ff250101c57)
Co-authored-by: Chaitanya Singhal <[email protected]>
README.md
CHANGED
@@ -1,3 +1,163 @@
---
license: apache-2.0
---

<p align="center" style="font-size:34px;"><b>Buddhi 7B</b></p>

# Buddhi-7B vLLM Inference: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/11_8W8FpKK-856QdRVJLyzbu9g-DMxNfg?usp=sharing)

# Model Description

Buddhi is a general-purpose chat model fine-tuned from Mistral 7B Instruct and optimised to handle an extended context length of up to 128,000 tokens using the YaRN [(Yet another RoPE extensioN)](https://arxiv.org/abs/2309.00071) technique. The longer context window lets Buddhi retain information across long documents or conversations, which suits tasks such as long-document summarization, extended narrative generation, and question answering over large inputs.

## Architecture

### Hardware requirements:

> For 128k Context Length
> - 80GB VRAM - A100 Preferred

> For 32k Context Length
> - 40GB VRAM - A100 Preferred
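
To give a rough sense of why the longer context needs more VRAM, the sketch below estimates the key/value cache size at both context lengths. It assumes Buddhi keeps the Mistral 7B attention layout (32 layers, 8 key/value heads, head dimension 128) and 16-bit cache values; these figures are assumptions, not values stated in this README, so check the model's `config.json` before relying on them.

```python
# Assumed Mistral-7B-style attention layout (not stated in this README).
NUM_LAYERS = 32
NUM_KV_HEADS = 8      # grouped-query attention
HEAD_DIM = 128
BYTES_PER_VALUE = 2   # fp16 / bf16


def kv_cache_bytes(num_tokens: int) -> int:
    """Bytes needed to cache keys and values for num_tokens tokens."""
    # Factor 2 accounts for storing both a key and a value per head per layer.
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
    return num_tokens * per_token


for context_length in (32_768, 131_072):
    gib = kv_cache_bytes(context_length) / 2**30
    print(f"{context_length} tokens: {gib:.1f} GiB of KV cache")
```

This is only the cache for a single full-length sequence. The model weights and activation memory come on top of it, so the estimate is a lower bound rather than a reproduction of the figures above.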

### vLLM - For Faster Inference

#### Installation

```
!pip install vllm
!pip install flash_attn # If Flash Attention 2 is supported by your system
```
Please check out the [Flash Attention 2](https://github.com/Dao-AILab/flash-attention) GitHub repository for more instructions on how to install it.
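
Since Flash Attention 2 is optional, it can help to check whether the package is present before depending on it. The following is a minimal sketch using only the standard library; the helper name `flash_attn_available` is hypothetical, and the check only confirms that the package can be located, not that it works on your GPU.

```python
import importlib.util


def flash_attn_available() -> bool:
    """Return True if a package named flash_attn can be located."""
    # find_spec returns None when no top-level module of that name is found.
    return importlib.util.find_spec("flash_attn") is not None


print("flash_attn installed:", flash_attn_available())
```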

**Implementation**:

```python
from vllm import LLM, SamplingParams

# Load the model with the full 128K-token context window.
llm = LLM(
    model='aiplanet/Buddhi-128K-Chat',
    gpu_memory_utilization=0.99,
    max_model_len=131072
)

# Prompts follow the [INST] ... [/INST] instruction format.
prompts = [
    """<s> [INST] Please tell me a joke. [/INST] """,
    """<s> [INST] What is Machine Learning? [/INST] """
]

sampling_params = SamplingParams(
    temperature=0.8,
    top_p=0.95,
    max_tokens=1000
)

outputs = llm.generate(prompts, sampling_params)

# Print the first completion for each prompt.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(generated_text)
    print("\n\n")
```

### Transformers - Basic Implementation

```python
import torch
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization to reduce memory use.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

model_name = "aiplanet/Buddhi-128K-Chat"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="sequential",
    trust_remote_code=True
)

# Load the tokenizer from the model name, not from the model object.
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True
)

prompt = "<s> [INST] Please tell me a small joke. [/INST] "

tokens = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(
    **tokens,
    max_new_tokens=100,
    do_sample=True,
    top_p=0.95,
    temperature=0.8,
)

# Decode only the newly generated tokens, skipping the prompt tokens.
prompt_length = tokens["input_ids"].shape[1]
decoded_output = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(f"Output:\n{decoded_output}")
```

Output

```
Output:
Why don't scientists trust atoms?

Because they make up everything.
```

## Prompt Template for Buddhi-128K-Chat

To take advantage of instruction fine-tuning, surround your prompt with the `[INST]` and `[/INST]` tokens. The very first instruction should begin with a beginning-of-sentence id; subsequent instructions should not. The assistant's generation is terminated by the end-of-sentence token id.

```
"<s>[INST] What is your favourite condiment? [/INST]"
"Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!</s> "
"[INST] Do you have mayonnaise recipes? [/INST]"
```
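
The template above can be assembled programmatically for multi-turn conversations. The sketch below is one way to do it; `build_prompt` is a hypothetical helper rather than part of any library, and it writes the `<s>` and `</s>` markers as literal text exactly as in the example. Some tokenizers add the beginning-of-sentence token on their own, in which case the leading `<s>` may need to be dropped.

```python
from typing import List, Optional, Tuple


def build_prompt(turns: List[Tuple[str, Optional[str]]]) -> str:
    """Format (user, assistant) turns; the final assistant reply may be None."""
    parts = ["<s>"]  # only the very first instruction gets the BOS marker
    for user, assistant in turns:
        parts.append(f"[INST] {user} [/INST]")
        if assistant is not None:
            # Completed assistant turns end with the EOS marker.
            parts.append(f"{assistant}</s> ")
    return "".join(parts)


conversation = [
    ("What is your favourite condiment?", "A squeeze of fresh lemon juice."),
    ("Do you have mayonnaise recipes?", None),  # awaiting the model's reply
]
print(build_prompt(conversation))
```

Leaving the last assistant reply as `None` produces a prompt that ends at `[/INST]`, which is where the model is expected to continue.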

## 🔗 Key Features:

🎯 Precision and Efficiency: The model is tailored for accuracy, ensuring your code is not just functional but also efficient.

✨ Unleash Creativity: Whether you're a novice or an expert coder, Panda-Coder is here to support your coding journey, offering creative solutions to your programming challenges.

📚 Evol Instruct Code: It's built on the robust Evol Instruct Code 80k-v1 dataset, guaranteeing top-notch code generation.

📢 What's Next?: We believe in continuous improvement and are excited to announce that in our next release, Panda-Coder will be enhanced with a custom dataset. This dataset will not only expand the language support but also include hardware programming languages like MATLAB, Embedded C, and Verilog. 🧰💡

## Get in Touch

You can schedule a 1:1 meeting with our DevRel & Community Team to get started with AI Planet Open Source LLMs and GenAI Stack. Schedule the call here: [https://calendly.com/jaintarun](https://calendly.com/jaintarun)

Stay tuned for more updates and be a part of the coding evolution. Join us on this exciting journey as we make AI accessible to all at AI Planet!

### Framework versions

- Transformers 4.39.2
- PyTorch 2.2.1+cu121
- Datasets 2.18.0
- Accelerate 0.27.2
- flash_attn 2.5.6
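
To compare a local environment against the versions listed above, a short standard-library check is often enough. This is a sketch under assumptions: the distribution names in `EXPECTED` (for example `torch` for PyTorch and `flash-attn` for flash_attn) are the usual PyPI names rather than names given in this README, and `installed_version` is a hypothetical helper.

```python
from importlib import metadata

# Versions copied from the list above; distribution names are assumed.
EXPECTED = {
    "transformers": "4.39.2",
    "torch": "2.2.1+cu121",
    "datasets": "2.18.0",
    "accelerate": "0.27.2",
    "flash-attn": "2.5.6",
}


def installed_version(name: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


for name, expected in EXPECTED.items():
    found = installed_version(name)
    status = "ok" if found == expected else "differs"
    print(f"{name}: expected {expected}, found {found} ({status})")
```

A differing version does not necessarily mean the model will fail to load; the list records the environment that was used, not a strict requirement.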

### Citation

```
@misc {Chaitanya890,
    author       = { {Chaitanya Singhal} },
    title        = { Buddhi-128k-Chat by AI Planet},
    year         = 2024,
    url          = { https://huggingface.co/aiplanet/Buddhi-128K-Chat },
    publisher    = { Hugging Face }
}
```