In [1]:
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-2-13b-hf')

Loading the tokenizer from the `special_tokens_map.json` and the `added_tokens.json` will be removed in `transformers 5`, it is kept for forward compatibility, but it is recommended to update your `tokenizer_config.json` by uploading it again. You will see the new `added_tokens_decoder` attribute that will store the relevant information.


In [2]:
nf4_config = BitsAndBytesConfig(
 load_in_4bit=True,
 bnb_4bit_quant_type='nf4',
 bnb_4bit_use_double_quant=True,
 bnb_4bit_compute_dtype=torch.bfloat16
)
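
As a rough guide to what this configuration buys, the arithmetic below estimates the size of the quantized weights alone. It is a back-of-the-envelope sketch, not a measurement: the helper `nf4_weight_gib` is hypothetical, the 7e9 parameter count is a round-number assumption, the block sizes (64 weights per block, 256 blocks per second-level group) are the defaults described for QLoRA-style double quantization, and layers that stay unquantized, activations, and the KV cache are all ignored.

```python
def nf4_weight_gib(n_params, double_quant=True, block_size=64, nested_block=256):
    """Rough size in GiB of NF4-quantized weights (weights only)."""
    if double_quant:
        # 8-bit constant per block, plus one fp32 constant per `nested_block` blocks
        overhead_bits = 8 / block_size + 32 / (block_size * nested_block)
    else:
        # one fp32 quantization constant per block
        overhead_bits = 32 / block_size
    total_bits = n_params * (4 + overhead_bits)
    return total_bits / 8 / 2**30


n_params = 7e9  # assumed round figure for a "7B" model
fp16_gib = n_params * 2 / 2**30
print(f'fp16 weights:        {fp16_gib:.2f} GiB')
print(f'nf4, no double quant: {nf4_weight_gib(n_params, double_quant=False):.2f} GiB')
print(f'nf4 + double quant:   {nf4_weight_gib(n_params):.2f} GiB')
```

Under these assumptions, double quantization trims the per-weight overhead from 0.5 bits to roughly 0.127 bits, which is why `bnb_4bit_use_double_quant=True` is usually worth enabling when memory is tight.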

In [3]:
base_model = AutoModelForCausalLM.from_pretrained('meta-llama/Llama-2-7b-chat-hf', quantization_config=nf4_config)

In [4]:
fpf_model = AutoModelForCausalLM.from_pretrained('mesolitica/malaysian-llama2-7b-32k-instructions', quantization_config=nf4_config)

In [5]:
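# Assumed source of HTTPException: the error below matches FastAPI's exception type.
from fastapi import HTTPException


def parse_llama_chat(
    messages,
    keys={'role', 'content'},
    roles={'system', 'user', 'assistant'},
):
    """Render chat messages in the Llama-2 chat prompt format."""
    # The first message is the system prompt; the last is the pending user query.
    system = messages[0]['content']
    user_query = messages[-1]['content']

    # Everything in between is prior conversation history.
    users, assistants = [], []
    for q in messages[1:-1]:
        if q['role'] == 'user':
            users.append(q['content'])
        elif q['role'] == 'assistant':
            assistants.append(q['content'])

    # History must consist of complete user/assistant pairs.
    if len(users) != len(assistants):
        raise HTTPException(
            status_code=400,
            detail='model only supports `system`, `user` and `assistant` roles, starting with `system`, then `user` and alternating (u/a/u/a/u...)',
        )

    # The system prompt is wrapped in <<SYS>> ... <</SYS>> inside the first [INST] block.
    texts = [f'[INST] <<SYS>>\n{system}\n<</SYS>>\n\n']
    for u, a in zip(users, assistants):
        texts.append(f'{u.strip()} [/INST] {a.strip()} [INST] ')
    texts.append(f'{user_query.strip()} [/INST]')
    prompt = ''.join(texts).strip()
    return prompt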
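
The error message in `parse_llama_chat` describes a strict ordering (`system` first, then `user` and `assistant` alternating), but the function itself only compares the number of user and assistant turns. A minimal sketch of a stricter check follows; `check_chat_roles` is a hypothetical helper, not part of the original notebook, and it raises a plain `ValueError` to stay self-contained.

```python
def check_chat_roles(messages):
    """Raise ValueError unless roles are system, then user/assistant alternating, ending on user."""
    if not messages or messages[0]['role'] != 'system':
        raise ValueError('first message must have role `system`')

    turns = [m['role'] for m in messages[1:]]
    if not turns or turns[-1] != 'user':
        raise ValueError('last message must have role `user`')

    # After the system prompt, even positions are user turns and odd positions assistant turns.
    for i, role in enumerate(turns):
        expected = 'user' if i % 2 == 0 else 'assistant'
        if role != expected:
            raise ValueError(f'message {i + 1}: expected `{expected}`, got `{role}`')


# Passes silently for a well-formed multi-turn conversation.
check_chat_roles([
    {'role': 'system', 'content': 'You are a helpful assistant.'},
    {'role': 'user', 'content': 'hello'},
    {'role': 'assistant', 'content': 'hi, how can I help?'},
    {'role': 'user', 'content': 'kwsp tu apa'},
])
```

Calling a check like this before building the prompt would catch inputs such as two consecutive `user` messages, which the count-based test can miss when the totals happen to match.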

In [14]:
import time
from tqdm import tqdm

kwargs = {
 'temperature': 0.9, 
 'max_new_tokens': 256, 
 'top_p': 0.95, 
 'repetition_penalty': 1.0, 
 'do_sample': True,
 'num_beams': 1,
}
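
For intuition about `temperature` and `top_p` in the settings above, here is a simplified, pure-Python sketch of temperature scaling followed by nucleus (top-p) filtering over a toy next-token distribution. The helper `top_p_filter` and the example logits are hypothetical, and the logits processors in `transformers` differ in detail, so treat this as an illustration rather than the library's implementation.

```python
import math


def top_p_filter(logits, temperature=0.9, top_p=0.95):
    """Return {token_index: probability} for the nucleus kept after temperature scaling."""
    # Temperature below 1 sharpens the distribution; above 1 flattens it.
    scaled = [x / temperature for x in logits]

    # Numerically stable softmax.
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of most-likely tokens whose cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Renormalize so the kept probabilities sum to 1 before sampling.
    return {i: probs[i] / mass for i in kept}


print(top_p_filter([2.0, 1.0, 0.0, -3.0]))
```

With `do_sample=True` and `num_beams=1`, generation draws each token from a filtered distribution of this kind, which is why repeated runs of the same prompt can give different completions.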

In [10]:
messages = [
 {'role': 'system', 'content': 'Anda adalah pembantu AI yang berguna dan mampu jawab segala soalan yang diberikan. Jawapan yang diberikan haruslah panjang dan tepat.'},
 {'role': 'user', 'content': 'kwsp tu apa'}
]
prompt = parse_llama_chat(messages)
prompt

'[INST] <<SYS>>\nAnda adalah pembantu AI yang berguna dan mampu jawab segala soalan yang diberikan. Jawapan yang diberikan haruslah panjang dan tepat.\n<</SYS>>\n\nkwsp tu apa [/INST]'

In [34]:
inputs = tokenizer([prompt], return_tensors='pt').to('cuda')
generate_kwargs = dict(inputs)
generate_kwargs = {**generate_kwargs, **kwargs}

## Chat 7B

In [17]:
o = base_model.generate(**generate_kwargs)

In [26]:
print(tokenizer.decode(o[0], skip_special_tokens = True).split('[/INST]')[-1].strip())

As an AI language model, I can assist with a wide range of questions and tasks. Here are some examples of what I can help with:
1. Answering questions: I can provide detailed and accurate answers to various questions on a variety of topics, including science, history, technology, and more.
2. Generating text: I can produce text on a given topic or subject, including articles, stories, and other types of writing.
3. Translation: I can translate text from one language to another, including popular languages such as Spanish, French, German, Chinese, and many more.
4. Summarization: I can summarize long pieces of text, such as articles or documents, into shorter, more digestible versions.
5. Creative writing: I can generate creative content, such as poetry or short stories, based on prompts or topics provided.
6. Conversation: I can engage in natural-sounding conversations, using context and understanding to respond to questions and statements.
7. Language understanding: I can understand a
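
The decode step used in this notebook keeps only the text after the final `[/INST]` marker, since the decoded sequence contains the prompt followed by the generated tokens. Because the same expression is repeated for every generation, it could be wrapped in a small helper. The name `extract_reply` is hypothetical; the behaviour matches `.split('[/INST]')[-1].strip()`.

```python
def extract_reply(decoded, marker='[/INST]'):
    """Return the text after the last `marker`, or the whole text if the marker is absent."""
    # rpartition splits on the last occurrence; the tail is everything after it.
    _, _, tail = decoded.rpartition(marker)
    return tail.strip()


decoded = '[INST] hello [/INST] hi there [INST] kwsp tu apa [/INST] KWSP ialah ... '
print(extract_reply(decoded))
```

One caveat: if the model itself emits the literal string `[/INST]` in its answer, any marker-based split truncates the reply. Slicing the generated token ids at the prompt length before decoding would avoid that.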

## Malaysian Llama2 7B 32k

In [27]:
o = fpf_model.generate(**generate_kwargs)

In [28]:
print(tokenizer.decode(o[0], skip_special_tokens = True).split('[/INST]')[-1].strip())

Kumpulan Wang Simpanan Pekerja (KWSP) ialah organisasi yang ditubuhkan di Malaysia untuk membantu rakyatnya menyimpan wang untuk persaraan. KWSP menawarkan beberapa pelan untuk memenuhi keperluan penyimpan berbeza, dengan pelaburan dalam dana, ekuiti, instrumen bon dan lain-lain. Keuntungan dan dividen daripada pelaburan tersebut digunakan untuk melindungi kepentingan simpanan persaraan penyimpan. KWSP juga menyediakan akses mudah kepada penyimpan untuk menyemak penyata dan akaun, serta untuk memindahkan dana ke pelaburan baharu atau membuat pengeluaran simpanan persaraan. Ini membolehkan penyimpan memperoleh pulangan yang dijangka di atas inflasi dan


In [40]:
messages = [
 {'role': 'system', 'content': 'Anda adalah pembantu AI yang berguna dan mampu jawab segala soalan yang diberikan. Jawapan yang diberikan haruslah panjang dan tepat.'},
 {'role': 'user', 'content': 'awat malaysia ada lembaga koko, malaysia bukan buat keluaq koko banyak pun'}
]
prompt = parse_llama_chat(messages)
prompt

'[INST] <<SYS>>\nAnda adalah pembantu AI yang berguna dan mampu jawab segala soalan yang diberikan. Jawapan yang diberikan haruslah panjang dan tepat.\n<</SYS>>\n\nawat malaysia ada lembaga koko, malaysia bukan buat keluaq koko banyak pun [/INST]'

In [41]:
inputs = tokenizer([prompt], return_tensors='pt').to('cuda')
generate_kwargs = dict(inputs)
generate_kwargs = {**generate_kwargs, **kwargs}

## Chat 7B

In [42]:
o = base_model.generate(**generate_kwargs)

In [43]:
print(tokenizer.decode(o[0], skip_special_tokens = True).split('[/INST]')[-1].strip())

I apologize, but the statement "Malaysia ada lembaga koko" is not accurate. Malaysia does not have a chocolate industry. While Malaysia is known for its rich cultural heritage and diverse food scene, chocolate is not a significant part of the country's culinary traditions.
However, there are some chocolate brands in Malaysia that produce high-quality chocolate products, such as dark chocolate, milk chocolate, and white chocolate. These brands often source their cocoa beans from sustainable farms in West Africa, South America, and other parts of Southeast Asia.
Some popular chocolate brands in Malaysia include:
1. Cacao & Co.: This brand offers a range of premium chocolate products, including dark chocolate bars, truffles, and gift boxes.
2. The Chocolate Room: This boutique chocolate shop in Kuala Lumpur offers a wide range of artisanal chocolate products, including handcrafted truffles, chocolate bars, and gift boxes.
3. Chocolate Dreams: This


## Malaysian Llama2 7B 32k

In [44]:
o = fpf_model.generate(**generate_kwargs)

In [45]:
print(tokenizer.decode(o[0], skip_special_tokens = True).split('[/INST]')[-1].strip())

Lembaga Koko Malaysia ialah agensi kerajaan yang ditubuhkan untuk mengawal selia industri koko Malaysia. Lembaga Koko ditubuhkan pada tahun 1988 sebagai sebahagian daripada usaha untuk meningkatkan lagi industri koko Malaysia. Tujuan utama lembaga adalah untuk mempromosikan dan membangunkan industri koko Malaysia, termasuk menggalakkan penanaman dan pengeluaran koko, meningkatkan perdagangan koko dan produk koko dan menyediakan sokongan teknikal dan latihan untuk industri koko.

Walaupun Malaysia tidak menanam banyak koko, ia masih merupakan salah satu negara pengeluar koko utama di dunia, dan pengeksport utama koko di rantau Asia. Koko menyumbang kira-kira 2.8% daripada jum
