metadata

datasets:
  - yentinglin/zh_TW_c4
  - yentinglin/traditional_chinese_instructions
inference: false
license: llama2
language:
  - zh
model_creator: Yen-Ting Lin
model_link: https://huggingface.co/yentinglin/Taiwan-LLaMa-v1.0
model_name: Language Models for Taiwanese Culture 1.0
model_type: llama
quantized_by: weiren119

Taiwan-LLaMa-v1.0 - GGML

Model creator: Yen-Ting Lin
Original model: Language Models for Taiwanese Culture v1.0

Description

This repo contains GGML format model files for Yen-Ting Lin's Language Models for Taiwanese Culture v1.0.

They are known to work with:

llama.cpp, commit e76d630 and later.

...and probably work with these too, but I have not tested them personally:

text-generation-webui.
KoboldCpp, version 1.37 and later.
llama-cpp-python, version 0.1.77 and later.

Original model card: Yen-Ting Lin's Language Models for Taiwanese Culture v1.0

Language Models for Taiwanese Culture

✍️ Online Demo • 🤗 HF Repo • 🐦 Twitter • 📃 [Paper Coming Soon] • 👨️ Yen-Ting Lin

Overview

Taiwan-LLaMa is a full-parameter fine-tuned model based on LLaMa 2 for Traditional Chinese applications.

Taiwan-LLaMa v1.0 was pretrained on over 5 billion tokens and instruction-tuned on over 490k conversations, both in Traditional Chinese.

Demo

A live demonstration of the model can be accessed at Hugging Face Spaces.

Key Features

Traditional Chinese Support: The model is fine-tuned to understand and generate text in Traditional Chinese, making it suitable for Taiwanese culture and related applications.
Instruction-Tuned: Further fine-tuned on conversational data to offer context-aware and instruction-following responses.
Performance on Vicuna Benchmark: Taiwan-LLaMa's relative performance on Vicuna Benchmark is measured against models like GPT-4 and ChatGPT. It's particularly optimized for Taiwanese culture.
Flexible Customization: Advanced options for controlling the model's behavior like system prompt, temperature, top-p, and top-k are available in the demo.

Work in progress

Improved pretraining: A refined pretraining process (e.g. more data from Taiwan, training strategies) is under development, aiming to improve the model's performance on Taiwanese culture.
Extend max length: Using the RoPE (rotary position embedding) mechanism as described in the paper, the model's context length will be extended from 4k to 8k tokens.
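
The text does not say which RoPE-based method will be used for the length extension. One common option is linear position interpolation, which rescales positions so that an 8k sequence maps onto the 4k position range seen during training. The following is only a sketch of that idea, not the authors' implementation; `rope_angles`, the head dimension of 128, and the base of 10000 are assumptions made for illustration.

```python
def rope_angles(position, head_dim=128, base=10000.0, scale=1.0):
    """Return the RoPE rotation angles for one token position.

    scale < 1 compresses positions (linear position interpolation).
    head_dim and base are assumed values, not taken from the model card.
    """
    pos = position * scale
    return [pos / (base ** (2 * i / head_dim)) for i in range(head_dim // 2)]


TRAINED_LEN = 4096  # 4k context stated in the text
TARGET_LEN = 8192   # 8k target stated in the text
scale = TRAINED_LEN / TARGET_LEN  # 0.5

# With interpolation, the last position of an 8k sequence stays inside
# the position range the model saw during 4k training.
last = rope_angles(TARGET_LEN - 1, scale=scale)
print(max(last) < TRAINED_LEN)
```

Interpolated models usually still need some fine-tuning at the longer length; the sketch only shows how the positions are rescaled.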

Taiwanese Culture Examples

Question	ChatGPT	Taiwan LLaMa v1.0	Taiwan LLaMa v0.0 (w/o pretraining)	Llama2_Traditional_Chinese_13b_Chat
以惡地景觀著名的是高雄哪一個景點?	以惡地地景著名的是高雄市的旗津。	高雄以擁有惡地景觀著名的景點是田寮月世界。✅	以惡地景觀著名的是高雄市的旗津島。旗津島是一個小島，...	高雄市內有多個以惡地景觀著名的景點，其中以下面幾個最為著名： 1. 高雄捷運紅線沿途景點：高雄市政府為了推廣高雄
唐鳳是誰？	唐鳳（Audrey Tang）是台灣的政治家、社會活動家和軟體工程師。她於2016年擔任台灣行政院數位政務委員，是台灣政府歷史上首位沒有政黨背景的政務官。唐鳳在軟體工程領域有豐富的經驗。 ✅	唐鳳是政治評論家、資訊社會學研究者，也是台灣第一位擔任政府數位政委的人。 ✅	唐鳳（Tang Feng）是一位著名的中國演員和歌手。他出生於1974年12月17日，出生地點在中國北京。唐鳳在1995年開始他的演藝生涯，並在同年發行了他的首張專輯《我不孤單》。此後，他發行了許多受歡迎的歌曲，並在多部電視劇和電影中擔任主角。唐鳳因其精湛的演技和迷人的歌聲而被譽為“亞洲魅力男王”。他的音樂獲得了多個獎項，包括“華表最佳新男演唱獎”、“華表最佳歌曲獎”和“華表最佳舞蹈獎”等。	唐鳳是唐朝時期的一位著名文學家，他的代表作品包括《陰騭》、《狂凡子》等。他的文學風格具有強烈的個性和創意，被譽為「唐代文學之父」。
NTU 在哪?	NTU位於新加坡，具體地址是50 Nanyang Avenue, Singapore 639798。	NTU 是國立臺灣大學的縮寫。它是臺灣的一所頂尖研究型大學，位於臺北市。 ✅	NTU 是台灣的一所大學，位於台北市。✅	NTU 的總部位於台灣新北市板橋區，佔地面積約為1,000公頃。
甚麼是22K?	"22K" 可能指的是金的純度，尤其是在珠寶和金飾方面。	"22K"是指台灣的大學畢業生的起薪，大約是新台幣22,000元。這個詞被用來形容大學畢業生的起薪偏低，並且在就業市場上對於大學畢業生的需求不足。✅	22K是指溫度尺度上的22,000K（千度）。1K代表一度，即溫度差距，是攝氏溫標（攝氏度）的一百萬分之一。因此，22K等於-253.15°C（攝氏溫標）或-423.67°F（華氏溫標）。	22K是一種簡寫，即22,000。它是一個數字，表示22,000個單位。
肉圓要不要加香菜？	這是個個人口味的問題，有些人喜歡在肉圓中加入香菜，因為香菜的清香可以增添風味；而有些人則不喜歡香菜的味道，覺得它會蓋過其他食材的味道。所以要不要加香菜，可以根據個人喜好來決定。	這是一個個人喜好的問題，有些人喜歡在肉圓裡加香菜，有些人不喜歡。在台灣，肉圓加香菜是一種常見的做法，這是一種在肉圓上加香菜的做法。然而，在其他國這是一個文化差異的問題。😊	肉圓加香菜可以增添一些芳香和美味的口感，如果你喜歡香菜的味道，可以試試加入肉圓中。香菜通常在製作過程中炒香，可以增添一些芳香和美味的味道。如果你想要加入香菜，可以在製作過程中炒香一些香菜，然後將其混入肉圓中。肉圓加香菜可以增添一些芳香和美味的口感，如果你喜歡香菜的味道，可以試試加入肉圓中。	加香菜的肉圓味道更好，但是不加香菜的肉圓也很好吃。如果你喜歡香菜的味道，那麼加上香菜就好了。如果你不喜歡香菜，那麼不加也不會影響肉

Model

We provide a number of model checkpoints that we trained. Please find them on Hugging Face here. Here are some quick links to the checkpoints that are finetuned from LLaMa 2:

Model	13B
Taiwan-LLaMa v1.0 (better for Taiwanese Culture)	🤗 yentinglin/Taiwan-LLaMa-v1.0
Taiwan-LLaMa v0.9 (partial instruction set)	🤗 yentinglin/Taiwan-LLaMa-v0.9
Taiwan-LLaMa v0.0 (no Traditional Chinese pretraining)	🤗 yentinglin/Taiwan-LLaMa-v0.0

Data

Here are some quick links to the datasets that we used to train the models:

Dataset	Link
Instruction-tuning	🤗 yentinglin/traditional_chinese_instructions
Traditional Chinese Pretraining	🤗 yentinglin/zh_TW_c4

Architecture

Taiwan-LLaMa is based on LLaMa 2, leveraging the transformer architecture, FlashAttention-2, and bfloat16.

It includes:

Pretraining Phase: Pretrained on a corpus of over 5 billion tokens, extracted from Common Crawl in Traditional Chinese.
Fine-tuning Phase: Further instruction-tuned on over 490k multi-turn conversations to enable instruction-following and context-aware responses.

Generic Capabilities on Vicuna Benchmark

The benchmark data is translated into Traditional Chinese to evaluate general capability.

The scores are calculated with ChatGPT as the baseline, represented as 100%. The other values show the relative performance of different models compared to ChatGPT.

Language Model	Relative Score (%)
GPT-4	102.59%
ChatGPT	100.00%
Taiwan-LLaMa v1.0	76.76%
Claude-Instant-1.2	74.04%
Llama2_Traditional_Chinese_13b_Chat	56.21%
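
The relative scores above are normalized so that ChatGPT equals 100%. A minimal sketch of that normalization follows, assuming each model has a single aggregated raw score from the judge. The model card does not describe the aggregation, and the raw numbers and model names below are hypothetical.

```python
def relative_scores(raw_totals, baseline="ChatGPT"):
    """Express each model's raw score as a percentage of the baseline's score."""
    base = raw_totals[baseline]
    return {name: round(100.0 * total / base, 2) for name, total in raw_totals.items()}


# Hypothetical raw totals, for illustration only (not the benchmark's real numbers).
raw_totals = {"ChatGPT": 700.0, "Model A": 560.0, "Model B": 350.0}

scores = relative_scores(raw_totals)
for name, pct in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}\t{pct:.2f}%")
```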

How to deploy the model on my own machine?

We recommend hosting models with 🤗 Text Generation Inference. Please see their license for details on usage and limitations.

bash run_text_generation_inference.sh "yentinglin/Taiwan-LLaMa" NUM_GPUS DIR_TO_SAVE_MODEL PORT MAX_INPUT_LEN MODEL_MAX_LEN

The prompt format follows the vicuna-v1.1 template:

A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: {user} ASSISTANT:
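
As a sketch of how the template above might be used with a running Text Generation Inference server, the snippet below fills in the template and prepares a POST request for the server's `/generate` endpoint. The base URL, port, and `max_new_tokens` value are assumptions; use whatever PORT you passed to the launch script. The helper names are hypothetical.

```python
import json
import urllib.request

SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)


def build_prompt(user_message: str) -> str:
    """Fill the single-turn vicuna-v1.1 template quoted above."""
    return f"{SYSTEM} USER: {user_message.strip()} ASSISTANT:"


def build_request(user_message, base_url="http://localhost:8080", max_new_tokens=256):
    """Prepare a POST request for the /generate endpoint (base_url is an assumption)."""
    payload = {
        "inputs": build_prompt(user_message),
        "parameters": {"max_new_tokens": max_new_tokens},
    }
    return urllib.request.Request(
        f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(user_message, **kwargs):
    """Send the request to a running server and return the generated text."""
    with urllib.request.urlopen(build_request(user_message, **kwargs)) as resp:
        return json.loads(resp.read().decode("utf-8"))["generated_text"]


print(build_prompt("NTU 在哪?"))
```

Calling `ask("NTU 在哪?")` only works once the server is up; `build_prompt` and `build_request` can be inspected without one.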

Setup development environment

conda create -n taiwan-llama python=3.10 -y 
conda activate taiwan-llama
pip install -r requirements.txt

Citations

If you use our code, data, or models in your research, please cite this repository. You can use the following BibTeX entry:

@inproceedings{lin-chen-2023-llm,
    title = "{LLM}-Eval: Unified Multi-Dimensional Automatic Evaluation for Open-Domain Conversations with Large Language Models",
    author = "Lin, Yen-Ting  and Chen, Yun-Nung",
    booktitle = "Proceedings of the 5th Workshop on NLP for Conversational AI (NLP4ConvAI 2023)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.nlp4convai-1.5",
    pages = "47--58"
}

@misc{taiwanllama,
    author={Lin, Yen-Ting and Chen, Yun-Nung},
    title={Taiwanese-Aligned Language Models based on Meta-Llama2},
    year={2023},
    url={https://github.com/adamlin120/Taiwan-LLaMa},
    note={Code and models available at https://github.com/adamlin120/Taiwan-LLaMa},
}

Collaborate With Us

If you are interested in contributing to the development of Traditional Chinese language models, exploring new applications, or leveraging Taiwan-LLaMa for your specific needs, please don't hesitate to contact us. We welcome collaborations from academia, industry, and individual contributors.

License

The code in this project is licensed under the Apache 2.0 License - see the LICENSE file for details.

The models included in this project are licensed under the LLAMA 2 Community License. See the LLAMA2 License for full details.

OpenAI Data Acknowledgment

The data included in this project were generated using OpenAI's models and are subject to OpenAI's Terms of Use. Please review OpenAI's Terms of Use for details on usage and limitations.

Acknowledgements

We thank the Meta LLaMA team and the Vicuna team for their open-source efforts in democratizing large language models.

Intro

The 4-bit GPTQ model was converted from Taiwan-LLaMa-v1.0 13B with the auto-gptq package.

How to use the GPTQ model from Python code

Install the auto-gptq package: pip install auto-gptq
Here is the example code:

from transformers import AutoTokenizer, TextStreamer, TextIteratorStreamer
from auto_gptq import AutoGPTQForCausalLM


class TaiwanLLaMaGPTQ:
    def __init__(self, model_dir):
        self.tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=True)
        self.model = AutoGPTQForCausalLM.from_quantized(model_dir,
            trust_remote_code=True,
            use_safetensors=True,
            device_map="auto",
            use_triton=False,
            strict=False)
        self.chat_history = []
        self.system_prompt = """You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

            If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information."""

        self.streamer = TextStreamer(self.tokenizer, skip_prompt=True, skip_special_tokens=True)
        # skip_prompt=True keeps the prompt itself out of the streamed output.
        self.thread_streamer = TextIteratorStreamer(self.tokenizer, skip_prompt=True, skip_special_tokens=True)

    def get_prompt(self, message: str, chat_history: list[tuple[str, str]]) -> str:
        texts = [f'[INST] <<SYS>>\n{self.system_prompt}\n<</SYS>>\n\n']
        for user_input, response in chat_history:
            texts.append(f'{user_input.strip()} [/INST] {response.strip()} </s><s> [INST] ')
        texts.append(f'{message.strip()} [/INST]')
        return ''.join(texts)

    def generate(self, message: str):
        prompt = self.get_prompt(message, self.chat_history)
        tokens = self.tokenizer(prompt, return_tensors='pt').input_ids
        generate_ids = self.model.generate(input_ids=tokens.cuda(), max_new_tokens=4096, streamer=self.streamer)
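        # Decode only the newly generated tokens. skip_special_tokens drops the EOS marker
        # without cutting off a real token when generation stops at max_new_tokens.
        output = self.tokenizer.decode(generate_ids[0, len(tokens[0]):], skip_special_tokens=True).strip()
        self.chat_history.append((message, output))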
        return output
    
    def thread_generate(self, message: str):
        from threading import Thread
        prompt = self.get_prompt(message, self.chat_history)
        inputs = self.tokenizer(prompt, return_tensors="pt")

        generation_kwargs = dict(
            inputs=inputs.input_ids.cuda(),
            # Keep the attention mask on the same device as the input ids.
            attention_mask=inputs.attention_mask.cuda(),
            # temperature only takes effect when sampling is enabled.
            do_sample=True,
            temperature=0.1,
            max_new_tokens=1024,
            streamer=self.thread_streamer,
        )

        # Run generation on separate thread to enable response streaming.
        thread = Thread(target=self.model.generate, kwargs=generation_kwargs)
        thread.start()
        for new_text in self.thread_streamer:
            yield new_text

        thread.join()

inferencer = TaiwanLLaMaGPTQ("weiren119/Taiwan-LLaMa-v1.0-4bits-GPTQ")


while True:
    s = input("User: ")
    if s != '':
        print('Answer:')
        print(inferencer.generate(s))
        print('-' * 80)