metadata

license: apache-2.0
tags:
  - merge
  - mergekit
  - lazymergekit
  - nvidia/Llama3-ChatQA-1.5-8B
  - shenzhi-wang/Llama3-8B-Chinese-Chat

Llama3-ChatQA-1.5-8B-Llama3-8B-Chinese-Chat-linear-merge

Llama3-ChatQA-1.5-8B-Llama3-8B-Chinese-Chat-linear-merge is a linear merge of the following models, created with mergekit:

nvidia/Llama3-ChatQA-1.5-8B
shenzhi-wang/Llama3-8B-Chinese-Chat

🧩 Merge Configuration

models:
  - model: nvidia/Llama3-ChatQA-1.5-8B
    parameters:
      weight: 0.5
  - model: shenzhi-wang/Llama3-8B-Chinese-Chat
    parameters:
      weight: 0.5
merge_method: linear
parameters:
  normalize: true
dtype: float16
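
To make the configuration concrete, here is a minimal sketch of what a linear merge computes: an element-wise weighted average of matching tensors from each parent model. It is an illustration under stated assumptions, not mergekit's actual implementation. The names `linear_merge` and `merge_state_dicts` are hypothetical, plain Python lists stand in for real weight tensors, and the `float16` cast from the config is ignored.

```python
from typing import Dict, List, Sequence

Tensor = List[float]  # stand-in for a real weight tensor (assumption)


def linear_merge(
    tensors: Sequence[Tensor],
    weights: Sequence[float],
    normalize: bool = True,
) -> Tensor:
    """Element-wise weighted sum of same-length tensors.

    With normalize=True the weights are first divided by their sum,
    so the result is a weighted average.
    """
    if len(tensors) != len(weights):
        raise ValueError("need exactly one weight per tensor")
    if normalize:
        total = sum(weights)
        if total == 0:
            raise ValueError("weights must not sum to zero")
        weights = [w / total for w in weights]
    length = len(tensors[0])
    if any(len(t) != length for t in tensors):
        raise ValueError("all tensors must have the same length")
    return [
        sum(w * t[i] for w, t in zip(weights, tensors))
        for i in range(length)
    ]


def merge_state_dicts(
    state_dicts: Sequence[Dict[str, Tensor]],
    weights: Sequence[float],
    normalize: bool = True,
) -> Dict[str, Tensor]:
    """Apply linear_merge to every parameter name in the first state dict."""
    return {
        name: linear_merge([sd[name] for sd in state_dicts], weights, normalize)
        for name in state_dicts[0]
    }


# Toy values for illustration only, not real model weights.
chatqa = {"layer.weight": [1.0, 2.0, 3.0]}
chinese_chat = {"layer.weight": [3.0, 2.0, 1.0]}

merged = merge_state_dicts([chatqa, chinese_chat], weights=[0.5, 0.5])
print(merged)  # {'layer.weight': [2.0, 2.0, 2.0]}
```

Because the two weights in the configuration above already sum to 1, `normalize: true` leaves them unchanged; it would only matter if the weights were altered, for example to 1 and 3, which would be rescaled to 0.25 and 0.75.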

Model Details

Llama3-ChatQA-1.5 is designed for conversational question answering (QA) and retrieval-augmented generation (RAG). It is trained with an improved version of the recipe from the ChatQA paper and adds more conversational QA data to strengthen its tabular and arithmetic calculation abilities. The model aims to give detailed responses that stay grounded in the supplied context.

Llama3-8B-Chinese-Chat is fine-tuned for Chinese and English users, with an emphasis on roleplay, function calling, and math. It was trained on a mixed dataset of approximately 100K preference pairs, which improves its handling of bilingual interactions.

Use Cases

Conversational AI: Engage users in natural dialogues, providing informative and context-aware responses.
Question Answering: Answer user queries accurately, leveraging the strengths of both English and Chinese language processing.
Multilingual Support: Cater to users who communicate in both English and Chinese, enhancing accessibility and user experience.
Educational Tools: Assist in learning and understanding complex topics through interactive Q&A sessions.

Model Features

This merged model combines the conversational QA and RAG tuning of Llama3-ChatQA-1.5 with the bilingual chat tuning of Llama3-8B-Chinese-Chat. It aims to offer:

Enhanced context understanding for both English and Chinese queries.
Improved performance in conversational QA tasks.
Versatile text generation capabilities across different languages.

Evaluation Results

The results below are those of the parent models; no benchmark results are reported here for the merged model itself. Llama3-ChatQA-1.5 is evaluated on ChatRAG Bench, which measures conversational QA, and selected scores are listed in the table below. Llama3-8B-Chinese-Chat is reported to surpass ChatGPT and match GPT-4 on certain Chinese language evaluations, but no scores are available for it on the benchmarks listed.

Benchmark	Llama3-ChatQA-1.5-8B	Llama3-8B-Chinese-Chat
Doc2Dial	41.26	N/A
QuAC	38.82	N/A
CoQA	78.44	N/A
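Average	58.25	N/A

The Average row is reproduced as reported and is not the mean of the three rows shown (that mean is 52.84); it presumably covers the full ChatRAG Bench suite rather than only the benchmarks listed here.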

Limitations

While the merged model benefits from the strengths of both parent models, it may also inherit some limitations. For instance, biases present in the training data of either model could affect the responses generated. Additionally, the model may struggle with highly specialized or niche topics that were not well-represented in the training datasets. Users should be aware of these potential biases and limitations when deploying the model in real-world applications.