---
base_model: meta-llama/Meta-Llama-3-8B-Instruct
library_name: peft
datasets:
- Garsa3112/ChineseEnglishTranslationDataset
---

# MISHANM/Chinese_eng_text_generation_Llama3_8B_instruct

This model is fine-tuned for Chinese and can answer queries and translate text between English and Chinese. It is built on Meta-Llama-3-8B-Instruct and aims to give accurate, context-aware responses.



## Model Details
1. Language: Chinese
2. Tasks: Question Answering, Translation (Chinese to English)
3. Base Model: meta-llama/Meta-Llama-3-8B-Instruct



## Training Details

The model was trained on approximately 678,099 instruction samples.
1. GPUs: 4 × AMD Radeon™ PRO V620
2. Training Time: 263:16:37 (hh:mm:ss)
  
   

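As a rough back-of-the-envelope reading of these figures, the sketch below converts the reported training time into wall-clock hours (about 11 days), GPU-hours, and an approximate throughput. It assumes the time is formatted as hours:minutes:seconds and that each sample was seen once; the number of epochs is not stated, so the throughput is only indicative. The helper `to_seconds` is illustrative and not part of the released code.

```python
# Figures taken from the training details above
NUM_SAMPLES = 678_099
NUM_GPUS = 4
TRAINING_TIME = "263:16:37"  # assumed to be hours:minutes:seconds


def to_seconds(hms):
    """Convert an 'H:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


wall_clock_hours = to_seconds(TRAINING_TIME) / 3600
gpu_hours = wall_clock_hours * NUM_GPUS
# Assumes a single pass over the data; the epoch count is not stated
samples_per_second = NUM_SAMPLES / to_seconds(TRAINING_TIME)

print(f"wall-clock hours: {wall_clock_hours:.1f}")
print(f"GPU-hours: {gpu_hours:.1f}")
print(f"samples/second (single pass): {samples_per_second:.2f}")
```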

## Inference with Hugging Face
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned model and tokenizer
model_path = "MISHANM/Chinese_eng_text_generation_Llama3_8B_instruct"

model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")

tokenizer = AutoTokenizer.from_pretrained(model_path)

# Function to generate text
def generate_text(prompt, max_length=1000, temperature=0.9):
    # System instruction and user query for this request
    messages = [
        {
            "role": "system",
            "content": "You are a Chinese language expert and linguist, with same knowledge give response in Chinese language.",
        },
        {"role": "user", "content": prompt}
    ]

    # Build the prompt by hand with <|system|>/<|user|>/<|assistant|> role tags
    formatted_prompt = f"<|system|>{messages[0]['content']}<|user|>{messages[1]['content']}<|assistant|>"

    # Tokenize, move the tensors to the model's device, and sample a completion
    inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)
    output = model.generate(
        **inputs, max_new_tokens=max_length, temperature=temperature, do_sample=True
    )
    # The decoded string contains the prompt text followed by the generated reply
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Example usage
prompt = """What is an LLM?"""
response = generate_text(prompt)
print(response)
```
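The example above sends a question. For the translation task listed under Model Details, the same prompt layout could be reused with a translation instruction. The sketch below only builds the prompt string; `build_translation_prompt` and the wording of its system instruction are illustrative assumptions, not part of the released code, and the phrasing the model responds to best may differ.

```python
def build_translation_prompt(text, source="Chinese", target="English"):
    """Wrap a translation request in the <|system|>/<|user|>/<|assistant|> layout."""
    # Assumed instruction wording; adjust it to whatever works best in practice
    system = f"You are a translator. Translate the user's text from {source} to {target}."
    return f"<|system|>{system}<|user|>{text}<|assistant|>"


prompt = build_translation_prompt("你好，世界")
print(prompt)
```

The resulting string can be tokenized with `tokenizer(prompt, return_tensors="pt")` and passed to `model.generate`, in the same way as inside `generate_text`.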

## Citation Information
```
@misc{MISHANM/Chinese_eng_text_generation_Llama3_8B_instruct,
  author = {Mishan Maurya},
  title = {Introducing Fine Tuned LLM for Chinese Language},
  year = {2024},
  publisher = {Hugging Face},
  journal = {Hugging Face repository},
}
```


## Framework versions

- PEFT 0.12.0