--- license: llama3 tags: - axolotl - dpo - trl base_model: meta-llama/Meta-Llama-3-8B-Instruct datasets: - HumanLLMs/Human-Like-DPO-Dataset model-index: - name: Humanish-LLama3.1-8B-Instruct results: - task: type: text-generation name: Text Generation dataset: name: IFEval (0-Shot) type: HuggingFaceH4/ifeval args: num_few_shot: 0 metrics: - type: inst_level_strict_acc and prompt_level_strict_acc value: 64.98 name: strict accuracy source: url: >- https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-LLama3.1-8B-Instruct name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: BBH (3-Shot) type: BBH args: num_few_shot: 3 metrics: - type: acc_norm value: 28.01 name: normalized accuracy source: url: >- https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-LLama3.1-8B-Instruct name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: MATH Lvl 5 (4-Shot) type: hendrycks/competition_math args: num_few_shot: 4 metrics: - type: exact_match value: 8.46 name: exact match source: url: >- https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-LLama3.1-8B-Instruct name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: GPQA (0-shot) type: Idavidrein/gpqa args: num_few_shot: 0 metrics: - type: acc_norm value: 0.78 name: acc_norm source: url: >- https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-LLama3.1-8B-Instruct name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: MuSR (0-shot) type: TAUR-Lab/MuSR args: num_few_shot: 0 metrics: - type: acc_norm value: 2 name: acc_norm source: url: >- https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-LLama3.1-8B-Instruct name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: MMLU-PRO (5-shot) type: TIGER-Lab/MMLU-Pro config: main split: test args: num_few_shot: 5 metrics: - type: acc value: 30.02 name: accuracy source: url: >- https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=HumanLLMs/Humanish-LLama3.1-8B-Instruct name: Open LLM Leaderboard pipeline_tag: text-generation library_name: transformers ---

Enhancing Human-Like Responses in Large Language Models

   | 🤗 Models   |    📊 Dataset   |    📄Paper   |

# 🚀 Human-Like-Llama3-8B-Instruct This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct), specifically optimized to generate more human-like and conversational responses. The fine-tuning process employed both [Low-Rank Adaptation (LoRA)](https://arxiv.org/abs/2106.09685) and [Direct Preference Optimization (DPO)](https://arxiv.org/abs/2305.18290) to enhance natural language understanding, conversational coherence, and emotional intelligence in interactions. The proccess of creating this models is detailed in the research paper [“Enhancing Human-Like Responses in Large Language Models”](https://arxiv.org/abs/2501.05032). # 🛠️ Training Configuration - **Base Model:** Llama3-8B-Instruct - **Framework:** Axolotl v0.4.1 - **Hardware:** 2x NVIDIA A100 (80 GB) GPUs - **Training Time:** ~2 hours 20 minutes - **Dataset:** Synthetic dataset with ≈11,000 samples across 256 diverse topics
See axolotl config axolotl version: `0.4.1` ```yaml base_model: meta-llama/Meta-Llama-3-8B-Instruct model_type: LlamaForCausalLM tokenizer_type: AutoTokenizer load_in_8bit: true load_in_4bit: false strict: false chat_template: llama3 rl: dpo datasets: - path: HumanLLMs/humanish-dpo-project type: llama3.prompt_pairs chat_template: llama3 dataset_prepared_path: val_set_size: 0.05 output_dir: ./humanish-llama3-8b-instruct sequence_len: 8192 sample_packing: false pad_to_sequence_len: true adapter: lora lora_model_dir: lora_r: 8 lora_alpha: 4 lora_dropout: 0.05 lora_target_linear: true lora_fan_in_fan_out: wandb_project: Humanish-DPO wandb_entity: wandb_watch: wandb_name: wandb_log_model: hub_model_id: HumanLLMs/Humanish-LLama3.1-8B-Instruct gradient_accumulation_steps: 8 micro_batch_size: 2 num_epochs: 1 optimizer: adamw_bnb_8bit lr_scheduler: cosine learning_rate: 0.0002 train_on_inputs: false group_by_length: false bf16: auto fp16: tf32: false gradient_checkpointing: true early_stopping_patience: resume_from_checkpoint: local_rank: logging_steps: 1 xformers_attention: flash_attention: true s2_attention: warmup_steps: 10 evals_per_epoch: 2 eval_table_size: eval_max_new_tokens: 128 saves_per_epoch: 1 debug: deepspeed: weight_decay: 0.0 fsdp: fsdp_config: save_safetensors: true ```

# 💬 Prompt Template You can use Llama3 prompt template while using the model: ### Llama3 ``` <|start_header_id|>system<|end_header_id|> {system}<|eot_id|> <|start_header_id|>user<|end_header_id|> {user}<|eot_id|> <|start_header_id|>assistant<|end_header_id|> {assistant}<|eot_id|> ``` This prompt template is available as a [chat template](https://huggingface.co/docs/transformers/main/chat_templating), which means you can format messages using the `tokenizer.apply_chat_template()` method: ```python messages = [ {"role": "system", "content": "You are helpful AI asistant."}, {"role": "user", "content": "Hello!"} ] gen_input = tokenizer.apply_chat_template(message, return_tensors="pt") model.generate(**gen_input) ``` # 🤖 Models | Model | Download | |:---------------------:|:-----------------------------------------------------------------------:| | Human-Like-Llama-3-8B-Instruct | 🤗 [HuggingFace](https://huggingface.co/HumanLLMs/Human-Like-LLama3-8B-Instruct) | | Human-Like-Qwen-2.5-7B-Instruct | 🤗 [HuggingFace](https://huggingface.co/HumanLLMs/Human-Like-Qwen2.5-7B-Instruct) | | Human-Like-Mistral-Nemo-Instruct | 🤗 [HuggingFace](https://huggingface.co/HumanLLMs/Human-Like-Mistral-Nemo-Instruct-2407) | # 🎯 Benchmark Results | **Group** | **Model** | **Average** | **IFEval** | **BBH** | **MATH Lvl 5** | **GPQA** | **MuSR** | **MMLU-PRO** | |--------------------------------|--------------------------------|-------------|------------|---------|----------------|----------|----------|--------------| | **Llama Models** | Human-Like-Llama-3-8B-Instruct | 22.37 | **64.97** | 28.01 | 8.45 | 0.78 | **2.00** | 30.01 | | | Llama-3-8B-Instruct | 23.57 | 74.08 | 28.24 | 8.68 | 1.23 | 1.60 | 29.60 | | | *Difference (Human-Like)* | -1.20 | **-9.11** | -0.23 | -0.23 | -0.45 | +0.40 | +0.41 | | **Qwen Models** | Human-Like-Qwen-2.5-7B-Instruct | 26.66 | 72.84 | 34.48 | 0.00 | 6.49 | 8.42 | 37.76 | | | Qwen-2.5-7B-Instruct | 26.86 | 75.85 | 34.89 | 0.00 | 5.48 | 8.45 | 36.52 | | | *Difference (Human-Like)* | -0.20 | -3.01 | -0.41 | 0.00 | **+1.01**| -0.03 | **+1.24** | | **Mistral Models** | Human-Like-Mistral-Nemo-Instruct | 22.88 | **54.51** | 32.70 | 7.62 | 5.03 | 9.39 | 28.00 | | | Mistral-Nemo-Instruct | 23.53 | 63.80 | 29.68 | 5.89 | 5.37 | 8.48 | 27.97 | | | *Difference (Human-Like)* | -0.65 | **-9.29** | **+3.02**| **+1.73** | -0.34 | +0.91 | +0.03 | # 📊 Dataset The dataset used for fine-tuning was generated using LLaMA 3 models. The dataset includes 10,884 samples across 256 distinct topics such as technology, daily life, science, history, and arts. Each sample consists of: - **Human-like responses:** Natural, conversational answers mimicking human dialogue. - **Formal responses:** Structured and precise answers with a more formal tone. The dataset has been open-sourced and is available at: - 👉 [Human-Like-DPO-Dataset](https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset) More details on the dataset creation process can be found in the accompanying research paper. # 📝 Citation ``` @misc{çalık2025enhancinghumanlikeresponseslarge, title={Enhancing Human-Like Responses in Large Language Models}, author={Ethem Yağız Çalık and Talha Rüzgar Akkuş}, year={2025}, eprint={2501.05032}, archivePrefix={arXiv}, primaryClass={cs.CL}, url={https://arxiv.org/abs/2501.05032}, } ```