RocktimMBZ/dpo_model_merged_lr_2e_6_lora_64_16_epoch_6_beta_25_llama_3_ckpt_1000 • Updated 1 day ago • 6
MALT: Improving Reasoning with Multi-Agent LLM Training • Paper • 2412.01928 • Published Dec 2, 2024 • 40