amirabdullah19852020/gpt-neo-125m_hh_reward

amirabdullah19852020 committed on Jan 1

Commit 460c39e • 1 Parent(s): 5ab78a7

Update README.md

Files changed (1)

README.md +1 -1

@@ -15,7 +15,7 @@ should probably proofread and complete it, then remove this comment. -->
 # gpt-neo-125m_hh_reward
-This model is a fine-tuned version of [EleutherAI/gpt-neo-125m](https://huggingface.co/EleutherAI/gpt-neo-125m) on an unknown dataset.
+This model is a DPO fine-tuned version of [EleutherAI/gpt-neo-125m](https://huggingface.co/EleutherAI/gpt-neo-125m) on Anthropics HH dataset.
 It achieves the following results on the evaluation set:
 - Loss: 0.7764
 - Rewards/chosen: -1.0726