arxiv:2403.00673
Adam Yanxiao Zhao
sdpkjc
AI & ML interests
Reinforcement Learning
Recent Activity
Updated a model 3 days ago: sdpkjc/Qwen2.5-1.5B-Instruct-FT-DPO
Published a model 3 days ago: sdpkjc/Qwen2.5-1.5B-Instruct-FT-DPO
Updated a model 3 days ago: sdpkjc/SmolLM2-FT-DPO
Papers: 2
Models: 98
sdpkjc/Qwen2.5-1.5B-Instruct-FT-DPO • Text Generation • 3
sdpkjc/SmolLM2-FT-DPO • Text Generation • 2
sdpkjc/SmolLM2-FT-MyDataset • Text Generation
sdpkjc/Ant-v4-ppo_fix_continuous_action-seed5 • Reinforcement Learning
sdpkjc/Ant-v4-ppo_fix_continuous_action-seed4 • Reinforcement Learning
sdpkjc/Ant-v4-ppo_fix_continuous_action-seed3 • Reinforcement Learning
sdpkjc/Ant-v4-ppo_fix_continuous_action-seed2 • Reinforcement Learning
sdpkjc/Ant-v4-ppo_fix_continuous_action-seed1 • Reinforcement Learning
sdpkjc/Humanoid-v4-ppo_fix_continuous_action-seed5 • Reinforcement Learning
sdpkjc/Humanoid-v4-ppo_fix_continuous_action-seed4 • Reinforcement Learning
Datasets: none public yet