Hugging Face
Models: 56
Sort: Trending
Active filters: arxiv:1909.08593
bikalnetomi/RLHF-PPO-PPOModel-LLama3-1B-v1.3 • Text Generation • Updated Dec 2, 2024 • 131
bikalnetomi/RLHF-PPO-PPOModel-LLama3-1B-v1.4 • Text Generation • Updated Dec 2, 2024 • 128
Evan-Lin/Qwen2.5-0.5B-RL • Updated Dec 12, 2024
RLHF-And-Friends/FedPPO-Pythia-70M-a0 • Text Generation • Updated Dec 13, 2024 • 139
RLHF-And-Friends/FedPPO-Pythia-70M-a1 • Text Generation • Updated Dec 13, 2024 • 137
RLHF-And-Friends/FedPPO-Isolated-Pythia-70M-a0 • Text Generation • Updated Dec 13, 2024 • 139
RLHF-And-Friends/FedPPO-Confused-Pythia-70M-a0 • Text Generation • Updated Dec 13, 2024 • 141
RLHF-And-Friends/FedPPO-Collaborative-Pythia-70M-a0 • Text Generation • Updated Dec 13, 2024 • 138
RLHF-And-Friends/FedPPO-Isolated-Pythia-70M-a1 • Text Generation • Updated Dec 13, 2024 • 143
RLHF-And-Friends/FedPPO-Collaborative-Pythia-70M-a1 • Text Generation • Updated Dec 13, 2024 • 138
RLHF-And-Friends/FedPPO-Confused-Pythia-70M-a1 • Text Generation • Updated Dec 13, 2024 • 139
nologin/ppo • Text Generation • Updated Dec 13, 2024 • 142
RichardErkhov/bikalnetomi_-_RLHF-PPO-PPOModel-LLama3-1B-v1.1-awq • Updated Dec 21, 2024 • 3
RichardErkhov/bikalnetomi_-_RLHF-PPO-PPOModel-LLama3-1B-v1.3-awq • Updated Dec 21, 2024 • 3
RichardErkhov/bikalnetomi_-_RLHF-PPO-PPOModel-LLama3-1B-v1.4-awq • Updated Dec 21, 2024 • 2
RichardErkhov/bikalnetomi_-_RLHF-PPO-PPOModel-LLama3-1B-v1.0-awq • Updated 28 days ago • 8
joaoluislins/trained_ppo_model • Updated 15 days ago
Ousso1117/PPO-meta-Llama-3.2-1B-meta-Llama-3.2-1B-mrd3-sum • Updated 8 days ago
Ousso1117/PPO-SFT-meta-Llama-3.2-1B-meta-Llama-3.2-1B-mrd3-sum • Updated 4 days ago
Ousso1117/PPO-meta-Llama-3.2-3B-meta-Llama-3.2-3B-mrd3-sum • Updated 7 days ago
Ousso1117/PPO-SFT-meta-Llama-3.2-3B-meta-Llama-3.2-3B-mrd3-sum • Updated 7 days ago
Ousso1117/PPO-meta-Llama-2-7B-meta-Llama-2-7B-mrd3-sum • Updated 7 days ago
Ousso1117/PPO-meta-Llama-3.1-8B-meta-Llama-3.1-8B-mrd3-sum • Updated 7 days ago
Ousso1117/PPO-SFT-meta-Llama-2-7B-meta-Llama-2-7B-mrd3-sum • Updated 7 days ago
Ousso1117/PPO-SFT-meta-Llama-3.1-8B-meta-Llama-3.1-8B-mrd3-sum • Updated 7 days ago
Ousso1117/PPO-meta-Llama-3.2-1B-meta-Llama-3.1-8B-mrd3-sum • Updated about 5 hours ago