Hugging Face
Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up
amang1802
/
Llama3.2-1B-summary-length-exp2
like
0
Text Generation
Transformers
Safetensors
llama
conversational
text-generation-inference
Inference Endpoints
Model card
Files
Files and versions
Community
Train
Deploy
Use this model
Model Card for Model ID
Model Details
Model Card for Model ID
Summary Length PPO experiment #2
No KL divergence in loss
Model Details
Dataset size: 1024
Epochs: 2
Batch Size: 4 * 8 (using Grad Accu)
Optimizer args: Torch AdamW default, except
LR = 0.0001
Downloads last month
48
Safetensors
Model size
1.24B params
Tensor type
BF16
·
Inference Providers
NEW
Text Generation
This model is not currently available via any of the supported Inference Providers.
Model tree for
amang1802/Llama3.2-1B-summary-length-exp2
Quantizations
1 model
Collection including
amang1802/Llama3.2-1B-summary-length-exp2
PPO experiments
Collection
Using PPO with simpler reward functions
•
8 items
•
Updated
Jan 23