# Model Card for AICrossSim/clm-60m
A 60M parameter language model trained on the FineWeb dataset.
## Model Details

### Model Description
aixsim-60M is a transformer-based language model with approximately 60 million parameters (excluding embedding parameters). It uses RMSNorm for normalization and was trained on the FineWeb dataset. A minimal loading example follows the list below.
- Developed by: AICrossSim
- Funded by: ARIA
- Model type: Transformer Language Model
- Language(s) (NLP): English
- License: odc-by
- Tokenizer: Based on HuggingFaceTB/cosmo2-tokenizer
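As a rough sketch, the checkpoint can likely be loaded with the Hugging Face `transformers` Auto classes, assuming the repository ships `transformers`-compatible config, weight, and tokenizer files; the prompt and `max_new_tokens` value are purely illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumes the repo exposes transformers-compatible config/weights/tokenizer files.
model_id = "AICrossSim/clm-60m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The FineWeb dataset is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)  # illustrative generation length
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```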
### Model Sources
- Repository: https://github.com/AICrossSim/NewComputeBench
## Training Details

### Training Data
The model was trained on 60 × 22M ≈ 1.32B tokens from the FineWeb dataset (HuggingFaceFW/fineweb), i.e. roughly 22 tokens per non-embedding parameter.
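A quick back-of-the-envelope check of that token budget (purely illustrative; the 22 tokens-per-parameter ratio is taken from the description above):

```python
# Token budget implied by the "60 * 22M tokens" description.
params = 60e6            # non-embedding parameters
tokens_per_param = 22    # ratio stated in the training description
total_tokens = params * tokens_per_param
print(f"{total_tokens:.2e}")  # ~1.32e9 tokens
```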
### Training Procedure

#### Training Hyperparameters
- Batch Size: 48
- Sequence Length: 2048
- Learning Rate: 0.0001
- Optimizer: AdamW with `fused=False` (see the optimizer and warmup sketch after this list)
- Training Steps: 7075
- Warmup Steps: 1415
- Maximum Gradient Norm: 1.0
- Training regime: Mixed precision (bfloat16 for parameters, float32 for reduction)
- Data Parallel Configuration:
  - Shard Degree: 2
  - Replicate Degree: 1
- Random Seed: 42
- Garbage Collection Frequency: Every 50 steps
- Compilation: Enabled
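A minimal PyTorch sketch of how the optimizer, warmup schedule, and gradient clipping above might be wired together. The linear warmup shape, the post-warmup behavior, and the model/loss placeholders are assumptions; the actual NewComputeBench training loop may differ.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

# Placeholder model: the real 60M-parameter transformer comes from NewComputeBench.
model = torch.nn.Linear(2048, 2048)
optimizer = AdamW(model.parameters(), lr=1e-4, fused=False)

total_steps, warmup_steps = 7075, 1415

def lr_lambda(step: int) -> float:
    # Assumed linear warmup to the peak learning rate; the post-warmup schedule
    # is not specified in the card, so it is held constant here.
    return min(1.0, (step + 1) / warmup_steps)

scheduler = LambdaLR(optimizer, lr_lambda)

for step in range(total_steps):
    loss = model(torch.randn(48, 2048)).pow(2).mean()  # dummy loss, batch size 48
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # max grad norm 1.0
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```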