Model Card for AICrossSim/clm-60m

A 60M parameter language model trained on the FineWeb dataset.

Model Details

Model Description

aixsim-60M is a transformer-based language model with approximately 60 million non-embedding parameters (the full checkpoint, including embeddings, has roughly 82M parameters). It uses RMSNorm for normalization and was trained on the FineWeb dataset.

  • Developed by: AICrossSim
  • Funded by: ARIA
  • Model type: Transformer Language Model
  • Language(s) (NLP): English
  • License: odc-by
  • Tokenizer: Based on HuggingFaceTB/cosmo2-tokenizer

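The snippet below is a minimal, hedged example of loading the model with the Hugging Face transformers library. The repository id AICrossSim/clm-60m is taken from this card; the prompt and generation settings are illustrative, and the example assumes the tokenizer is resolvable from the same repository.

```python
# Minimal sketch: load AICrossSim/clm-60m with Hugging Face transformers.
# Assumes the repository hosts both the weights and the tokenizer
# (based on HuggingFaceTB/cosmo2-tokenizer); prompt and settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "AICrossSim/clm-60m"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Generate a short continuation from a prompt.
inputs = tokenizer("The FineWeb dataset is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
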
Model Sources

Training Details

Training Data

The model was trained on 60 × 22M ≈ 1.32B tokens from the FineWeb dataset (HuggingFaceTB/fineweb), i.e. roughly 22 tokens per model parameter.
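
The token budget follows directly from the parameter count, as sketched below. The cross-check against the training schedule assumes that the batch size of 48 listed under Training Hyperparameters is a per-shard value and that the data-parallel shard degree of 2 multiplies it; neither assumption is stated explicitly in this card.

```python
# Sketch of the training token budget (assumptions noted inline).
non_embedding_params = 60e6        # ~60M non-embedding parameters
tokens_per_param = 22              # implied by the 60 * 22M token figure
token_budget = non_embedding_params * tokens_per_param
print(f"token budget: {token_budget / 1e9:.2f}B")   # ~1.32B tokens

# Rough cross-check against the schedule, assuming batch size 48 is per shard
# and the effective data-parallel size equals the shard degree of 2.
batch_size, seq_len, steps, shard_degree = 48, 2048, 7075, 2
tokens_seen = batch_size * shard_degree * seq_len * steps
print(f"tokens seen:  {tokens_seen / 1e9:.2f}B")     # ~1.39B tokens
```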

Training Procedure

Training Hyperparameters

  • Batch Size: 48
  • Sequence Length: 2048
  • Learning Rate: 0.0001
  • Optimizer: AdamW with fused=false
  • Training Steps: 7075
  • Warmup Steps: 1415
  • Maximum Gradient Norm: 1.0
  • Training regime: Mixed precision (bfloat16 for parameters, float32 for reduction)
  • Data Parallel Configuration:
    • Shard Degree: 2
    • Replicate Degree: 1
  • Random Seed: 42
  • Garbage Collection Frequency: Every 50 steps
  • Compilation: Enabled
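
The sketch below mirrors the optimizer and learning-rate settings above in plain PyTorch. It is illustrative only, not the project's training script: the stand-in model, the linear-warmup shape, and the constant post-warmup rate are assumptions, since the card only lists the warmup step count.

```python
# Illustrative PyTorch setup for the optimizer and warmup schedule listed above.
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

model = torch.nn.Linear(2048, 2048)  # stand-in for the ~60M-parameter transformer

optimizer = AdamW(model.parameters(), lr=1e-4, fused=False)
warmup_steps, total_steps = 1415, 7075
max_grad_norm = 1.0

def lr_lambda(step: int) -> float:
    # Linear warmup to the peak learning rate; the post-warmup decay shape is
    # not stated in the card, so the rate is simply held constant here.
    return min(1.0, (step + 1) / warmup_steps)

scheduler = LambdaLR(optimizer, lr_lambda)

for step in range(total_steps):
    # ... forward/backward pass on a 48 x 2048-token micro-batch goes here ...
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
```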