Mistral-7B with continued pretraining using Quiet-STaR (https://arxiv.org/abs/2403.09629), which generates 8 thought tokens before each output token.
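To make the thought-token mechanism concrete, here is a toy sketch (not the actual Quiet-STaR implementation, which operates on hidden states in parallel): before each visible output token, `n_thoughts` hidden tokens are sampled to condition the next prediction and then discarded. The `step` callback is a hypothetical stand-in for a single next-token prediction.

```python
# Toy illustration of "n thought tokens before each output token".
# NOT the real Quiet-STaR algorithm; just the sequential intuition.
def generate_with_thoughts(step, prompt, n_out, n_thoughts=8):
    """step(context) -> next token; context is a list of tokens.

    For each of the n_out visible tokens, first sample n_thoughts
    hidden "thought" tokens that extend the context, then sample the
    visible token conditioned on prompt + thoughts. The thoughts are
    dropped from the running context afterwards.
    """
    context = list(prompt)
    visible = []
    for _ in range(n_out):
        thoughts = []
        for _ in range(n_thoughts):
            # Thought tokens see the context plus earlier thoughts.
            thoughts.append(step(context + thoughts))
        tok = step(context + thoughts)  # only this token is kept
        visible.append(tok)
        context.append(tok)  # thoughts are discarded here
    return visible
```

With a deterministic dummy `step` such as `lambda ctx: len(ctx)`, each visible token reflects a context temporarily lengthened by the 8 thoughts, even though the thoughts never appear in the output.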

Downloads last month: 169
Model size: 7.29B params (Safetensors)
Tensor type: BF16

Model tree for ezelikman/quietstar-8-ahead
Merges: 3 models
Quantizations: 1 model

Dataset used to train ezelikman/quietstar-8-ahead

Spaces using ezelikman/quietstar-8-ahead: 3