Mistral-7B with continued pretraining using Quiet-STaR (https://arxiv.org/abs/2403.09629), which generates 8 thought tokens before each output token.
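To make the thought-token mechanism concrete, here is a toy sketch (not the actual Quiet-STaR implementation, which operates on hidden states in parallel): before each visible output token, `n_thoughts` hidden tokens are sampled to condition the next prediction and then discarded. The `step` callback is a hypothetical stand-in for a single next-token prediction.

```python
# Toy illustration of "n thought tokens before each output token".
# NOT the real Quiet-STaR algorithm; just the sequential intuition.
def generate_with_thoughts(step, prompt, n_out, n_thoughts=8):
    """step(context) -> next token; context is a list of tokens.

    For each of the n_out visible tokens, first sample n_thoughts
    hidden "thought" tokens that extend the context, then sample the
    visible token conditioned on prompt + thoughts. The thoughts are
    dropped from the running context afterwards.
    """
    context = list(prompt)
    visible = []
    for _ in range(n_out):
        thoughts = []
        for _ in range(n_thoughts):
            # Thought tokens see the context plus earlier thoughts.
            thoughts.append(step(context + thoughts))
        tok = step(context + thoughts)  # only this token is kept
        visible.append(tok)
        context.append(tok)  # thoughts are discarded here
    return visible
```

With a deterministic dummy `step` such as `lambda ctx: len(ctx)`, each visible token reflects a context temporarily lengthened by the 8 thoughts, even though the thoughts never appear in the output.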

Downloads last month: 169
Model size: 7.29B params (Safetensors)
Tensor type: BF16

Model tree for ezelikman/quietstar-8-ahead
Merges: 3 models
Quantizations: 1 model

Dataset used to train ezelikman/quietstar-8-ahead

Spaces using ezelikman/quietstar-8-ahead: 3