---
license: apache-2.0
datasets:
- togethercomputer/RedPajama-Data-1T-Sample
language:
- en
tags:
- llama
- llama 2
- smol_llama
---
|
# smol_llama-220M-GQA-32k-linear |
|
|
|
An experimental model intended to serve as a draft model for long-context speculative decoding.
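For reference, a minimal sketch of using a small model like this one as the draft ("assistant") model for assisted generation in Hugging Face `transformers` is shown below. The repo ids are placeholders, the target model is an assumption, and assisted generation requires the draft and target models to share a tokenizer/vocabulary.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo ids: substitute the real target model and the path to
# this checkpoint. Assisted generation assumes the draft and target models
# share the same tokenizer/vocabulary.
target_id = "meta-llama/Llama-2-7b-hf"               # assumed target model
draft_id = "path/to/smol_llama-220M-GQA-32k-linear"  # this checkpoint

tokenizer = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(target_id, torch_dtype=torch.float16).cuda()
draft = AutoModelForCausalLM.from_pretrained(draft_id, torch_dtype=torch.float16).cuda()

inputs = tokenizer("A very long document ...", return_tensors="pt").to("cuda")

# assistant_model enables assisted (speculative) decoding: the draft model
# proposes candidate tokens that the target model verifies in one forward pass.
output = target.generate(**inputs, assistant_model=draft, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```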
|
|
|
Created from [BEE-spoke-data/smol_llama-220M-GQA](https://huggingface.co/BEE-spoke-data/smol_llama-220M-GQA) by further pretraining at a context length of 32768 on [togethercomputer/RedPajama-Data-1T-Sample](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T-Sample).
|
|
|
This variant uses linear RoPE scaling for context extension.
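As a rough sketch, loading the checkpoint with linear RoPE scaling via `transformers` might look like the snippet below. The repo path is a placeholder; the factor of 16.0 is an inference from stretching the 2048-token base context to 32768 tokens, matching the "Linear Rope Scale 16.0" results reported below (a released checkpoint would normally ship these values in its config already).

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "path/to/smol_llama-220M-GQA-32k-linear"  # placeholder path

# Linear RoPE scaling stretches position ids by a constant factor;
# 2048 * 16.0 = 32768, i.e. a scale factor of 16.0 for this context length.
config = AutoConfig.from_pretrained(model_id)
config.max_position_embeddings = 32768
config.rope_scaling = {"type": "linear", "factor": 16.0}

model = AutoModelForCausalLM.from_pretrained(model_id, config=config)
```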
|
|
|
WikiText perplexity (64 rows), as evaluated with [exllamav2](https://github.com/turboderp/exllamav2):
|
```
Base Model
2048: 20.2193
4096: 102.6928
8192: 235.5210
16384: 390.7198
32768: 515.8053

32k - Linear Rope Scale 16.0
2048: 25.7148
4096: 23.4461
8192: 22.3326
16384: 21.6744
32768: 21.4317

32k - Rope Theta 1000000.0
2048: 20.2158
4096: 18.3868
8192: 17.5976
16384: 17.1462
32768: 16.6989
```
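For illustration only, a stand-alone way to estimate perplexity at a given context length with `transformers` is sketched below. This is not exllamav2's evaluation code, the chunking and row selection differ from the run above, and the repo path is a placeholder, so absolute values will not match exactly.

```python
# Illustrative sketch: fixed-length-chunk perplexity on WikiText-2 (raw) test.
import math
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/smol_llama-220M-GQA-32k-linear"  # placeholder path
ctx_len = 32768

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).cuda().eval()

text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
ids = tokenizer(text, return_tensors="pt").input_ids[0]

nll, count = 0.0, 0
with torch.no_grad():
    for start in range(0, ids.numel() - ctx_len, ctx_len):
        chunk = ids[start : start + ctx_len].unsqueeze(0).cuda()
        # labels=chunk makes the model return the mean next-token cross-entropy
        out = model(chunk, labels=chunk)
        nll += out.loss.item() * (chunk.numel() - 1)
        count += chunk.numel() - 1

print(f"perplexity @ {ctx_len}: {math.exp(nll / count):.4f}")
```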