---
title: README
emoji: 🐢
colorFrom: yellow
colorTo: blue
sdk: static
pinned: false
---

# Spectra Suite

We release the Spectra Suite, consisting of 54 models ranging from 99M to 3.9B parameters across different bitwidths:

- FloatLM: LLMs pretrained in FP16 (half precision).
- TriLM: LLMs pretrained at an effective ternary bitwidth.
- QuantLM 8-bit: FloatLMs quantized to 8 bits.
- QuantLM 6-bit: FloatLMs quantized to 6 bits.
- QuantLM 4-bit: FloatLMs quantized to 4 bits.
- QuantLM 3-bit: FloatLMs quantized to 3 bits.

All models are released in unpacked (FP16) format, making them compatible with FP16 GEMMs in any library that supports the LLaMa architecture.

## Usage

```python
import torch
import transformers

# Select the model you wish to run.
model_id = "SpectraSuite/TriLM_3.9B_Unpacked"

# Adjust the temperature, repetition penalty, top_k, top_p and other sampling parameters to your needs.
pipeline = transformers.pipeline(
    "text-generation", model=model_id, model_kwargs={"torch_dtype": torch.float16}, device_map="auto"
)

# These are base (pretrained) LLMs that are not instruction- or chat-tuned, so adjust your prompt accordingly.
pipeline("Once upon a time")
```

A sketch that loads the model directly and sets the sampling parameters explicitly is given at the end of this README.

## Citation

If you find these models or the associated paper useful, please cite the paper:

```bibtex
@misc{kaushal2024spectracomprehensivestudyternary,
  title={Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models},
  author={Ayush Kaushal and Tejas Pandey and Tejas Vaidhya and Aaryan Bhagat and Irina Rish},
  year={2024},
  eprint={2407.12327},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2407.12327},
}
```
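
For reference, below is a minimal sketch of the same usage without the `pipeline` helper, loading the weights with `AutoModelForCausalLM` and passing the sampling parameters explicitly. The parameter values shown (temperature, top_k, top_p, repetition penalty, max_new_tokens) are illustrative placeholders, not recommendations from the paper.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Same model id as in the Usage section above.
model_id = "SpectraSuite/TriLM_3.9B_Unpacked"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

inputs = tokenizer("Once upon a time", return_tensors="pt").to(model.device)
# Sampling values below are placeholders; tune them for your use case.
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.9,
    repetition_penalty=1.1,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```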