Lina-Speech: Gated Linear Attention is a Fast and Parameter-Efficient Learner for text-to-speech synthesis Paper • 2410.23320 • Published 11 days ago • 6
view article Article Transformers.js v3: WebGPU support, new models & tasks, and more… 19 days ago • 58
Llama3-8B-1.58 Collection A trio of powerful models: fine-tuned from Llama3-8b-Instruct, with BitNet architecture! • 3 items • Updated Sep 14 • 12
Nemotron 4 340B Collection Nemotron-4: open models for Synthetic Data Generation (SDG). Includes Base, Instruct, and Reward models. • 4 items • Updated 7 days ago • 157
Embedding Model Datasets Collection A curated subset of the datasets that work out of the box with Sentence Transformers: https://huggingface.co/datasets?other=sentence-transformers • 67 items • Updated Jul 3 • 74
view article Article LLM Comparison/Test: Llama 3 Instruct 70B + 8B HF/GGUF/EXL2 (20 versions tested and compared!) By wolfram • Apr 24 • 58
Gemma release Collection Groups the Gemma models released by the Google team. • 40 items • Updated Jul 31 • 325
Canonical models Collection This collection lists all the historical (pre-"Hub") canonical model checkpoints, i.e. repos that were not under an org or user namespace • 68 items • Updated Feb 13 • 13
SigLIP Collection Contrastive (sigmoid) image-text models from https://arxiv.org/abs/2303.15343 • 8 items • Updated Jul 31 • 34
Switch-Transformers release Collection This release included various MoE (Mixture of expert) models, based on the T5 architecture . The base models use from 8 to 256 experts. • 9 items • Updated Jul 31 • 15
zephyr story Collection sources mentioned by hf.co/thomwolf tweet: x.com/Thom_Wolf/status/1720503998518640703 • 8 items • Updated Jan 24 • 15
Distil-Whisper Models Collection The first version of the Distil-Whisper models released with the Distil-Whisper paper. • 4 items • Updated Mar 21 • 35