- EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
  Paper • 2407.11062 • Published • 8
- GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
  Paper • 2210.17323 • Published • 7
- AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
  Paper • 2306.00978 • Published • 8

Collections including paper arxiv:2407.11062

- Llm Pricing
  Space • 📊 • 244
- Can You Run It? LLM version
  Space • 🚀 • 809
- Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems
  Paper • 2312.15234 • Published • 3
- EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
  Paper • 2407.11062 • Published • 8

- ChenMnZ/Llama-3-8b-instruct-EfficientQAT-w4g128
  Text Generation • Updated • 2
- ChenMnZ/Llama-3-8b-instruct-EfficientQAT-w2g64
  Text Generation • Updated • 3
- ChenMnZ/Llama-3-8b-instruct-EfficientQAT-w2g128
  Text Generation • Updated • 7
- ChenMnZ/Llama-3-8b-EfficientQAT-w4g128
  Text Generation • Updated • 10
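
The suffixes in these checkpoint names encode the EfficientQAT quantization setting: wXgY denotes X-bit weights with a quantization group size of Y (e.g. w4g128 is 4-bit weights, group size 128; w2g64 is 2-bit weights, group size 64). Below is a minimal, hypothetical loading sketch using the standard Transformers API; it assumes the repo ships a checkpoint that `AutoModelForCausalLM.from_pretrained` can consume directly (e.g. a GPTQ-packed format), and the official EfficientQAT codebase may provide its own loading utilities instead.

```python
# Hypothetical sketch, not the official EfficientQAT loading path.
# Assumes the Hub repo above can be loaded directly through Transformers
# (e.g. a GPTQ-packed checkpoint); otherwise use the EfficientQAT codebase.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "ChenMnZ/Llama-3-8b-instruct-EfficientQAT-w4g128"  # 4-bit weights, group size 128

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    device_map="auto",  # place the (quantized) layers on available GPU/CPU automatically
)

prompt = "Explain weight-only quantization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```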

- QLoRA: Efficient Finetuning of Quantized LLMs
  Paper • 2305.14314 • Published • 45
- EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
  Paper • 2407.11062 • Published • 8
- Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models
  Paper • 2407.12327 • Published • 77
- BitNet a4.8: 4-bit Activations for 1-bit LLMs
  Paper • 2411.04965 • Published • 51

- SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions
  Paper • 2403.16627 • Published • 20
- Phased Consistency Model
  Paper • 2405.18407 • Published • 46
- Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
  Paper • 2405.12981 • Published • 28
- Imp: Highly Capable Large Multimodal Models for Mobile Devices
  Paper • 2405.12107 • Published • 25

- CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
  Paper • 2404.15653 • Published • 26
- MoDE: CLIP Data Experts via Clustering
  Paper • 2404.16030 • Published • 12
- MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
  Paper • 2405.12130 • Published • 45
- Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
  Paper • 2405.12981 • Published • 28

- FinTral: A Family of GPT-4 Level Multimodal Financial Large Language Models
  Paper • 2402.10986 • Published • 76
- bigcode/starcoder2-15b
  Text Generation • Updated • 23.7k • 568
- Zephyr: Direct Distillation of LM Alignment
  Paper • 2310.16944 • Published • 121
- mixedbread-ai/mxbai-rerank-large-v1
  Text Classification • Updated • 25.1k • 105