- EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
  Paper • 2407.11062 • Published • 8
- GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
  Paper • 2210.17323 • Published • 7
- AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
  Paper • 2306.00978 • Published • 8

Collections including paper arxiv:2407.11062

- Llm Pricing
  Space • 📊 • 244
- Can You Run It? LLM version
  Space • 🚀 • 809
- Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems
  Paper • 2312.15234 • Published • 3
- EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
  Paper • 2407.11062 • Published • 8

- ChenMnZ/Llama-3-8b-instruct-EfficientQAT-w4g128
  Text Generation • Updated • 2
- ChenMnZ/Llama-3-8b-instruct-EfficientQAT-w2g64
  Text Generation • Updated • 3
- ChenMnZ/Llama-3-8b-instruct-EfficientQAT-w2g128
  Text Generation • Updated • 7
- ChenMnZ/Llama-3-8b-EfficientQAT-w4g128
  Text Generation • Updated • 10
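
The suffixes in these checkpoint names encode the EfficientQAT quantization setting: wXgY denotes X-bit weights with a quantization group size of Y (e.g. w4g128 is 4-bit weights, group size 128; w2g64 is 2-bit weights, group size 64). Below is a minimal, hypothetical loading sketch using the standard Transformers API; it assumes the repo ships a checkpoint that `AutoModelForCausalLM.from_pretrained` can consume directly (e.g. a GPTQ-packed format), and the official EfficientQAT codebase may provide its own loading utilities instead.

```python
# Hypothetical sketch, not the official EfficientQAT loading path.
# Assumes the Hub repo above can be loaded directly through Transformers
# (e.g. a GPTQ-packed checkpoint); otherwise use the EfficientQAT codebase.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "ChenMnZ/Llama-3-8b-instruct-EfficientQAT-w4g128"  # 4-bit weights, group size 128

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    device_map="auto",  # place the (quantized) layers on available GPU/CPU automatically
)

prompt = "Explain weight-only quantization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```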

- QLoRA: Efficient Finetuning of Quantized LLMs
  Paper • 2305.14314 • Published • 45
- EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
  Paper • 2407.11062 • Published • 8
- Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models
  Paper • 2407.12327 • Published • 77
- BitNet a4.8: 4-bit Activations for 1-bit LLMs
  Paper • 2411.04965 • Published • 51

- SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions
  Paper • 2403.16627 • Published • 20
- Phased Consistency Model
  Paper • 2405.18407 • Published • 46
- Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
  Paper • 2405.12981 • Published • 28
- Imp: Highly Capable Large Multimodal Models for Mobile Devices
  Paper • 2405.12107 • Published • 25

- CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
  Paper • 2404.15653 • Published • 26
- MoDE: CLIP Data Experts via Clustering
  Paper • 2404.16030 • Published • 12
- MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
  Paper • 2405.12130 • Published • 45
- Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
  Paper • 2405.12981 • Published • 28

- FinTral: A Family of GPT-4 Level Multimodal Financial Large Language Models
  Paper • 2402.10986 • Published • 76
- bigcode/starcoder2-15b
  Text Generation • Updated • 23.7k • 568
- Zephyr: Direct Distillation of LM Alignment
  Paper • 2310.16944 • Published • 121
- mixedbread-ai/mxbai-rerank-large-v1
  Text Classification • Updated • 25.1k • 105