- neuralmagic/granite-3.1-2b-instruct-quantized.w4a16 (Text Generation • Updated • 171)
- neuralmagic/granite-3.1-2b-instruct-quantized.w8a8 (Text Generation • Updated • 166)
- neuralmagic/granite-3.1-8b-instruct-quantized.w4a16 (Text Generation • Updated • 109)
- neuralmagic/granite-3.1-8b-instruct-quantized.w8a8 (Text Generation • Updated • 106)
Neural Magic (company • verified)

AI & ML interests: LLMs, optimization, compression, sparsification, quantization, pruning, distillation, NLP, CV
Organization Card
The Future of AI is Open
Neural Magic helps developers accelerate deep learning performance with automated model compression technologies and inference engines. Download our compression-aware inference engines and open-source tools for fast model inference.
- nm-vllm: an enterprise-ready inference system built on the open-source vLLM library for operationalizing performant open-source LLMs at scale
- LLM Compressor: a Hugging Face-native library for applying quantization and sparsity algorithms to LLMs for optimized deployment with vLLM (see the sketch after this list)
- DeepSparse: an inference runtime offering accelerated performance on CPUs, with APIs to integrate ML into your application
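As a rough sketch of the LLM Compressor workflow, the example below applies one-shot GPTQ W4A16 quantization and writes a vLLM-loadable checkpoint. The source model, calibration dataset, and sample counts are illustrative choices, and import paths and argument names may differ between llm-compressor releases.

```python
# Hedged sketch: one-shot W4A16 quantization with LLM Compressor.
# Import paths and argument names can vary between llm-compressor versions.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

# GPTQ recipe: 4-bit weights, 16-bit activations (W4A16), keep lm_head in full precision.
recipe = GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"])

oneshot(
    model="ibm-granite/granite-3.1-2b-instruct",  # placeholder source checkpoint
    dataset="open_platypus",                      # placeholder calibration dataset
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
    output_dir="granite-3.1-2b-instruct-W4A16",   # vLLM-loadable output directory
)
```

The resulting directory can then be loaded by vLLM in the same way as the pre-quantized checkpoints hosted in this profile.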
In this profile we provide accurate model checkpoints compressed with state-of-the-art methods and ready to run in vLLM, using schemes such as W4A16, W8A16, and W8A8 (INT8 and FP8), among others. If you would like help quantizing a model, or have a request for a checkpoint we should add, please open an issue at https://github.com/vllm-project/llm-compressor.
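A minimal sketch of loading one of these checkpoints with vLLM's offline API (assuming vLLM is installed and a GPU with enough memory is available):

```python
# Minimal sketch: running a quantized checkpoint from this profile with vLLM.
from vllm import LLM, SamplingParams

llm = LLM(model="neuralmagic/granite-3.1-8b-instruct-quantized.w4a16")
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Summarize the benefits of W4A16 quantization."], params)
print(outputs[0].outputs[0].text)
```

The same checkpoint can also be exposed as an OpenAI-compatible HTTP endpoint with `vllm serve neuralmagic/granite-3.1-8b-instruct-quantized.w4a16`.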
Collections (13)
2:4 sparse versions of Llama-3.1, including transfer learning
- neuralmagic/Sparse-Llama-3.1-8B-ultrachat_200k-2of4-FP8-dynamic (Text Generation • Updated • 47 • 1)
- neuralmagic/Sparse-Llama-3.1-8B-gsm8k-2of4 (Text Generation • Updated • 61 • 1)
- neuralmagic/Sparse-Llama-3.1-8B-2of4 (Text Generation • Updated • 900 • 61)
- neuralmagic/Sparse-Llama-3.1-8B-ultrachat_200k-2of4 (Text Generation • Updated • 22 • 1)
Spaces (8)
- 🔥 Quant Llms Text Generation: Quantized vs. Unquantized LLM: Text Generation Comparison (Running • 2)
- 🏃 Llama 3 8B Chat Deepsparse (Running on CPU Upgrade)
- 🏃 Llama 2 Sparse Transfer Chat Deepsparse (Sleeping)
- ⚡ DeepSparse Sentiment Analysis (Runtime error • 1)
- 🏢 DeepSparse Named Entity Recognition (Runtime error • 6)
- 📚 Sparse Llama Gsm8k (Running on CPU Upgrade • 16)
Models (263)
- neuralmagic/granite-3.1-2b-instruct-quantized.w8a8 (Text Generation • Updated • 166)
- neuralmagic/granite-3.1-8b-instruct-FP8-dynamic (Text Generation • Updated • 5)
- neuralmagic/granite-3.1-8b-instruct-quantized.w8a8 (Text Generation • Updated • 106)
- neuralmagic/granite-3.1-8b-instruct-quantized.w4a16 (Text Generation • Updated • 109)
- neuralmagic/granite-3.1-8b-base-quantized.w8a8 (Text Generation • Updated)
- neuralmagic/granite-3.1-8b-base-quantized.w4a16 (Text Generation • Updated)
- neuralmagic/granite-3.1-8b-base-FP8-dynamic (Text Generation • Updated)
- neuralmagic/granite-3.1-2b-base-FP8-dynamic (Text Generation • Updated)
- neuralmagic/granite-3.1-2b-base-quantized.w8a8 (Text Generation • Updated)
- neuralmagic/granite-3.1-2b-base-quantized.w4a16 (Text Generation • Updated)
Datasets (12)
- neuralmagic/mmlu_it (Viewer • Updated • 14k • 54)
- neuralmagic/mmlu_fr (Viewer • Updated • 14k • 364)
- neuralmagic/mmlu_th (Viewer • Updated • 14k • 65)
- neuralmagic/mmlu_de (Viewer • Updated • 14k • 91)
- neuralmagic/mmlu_es (Viewer • Updated • 14k • 72)
- neuralmagic/mmlu_hi (Viewer • Updated • 14k • 61)
- neuralmagic/mmlu_pt (Viewer • Updated • 14k • 63)
- neuralmagic/quantized-llama-3.1-leaderboard-v2-evals (Viewer • Updated • 247k • 231)
- neuralmagic/quantized-llama-3.1-humaneval-evals (Viewer • Updated • 73.8k • 54)
- neuralmagic/quantized-llama-3.1-arena-hard-evals (Viewer • Updated • 6k • 109)