- neuralmagic/granite-3.1-2b-instruct-quantized.w4a16 (Text Generation • Updated • 171)
- neuralmagic/granite-3.1-2b-instruct-quantized.w8a8 (Text Generation • Updated • 166)
- neuralmagic/granite-3.1-8b-instruct-quantized.w4a16 (Text Generation • Updated • 109)
- neuralmagic/granite-3.1-8b-instruct-quantized.w8a8 (Text Generation • Updated • 106)
Neural Magic (company • verified)

AI & ML interests: LLMs, optimization, compression, sparsification, quantization, pruning, distillation, NLP, CV
Organization Card
The Future of AI is Open
Neural Magic helps developers accelerate deep learning performance with automated model compression technologies and inference engines. Download our compression-aware inference engines and open-source tools for fast model inference.
- nm-vllm: an enterprise-ready inference system built on the open-source vLLM library for operationalizing performant open-source LLMs at scale
- LLM Compressor: a Hugging Face-native library for applying quantization and sparsity algorithms to LLMs for optimized deployment with vLLM (see the sketch after this list)
- DeepSparse: an inference runtime offering accelerated performance on CPUs, with APIs to integrate ML into your application
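As a rough sketch of the LLM Compressor workflow, the example below applies one-shot GPTQ W4A16 quantization and writes a vLLM-loadable checkpoint. The source model, calibration dataset, and sample counts are illustrative choices, and import paths and argument names may differ between llm-compressor releases.

```python
# Hedged sketch: one-shot W4A16 quantization with LLM Compressor.
# Import paths and argument names can vary between llm-compressor versions.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

# GPTQ recipe: 4-bit weights, 16-bit activations (W4A16), keep lm_head in full precision.
recipe = GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"])

oneshot(
    model="ibm-granite/granite-3.1-2b-instruct",  # placeholder source checkpoint
    dataset="open_platypus",                      # placeholder calibration dataset
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
    output_dir="granite-3.1-2b-instruct-W4A16",   # vLLM-loadable output directory
)
```

The resulting directory can then be loaded by vLLM in the same way as the pre-quantized checkpoints hosted in this profile.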
In this profile we provide accurate model checkpoints compressed with state-of-the-art methods and ready to run in vLLM, using schemes such as W4A16, W8A16, and W8A8 (INT8 and FP8), among others. If you would like help quantizing a model, or have a request for a checkpoint we should add, please open an issue at https://github.com/vllm-project/llm-compressor.
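A minimal sketch of loading one of these checkpoints with vLLM's offline API (assuming vLLM is installed and a GPU with enough memory is available):

```python
# Minimal sketch: running a quantized checkpoint from this profile with vLLM.
from vllm import LLM, SamplingParams

llm = LLM(model="neuralmagic/granite-3.1-8b-instruct-quantized.w4a16")
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Summarize the benefits of W4A16 quantization."], params)
print(outputs[0].outputs[0].text)
```

The same checkpoint can also be exposed as an OpenAI-compatible HTTP endpoint with `vllm serve neuralmagic/granite-3.1-8b-instruct-quantized.w4a16`.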
Collections (13)
2:4 sparse versions of Llama-3.1, including transfer learning
- neuralmagic/Sparse-Llama-3.1-8B-ultrachat_200k-2of4-FP8-dynamic (Text Generation • Updated • 47 • 1)
- neuralmagic/Sparse-Llama-3.1-8B-gsm8k-2of4 (Text Generation • Updated • 61 • 1)
- neuralmagic/Sparse-Llama-3.1-8B-2of4 (Text Generation • Updated • 900 • 61)
- neuralmagic/Sparse-Llama-3.1-8B-ultrachat_200k-2of4 (Text Generation • Updated • 22 • 1)
Spaces (8)
- 🔥 Quant Llms Text Generation: Quantized vs. Unquantized LLM: Text Generation Comparison (Running • 2)
- 🏃 Llama 3 8B Chat Deepsparse (Running on CPU Upgrade)
- 🏃 Llama 2 Sparse Transfer Chat Deepsparse (Sleeping)
- ⚡ DeepSparse Sentiment Analysis (Runtime error • 1)
- 🏢 DeepSparse Named Entity Recognition (Runtime error • 6)
- 📚 Sparse Llama Gsm8k (Running on CPU Upgrade • 16)
Models (263)
- neuralmagic/granite-3.1-2b-instruct-quantized.w8a8 (Text Generation • Updated • 166)
- neuralmagic/granite-3.1-8b-instruct-FP8-dynamic (Text Generation • Updated • 5)
- neuralmagic/granite-3.1-8b-instruct-quantized.w8a8 (Text Generation • Updated • 106)
- neuralmagic/granite-3.1-8b-instruct-quantized.w4a16 (Text Generation • Updated • 109)
- neuralmagic/granite-3.1-8b-base-quantized.w8a8 (Text Generation • Updated)
- neuralmagic/granite-3.1-8b-base-quantized.w4a16 (Text Generation • Updated)
- neuralmagic/granite-3.1-8b-base-FP8-dynamic (Text Generation • Updated)
- neuralmagic/granite-3.1-2b-base-FP8-dynamic (Text Generation • Updated)
- neuralmagic/granite-3.1-2b-base-quantized.w8a8 (Text Generation • Updated)
- neuralmagic/granite-3.1-2b-base-quantized.w4a16 (Text Generation • Updated)
Datasets (12)
- neuralmagic/mmlu_it (Viewer • Updated • 14k • 54)
- neuralmagic/mmlu_fr (Viewer • Updated • 14k • 364)
- neuralmagic/mmlu_th (Viewer • Updated • 14k • 65)
- neuralmagic/mmlu_de (Viewer • Updated • 14k • 91)
- neuralmagic/mmlu_es (Viewer • Updated • 14k • 72)
- neuralmagic/mmlu_hi (Viewer • Updated • 14k • 61)
- neuralmagic/mmlu_pt (Viewer • Updated • 14k • 63)
- neuralmagic/quantized-llama-3.1-leaderboard-v2-evals (Viewer • Updated • 247k • 231)
- neuralmagic/quantized-llama-3.1-humaneval-evals (Viewer • Updated • 73.8k • 54)
- neuralmagic/quantized-llama-3.1-arena-hard-evals (Viewer • Updated • 6k • 109)