Michael Goin (mgoin)
AI & ML interests
LLM inference optimization, compression, quantization, pruning, distillation
Recent Activity
- published a model 3 minutes ago: nm-testing/Yi-6B-Llama-50-quant-ds-768
- published a model 3 minutes ago: nm-testing/AmberChat-pruned60-quant-ds
- published a model 4 minutes ago: nm-testing/TinyLlama-1.1B-intermediate-step-1431k-3T-gsms8k-pruned50-quant-ds
mgoin's activity
- How to load this model? (2) · #1 opened 7 months ago by Frz614
- Model does not run with VLLM (2) · #3 opened about 2 months ago by aswad546
- Nice model, any info on scripts used to quantize? (1) · #1 opened about 2 months ago by RonanMcGovern
- Add config_format and load_format to vLLM args · #5 opened 3 months ago by mgoin
- Update config.json to use null for sliding_window · #4 opened 3 months ago by mgoin
- Adding `safetensors` variant of this model · #1 opened 3 months ago by SFconvertbot
- Is this the standard GPTQ quantization? (1) · #5 opened 3 months ago by molereddy
- Model weights are not loaded (4) · #3 opened 5 months ago by MarvelousMouse
- Update model card · #1 opened 3 months ago by nm-research
- Add chat_template to tokenizer_config.json · #1 opened 3 months ago by nm-research
- Why is the Pixtral activation function "gelu" when the reference code uses "silu"? (2) · #10 opened 4 months ago by mgoin
- Update tokenizer_config.json with chat_template (3) · #11 opened 4 months ago by mgoin
- Any chance your team is working on a 4-bit Llama-3.2-90B-Vision-Instruct-quantized.w4a16 version? (1) · #1 opened 4 months ago by mrhendrey
- Oom with 24g vram (3) · #1 opened 4 months ago by Klopez
- latest vllm docker (v0.6.2) fail to load (2) · #1 opened 4 months ago by choronz333
- Issue with loading model (1) · #1 opened 5 months ago by xSumukhax
- Can it run on A100/A800 with VLLM? (3) · #1 opened 6 months ago by Parkerlambert123
- weights does not exist when trying to deploy in sagemaker endpoint (1) · #1 opened 6 months ago by LorenzoCevolaniAXA