Michael Goin (mgoin)
AI & ML interests
LLM inference optimization, compression, quantization, pruning, distillation
Recent Activity
- published a model 3 minutes ago: nm-testing/Yi-6B-Llama-50-quant-ds-768
- published a model 3 minutes ago: nm-testing/AmberChat-pruned60-quant-ds
- published a model 4 minutes ago: nm-testing/TinyLlama-1.1B-intermediate-step-1431k-3T-gsms8k-pruned50-quant-ds
mgoin's activity
- How to load this model? (2) · #1 opened 7 months ago by Frz614
- Model does not run with VLLM (2) · #3 opened about 2 months ago by aswad546
- Nice model, any info on scripts used to quantize? (1) · #1 opened about 2 months ago by RonanMcGovern
- Add config_format and load_format to vLLM args · #5 opened 3 months ago by mgoin
- Update config.json to use null for sliding_window · #4 opened 3 months ago by mgoin
- Adding `safetensors` variant of this model · #1 opened 3 months ago by SFconvertbot
- Is this the standard GPTQ quantization? (1) · #5 opened 3 months ago by molereddy
- Model weights are not loaded (4) · #3 opened 5 months ago by MarvelousMouse
- Update model card · #1 opened 3 months ago by nm-research
- Add chat_template to tokenizer_config.json · #1 opened 3 months ago by nm-research
- Why is the Pixtral activation function "gelu" when the reference code uses "silu"? (2) · #10 opened 4 months ago by mgoin
- Update tokenizer_config.json with chat_template (3) · #11 opened 4 months ago by mgoin
- Any chance your team is working on a 4-bit Llama-3.2-90B-Vision-Instruct-quantized.w4a16 version? (1) · #1 opened 4 months ago by mrhendrey
- Oom with 24g vram (3) · #1 opened 4 months ago by Klopez
- latest vllm docker (v0.6.2) fail to load (2) · #1 opened 4 months ago by choronz333
- Issue with loading model (1) · #1 opened 5 months ago by xSumukhax
- Can it run on A100/A800 with VLLM? (3) · #1 opened 6 months ago by Parkerlambert123
- weights does not exist when trying to deploy in sagemaker endpoint (1) · #1 opened 6 months ago by LorenzoCevolaniAXA