wenhua cheng

wenhuach

wenhuach21

AI & ML interests

Model Compression, CV

wenhuach's activity

reacted to their post with 🚀 2 days ago

Post

2090

Are we the only providers of INT4 quantized models for Llama 3.2 VL?
OPEA/Llama-3.2-90B-Vision-Instruct-int4-sym-inc
OPEA/Llama-3.2-11B-Vision-Instruct-int4-sym-inc

replied to their post 7 days ago

You can try auto-round-fast xxx, which runs faster at the cost of a slight accuracy drop, or auto-round-fast xxx --nsamples 1 --iters 1 for very fast execution without algorithm tuning.

replied to their post 7 days ago

Thank you for your suggestion. As our focus is on algorithm development and our computational resources are limited, we currently lack the bandwidth to support a large number of models. If you come across any models that would benefit from quantization, feel free to comment on any models under OPEA. We will make an effort to prioritize and quantize them if resources allow.

reacted to their post with 🔥👀 8 days ago

Post

1772

AutoRound has demonstrated strong results even at 2-bit precision for VLMs such as Qwen2-VL-72B. Check it out here: OPEA/Qwen2-VL-72B-Instruct-int2-sym-inc.

4 replies

reacted to their post with ❤️ 16 days ago

Post

329

This week, OPEA Space released several new INT4 models, including:
nvidia/Llama-3.1-Nemotron-70B-Instruct-HF
allenai/OLMo-2-1124-13B-Instruct
THUDM/glm-4v-9b
AIDC-AI/Marco-o1
and several others.
Let us know which models you'd like prioritized for quantization, and we'll do our best to make it happen!

https://huggingface.co/OPEA

3 replies

replied to their post 16 days ago

Sure, we will give it a try.

posted an update 19 days ago

Post

329

3 replies

reacted to their post with 🚀 26 days ago

Post

977

OPEA Space just released nearly 20 INT4 models, for example QwQ-32B-Preview,
Llama-3.2-11B-Vision-Instruct, Qwen2.5, and Llama 3.1. Check out https://huggingface.co/OPEA

posted an update 5 months ago

Post

649

Looking for a better INT4 algorithm for Llama 3.1? For the 8B model, AutoRound achieves a higher average across 10 zero-shot tasks, scoring 63.93 versus 63.15 for AWQ. Notably, on the MMLU task it achieved 66.72 compared to 65.25, and on the ARC-C task it scored 52.13 against 50.94. For further details and comparisons, visit the leaderboard at Intel/low_bit_open_llm_leaderboard.
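
To make the size of those gaps explicit, here is a small Python sketch that subtracts the AWQ score from the AutoRound score for each figure quoted in the post. The dictionary layout and the helper name score_gaps are illustrative assumptions; only the six numbers come from the post.

```python
# Scores quoted in the post for the Llama 3.1 8B model at INT4.
# Each entry is (AutoRound score, AWQ score).
scores = {
    "average of 10 zero-shot tasks": (63.93, 63.15),
    "MMLU": (66.72, 65.25),
    "ARC-C": (52.13, 50.94),
}


def score_gaps(table):
    """Return AutoRound minus AWQ for every entry, rounded to 2 decimals."""
    return {name: round(auto - awq, 2) for name, (auto, awq) in table.items()}


gaps = score_gaps(scores)
for name, gap in gaps.items():
    print(f"{name}: +{gap:.2f} points")
```

The gaps come out to 0.78 points on the average, 1.47 on MMLU, and 1.19 on ARC-C.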

posted an update 6 months ago

Post

537

Check out AutoRound, a SOTA LLM quantization algorithm for 2-4 bits that adds no inference overhead to any model.
paper: https://arxiv.org/abs/2309.05516
github: https://github.com/intel/auto-round
low-bit leaderboard: https://huggingface.co/spaces/Intel/low-bit-leaderboard
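
A rough sketch of why a tuned rounding scheme need not add inference overhead, assuming the rounding-offset formulation described in the linked paper: each weight is divided by a scale, a small per-weight offset in [-0.5, 0.5] is added before rounding, and the result is clipped to the signed integer range. This is not the AutoRound implementation. The function names and the hand-picked offsets are hypothetical, and a real tuning procedure would learn the offsets rather than take them as input.

```python
def quantize_sym(weights, bits=4, offsets=None):
    """Symmetric quantization of one group of float weights.

    offsets: optional per-weight rounding perturbations in [-0.5, 0.5].
    They are a hypothetical stand-in for values a tuning step would learn.
    Returns (integer codes, scale).
    """
    qmax = 2 ** (bits - 1) - 1   # 7 for 4 bits, 1 for 2 bits
    qmin = -(2 ** (bits - 1))    # -8 for 4 bits, -2 for 2 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    if offsets is None:
        offsets = [0.0] * len(weights)  # plain round-to-nearest baseline
    codes = []
    for w, v in zip(weights, offsets):
        q = round(w / scale + v)              # Python rounds ties to even
        codes.append(max(qmin, min(qmax, q)))  # clip to the signed range
    return codes, scale


def dequantize(codes, scale):
    """Inference-time reconstruction; identical with or without offsets."""
    return [q * scale for q in codes]


weights = [0.70, -0.31, 0.12, -0.04]
baseline_codes, scale = quantize_sym(weights, bits=4)
# Hand-picked offsets, for illustration only.
tuned_codes, _ = quantize_sym(weights, bits=4, offsets=[0.0, -0.4, 0.4, 0.0])

print("baseline codes:", baseline_codes)
print("tuned codes:   ", tuned_codes)
print("reconstructed: ", dequantize(tuned_codes, scale))
```

The offsets only change which integer codes get stored. dequantize never sees them, so the inference path is the same as for plain round-to-nearest. Passing bits=2 or bits=3 narrows the integer range accordingly.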