Need4Speed


AI & ML interests

None defined yet.

Recent Activity


wenhuach posted an update 4 days ago
Check out the [DeepSeek-R1 INT2 model](https://huggingface.co/OPEA/DeepSeek-R1-int2-mixed-sym-inc). This 200GB DeepSeek-R1 model shows only about a 2% drop in MMLU, though it's quite slow due to a kernel issue.

| Benchmark     | BF16   | INT2-mixed |
| ------------- | ------ | ---------- |
| mmlu          | 0.8514 | 0.8302     |
| hellaswag     | 0.6935 | 0.6657     |
| winogrande    | 0.7932 | 0.7940     |
| arc_challenge | 0.6212 | 0.6084     |
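
Numbers like these come from lm-evaluation-harness; below is a minimal sketch of reproducing them with the harness's Python API, assuming the checkpoint loads through the standard transformers backend. Only the repo id comes from the post; everything else is harness boilerplate, and you'll need enough memory for the ~200GB model.

```python
# Minimal sketch: score the INT2 checkpoint with lm-evaluation-harness
# (pip install lm-eval). Assumes the model loads via the standard
# transformers backend and that you have memory for the ~200GB weights.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    model_args="pretrained=OPEA/DeepSeek-R1-int2-mixed-sym-inc,trust_remote_code=True",
    tasks=["mmlu", "hellaswag", "winogrande", "arc_challenge"],
    batch_size=1,
)

# Print per-task metric dicts (accuracy etc.)
for task, metrics in results["results"].items():
    print(task, metrics)
```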
wenhuach posted an update 15 days ago
wenhuach posted an update 2 months ago
wenhuach posted an update 3 months ago
wenhuach posted an update 3 months ago
This week, OPEA Space released several new INT4 models, including:
nvidia/Llama-3.1-Nemotron-70B-Instruct-HF
allenai/OLMo-2-1124-13B-Instruct
THUDM/glm-4v-9b
AIDC-AI/Marco-o1
and several others.
Let us know which models you'd like prioritized for quantization, and we'll do our best to make it happen!

https://huggingface.co/OPEA
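
These OPEA INT4 checkpoints appear to come out of Intel's AutoRound pipeline (the `-inc` suffix hints at the Intel Neural Compressor family). Here is a minimal sketch of quantizing a model yourself with the AutoRound Python API; the tiny stand-in model and the 4-bit / group-128 / symmetric settings are illustrative assumptions, not the exact OPEA recipe.

```python
# Minimal sketch: produce an INT4 checkpoint with AutoRound (pip install auto-round).
# The model id below is a small stand-in; swap in the model you want quantized.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "facebook/opt-125m"  # stand-in; any causal LM works
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# 4-bit, group size 128, symmetric quantization -- assumed settings,
# chosen to mirror the "int4-sym" naming on the OPEA repos.
autoround = AutoRound(model, tokenizer, bits=4, group_size=128, sym=True)
autoround.quantize()
autoround.save_quantized("./opt-125m-int4", format="auto_round")
```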
wenhuach posted an update 3 months ago
The OPEA space just released nearly 20 INT4 models, for example QwQ-32B-Preview, Llama-3.2-11B-Vision-Instruct, Qwen2.5, and Llama 3.1. Check out https://huggingface.co/OPEA
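
A minimal sketch of running one of these INT4 checkpoints, assuming it loads through the standard transformers path with the matching quantization kernels installed; the repo id below is a hypothetical placeholder, so substitute a real one from the OPEA page.

```python
# Minimal sketch: run an OPEA INT4 model with transformers.
# "OPEA/some-model-int4-sym-inc" is a hypothetical placeholder --
# substitute a real repo id from https://huggingface.co/OPEA.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "OPEA/some-model-int4-sym-inc"  # hypothetical placeholder
model = AutoModelForCausalLM.from_pretrained(
    repo_id, device_map="auto", torch_dtype="auto"
)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```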
loubnabnl posted an update 3 months ago
Making SmolLM2 reproducible: open-sourcing our training & evaluation toolkit 🛠️ https://github.com/huggingface/smollm/

- Pre-training code with nanotron
- Evaluation suite with lighteval
- Synthetic data generation using distilabel (powers our new SFT dataset HuggingFaceTB/smoltalk)
- Post-training scripts with TRL & the alignment handbook
- On-device tools with llama.cpp for summarization, rewriting & agents

Apache 2.0 licensed. V2 pre-training data mix coming soon!

Which other tools should we add next?
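
To make the post-training piece concrete, here is a minimal SFT sketch on smoltalk with TRL; it assumes the current SFTTrainer API and uses a small instruct checkpoint plus default hyperparameters as stand-ins, so treat it as a starting point rather than the SmolLM2 recipe.

```python
# Minimal sketch: supervised fine-tuning on smoltalk with TRL (pip install trl).
# The checkpoint and hyperparameters are illustrative stand-ins,
# not the recipe from the SmolLM2 toolkit.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# smoltalk is conversational ("messages" column); SFTTrainer applies
# the tokenizer's chat template automatically for such datasets.
dataset = load_dataset("HuggingFaceTB/smoltalk", "all", split="train")

trainer = SFTTrainer(
    model="HuggingFaceTB/SmolLM2-135M-Instruct",  # small stand-in with a chat template
    train_dataset=dataset,
    args=SFTConfig(output_dir="smollm2-sft"),
)
trainer.train()
```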