Speculative Decoding
#1 opened by FrenzyBiscuit
I'm trying to use this model as the draft model for speculative decoding with the 32B, and it dramatically slows generation down.
On the other hand, the regular Qwen 2.5 1.5B dramatically speeds up the regular Qwen 2.5 32B model.
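For context, a minimal sketch of how a draft model is typically wired in for speculative (assisted) decoding, assuming the Hugging Face transformers `assistant_model` API; the repo names and generation settings below are illustrative assumptions, not necessarily the exact setup used here:

```python
# Minimal assisted-generation sketch (repo names are placeholders).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

target_id = "Qwen/Qwen2.5-32B-Instruct"   # target model (assumed repo name)
draft_id = "Qwen/Qwen2.5-1.5B-Instruct"   # draft model (assumed repo name)

tokenizer = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(
    target_id, torch_dtype=torch.bfloat16, device_map="auto"
)
draft = AutoModelForCausalLM.from_pretrained(
    draft_id, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer(
    "Explain speculative decoding briefly.", return_tensors="pt"
).to(target.device)

# The draft model proposes tokens and the target verifies them.
# If the draft's distribution diverges from the target's, most proposals
# are rejected and the extra verification work can make generation slower
# than running the target alone.
out = target.generate(**inputs, assistant_model=draft, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```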
Was this model trained on the same v0.2 data as the 32B v0.2?