Apple Silicon Mac support

by quantoser - opened

Does this model work on Macs with Apple Silicon chips? I'm running it on a Mac Pro M1 and it gets stuck with:

UserWarning: Using the model-agnostic default max_length (=20) to control the generation length. We recommend setting max_new_tokens to control the maximum length of the generation.

The process just sits there eating up CPU and memory, but no output is ever produced.

Hi @quantoser !
Which precision are you running the generation in? The 7B model needs ~30GB of RAM just to be loaded on the CPU in float32. Could you try loading the model in bfloat16 instead?
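
The ~30GB figure comes from float32 storing 4 bytes per parameter (7 × 10^9 × 4 bytes ≈ 28GB for the weights alone), and bfloat16 halves that. Below is a minimal sketch, assuming the Hugging Face `transformers` and PyTorch packages and the `google/gemma-7b` checkpoint (the thread does not name the exact model id). The helper names are hypothetical. It also passes `max_new_tokens`, which is what the warning above recommends instead of the default `max_length=20`. Whether bfloat16 runs on the MPS (Metal) backend depends on your PyTorch and macOS versions. If it fails there, keep the model on CPU.

```python
# Bytes needed per parameter for common dtypes.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Rough size of the model weights alone, in GB (10**9 bytes).

    Ignores activations, the KV cache and framework overhead, so real
    peak memory use is higher than this.
    """
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def load_in_bfloat16(model_id: str = "google/gemma-7b"):
    """Hypothetical helper: load a causal LM in bfloat16.

    The default model_id is an assumption; use the checkpoint you actually run.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # torch_dtype=torch.bfloat16 halves weight memory compared with float32.
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    # Use Apple's Metal (MPS) backend when PyTorch reports it, else stay on CPU.
    device = "mps" if torch.backends.mps.is_available() else "cpu"
    return tokenizer, model.to(device), device


def generate(tokenizer, model, device, prompt: str, max_new_tokens: int = 64) -> str:
    """Hypothetical helper: run one generation with an explicit length limit."""
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    # max_new_tokens caps only the newly generated tokens, as the warning suggests.
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

Usage: `weight_memory_gb(7e9, "float32")` gives 28.0 and `weight_memory_gb(7e9, "bfloat16")` gives 14.0. To actually generate, call `tokenizer, model, device = load_in_bfloat16()` and then `generate(tokenizer, model, device, "Why is the sky blue?")`.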

Yes, it does!
For example, you can use Gemma.cpp or Ollama; both run on Mac.
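
If you take the Ollama route, the sketch below shows one way to call it from Python through its local REST API using only the standard library. It assumes the Ollama server is running on its default port 11434 and that the model was pulled under the `gemma:7b` tag (e.g. with `ollama pull gemma:7b`). Both of those are assumptions, so adjust them to your setup. The function names are hypothetical.

```python
import json
import urllib.request

# Assumed default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "gemma:7b") -> urllib.request.Request:
    """Build a non-streaming generate request (model tag is an assumption)."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ollama_generate(prompt: str, model: str = "gemma:7b") -> str:
    """Send the request and return the generated text from the JSON reply."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

Splitting out `build_request` keeps the payload easy to inspect without a running server. `ollama_generate("Why is the sky blue?")` only works once Ollama is up and the model has been pulled.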
