Apple Silicon Mac support
#66 opened by quantoser
Does this model work on Macs with Apple Silicon chips? I'm running it on a Mac Pro M1, and it gets stuck with:
```
UserWarning: Using the model-agnostic default `max_length` (=20) to control the
generation length. We recommend setting `max_new_tokens` to control the maximum
length of the generation.
  warnings.warn(
```
The process just sits there eating up CPU and memory, but no output is ever produced.
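(For reference, the warning itself is harmless; it only says generation will stop after the default 20 tokens. It can be addressed by passing `max_new_tokens` explicitly. A minimal sketch, assuming a standard transformers setup; the model id `google/gemma-7b` is an assumption, adjust to the checkpoint you are using:)

```python
# Minimal sketch: set max_new_tokens explicitly, as the warning
# suggests, instead of relying on the default max_length=20.
# "google/gemma-7b" is an assumed model id.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b")
model = AutoModelForCausalLM.from_pretrained("google/gemma-7b")

inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```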
Hi @quantoser!
In which precision are you running the generation? The 7B model needs ~30 GB of RAM just to be loaded on the CPU in float32. Can you try loading the model in bfloat16?
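Something like the following should work (a minimal sketch; the model id `google/gemma-7b` and the optional move to the `mps` device are assumptions on my part):

```python
# Minimal sketch: load the 7B model in bfloat16 instead of float32,
# roughly halving the ~30 GB memory footprint.
# "google/gemma-7b" is an assumed model id; adjust to your checkpoint.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-7b",
    torch_dtype=torch.bfloat16,  # ~2 bytes per parameter instead of 4
)
# Optionally move to Apple's GPU backend; bfloat16 support on "mps"
# depends on your PyTorch/macOS versions, so fall back to CPU if needed.
if torch.backends.mps.is_available():
    model = model.to("mps")
```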
Yes, it does! For example, you can use gemma.cpp (https://github.com/google/gemma.cpp) or Ollama; both run on Mac.
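If you go the Ollama route, you can also query the local server it runs from Python. A minimal sketch, assuming Ollama is installed and the model has been pulled; the model tag `gemma:7b` is an assumption, check `ollama list` for what you actually have:

```python
# Minimal sketch against Ollama's local REST API.
# Assumes the Ollama server is running on its default port and the
# model was pulled first (e.g. `ollama pull gemma:7b`).
import json
import urllib.request

payload = json.dumps({
    "model": "gemma:7b",       # assumed tag; adjust to your local model
    "prompt": "Why is the sky blue?",
    "stream": False,           # return one JSON object instead of a stream
}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```

On the command line, `ollama run gemma:7b` gives you the same thing interactively.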