inference not working in any environment
#5
by
LoreVitCon
- opened
Tried both instruct and instruct-cuda; in both cases I get errors (e.g. missing Triton, which I can't install) or a CUDA version that requires a GPU.
Hi! Thank you for your interest in phi-3!
For the small model, because we use a custom Triton kernel for block-sparse attention, there is a dependency on having a GPU as well as on Triton.
There is active work going on for enabling llama.cpp support for this (see this issue).
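Until CPU support lands, one way to check up front whether the current environment meets the small model's requirements is a quick probe for Triton and a CUDA device before attempting to load the model. A minimal sketch (the helper name `can_run_phi3_small` is hypothetical, not part of the repo):

```python
import importlib.util


def can_run_phi3_small() -> bool:
    """Return True if this environment can run Phi-3-small:
    the Triton package must be importable (needed for the custom
    block-sparse attention kernel) and torch must see a CUDA GPU."""
    # Triton only ships wheels for Linux + CUDA, so this fails on
    # unsupported platforms without raising.
    if importlib.util.find_spec("triton") is None:
        return False
    # torch itself must be installed before we can query for a GPU.
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    return torch.cuda.is_available()


if __name__ == "__main__":
    if not can_run_phi3_small():
        print("Missing Triton and/or a CUDA GPU; Phi-3-small will not load.")
```

Running this before `from_pretrained` gives a clear diagnostic instead of a mid-load kernel or import error.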
bapatra
changed discussion status to
closed