Split/shard support

#65
by SixOpen - opened
No description provided.
ggml.ai org

Nice! thanks for the PR - this looks very comprehensive!

cc: @phymbert - would you like to give this a review from the split functionality PoV?

ggml.ai org

Gladly! Update: support for i-quants is also virtually ready, though it would benefit from a few more additions/fallbacks to handle some gotchas before submitting: https://huggingface.co/spaces/SixOpen/gguf-my-repo-sp_imat/tree/main I have access to Zero, but it doesn't seem to support the Docker SDK, so I can't load any layers onto the GPU. It works regardless (in a few hours, given the free space's CPU; granted, that is no issue on the original space).
image.png: split result produced using the space mirroring this PR

reach-vb changed pull request status to merged
ggml.ai org

Hey @SixOpen - thanks a lot for this! I'm merging this now. Can you open a new PR with matrix support? 🤗

Awesome :) of course, will do so!

ggml.ai org

Hey hey! @SixOpen - Just double-checking: are you still planning on opening a PR for iMatrix support? I think it could be quite cool to add! 🤗

Sure thing! 😄 I might do it this weekend, though there's an impediment regarding the Dockerfile and putting the GPU to work, for which I haven't been able to figure out a fix that follows best practices yet (other than through Dev Mode), and I haven't been able to replicate it locally either. Will look into it in a bit! :)

ggml.ai org

Interesting, I think in your start.sh you should have LLAMA_CUDA=1 make -j quantize gguf-split imatrix so that it compiles with CUDA support.
Feel free to email me at vaibhav [at] huggingface [dot] co or message on twitter if you want to chat more about this, happy to help debug issues with you.
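
A minimal sketch of how the build step suggested above might be arranged in a start.sh. The make targets and the LLAMA_CUDA=1 flag come from the comment; the helper names (build_command, detect_mode) and the nvidia-smi check are assumptions made for illustration, not part of the actual space. The script only prints the command it would run, so it can be tried outside a llama.cpp checkout.

```shell
#!/bin/sh
# Hypothetical start.sh fragment (illustrative only, not the space's real script).
set -eu

# Make targets named in the comment above.
TARGETS="quantize gguf-split imatrix"

# Print the make invocation for the given mode ("cuda" or anything else for CPU).
build_command() {
  if [ "$1" = "cuda" ]; then
    echo "LLAMA_CUDA=1 make -j $TARGETS"
  else
    echo "make -j $TARGETS"
  fi
}

# Assumption: nvidia-smi being on PATH is used as a rough signal
# that a GPU is visible inside the container.
detect_mode() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    echo "cuda"
  else
    echo "cpu"
  fi
}

# Dry run: show the command instead of executing it.
echo "Would run in the llama.cpp checkout: $(build_command "$(detect_mode)")"
```

To actually build, the final echo would be replaced by running the chosen command from inside the llama.cpp directory. The CPU fallback is a design choice of this sketch so that the same script still starts on hardware without a GPU.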

Sounds good, let's do that then :) I looked at the latest commits here by the way, they're cool stuff! I was indeed using that CUDA flag, but good news: it turns out the space itself was the issue, and after moving to a new one everything is working well. Another odd thing, similar to this, happened in another iteration of the space: in spite of factory rebuilding, once the auth expires it results in an endless redirect loop.
