Why is the text_encoder model in the OpenCLIP (CLIP ViT-H) repo 3.94G, while the one in this repo is 1.36G?
#93
by
MetaInsight
- opened
The model card states that OpenCLIP ViT/H is used, but the file sizes are different.
Does anyone know why?
OpenCLIP: https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K/tree/main
Yeah, that's a big question. I couldn't project the encoded hidden states, because there are no projection weights in this repo.
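One way to dig into both points (the size gap and the missing projection weights) is to list what each checkpoint actually contains. Below is a minimal sketch, not anything taken from either repo: it sums bytes per top-level key prefix and checks whether any tensor name mentions a projection. The function names `size_by_prefix` and `has_projection`, the toy tensor names, and the `"projection"` search string are all assumptions made for illustration. A plausible but unconfirmed explanation for the gap is that the larger file holds both the vision and text towers, while `text_encoder` here holds only the text tower; a per-prefix breakdown like this would show whether that is the case.

```python
from collections import defaultdict
from math import prod


def size_by_prefix(tensors, depth=1):
    """Sum tensor sizes in bytes, grouped by the first `depth` parts of each name.

    `tensors` maps a tensor name to (shape tuple, bytes per element).
    """
    totals = defaultdict(int)
    for name, (shape, itemsize) in tensors.items():
        prefix = ".".join(name.split(".")[:depth])
        totals[prefix] += prod(shape) * itemsize
    return dict(totals)


def has_projection(tensors, needle="projection"):
    """Return True if any tensor name contains `needle` (assumed naming)."""
    return any(needle in name for name in tensors)


# Toy stand-ins with made-up names and shapes, NOT the real checkpoints.
toy_full = {
    "visual.proj": ((8, 4), 4),
    "visual.blocks.0.weight": ((8, 8), 4),
    "text_projection": ((6, 4), 4),
    "transformer.blocks.0.weight": ((6, 6), 4),
}
toy_text_only = {
    "text_model.encoder.layers.0.weight": ((6, 6), 4),
}

full_sizes = size_by_prefix(toy_full)
text_sizes = size_by_prefix(toy_text_only)
print(full_sizes)
print(text_sizes)
print(has_projection(toy_full), has_projection(toy_text_only))
```

For a real checkpoint, the same mapping could be built from a loaded PyTorch state dict, for example `{k: (tuple(v.shape), v.element_size()) for k, v in state_dict.items()}`, assuming the file loads as a flat dict of tensors.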