The output size when deployed in GCP is 1536 instead of 1024
Hello!
The default behaviour gives 1024 because it uses the Dense module from 2_Dense_1024: https://huggingface.co/dunzhang/stella_en_1.5B_v5/blob/main/modules.json#L15-L19
Whereas GCP will likely read the "usual" 2_Dense folder and use that one instead. That folder has the 1536 that you're experiencing.
I see you already created a clone of this model to try and fix it, but I think your fix might be wrong (i.e. you're not using any Dense anymore). I would fix it like this:
- Clone the model
- Rename 2_Dense to 2_Dense_1536
- Rename 2_Dense_1024 to 2_Dense
- Update modules.json to use 2_Dense instead of 2_Dense_1024.
Then both Sentence Transformers and GCP should produce 1024 dimensions via the Dense module (which is important to get the correct performance!)
- Tom Aarsen
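The rename steps listed above can be sketched as a small script run against a local clone of the model. This is only a sketch under assumptions: the function name `swap_dense_folders` and the `backup_name` parameter are hypothetical, and it assumes `modules.json` is a JSON list of entries that each carry a `path` key, as in the linked file. The backup name `2_Dense_1536` follows the steps as written; it is just a label, and later replies in this thread suggest the original `2_Dense` folder may hold a different output size.

```python
import json
from pathlib import Path


def swap_dense_folders(repo_dir, backup_name="2_Dense_1536"):
    """Make the 1024-d Dense module the default `2_Dense` folder in a local clone."""
    repo = Path(repo_dir)

    # Keep the original 2_Dense weights around under a new folder name.
    (repo / "2_Dense").rename(repo / backup_name)
    # Promote the 1024-d projection to the default folder name.
    (repo / "2_Dense_1024").rename(repo / "2_Dense")

    # Point modules.json at the renamed folder.
    modules_path = repo / "modules.json"
    modules = json.loads(modules_path.read_text(encoding="utf-8"))
    for module in modules:
        if module.get("path") == "2_Dense_1024":
            module["path"] = "2_Dense"
    modules_path.write_text(json.dumps(modules, indent=2) + "\n", encoding="utf-8")
```

Calling `swap_dense_folders("path/to/clone")` (a placeholder path) before pushing the clone should leave Sentence Transformers loading the same 1024-d module as before, now from the default folder name.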
Hi Tom, thanks for the response
I looked inside 2_Dense and saw this
"out_features": 8192,
Does this mean the output when this layer is used is 8192 dimensions?
To me, it seems the one-click GCP deployment doesn't use any of the 2_Dense_* layers.
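One way to check what each Dense folder declares is to read the `out_features` value from every `2_Dense*/config.json` in a local clone. A minimal sketch, assuming each Dense folder contains a `config.json` with an `out_features` key like the one quoted above; the helper name `dense_output_sizes` is hypothetical.

```python
import json
from pathlib import Path


def dense_output_sizes(repo_dir):
    """Map each 2_Dense* folder name to the out_features value in its config.json."""
    sizes = {}
    for config_path in sorted(Path(repo_dir).glob("2_Dense*/config.json")):
        config = json.loads(config_path.read_text(encoding="utf-8"))
        sizes[config_path.parent.name] = config["out_features"]
    return sizes
```

Printing `dense_output_sizes("path/to/clone")` (a placeholder path) lists the declared output size per folder, which makes it easy to see which folder corresponds to which embedding dimension.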
Yes. I would advise against using it, because the MTEB score of 1024d is only 0.001 lower than 8192d.
Having said that, I think my original assumption here:
Whereas GCP will likely read the "usual" 2_Dense folder and use that one instead. That folder has the 1536 that you're experiencing.
was wrong. I think GCP perhaps just doesn't use any Dense layer? This will result in worse performance I'm afraid.
@philschmid do you have some experience with this? Or @olivierdehaene due to TEI?
- Tom Aarsen
I took a look at the TEI code and it seems TEI only reads the 1_Pooling layer. But I would definitely appreciate the view of someone who has expertise on that.
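To illustrate why stopping after the pooling step would give 1536 dimensions rather than 1024, here is a toy sketch with dummy numbers. It is not TEI or Sentence Transformers code: mean pooling, the bias-free projection, and the all-zero weights are assumptions chosen only to show the shapes involved.

```python
def mean_pool(token_embeddings):
    """Average token vectors into one sentence vector (same width as the input)."""
    n = len(token_embeddings)
    return [sum(column) / n for column in zip(*token_embeddings)]


def dense(vector, weight):
    """Bias-free linear projection; `weight` has one row per output feature."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weight]


HIDDEN, PROJECTED = 1536, 1024  # the two sizes discussed in this thread

tokens = [[0.1] * HIDDEN for _ in range(4)]          # 4 dummy token vectors
weight = [[0.0] * HIDDEN for _ in range(PROJECTED)]  # dummy Dense weights

pooled = mean_pool(tokens)
print(len(pooled))                 # 1536: what a pooling-only pipeline returns
print(len(dense(pooled, weight)))  # 1024: after the Dense projection
```

If a server applies only the pooling step, the vector keeps the width of the token embeddings; the smaller size only appears once the Dense projection is applied on top.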
@philschmid @olivierdehaene any updates on the answer for Stella on TEI?
Hi @philschmid @olivierdehaene, any workaround or update on this for Stella on TEI?
The length of the output is 1536 instead of 1024. I used the one-click deploy. The dimensions don't match between loading the model for training and for inference. Could you make the model that loads in TEI also use 1024 dims?
I encountered the same problem: I changed the configuration, but the result is still 1536 when deployed through /text-embeddings-inference.
You can use the fork I made; it's 1536 dims for both TEI and loading into SentenceTransformer. https://huggingface.co/bennegeek/stella_en_1.5B_v5/
I believe the performance will be worse if you have 1536 dims, i.e. if you're not using the Dense module.
- Tom Aarsen