Could you please provide more details about model training,
such as the training dataset, loss function, etc.?
Actually, the dataset also comes from bge, https://github.com/FlagOpen/FlagEmbedding/tree/master/FlagEmbedding#fine-tune , and you can find a sample at https://huggingface.co/datasets/Shitao/bge-reranker-data .
The trainer is also an LLM implementation of the bge reranker; you can find the loss function at https://github.com/FlagOpen/FlagEmbedding/blob/master/FlagEmbedding/reranker/modeling.py
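For readers who do not want to open the linked file right away, here is a minimal sketch of a groupwise cross-entropy loss of the kind commonly used for rerankers. It assumes that each query's candidates are scored together, with the positive passage first in every group. That layout, and the names `group_cross_entropy` and `group_size`, are illustrative assumptions and not taken from the thread, so please check them against the linked modeling.py.

```python
import math


def group_cross_entropy(scores, group_size):
    """Mean cross-entropy over groups of scores.

    Assumption: `scores` is a flat list in which every consecutive run of
    `group_size` values belongs to one query, and index 0 of each run is
    the positive passage.
    """
    if group_size <= 0 or len(scores) % group_size != 0:
        raise ValueError("len(scores) must be a positive multiple of group_size")

    losses = []
    for start in range(0, len(scores), group_size):
        group = scores[start:start + group_size]
        # Numerically stable log-sum-exp over the group's scores.
        m = max(group)
        log_z = m + math.log(sum(math.exp(s - m) for s in group))
        # Negative log-probability of the positive (index 0).
        losses.append(log_z - group[0])
    return sum(losses) / len(losses)
```

With this formulation the loss shrinks as the positive's score rises above the negatives in its group, so only the relative ordering inside a group matters.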
Thanks for your reply.
We tried applying an LLM to the task of search relevance judgement, but the performance is quite poor. Our experiment settings are:
- Dataset: an open-source Chinese search relevance dataset, https://modelscope.cn/datasets/iic/QBQTC/summary
- Backbone: Qwen2 1.5B
- Fine-tuning: full-parameter
- Epochs: 1
- Model structure: a classification head on top of the last token's hidden state
Are the settings above the same as in your implementation?
We tested our code on a sentence classification task and it worked well, but on sentence-pair classification the performance deteriorated.
It seems that an encoder-only (BERT-like) structure performs much better than a decoder-only structure, even though we add the classification head on the last token.
Could you provide some clues to help us figure out this problem?
Sincerely
Actually, my model is not a classification model; it is a regression model, https://huggingface.co/neofung/LdIR-Qwen2-reranker-1.5B/discussions/4#67160e1224552afb830b1dd8 .
Also, please double-check that the padding is correct. Is the last token \n(198)?
The expected last five tokens are: <|im_end|>(151645), \n(198), <|im_start|>(151644), assistant(77091), \n(198)
If you applied an incorrect padding strategy, or the context was truncated to the max model length, you may get the wrong last token's hidden state.
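A quick way to check this is to look at the real (non-padding) tokens of each row before pooling. The sketch below uses plain Python lists so it runs without any library. The helper names are hypothetical, and it assumes the attention mask is 1 for real tokens and 0 for padding. The token ids in `EXPECTED_TAIL` are the ones quoted above.

```python
# Token ids quoted above: <|im_end|>, \n, <|im_start|>, assistant, \n
EXPECTED_TAIL = [151645, 198, 151644, 77091, 198]


def last_token_index(attention_mask):
    """Index of the last real token in one row (works for left or right padding)."""
    for i in range(len(attention_mask) - 1, -1, -1):
        if attention_mask[i] == 1:
            return i
    raise ValueError("row contains no real tokens")


def pool_last_token(hidden_states, attention_mask):
    """Pick, for every row, the hidden state at the last real token."""
    return [
        row[last_token_index(mask)]
        for row, mask in zip(hidden_states, attention_mask)
    ]


def ends_with_expected_tail(input_ids, attention_mask):
    """True if the row's real tokens end with EXPECTED_TAIL.

    A False result suggests the prompt suffix was truncated or the
    padding side does not match how the last token is selected.
    """
    real = [t for t, m in zip(input_ids, attention_mask) if m == 1]
    return real[-len(EXPECTED_TAIL):] == EXPECTED_TAIL
```

Note that simply taking position -1 of every row is only correct with left padding. With right padding, position -1 is a pad token, which is why the index is derived from the attention mask here.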