How to avoid meaning deviation?
I have a scenario where I check the similarity between my source sentence and several targets.
Even though the best-matching sentences are the closest semantically, they are not compatible in meaning: Jane vs. John. How can I avoid this kind of deviation? Maybe by using a different model designed for this purpose?
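To make the failure mode concrete, here is a minimal sketch with made-up toy vectors (real sentence embeddings have hundreds of dimensions, and the sentences in the comments are hypothetical). Because names like "Jane" and "John" map to nearly identical embedding directions, sentences that differ only in the name can still score almost 1.0 in cosine similarity:

```python
import math

def cosine(a, b):
    # Standard cosine similarity between two dense vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Hypothetical 4-dim "embeddings" chosen to illustrate the problem.
src  = [0.91, 0.30, 0.12, 0.25]  # "Jane lives in London."
tgt1 = [0.90, 0.31, 0.13, 0.24]  # "John lives in London."  (wrong person!)
tgt2 = [0.40, 0.80, 0.35, 0.10]  # "Jane moved to Paris."

print(round(cosine(src, tgt1), 3))  # near 1.0 despite the name mismatch
print(round(cosine(src, tgt2), 3))  # lower, despite being about the right person
```

The point is that the vector geometry alone has no notion of "must be the same entity", so a pure nearest-neighbour search over embeddings cannot distinguish these cases.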
Thanks!
Hey, did you find any solution to this problem?
No, unfortunately.
In this case I'd try different models and perhaps various methods (e.g. a w2v approach - although it is an older method, it might perform better than SOTA LLMs for certain problems). You can also try combining models. If you combine models, you can do majority voting at the end (like a classic Random Forest algorithm), average the results, etc., depending on the prediction problem. Or, as you said, maybe there are models out there that account for this (if you find one, let me know).
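The combining step above can be sketched in a few lines. This assumes you already have per-model outputs (the labels and scores below are made up for illustration); majority voting fits classification-style outputs, averaging fits raw similarity scores:

```python
from collections import Counter

def majority_vote(labels):
    # Pick the most common label across models (ties resolve to the
    # label that reached the top count first).
    return Counter(labels).most_common(1)[0][0]

def average_scores(scores):
    # Simple mean of per-model similarity scores.
    return sum(scores) / len(scores)

# Hypothetical outputs from three different models for one (source, target) pair.
model_labels = ["match", "no_match", "match"]
model_scores = [0.92, 0.41, 0.88]

print(majority_vote(model_labels))            # "match"
print(round(average_scores(model_scores), 3))
```

A weighted average (weighting each model by its validation accuracy) is a common refinement of the plain mean.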
As a final experiment, I'd up-weight words that are closer to the source. It is perhaps not the most sophisticated solution, but it might do the trick.
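One reading of that word-weighting idea is to mix the embedding similarity with a lexical-overlap term, so that a mismatched name drags the score down. This is just a sketch of that interpretation; the mixing weight `alpha` and the example scores are made up and would need tuning on real data:

```python
def token_overlap(src, tgt):
    # Jaccard overlap over lowercased word sets: a crude lexical check
    # that catches mismatched names ("Jane" vs "John"), which embedding
    # models tend to blur together.
    a, b = set(src.lower().split()), set(tgt.lower().split())
    return len(a & b) / len(a | b)

def reweighted(embedding_sim, src, tgt, alpha=0.7):
    # alpha is a hypothetical mixing weight between the embedding
    # similarity and the lexical overlap.
    return alpha * embedding_sim + (1 - alpha) * token_overlap(src, tgt)

src = "Jane lives in London"
# High embedding similarity, but the name mismatch lowers the final score:
print(round(reweighted(0.99, src, "John lives in London"), 3))
# Full lexical overlap keeps a moderate embedding score intact:
print(round(reweighted(0.70, src, "Jane lives in London today"), 3))
```

A step further in the same spirit would be to run named-entity recognition on both sentences and hard-reject pairs whose entities disagree, rather than only down-weighting them.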