@tomaarsen on Hugging Face: "🚀 Sentence Transformers v3.1 is out! Featuring a hard negatives mining…"

tomaarsen

posted an update Sep 11, 2024

🚀 Sentence Transformers v3.1 is out! Featuring a hard negatives mining utility to get better models out of your data, a new strong loss function, training with streaming datasets, custom modules, bug fixes, small additions and docs changes. Here are the details:

⛏ Hard Negatives Mining Utility: Hard negatives are texts that are rather similar to some anchor text (e.g. a question), but are not the correct match. They're difficult for a model to distinguish from the correct answer, often resulting in a stronger model after training.
📉 New loss function: This loss function works very well for symmetric tasks (e.g. clustering, classification, finding similar texts/paraphrases) and a bit less so for asymmetric tasks (e.g. question-answer retrieval).
💾 Streaming datasets: You can now train with the datasets.IterableDataset, which doesn't require downloading the full dataset to disk before training. It's as simple as passing "streaming=True" to "datasets.load_dataset".
🧩 Custom Modules: Model authors can now customize a lot more of the components that make up Sentence Transformer models, allowing for a lot more flexibility (e.g. multi-modal, model-specific quirks, etc.).
✨ New arguments to several methods: encode_multi_process gets a progress bar, push_to_hub can now be done to different branches, and CrossEncoders can be downloaded to specific cache directories.
🐛 Bug fixes: Too many to name here, check out the release notes!
📝 Documentation: This release has a particular focus on clarifying the batch samplers in the Package Reference.
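
To make the hard negatives idea concrete, here's a minimal pure-Python sketch of the concept. It is not the library's own utility: the function name `mine_hard_negatives_sketch`, the toy 2D "embeddings", and the choice of cosine similarity are all assumptions made for illustration. For each anchor it ranks every candidate except the known positive by similarity and keeps the closest ones as hard negatives.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mine_hard_negatives_sketch(anchor_vecs, corpus_vecs, positive_ids, num_negatives=1):
    """Hypothetical helper: for each anchor, return the indices of the
    corpus entries most similar to it, excluding its known positive."""
    mined = []
    for anchor, pos_id in zip(anchor_vecs, positive_ids):
        # Score every candidate except the correct match.
        scored = [
            (cosine(anchor, vec), idx)
            for idx, vec in enumerate(corpus_vecs)
            if idx != pos_id
        ]
        # Most similar first: these are the "hardest" wrong answers.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        mined.append([idx for _, idx in scored[:num_negatives]])
    return mined


# Toy embeddings (made up for this example).
corpus_vecs = [
    [1.0, 0.0],   # 0: correct answer for question A
    [0.9, 0.3],   # 1: similar to answer 0, but not correct for either question
    [0.0, 1.0],   # 2: correct answer for question B
    [-1.0, 0.1],  # 3: unrelated text
]
anchor_vecs = [[1.0, 0.1], [0.1, 1.0]]  # questions A and B
positive_ids = [0, 2]

hard_negatives = mine_hard_negatives_sketch(anchor_vecs, corpus_vecs, positive_ids)
print(hard_negatives)
```

In practice the embeddings would come from a real model, and you would usually add safeguards such as skipping candidates that score too close to the positive, since those may be unlabeled correct answers rather than true negatives.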

Check out the full release notes here ⭐: https://github.com/UKPLab/sentence-transformers/releases/tag/v3.1.0

I'm very excited to hear your feedback, and I'm looking forward to the future changes that I have planned, such as ONNX inference! I'm also open to suggestions for new features: feel free to send me your ideas.

louisbrulenaudet

Sep 11, 2024

A dream update! I was just about to start working on a Hard Negatives Mining function, @tomaarsen. I'm gaining hours of sleep thanks to you and the rest of the community 😅

I'm going to test it out as soon as possible!

PS: I'd like to take this opportunity to thank you again for the new documentation, which is just perfect.

tomaarsen

Sep 11, 2024

Glad to hear it! Feel free to send over feedback if you have any. It's always quite valuable for new features/docs.
