Stefano Fiorucci PRO

anakin87

AI & ML interests

Contributing to Haystack LLM framework 🏗️. Language Models: orchestration, post-training, synthetic data...

Organizations

deepset · Blog-explorers · ZeroGPU Explorers · Hugging Face Discord Community

anakin87's activity

New activity in google/gemma-2-9b 4 days ago

Fine-tuning Hyperparameters

#27 opened 7 months ago by tanliboy
replied to their post 6 days ago
upvoted an article 6 days ago

Article: Fine-tune ModernBERT for RAG with Synthetic Data
By sdiazlor
posted an update 6 days ago
New Italian Small Language Models: Gemma Neogenesis collection 💎🌍🇮🇹

I am happy to release two new language models for the Italian Language!

💪 Gemma 2 9B Neogenesis ITA
anakin87/gemma-2-9b-neogenesis-ita
Building on the impressive work by VAGO Solutions, I applied Direct Preference Optimization with a mix of Italian and English data.
Using Spectrum, I trained 20% of model layers.

📊 Evaluated on the Open ITA LLM leaderboard (mii-llm/open_ita_llm_leaderboard), this model achieves strong performance.
To beat it on this benchmark, you'd need a 27B model 😎
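A minimal inference sketch for trying the model, assuming the Hugging Face transformers library with chat-template support; the prompt and generation settings are illustrative, not part of the original post:

```python
# Sketch: chat-style generation with the released 9B model.
# Assumes a GPU with enough memory for a 9B model in bfloat16.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="anakin87/gemma-2-9b-neogenesis-ita",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Gemma 2 chat models expect alternating user/assistant turns (no system role).
messages = [{"role": "user", "content": "Spiegami in breve cos'è il Rinascimento."}]
outputs = pipe(messages, max_new_tokens=256)
print(outputs[0]["generated_text"][-1]["content"])
```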


🤏 Gemma 2 2B Neogenesis ITA
anakin87/gemma-2-2b-neogenesis-ita
This smaller variant is fine-tuned from the original Gemma 2 2B it (instruction-tuned) model by Google.
Through a combination of Supervised Fine-Tuning and Direct Preference Optimization, I trained 25% of the layers using Spectrum.

📈 Compared to the original model, it shows improved Italian proficiency, which is good for its small size.


Both models were developed during the recent #gemma competition on Kaggle.
📓 Training code: https://www.kaggle.com/code/anakin87/post-training-gemma-for-italian-and-beyond
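For illustration, a heavily condensed sketch of this kind of post-training with TRL's DPOTrainer and Spectrum-style selective layer training; the base checkpoint ID, layer selection, dataset, and hyperparameters below are assumptions, not the actual recipe from the notebook:

```python
# Rough sketch: DPO with TRL, training only a few selected layers (Spectrum-style).
# All names below (base model, layers, dataset) are placeholders for illustration.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "VAGOsolutions/SauerkrautLM-gemma-2-9b-it"  # assumed starting checkpoint
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="bfloat16")
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Spectrum scans the model and selects the most informative ~20-25% of layers;
# here we mimic that by freezing everything except a hand-picked placeholder list.
trainable_prefixes = ["model.layers.30.", "model.layers.35.", "model.layers.40."]
for name, param in model.named_parameters():
    param.requires_grad = any(name.startswith(p) for p in trainable_prefixes)

# A public preference dataset in the chosen/rejected format expected by DPOTrainer.
train_dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="gemma-2-9b-neogenesis-ita", beta=0.1, num_train_epochs=1),
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()
```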


🙏 Thanks @FinancialSupport and mii-llm for the help during evaluation.
reacted to tomaarsen's post with ❤️ 11 days ago
🏎️ Today I'm introducing a method to train static embedding models that run 100x to 400x faster on CPU than common embedding models, while retaining 85%+ of the quality! Including 2 fully open models: training scripts, datasets, metrics.

We apply our recipe to train 2 Static Embedding models that we release today! We release:
2️⃣ an English Retrieval model and a general-purpose Multilingual similarity model (e.g. classification, clustering, etc.), both Apache 2.0
🧠 my modern training strategy: ideation -> dataset choice -> implementation -> evaluation
📜 my training scripts, using the Sentence Transformers library (a condensed sketch follows this list)
📊 my Weights & Biases reports with losses & metrics
📕 my list of 30 training and 13 evaluation datasets
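As a rough sketch of such a training setup (not the actual released scripts; the tokenizer, dataset, dimensions, and hyperparameters are assumptions), a static embedding model can be initialized and trained with the Sentence Transformers trainer roughly like this:

```python
# Sketch: a tokenizer-based StaticEmbedding module trained with an in-batch
# negatives loss wrapped in MatryoshkaLoss. Everything here is illustrative.
from datasets import load_dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss
from sentence_transformers.models import StaticEmbedding
from tokenizers import Tokenizer

# Mean-pooled bag of token embeddings, no Transformer layers.
static = StaticEmbedding(
    Tokenizer.from_pretrained("google-bert/bert-base-uncased"),
    embedding_dim=1024,
)
model = SentenceTransformer(modules=[static])

# (query, answer) pairs; in-batch negatives, Matryoshka dims allow truncation later.
train_dataset = load_dataset("sentence-transformers/natural-questions", split="train")
loss = MatryoshkaLoss(
    model,
    MultipleNegativesRankingLoss(model),
    matryoshka_dims=[1024, 512, 256, 128, 64, 32],
)

trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
trainer.train()
model.save_pretrained("static-embedding-sketch")
```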

The 2 Static Embedding models have the following properties:
🏎️ Extremely fast, e.g. 107500 sentences per second on a consumer CPU, compared to 270 for 'all-mpnet-base-v2' and 56 for 'gte-large-en-v1.5'
0️⃣ Zero active parameters: No Transformer blocks, no attention, not even a matrix multiplication. Super speed!
📏 No maximum sequence length! Embed texts at any length (note: longer texts may embed worse)
📏 Linear instead of quadratic complexity: 2x longer text takes 2x longer, instead of 2.5x or more.
🪆 Matryoshka support: allows you to truncate embeddings with minimal performance loss (e.g. 4x smaller with a 0.56% perf. decrease for English Similarity tasks); a usage sketch follows this list
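A minimal usage sketch, assuming the released 1024-dimensional embeddings; the texts and the truncation dimension are illustrative:

```python
# Sketch: CPU encoding with the English retrieval model, with Matryoshka truncation.
from sentence_transformers import SentenceTransformer

# truncate_dim keeps the first 256 of the 1024 dimensions (4x smaller vectors).
model = SentenceTransformer(
    "sentence-transformers/static-retrieval-mrl-en-v1",
    device="cpu",
    truncate_dim=256,
)

queries = ["How do static embedding models work?"]
documents = [
    "Static embedding models average precomputed token vectors, with no attention.",
    "Espresso is brewed by forcing hot water through finely ground coffee.",
]

query_embeddings = model.encode(queries)
document_embeddings = model.encode(documents)
print(model.similarity(query_embeddings, document_embeddings))  # higher score = more similar
```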

Check out the full blogpost if you'd like to 1) use these lightning-fast models or 2) learn how to train them with consumer-level hardware: https://huggingface.co/blog/static-embeddings

The blogpost contains a lengthy list of possible advancements; I'm very confident that our 2 models are only the tip of the iceberg, and we may be able to get even better performance.

Alternatively, check out the models:
* sentence-transformers/static-retrieval-mrl-en-v1
* sentence-transformers/static-similarity-mrl-multilingual-v1