Stefano Fiorucci PRO

anakin87

AI & ML interests

Contributing to Haystack LLM framework 🏗️. Language Models: orchestration, post-training, synthetic data...

Organizations

deepset · Blog-explorers · ZeroGPU Explorers · Hugging Face Discord Community

anakin87's activity

New activity in google/gemma-2-9b 4 days ago

Fine-tuning Hyperparameters

#27 opened 7 months ago by tanliboy
replied to their post 6 days ago
upvoted an article 6 days ago

Article: Fine-tune ModernBERT for RAG with Synthetic Data
By sdiazlor
posted an update 6 days ago
New Italian Small Language Models: Gemma Neogenesis collection 💎🌍🇮🇹

I am happy to release two new language models for the Italian Language!

💪 Gemma 2 9B Neogenesis ITA
anakin87/gemma-2-9b-neogenesis-ita
Building on the impressive work by VAGO Solutions, I applied Direct Preference Optimization with a mix of Italian and English data.
Using Spectrum, I trained 20% of model layers.

📊 Evaluated on the Open ITA LLM leaderboard (mii-llm/open_ita_llm_leaderboard), this model achieves strong performance.
To beat it on this benchmark, you'd need a 27B model 😎
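A minimal inference sketch for trying the model, assuming the Hugging Face transformers library with chat-template support; the prompt and generation settings are illustrative, not part of the original post:

```python
# Sketch: chat-style generation with the released 9B model.
# Assumes a GPU with enough memory for a 9B model in bfloat16.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="anakin87/gemma-2-9b-neogenesis-ita",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Gemma 2 chat models expect alternating user/assistant turns (no system role).
messages = [{"role": "user", "content": "Spiegami in breve cos'è il Rinascimento."}]
outputs = pipe(messages, max_new_tokens=256)
print(outputs[0]["generated_text"][-1]["content"])
```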


🤏 Gemma 2 2B Neogenesis ITA
anakin87/gemma-2-2b-neogenesis-ita
This smaller variant is fine-tuned from the original Gemma 2 2B it (instruction-tuned) model by Google.
Through a combination of Supervised Fine-Tuning and Direct Preference Optimization, I trained 25% of the layers using Spectrum.

📈 Compared to the original model, it shows improved Italian proficiency, which is good for its small size.


Both models were developed during the recent #gemma competition on Kaggle.
📓 Training code: https://www.kaggle.com/code/anakin87/post-training-gemma-for-italian-and-beyond
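For illustration, a heavily condensed sketch of this kind of post-training with TRL's DPOTrainer and Spectrum-style selective layer training; the base checkpoint ID, layer selection, dataset, and hyperparameters below are assumptions, not the actual recipe from the notebook:

```python
# Rough sketch: DPO with TRL, training only a few selected layers (Spectrum-style).
# All names below (base model, layers, dataset) are placeholders for illustration.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "VAGOsolutions/SauerkrautLM-gemma-2-9b-it"  # assumed starting checkpoint
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="bfloat16")
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Spectrum scans the model and selects the most informative ~20-25% of layers;
# here we mimic that by freezing everything except a hand-picked placeholder list.
trainable_prefixes = ["model.layers.30.", "model.layers.35.", "model.layers.40."]
for name, param in model.named_parameters():
    param.requires_grad = any(name.startswith(p) for p in trainable_prefixes)

# A public preference dataset in the chosen/rejected format expected by DPOTrainer.
train_dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="gemma-2-9b-neogenesis-ita", beta=0.1, num_train_epochs=1),
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()
```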


🙏 Thanks @FinancialSupport and mii-llm for the help during evaluation.
reacted to tomaarsen's post with ❤️ 11 days ago
🏎️ Today I'm introducing a method to train static embedding models that run 100x to 400x faster on CPU than common embedding models, while retaining 85%+ of the quality! Including 2 fully open models: training scripts, datasets, metrics.

We apply our recipe to train 2 Static Embedding models that we release today! We release:
2️⃣ an English Retrieval model and a general-purpose Multilingual similarity model (e.g. classification, clustering, etc.), both Apache 2.0
🧠 my modern training strategy: ideation -> dataset choice -> implementation -> evaluation
📜 my training scripts, using the Sentence Transformers library (a condensed sketch follows this list)
📊 my Weights & Biases reports with losses & metrics
📕 my list of 30 training and 13 evaluation datasets
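As a rough sketch of such a training setup (not the actual released scripts; the tokenizer, dataset, dimensions, and hyperparameters are assumptions), a static embedding model can be initialized and trained with the Sentence Transformers trainer roughly like this:

```python
# Sketch: a tokenizer-based StaticEmbedding module trained with an in-batch
# negatives loss wrapped in MatryoshkaLoss. Everything here is illustrative.
from datasets import load_dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss
from sentence_transformers.models import StaticEmbedding
from tokenizers import Tokenizer

# Mean-pooled bag of token embeddings, no Transformer layers.
static = StaticEmbedding(
    Tokenizer.from_pretrained("google-bert/bert-base-uncased"),
    embedding_dim=1024,
)
model = SentenceTransformer(modules=[static])

# (query, answer) pairs; in-batch negatives, Matryoshka dims allow truncation later.
train_dataset = load_dataset("sentence-transformers/natural-questions", split="train")
loss = MatryoshkaLoss(
    model,
    MultipleNegativesRankingLoss(model),
    matryoshka_dims=[1024, 512, 256, 128, 64, 32],
)

trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
trainer.train()
model.save_pretrained("static-embedding-sketch")
```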

The 2 Static Embedding models have the following properties:
🏎️ Extremely fast, e.g. 107500 sentences per second on a consumer CPU, compared to 270 for 'all-mpnet-base-v2' and 56 for 'gte-large-en-v1.5'
0️⃣ Zero active parameters: No Transformer blocks, no attention, not even a matrix multiplication. Super speed!
📏 No maximum sequence length! Embed texts at any length (note: longer texts may embed worse)
📏 Linear instead of quadratic complexity: 2x longer text takes 2x longer, instead of 2.5x or more.
🪆 Matryoshka support: allows you to truncate embeddings with minimal performance loss (e.g. 4x smaller with a 0.56% perf. decrease for English Similarity tasks); a usage sketch follows this list
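A minimal usage sketch, assuming the released 1024-dimensional embeddings; the texts and the truncation dimension are illustrative:

```python
# Sketch: CPU encoding with the English retrieval model, with Matryoshka truncation.
from sentence_transformers import SentenceTransformer

# truncate_dim keeps the first 256 of the 1024 dimensions (4x smaller vectors).
model = SentenceTransformer(
    "sentence-transformers/static-retrieval-mrl-en-v1",
    device="cpu",
    truncate_dim=256,
)

queries = ["How do static embedding models work?"]
documents = [
    "Static embedding models average precomputed token vectors, with no attention.",
    "Espresso is brewed by forcing hot water through finely ground coffee.",
]

query_embeddings = model.encode(queries)
document_embeddings = model.encode(documents)
print(model.similarity(query_embeddings, document_embeddings))  # higher score = more similar
```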

Check out the full blogpost if you'd like to 1) use these lightning-fast models or 2) learn how to train them with consumer-level hardware: https://huggingface.co/blog/static-embeddings

The blogpost contains a lengthy list of possible advancements; I'm very confident that our 2 models are only the tip of the iceberg, and we may be able to get even better performance.

Alternatively, check out the models:
* sentence-transformers/static-retrieval-mrl-en-v1
* sentence-transformers/static-similarity-mrl-multilingual-v1