Stefano Fiorucci

anakin87

AI & ML interests

Contributing to Haystack, the LLM Framework 🏗️. NLP / LLMs.


anakin87's activity

posted an update 16 days ago
๐Œ๐ฒ ๐Ÿ๐ข๐ซ๐ฌ๐ญ ๐œ๐จ๐ฆ๐ฆ๐ฎ๐ง๐ข๐ญ๐ฒ ๐š๐ซ๐ญ๐ข๐œ๐ฅ๐ž! ๐’๐ž๐ฅ๐ž๐œ๐ญ๐ข๐ฏ๐ž ๐Ÿ๐ข๐ง๐ž-๐ญ๐ฎ๐ง๐ข๐ง๐  ๐ฐ๐ข๐ญ๐ก ๐’๐ฉ๐ž๐œ๐ญ๐ซ๐ฎ๐ฆ ๐ŸŽฏ

Full walkthrough on how to get started with Spectrum and TRL for efficient fine-tuning.
📔 👣 https://huggingface.co/blog/anakin87/spectrum

---

Looking to fine-tune Language Models efficiently and save on computational resources?

One popular method is QLoRA, which quantizes the original model and trains low-rank adapters on top.
It's quite effective and uses less GPU memory than full fine-tuning.

However, QLoRa applies Low-Rank Adaptation uniformly across the entire model.

What if we could identify the most informative layers and only fine-tune those? 🤔

This is exactly what Spectrum does! 👇

🔬 Spectrum analyzes the weight matrices for all layers in a Language Model and calculates a Signal-to-Noise Ratio (SNR) for each one.
(It uses Random Matrix Theory and the Marchenko-Pastur distribution to distinguish signal from noise.)
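To make the idea concrete, here is a toy SNR score based on the Marchenko-Pastur noise edge (a simplification for illustration, not Spectrum's exact implementation): singular values of a weight matrix above the edge expected from pure noise count as signal, the rest as noise.

```python
import numpy as np

def snr_marchenko_pastur(weight):
    """Crude signal-to-noise ratio for a weight matrix.

    Singular values above the Marchenko-Pastur noise edge are treated
    as signal, the rest as noise. Toy sketch, not Spectrum's code.
    """
    n, m = weight.shape
    q = min(n, m) / max(n, m)          # aspect ratio of the matrix
    sigma = weight.std()               # rough entry-wise noise scale
    # Largest singular value expected from a pure-noise matrix (MP upper edge)
    noise_edge = sigma * np.sqrt(max(n, m)) * (1 + np.sqrt(q))
    s = np.linalg.svd(weight, compute_uv=False)
    signal, noise = s[s > noise_edge], s[s <= noise_edge]
    if signal.size == 0:
        return 0.0
    return float(signal.sum() / max(noise.sum(), 1e-12))

rng = np.random.default_rng(0)
pure_noise = rng.normal(size=(256, 512))
low_rank = 10 * np.outer(rng.normal(size=256), rng.normal(size=512))
# A matrix carrying structure scores higher than pure noise
print(snr_marchenko_pastur(pure_noise) < snr_marchenko_pastur(pure_noise + low_rank))
```

A layer whose weights are indistinguishable from random noise gets a low score and can be safely frozen.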

🎯 Based on a chosen percentage (say, 25%), Spectrum selects the most informative layers of each type (mlp.down_proj, self_attn.o_proj, etc.).

You can then ❄️ freeze the rest of the model and focus your 🏋️‍♂️ training on the chosen layers.
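In practice, the freezing step boils down to toggling requires_grad. A minimal PyTorch sketch with a toy model (the layer names mimic the real ones; the selected list stands in for a hypothetical Spectrum scan output):

```python
import torch.nn as nn

class Block(nn.Module):
    """Toy transformer block with the projection names Spectrum targets."""
    def __init__(self, dim=8):
        super().__init__()
        self.self_attn = nn.ModuleDict({"o_proj": nn.Linear(dim, dim)})
        self.mlp = nn.ModuleDict({"down_proj": nn.Linear(dim, dim)})

model = nn.ModuleDict({"layers": nn.ModuleList([Block() for _ in range(4)])})

# Hypothetical output of a Spectrum scan: the top-SNR layers to train
selected = ["layers.0.mlp.down_proj", "layers.3.self_attn.o_proj"]

# Freeze everything except the selected layers
for name, param in model.named_parameters():
    param.requires_grad = any(name.startswith(s + ".") for s in selected)

trainable = {n for n, p in model.named_parameters() if p.requires_grad}
print(sorted(trainable))
```

Only the selected layers' weights and biases remain trainable; the optimizer then updates just those parameters.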


๐Ÿ† Results/Evaluation
- Spectrum is competitive with full fine-tuning and beats QLoRA on benchmarks.
- While QLoRA is more memory-efficient on a single GPU, Spectrum shines in distributed training setups.
- Great models trained with Spectrum: Dolphin models, Llama 3.1 Storm, numerous models by VAGO Solutions...

---

For a practical guide, check out the article above.
replied to their post 21 days ago
posted an update 21 days ago
💬 🇮🇹 Phi 3.5 mini ITA: a Small Language Model for Italian

Lately, I've spent some time fine-tuning language models.

Now I am happy to release Phi 3.5 mini ITA: a fine-tuned version of Phi-3.5-mini-instruct with improved performance on the Italian language.

🔹 Small (3.82B parameters) but capable model
🔹 128k context length

Chat with it on 🤗 Spaces: anakin87/Phi-3.5-mini-ITA
Model card: anakin87/Phi-3.5-mini-ITA

๐Ÿ—ƒ๏ธ Data
Supervised fine-tuning using a good mix of English and Italian data:
- mlabonne/FineTome-100k by @mlabonne
- efederici/capybara-claude-15k-ita by @efederici
๐Ÿ™ Thanks to the authors for the datasets.


🎯 Targeted training with Spectrum
I used Spectrum, a relatively new technique for parameter-efficient learning.
The idea is to train only the layers of the model with high Signal-to-Noise Ratio (SNR) and ❄️ freeze the rest.
I trained the top 30% of model layers.

๐Ÿ“ Spectrum paper: https://arxiv.org/abs/2406.06623


📊 Vibe check and performance on Italian benchmarks seem encouraging
replied to grimjim's post 3 months ago
posted an update 3 months ago
How to alter the behavior of a Language Model without fine-tuning or prompting? Say hello to 🎤 yo-Llama 🦙!

Model anakin87/yo-Llama-3-8B-Instruct

This experiment steers Llama-3-8B-Instruct to respond in a rap style.
How? Amplifying the rap direction in the activation space. 😎


๐–๐ก๐š๐ญ ๐ฌ๐ฉ๐š๐ซ๐ค๐ž๐ ๐ญ๐ก๐ข๐ฌ ๐ข๐๐ž๐š?

Lately, I got interested in mechanistic interpretability of LLMs.

💡 A recent paper, "Refusal in Language Models Is Mediated by a Single Direction," showed how to find the refusal direction in the activation space of Chat Language Models and either erase or amplify it.
A clever jailbreak method for open weights models.

Then, @failspy took it a step further by modifying the models to amplify different traits, such as making a model seem grumpy or irritable.


๐‡๐จ๐ฐ ๐๐ข๐ ๐ˆ ๐œ๐ซ๐ž๐š๐ญ๐ž ๐ฒ๐จ-๐‹๐ฅ๐š๐ฆ๐š?
(๐Ÿ““ notebook in the HF repository, heavily inspired by Failspy's work)

1๏ธโƒฃ Load the Llama-3-8B-Instruct model.
2๏ธโƒฃ Load 1024 examples from Alpaca (instruction dataset).
3๏ธโƒฃ Prepare a system prompt to make the original model act like a rapper.
4๏ธโƒฃ Run inference on the examples, with and without the system prompt, and cache the activations.
5๏ธโƒฃ Compute the rap feature directions (one for each layer) from the activations.
6๏ธโƒฃ Apply the feature directions one by one, checking the results on some examples.
7๏ธโƒฃ Pick the best-performing feature direction.
8๏ธโƒฃ Apply this feature direction and voilร !
yo-Llama-3-8B-Instruct is born! ๐Ÿฅณ๐ŸŽถ
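The core of steps 5-8 can be sketched with synthetic activations (a toy illustration with NumPy; real runs cache activations through model hooks, as in the notebook):

```python
import numpy as np

rng = np.random.default_rng(7)
n_examples, hidden = 1024, 64

# Synthetic stand-ins for cached activations at one layer, with and
# without the rapper system prompt. The "true" direction is hidden in
# the difference, plus some noise.
true_direction = rng.normal(size=hidden)
true_direction /= np.linalg.norm(true_direction)
acts_plain = rng.normal(size=(n_examples, hidden))
acts_rap = acts_plain + 0.5 * true_direction + 0.05 * rng.normal(size=(n_examples, hidden))

# Step 5: the feature direction is the normalized mean activation difference
rap_direction = (acts_rap - acts_plain).mean(axis=0)
rap_direction /= np.linalg.norm(rap_direction)

# Steps 7-8: amplify the feature by adding the scaled direction to the
# activations at inference time
def steer(activations, direction, alpha=4.0):
    return activations + alpha * direction

steered = steer(acts_plain, rap_direction)
print(float(rap_direction @ true_direction))  # close to 1.0: direction recovered
```

The same add-a-vector trick, with a negative alpha, is what erases a feature (as in the refusal-direction paper).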

This was a fun experiment.


📚 Resources

Refusal in Language Models Is Mediated by a Single Direction - https://arxiv.org/abs/2406.11717

Uncensor any LLM with abliteration: great practical blog post by @mlabonne https://huggingface.co/blog/mlabonne/abliteration

Practical materials by @failspy
- abliterator library https://github.com/FailSpy/abliterator
- Llama-MopeyMule-3-8B-Instruct model (+ notebook) failspy/Llama-3-8B-Instruct-MopeyMule
posted an update 3 months ago
🌌 Creating adventures with local LLMs

What if 🤔... Homer Simpson met Spider-Man and they went on a quest for donuts? 🍩
Or if Fred Astaire and Corporal Hicks teamed up to fight xenomorphs? 👾

In the words of Karpathy, LLMs are dream machines...
they seem specially made to simulate these wild scenarios!

๐„๐ฑ๐ฉ๐ž๐ซ๐ข๐ฆ๐ž๐ง๐ญ๐ข๐ง๐  ๐ฐ๐ข๐ญ๐ก ๐ญ๐ก๐ข๐ฌ ๐ข๐๐ž๐š ๐Ÿ‘‡
Nous Research / @teknium recently released NousResearch/CharacterCodex:
a massive dataset with information on 16k characters, both fictional and real.
I couldn't wait to play with it...

After a few attempts, I found that combining the information in this dataset with a good model (like meta-llama/Meta-Llama-3-8B-Instruct) opens the doors to a myriad of chat adventures.
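The core trick is simply grounding the chat model with a character entry. A minimal sketch, assuming a Character Codex-style record with character_name and description fields (field names and the entry itself are illustrative):

```python
# Hypothetical Character Codex-style entry
character = {
    "character_name": "Homer Simpson",
    "description": "Good-natured, donut-loving father from Springfield.",
}

def character_system_prompt(char):
    """Turn a character entry into a system prompt for the chat model."""
    return (
        f"You are {char['character_name']}. {char['description']} "
        "Stay in character and keep replies short and playful."
    )

prompt = character_system_prompt(character)
print(prompt)
```

Pairing two such prompts (one per character) and alternating turns is enough to stage the crossover adventures above.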

๐Ÿ› ๏ธ Stack:
๐Ÿ”นHaystack for orchestration ๐Ÿ—๏ธ
๐Ÿ”นllamafile ๐Ÿฆ™๐Ÿ—‚๏ธ to run our model locally.

📓 Check out the notebook: https://t.ly/y6jrZ
(includes a bonus 🕵️ Mystery Character Quiz)
posted an update 3 months ago
🧪 RAG Evaluation with 🔥 Prometheus 2 + Haystack

๐Ÿ“ Blog post: https://haystack.deepset.ai/blog/rag-evaluation-with-prometheus-2
๐Ÿ““ Notebook: https://github.com/deepset-ai/haystack-cookbook/blob/main/notebooks/prometheus2_evaluation.ipynb

─── ⋆⋅☆⋅⋆ ───

When evaluating LLMs' responses, proprietary models like GPT-4 are commonly used due to their strong performance.
However, relying on closed models presents challenges related to data privacy 🔒, transparency, controllability, and cost 💸.

On the other hand, open models typically do not correlate well with human judgments and lack flexibility.


🔥 Prometheus 2 is a new family of open-source models designed to address these gaps:
🔹 two variants: prometheus-eval/prometheus-7b-v2.0; prometheus-eval/prometheus-8x7b-v2.0
🔹 trained on open-source data
🔹 high correlation with human evaluations and proprietary models
🔹 highly flexible: capable of performing direct assessments and pairwise rankings, and allowing the definition of custom evaluation criteria.
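To give a feel for direct assessment with a custom rubric, here is an illustrative prompt builder (the section names and layout are assumptions for the sketch, not Prometheus 2's exact prompt template):

```python
# Custom evaluation criterion: groundedness of a RAG answer, on a 1-5 scale
rubric = {
    1: "The response ignores the retrieved context.",
    3: "The response is partially grounded in the context.",
    5: "The response is fully grounded in the context and complete.",
}

def build_eval_prompt(question, context, answer, rubric):
    """Assemble a direct-assessment request for a judge model."""
    scale = "\n".join(f"Score {s}: {desc}" for s, desc in sorted(rubric.items()))
    return (
        "Evaluate the response on a 1-5 scale using the rubric below.\n"
        f"### Question:\n{question}\n"
        f"### Context:\n{context}\n"
        f"### Response:\n{answer}\n"
        f"### Rubric:\n{scale}\n"
        "Give feedback, then 'Score: N'."
    )

eval_prompt = build_eval_prompt(
    "Who wrote Hamlet?",
    "Hamlet is a tragedy by William Shakespeare.",
    "Shakespeare.",
    rubric,
)
print(eval_prompt)
```

The judge model's reply is then parsed for the final "Score: N" token; swapping the rubric dict is all it takes to evaluate a different criterion.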

See my experiments with RAG evaluation in the links above.
posted an update 4 months ago
โš™๏ธ Prompt Optimization with Haystack and DSPy

Experimental notebook: 🧪📓 https://github.com/deepset-ai/haystack-cookbook/blob/main/notebooks/prompt_optimization_with_dspy.ipynb

When building applications with LLMs, writing effective prompts is a long process of trial and error. 🔄
Often, if you switch models, you also have to change the prompt. 😩
What if you could automate this process?


💡 That's where DSPy comes in - a framework designed to algorithmically optimize prompts for Language Models.
By applying classical machine learning concepts (training and evaluation data, metrics, optimization), DSPy generates better prompts for a given model and task.


Recently, I explored combining DSPy with the robustness of Haystack Pipelines.

Here's how it works:
โ–ถ๏ธ Start from a Haystack RAG pipeline with a basic prompt
๐ŸŽฏ Define a goal (in this case, get correct and concise answers)
๐Ÿ“Š Create a DSPy program, define data and metrics
โœจ Optimize and evaluate -> improved prompt
๐Ÿš€ Build a refined Haystack RAG pipeline using the optimized prompt
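The optimization idea behind these steps can be boiled down to a toy loop (plain Python with a mock LLM, not DSPy's actual API): score candidate prompts with a metric on labeled data, and keep the best one.

```python
def mock_llm(prompt, question):
    # Stand-in for a real model: the instruction wording changes how
    # verbose the answer is, giving the optimizer something to measure.
    return "Paris" if "concise" in prompt else "The answer is Paris, the capital."

candidates = ["Answer the question.", "Answer the question. Be concise."]
data = [("Capital of France?", "Paris")]  # tiny labeled evaluation set

def metric(prompt):
    """Reward correct answers, penalize verbosity (the stated goal)."""
    score = 0.0
    for question, gold in data:
        answer = mock_llm(prompt, question)
        score += (gold in answer) - 0.01 * len(answer)
    return score

best_prompt = max(candidates, key=metric)
print(best_prompt)  # "Answer the question. Be concise."
```

Real optimizers search a much larger space (generated instructions, few-shot demos), but the train-score-select loop is the same.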
posted an update 4 months ago
Do you want to play a game against Llama 3? 🦙🦙🦙

Meet 🧑‍🏫 AutoQuizzer, a new LLM application that you can use for learning or just for fun.

Try it out on Hugging Face Spaces 🤗 deepset/autoquizzer

๐‡๐จ๐ฐ ๐ข๐ญ ๐ฐ๐จ๐ซ๐ค๐ฌ
You provide an URL -> A multiple-choice quiz is instantly generated.

🔹 You can play the quiz yourself.

🔹 You can let the LLM play in two different ways:
📕 Closed book: the LLM responds using only its parametric knowledge and reasoning abilities, knowing just the general topic.
🔎🌐 Web RAG: for each question, a Google search is done and the top 3 snippets are included in the prompt for the LLM.

Stack
🏗️ Haystack LLM framework https://haystack.deepset.ai/
🦙 Llama 3 8B Instruct
⚡ Groq

Original idea: @Tuana