Omar Sanseviero

osanseviero

https://osanseviero.github.io/hackerllama/

AI & ML interests

Llamas, model merging, massive ASR for data collection, 3D ML, on-device ML, quantization, model judging, ML in browser, healthcare applications, education, intersection of art and ML.🦙

Recent Activity

liked a model about 15 hours ago

WiroAI/wiroai-turkish-llm-9b

liked a model about 15 hours ago

silma-ai/SILMA-Kashif-2B-Instruct-v1.0

liked a model 1 day ago

arcee-ai/Virtuoso-Medium-v2

Articles

Llama can now see and run on your device - welcome Llama 3.2

Fine-tuning LLMs to 1.58bit: extreme quantization made easy

Llama 3.1 - 405B, 70B & 8B with multilinguality and long context

WWDC 24: Running Mistral 7B with Core ML

How we leveraged distilabel to create an Argilla 2.0 Chatbot

Welcome Gemma 2 - Google's new open LLM

Welcome Llama 3 - Meta's new open LLM

CodeGemma - an official Google release for code LLMs

🪆 Introduction to Matryoshka Embedding Models

Welcome Gemma - Google's new open LLM

Constitutional AI with Open LLMs

Preference Tuning LLMs with Direct Preference Optimization Methods

Mixture of Experts Explained

Welcome Mixtral - a SOTA Mixture of Experts on Hugging Face

Inference for PROs

Spread Your Wings: Falcon 180B is here

Code Llama: Llama 2 learns to code

Results of the Open Source AI Game Jam

Llama 2 is here - get it on Hugging Face

The Falcon has landed in the Hugging Face ecosystem

Hugging Face Machine Learning Demos on arXiv

What's new in Diffusers? 🎨

Announcing Evaluation on the Hub

An Introduction to Deep Reinforcement Learning

Welcome spaCy to the 🤗 Hub

Sentence Transformers in the 🤗 Hub

osanseviero's activity

upvoted an article 1 day ago

Open-R1: Update #1

Article • 2 days ago • 181

upvoted a paper 3 days ago

Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch

Paper • 2501.18512 • Published 4 days ago • 22

upvoted an article 6 days ago

Open-R1: a fully open reproduction of DeepSeek-R1

Article • 7 days ago • 587

upvoted an article 11 days ago

Mastering Long Contexts in LLMs with KVPress

Article • 12 days ago • 57

upvoted 2 papers 17 days ago

The Lessons of Developing Process Reward Models in Mathematical Reasoning

Paper • 2501.07301 • Published 21 days ago • 89

Enhancing Human-Like Responses in Large Language Models

Paper • 2501.05032 • Published 26 days ago • 49

upvoted a paper 19 days ago

MiniMax-01: Scaling Foundation Models with Lightning Attention

Paper • 2501.08313 • Published 20 days ago • 271

upvoted a paper 22 days ago

Byte Latent Transformer: Patches Scale Better Than Tokens

Paper • 2412.09871 • Published Dec 13, 2024 • 89

upvoted a paper 25 days ago

rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking

Paper • 2501.04519 • Published 26 days ago • 252

upvoted a collection 25 days ago

KaLM-embedding

7 items • Updated 12 days ago • 22

upvoted a paper 25 days ago

KaLM-Embedding: Superior Training Data Brings A Stronger Embedding Model

Paper • 2501.01028 • Published Jan 2 • 13

upvoted a collection 25 days ago

Cosmos

The collection of Cosmos models • 31 items • Updated 18 days ago • 254

upvoted a collection 29 days ago

Google's Gemma models family

243 items • Updated Dec 13, 2024 • 81

upvoted a collection about 1 month ago

🤖 Agents

21 items • Updated Dec 31, 2024 • 113

upvoted 6 papers about 1 month ago

Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

Paper • 2411.10442 • Published Nov 15, 2024 • 73

CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings

Paper • 2501.01257 • Published Jan 2 • 48

A3: Android Agent Arena for Mobile GUI Agents

Paper • 2501.01149 • Published Jan 2 • 22

2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Paper • 2501.00958 • Published Jan 1 • 99

2 OLMo 2 Furious

Paper • 2501.00656 • Published Dec 31, 2024 • 16

HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation

Paper • 2412.21199 • Published Dec 30, 2024 • 13