Tom Aarsen

tomaarsen

https://linkedin.com/in/tomaarsen

AI & ML interests

NLP: text embeddings, information retrieval, named entity recognition, few-shot text classification

Recent Activity

liked a model about 2 hours ago

nickprock/Italian-ModernBERT-base-embed-mmarco-mnrl

updated a collection about 4 hours ago

NanoBEIR 🍺 with BM25 Rankings

tomaarsen's activity

upvoted a paper 4 days ago

MMTEB: Massive Multilingual Text Embedding Benchmark

Paper • 2502.13595 • Published 6 days ago • 31

upvoted 2 collections 6 days ago

ModernGLiClass

GLiClass with ModernBERT backbone • 2 items • Updated 6 days ago • 6

The Ultimate Collection of Code Classifiers

🔥 15 classifiers, 124M parameters, one per programming language, for assessing the educational value of GitHub code • 15 items • Updated 5 days ago • 10

upvoted an article 7 days ago

Article

Introducing Three New Serverless Inference Providers: Hyperbolic, Nebius AI Studio, and Novita 🔥

8 days ago

• 89

upvoted 2 articles 12 days ago

Article

From Chunks to Blocks: Accelerating Uploads and Downloads on the Hub

14 days ago

• 49

Article

1 Billion Classifications

13 days ago

• 39

upvoted a collection 13 days ago

Nomic Embed v2

Multilingual Embedding Models • 4 items • Updated 10 days ago • 11

upvoted an article 14 days ago

Article

From Llasa to Llasagna 🍕: Finetuning LLaSA to generates Italian speech and other languages

14 days ago

• 25

upvoted an article 15 days ago

Article

Open R1: Update #2

15 days ago

• 185

upvoted a paper 16 days ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published 21 days ago • 192

upvoted 2 collections 19 days ago

GTE ModernBERT

GTE Models Based on ModernBERT • 2 items • Updated Jan 21 • 15

GTE models

General Text Embedding Models Released by Tongyi Lab of Alibaba Group • 21 items • Updated Jan 21 • 23

upvoted an article 20 days ago

Article

Open-source DeepResearch – Freeing our search agents

22 days ago

• 1.1k

upvoted an article 21 days ago

Article

Agentic RAG Stack (1/5) - Index and retrieve documents for vector search using Sentence Transformers and DuckDB

29 days ago

• 18

upvoted an article 25 days ago

Article

Mixture of Experts Explained

Dec 11, 2023

• 400

upvoted 2 articles 26 days ago

Article

KV Caching Explained: Optimizing Transformer Inference Efficiency

27 days ago

• 32

Article

State of open video generation models in Diffusers

30 days ago

• 46

upvoted 2 articles 27 days ago

Article

Open-R1: a fully open reproduction of DeepSeek-R1

29 days ago

• 773

Article

🚀 Build a Qwen 2.5 VL API endpoint with Hugging Face spaces and Docker!

28 days ago

• 17

upvoted a paper 28 days ago

SPLADE-v3: New baselines for SPLADE

Paper • 2403.06789 • Published Mar 11, 2024 • 2