Hugo Laurençon's picture

Hugo Laurençon

HugoLaurencon

·

HugoLaurencon

AI & ML interests

None yet

Recent Activity

upvoted a paper 11 days ago

Autonomy-of-Experts Models

upvoted a paper 17 days ago

Learnings from Scaling Visual Tokenizers for Reconstruction and Generation

upvoted a paper 20 days ago

Tensor Product Attention Is All You Need

View all activity

Articles

Docmatix - a huge dataset for Document Visual Question Answering

Introducing Idefics2: A Powerful 8B Vision-Language Model for the community

Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset

Introducing IDEFICS: An Open Reproduction of State-of-the-art Visual Language Model

Putting ethical principles at the core of research lifecycle

Organizations

HugoLaurencon's activity

upvoted a paper 11 days ago

Autonomy-of-Experts Models

Paper • 2501.13074 • Published 11 days ago • 40

upvoted a paper 17 days ago

Learnings from Scaling Visual Tokenizers for Reconstruction and Generation

Paper • 2501.09755 • Published 17 days ago • 33

upvoted a paper 20 days ago

Tensor Product Attention Is All You Need

Paper • 2501.06425 • Published 23 days ago • 79

upvoted 3 papers about 1 month ago

2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Paper • 2501.00958 • Published Jan 1 • 99

Qwen2.5 Technical Report

Paper • 2412.15115 • Published Dec 19, 2024 • 345

Building and better understanding vision-language models: insights and future directions

Paper • 2408.12637 • Published Aug 22, 2024 • 124

upvoted 4 papers about 2 months ago

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Paper • 2412.10360 • Published Dec 13, 2024 • 139

Phi-4 Technical Report

Paper • 2412.08905 • Published Dec 12, 2024 • 106

BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks

Paper • 2412.04626 • Published Dec 5, 2024 • 13

HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editing

Paper • 2412.04280 • Published Dec 5, 2024 • 13

upvoted 2 papers 2 months ago

TÜLU 3: Pushing Frontiers in Open Language Model Post-Training

Paper • 2411.15124 • Published Nov 22, 2024 • 59

Multimodal Autoregressive Pre-training of Large Vision Encoders

Paper • 2411.14402 • Published Nov 21, 2024 • 43

upvoted 4 papers 3 months ago

Watermark Anything with Localized Messages

Paper • 2411.07231 • Published Nov 11, 2024 • 20

Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models

Paper • 2411.04996 • Published Nov 7, 2024 • 50

Unleashing Reasoning Capability of LLMs via Scalable Question Synthesis from Scratch

Paper • 2410.18693 • Published Oct 24, 2024 • 40

WAFFLE: Multi-Modal Model for Automated Front-End Development

Paper • 2410.18362 • Published Oct 24, 2024 • 12

upvoted 4 papers 4 months ago

MoH: Multi-Head Attention as Mixture-of-Head Attention

Paper • 2410.11842 • Published Oct 15, 2024 • 21

Movie Gen: A Cast of Media Foundation Models

Paper • 2410.13720 • Published Oct 17, 2024 • 92

Efficient Diffusion Models: A Comprehensive Survey from Principles to Practices

Paper • 2410.11795 • Published Oct 15, 2024 • 17

Diversity-Rewarded CFG Distillation

Paper • 2410.06084 • Published Oct 8, 2024 • 10