LeafInTheTree (Feuilleaubois)

upvoted a paper 22 days ago

MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures

Paper • 2410.13754 • Published 24 days ago • 74

upvoted a collection about 1 month ago

Moshi v0.1 Release

Collection

MLX, Candle & PyTorch model checkpoints released as part of the Moshi release from Kyutai. Run inference via: https://github.com/kyutai-labs/moshi • 13 items • Updated Sep 18 • 215

upvoted an article about 2 months ago

Article

Mixture of Experts Explained

Dec 11, 2023

• 183

upvoted 5 collections 2 months ago

upvoted a paper 2 months ago

SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners

Paper • 2408.16768 • Published Aug 29 • 26

upvoted a collection 2 months ago

video

Collection

110 items • Updated 4 days ago • 3

upvoted 5 papers 2 months ago

CogVLM2: Visual Language Models for Image and Video Understanding

Paper • 2408.16500 • Published Aug 29 • 56

Law of Vision Representation in MLLMs

Paper • 2408.16357 • Published Aug 29 • 92

LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation

Paper • 2408.15881 • Published Aug 28 • 20

xGen-MM (BLIP-3): A Family of Open Large Multimodal Models

Paper • 2408.08872 • Published Aug 16 • 97

LLaVA-OneVision: Easy Visual Task Transfer

Paper • 2408.03326 • Published Aug 6 • 59

upvoted 3 collections 2 months ago

Multi-modality LVM

Collection

27 items • Updated Sep 6 • 1

Multimodal LLM

Collection

103 items • Updated 5 days ago • 3

multimodal

Collection

146 items • Updated 4 days ago • 4

upvoted a collection 3 months ago

MFM - Multimodal Foundation Models

Collection

23 items • Updated 27 days ago • 1

upvoted an article 3 months ago

Article

PaliGemma – Google's Cutting-Edge Open Vision Language Model

May 14

• 206



Collections upvoted 2 months ago:

VisionLM

General Multimodal Learning

Marqo-FashionCLIP and Marqo-FashionSigLIP

Multimodal Benchmarks

3d
