Models
Datasets
Spaces
Posts
Docs
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2309.10020

Multimodal Model (LLM)

Multimodal Foundation Models: From Specialists to General-Purpose Assistants

Paper • 2309.10020 • Published Sep 18, 2023 • 40
Language as the Medium: Multimodal Video Classification through text only

Paper • 2309.10783 • Published Sep 19, 2023 • 1
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models

Paper • 2403.18814 • Published Mar 27 • 44
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models

Paper • 2402.19427 • Published Feb 29 • 52

Multimodal Papers

From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations

Paper • 2401.01885 • Published Jan 3 • 27
Media2Face: Co-speech Facial Animation Generation With Multi-Modality Guidance

Paper • 2401.15687 • Published Jan 28 • 21
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action

Paper • 2312.17172 • Published Dec 28, 2023 • 26
MouSi: Poly-Visual-Expert Vision-Language Models

Paper • 2401.17221 • Published Jan 30 • 7

AugmentedLearning

What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning

Paper • 2312.15685 • Published Dec 25, 2023 • 17
mistralai/Mixtral-8x7B-Instruct-v0.1

Text Generation • Updated Aug 19 • 742k • • 4.19k
microsoft/phi-2

Text Generation • Updated Apr 29 • 242k • 3.24k
TinyLlama/TinyLlama-1.1B-Chat-v1.0

Text Generation • Updated Mar 17 • 1.24M • 1.09k

VLMs for 3D reconstructions and their evaluation

List of papers to help with developing a model that reviews a photogrammetry scan and evaluates its quality

ImageBind: One Embedding Space To Bind Them All

Paper • 2305.05665 • Published May 9, 2023 • 3
ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth

Paper • 2302.12288 • Published Feb 23, 2023
HuggingFaceM4/howto100m

Updated May 18, 2022 • 27 • 4
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

Paper • 2201.12086 • Published Jan 28, 2022 • 3

Woodpecker: Hallucination Correction for Multimodal Large Language Models

Paper • 2310.16045 • Published Oct 24, 2023 • 14
HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models

Paper • 2310.14566 • Published Oct 23, 2023 • 25
SILC: Improving Vision Language Pretraining with Self-Distillation

Paper • 2310.13355 • Published Oct 20, 2023 • 6
Conditional Diffusion Distillation

Paper • 2310.01407 • Published Oct 2, 2023 • 20

Advanced and Recent Papers

Advanced and recent papers about deep learning. Please send your recommend paper to email: [email protected]

AutoCLIP: Auto-tuning Zero-Shot Classifiers for Vision-Language Models

Paper • 2309.16414 • Published Sep 28, 2023 • 19
Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient Pruning of A Multilingual ASR Model

Paper • 2309.13018 • Published Sep 22, 2023 • 9
Robust Speech Recognition via Large-Scale Weak Supervision

Paper • 2212.04356 • Published Dec 6, 2022 • 23
Language models in molecular discovery

Paper • 2309.16235 • Published Sep 28, 2023 • 10

多模特基础模型

Multimodal Foundation Models: From Specialists to General-Purpose Assistants

Paper • 2309.10020 • Published Sep 18, 2023 • 40

Multimodal Foundation Models: From Specialists to General-Purpose Assistants

Paper • 2309.10020 • Published Sep 18, 2023 • 40

Multimodal Foundation Models: From Specialists to General-Purpose Assistants

Paper • 2309.10020 • Published Sep 18, 2023 • 40

Multimodal Foundation Models: From Specialists to General-Purpose Assistants

Paper • 2309.10020 • Published Sep 18, 2023 • 40
Kosmos-2.5: A Multimodal Literate Model

Paper • 2309.11419 • Published Sep 20, 2023 • 50
AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model

Paper • 2309.16058 • Published Sep 27, 2023 • 55
Jointly Training Large Autoregressive Multimodal Models

Paper • 2309.15564 • Published Sep 27, 2023 • 8

Previous
1
2
Next

Company

© Hugging Face

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs