matlok's Collections
Image Papers
Visual Instruction Tuning
Paper
•
2304.08485
•
Published
•
13
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Paper
•
2311.05437
•
Published
•
45
Improved Baselines with Visual Instruction Tuning
Paper
•
2310.03744
•
Published
•
37
Aligning Large Multimodal Models with Factually Augmented RLHF
Paper
•
2309.14525
•
Published
•
29
An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models
Paper
•
2309.09958
•
Published
•
18
Generate Anything Anywhere in Any Scene
Paper
•
2306.17154
•
Published
•
22
LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day
Paper
•
2306.00890
•
Published
•
10
From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations
Paper
•
2401.01885
•
Published
•
27
Instruct-Imagen: Image Generation with Multi-modal Instruction
Paper
•
2401.01952
•
Published
•
30
High-Quality Image Restoration Following Human Instructions
Paper
•
2401.16468
•
Published
•
12
AI training resources for GLAM: a snapshot
Paper
•
2205.04738
•
Published
•
2
Agile But Safe: Learning Collision-Free High-Speed Legged Locomotion
Paper
•
2401.17583
•
Published
•
25
Reformulating Vision-Language Foundation Models and Datasets Towards Universal Multimodal Assistants
Paper
•
2310.00653
•
Published
•
3
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Paper
•
2402.05935
•
Published
•
15
Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities
Paper
•
2308.12966
•
Published
•
6
Lumos: Empowering Multimodal LLMs with Scene Text Recognition
Paper
•
2402.08017
•
Published
•
24
Deep Residual Learning for Image Recognition
Paper
•
1512.03385
•
Published
•
6
Foundation Models for Generalist Geospatial Artificial Intelligence
Paper
•
2310.18660
•
Published
•
8
U-Net: Convolutional Networks for Biomedical Image Segmentation
Paper
•
1505.04597
•
Published
•
7
LAVE: LLM-Powered Agent Assistance and Language Augmentation for Video Editing
Paper
•
2402.10294
•
Published
•
22
The boundary of neural network trainability is fractal
Paper
•
2402.06184
•
Published
•
4
Video ReCap: Recursive Captioning of Hour-Long Videos
Paper
•
2402.13250
•
Published
•
24
PaLM2-VAdapter: Progressively Aligned Language Model Makes a Strong Vision-language Adapter
Paper
•
2402.10896
•
Published
•
14
Improving Robustness for Joint Optimization of Camera Poses and Decomposed Low-Rank Tensorial Radiance Fields
Paper
•
2402.13252
•
Published
•
17