Imotech's Collections
LinFusion: 1 GPU, 1 Minute, 16K Image
Paper
•
2409.02097
•
Published
•
32
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion
Paper
•
2409.11406
•
Published
•
25
Diffusion Models Are Real-Time Game Engines
Paper
•
2408.14837
•
Published
•
121
Segment Anything with Multiple Modalities
Paper
•
2408.09085
•
Published
•
21
MeshFormer: High-Quality Mesh Generation with 3D-Guided Reconstruction Model
Paper
•
2408.10198
•
Published
•
32
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
Paper
•
2408.11878
•
Published
•
52
ControlNeXt: Powerful and Efficient Control for Image and Video Generation
Paper
•
2408.06070
•
Published
•
53
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
Paper
•
2408.06195
•
Published
•
63
DC3DO: Diffusion Classifier for 3D Objects
Paper
•
2408.06693
•
Published
•
10
Learning Task Decomposition to Assist Humans in Competitive Programming
Paper
•
2406.04604
•
Published
•
4
Task-oriented Sequential Grounding in 3D Scenes
Paper
•
2408.04034
•
Published
•
8
Sketch2Scene: Automatic Generation of Interactive 3D Game Scenes from User's Casual Sketches
Paper
•
2408.04567
•
Published
•
24
CLAY: A Controllable Large-scale Generative Model for Creating High-quality 3D Assets
Paper
•
2406.13897
•
Published
•
12
Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion
Paper
•
2407.13759
•
Published
•
17
POGEMA: A Benchmark Platform for Cooperative Multi-Agent Navigation
Paper
•
2407.14931
•
Published
•
20
OutfitAnyone: Ultra-high Quality Virtual Try-On for Any Clothing and Any Person
Paper
•
2407.16224
•
Published
•
27
DistilDIRE: A Small, Fast, Cheap and Lightweight Diffusion Synthesized Deepfake Detection
Paper
•
2406.00856
•
Published
•
11
OpenDevin: An Open Platform for AI Software Developers as Generalist Agents
Paper
•
2407.16741
•
Published
•
68
3D Question Answering for City Scene Understanding
Paper
•
2407.17398
•
Published
•
22
Improving 2D Feature Representations by 3D-Aware Fine-Tuning
Paper
•
2407.20229
•
Published
•
7
SAM 2: Segment Anything in Images and Videos
Paper
•
2408.00714
•
Published
•
109
RelBench: A Benchmark for Deep Learning on Relational Databases
Paper
•
2407.20060
•
Published
•
7
RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation
Paper
•
2408.02545
•
Published
•
35
MeshAnything V2: Artist-Created Mesh Generation With Adjacent Mesh Tokenization
Paper
•
2408.02555
•
Published
•
28
Synthesizing Text-to-SQL Data from Weak and Strong LLMs
Paper
•
2408.03256
•
Published
•
10
LLaVA-OneVision: Easy Visual Task Transfer
Paper
•
2408.03326
•
Published
•
59
Transformer Explainer: Interactive Learning of Text-Generative Models
Paper
•
2408.04619
•
Published
•
155
FruitNeRF: A Unified Neural Radiance Field based Fruit Counting Framework
Paper
•
2408.06190
•
Published
•
17
Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents
Paper
•
2408.07060
•
Published
•
40
Paper
•
2408.07009
•
Published
•
61
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
Paper
•
2408.09174
•
Published
•
51
LayerPano3D: Layered 3D Panorama for Hyper-Immersive Scene Generation
Paper
•
2408.13252
•
Published
•
24
MuCodec: Ultra Low-Bitrate Music Codec
Paper
•
2409.13216
•
Published
•
22
Training Language Models to Self-Correct via Reinforcement Learning
Paper
•
2409.12917
•
Published
•
135
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution
Paper
•
2409.12961
•
Published
•
24
FlexiTex: Enhancing Texture Generation with Visual Guidance
Paper
•
2409.12431
•
Published
•
11
3DGS-LM: Faster Gaussian-Splatting Optimization with Levenberg-Marquardt
Paper
•
2409.12892
•
Published
•
5
SpaceBlender: Creating Context-Rich Collaborative Spaces Through Generative 3D Scene Blending
Paper
•
2409.13926
•
Published
•
5
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions
Paper
•
2409.15278
•
Published
•
22
Improvements to SDXL in NovelAI Diffusion V3
Paper
•
2409.15997
•
Published
•
11
Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale
Paper
•
2409.17115
•
Published
•
60
LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness
Paper
•
2409.18125
•
Published
•
33
Game4Loc: A UAV Geo-Localization Benchmark from Game Data
Paper
•
2409.16925
•
Published
•
6
DressRecon: Freeform 4D Human Reconstruction from Monocular Video
Paper
•
2409.20563
•
Published
•
7
Posterior-Mean Rectified Flow: Towards Minimum MSE Photo-Realistic Image Restoration
Paper
•
2410.00418
•
Published
•
9
SyntheOcc: Synthesize Geometric-Controlled Street View Images through 3D Semantic MPIs
Paper
•
2410.00337
•
Published
•
10
Flex3D: Feed-Forward 3D Generation With Flexible Reconstruction Model And Input View Curation
Paper
•
2410.00890
•
Published
•
18
Law of the Weakest Link: Cross Capabilities of Large Language Models
Paper
•
2409.19951
•
Published
•
53
Illustrious: an Open Advanced Illustration Model
Paper
•
2409.19946
•
Published
•
13
From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging
Paper
•
2410.01215
•
Published
•
30
3DGS-DET: Empower 3D Gaussian Splatting with Boundary Guidance and Box-Focused Sampling for 3D Object Detection
Paper
•
2410.01647
•
Published
•
28
Addition is All You Need for Energy-efficient Language Models
Paper
•
2410.00907
•
Published
•
144
MIGA: Mixture-of-Experts with Group Aggregation for Stock Market Prediction
Paper
•
2410.02241
•
Published
•
6
CANVAS: Commonsense-Aware Navigation System for Intuitive Human-Robot Interaction
Paper
•
2410.01273
•
Published
•
9
A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image Generation
Paper
•
2410.01912
•
Published
•
13
DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search
Paper
•
2410.03864
•
Published
•
10
Seeker: Enhancing Exception Handling in Code with LLM-based Multi-Agent Approach
Paper
•
2410.06949
•
Published
•
5
Data Selection via Optimal Control for Language Models
Paper
•
2410.07064
•
Published
•
8
IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation
Paper
•
2410.07171
•
Published
•
41
Does Spatial Cognition Emerge in Frontier Models?
Paper
•
2410.06468
•
Published
•
2
MLLM as Retriever: Interactively Learning Multimodal Retrieval for Embodied Agents
Paper
•
2410.03450
•
Published
•
36
Agent S: An Open Agentic Framework that Uses Computers Like a Human
Paper
•
2410.08164
•
Published
•
24
PrefixQuant: Static Quantization Beats Dynamic through Prefixed Outliers in LLMs
Paper
•
2410.05265
•
Published
•
29
LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models
Paper
•
2410.09732
•
Published
•
54
Toward General Instruction-Following Alignment for Retrieval-Augmented Generation
Paper
•
2410.09584
•
Published
•
47
Animate-X: Universal Character Image Animation with Enhanced Motion Representation
Paper
•
2410.10306
•
Published
•
54
MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks
Paper
•
2410.10563
•
Published
•
38
Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations
Paper
•
2410.10792
•
Published
•
29
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents
Paper
•
2410.10594
•
Published
•
24
Rethinking Data Selection at Scale: Random Selection is Almost All You Need
Paper
•
2410.09335
•
Published
•
16
Baichuan-Omni Technical Report
Paper
•
2410.08565
•
Published
•
84
Mentor-KD: Making Small Language Models Better Multi-step Reasoners
Paper
•
2410.09037
•
Published
•
4
SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights
Paper
•
2410.09008
•
Published
•
16
Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining
Paper
•
2410.08102
•
Published
•
19
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization
Paper
•
2410.08815
•
Published
•
43
From Generalist to Specialist: Adapting Vision Language Models via Task-Specific Visual Instruction Tuning
Paper
•
2410.06456
•
Published
•
35
Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis
Paper
•
2410.08261
•
Published
•
49
FlatQuant: Flatness Matters for LLM Quantization
Paper
•
2410.09426
•
Published
•
12
Harnessing Webpage UIs for Text-Rich Visual Understanding
Paper
•
2410.13824
•
Published
•
29
MobA: A Two-Level Agent System for Efficient Mobile Task Automation
Paper
•
2410.13757
•
Published
•
31
Remember, Retrieve and Generate: Understanding Infinite Visual Concepts as Your Personalized Assistant
Paper
•
2410.13360
•
Published
•
8
AERO: Softmax-Only LLMs for Efficient Private Inference
Paper
•
2410.13060
•
Published
•
4
MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures
Paper
•
2410.13754
•
Published
•
74
MagicTailor: Component-Controllable Personalization in Text-to-Image Diffusion Models
Paper
•
2410.13370
•
Published
•
35
Diffusion Curriculum: Synthetic-to-Real Generative Curriculum Learning via Image-Guided Diffusion
Paper
•
2410.13674
•
Published
•
15
DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation
Paper
•
2410.13726
•
Published
•
10
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer
Paper
•
2410.10812
•
Published
•
15
AutoTrain: No-code training for state-of-the-art models
Paper
•
2410.15735
•
Published
•
58
PUMA: Empowering Unified MLLM with Multi-granular Visual Generation
Paper
•
2410.13861
•
Published
•
52
SemiEvol: Semi-supervised Fine-tuning for LLM Adaptation
Paper
•
2410.14745
•
Published
•
45
Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages
Paper
•
2410.16153
•
Published
•
43
Meta-Chunking: Learning Efficient Text Segmentation via Logical Perception
Paper
•
2410.12788
•
Published
•
22
DynamicCity: Large-Scale LiDAR Generation from Dynamic Scenes
Paper
•
2410.18084
•
Published
•
13
Lightweight Neural App Control
Paper
•
2410.17883
•
Published
•
9
ARKit LabelMaker: A New Scale for Indoor 3D Scene Understanding
Paper
•
2410.13924
•
Published
•
6
LOGO -- Long cOntext aliGnment via efficient preference Optimization
Paper
•
2410.18533
•
Published
•
42
Unleashing Reasoning Capability of LLMs via Scalable Question Synthesis from Scratch
Paper
•
2410.18693
•
Published
•
40
Framer: Interactive Frame Interpolation
Paper
•
2410.18978
•
Published
•
36
Unbounded: A Generative Infinite Game of Character Life Simulation
Paper
•
2410.18975
•
Published
•
35
Distill Visual Chart Reasoning Ability from LLMs to MLLMs
Paper
•
2410.18798
•
Published
•
19
WAFFLE: Multi-Modal Model for Automated Front-End Development
Paper
•
2410.18362
•
Published
•
11
mistralai/Pixtral-12B-Base-2409
Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning
Paper
•
2410.19290
•
Published
•
10
Continuous Speech Synthesis using per-token Latent Diffusion
Paper
•
2410.16048
•
Published
•
29
MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark
Paper
•
2410.19168
•
Published
•
19
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding
Paper
•
2404.16710
•
Published
•
75
Paper
•
2410.21276
•
Published
•
82
Neural Fields in Robotics: A Survey
Paper
•
2410.20220
•
Published
•
4
Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines
Paper
•
2410.21220
•
Published
•
10
AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant
Paper
•
2410.18603
•
Published
•
32
DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation
Paper
•
2410.18666
•
Published
•
19
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
Paper
•
2410.21465
•
Published
•
11
CLEAR: Character Unlearning in Textual and Visual Modalities
Paper
•
2410.18057
•
Published
•
200
AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions
Paper
•
2410.20424
•
Published
•
39
TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
Paper
•
2410.23168
•
Published
•
24
NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks
Paper
•
2410.20650
•
Published
•
16
Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders
Paper
•
2410.22366
•
Published
•
77
Language Models can Self-Lengthen to Generate Long Texts
Paper
•
2410.23933
•
Published
•
17
SelfCodeAlign: Self-Alignment for Code Generation
Paper
•
2410.24198
•
Published
•
21
Navigating the Unknown: A Chat-Based Collaborative Interface for Personalized Exploratory Tasks
Paper
•
2410.24032
•
Published
•
9
M2rc-Eval: Massively Multilingual Repository-level Code Completion Evaluation
Paper
•
2410.21157
•
Published
•
6
Face Anonymization Made Simple
Paper
•
2411.00762
•
Published
•
7
HelloMeme: Integrating Spatial Knitting Attentions to Embed High-Level and Fidelity-Rich Conditions in Diffusion Models
Paper
•
2410.22901
•
Published
•
8
CityGaussianV2: Efficient and Geometrically Accurate Reconstruction for Large-Scale Scenes
Paper
•
2411.00771
•
Published
•
9
AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents
Paper
•
2410.24024
•
Published
•
48
Training-free Regional Prompting for Diffusion Transformers
Paper
•
2411.02395
•
Published
•
25
"Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM
Quantization
Paper
•
2411.02355
•
Published
•
46
MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D
Paper
•
2411.02336
•
Published
•
23
GarVerseLOD: High-Fidelity 3D Garment Reconstruction from a Single In-the-Wild Image using a Dataset with Levels of Details
Paper
•
2411.03047
•
Published
•
8
HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems
Paper
•
2411.02959
•
Published
•
64
TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation
Paper
•
2411.04709
•
Published
•
25
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models
Paper
•
2411.04905
•
Published
•
111
RetrieveGPT: Merging Prompts and Mathematical Models for Enhanced Code-Mixed Information Retrieval
Paper
•
2411.04752
•
Published
•
16
SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
Paper
•
2411.05007
•
Published
•
16
M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding
Paper
•
2411.04952
•
Published
•
28
BitNet a4.8: 4-bit Activations for 1-bit LLMs
Paper
•
2411.04965
•
Published
•
63
M-Longdoc: A Benchmark For Multimodal Super-Long Document Understanding And A Retrieval-Aware Tuning Framework
Paper
•
2411.06176
•
Published
•
44
Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models
Paper
•
2411.07126
•
Published
•
28
OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision
Paper
•
2411.07199
•
Published
•
45
CAD-MLLM: Unifying Multimodality-Conditioned CAD Generation With MLLM
Paper
•
2411.04954
•
Published
•
8
PramaLLC/BEN
Image Segmentation
•
Updated
•
236
•
78
SAMPart3D: Segment Any Part in 3D Objects
Paper
•
2411.07184
•
Published
•
26
LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models
Paper
•
2411.09595
•
Published
•
71
MagicQuill: An Intelligent Interactive Image Editing System
Paper
•
2411.09703
•
Published
•
57
Large Language Models Can Self-Improve in Long-context Reasoning
Paper
•
2411.08147
•
Published
•
62
GenXD: Generating Any 3D and 4D Scenes
Paper
•
2411.02319
•
Published
•
20
LLaVA-o1: Let Vision Language Models Reason Step-by-Step
Paper
•
2411.10440
•
Published
•
111
SmoothCache: A Universal Inference Acceleration Technique for Diffusion Transformers
Paper
•
2411.10510
•
Published
•
8
Awaker2.5-VL: Stably Scaling MLLMs with Parameter-Efficient Mixture of Experts
Paper
•
2411.10669
•
Published
•
10
SlimLM: An Efficient Small Language Model for On-Device Document Assistance
Paper
•
2411.09944
•
Published
•
12
BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices
Paper
•
2411.10640
•
Published
•
44
Continuous Speculative Decoding for Autoregressive Image Generation
Paper
•
2411.11925
•
Published
•
15
RedPajama: an Open Dataset for Training Large Language Models
Paper
•
2411.12372
•
Published
•
47
ITACLIP: Boosting Training-Free Semantic Segmentation with Image, Text, and Architectural Enhancements
Paper
•
2411.12044
•
Published
•
13
Building Trust: Foundations of Security, Safety and Transparency in AI
Paper
•
2411.12275
•
Published
•
10
SEAGULL: No-reference Image Quality Assessment for Regions of Interest via Vision-Language Instruction Tuning
Paper
•
2411.10161
•
Published
•
8
SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration
Paper
•
2411.10958
•
Published
•
50
SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory
Paper
•
2411.11922
•
Published
•
18
Ultra-Sparse Memory Network
Paper
•
2411.12364
•
Published
•
19
OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs
Paper
•
2411.14199
•
Published
•
28
Natural Language Reinforcement Learning
Paper
•
2411.14251
•
Published
•
26
Patience Is The Key to Large Language Model Reasoning
Paper
•
2411.13082
•
Published
•
7
DINO-X: A Unified Vision Model for Open-World Object Detection and Understanding
Paper
•
2411.14347
•
Published
•
13
MagicDriveDiT: High-Resolution Long Video Generation for Autonomous Driving with Adaptive Control
Paper
•
2411.13807
•
Published
•
11
Hymba: A Hybrid-head Architecture for Small Language Models
Paper
•
2411.13676
•
Published
•
38
Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions
Paper
•
2411.14405
•
Published
•
57
TÜLU 3: Pushing Frontiers in Open Language Model Post-Training
Paper
•
2411.15124
•
Published
•
55
Large Multi-modal Models Can Interpret Features in Large Multi-modal Models
Paper
•
2411.14982
•
Published
•
15
VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
Paper
•
2411.14794
•
Published
•
11
MyTimeMachine: Personalized Facial Age Transformation
Paper
•
2411.14521
•
Published
•
20
Adapting Vision Foundation Models for Robust Cloud Segmentation in Remote Sensing Images
Paper
•
2411.13127
•
Published
•
4
Material Anything: Generating Materials for Any 3D Object via Diffusion
Paper
•
2411.15138
•
Published
•
42
ShowUI: One Vision-Language-Action Model for GUI Visual Agent
Paper
•
2411.17465
•
Published
•
76
Learning 3D Representations from Procedural 3D Programs
Paper
•
2411.17467
•
Published
•
8
TEXGen: a Generative Diffusion Model for Mesh Textures
Paper
•
2411.14740
•
Published
•
15
ROICtrl: Boosting Instance Control for Visual Generation
Paper
•
2411.17949
•
Published
•
82
DreamCache: Finetuning-Free Lightweight Personalized Image Generation via Feature Caching
Paper
•
2411.17786
•
Published
•
12
Adaptive Blind All-in-One Image Restoration
Paper
•
2411.18412
•
Published
•
4
Beyond Examples: High-level Automated Reasoning Paradigm in In-Context Learning via MCTS
Paper
•
2411.18478
•
Published
•
32
GRAPE: Generalizing Robot Policy via Preference Alignment
Paper
•
2411.19309
•
Published
•
42
FAM Diffusion: Frequency and Attention Modulation for High-Resolution Image Generation with Stable Diffusion
Paper
•
2411.18552
•
Published
•
17
Puzzle: Distillation-Based NAS for Inference-Optimized LLMs
Paper
•
2411.19146
•
Published
•
13
MATATA: a weak-supervised MAthematical Tool-Assisted reasoning for Tabular Applications
Paper
•
2411.18915
•
Published
•
8
Reverse Thinking Makes LLMs Stronger Reasoners
Paper
•
2411.19865
•
Published
•
19
LLM Teacher-Student Framework for Text Classification With No Manually Annotated Data: A Case Study in IPTC News Topic Classification
Paper
•
2411.19638
•
Published
•
6
Scaling Transformers for Low-Bitrate High-Quality Speech Coding
Paper
•
2411.19842
•
Published
•
10
TinyFusion: Diffusion Transformers Learned Shallow
Paper
•
2412.01199
•
Published
•
14
o1-Coder: an o1 Replication for Coding
Paper
•
2412.00154
•
Published
•
40
SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters
Paper
•
2412.00174
•
Published
•
22
The Well: a Large-Scale Collection of Diverse Physics Simulations for Machine Learning
Paper
•
2412.00568
•
Published
•
14
VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models
Paper
•
2412.01822
•
Published
•
14
Art-Free Generative Models: Art Creation Without Graphic Art Knowledge
Paper
•
2412.00176
•
Published
•
8
HUGSIM: A Real-Time, Photo-Realistic and Closed-Loop Simulator for Autonomous Driving
Paper
•
2412.01718
•
Published
•
2
LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences
Paper
•
2412.01292
•
Published
•
11
SNOOPI: Supercharged One-step Diffusion Distillation with Proper Guidance
Paper
•
2412.02687
•
Published
•
109
PaliGemma 2: A Family of Versatile VLMs for Transfer
Paper
•
2412.03555
•
Published
•
118
Distilling Diffusion Models to Efficient 3D LiDAR Scene Completion
Paper
•
2412.03515
•
Published
•
25
NitroFusion: High-Fidelity Single-Step Diffusion through Dynamic Adversarial Training
Paper
•
2412.02030
•
Published
•
18
MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation
Paper
•
2412.03558
•
Published
•
14
CleanDIFT: Diffusion Features without Noise
Paper
•
2412.03439
•
Published
•
12
Mimir: Improving Video Diffusion Models for Precise Text Understanding
Paper
•
2412.03085
•
Published
•
12
Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection
Paper
•
2412.04455
•
Published
•
35
Around the World in 80 Timesteps: A Generative Approach to Global Visual Geolocation
Paper
•
2412.06781
•
Published
•
18
StyleMaster: Stylize Your Video with Artistic Generation and Translation
Paper
•
2412.07744
•
Published
•
19
Are Your LLMs Capable of Stable Reasoning?
Paper
•
2412.13147
•
Published
•
87
GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training
Paper
•
2412.11863
•
Published
•
2
TidyBot++: An Open-Source Holonomic Mobile Manipulator for Robot Learning
Paper
•
2412.10447
•
Published
•
5
The Open Source Advantage in Large Language Models (LLMs)
Paper
•
2412.12004
•
Published
•
9
Smaller Language Models Are Better Instruction Evolvers
Paper
•
2412.11231
•
Published
•
24
RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation
Paper
•
2412.11919
•
Published
•
33
Byte Latent Transformer: Patches Scale Better Than Tokens
Paper
•
2412.09871
•
Published
•
74
Multi-Dimensional Insights: Benchmarking Real-World Personalization in Large Multimodal Models
Paper
•
2412.12606
•
Published
•
41
Compressed Chain of Thought: Efficient Reasoning Through Dense Representations
Paper
•
2412.13171
•
Published
•
30
VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation
Paper
•
2412.10704
•
Published
•
14
Seeker: Towards Exception Safety Code Generation with Intermediate Language Agents Framework
Paper
•
2412.11713
•
Published
•
3
RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment
Paper
•
2412.13746
•
Published
•
8
Paper
•
2412.08905
•
Published
•
92
Evaluating and Aligning CodeLLMs on Human Preference
Paper
•
2412.05210
•
Published
•
47
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition
Paper
•
2412.09501
•
Published
•
43
Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity Visual Descriptions
Paper
•
2412.08737
•
Published
•
51
FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion
Paper
•
2412.09626
•
Published
•
19
Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse
Paper
•
2409.11242
•
Published
•
5
GenEx: Generating an Explorable World
Paper
•
2412.09624
•
Published
•
84
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
Paper
•
2412.13663
•
Published
•
103
AniDoc: Animation Creation Made Easier
Paper
•
2412.14173
•
Published
•
47
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks
Paper
•
2412.14161
•
Published
•
43
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces
Paper
•
2412.14171
•
Published
•
22
Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN
Paper
•
2412.13795
•
Published
•
18
LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-context Multitasks
Paper
•
2412.15204
•
Published
•
30
MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval
Paper
•
2412.14475
•
Published
•
51
Progressive Multimodal Reasoning via Active Retrieval
Paper
•
2412.14835
•
Published
•
66
Paper
•
2412.15115
•
Published
•
324
CAD-Recode: Reverse Engineering CAD Code from Point Clouds
Paper
•
2412.14042
•
Published
•
5
Predicting the Original Appearance of Damaged Historical Documents
Paper
•
2412.11634
•
Published
•
4
AnySat: An Earth Observation Model for Any Resolutions, Scales, and Modalities
Paper
•
2412.14123
•
Published
•
11
Parallelized Autoregressive Visual Generation
Paper
•
2412.15119
•
Published
•
43
SCOPE: Optimizing Key-Value Cache Compression in Long-context Generation
Paper
•
2412.13649
•
Published
•
17
MixLLM: LLM Quantization with Global Mixed-precision between Output-features and Highly-efficient System Design
Paper
•
2412.14590
•
Published
•
8
Multi-LLM Text Summarization
Paper
•
2412.15487
•
Published
•
3
IDOL: Instant Photorealistic 3D Human Creation from a Single Image
Paper
•
2412.14963
•
Published
•
4