atayloraerospace

Taylor658

AI & ML interests

Computer Vision 🔭 | Multimodal Gen AI 🤖 | AI in Healthcare 🩺 | AI in Aerospace 🚀

Taylor658's activity

upvoted 20 papers 1 day ago

Self-Consistency Preference Optimization

Paper • 2411.04109 • Published 3 days ago • 10

From Medprompt to o1: Exploration of Run-Time Strategies for Medical Challenge Problems and Beyond

Paper • 2411.03590 • Published 4 days ago • 9

Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level

Paper • 2411.03562 • Published 4 days ago • 41

Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination

Paper • 2411.03823 • Published 4 days ago • 41

M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding

Paper • 2411.04952 • Published 2 days ago • 18

RetrieveGPT: Merging Prompts and Mathematical Models for Enhanced Code-Mixed Information Retrieval

Paper • 2411.04752 • Published 2 days ago • 12

Diff-2-in-1: Bridging Generation and Dense Perception with Diffusion Models

Paper • 2411.05005 • Published 2 days ago • 12

GazeGen: Gaze-Driven User Interaction for Visual Content Generation

Paper • 2411.04335 • Published 3 days ago • 13

SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

Paper • 2411.05007 • Published 2 days ago • 13

SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation

Paper • 2411.04989 • Published 2 days ago • 12

Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks?

Paper • 2411.05000 • Published 2 days ago • 16

TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation

Paper • 2411.04709 • Published 4 days ago • 21

VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos

Paper • 2411.04923 • Published 2 days ago • 13

DynaMem: Online Dynamic Spatio-Semantic Memory for Open World Mobile Manipulation

Paper • 2411.04999 • Published 2 days ago • 13

Thanos: Enhancing Conversational Agents with Skill-of-Mind-Infused Large Language Model

Paper • 2411.04496 • Published 3 days ago • 16

Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models

Paper • 2411.04996 • Published 2 days ago • 31

DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion

Paper • 2411.04928 • Published 2 days ago • 31

BitNet a4.8: 4-bit Activations for 1-bit LLMs

Paper • 2411.04965 • Published 2 days ago • 50

ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning

Paper • 2411.05003 • Published 2 days ago • 56

OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models

Paper • 2411.04905 • Published 2 days ago • 77