samusenps

AI & ML interests

Foundational Architectures, Multi-Modality, Interpretability, Benchmarking with simulations, Robotics, Integration with a non-invasive, open-source RISC-V BCI stack. Extremely high-quality training data. Fully Open Source ML/AI.

Recent Activity

liked a model 6 days ago

seawolf2357/hanbok

liked a model 6 days ago

showlab/ShowUI-2B

liked a model 6 days ago

NX-AI/xLSTM-7b

samusenps's activity

upvoted a paper 6 days ago

Learning Flow Fields in Attention for Controllable Person Image Generation

Paper • 2412.08486 • Published 14 days ago • 32

upvoted 5 papers 13 days ago

STIV: Scalable Text and Image Conditioned Video Generation

Paper • 2412.07730 • Published 15 days ago • 69

3DSRBench: A Comprehensive 3D Spatial Reasoning Benchmark

Paper • 2412.07825 • Published 14 days ago • 12

Mogo: RQ Hierarchical Causal Transformer for High-Quality 3D Human Motion Generation

Paper • 2412.07797 • Published 20 days ago • 11

LAION-SG: An Enhanced Large-Scale Dataset for Training Complex Image-Text Models with Structural Annotations

Paper • 2412.08580 • Published 14 days ago • 44

SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints

Paper • 2412.07760 • Published 14 days ago • 49

upvoted a paper 17 days ago

VisionZip: Longer is Better but Not Necessary in Vision Language Models

Paper • 2412.04467 • Published 19 days ago • 104

upvoted a paper 20 days ago

Teach Multimodal LLMs to Comprehend Electrocardiographic Images

Paper • 2410.19008 • Published Oct 21 • 23

upvoted a paper about 1 month ago

Balancing Pipeline Parallelism with Vocabulary Parallelism

Paper • 2411.05288 • Published Nov 8 • 19

upvoted 11 papers about 2 months ago

Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models

Paper • 2411.04996 • Published Nov 7 • 49

DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion

Paper • 2411.04928 • Published Nov 7 • 48

BitNet a4.8: 4-bit Activations for 1-bit LLMs

Paper • 2411.04965 • Published Nov 7 • 63

ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning

Paper • 2411.05003 • Published Nov 7 • 70

Decoding Dark Matter: Specialized Sparse Autoencoders for Interpreting Rare Concepts in Foundation Models

Paper • 2411.00743 • Published Nov 1 • 6

AutoVFX: Physically Realistic Video Editing from Natural Language Instructions

Paper • 2411.02394 • Published Nov 4 • 17

LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models

Paper • 2411.00918 • Published Nov 1 • 8

PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance

Paper • 2411.02327 • Published Nov 4 • 11