Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator Paper • 2411.15466 • Published Nov 23, 2024 • 35
Graph-Aware Isomorphic Attention in Transformers Collection We present an approach to modifying Transformer architectures by integrating graph-aware relational reasoning into the attention mechanism. • 4 items • Updated 3 days ago • 2
CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models Paper • 2407.15886 • Published Jul 21, 2024 • 2
Photorealistic Object Insertion with Diffusion-Guided Inverse Rendering Paper • 2408.09702 • Published Aug 19, 2024 • 10
Article Metric and Relative Monocular Depth Estimation: An Overview. Fine-Tuning Depth Anything V2 👐 📚 By Isayoften • Jul 10, 2024 • 44
OneDiffusion Collection Collection of different versions of OneDiffusion models • 8 items • Updated 14 days ago • 2
Bamba Collection Collection of Bamba models, based on a hybrid Mamba2 architecture and trained on open data • 8 items • Updated 25 days ago • 18
WavTokenizer-Medium-Large Collection https://arxiv.org/abs/2408.16532 • 5 items • Updated Oct 23, 2024 • 6
Qwen2-VL Collection Vision-language model series based on Qwen2 • 16 items • Updated Dec 6, 2024 • 189
Tulu 3 Models Collection All models released with Tulu 3: state-of-the-art open post-training recipes. • 7 items • Updated 6 days ago • 33
Cephalo Collection Cephalo is a series of multimodal vision large language models (V-LLMs) designed to integrate visual and linguistic reasoning in materials science. • 15 items • Updated Oct 25, 2024 • 4
LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models Paper • 2411.09595 • Published Nov 14, 2024 • 71
RewriteLM: An Instruction-Tuned Large Language Model for Text Rewriting Paper • 2305.15685 • Published May 25, 2023 • 4
Cosmos Tokenizer Collection A suite of image and video tokenizers • 13 items • Updated 1 day ago • 36
SmolLM2 Collection State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 15 items • Updated 21 days ago • 198
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think Paper • 2409.11355 • Published Sep 17, 2024 • 29
Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders Paper • 2410.22366 • Published Oct 28, 2024 • 77
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think Paper • 2410.06940 • Published Oct 9, 2024 • 6