ICCV2023

community

AI & ML interests

None defined yet.

Recent Activity

garyzhao9012 authored a paper 7 days ago

The Hidden Life of Tokens: Reducing Hallucination of Large Vision-Language Models via Visual Information Steering

Eladlev authored a paper about 1 month ago

IntellAgent: A Multi-Agent Framework for Evaluating Conversational AI Systems

hadasor authored a paper about 1 month ago

Padding Tone: A Mechanistic Analysis of Padding Tokens in T2I Models

ICCV2023's activity

AdinaY

posted an update about 3 hours ago

Wan2.1 🔥📹 new OPEN video model by Alibaba Wan team!

Model: Wan-AI/Wan2.1-T2V-14B
Demo: Wan-AI/Wan2.1

✨Apache 2.0
✨8.19GB VRAM for the T2V-1.3B model, runs on most consumer GPUs
✨Multi-Tasking: T2V, I2V, Video Editing, T2I, V2A
✨Visual Text Generation: Supports Chinese & English text in generated video
✨Powerful Video VAE: Encode/decode 1080P w/ temporal precision

AdinaY

posted an update about 20 hours ago

Try QwQ-Max-Preview, Qwen's reasoning model, here 👉 https://chat.qwen.ai
Can't wait for the model weights to drop on the Hugging Face Hub 🔥

AdinaY

posted an update 1 day ago

Two AI startups, DeepSeek & Moonshot AI, keep moving in perfect sync 👇

✨ Last month: DeepSeek & Moonshot AI released their reasoning models on the SAME DAY.
DeepSeek: deepseek-ai/DeepSeek-R1
Moonshot: https://github.com/MoonshotAI/Kimi-k1.5

✨ Last week: Both teams published papers on modifying attention mechanisms on the SAME DAY AGAIN.
DeepSeek: Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention (2502.11089)
Moonshot: MoBA: Mixture of Block Attention for Long-Context LLMs (2502.13189)

✨ TODAY:
DeepSeek unveiled FlashMLA: an efficient MLA decoding kernel for NVIDIA Hopper GPUs, optimized for variable-length sequences.
https://github.com/deepseek-ai/FlashMLA

Moonshot AI introduced Moonlight: a 16B-parameter MoE (3B activated) trained on 5.7T tokens using Muon, pushing the Pareto frontier with fewer FLOPs.
moonshotai/Moonlight-16B-A3B

What's next? 👀

xiuyul

authored a paper 4 days ago

S*: Test Time Scaling for Code Generation

Paper • 2502.14382 • Published 5 days ago • 52

DmitryRyumin

posted an update 5 days ago

🚀🎭🌟 New Research Alert - WACV 2025 (Avatars Collection)! 🌟🎭🚀
📄 Title: EmoVOCA: Speech-Driven Emotional 3D Talking Heads 🔝

📝 Description: EmoVOCA is a data-driven method for generating emotional 3D talking heads by combining speech-driven lip movements with expressive facial dynamics. The method was developed to overcome the limitations of existing corpora and to achieve state-of-the-art animation quality.

👥 Authors: @FedeNoce , Claudio Ferrari, and Stefano Berretti

📅 Conference: WACV, 28 Feb – 4 Mar, 2025 | Arizona, USA 🇺🇸

📄 Paper: https://arxiv.org/abs/2403.12886

🌐 Github Page: https://fedenoce.github.io/emovoca/
📁 Repository: https://github.com/miccunifi/EmoVOCA

🚀 CVPR-2023-24-Papers: https://github.com/DmitryRyumin/CVPR-2023-24-Papers

🚀 WACV-2024-Papers: https://github.com/DmitryRyumin/WACV-2024-Papers

🚀 ICCV-2023-Papers: https://github.com/DmitryRyumin/ICCV-2023-Papers

📚 More Papers: more cutting-edge research presented at other conferences is available in the DmitryRyumin/NewEraAI-Papers collection, curated by @DmitryRyumin

🚀 Added to the Avatars Collection: DmitryRyumin/avatars-65df37cdf81fec13d4dbac36

🔍 Keywords: #EmoVOCA #3DAnimation #TalkingHeads #SpeechDriven #FacialExpressions #MachineLearning #ComputerVision #ComputerGraphics #DeepLearning #AI #WACV2025

AdinaY

posted an update 5 days ago

VLM-R1 🔥 bringing DeepSeek’s R1 method to vision-language models!

GitHub: https://github.com/om-ai-lab/VLM-R1
Demo: omlab/VLM-R1-Referral-Expression

Junyi42

authored a paper 5 days ago

Pre-training Auto-regressive Robotic Models with 4D Representations

Paper • 2502.13142 • Published 7 days ago • 4

AdinaY

posted an update 7 days ago

🚀 StepFun阶跃星辰 is making BIG open moves!

Last year, their GOT-OCR 2.0 took the community by storm 🔥 but many didn’t know they were also building other amazing models. Now, they’ve just dropped something huge on the hub!

📺 Step-Video-T2V: a 30B bilingual open video model that generates 204 frames (8-10s) at 540P resolution with high information density & consistency.
stepfun-ai/stepvideo-t2v

🔊 Step-Audio-TTS-3B: a TTS model trained with the LLM-Chat paradigm on a large synthetic dataset, capable of generating rap and humming
stepfun-ai/step-audio-67b33accf45735bb21131b0b

AdinaY

posted an update 7 days ago

DeepSeek's latest paper is now available on the Daily Papers page 🚀
You can reach out to the authors directly on this page 👇
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention (2502.11089)

AdinaY

posted an update 12 days ago

Ovis2 🔥 a multimodal LLM released by Alibaba AIDC team.
AIDC-AI/ovis2-67ab36c7e497429034874464
✨1B/2B/4B/8B/16B/34B
✨Strong CoT for deeper problem solving
✨Multilingual OCR – Expanded beyond English & Chinese, with better data extraction

AdinaY

posted an update 12 days ago

InspireMusic 🎵🔥 an open music generation framework by Alibaba FunAudio Lab
Model: FunAudioLLM/InspireMusic-1.5B-Long
Demo: FunAudioLLM/InspireMusic
✨ Music, songs, audio - ALL IN ONE
✨ High quality audio: 24kHz & 48kHz sampling rates
✨ Long-Form Generation: enables extended audio creation
✨ Efficient Fine-Tuning: supports multiple precisions (BF16, FP16, FP32) with user-friendly scripts

zsytony

authored 3 papers 14 days ago

LiT: Delving into a Simplified Linear Diffusion Transformer for Image Generation

Paper • 2501.12976 • Published Jan 22

HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance

Paper • 2401.08772 • Published Jan 16, 2024

Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning

Paper • 2502.06781 • Published 15 days ago • 59

AdinaY

posted an update 20 days ago

Xwen 🔥 a series of open models based on Qwen2.5 models, developed by a brilliant research team of PhD students from the Chinese community.
shenzhi-wang/xwen-chat-679e30ab1f4b90cfa7dbc49e
✨ 7B/72B
✨ Apache 2.0
✨ Xwen-72B-Chat outperformed DeepSeek V3 on Arena Hard Auto

dbaranchuk

authored a paper 20 days ago

Inverse Bridge Matching Distillation

Paper • 2502.01362 • Published 22 days ago • 26

TaeGyeong

authored a paper 22 days ago

Multi-aspect Knowledge Distillation with Large Language Model

Paper • 2501.13341 • Published Jan 23

AdinaY

posted an update 28 days ago

It’s not just a flood of model releases; papers are dropping just as fast 🚀

Here are the 10 most upvoted papers from the Chinese community:
👉 zh-ai-community/2025-january-papers-679933cbf0f3ced11f5a168a

tennant

authored 2 papers 28 days ago

Libra-Leaderboard: Towards Responsible AI through a Balanced Leaderboard of Safety and Capability

Paper • 2412.18551 • Published Dec 24, 2024

Humanity's Last Exam

Paper • 2501.14249 • Published Jan 24 • 63

Team members 208
