ICCV2023

community

AI & ML interests

None defined yet.

Recent Activity

ICCV2023's activity

AdinaY 
posted an update about 3 hours ago
view post
Post
301
Wan2.1 🔥📹 new OPEN video model by Alibaba Wan team!

Model: Wan-AI/Wan2.1-T2V-14B
Demo: Wan-AI/Wan2.1

✨Apache 2.0
✨8.19GB VRAM, runs on most GPUs
✨Multi-Tasking: T2V, I2V, Video Editing, T2I, V2A
✨Text Generation: Supports Chinese & English
✨Powerful Video VAE: Encode/decode 1080P w/ temporal precision
AdinaY 
posted an update about 20 hours ago
view post
Post
1526
Try QwQ-Max-Preview, Qwen's reasoning model here👉 https://chat.qwen.ai
Can't wait for the model weights to drop on the Hugging Face Hub 🔥
  • 1 reply
·
AdinaY 
posted an update 1 day ago
view post
Post
1798
Two AI startups, DeepSeek & Moonshot AI , keep moving in perfect sync 👇

✨ Last December: DeepSeek & Moonshot AI released their reasoning models on the SAME DAY.
DeepSeek: deepseek-ai/DeepSeek-R1
MoonShot: https://github.com/MoonshotAI/Kimi-k1.5

✨ Last week: Both teams published papers on modifying attention mechanisms on the SAME DAY AGAIN.
DeepSeek: Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention (2502.11089)
Moonshot: MoBA: Mixture of Block Attention for Long-Context LLMs (2502.13189)

✨ TODAY:
DeepSeek unveiled Flash MLA: a efficient MLA decoding kernel for NVIDIA Hopper GPUs, optimized for variable-length sequences.
https://github.com/deepseek-ai/FlashMLA

Moonshot AI introduces Moonlight: a 3B/16B MoE trained on 5.7T tokens using Muon, pushing the Pareto frontier with fewer FLOPs.
moonshotai/Moonlight-16B-A3B

What's next? 👀
DmitryRyumin 
posted an update 5 days ago
view post
Post
3521
🚀🎭🌟 New Research Alert - WACV 2025 (Avatars Collection)! 🌟🎭🚀
📄 Title: EmoVOCA: Speech-Driven Emotional 3D Talking Heads 🔝

📝 Description: EmoVOCA is a data-driven method for generating emotional 3D talking heads by combining speech-driven lip movements with expressive facial dynamics. This method has been developed to overcome the limitations of corpora and to achieve state-of-the-art animation quality.

👥 Authors: @FedeNoce , Claudio Ferrari, and Stefano Berretti

📅 Conference: WACV, 28 Feb – 4 Mar, 2025 | Arizona, USA 🇺🇸

📄 Paper: https://arxiv.org/abs/2403.12886

🌐 Github Page: https://fedenoce.github.io/emovoca/
📁 Repository: https://github.com/miccunifi/EmoVOCA

🚀 CVPR-2023-24-Papers: https://github.com/DmitryRyumin/CVPR-2023-24-Papers

🚀 WACV-2024-Papers: https://github.com/DmitryRyumin/WACV-2024-Papers

🚀 ICCV-2023-Papers: https://github.com/DmitryRyumin/ICCV-2023-Papers

📚 More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers curated by @DmitryRyumin

🚀 Added to the Avatars Collection: DmitryRyumin/avatars-65df37cdf81fec13d4dbac36

🔍 Keywords: #EmoVOCA #3DAnimation #TalkingHeads #SpeechDriven #FacialExpressions #MachineLearning #ComputerVision #ComputerGraphics #DeepLearning #AI #WACV2024
  • 1 reply
·
AdinaY 
posted an update 5 days ago
AdinaY 
posted an update 7 days ago
view post
Post
4161
🚀 StepFun阶跃星辰 is making BIG open moves!

Last year, their GOT-OCR 2.0 took the community by storm 🔥but many didn’t know they were also building some amazing models. Now, they’ve just dropped something huge on the hub!

📺 Step-Video-T2V: a 30B bilingual open video model that generates 204 frames (8-10s) at 540P resolution with high information density & consistency.
stepfun-ai/stepvideo-t2v

🔊 Step-Audio-TTS-3B : a TTS trained with the LLM-Chat paradigm on a large synthetic dataset, capable of generating RAP & Humming
stepfun-ai/step-audio-67b33accf45735bb21131b0b
·
AdinaY 
posted an update 7 days ago
AdinaY 
posted an update 12 days ago
view post
Post
2541
Ovis2 🔥 a multimodal LLM released by Alibaba AIDC team.
AIDC-AI/ovis2-67ab36c7e497429034874464
✨1B/2B/4B/8B/16B/34B
✨Strong CoT for deeper problem solving
✨Multilingual OCR – Expanded beyond English & Chinese, with better data extraction
AdinaY 
posted an update 12 days ago
view post
Post
3536
InspireMusic 🎵🔥 an open music generation framework by Alibaba FunAudio Lab
Model: FunAudioLLM/InspireMusic-1.5B-Long
Demo: FunAudioLLM/InspireMusic
✨ Music, songs, audio - ALL IN ONE
✨ High quality audio: 24kHz & 48kHz sampling rates
✨ Long-Form Generation: enables extended audio creation
✨ Efficient Fine-Tuning: precision (BF16, FP16, FP32) with user-friendly scripts
  • 1 reply
·
AdinaY 
posted an update 20 days ago
view post
Post
3071
Xwen 🔥 a series of open models based on Qwen2.5 models, developed by a brilliant research team of PhD students from the Chinese community.
shenzhi-wang/xwen-chat-679e30ab1f4b90cfa7dbc49e
✨ 7B/72B
✨ Apache 2.0
✨ Xwen-72B-Chat outperformed DeepSeek V3 on Arena Hard Auto
AdinaY 
posted an update 28 days ago