tencent/Hunyuan3D-2
Image-to-3D • Updated • 4.79k • 329
Note 660B reasoning models released under the MIT license
Note A non-transformer-based (ViT-MLP-LLM framework) VLM
Note 456B LLM with a 1M-token training context
Note End-side multimodal LLM that supports real-time conversation and video understanding.
Note A unified model for dense grounded understanding of images & videos.
Note A multimodal dataset for vision-language pretraining; includes 6.5M images + 0.8B text from 22k hours of instructional videos
Note Dataset designed specifically for natural language processing (NLP) tasks in the education sector.
Text-to-3D and Image-to-3D Generation