Le Duc Khai PRO

leduckhai

AI & ML interests

Speech Processing, Large Language Models, Medical AI

Recent Activity

updated a model about 6 hours ago
leduckhai/MultiMed-ST
published a model about 18 hours ago
leduckhai/MultiMed-ST
new activity 8 days ago
leduckhai/MultiMed:Add language tag

Organizations

RWTH Aachen University, University of Toronto, Sailor2, VietMed

leduckhai's activity

published a model about 18 hours ago
New activity in leduckhai/MultiMed 8 days ago

Add language tag

#4 opened 9 days ago by
lbourdois
New activity in leduckhai/VietMed-Sum 8 months ago
reacted to merve's post with 🔥 8 months ago
Florence-2 is a new vision foundation model capable of a wide variety of tasks 🤯
Demo 👉🏻 gokaygokay/Florence-2
Collection 👉🏻 microsoft/florence-6669f44df0d87d9c3bfb76de

This model can handle tasks that vary from OCR to semantic segmentation.

The difference from previous models is the dataset: the authors compiled 126M images with 5.4B annotations, labelled by their own data engine, which pseudo-labels the images with smaller specialized models and APIs.

The model has a similar architecture to previous models: an image encoder and a multimodality encoder with a text decoder. The authors have compiled the multitask dataset with prompts for each task.

You can also fine-tune this model on any task of your choice. The authors also report results on downstream tasks, both with the vision encoder frozen and unfrozen 🤓📉
They have released fine-tuned models too; you can find them in the collection above 🤗