Le Duc Khai PRO

leduckhai

https://scholar.google.com/citations?user=DfAzEe0AAAAJ&hl=en

AI & ML interests

Speech Processing, Large Language Models, Medical AI

leduckhai's activity

updated a model about 6 hours ago

leduckhai/MultiMed-ST

Updated about 6 hours ago

published a model about 18 hours ago

leduckhai/MultiMed-ST

Updated about 6 hours ago

New activity in leduckhai/MultiMed 8 days ago

Add language tag

#4 opened 9 days ago by

lbourdois

New activity in leduckhai/MultiMed 23 days ago

Transcription language is different from audio language

#3 opened 30 days ago by

Shamus

updated a model 4 months ago

leduckhai/ViT5-VietMedSum

Summarization • Updated Nov 9, 2024 • 19

updated a dataset 4 months ago

leduckhai/VietMed-Sum

Viewer • Updated Nov 9, 2024 • 106k • 173 • 1

updated a dataset 5 months ago

leduckhai/MultiMed

Viewer • Updated 8 days ago • 48.4k • 244 • 1

authored 2 papers 8 months ago

Development of Hybrid ASR Systems for Low Resource Medical Domain Conversational Telephone Speech

Paper • 2210.13397 • Published Oct 24, 2022

Unsupervised Pre-Training for Vietnamese Automatic Speech Recognition in the HYKIST Project

Paper • 2309.15869 • Published Sep 26, 2023

New activity in leduckhai/VietMed-Sum 8 months ago

[bot] Conversion to Parquet

#1 opened 8 months ago by

parquet-converter

authored 2 papers 8 months ago

Medical Spoken Named Entity Recognition

Paper • 2406.13337 • Published Jun 19, 2024

Real-time Speech Summarization for Medical Conversations

Paper • 2406.15888 • Published Jun 22, 2024 • 1

upvoted a paper 8 months ago

Real-time Speech Summarization for Medical Conversations

Paper • 2406.15888 • Published Jun 22, 2024 • 1

updated a dataset 8 months ago

leduckhai/VietMed-NER

Viewer • Updated Jun 21, 2024 • 9.27k • 97

reacted to merve's post with 🔥 8 months ago

Post

4354

Florence-2 is a new vision foundation model capable of a wide variety of tasks 🤯
Demo 👉🏻 gokaygokay/Florence-2
Collection 👉🏻 microsoft/florence-6669f44df0d87d9c3bfb76de

This model can handle tasks that vary from OCR to semantic segmentation.

Unlike previous models, the authors compiled a dataset of 126M images with 5.4B annotations, labelled by their own data engine using pseudo-labels from smaller specialized models and APIs.

The model has a similar architecture to previous models: an image encoder and a multimodality encoder with a text decoder. The authors have compiled the multitask dataset with prompts for each task.

You can also fine-tune this model on a task of your choice. The authors also released results on downstream tasks, including results with the vision encoder frozen and unfrozen 🤓📉
They have released fine-tuned models too; you can find them in the collection above 🤗

3 replies

authored a paper 9 months ago

VietMed: A Dataset and Benchmark for Automatic Speech Recognition of Vietnamese in the Medical Domain

Paper • 2404.05659 • Published Apr 8, 2024 • 2

liked a dataset 9 months ago

leduckhai/VietMed

Preview • Updated May 25, 2024 • 347 • 15

updated a dataset 9 months ago

leduckhai/VietMed

Preview • Updated May 25, 2024 • 347 • 15

upvoted a paper 11 months ago

VietMed: A Dataset and Benchmark for Automatic Speech Recognition of Vietnamese in the Medical Domain

Paper • 2404.05659 • Published Apr 8, 2024 • 2