Nguyen Bach

nguyenbh

AI & ML interests

None yet

nguyenbh's activity

upvoted a paper 5 months ago

Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks

Paper • 2311.06242 • Published Nov 10, 2023 • 84

upvoted 7 collections 6 months ago

GIT

GIT (Generative Image-to-text Transformer) is a model useful for vision-language tasks such as image/video captioning and question answering. • 18 items • Updated Jul 11 • 10

UDOP

UDOP is a general multimodal model for document AI • 4 items • Updated Jul 11 • 23

Orca

The Orca family of LMs developed by Microsoft. • 2 items • Updated Jul 11 • 7

Table Transformer

The Table Transformer (TATR) is a series of object detection models useful for table extraction from PDF images. • 5 items • Updated Jul 11 • 19

TAPEX

TAPEX is a state-of-the-art table pre-training model that can be used for table-based question answering and table-based fact verification. • 10 items • Updated Jul 11 • 8

SpeechT5

The SpeechT5 framework consists of a shared seq2seq and six modal-specific (speech/text) pre/post-nets that can address a few audio-related tasks. • 8 items • Updated Jul 11 • 21

LayoutLM

The LayoutLM series are Transformer encoders useful for document AI tasks such as invoice parsing, document image classification and DocVQA. • 5 items • Updated Jul 11 • 13

upvoted a collection 7 months ago

Phi-3

Phi-3 family of small language and multi-modal models. Language models are available in short- and long-context lengths. • 27 items • Updated 10 days ago • 489

upvoted a paper 7 months ago

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Paper • 2404.14219 • Published Apr 22 • 251

upvoted 2 papers about 1 year ago

DEsignBench: Exploring and Benchmarking DALL-E 3 for Imagining Visual Design

Paper • 2310.15144 • Published Oct 23, 2023 • 13

Kosmos-2.5: A Multimodal Literate Model

Paper • 2309.11419 • Published Sep 20, 2023 • 50