Post
4319
Florence-2 is a new vision foundation model capable of a wide variety of tasks π€―
Demo ππ» gokaygokay/Florence-2
Collection ππ» microsoft/florence-6669f44df0d87d9c3bfb76de
This model can handle tasks that vary from OCR to semantic segmentation.
The difference from previous models is that the authors have compiled a dataset consisting of 126M images with 5.4B annotations labelled with their own data engine pseudolabelled by smaller specialized models and APIs.
The model has a similar architecture to previous models: an image encoder and a multimodality encoder with a text decoder. The authors have compiled the multitask dataset with prompts for each task.
You can also fine-tune this model on any task of choice. The authors also released different results on downstream tasks and reported their results when un/freezing the vision encoder π€π
They have released fine-tuned models too, you can find them in the collection above π€
Demo ππ» gokaygokay/Florence-2
Collection ππ» microsoft/florence-6669f44df0d87d9c3bfb76de
This model can handle tasks that vary from OCR to semantic segmentation.
The difference from previous models is that the authors have compiled a dataset consisting of 126M images with 5.4B annotations labelled with their own data engine pseudolabelled by smaller specialized models and APIs.
The model has a similar architecture to previous models: an image encoder and a multimodality encoder with a text decoder. The authors have compiled the multitask dataset with prompts for each task.
You can also fine-tune this model on any task of choice. The authors also released different results on downstream tasks and reported their results when un/freezing the vision encoder π€π
They have released fine-tuned models too, you can find them in the collection above π€