LMMRotate 🎮: A Simple Aerial Detection Baseline of Multimodal Language Models

Qingyun Li Yushi Chen Xinya Shu Dong Chen Xin He Yi Yu Xue Yang

If you find our work helpful, please consider giving us a ⭐!

ArXiv Paper: https://arxiv.org/abs/2501.09720
GitHub Repo: https://github.com/Li-Qingyun/mllm-mmrotate
HuggingFace Page: https://huggingface.co/collections/Qingyun/lmmrotate-6780cabaf49c4e705023b8df

This repo hosts all the available checkpoints of Florence-2 trained for aerial detection with LMMRotate in our paper.

LMMRotate is a practical recipe for fine-tuning large multimodal language models for oriented object detection, as in MMRotate. Its repository hosts the official implementation of the paper "A Simple Aerial Detection Baseline of Multimodal Language Models".

See the list of available checkpoints here.

Each checkpoint folder is named according to the pattern {base_model}_vis{vision_input_size}-lang{max_language_input_length}_{dataset_name}-{annotation_version}_b{samples_per_gpu}x{num_gpus}-{num_epoch}e-{note}

For example:

florence-2-b_vis1024-lang2048_dota1-train-v2_b2x16-100e-slurm-zero2:

base_model: Microsoft/Florence-2-base

vision input size: 1024 × 1024

max language input length: 2048

aerial detection source dataset name: dota1-train (train split of split_ss_dota)

annotation version: v2 (users can ignore this)

batch size and resources: 2 samples per GPU × 16 GPUs = 32 samples per step

schedule: 100 epochs

note: the model is trained on a slurm cluster and accelerated with DeepSpeed ZeRO2
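The naming convention above can be unpacked programmatically. The following is a minimal sketch, assuming folder names follow the pattern exactly; the regular expression and the helper `parse_checkpoint_name` are hypothetical illustrations and are not part of the LMMRotate codebase.

```python
import re

# Regex mirroring the folder naming pattern described above.
# This is an illustrative assumption, not code from the LMMRotate repo.
_NAME_RE = re.compile(
    r"^(?P<base_model>.+?)"
    r"_vis(?P<vision_input_size>\d+)"
    r"-lang(?P<max_language_input_length>\d+)"
    r"_(?P<dataset_name>.+)"
    r"-(?P<annotation_version>[^-_]+)"
    r"_b(?P<samples_per_gpu>\d+)x(?P<num_gpus>\d+)"
    r"-(?P<num_epoch>\d+)e"
    r"(?:-(?P<note>.+))?$"
)

_INT_FIELDS = (
    "vision_input_size",
    "max_language_input_length",
    "samples_per_gpu",
    "num_gpus",
    "num_epoch",
)


def parse_checkpoint_name(name: str) -> dict:
    """Split a checkpoint folder name into its documented fields."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"Folder name does not follow the pattern: {name!r}")
    info = match.groupdict()
    for key in _INT_FIELDS:
        info[key] = int(info[key])
    # Total batch size = samples per GPU x number of GPUs.
    info["total_batch_size"] = info["samples_per_gpu"] * info["num_gpus"]
    return info
```

For the example folder above, `parse_checkpoint_name("florence-2-b_vis1024-lang2048_dota1-train-v2_b2x16-100e-slurm-zero2")` yields `base_model` of `florence-2-b`, a `total_batch_size` of 32, and `note` of `slurm-zero2`.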

Downloading Guide

You can download the checkpoints with your web browser from the file page.

We recommend downloading from the terminal using huggingface-cli, which ships with the huggingface_hub package (pip install --upgrade huggingface_hub). Refer to the documentation for more usage options.

# Set Huggingface Mirror for Chinese users (if required):
export HF_ENDPOINT=https://hf-mirror.com 
# Download a specific checkpoint folder:
huggingface-cli download Qingyun/Florence-2-models-lmmrotate --include "<checkpoint_folder_name>/*" --repo-type model --local-dir checkpoint/
# If an error (such as a network failure) interrupts the download, just re-run the same command; recent versions of huggingface_hub resume the download.
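The same download can also be scripted from Python. The sketch below uses `snapshot_download` from huggingface_hub and assumes each checkpoint is stored as a top-level folder of the repo; the helpers `folder_pattern` and `download_checkpoint` are hypothetical names introduced for illustration.

```python
from pathlib import Path

REPO_ID = "Qingyun/Florence-2-models-lmmrotate"


def folder_pattern(checkpoint_folder_name: str) -> str:
    """Build a glob pattern matching every file inside one checkpoint folder."""
    return f"{checkpoint_folder_name.strip('/')}/*"


def download_checkpoint(checkpoint_folder_name: str, local_dir: str = "checkpoint") -> Path:
    """Download a single checkpoint folder and return its local path.

    Assumes the checkpoint is a top-level folder in the repo.
    """
    # Imported lazily so the helper above is usable without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    root = snapshot_download(
        repo_id=REPO_ID,
        repo_type="model",
        allow_patterns=[folder_pattern(checkpoint_folder_name)],
        local_dir=local_dir,
    )
    return Path(root) / checkpoint_folder_name.strip("/")
```

A call such as `download_checkpoint("florence-2-b_vis1024-lang2048_dota1-train-v2_b2x16-100e-slurm-zero2")` would fetch only that folder into `checkpoint/`. The `HF_ENDPOINT` environment variable shown above should be set before Python starts if a mirror is needed.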

Detection Performance

Cite

LMMRotate paper:

@article{li2025lmmrotate,
  title={A Simple Aerial Detection Baseline of Multimodal Language Models},
  author={Li, Qingyun and Chen, Yushi and Shu, Xinya and Chen, Dong and He, Xin and Yu, Yi and Yang, Xue},
  journal={arXiv preprint arXiv:2501.09720},
  year={2025}
}
