Update README.md
README.md
CHANGED
@@ -1,3 +1,28 @@
|
|
1 |
-
---
|
2 |
-
license: gpl-3.0
|
3 |
-
---
|
1 |
+
---
|
2 |
+
license: gpl-3.0
|
3 |
+
---
|
4 |
+
|
5 |
+
<div align="center">
|
6 |
+
<h1>Mamba-YOLO-World</h1>
|
7 |
+
<h3>Mamba-YOLO-World: Marrying YOLO-World with Mamba for Open-Vocabulary Detection</h3>
|
8 |
+
Haoxuan Wang, Qingdong He, Jinlong Peng, Hao Yang, Mingmin Chi, Yabiao Wang
|
9 |
+
|
10 |
+
<br>
|
11 |
+
<br>
|
12 |
+
|
13 |
+
[arxiv paper](https://arxiv.org/abs/2409.08513)
|
14 |
+
|
15 |
+
</div>
|
16 |
+
|
17 |
+
|
18 |
+
## Abstract
|
19 |
+
Open-vocabulary detection (OVD) aims to detect objects beyond a predefined set of categories.
|
20 |
+
As a pioneering model incorporating the YOLO series into OVD, YOLO-World is well-suited for scenarios prioritizing speed and efficiency.
|
21 |
+
However, its performance is hindered by its neck feature fusion mechanism, which incurs quadratic complexity and offers only limited guided receptive fields.
|
22 |
+
To address these limitations, we present Mamba-YOLO-World, a novel YOLO-based OVD model employing the proposed MambaFusion Path Aggregation Network (MambaFusion-PAN) as its neck architecture.
|
23 |
+
Specifically, we introduce an innovative State Space Model-based feature fusion mechanism, consisting of a Parallel-Guided Selective Scan algorithm and a Serial-Guided Selective Scan algorithm, that achieves linear complexity and globally guided receptive fields.
|
24 |
+
It leverages multi-modal input sequences and Mamba hidden states to guide the selective scanning process.
|
25 |
+
Experiments demonstrate that our model outperforms the original YOLO-World on the COCO and LVIS benchmarks in both zero-shot and fine-tuning settings while maintaining comparable parameters and FLOPs.
|
26 |
+
Additionally, it surpasses existing state-of-the-art OVD methods with fewer parameters and FLOPs.
|
27 |
+
|
28 |
+
For our code and more information, please see https://github.com/Xuan-World/Mamba-YOLO-World
|