---
datasets:
- agentsea/wave-ui
language:
- en
library_name: transformers
---

# PaliGemma WaveUI


Transformers-format weights for [PaliGemma 3B at 896 resolution](https://huggingface.co/google/paligemma-3b-pt-896), fine-tuned on the [WaveUI](https://huggingface.co/datasets/agentsea/wave-ui) dataset for object detection.

## Model Details

### Model Description

This model was fine-tuned on top of the [PaliGemma 896](https://huggingface.co/google/paligemma-3b-pt-896) model using the [WaveUI](https://huggingface.co/datasets/agentsea/wave-ui) dataset, which contains ~80k examples of labeled UI elements.

The fine-tune targeted the object detection task. Specifically, this model aims to perform well at UI element detection, as part of a wider effort behind our open-source toolkit for building agents at [AgentSea](https://www.agentsea.ai/).

- **Developed by:** https://agentsea.ai/
- **Language(s) (NLP):** en
- **Finetuned from model:** https://huggingface.co/google/paligemma-3b-pt-896

### Demo

You can find a **demo** for this model [here](https://huggingface.co/spaces/agentsea/paligemma-waveui).

## Notes

- The model was fine-tuned only on the object detection task, so it may not perform well on other types of tasks.
  
## Usage

To load the model and its processor, run the following:

```python
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model = PaliGemmaForConditionalGeneration.from_pretrained("agentsea/paligemma-3b-ft-waveui-896").eval()
processor = AutoProcessor.from_pretrained("agentsea/paligemma-3b-ft-waveui-896")
```
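
Once the model generates text for a detection prompt, the boxes have to be read back out of that text. The following is a minimal sketch, assuming the output follows the base PaliGemma detection convention: four `<locNNNN>` tokens per box in the order `y_min, x_min, y_max, x_max`, each a bin in the range 0 to 1023, followed by the label, with multiple detections separated by `;`. The function name `parse_detections` is hypothetical and not part of this repository.

```python
import re

# Assumed PaliGemma detection format: "<locYYYY><locXXXX><locYYYY><locXXXX> label"
_DETECTION = re.compile(
    r"<loc(\d{4})><loc(\d{4})><loc(\d{4})><loc(\d{4})>\s*([^;<]*)"
)


def parse_detections(text, image_width, image_height):
    """Turn generated text into a list of labeled pixel-space boxes."""
    detections = []
    for match in _DETECTION.finditer(text):
        # Location tokens are bins in [0, 1023]; divide by 1024 to normalize.
        y_min, x_min, y_max, x_max = (int(v) / 1024 for v in match.groups()[:4])
        detections.append(
            {
                "label": match.group(5).strip(),
                # Box is returned as (x_min, y_min, x_max, y_max) in pixels.
                "box": (
                    x_min * image_width,
                    y_min * image_height,
                    x_max * image_width,
                    y_max * image_height,
                ),
            }
        )
    return detections
```

For example, `parse_detections("<loc0256><loc0128><loc0512><loc0768> button", 1024, 1024)` yields one detection labeled `button` with its box scaled to the image size.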

## Data

We used the [WaveUI](https://huggingface.co/datasets/agentsea/wave-ui) dataset for this fine-tune. Before training, we preprocessed the data into the PaliGemma bounding-box format.
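
The conversion to the PaliGemma bounding-box format might look roughly like the sketch below. It is an illustration, not the preprocessing script used for this model: it assumes pixel-space boxes given as `(x_min, y_min, x_max, y_max)`, and the helper names `to_loc_token` and `encode_box` as well as the exact binning (truncate, then clamp to 1023) are assumptions.

```python
def to_loc_token(value, size):
    """Map a pixel coordinate to a PaliGemma location token (assumed binning)."""
    # Normalize to [0, 1], scale to 1024 bins, and clamp to the valid range.
    bin_index = min(max(int(value / size * 1024), 0), 1023)
    return f"<loc{bin_index:04d}>"


def encode_box(x_min, y_min, x_max, y_max, image_width, image_height, label):
    """Build a detection target string: four location tokens, then the label."""
    # PaliGemma orders the coordinates as y_min, x_min, y_max, x_max.
    tokens = (
        to_loc_token(y_min, image_height)
        + to_loc_token(x_min, image_width)
        + to_loc_token(y_max, image_height)
        + to_loc_token(x_max, image_width)
    )
    return f"{tokens} {label}"
```

A box touching the right or bottom image edge would map to bin 1024 without the clamp, which is why the index is capped at 1023.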
 

## Evaluation

We computed the mean IoU over 1024 examples from the test set for three closed-source models: Gemini 1.5 Pro, Claude 3.5 Sonnet, and GPT-4o. We ran the same calculation for the PaliGemma WaveUI fine-tunes. The results were:

- Gemini 1.5 Pro: 0.12
- Claude 3.5 Sonnet: 0.05
- GPT-4o: 0.05
- PaliGemma Widgetcap+WaveUI 448: 0.40
- **PaliGemma WaveUI 896: 0.49**
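
For reference, mean IoU over a set of examples can be computed along these lines. This is a sketch of the standard metric, not the evaluation script behind the numbers above: it assumes one predicted box and one ground-truth box per example, both as `(x_min, y_min, x_max, y_max)`, and the function names are illustrative.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region, zero if the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def mean_iou(predictions, ground_truths):
    """Average IoU over aligned prediction and ground-truth boxes."""
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths must have equal length")
    if not predictions:
        return 0.0
    scores = [iou(p, g) for p, g in zip(predictions, ground_truths)]
    return sum(scores) / len(scores)
```

How an example with no parseable prediction is scored (for instance, counted as an IoU of 0) is a choice this sketch leaves to the caller.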