---
tags:
- latex-ocr
- math-ocr
- math-formula-recognition
- mfr
- pix2text
- image-to-text
license: mit
library_name: transformers
---

# Model Card: Pix2Text-MFR
Math Formula Recognition (MFR) model from [Pix2Text (P2T)]().

## Model Details

This model is fine-tuned from OpenAI's CLIP (ViT-B/32) on a coin image dataset using **contrastive learning**. Its goal is to improve feature extraction for **coin** images and thereby make image-based search (search by image) more accurate. It pairs the Vision Transformer (ViT) image encoder with CLIP's multimodal pretraining and is optimized specifically for coin imagery.

## Usage and Limitations

- **Usage**: This model is primarily intended for extracting representation vectors (embeddings) from coin images, enabling efficient and accurate image-based search in a coin image database.
- **Limitations**: Because the model was trained specifically on coin images, it may not perform well on non-coin images.

## Documents

- Base Model: [openai/clip-vit-base-patch32](https://huggingface.co/openai/clip-vit-base-patch32)

## Model Use

```python3
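from PIL import Image
import torch
import torch.nn.functional as F

from transformers import CLIPProcessor, CLIPModel

model = CLIPModel.from_pretrained("breezedeus/coin-clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("breezedeus/coin-clip-vit-base-patch32")
model.eval()  # inference mode

image_fp = "path/to/coin_image.jpg"
image = Image.open(image_fp).convert("RGB")

# Resize/normalize the image and convert it to a PyTorch tensor batch
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    img_features = model.get_image_features(**inputs)  # one embedding vector per image
# L2-normalize so that dot products between embeddings equal cosine similarity
img_features = F.normalize(img_features, dim=1)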
```
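Because the embeddings are L2-normalized, image-based search over a coin database reduces to a nearest-neighbour lookup by cosine similarity. The following is a minimal brute-force sketch in NumPy, assuming the gallery embeddings have already been extracted (for example with the snippet above) and stacked into a 2-D array. The helper names `build_index` and `search` are hypothetical, not part of this model's or any library's API, and the demo uses random vectors in place of real embeddings.

```python
import numpy as np


def build_index(gallery: np.ndarray) -> np.ndarray:
    """L2-normalize gallery embeddings (hypothetical helper; one row per database image)."""
    norms = np.linalg.norm(gallery, axis=1, keepdims=True)
    return gallery / np.maximum(norms, 1e-12)  # guard against zero vectors


def search(index: np.ndarray, query: np.ndarray, top_k: int = 5) -> list[tuple[int, float]]:
    """Return (row_id, cosine_similarity) pairs for the top_k most similar gallery rows."""
    q = query / max(float(np.linalg.norm(query)), 1e-12)
    scores = index @ q                      # cosine similarity, since both sides are unit-length
    top_k = min(top_k, len(scores))
    best = np.argsort(-scores)[:top_k]      # highest similarity first
    return [(int(i), float(scores[i])) for i in best]


# Demo: random 512-d vectors stand in for real CLIP embeddings (hypothetical data).
rng = np.random.default_rng(0)
gallery = rng.normal(size=(100, 512))
index = build_index(gallery)
query = gallery[42] + 0.05 * rng.normal(size=512)  # a slightly perturbed copy of row 42
results = search(index, query, top_k=3)
print(results[0][0])  # prints 42
```

To query with a real image, pass `img_features[0].detach().numpy()` as `query`. An exact scan like this is simple and works well for moderate database sizes; for very large databases, an approximate nearest-neighbour index (for example, Faiss) is the usual replacement.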



## Training Data

The model was trained on a dedicated coin image dataset containing coins from a variety of currencies.

## Training Process

The model was fine-tuned from the pretrained OpenAI CLIP (ViT-B/32) model on the coin image dataset, using contrastive-learning fine-tuning techniques and corresponding hyperparameter settings.

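The exact contrastive objective and hyperparameters are not given here, so the following NumPy sketch is an illustration only: it computes a CLIP-style symmetric InfoNCE loss, assuming that row `i` of two embedding batches forms a positive pair (e.g. two photos or two augmented views of the same coin) and that all other rows in the batch act as negatives. The pairing scheme, the function names, and the temperature of 0.07 (CLIP's initial value) are assumptions, not the author's training setup.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit L2 norm."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable row-wise log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def symmetric_info_nce(view_a: np.ndarray, view_b: np.ndarray, temperature: float = 0.07) -> float:
    """CLIP-style symmetric InfoNCE loss (hypothetical sketch, not the author's code).

    Assumes row i of view_a and row i of view_b are a positive pair;
    every other row in the batch serves as a negative.
    """
    a = l2_normalize(view_a)
    b = l2_normalize(view_b)
    logits = a @ b.T / temperature      # (N, N) temperature-scaled cosine similarities
    idx = np.arange(len(a))             # positives sit on the diagonal
    loss_a_to_b = -log_softmax(logits)[idx, idx].mean()
    loss_b_to_a = -log_softmax(logits.T)[idx, idx].mean()
    return float((loss_a_to_b + loss_b_to_a) / 2)


# Demo with random vectors standing in for image embeddings (hypothetical data).
rng = np.random.default_rng(0)
anchors = rng.normal(size=(8, 512))
matched = symmetric_info_nce(anchors, anchors + 0.01 * rng.normal(size=(8, 512)))
mismatched = symmetric_info_nce(anchors, np.roll(anchors, 1, axis=0))  # every pair is wrong
print(matched < mismatched)  # True
```

A low loss means each positive pair is ranked above the in-batch negatives. In actual fine-tuning, the loss would be computed on the image encoder's outputs and minimized with a gradient-based framework such as PyTorch rather than NumPy.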
## Performance

This model shows strong performance on coin image retrieval tasks.


## Feedback

> Where to send questions or comments about the model.

Questions and comments are welcome; please contact the author, [Breezedeus](https://www.breezedeus.com/join-group).