MaterialSpecVision
release 1.0
WHAT IS MaterialSpecVision?
MaterialSpecVision is a specialized object detection model that detects heat numbers in material certificates, so they no longer have to be located manually in poorly scanned or low-quality PDF documents.
WHY USE MaterialSpecVision?
Heat numbers, which are critical for traceability in material documentation, can often be difficult to identify due to inconsistent formatting, low resolution, or cluttered layouts.
WHO SHOULD USE MaterialSpecVision?
Ideal for engineers and quality control teams, it streamlines the task of parsing certificates, even under challenging conditions.
TECHNICAL DETAILS
The model was trained on a dataset of over 2,000 material certificates in German, Italian, Spanish, and Chinese, covering a wide range of formats and languages. It is designed to identify and highlight heat numbers even in low-quality material certificates.
"""Model Training Configuration:"""
model = YOLO('yolov8n.pt')
model.train(epochs=100, batch=16, imgsz=640)
USAGE EXAMPLES
- Single-page Material Certificate (JPG format)
import matplotlib.pyplot as plt
from ultralytics import YOLO

model = YOLO("best.pt")  # the pre-trained model file downloaded from HF
results = model("your_material_certificate_file.jpg")
for r in results:
    r.boxes = r.boxes[r.boxes.conf > 0.4]  # keep detections with confidence above 0.4
    im_bgr = r.plot(line_width=5, font_size=5, conf=True)  # annotated image as a BGR array
    plt.imshow(im_bgr[..., ::-1])  # reverse the channels to RGB for matplotlib
    plt.axis('off')
    plt.show()
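The two usage examples use different confidence cut-offs (0.4 for the JPG, 0.1 for the PDF pages). Here is a minimal sketch of what that choice does, in plain Python with no model involved. The `Detection` tuple, the `heat_number` label, and all sample values are hypothetical and only stand in for real predictions.

```python
from typing import NamedTuple


class Detection(NamedTuple):
    """Hypothetical stand-in for one predicted box (not an ultralytics type)."""
    label: str
    conf: float
    xyxy: tuple  # (x1, y1, x2, y2) in pixels


def filter_by_confidence(detections, min_conf):
    """Keep detections whose confidence is strictly above min_conf."""
    return [d for d in detections if d.conf > min_conf]


# Made-up values for illustration only.
sample = [
    Detection("heat_number", 0.82, (120, 340, 310, 372)),
    Detection("heat_number", 0.27, (500, 90, 640, 118)),
    Detection("heat_number", 0.08, (40, 900, 95, 921)),
]

strict = filter_by_confidence(sample, 0.4)   # cut-off used in the JPG example
lenient = filter_by_confidence(sample, 0.1)  # cut-off used in the PDF example
print(len(strict), len(lenient))
```

A lower cut-off keeps more candidate boxes, which may help on noisy scans at the cost of more false positives. Which value works best for your certificates is something to check by eye.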
- Multi-page Material Certificate (PDF format)
For PDFs, convert each page to a JPG image before processing. This requires installing Tesseract (used here to detect page orientation) and Poppler (used to render the PDF pages).
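Before running the PDF script, it can help to check whether the external tools are reachable. The sketch below assumes the executables are on your `PATH` and looks them up with the standard library. The helper name `find_pdf_tools` is hypothetical. If you set explicit paths instead, as the script below does on Windows, a "not found" result here is not necessarily a problem.

```python
import shutil


def find_pdf_tools():
    """Return {executable name: path or None} for the tools the PDF example relies on."""
    # pdf2image shells out to Poppler's pdftoppm and pdfinfo; pytesseract calls tesseract.
    names = ("tesseract", "pdftoppm", "pdfinfo")
    return {name: shutil.which(name) for name in names}


tools = find_pdf_tools()
for name, path in tools.items():
    print(f"{name}: {path or 'not found on PATH'}")
```

With both tools available, the conversion and detection script for PDFs looks like this: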
import os

import matplotlib.pyplot as plt
import pytesseract
from pdf2image import convert_from_path
from ultralytics import YOLO

pytesseract.pytesseract.tesseract_cmd = r'\...\Tesseract-OCR\tesseract.exe'  # download and install Tesseract
poppler_path = r'\...\poppler-24.08.0\Library\bin'  # download and install Poppler
pdf_file_name = "your_material_certificate.pdf"
pdf_file_path = rf":\...\{pdf_file_name}"  # absolute path to your material certificate
images_to_process = []
model = YOLO("best.pt")

try:
    # Render every PDF page to a PIL image at 300 dpi
    images = convert_from_path(pdf_file_path,
                               poppler_path=poppler_path,
                               dpi=300,
                               use_cropbox=True)
    for i, image in enumerate(images):
        # Use Tesseract's orientation detection to find the rotation needed to make the page upright
        orientation_data = pytesseract.image_to_osd(image)
        rotation_angle = int(orientation_data.split("Rotate: ")[1].split("\n")[0])
        full_path_img = os.path.join(os.getcwd(), f"{i}{os.path.splitext(pdf_file_name)[0]}.jpg")
        if rotation_angle != 0:
            # PIL rotates counter-clockwise, so the angle is negated
            image = image.rotate(-rotation_angle, expand=True)
        image.save(full_path_img, "JPEG")
        images_to_process.append(full_path_img)  # list the page only after it was saved
except Exception as exc:
    print(f"PDF with file name {pdf_file_name} could not be processed: {exc}")

for img_path in images_to_process:
    results = model(img_path)
    for r in results:
        r.boxes = r.boxes[r.boxes.conf > 0.1]  # keep detections with confidence above 0.1
        im_bgr = r.plot(line_width=5, font_size=5, conf=True)  # annotated image as a BGR array
        plt.imshow(im_bgr[..., ::-1])  # reverse the channels to RGB for matplotlib
        plt.axis('off')
        plt.show()
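The script reads the rotation angle by splitting the OSD text on "Rotate: ", which raises an error if that line is missing. A slightly more defensive variant could use a regular expression and fall back to 0. This is only a sketch: the helper name `parse_rotation` is hypothetical, and the sample text imitates the shape of Tesseract's OSD output with made-up values.

```python
import re


def parse_rotation(osd_text):
    """Return the 'Rotate:' angle from Tesseract OSD text, or 0 if it is absent."""
    match = re.search(r"^Rotate:\s*(\d+)", osd_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else 0


# Sample text in the shape pytesseract.image_to_osd returns; the values are made up.
sample_osd = (
    "Page number: 0\n"
    "Orientation in degrees: 270\n"
    "Rotate: 90\n"
    "Orientation confidence: 4.21\n"
    "Script: Latin\n"
    "Script confidence: 1.75\n"
)

print(parse_rotation(sample_osd))
print(parse_rotation("no orientation data"))
```

Falling back to 0 means a page with unreadable orientation data is saved unrotated instead of aborting the whole PDF. Whether that is the right trade-off depends on your documents.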
IF DETECTION ISN'T WORKING AS EXPECTED
If detection doesn't work as expected, I'd love to review your material certificates for further investigation and potential inclusion in the next training session. Feel free to reach out to me via my LinkedIn account: http://linkedin.com/in/sergey-zhmaev-7a325896
CITATION
If you use this model, please cite it as follows:
cff-version: 1.2.0
message: "If you use this model, please cite:"
authors:
  - family-names: Zhmaev
    given-names: Sergey
title: "Material Specifications AI Vision Model"
version: 1.0
doi: 10.57967/hf/4255
LICENSE
Apache 2.0