usernameoccupied/MaterialSpecVision

MaterialSpecVision 🔎

Release 1.0

WHAT IS MaterialSpecVision?

MaterialSpecVision is a specialized object detection model that locates heat numbers in material certificates, so they no longer have to be found manually in poorly scanned or low-quality PDF documents.

WHY USE MaterialSpecVision?

Heat numbers are critical for traceability in material documentation, but they are often hard to identify because of inconsistent formatting, low resolution, or cluttered layouts.

WHO SHOULD USE MaterialSpecVision?

The model is aimed at engineers and quality control teams, and it streamlines parsing certificates even under challenging conditions.

TECHNICAL DETAILS

The model was trained on a dataset of over 2,000 material certificates in German, Italian, Spanish, and Chinese, covering a wide range of layouts and languages. It is intended to locate and highlight heat numbers even in low-quality certificates.

"""Model Training Configuration:"""
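from ultralytics import YOLO

model = YOLO('yolov8n.pt')  # pretrained YOLOv8 nano checkpoint as the starting point
model.train(data='path/to/dataset.yaml', epochs=100, batch=16, imgsz=640)  # data: placeholder, the dataset config is not published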
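The card does not include the training dataset. If you prepare your own annotations, Ultralytics YOLO expects one label text file per image, with one line per box: a class id followed by the box center and size, each normalized to the image width and height. The helper below is a minimal sketch of that conversion; the function name `to_yolo_label` and the class id `0` for heat numbers are assumptions, not details taken from the author's dataset.

```python
def to_yolo_label(x1, y1, x2, y2, img_w, img_h, class_id=0):
    """Convert a pixel box (top-left x1,y1; bottom-right x2,y2) to a YOLO label line.

    class_id=0 is a hypothetical id for a single "heat number" class.
    """
    if not (0 <= x1 < x2 <= img_w and 0 <= y1 < y2 <= img_h):
        raise ValueError("box must lie inside the image with x1 < x2 and y1 < y2")
    cx = (x1 + x2) / 2 / img_w  # normalized box center
    cy = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w       # normalized box width and height
    h = (y2 - y1) / img_h
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


# A 300 x 60 px box on a 2480 x 3508 px page (roughly A4 rendered at 300 DPI)
print(to_yolo_label(100, 200, 400, 260, 2480, 3508))
# → 0 0.100806 0.065564 0.120968 0.017104
```

Normalizing by the page size keeps the labels valid regardless of the resolution at which a certificate was scanned or rendered.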

USAGE EXAMPLES

Single-page Material Certificate (JPG format)

import matplotlib.pyplot as plt
from ultralytics import YOLO

model = YOLO("best.pt")  # weights file downloaded from this Hugging Face repository
results = model("your_material_certificate_file.jpg")

for r in results:
    r.boxes = r.boxes[r.boxes.conf > 0.4]  # keep only detections with confidence above 0.4
    im_bgr = r.plot(line_width=5, font_size=5, conf=True)  # annotated image as a BGR NumPy array
    plt.imshow(im_bgr[..., ::-1])  # reverse channels (BGR -> RGB) so matplotlib shows correct colors
    plt.axis('off')
    plt.show()
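The model only locates heat numbers; it does not read them. If you want to pass each detected region to an OCR engine such as Tesseract, one option is to crop the boxes out of the original image. The sketch below assumes you have already extracted the coordinates and confidences from a result as plain lists (for example `r.boxes.xyxy.tolist()` and `r.boxes.conf.tolist()`) and the page as a NumPy array (`r.orig_img`). The function name `crop_detections` and the `pad` margin are illustrative choices, not part of this model.

```python
import numpy as np


def crop_detections(image, boxes_xyxy, confidences, threshold=0.4, pad=5):
    """Return image crops for boxes whose confidence exceeds `threshold`.

    image: H x W (x C) NumPy array, e.g. r.orig_img from an Ultralytics result.
    boxes_xyxy: list of [x1, y1, x2, y2] pixel coordinates.
    confidences: list of scores, one per box.
    pad: hypothetical margin in pixels so OCR sees whole characters.
    """
    h, w = image.shape[:2]
    crops = []
    for (x1, y1, x2, y2), conf in zip(boxes_xyxy, confidences):
        if conf <= threshold:
            continue  # same filtering rule as `r.boxes.conf > threshold`
        # Round outward, add padding, and clip to the image bounds
        left = max(int(np.floor(x1)) - pad, 0)
        top = max(int(np.floor(y1)) - pad, 0)
        right = min(int(np.ceil(x2)) + pad, w)
        bottom = min(int(np.ceil(y2)) + pad, h)
        crops.append(image[top:bottom, left:right])
    return crops


# Synthetic data standing in for a real prediction
page = np.zeros((100, 200, 3), dtype=np.uint8)
crops = crop_detections(page, [[10.2, 20.7, 50.0, 40.1], [0, 0, 5, 5]], [0.9, 0.2])
print(len(crops), crops[0].shape)  # → 1 (31, 50, 3)
```

Clipping matters for heat numbers printed near the page edge, where padding would otherwise index outside the array.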

Multi-page Material Certificate (PDF format)
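For PDFs, convert each page to a JPG image before running the model. This step requires two external tools: Poppler, which pdf2image uses to render PDF pages, and Tesseract, which detects page orientation so that rotated scans can be turned upright.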

import os

import matplotlib.pyplot as plt
import pytesseract
from pdf2image import convert_from_path
from ultralytics import YOLO

pytesseract.pytesseract.tesseract_cmd = r'\...\Tesseract-OCR\tesseract.exe'  # path to your Tesseract installation
poppler_path = r'\...\poppler-24.08.0\Library\bin'  # path to your Poppler installation

pdf_file_name = "your_material_certificate.pdf"
pdf_file_path = rf":\...\{pdf_file_name}"  # absolute path to your material certificate
pdf_stem = os.path.splitext(pdf_file_name)[0]
images_to_process = []
model = YOLO("best.pt")

try:
    # Render each PDF page as a 300 DPI image
    images = convert_from_path(pdf_file_path,
                               poppler_path=poppler_path,
                               dpi=300,
                               use_cropbox=True)
    for i, image in enumerate(images):
        try:
            # OSD reports the clockwise rotation (0/90/180/270) needed to make the page upright
            osd = pytesseract.image_to_osd(image, output_type=pytesseract.Output.DICT)
            rotation_angle = osd["rotate"]
        except pytesseract.TesseractError:
            rotation_angle = 0  # OSD can fail on pages with very little text; keep the page as-is
        if rotation_angle != 0:
            image = image.rotate(-rotation_angle, expand=True)  # PIL rotates counter-clockwise
        full_path_img = os.path.join(os.getcwd(), f"{pdf_stem}_page{i}.jpg")
        image.save(full_path_img, "JPEG")
        images_to_process.append(full_path_img)
except Exception as e:
    print(f"PDF {pdf_file_name} could not be processed: {e}")

for img_path in images_to_process:
    results = model(img_path)
    for r in results:
        r.boxes = r.boxes[r.boxes.conf > 0.1]  # low confidence threshold; raise it if you see false positives
        im_bgr = r.plot(line_width=5, font_size=5, conf=True)
        plt.imshow(im_bgr[..., ::-1])  # plot() returns BGR; reverse channels for matplotlib
        plt.axis('off')
        plt.show()
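The orientation step relies on the `Rotate:` value in Tesseract's OSD report, which gives the clockwise rotation (0, 90, 180 or 270 degrees) needed to make the page upright. The sketch below isolates that logic so it can be checked without a PDF or a Tesseract install: it parses the report text with a regular expression, falls back to 0 when the line is missing, and rotates a NumPy array with `np.rot90` instead of PIL. The function names `parse_rotate` and `make_upright` are hypothetical, and the sample report is abbreviated for illustration.

```python
import re

import numpy as np


def parse_rotate(osd_text):
    """Return the 'Rotate:' angle from a Tesseract OSD report, or 0 if absent."""
    match = re.search(r"^Rotate:\s*(\d+)", osd_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else 0


def make_upright(image, rotate_deg):
    """Rotate an H x W (x C) array clockwise by rotate_deg (a multiple of 90)."""
    if rotate_deg % 90 != 0:
        raise ValueError("Tesseract OSD reports multiples of 90 degrees")
    # np.rot90 turns counter-clockwise for positive k, so negate k for clockwise
    return np.rot90(image, k=-(rotate_deg // 90))


# Abbreviated OSD report for a page that must be turned 90 degrees clockwise
sample_osd = "Page number: 0\nOrientation in degrees: 270\nRotate: 90\n"
page = np.arange(6).reshape(2, 3)  # 2 rows x 3 columns
upright = make_upright(page, parse_rotate(sample_osd))
print(parse_rotate(sample_osd), upright.shape)  # → 90 (3, 2)
```

Negating the angle plays the same role as `image.rotate(-rotation_angle, expand=True)` in the PIL-based loop: both libraries treat positive angles as counter-clockwise.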

IF DETECTION ISN'T WORKING AS EXPECTED

If detection doesn't work as expected on your certificates, I'd be glad to review them for further investigation and possible inclusion in the next training round. Feel free to reach out to me on LinkedIn: http://linkedin.com/in/sergey-zhmaev-7a325896

CITATION

If you use this model, please cite it as follows:

cff-version: 1.2.0
message: "If you use this model, please cite:"
authors:
  - family-names: Zhmaev
    given-names: Sergey
title: "Material Specifications AI Vision Model"
version: 1.0
doi: 10.57967/hf/4255

LICENSE

Apache 2.0
