AutoTrain documentation

Object Detection

Object detection is a form of supervised learning where a model is trained to identify and categorize objects within images. AutoTrain simplifies the process, enabling you to train a state-of-the-art object detection model by simply uploading labeled example images.

Preparing your data

To ensure your object detection model trains effectively, follow these guidelines for preparing your data:

Organizing Images

Prepare a zip file containing your images and metadata.jsonl.

Archive.zip
β”œβ”€β”€ 0001.png
β”œβ”€β”€ 0002.png
β”œβ”€β”€ 0003.png
β”œβ”€β”€ .
β”œβ”€β”€ .
β”œβ”€β”€ .
└── metadata.jsonl

Example for metadata.jsonl:

{"file_name": "0001.png", "objects": {"bbox": [[302.0, 109.0, 73.0, 52.0]], "category": [0]}}
{"file_name": "0002.png", "objects": {"bbox": [[810.0, 100.0, 57.0, 28.0]], "category": [1]}}
{"file_name": "0003.png", "objects": {"bbox": [[160.0, 31.0, 248.0, 616.0], [741.0, 68.0, 202.0, 401.0]], "category": [2, 2]}}

Please note that bounding boxes must be in COCO format [x, y, width, height], i.e. the top-left corner followed by the box width and height.
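If your annotations use corner coordinates instead, they need to be converted before writing metadata.jsonl. The sketch below is one way to do this; the annotations dictionary is a hypothetical example (its boxes are given as [x_min, y_min, x_max, y_max]) and is not part of any AutoTrain API.

import json

# Hypothetical input: boxes given as [x_min, y_min, x_max, y_max] per image.
# AutoTrain expects COCO format [x, y, width, height], so convert before writing.
annotations = {
    "0001.png": {"boxes": [[302.0, 109.0, 375.0, 161.0]], "categories": [0]},
    "0002.png": {"boxes": [[810.0, 100.0, 867.0, 128.0]], "categories": [1]},
}

with open("metadata.jsonl", "w") as f:
    for file_name, ann in annotations.items():
        coco_boxes = [
            [x_min, y_min, x_max - x_min, y_max - y_min]
            for x_min, y_min, x_max, y_max in ann["boxes"]
        ]
        record = {
            "file_name": file_name,
            "objects": {"bbox": coco_boxes, "category": ann["categories"]},
        }
        f.write(json.dumps(record) + "\n")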

Image Requirements

  • Format: All images must be in JPEG, JPG, or PNG format.

  • Quantity: Include at least 5 images per split to provide the model with sufficient examples for learning.

  • Exclusivity: The zip file must contain only the images and metadata.jsonl. No additional files or nested folders should be included.

When train.zip is decompressed, it should contain only images and metadata.jsonl, with no folders.
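A minimal sketch of building such a flat archive is shown below; the local images/ folder and the train.zip filename are assumptions for illustration.

import zipfile
from pathlib import Path

# Hypothetical local folder holding the PNG/JPEG files and metadata.jsonl alongside it.
image_dir = Path("images")

with zipfile.ZipFile("train.zip", "w") as zf:
    zf.write("metadata.jsonl", arcname="metadata.jsonl")
    for image_path in sorted(image_dir.glob("*.png")):
        # arcname keeps the archive flat: no nested folders inside the zip.
        zf.write(image_path, arcname=image_path.name)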

Parameters

class autotrain.trainers.object_detection.params.ObjectDetectionParams

( data_path: str = None model: str = 'google/vit-base-patch16-224' username: Optional = None lr: float = 5e-05 epochs: int = 3 batch_size: int = 8 warmup_ratio: float = 0.1 gradient_accumulation: int = 1 optimizer: str = 'adamw_torch' scheduler: str = 'linear' weight_decay: float = 0.0 max_grad_norm: float = 1.0 seed: int = 42 train_split: str = 'train' valid_split: Optional = None logging_steps: int = -1 project_name: str = 'project-name' auto_find_batch_size: bool = False mixed_precision: Optional = None save_total_limit: int = 1 token: Optional = None push_to_hub: bool = False eval_strategy: str = 'epoch' image_column: str = 'image' objects_column: str = 'objects' log: str = 'none' image_square_size: Optional = 600 early_stopping_patience: int = 5 early_stopping_threshold: float = 0.01 )

Parameters

  • data_path (str) — Path to the dataset.
  • model (str) — Name of the model to be used. Default is “google/vit-base-patch16-224”.
  • username (Optional[str]) — Hugging Face Username.
  • lr (float) — Learning rate. Default is 5e-5.
  • epochs (int) — Number of training epochs. Default is 3.
  • batch_size (int) — Training batch size. Default is 8.
  • warmup_ratio (float) — Warmup proportion. Default is 0.1.
  • gradient_accumulation (int) — Gradient accumulation steps. Default is 1.
  • optimizer (str) — Optimizer to be used. Default is “adamw_torch”.
  • scheduler (str) — Scheduler to be used. Default is “linear”.
  • weight_decay (float) — Weight decay. Default is 0.0.
  • max_grad_norm (float) — Max gradient norm. Default is 1.0.
  • seed (int) — Random seed. Default is 42.
  • train_split (str) — Name of the training data split. Default is “train”.
  • valid_split (Optional[str]) — Name of the validation data split.
  • logging_steps (int) — Number of steps between logging. Default is -1.
  • project_name (str) — Name of the project for output directory. Default is “project-name”.
  • auto_find_batch_size (bool) — Whether to automatically find batch size. Default is False.
  • mixed_precision (Optional[str]) — Mixed precision type (fp16, bf16, or None).
  • save_total_limit (int) — Total number of checkpoints to save. Default is 1.
  • token (Optional[str]) — Hub Token for authentication.
  • push_to_hub (bool) — Whether to push the model to the Hugging Face Hub. Default is False.
  • eval_strategy (str) — Evaluation strategy. Default is “epoch”.
  • image_column (str) — Name of the image column in the dataset. Default is “image”.
  • objects_column (str) — Name of the target column in the dataset. Default is “objects”.
  • log (str) — Logging method for experiment tracking. Default is “none”.
  • image_square_size (Optional[int]) — Longest size to which the image will be resized, then padded to square. Default is 600.
  • early_stopping_patience (int) — Number of epochs with no improvement after which training will be stopped. Default is 5.
  • early_stopping_threshold (float) — Minimum change to qualify as an improvement. Default is 0.01.

ObjectDetectionParams is a configuration class for object detection training parameters.
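As a rough illustration, the documented fields can be collected into an ObjectDetectionParams instance as shown below. The dataset path and project name are placeholders, the model checkpoint is just one object-detection model from the Hub, and how the resulting config is handed to a training run (CLI, UI, or Python API) depends on your AutoTrain setup.

from autotrain.trainers.object_detection.params import ObjectDetectionParams

# Sketch of a configuration using only fields documented above.
params = ObjectDetectionParams(
    data_path="path/to/your/dataset",        # placeholder path
    model="facebook/detr-resnet-50",         # any object-detection checkpoint from the Hub
    lr=5e-5,
    epochs=3,
    batch_size=8,
    image_square_size=600,
    train_split="train",
    valid_split=None,
    project_name="my-object-detection-project",
    push_to_hub=False,
)
print(params)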
