Results are not reproducible for me

#1
by ojasvii - opened

Hi Team,

I have downloaded the weights and dataset, and I am trying to reproduce the metrics for the object detection task on 500 images. However, the performance metrics I get are very low: using the COCO metric calculation, the average is below 0.5.

Credit Mutuel Arkea org
edited Oct 2

Hi ojasvii,

It seems to me that COCO uses the XYWH format for coordinates, while the post_process_object_detection method returns bounding boxes in the XYXY format.
Could that be the source of your issue?

That makes sense: the top-left XY coordinates are shared between both formats, so every bounding box starts at the correct point. However, the last two values mean (x_max, y_max) in one format and (width, height) in the other, which distorts the box dimensions and positions. This would result in non-zero but degraded performance metrics, which would explain the low average score you are getting.
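For reference, a minimal sketch of the conversion, assuming the boxes come back as a tensor in (x_min, y_min, x_max, y_max) order, as `post_process_object_detection` returns them:

```python
import torch

def xyxy_to_xywh(boxes: torch.Tensor) -> torch.Tensor:
    """Convert (x_min, y_min, x_max, y_max) boxes to COCO-style (x, y, width, height)."""
    x_min, y_min, x_max, y_max = boxes.unbind(-1)
    return torch.stack([x_min, y_min, x_max - x_min, y_max - y_min], dim=-1)
```

`torchvision.ops.box_convert(boxes, in_fmt="xyxy", out_fmt="xywh")` does the same thing if torchvision is already a dependency.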

We converted the output format to XYWH for the COCO metrics. We kept the threshold at 0.4, as mentioned in the code snapshot.
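For context, a minimal sketch of this kind of XYWH-based scoring; torchmetrics is used here only as a stand-in for the actual COCO tooling, and the tensors are illustrative placeholders rather than the real evaluation loop:

```python
import torch
from torchmetrics.detection import MeanAveragePrecision

# Tell the metric the boxes are already in COCO (x, y, w, h) format.
metric = MeanAveragePrecision(box_format="xywh", iou_type="bbox")

# One dict per image; the tensors are placeholders for the converted predictions
# (scores below the 0.4 threshold would already have been dropped upstream).
preds = [{
    "boxes": torch.tensor([[10.0, 20.0, 50.0, 80.0]]),  # xywh
    "scores": torch.tensor([0.72]),
    "labels": torch.tensor([1]),
}]
targets = [{
    "boxes": torch.tensor([[12.0, 18.0, 48.0, 85.0]]),  # xywh
    "labels": torch.tensor([1]),
}]

metric.update(preds, targets)
print(metric.compute()["map"])
```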

Credit Mutuel Arkea org

OK, if you are using the mAP metric, then a score of 0.5 might be consistent with the evaluation, as it separately measures GIoU and bounding box class accuracy.
Maybe testing with a higher threshold could increase the mAP by reducing the number of false positives.
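Something along these lines, assuming a transformers image processor is being used end to end; the 0.6 value and the variable names are illustrative, not taken from your script:

```python
import torch

# `model`, `processor`, and `image` are assumed to exist as in the evaluation script;
# raising `threshold` keeps only higher-confidence boxes.
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

results = processor.post_process_object_detection(
    outputs,
    threshold=0.6,                                  # stricter than the original 0.4
    target_sizes=torch.tensor([image.size[::-1]]),  # (height, width) of the source image
)[0]
# results["boxes"] are still in xyxy here; convert to xywh before the COCO scoring.
```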
Sorry, I won’t be able to help further.

Credit Mutuel Arkea org
This comment has been hidden
Cyrile changed discussion status to closed

I have increased it to 0.5, but it didn't help. Do you remember applying any post- or pre-processing?

Credit Mutuel Arkea org

No, I don’t apply any additional pre-processing.
Do the visual results seem to match the mAP score you are measuring, or do the results appear better than what the mAP indicates?

Cyrile changed discussion status to open

(Attached image: a sample prediction showing many overlapping bounding boxes on a single image.)

This is how it looks; I think there are many predictions per image, and the metrics are very low partly because of that. That's why I asked whether any post-processing is applied.

Credit Mutuel Arkea org
edited Oct 3

Oh, I see, that makes more sense now. The prediction itself is quite accurate, but there are too many overlapping bounding boxes nested within each other. My performance measurement for bounding boxes doesn't account for this, since I start from each ground-truth bounding box and match it to the closest predicted one (in terms of GIoU). To reduce this issue, applying post-processing that removes bounding boxes nested inside others should help a lot.
Increasing the detection threshold or filtering by score could also help reduce this phenomenon by limiting low-confidence predictions.
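As a rough sketch of such a post-processing step (the 0.5 IoU cut-off and the helper name are illustrative; nothing like this ships with the model), standard NMS from torchvision would keep only the best-scoring box among heavily overlapping ones:

```python
import torch
from torchvision.ops import nms

def suppress_overlapping_boxes(boxes, scores, labels, iou_threshold=0.5):
    """Drop boxes that heavily overlap a higher-scoring one.

    `boxes` must be in (x_min, y_min, x_max, y_max) format, i.e. as returned by
    post_process_object_detection; `iou_threshold` is an illustrative value.
    """
    keep = nms(boxes, scores, iou_threshold)
    return boxes[keep], scores[keep], labels[keep]
```

`torchvision.ops.batched_nms` does the same thing per class if the labels should be kept separate, and a simple score filter (e.g. `scores > 0.5`) can be applied before the NMS call.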

Cyrile changed discussion status to closed
