Process YOLO results

Converts the output tensor of a YOLO deep learning model to generally usable data types. The tool first reshapes the input tensor to an N-by-(5 + classCount) matrix that holds the parameters of one bounding box on each row. It then discards detections whose confidence is too low and performs non-maximum suppression on the rest.
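
The filtering and suppression steps described above can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the tool's actual implementation: the function names, the (x, y, w, h, confidence, class_index) tuple layout, the inclusive confidence comparison and the greedy highest-confidence-first strategy are all choices made for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h),
    where (x, y) is the upper left corner."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = min(ax + aw, bx + bw) - max(ax, bx)
    ih = min(ay + ah, by + bh) - max(ay, by)
    if iw <= 0 or ih <= 0:
        return 0.0  # the boxes do not overlap
    inter = iw * ih
    return inter / (aw * ah + bw * bh - inter)


def filter_and_suppress(detections, min_confidence, overlap_threshold):
    """detections: list of (x, y, w, h, confidence, class_index) tuples
    (hypothetical layout chosen for this sketch)."""
    # Step 1: drop detections whose confidence is too low.
    # Assumption: the comparison is inclusive.
    candidates = [d for d in detections if d[4] >= min_confidence]
    # Step 2: greedy non-maximum suppression, strongest detections first.
    candidates.sort(key=lambda d: d[4], reverse=True)
    kept = []
    for det in candidates:
        suppressed = any(
            k[5] == det[5] and iou(k[:4], det[:4]) > overlap_threshold
            for k in kept
        )
        if not suppressed:
            kept.append(det)
    return kept


detections = [
    (0, 0, 10, 10, 0.9, 0),
    (1, 1, 10, 10, 0.8, 0),   # overlaps the first box, same class
    (1, 1, 10, 10, 0.7, 1),   # same place, different class
    (50, 50, 5, 5, 0.2, 0),   # confidence too low
]
print(filter_and_suppress(detections, 0.5, 0.5))
# [(0, 0, 10, 10, 0.9, 0), (1, 1, 10, 10, 0.7, 1)]
```

Because only detections of the same class are compared, the third box survives even though it coincides with the second one. With an overlap threshold of 1, no pair can exceed the threshold, so nothing is suppressed.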


The output of a YOLO model. The shape is assumed to be (1 x K x S x S) or (K x S x S), where K = B * (classCount + 5), S is the number of vertical and horizontal image subdivisions, and B is the number of anchor boxes per subdivision. For example, in YOLO v2, S = 13, B = 5 and classCount = 20, so K = 125.
The image to which the model was applied. Both width and height are assumed to be P * S, where S is the number of vertical and horizontal image subdivisions and P is the number of pixels per subdivision. For example, in YOLO v2, S = 13 and P = 32, so the image size must be 416 x 416. The image is needed because the tensor produced by a YOLO model does not contain information about the coordinate system of the image the model was applied to.
Minimum confidence for a bounding box to be accepted.
Overlap ratio threshold for pruning overlapping detections (non-maximum suppression). If the overlap ratio (intersection over union) of two detections with the same class is greater than this value, the detection with the lower confidence is discarded. Set to one to disable non-maximum suppression, since the overlap ratio can never exceed one.
The number of classes in the one-hot encoded class vector.
A B-by-2 matrix that contains the relative sizes of the anchor boxes the YOLO model was trained with.
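
The shape rules for the inputs above can be checked with a few lines of code. The sketch below is illustrative only: the function names are hypothetical, the tensor is a plain nested list indexed as tensor[k][row][col] (the K x S x S case), and the channel layout, with all values of the first anchor box followed by all values of the second and so on, is an assumption that the tool itself may not share.

```python
def yolo_dimensions(grid_size, anchor_count, class_count, pixels_per_cell):
    """Return (K, image side length) for the given model parameters."""
    channels = anchor_count * (class_count + 5)   # K = B * (classCount + 5)
    image_side = grid_size * pixels_per_cell      # width = height = P * S
    return channels, image_side


def tensor_to_rows(tensor, anchors, class_count):
    """Reshape a K x S x S nested list into S*S*B rows of 5 + classCount values.

    anchors is the B-by-2 anchor matrix; only its row count is used here.
    """
    values_per_box = 5 + class_count
    anchor_count = len(anchors)
    if len(tensor) != anchor_count * values_per_box:
        raise ValueError("tensor depth does not equal B * (classCount + 5)")
    grid_size = len(tensor[0])
    rows = []
    for r in range(grid_size):
        for c in range(grid_size):
            for b in range(anchor_count):
                # Assumed layout: channels of anchor b are contiguous.
                first = b * values_per_box
                rows.append(
                    [tensor[first + j][r][c] for j in range(values_per_box)]
                )
    return rows


# YOLO v2 example from the text: S = 13, B = 5, classCount = 20, P = 32.
print(yolo_dimensions(13, 5, 20, 32))  # (125, 416)
```

For YOLO v2 the reshaped matrix would have 13 * 13 * 5 = 845 rows of 25 values each before any filtering takes place.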


The upper left corner of each remaining detection, given as a coordinate frame that is aligned with the axes of the image coordinate system. A 4N-by-4 matrix, where N is the number of remaining detections.
The size of each bounding box in world coordinates. An N-by-2 matrix.
The class index of each detection. An N-by-1 matrix.
The confidence of each detection. An N-by-1 matrix.
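
The four outputs describe the same N detections, so they are typically consumed together. The following sketch regroups them into one record per detection. It assumes that the 4N-by-4 matrix consists of N consecutive 4-by-4 blocks in the same order as the rows of the other outputs; the function name and the dictionary keys are hypothetical, and the matrices are represented as plain nested lists.

```python
def split_detections(frames, sizes, class_indices, confidences):
    """Group the four output matrices into one dict per detection.

    frames:        4N-by-4 nested list (assumed: N stacked 4-by-4 blocks)
    sizes:         N-by-2 nested list
    class_indices: N-by-1 nested list
    confidences:   N-by-1 nested list
    """
    if len(frames) % 4 != 0:
        raise ValueError("frame matrix must have 4N rows")
    count = len(frames) // 4
    if not (len(sizes) == len(class_indices) == len(confidences) == count):
        raise ValueError("outputs disagree on the number of detections")
    detections = []
    for i in range(count):
        detections.append({
            "frame": frames[4 * i:4 * i + 4],   # rows 4i .. 4i+3
            "size": tuple(sizes[i]),
            "class_index": class_indices[i][0],
            "confidence": confidences[i][0],
        })
    return detections
```

How the corner position is encoded inside each 4-by-4 block is not stated here, so the sketch leaves the block untouched rather than guessing at its layout.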