How to use YOLOv8 or YOLOv8 OBB within a ROI (Region of Interest)?

This tutorial covers both YOLOv8 OBB and standard YOLOv8. It walks through the code with explanations, so you will need:

  • A best.pt weights file from your YOLOv8 OBB or YOLOv8 training run
  • An IDE (preferably Visual Studio Code)
  • A Python installation available to the IDE
  • pip install torch torchvision ultralytics opencv-python numpy run in a terminal (a quick check follows below)
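
To confirm the installation worked, you can run a short Python snippet (it simply prints the installed versions; all three packages expose __version__):

import torch, cv2, ultralytics

# Print the installed versions to confirm the environment is ready
print(torch.__version__, cv2.__version__, ultralytics.__version__)
print("CUDA available:", torch.cuda.is_available())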

YOLOv8 OBB Implementation

Import Libraries: In your Python script, start by importing the required libraries:

import cv2 as cv
import torch
from ultralytics import YOLO
  • cv2: Used for capturing video and processing images.
  • torch: PyTorch, the framework the model runs on (ultralytics imports it internally, but the explicit import makes the dependency clear).
  • YOLO: The class used to load and run the YOLOv8 OBB model.

Initialise Camera and Model: Set up the camera and load the YOLOv8 OBB or YOLOv8 model:

camera_index = 1  # Use 0 for the default camera
model = YOLO(r"path/to/best.pt")  # Path to your model
  • camera_index: Determines which camera to use (0 is usually the default).
  • model: Loads the pre-trained YOLOv8 OBB model from the specified path.
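
If you are not sure which index your camera uses, a small probe loop can help (an illustrative sketch, not part of the original tutorial; indices above 3 are rarely needed):

import cv2 as cv

# Try the first few indices and report which ones open successfully
for idx in range(4):
    test_cap = cv.VideoCapture(idx)
    if test_cap.isOpened():
        print(f"Camera found at index {idx}")
    test_cap.release()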

Start Video Capture: Begin capturing video from the camera:

cap = cv.VideoCapture(camera_index)
  • cap: An object that represents the video capture. You can read frames from this object.
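
The ROI values used in the next step assume 1280x720 frames. If your camera opens at a different resolution, you can request one explicitly; whether the request is honoured depends on the camera and driver:

cap.set(cv.CAP_PROP_FRAME_WIDTH, 1280)   # request a 1280x720 capture size
cap.set(cv.CAP_PROP_FRAME_HEIGHT, 720)   # the driver may ignore this request
if not cap.isOpened():
    raise RuntimeError("Could not open camera")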

Define the Region of Interest (ROI): Specify the coordinates and dimensions of the ROI:

roi_x, roi_y, roi_w, roi_h = 515, 0, 250, 720  # x, y, width, height
  • roi_x and roi_y: The top-left corner of the ROI.
  • roi_w and roi_h: The width and height of the ROI.
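
If the ROI might extend past the frame (for example with a lower-resolution camera), clamping it once avoids empty crops. A sketch that reads one frame to learn its size (not part of the original tutorial):

# Optional: clamp the ROI to the frame size so the crop is never empty
ret, probe = cap.read()                  # read one frame to learn its dimensions
if ret:
    frame_h, frame_w = probe.shape[:2]
    roi_x = max(0, min(roi_x, frame_w - 1))
    roi_y = max(0, min(roi_y, frame_h - 1))
    roi_w = min(roi_w, frame_w - roi_x)
    roi_h = min(roi_h, frame_h - roi_y)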

Main Loop for Processing Frames: Create a loop that continuously captures frames from the camera:

while True:
    ret, frame = cap.read()
    if not ret:
        break
  • ret: A boolean that indicates if the frame was successfully captured.
  • frame: The captured frame from the camera.

Crop the ROI from the Frame: Extract the ROI from the captured frame:

roi_frame = frame[roi_y:roi_y + roi_h, roi_x:roi_x + roi_w]  # Crop ROI
  • This selects the portion of the frame that corresponds to the defined ROI.
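
Note that NumPy slicing returns a view, not a copy, so roi_frame shares memory with frame. That is fine here because all drawing happens on frame, but if you ever draw on the crop itself, take a copy first:

roi_frame = frame[roi_y:roi_y + roi_h, roi_x:roi_x + roi_w].copy()  # independent copy, safe to modify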

Run the YOLOv8 Model: Pass the cropped ROI to the YOLOv8 model for detection:

results = model(roi_frame, verbose=False, conf=0.75)  # Run model
  • results: Contains the detection results, including bounding boxes and class labels.
  • conf: Sets the confidence threshold for detections (0.75 means only detections with confidence above 75% will be considered).
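
The call accepts further standard Ultralytics inference arguments if you need more control, for example:

# iou: NMS IoU threshold; imgsz: inference image size; device: e.g. "cpu" or 0 for the first GPU
results = model(roi_frame, verbose=False, conf=0.75, iou=0.5, imgsz=640, device="cpu")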

Annotate Results on the Original Frame: If objects are detected, draw bounding boxes and labels on the original frame:

if results and results[0].obb:
    for obb in results[0].obb:
        vertices = obb.xyxyxyxy[0].cpu().numpy()
        vertices[:, 0] += roi_x
        vertices[:, 1] += roi_y
        points = vertices.astype(int)
  • obb: Refers to the oriented bounding box for each detected object.
  • vertices: The four corner points of the oriented box, as a 4x2 array.
  • The coordinates are adjusted to account for the ROI position.

Draw the Bounding Box and Labels: Visualise the results by drawing lines and text on the original frame:

for j in range(len(points)):
    cv.line(frame, tuple(points[j]), tuple(points[(j + 1) % len(points)]), (0, 255, 0), 2)
label = results[0].names[int(obb.cls[0])]
cv.putText(frame, label, (points[0][0], points[0][1] - 10), cv.FONT_HERSHEY_SIMPLEX, 0.9, (255, 255, 255), 2)
  • cv.line: Draws the bounding box around the detected object.
  • cv.putText: Displays the class label near the detected object.
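
As an alternative, cv.polylines draws the closed quadrilateral in a single call, and the detection confidence can be appended to the label; a sketch using the same variables as the loop above:

pts = points.reshape(-1, 1, 2).astype("int32")  # cv.polylines expects int32 points
cv.polylines(frame, [pts], isClosed=True, color=(0, 255, 0), thickness=2)
label = f"{results[0].names[int(obb.cls[0])]} {float(obb.conf[0]):.2f}"  # class name + confidence
cv.putText(frame, label, (points[0][0], points[0][1] - 10), cv.FONT_HERSHEY_SIMPLEX, 0.9, (255, 255, 255), 2)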

Draw the ROI Rectangle: Visually outline the ROI on the frame:

cv.rectangle(frame, (roi_x, roi_y), (roi_x + roi_w, roi_y + roi_h), (0, 255, 0), 2)  # Draw ROI

Display the Annotated Frame: Show the updated frame with annotations in a window:

cv.imshow("YOLOv8 OBB Detection", frame)

Exit on Key Press: Add functionality to exit the loop:

if cv.waitKey(1) == ord('q'):
    break

Release Resources: At the end of your script, ensure that you release the video capture and close all windows:

cap.release()
cv.destroyAllWindows()

Here is the full code for the YOLOv8 OBB detection within a ROI:

import cv2 as cv
import torch
from ultralytics import YOLO

camera_index = 1
model = YOLO(r"path/to/best.pt")

cap = cv.VideoCapture(camera_index)
roi_x, roi_y, roi_w, roi_h = 515, 0, 250, 720

while True:
    ret, frame = cap.read()
    if not ret:
        break

    roi_frame = frame[roi_y:roi_y + roi_h, roi_x:roi_x + roi_w]
    results = model(roi_frame, verbose=False, conf=0.75)

    if results and results[0].obb:
        for obb in results[0].obb:
            vertices = obb.xyxyxyxy[0].cpu().numpy()
            vertices[:, 0] += roi_x
            vertices[:, 1] += roi_y
            points = vertices.astype(int)

            for j in range(len(points)):
                cv.line(frame, tuple(points[j]), tuple(points[(j + 1) % len(points)]), (0, 255, 0), 2)
            label = results[0].names[int(obb.cls[0])]
            cv.putText(frame, label, (points[0][0], points[0][1] - 10), cv.FONT_HERSHEY_SIMPLEX, 0.9, (255, 255, 255), 2)

    cv.rectangle(frame, (roi_x, roi_y), (roi_x + roi_w, roi_y + roi_h), (0, 255, 0), 2)
    cv.imshow("YOLOv8 OBB Detection", frame)

    if cv.waitKey(1) == ord('q'):
        break

cap.release()
cv.destroyAllWindows()

YOLOv8 Implementation

Import Libraries: Start your Python script by importing the necessary libraries:

import cv2 as cv
import torch
from ultralytics import YOLO

Initialise Camera and Model: Set up the camera and load the YOLOv8 model:

camera_index = 1  # Use 0 for the default camera
model = YOLO(r"C:/path/to/best.pt")  # Path to your YOLOv8 model
  • camera_index: Selects which camera to use.
  • model: Loads the YOLOv8 model from the specified path.

Start Video Capture: Begin capturing video from the camera:

cap = cv.VideoCapture(camera_index)

Define the Region of Interest (ROI): Specify the coordinates and dimensions for the ROI:

roi_x, roi_y, roi_w, roi_h = 515, 0, 250, 720  # x, y, width, height

Main Loop for Processing Frames: Create a loop to continuously capture frames:

while True:
    ret, frame = cap.read()
    if not ret:
        break

Crop the ROI from the Frame: Extract the ROI from the captured frame:

roi_frame = frame[roi_y:roi_y + roi_h, roi_x:roi_x + roi_w]  # Crop ROI

Run the YOLOv8 Model: Pass the cropped ROI to the YOLOv8 model for detection:

results = model(roi_frame, verbose=False, conf=0.75)  # Run model

Annotate Results on the Original Frame: If objects are detected, draw bounding boxes and labels on the original frame:

if results and results[0].boxes:
    for box in results[0].boxes:  # Iterate over the detected boxes
        x1, y1, x2, y2 = map(int, box.xyxy[0])  # Corner coordinates as integers
        cv.rectangle(frame, (x1 + roi_x, y1 + roi_y), (x2 + roi_x, y2 + roi_y), (0, 255, 0), 2)
        label = results[0].names[int(box.cls[0])]  # Get class name
        cv.putText(frame, label, (x1 + roi_x, y1 + roi_y - 10), cv.FONT_HERSHEY_SIMPLEX, 0.9, (255, 255, 255), 2)
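
If you prefer not to draw the boxes by hand, Results.plot() returns an annotated copy of the image the model was run on, which you can paste back into the full frame (a sketch using the same variables as above, assuming the ROI lies fully inside the frame):

annotated_roi = results[0].plot()  # annotated copy of roi_frame, same size
frame[roi_y:roi_y + roi_h, roi_x:roi_x + roi_w] = annotated_roi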

Draw the ROI Rectangle: Outline the ROI on the original frame:

cv.rectangle(frame, (roi_x, roi_y), (roi_x + roi_w, roi_y + roi_h), (0, 255, 0), 2)  # Draw ROI

Display the Annotated Frame: Show the updated frame with annotations:

cv.imshow("YOLOv8 Detection", frame)

Exit on Key Press: Add functionality to exit the loop:

if cv.waitKey(1) == ord('q'):
    break

Release Resources: At the end of your script, release the video capture and close all windows:

cap.release()
cv.destroyAllWindows()

Here is the full code for the YOLOv8 detection within a ROI:

import cv2 as cv
import torch
from ultralytics import YOLO

camera_index = 1
model = YOLO(r"C:/path/to/best.pt")

cap = cv.VideoCapture(camera_index)
roi_x, roi_y, roi_w, roi_h = 515, 0, 250, 720

while True:
    ret, frame = cap.read()
    if not ret:
        break

    roi_frame = frame[roi_y:roi_y + roi_h, roi_x:roi_x + roi_w]
    results = model(roi_frame, verbose=False, conf=0.75)

    if results and results[0].boxes:
        for box in results[0].boxes:
            x1, y1, x2, y2 = map(int, box.xyxy[0])
            cv.rectangle(frame, (x1 + roi_x, y1 + roi_y), (x2 + roi_x, y2 + roi_y), (0, 255, 0), 2)
            label = results[0].names[int(box.cls[0])]
            cv.putText(frame, label, (x1 + roi_x, y1 + roi_y - 10), cv.FONT_HERSHEY_SIMPLEX, 0.9, (255, 255, 255), 2)

    cv.rectangle(frame, (roi_x, roi_y), (roi_x + roi_w, roi_y + roi_h), (0, 255, 0), 2)
    cv.imshow("YOLOv8 Detection", frame)

    if cv.waitKey(1) == ord('q'):
        break

cap.release()
cv.destroyAllWindows()

I would like to make a distinction between two different methods of using a ROI.

Image region ROI

This is what is explained in the how-to.

This can be powerful if you know the objects you want to find are in a specific area: you reduce the amount of computation and speed up inference.

If you are unsure about the exact location of your objects, this method is not for you.

Results ROI

To make sure objects are completely visible, it is a good idea to run inference on the full-resolution image and filter the results afterwards, instead of cropping the image.

The easiest way is to use xywh for regular boxes or xywhr for Oriented Bounding Boxes, so you can check whether the box's center point lies within your ROI.

An easy way to use this for regular bounding boxes (untested):

from ultralytics import YOLO

# Load a pretrained YOLO11n model
model = YOLO("yolo11n.pt")

# Run inference on an image
results = model("bus.jpg")  # results list

roi_x, roi_y, roi_w, roi_h = 515, 0, 250, 720  # x, y, width, height of the ROI

filtered_boxes = []

# View results
for r in results:
    for box in r.boxes:
        # Get the center coordinates of the box (xywh is a 1x4 tensor)
        box_center_x = box.xywh[0][0]
        box_center_y = box.xywh[0][1]

        # Check if the center of the box is within the ROI
        if (roi_x <= box_center_x <= roi_x + roi_w and
            roi_y <= box_center_y <= roi_y + roi_h):
            filtered_boxes.append(box)  # Keep the box if its center falls inside the ROI
            print(box.xywh)  # Print the xywh for each kept box
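
The same idea works for Oriented Bounding Boxes when running an OBB model (e.g. yolo11n-obb.pt), using xywhr, whose fields are center x, center y, width, height and rotation (equally untested):

# Center-point filter for OBB results, mirroring the loop above
for r in results:
    if r.obb is None:  # detection models have no OBB results
        continue
    for obb in r.obb:
        box_center_x = obb.xywhr[0][0]
        box_center_y = obb.xywhr[0][1]

        if (roi_x <= box_center_x <= roi_x + roi_w and
            roi_y <= box_center_y <= roi_y + roi_h):
            filtered_boxes.append(obb)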