Video tracking
One of the most common tasks in the video labeling toolbox is video tracking. Supervisely provides top AI models and automatic video annotation tools you can use to track objects in videos efficiently.
Visual Object Tracking involves predicting the position of a target object in each frame of a video. The primary subtasks in Visual Object Tracking include:
Single Object Tracking (SOT)
Multiple Object Tracking (MOT)
Semi-Supervised Class-Agnostic Multiple Object Tracking
Video Object Segmentation (VOS)
Creating custom datasets for tracking is labor-intensive. For example, a one-hour video at 24 frames per second contains 86,400 frames. If each frame contains 8-12 objects, this results in about a million objects to track. Automating this process with AI models and tools can significantly reduce the workload.
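The arithmetic behind that estimate can be checked directly (the 8-12 objects per frame is the assumed range from the text):

```python
# Estimate the annotation workload for a one-hour video at 24 fps.
fps = 24
duration_s = 60 * 60                 # one hour in seconds
frames = fps * duration_s            # total number of frames

objects_per_frame = (8, 12)          # assumed per-frame object count range
low = frames * objects_per_frame[0]  # lower bound on objects to track
high = frames * objects_per_frame[1] # upper bound on objects to track

print(frames, low, high)             # 86,400 frames; roughly 0.7M-1M objects
```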
Unlike other solutions, Supervisely is built like an OS: instead of shipping a fixed set of video tracking algorithms, we provide a constantly growing Ecosystem of the best models. Pick the one you like, deploy it on your agent, select it in the Track Settings panel, and enjoy!
SOT involves tracking one object throughout the video based on a manual annotation on the first frame. The annotation, called a template, is used by a neural network to locate the object in subsequent frames. These models are class-agnostic, meaning they can track any object based on the initial annotation.
Label the first frame: Annotate the target object with a bounding box on the first frame.
Track automatically: Use a class-agnostic neural network to track the object in subsequent frames automatically.
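The two steps above can be sketched as a toy, class-agnostic tracker: the box drawn on the first frame becomes a template, and each following frame is searched for the best-matching window. The function names and the matching metric (sum of absolute differences over tiny grayscale frames) are purely illustrative; real SOT models in the Ecosystem are neural networks.

```python
def crop(frame, x, y, w, h):
    # Cut a w*h patch out of a frame given as a 2D list of pixel values.
    return [row[x:x + w] for row in frame[y:y + h]]

def sad(a, b):
    # Sum of absolute differences between two equally sized patches.
    return sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def track(frames, first_box):
    # Step 1: the manual annotation on the first frame is the template.
    x, y, w, h = first_box
    template = crop(frames[0], x, y, w, h)
    boxes = [first_box]
    # Step 2: locate the template in every subsequent frame automatically.
    for frame in frames[1:]:
        best = min(
            ((sad(crop(frame, cx, cy, w, h), template), (cx, cy, w, h))
             for cy in range(len(frame) - h + 1)
             for cx in range(len(frame[0]) - w + 1)),
            key=lambda t: t[0],
        )
        boxes.append(best[1])
    return boxes
```

Because the tracker only compares pixels against the template, it needs no notion of object class, which is what "class-agnostic" means in practice.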
MOT detects and tracks multiple objects of predefined classes, estimating their trajectories. The process involves two steps: detecting objects in each frame and associating detections across frames to form tracklets. This approach is known as the Tracking-by-Detection paradigm.
Object detection: Use a detection model like YOLOv5 to predict bounding boxes on each frame.
Tracking algorithm: Apply a tracking algorithm (e.g., DeepSort) to link detections and form object trajectories.
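The Tracking-by-Detection paradigm can be sketched with a toy example: hard-coded boxes stand in for per-frame detector output (e.g. YOLOv5), and detections are linked across frames by greedy IoU matching. This is only the association skeleton; DeepSort-style trackers additionally use appearance features and a Kalman motion model.

```python
def iou(a, b):
    # Intersection-over-union of two (x1, y1, x2, y2) boxes.
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    ix = max(0, min(ax2, bx2) - max(ax1, bx1))
    iy = max(0, min(ay2, by2) - max(ay1, by1))
    inter = ix * iy
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0

def link_tracks(frames_detections, iou_thr=0.3):
    tracks = {}                    # track id -> list of boxes (a tracklet)
    next_id = 0
    for dets in frames_detections:
        unmatched = list(dets)
        # Greedily extend each existing tracklet with its best-overlapping detection.
        for tid, boxes in tracks.items():
            if not unmatched:
                break
            best = max(unmatched, key=lambda d: iou(boxes[-1], d))
            if iou(boxes[-1], best) >= iou_thr:
                boxes.append(best)
                unmatched.remove(best)
        # Detections that matched nothing start new tracklets.
        for d in unmatched:
            tracks[next_id] = [d]
            next_id += 1
    return tracks
```

With two detections per frame that overlap their previous positions, the sketch produces two tracklets, one per object.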
This approach combines the simplicity of SOT with the ability to track multiple objects. By applying an SOT model to each object on the first frame, users can track and correct multiple objects simultaneously, enhancing annotation speed and accuracy.
Apply SOT to each object: Use an SOT model to track each object from the first frame.
Correct and re-track: Correct tracking predictions and re-track objects as needed.
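The two steps above amount to running an independent SOT pass per labeled object, and restarting a pass from any frame the user corrects. In this sketch, `sot_track` is a stand-in for whatever class-agnostic SOT model is served; the helper names are illustrative, not a Supervisely API.

```python
def track_all(frames, first_frame_boxes, sot_track):
    # One independent SOT tracklet per object labeled on the first frame.
    return {obj_id: sot_track(frames, box)
            for obj_id, box in first_frame_boxes.items()}

def retrack(frames, tracklet, corrected_frame, corrected_box, sot_track):
    # Keep predictions up to the corrected frame, then re-run SOT from there
    # with the user-corrected box as the new template.
    fixed = sot_track(frames[corrected_frame:], corrected_box)
    return tracklet[:corrected_frame] + fixed
```

A trivial dummy tracker (one that repeats the input box on every frame) is enough to see the correction semantics: boxes before the corrected frame are kept, boxes after it are replaced.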
VOS tracks objects in videos using masks instead of bounding boxes. The user labels the object mask on the first frame, and the model segments and tracks the object in subsequent frames.
Use a single command in the terminal to connect your personal computer with a GPU to your Supervisely account.
Deploy the SAM for fast object segmentation on the first frame.
Start the app, select a pre-trained model, and deploy via the GUI.
Interactive segmentation allows users to provide feedback to the model by marking positive and negative points on the image.
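That click-based feedback is typically encoded as labeled points: positive points lie on the object, negative points on the background. The structure below mirrors SAM-style point prompts; the field and function names are illustrative, not the exact serving API.

```python
# A hypothetical point-prompt payload for interactive segmentation:
# label 1 marks a positive (object) click, 0 a negative (background) click.
prompt = {
    "points": [(120, 80), (130, 90), (40, 40)],  # (x, y) pixel coordinates
    "labels": [1, 1, 0],
}

def split_points(prompt):
    # Separate the user's clicks into positive and negative sets.
    pos = [p for p, l in zip(prompt["points"], prompt["labels"]) if l == 1]
    neg = [p for p, l in zip(prompt["points"], prompt["labels"]) if l == 0]
    return pos, neg
```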
Serve the XMem model on a computer with a GPU.
Go to the Neural Networks page and select the XMem model for video segmentation.
Use the video labeling toolbox to select objects and track masks in subsequent frames.
Apply the tracking algorithm and review results.
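The VOS loop behind these steps can be sketched as follows: the mask labeled on the first frame is propagated frame by frame by a model that conditions on a memory of past frames and masks, which is the idea behind XMem. `VosModel` is a stand-in with a placeholder predictor, not the real XMem interface.

```python
class VosModel:
    def __init__(self):
        self.memory = []  # (frame, mask) pairs the model conditions on

    def segment(self, frame):
        # Placeholder: a real model predicts a new mask from the frame plus
        # its memory; here we just carry the most recent mask forward.
        return self.memory[-1][1]

    def propagate(self, frames, first_mask):
        # Seed the memory with the user-labeled mask on the first frame.
        self.memory = [(frames[0], first_mask)]
        masks = [first_mask]
        for frame in frames[1:]:
            mask = self.segment(frame)
            self.memory.append((frame, mask))  # grow memory for later frames
            masks.append(mask)
        return masks
```

The review step in the toolbox then lets you correct any propagated mask and continue tracking from the corrected frame.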
Check out our tutorial or watch this 5-minute video to learn what object tracking is and how to track objects in your videos with the best models and tools.