Splitting data
Splitting data into training, validation, and testing sets is standard practice in machine learning projects. Evaluating the model on data it has not seen during training gives a realistic estimate of its performance and helps detect overfitting. This guide covers different ways to split data using Supervisely Ecosystem Apps and the Supervisely Python SDK.
Splitting Data Using Supervisely Ecosystem Apps
The following apps from the Supervisely Ecosystem can help with this task:
Assign train/val tags to images. This app allows you to assign tags to images in a dataset to split them into training, validation, and testing sets. You can specify the percentage of images for each set and assign tags accordingly. The resulting project can be used in training apps to create sets using tags.
Split datasets. This app allows you to split selected datasets into parts by a specified percentage, number of images, or number of parts. You can split the dataset randomly or by the order of images. The resulting datasets can be created in the same project or in a new one.
Splitting Data Using Supervisely Python SDK
The Supervisely Python SDK can split a project into training and validation sets in several ways:
Splitting by percentage:
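As a minimal sketch of the idea, the following plain-Python example shuffles a list of items and cuts it at a given percentage. It does not call the Supervisely SDK: the `ItemInfo` namedtuple is a stand-in with assumed field names, and `split_by_percentage`, `train_percent`, and `seed` are hypothetical names chosen for illustration.

```python
import random
from collections import namedtuple

# Stand-in for the SDK's ItemInfo; the field names here are assumptions.
ItemInfo = namedtuple("ItemInfo", ["dataset_name", "name", "img_path", "ann_path"])


def split_by_percentage(items, train_percent, seed=42):
    """Shuffle items and return (train_items, val_items).

    Hypothetical helper for illustration, not an SDK function.
    """
    if not 0 < train_percent < 100:
        raise ValueError("train_percent must be between 0 and 100 (exclusive)")
    shuffled = list(items)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    train_count = round(len(shuffled) * train_percent / 100)
    return shuffled[:train_count], shuffled[train_count:]


items = [
    ItemInfo("ds0", f"img_{i}.jpg", f"ds0/img/img_{i}.jpg", f"ds0/ann/img_{i}.jpg.json")
    for i in range(10)
]
train_items, val_items = split_by_percentage(items, train_percent=80)
print(len(train_items), len(val_items))
```

Fixing the seed makes the split repeatable, which matters when several experiments need to be compared on the same validation set.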
Splitting by dataset names:
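The sketch below illustrates splitting by dataset names in plain Python: each item goes to the training or validation list depending on the dataset it belongs to. It does not call the Supervisely SDK. The `ItemInfo` namedtuple is a stand-in with assumed field names, and `split_by_datasets` and the dataset names are hypothetical.

```python
from collections import namedtuple

# Stand-in for the SDK's ItemInfo; the field names here are assumptions.
ItemInfo = namedtuple("ItemInfo", ["dataset_name", "name", "img_path", "ann_path"])


def split_by_datasets(items, train_datasets, val_datasets):
    """Return (train_items, val_items) based on each item's dataset name.

    Hypothetical helper for illustration, not an SDK function.
    """
    train_set, val_set = set(train_datasets), set(val_datasets)
    overlap = train_set & val_set
    if overlap:
        # The same dataset in both sets would leak training data into validation.
        raise ValueError(f"Datasets listed in both sets: {sorted(overlap)}")
    train_items = [item for item in items if item.dataset_name in train_set]
    val_items = [item for item in items if item.dataset_name in val_set]
    return train_items, val_items


items = [
    ItemInfo("ds_a", "1.jpg", "ds_a/img/1.jpg", "ds_a/ann/1.jpg.json"),
    ItemInfo("ds_a", "2.jpg", "ds_a/img/2.jpg", "ds_a/ann/2.jpg.json"),
    ItemInfo("ds_b", "3.jpg", "ds_b/img/3.jpg", "ds_b/ann/3.jpg.json"),
    ItemInfo("ds_c", "4.jpg", "ds_c/img/4.jpg", "ds_c/ann/4.jpg.json"),
]
train_items, val_items = split_by_datasets(items, ["ds_a", "ds_b"], ["ds_c"])
print([item.name for item in train_items], [item.name for item in val_items])
```

Items from datasets named in neither list are left out of both sets.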
Splitting by tags:
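The sketch below illustrates tag-based splitting in plain Python. It assumes the tags of each image have already been read into a dictionary; it does not call the Supervisely SDK or parse annotation files. The `ItemInfo` namedtuple, the `split_by_tags` helper, the `tags_by_item` mapping, and the `untagged` option are all hypothetical names used for illustration.

```python
from collections import namedtuple

# Stand-in for the SDK's ItemInfo; the field names here are assumptions.
ItemInfo = namedtuple("ItemInfo", ["dataset_name", "name", "img_path", "ann_path"])


def split_by_tags(items, tags_by_item, train_tag="train", val_tag="val", untagged="ignore"):
    """Return (train_items, val_items) based on the tags assigned to each image.

    tags_by_item maps (dataset_name, image_name) to a set of tag names.
    untagged decides where images with neither tag go: "ignore", "train" or "val".
    Hypothetical helper for illustration, not an SDK function.
    """
    if untagged not in ("ignore", "train", "val"):
        raise ValueError("untagged must be 'ignore', 'train' or 'val'")
    train_items, val_items = [], []
    for item in items:
        tags = tags_by_item.get((item.dataset_name, item.name), set())
        if train_tag in tags:
            # In this sketch the train tag wins if an image carries both tags.
            train_items.append(item)
        elif val_tag in tags:
            val_items.append(item)
        elif untagged == "train":
            train_items.append(item)
        elif untagged == "val":
            val_items.append(item)
    return train_items, val_items


items = [
    ItemInfo("ds0", "1.jpg", "ds0/img/1.jpg", "ds0/ann/1.jpg.json"),
    ItemInfo("ds0", "2.jpg", "ds0/img/2.jpg", "ds0/ann/2.jpg.json"),
    ItemInfo("ds0", "3.jpg", "ds0/img/3.jpg", "ds0/ann/3.jpg.json"),
]
tags_by_item = {("ds0", "1.jpg"): {"train"}, ("ds0", "2.jpg"): {"val"}}
train_items, val_items = split_by_tags(items, tags_by_item)
print([item.name for item in train_items], [item.name for item in val_items])
```

With the default `untagged="ignore"`, the untagged image `3.jpg` ends up in neither set; passing `untagged="train"` would add it to the training list instead.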
All of the above methods return two lists of ItemInfo objects: one for the training set and one for the validation set.
You can use these items to get the corresponding image name and path, annotation path, and dataset name.
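As a small illustration of reading those fields, the sketch below formats one item as a text line. The `ItemInfo` namedtuple is a stand-in whose field names (`dataset_name`, `name`, `img_path`, `ann_path`) are assumptions, and `describe` and the sample paths are hypothetical.

```python
from collections import namedtuple

# Stand-in for the SDK's ItemInfo; the field names here are assumptions.
ItemInfo = namedtuple("ItemInfo", ["dataset_name", "name", "img_path", "ann_path"])


def describe(item):
    """Format the dataset name, image name, and both paths of one item."""
    return f"{item.dataset_name}/{item.name}: image={item.img_path}, annotation={item.ann_path}"


# Sample data standing in for the training list returned by a split.
train_items = [
    ItemInfo("ds0", "img_0.jpg", "project/ds0/img/img_0.jpg", "project/ds0/ann/img_0.jpg.json"),
]
lines = [describe(item) for item in train_items]
for line in lines:
    print(line)
```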