Splitting data


Last updated 4 months ago


Splitting data into training, validation, and testing sets is a common practice in machine learning projects. It lets you evaluate the model on data it has not seen during training and detect overfitting. This guide covers two ways to split data: the Supervisely Ecosystem Apps and the Supervisely Python SDK.

Splitting Data Using Supervisely Ecosystem Apps

The following apps from the Supervisely Ecosystem can help you with this task:

  • Assign train/val tags to images. This app assigns tags to the images in a dataset to split them into training, validation, and testing sets. You specify the percentage of images for each set, and the app assigns the tags accordingly. Training apps can then use the resulting project to build their sets from the tags.

  • Split datasets. This app splits the selected datasets into parts by a specified percentage, number of images, or number of parts. You can split the dataset randomly or by the order of images. The resulting datasets can be created in the same project or in a new one.
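
The percentage-based splitting described above can be sketched in plain Python. This is only an illustration of the idea, not the apps' actual implementation: the function name split_by_percent, its parameters, and the sample image names are all hypothetical.

```python
import random
from typing import Dict, List, Sequence


def split_by_percent(
    items: Sequence[str],
    train_percent: int,
    val_percent: int,
    shuffle: bool = True,
    seed: int = 42,
) -> Dict[str, List[str]]:
    """Split item names into train/val/test parts; test gets the remainder."""
    if train_percent < 0 or val_percent < 0 or train_percent + val_percent > 100:
        raise ValueError("percentages must be non-negative and sum to at most 100")
    ordered = list(items)
    if shuffle:
        # A fixed seed keeps the random split reproducible between runs.
        random.Random(seed).shuffle(ordered)
    # With shuffle=False the split follows the original order of the items.
    train_n = len(ordered) * train_percent // 100
    val_n = len(ordered) * val_percent // 100
    return {
        "train": ordered[:train_n],
        "val": ordered[train_n : train_n + val_n],
        "test": ordered[train_n + val_n :],
    }


# Hypothetical image names, used only for the demonstration.
image_names = [f"img_{i:03d}.jpg" for i in range(10)]
splits = split_by_percent(image_names, train_percent=70, val_percent=20, shuffle=False)
print({name: len(part) for name, part in splits.items()})
```

Integer division rounds each part down, so any leftover items end up in the test part. A real tool may round differently.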

Splitting Data Using Supervisely Python SDK

Here is an example of how to split a project into training and validation sets using the Supervisely Python SDK:

import supervisely as sly

# Read the project
project_fs = sly.Project("./sly_project", sly.OpenMode.READ)
  • Splitting by percentage:

train_n = int(project_fs.total_items * 0.8)
val_n = project_fs.total_items - train_n
train_set, val_set = project_fs.get_train_val_splits_by_count("./sly_project", train_n, val_n)
  • Splitting by dataset names:

train_set, val_set = project_fs.get_train_val_splits_by_dataset("./sly_project", ["ds1", "ds2"], ["ds3"])
  • Splitting by tags:

# Each split is selected by a single tag name (a string), not by a list of tags
train_set, val_set = project_fs.get_train_val_splits_by_tag("./sly_project", "train", "val")

Each of the methods above returns two lists of ItemInfo objects: the items of the training set and the items of the validation set.

class ItemInfo(NamedTuple):
    dataset_name: str  # Item's dataset name
    name: str  # Item's name
    img_path: str  # Full image file path of item
    ann_path: str  # Full annotation file path of item
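
Because each item carries its dataset name and item name, a split is easy to sanity-check before training. The sketch below uses a local stand-in for ItemInfo with the same four fields so that it runs without the SDK. The helper names items_per_dataset and overlapping_items, and the sample paths, are hypothetical.

```python
from collections import Counter
from typing import Dict, List, NamedTuple, Set, Tuple


class ItemInfo(NamedTuple):
    """Local stand-in that mirrors the fields listed above."""
    dataset_name: str
    name: str
    img_path: str
    ann_path: str


def items_per_dataset(items: List[ItemInfo]) -> Dict[str, int]:
    """Count how many items of a split come from each dataset."""
    return dict(Counter(item.dataset_name for item in items))


def overlapping_items(train: List[ItemInfo], val: List[ItemInfo]) -> Set[Tuple[str, str]]:
    """Return (dataset_name, name) pairs that appear in both splits."""
    train_keys = {(item.dataset_name, item.name) for item in train}
    val_keys = {(item.dataset_name, item.name) for item in val}
    return train_keys & val_keys


# Made-up items, used only for the demonstration.
demo_train = [
    ItemInfo("ds1", "a.jpg", "ds1/img/a.jpg", "ds1/ann/a.jpg.json"),
    ItemInfo("ds1", "b.jpg", "ds1/img/b.jpg", "ds1/ann/b.jpg.json"),
    ItemInfo("ds2", "a.jpg", "ds2/img/a.jpg", "ds2/ann/a.jpg.json"),
]
demo_val = [ItemInfo("ds2", "c.jpg", "ds2/img/c.jpg", "ds2/ann/c.jpg.json")]

print(items_per_dataset(demo_train))
print(overlapping_items(demo_train, demo_val))
```

An empty overlap means that no item is shared between the two splits. Items are compared by dataset name and item name together, since two datasets may contain files with the same name.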

You can use these items to get the corresponding image name and path, annotation path, and dataset name.

for item in train_set:
    print(f"{item.name=}, {item.img_path=}, {item.ann_path=}")
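
One possible next step is to save a split as a manifest file that a training script can read later. The sketch below is not part of the Supervisely SDK. It assumes items with the four ItemInfo fields shown above, uses a local stand-in class, and the write_manifest helper and sample paths are hypothetical.

```python
import csv
import io
from typing import Iterable, NamedTuple, TextIO


class ItemInfo(NamedTuple):
    """Local stand-in that mirrors the fields of the SDK's ItemInfo."""
    dataset_name: str
    name: str
    img_path: str
    ann_path: str


def write_manifest(items: Iterable[ItemInfo], out: TextIO) -> int:
    """Write one CSV row per item and return the number of items written."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(ItemInfo._fields)  # header: dataset_name,name,img_path,ann_path
    count = 0
    for item in items:
        writer.writerow(item)
        count += 1
    return count


# A made-up item, used only for the demonstration.
demo_train = [ItemInfo("ds1", "a.jpg", "ds1/img/a.jpg", "ds1/ann/a.jpg.json")]
buffer = io.StringIO()
rows_written = write_manifest(demo_train, buffer)
print(buffer.getvalue())
```

To write to disk instead of an in-memory buffer, pass a file opened with open("train.csv", "w", newline="") as the out argument.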

Related apps:

  • Assign train/val tags to images
  • Split datasets