Custom Training
Welcome to this step-by-step tutorial on building model training applications with Supervisely SDK! This guide will show you how to set up a training app with a built-in user interface and all the necessary tools to seamlessly manage the training process. Supervisely SDK has a dedicated `TrainApp` class that takes care of the heavy lifting, so you can focus directly on your model and training logic without worrying about the underlying infrastructure.
We'll use the example to walk you through the process.
What the `TrainApp` class offers:
Built-in GUI: Simple and easy-to-follow customizable interface.
Train and Val Splits: Handles splitting of your project into training and validation sets.
Data Preparation: Easily convert Supervisely annotation format into one of the popular formats with a single line of code (e.g., COCO annotation format).
Project Versioning: Saves project versions to reproduce training with the same data and annotations.
Model Evaluation: Automatically runs evaluation of your model and generates a detailed report with metrics and visualizations.
Model Export: You can add exporting of your model to ONNX or TensorRT format.
Model Saving: Automatically saves your model and related files to Supervisely Team Files.
Let's dive into the steps required to integrate your custom model using the `TrainApp` class.
1. Create a `models.json` file with a list of model configurations.
2. Define default hyperparameters and save them to a `.yaml` file.
3. Add optional features to control the GUI layout and behavior.
4. Initialize the `TrainApp` class with the required parameters.
5. Implement your custom model training logic using the `TrainApp` wrapper.
6. Enhance your training app with additional features like a progress bar or model evaluation.
7. Launch the training app locally and deploy it to the Supervisely platform.
To enable selection of a model configuration for training, create a `models.json` file. This JSON file consists of a list of dictionaries, each detailing a specific model configuration. The information from this file will populate a table in your app's GUI, allowing users to select a model.
You can also add information or URLs of pretrained checkpoints and weights to enable fine-tuning of existing models.
Example models.json
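The original example file is not reproduced here; below is a hypothetical sketch of the expected structure based on the fields described in this guide. Model names, URLs, and the extra display column (`Params (M)`) are placeholders.

```json
[
  {
    "Model": "MyModel-S",
    "Params (M)": "25.1",
    "meta": {
      "task_type": "object detection",
      "model_name": "MyModel-S",
      "model_files": {
        "checkpoint": "https://example.com/checkpoints/mymodel_s.pth",
        "config": "configs/mymodel_s.yml"
      }
    }
  },
  {
    "Model": "MyModel-L",
    "Params (M)": "53.4",
    "meta": {
      "task_type": "object detection",
      "model_name": "MyModel-L",
      "model_files": {
        "checkpoint": "https://example.com/checkpoints/mymodel_l.pth",
        "config": "configs/mymodel_l.yml"
      }
    }
  }
]
```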
Example GUI preview:
Each dictionary item in `models.json` represents a single model as a row, with all its fields, except for the `meta` field, acting as columns. You can customize these fields to display the necessary information.
The `meta` field

Each model configuration must have a `meta` field. This field is not displayed in the table but contains essential information required during training to properly build the model.
Here are the required and optional fields:

- (required) `task_type`: A computer vision task type (e.g., object detection).
- (required) `model_name`: Model configuration name.
- (required) `model_files`: A dict with files needed to load the model, such as model weights and a config file. You can extend it with additional files if needed.
  - (optional) `checkpoint`: Path or URL to the model weights to enable fine-tuning.
  - (optional) `config`: Path to the model configuration file.
  - (optional) Any additional files required for your model can be added to the `model_files` dictionary.
Define your default hyperparameters and save them to a `.yaml` file (e.g., `hyperparameters.yaml`). The path to this file is then passed to the `TrainApp` for training configuration.
You can access the hyperparameters later in your training logic via `train.hyperparameters`.
Example hyperparameters.yaml
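As an illustration, a minimal file might look like this. All parameter names and values are placeholders; use whatever your training framework actually expects.

```yaml
epochs: 20
batch_size: 16
learning_rate: 0.0001
optimizer: AdamW
weight_decay: 0.0001
lr_scheduler: cosine
warmup_iters: 100
```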
You can provide additional options to control the GUI layout and behavior. Create an `app_options.yaml` file to enable or disable features.
Example app_options.yaml
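A hypothetical sketch is shown below. The option keys here are illustrative placeholders; consult the available options reference for the exact names supported by your SDK version.

```yaml
# Keys are illustrative placeholders — check the SDK reference for exact names.
train_logger: tensorboard
model_benchmark: true
export_onnx_supported: true
export_tensorrt_supported: true
device_selector: true
```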
Available options
Now that you have prepared the necessary files, you can initialize the `TrainApp` class with the required parameters. The `TrainApp` class is the core component that manages the training process and provides a user-friendly interface for interacting with the training app.
| Parameter | Type | Description |
| --- | --- | --- |
| `framework_name` | `str` | Name of the ML framework used for training |
| `models` | `Union[str, List[dict]]` | Path to the `models.json` file with model configurations, or the same list of dicts passed directly |
| `hyperparameters` | `str` | Path to the `hyperparameters.yaml` file |
| `app_options` | `Optional[str]` | Path to the `app_options.yaml` file |
| `work_dir` | `Optional[str]` | Local path for storing intermediate files, such as downloaded model files |
- `train.work_dir`: Path to the working directory. Contains intermediate files, such as downloaded model files.
- `train.output_dir`: Path to the output directory. Contains training results, logs, and checkpoints.
Project-related Attributes
- `train.project_id`: Supervisely project ID
- `train.project_name`: Project name
- `train.project_info`: Contains project information (`ProjectInfo` object)
- `train.project_dir`: Project directory path
- `train.train_dataset_dir`: Training dataset directory path
- `train.val_dataset_dir`: Validation dataset directory path
Model-related Attributes
- `train.model_name`: Name of the selected model
- `train.model_source`: Indicates whether the model is pretrained or custom (trained in Supervisely)
- `train.model_files`: Dictionary containing paths to model files (e.g., `checkpoint` and optional `config`)
- `train.hyperparameters`: Dictionary of selected hyperparameters
- `train.classes`: List of selected class names
- `train.device`: Selected CUDA device
TrainApp
Create an instance of the `TrainApp` by providing the framework name, model configuration file, hyperparameters file, and app options file.
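Based on the parameter table above, initialization might look like the sketch below. It requires the Supervisely SDK and platform context to actually run, and the import path and framework name are assumptions that may differ between SDK versions.

```python
from supervisely.nn.training.train_app import TrainApp  # import path may vary by SDK version

train = TrainApp(
    framework_name="MyFramework",           # placeholder framework name
    models="models.json",                   # or a list of dicts with the same content
    hyperparameters="hyperparameters.yaml",
    app_options="app_options.yaml",
    work_dir="app_data",                    # local dir for intermediate files
)
```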
Data Conversion Example:
All training code should be implemented in the function under the `@train.start` decorator, and this function should return the experiment information dictionary.
The returned dictionary should contain the following fields:

- `model_name`: Name of the model used for training.
- `model_files`: Dictionary with paths to additional model files (e.g., `config`). These files, together with checkpoints, will be uploaded to Supervisely Team Files automatically.
- `checkpoints`: A list of checkpoint paths, or an output directory with checkpoints. These checkpoints will be uploaded to Supervisely Team Files automatically.
- `best_checkpoint`: Name of the best checkpoint file.

These fields are validated upon training completion; if any of them is missing, the training is considered failed.
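The validation step can be illustrated with a small self-contained sketch. The field names come from the list above; the helper function and all values are hypothetical, not part of the SDK.

```python
# Required keys of the dict returned by the function under @train.start.
REQUIRED_FIELDS = ["model_name", "model_files", "checkpoints", "best_checkpoint"]

def validate_experiment_info(experiment_info: dict) -> list:
    """Return the list of missing required fields (empty list means valid)."""
    return [f for f in REQUIRED_FIELDS if f not in experiment_info]

# Hypothetical return value of a training routine.
experiment_info = {
    "model_name": "MyModel-S",
    "model_files": {"config": "configs/mymodel_s.yml"},
    "checkpoints": "output/checkpoints",   # a directory, or a list of paths
    "best_checkpoint": "best.pth",
}

missing = validate_experiment_info(experiment_info)
print(missing)  # → []
```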
In this example, the training logic and loop are inside the `solver.fit()` function.

Training Routine with the `@train.start` Decorator
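A minimal sketch of such a routine is shown below. The `solver` object stands in for your framework's trainer, and the snippet is not runnable outside a Supervisely training app; the returned fields follow the list above.

```python
@train.start
def start_training():
    # Build the model from the user's GUI selections exposed by TrainApp
    # (train.model_files, train.hyperparameters, train.classes, train.device).
    # solver = build_solver(train.model_files, train.hyperparameters)  # placeholder
    # solver.fit()  # your framework's training loop

    experiment_info = {
        "model_name": train.model_name,
        "model_files": {"config": train.model_files.get("config")},
        "checkpoints": "output/checkpoints",  # dir with checkpoints, or a list of paths
        "best_checkpoint": "best.pth",
    }
    return experiment_info
```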
You can enhance your training application by adding additional features like a progress bar or model evaluation. These features provide valuable feedback to the user and help in monitoring the training process.
Enhance your training feedback by incorporating a progress bar via the Supervisely `train_logger`.
Simply import `train_logger` from `supervisely.nn.training` and use it in your training loop.
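A sketch of what that can look like. The method names below follow the pattern used in Supervisely's integration examples, but they are assumptions here; verify them against your SDK version before use.

```python
from supervisely.nn.training import train_logger

def train_loop(model, dataloader, total_epochs):
    train_logger.train_started(total_epochs=total_epochs)
    for epoch in range(total_epochs):
        train_logger.epoch_started(total_steps=len(dataloader))
        for batch in dataloader:
            # ... forward pass, loss computation, backward pass ...
            train_logger.log_step()
        train_logger.epoch_finished()
    train_logger.train_finished()
```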
Now that you've integrated your custom model, you're ready to launch the training application. You can choose to run it locally for testing or deploy it directly to the Supervisely platform. The training app functions like any other Supervisely app, but with a built-in GUI.
Prepare environment file before running the app:
You can run and debug your training app locally using the following shell command:
If you are a VSCode user, you can create a `.vscode/launch.json` configuration to run and debug your training app locally.
Example launch.json
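A hypothetical configuration is sketched below. The module and entry point (`src.main:app`) and the port are placeholders; adjust them to how your app is actually served.

```json
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Run Train App",
      "type": "debugpy",
      "request": "launch",
      "module": "uvicorn",
      "args": ["src.main:app", "--host", "0.0.0.0", "--port", "8000"],
      "justMyCode": false,
      "env": {
        "PYTHONPATH": "${workspaceFolder}:${PYTHONPATH}",
        "LOG_LEVEL": "DEBUG"
      }
    }
  ]
}
```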
Follow the steps below to release your app:
[Optional] git clone your app code if you haven't done it yet:
[Optional] create a python virtual environment:
Install Supervisely SDK:
Provide your Supervisely credentials in one of the following ways:
Create a file `supervisely.env` in the home directory with the fields:
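A typical `supervisely.env` looks like this (the token value is a placeholder; the server address depends on your instance):

```bash
SERVER_ADDRESS="https://app.supervisely.com"
API_TOKEN="your-api-token"
```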
Set environment variables:
Run CLI command to deploy the app:
If the app code is in a subdirectory, you can specify the path to the app code (the directory with the `config.json` file):
After training is completed successfully, the `TrainApp` will automatically prepare and upload the model and all related files to Supervisely storage. During this finalization phase, the `TrainApp` executes several important steps to ensure your experiment's outputs are correctly validated, processed, and stored.
Standard Supervisely storage path for the artifacts: /experiments/{project_id}_{project_name}/{task_id}_{framework_name}/
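For instance, with made-up IDs and names, the path template above renders like this:

```python
# Hypothetical values, just to show how the storage path template is filled in.
project_id, project_name = 123, "fruits"
task_id, framework_name = 456, "my-framework"

artifacts_dir = f"/experiments/{project_id}_{project_name}/{task_id}_{framework_name}/"
print(artifacts_dir)  # → /experiments/123_fruits/456_my-framework/
```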
Here's what happens:
1. Validate `experiment_info`. The system checks the `experiment_info` dictionary to ensure all required fields (like model name, file paths, etc.) are correct and complete. This step is essential to prevent any missing or incorrect metadata.
2. Postprocess training artifacts. Additional processing is applied to the raw training outputs (e.g., cleaning up temporary files, formatting logs, and aggregating metrics) to generate a standardized set of artifacts.
3. Postprocess training and validation splits. The splits for training and validation data are further refined if needed. This step ensures consistency and prepares the splits for future reference or re-training if necessary.
4. Upload model files and checkpoints to Supervisely storage. All model files provided in the experiment info, together with checkpoints (e.g., best, intermediate, and last checkpoints), are automatically uploaded to Supervisely storage so that they are safely stored and can be accessed later.
5. Create and upload `model_meta.json`. A metadata file is generated, which includes essential details about the model (such as its architecture, training parameters, and version). This file is then uploaded along with the other artifacts.
6. Run model benchmarking (if enabled). If model benchmarking is enabled in your `app_options`, the system will run automated evaluation and generate a detailed report with metrics and visualizations. This report is then uploaded to the Supervisely platform.
7. Export the model to ONNX and TensorRT (if enabled). For ease of deployment, the model may be automatically exported to additional formats like ONNX and TensorRT. This ensures compatibility with different serving environments.
8. Generate and upload additional training files. Other supplementary files such as logs, experiment reports, and configuration files are packaged and uploaded. These additional artifacts help in future debugging, auditing, or model evaluation.
The final training artifacts are organized and stored in the Supervisely storage directory. This directory contains all the necessary files and metadata related to the training experiment.
Output artifacts directory structure:
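The exact contents depend on the framework, but based on the files described in this guide the layout is roughly the following (directory names are illustrative):

```
/experiments/{project_id}_{project_name}/{task_id}_{framework_name}/
├── checkpoints/          # best / intermediate / last checkpoints
├── logs/                 # training logs (e.g., TensorBoard)
├── export/               # ONNX / TensorRT exports (if enabled)
├── app_state.json
├── experiment_info.json
├── hyperparameters.yaml
├── model_meta.json
└── train_val_split.json
```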
Experiment info contains all the necessary information about the training experiment. This file is generated automatically by the `TrainApp` and is used to store metadata, paths to model files, checkpoints, and other artifacts, making it easy to track and manage the training process.
This file is essential and used by the Supervisely platform to organize and display models in a structured manner in training and serving apps.
Example experiment_info.json
This is an example of the final experiment information file that is generated:
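The generated file is not reproduced here; the sketch below shows the fields explained next, with placeholder values throughout (IDs, names, paths, and timestamps are all made up).

```json
{
  "experiment_name": "456_fruits_MyModel-S",
  "framework_name": "MyFramework",
  "model_name": "MyModel-S",
  "task_type": "object detection",
  "project_id": 123,
  "task_id": 456,
  "model_files": {"config": "config.yml"},
  "checkpoints": ["checkpoints/best.pth", "checkpoints/last.pth"],
  "best_checkpoint": "best.pth",
  "export": {"ONNXRuntime": "export/model.onnx"},
  "app_state": "app_state.json",
  "model_meta": "model_meta.json",
  "train_val_split": "train_val_split.json",
  "hyperparameters": "hyperparameters.yaml",
  "artifacts_dir": "/experiments/123_fruits/456_MyFramework/",
  "datetime": "2025-01-01 12:00:00",
  "evaluation_report": null,
  "evaluation_metrics": {},
  "logs": {"type": "tensorboard", "link": "/experiments/123_fruits/456_MyFramework/logs/"}
}
```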
Fields explanation:
- `experiment_name`, `framework_name`, `model_name`, `task_type`: General metadata about the training experiment.
- `project_id`, `task_id`: IDs used to uniquely identify the project and the training task within Supervisely.
- `model_files`: Contains paths to the model configuration file(s).
- `checkpoints` & `best_checkpoint`: Lists the checkpoints produced during training and indicates the best-performing one.
- `export`: Shows the export file for ONNX (or TensorRT if enabled).
- `app_state`: Location of the `app_state.json` file for debugging and re-runs.
- `train_val_split`: Location of the `train_val_split.json` file containing the split information.
- `hyperparameters`: Path to the hyperparameters file used for training.
- `artifacts_dir` & `datetime`: Directory where artifacts are stored and the timestamp of the experiment.
- `evaluation_report` and `evaluation_metrics`: Information from the model benchmarking process (if run).
- `logs`: Location and type of training logs for review in the Supervisely interface.
If you'd like to export your trained model to the ONNX or TensorRT format, you can easily do so using dedicated decorators. Simply add the `@train.export_onnx` and `@train.export_tensorrt` decorators to your export functions. These functions should return the file path of the exported model, and the `TrainApp` will take care of uploading it to Supervisely storage automatically.
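A sketch of what such functions can look like. The function signatures (no arguments) and the output paths are assumptions; only the decorator names and the return-a-path contract come from the description above.

```python
@train.export_onnx
def to_onnx():
    onnx_path = "output/model.onnx"
    # ... framework-specific ONNX conversion of the trained checkpoint ...
    return onnx_path  # TrainApp uploads the returned file automatically

@train.export_tensorrt
def to_tensorrt():
    engine_path = "output/model.engine"
    # ... framework-specific TensorRT engine build ...
    return engine_path
```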
In this initial step, the app displays the project on which the training is run. It also offers an option to cache the project on the training agent for future use.
This step allows you to split your project data into training and validation sets. You can choose among different splitting methods such as random, tags, or by datasets.
Select the classes you want your model to train on. You can choose multiple classes from the provided classes table.
Here, you can choose the model you wish to train. Select from pretrained models or your own custom models (trained previously in Supervisely). Once trained, your custom model will automatically appear in the custom models table next time you run the app.
Enter your experiment name, choose the CUDA device (if enabled), and start the training process. Once training begins, previous steps will be locked to prevent changes during the run.
View real-time training logs and progress via a progress bar (if implemented). This step also provides a link to the TensorBoard dashboard for more detailed monitoring (if implemented).
After training finishes, this step displays the final training artifacts along with links to the stored files. These artifacts are automatically uploaded and organized in the Supervisely storage.
Default artifacts location is: /experiments/{project_id}_{project_name}/{task_id}_{framework_name}/
Once the training process is complete, you can view the TensorBoard logs in the `logs` directory within the artifacts folder.
The default location for these logs is: /experiments/{project_id}_{project_name}/{task_id}_{framework_name}/logs/
The `app_state.json` file captures the state of the `TrainApp` right before training begins. This file is incredibly useful for debugging and troubleshooting issues that might arise during training. It allows developers to quickly restart the training process without having to reconfigure all the settings via the GUI.
You can access `app_state` in the training code via `train.app_state` and dump it locally for debugging purposes.
Why Use app_state.json?
- Quick debugging: If something goes wrong during training, you can inspect the `app_state.json` file to see the exact configuration and quickly re-run the app without manually reselecting settings in the GUI every time.
- State preservation: It preserves the state of the training app in a human-readable `.json` format.
How to use `app_state.json`

Call the `load_from_app_state` method to load the app state from the `app_state.json` file after the app is initialized.
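For example (a sketch; the path is a placeholder, and whether a dict with the same structure is accepted in place of a path is an assumption to verify against your SDK version):

```python
# "train" is your initialized TrainApp instance.
app_state = "app_state.json"  # local path to the saved state
train.load_from_app_state(app_state)
```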
Example of `app_state.json`:
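A sketch assembled from the field descriptions that follow; the project ID, class names, and hyperparameter text are placeholders.

```json
{
  "input": {
    "project_id": 123
  },
  "train_val_split": {
    "method": "random",
    "split": "train",
    "percent": 80
  },
  "classes": ["apple", "banana"],
  "model": {
    "source": "Pretrained models",
    "model_name": "MyModel-S"
  },
  "hyperparameters": "epochs: 20\nbatch_size: 16\nlearning_rate: 0.0001\n",
  "options": {
    "model_benchmark": {
      "enable": true,
      "speed_test": true
    },
    "cache_project": true,
    "export": {
      "ONNXRuntime": true,
      "TensorRT": false
    }
  }
}
```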
`input`
- description: Contains the input data for the training app.
- fields:
  - `project_id`: The ID of the project to use for training.

`train_val_split`
- description: Configures how the dataset is split into training and validation sets.
- fields:
  - `method`: The splitting method (`random`, `tags`, or `datasets`).
  - If `method` is `random`:
    - `split`: Specifies which split to use (e.g., train or val).
    - `percent`: The percentage of the dataset to include for training.
  - If `method` is `tags`:
    - `train_tag`: The tag used for the training split.
    - `val_tag`: The tag used for the validation split.
    - `untagged_action`: Action for untagged images (options: train, val, or ignore).
  - If `method` is `datasets`:
    - `train_datasets`: IDs of datasets used for training.
    - `val_datasets`: IDs of datasets used for validation.

`classes`
- description: List of the classes to be used during training. Must match the class names in the project.

`model`
- description: Specifies the model configuration for training.
- fields:
  - `source`: The model source, such as `Pretrained models` or `Custom model`.
  - If `source` is `Pretrained models`:
    - `model_name`: The name of the pretrained model (should match an entry in `models.json`).
  - If `source` is `Custom model`:
    - `task_id`: The training session ID containing the custom checkpoint.
    - `checkpoint`: The name of the custom checkpoint to be used.

`hyperparameters`
- description: Contains the hyperparameters (in `.yaml` format) used for training.

`options`
- description: Provides additional optional configurations for the training app.
- fields:
  - `model_benchmark`: Configuration for model benchmarking.
    - `enable`: Boolean to enable or disable benchmarking.
    - `speed_test`: Boolean to enable or disable a speed test.
  - `cache_project`: Boolean to enable or disable caching of the project.
  - `export`: Configurations for model export.
    - `ONNXRuntime`: Option for enabling/disabling ONNX model export.
    - `TensorRT`: Option for enabling/disabling TensorRT model export.
📄 See source file for the RT-DETRv2
description: If `true`, adds a checkbox allowing the user to export the model to ONNX format. Requires implementing the corresponding export function.
description: If `true`, adds a checkbox allowing the user to export the model to TensorRT format. Requires implementing the corresponding export function.
- `train.project_meta`: `ProjectMeta` object with classes/tags info
- `train.sly_project`: Supervisely `Project` object
- `train.model_info`: Entry from `models.json` for the selected model if the model is selected from the list of pretrained models; otherwise, a dict.
Now it's time to integrate your custom model using the `TrainApp` class. We'll use the RT-DETRv2 model as an example. You can always refer to the source code of the Train RT-DETRv2 app.
The `TrainApp` gives you access to the Supervisely project containing the `train` and `val` datasets. Convert these datasets to the desired format (e.g., COCO) and run your training routine. You can use built-in converters like `to_coco` to convert your datasets to the required format, or write your own custom converter if required.
📄 See the source code for the RT-DETRv2 example.
If you plan to use model evaluation, you need to implement an `Inference` class and register it for the `TrainApp`. Refer to the guide for more information.
- Setting up environment variables for Supervisely App development
- Basics of Supervisely App development
- Simple example of a Supervisely App
After running the app, you can access it at .
Releasing your private app to the Supervisely platform is very simple. You will need a git repository with your app code, Supervisely SDK and credentials to access the platform. You can refer to for more information.
model_meta: Path to the model meta file. Contains the `ProjectMeta` object in `.json` format.
The graphical user interface (GUI) for the training app is a pre-built template. Each step is organized into a separate card that groups related settings together, making it intuitive to navigate through the training settings.
Set and adjust the training hyperparameters in this step. You can also enable model benchmarking (if enabled and implemented) and export your model to ONNX or TensorRT formats (if enabled and implemented). The hyperparameters are fully customizable using an editor widget, allowing you to add as many variables and values as needed.
To open and view the log file, you can use app.