Custom Training
Overview
Welcome to this step-by-step tutorial on building model training applications with the Supervisely SDK! This guide will show you how to set up a training app with a built-in user interface and all the necessary tools to seamlessly manage the training process. The Supervisely SDK has a dedicated TrainApp class that takes care of the heavy lifting, so you can focus directly on your model and training logic without worrying about the underlying infrastructure.
We'll use the Train RT-DETRv2 example to walk you through the process.
What the TrainApp class offers:
- Built-in GUI: Simple, easy-to-follow, customizable interface.
- Train and Val Splits: Handles splitting your project into training and validation sets.
- Data Preparation: Convert the Supervisely annotation format into a popular format (e.g., COCO) with a single line of code.
- Project Versioning: Saves project versions so training can be reproduced with the same data and annotations.
- Model Evaluation: The Model Evaluation Benchmark automatically runs evaluation of your model and generates a detailed report with metrics and visualizations.
- Model Export: You can add exporting of your model to ONNX or TensorRT format.
- Model Saving: Automatically saves your model and related files to Supervisely Team Files.
Step-by-Step Implementation
Let's dive into the steps required to integrate your custom model using the TrainApp class.
1. Prepare Model Configurations: Create a models.json file with a list of model configurations.
2. Prepare Hyperparameters: Define default hyperparameters and save them to a .yaml file.
3. Prepare App Options: Add optional features to control the GUI layout and behavior.
4. The TrainApp: Initialize the TrainApp class with the required parameters.
5. Integrate Your Custom Model: Implement your custom model training logic using the TrainApp wrapper.
6. Enhancements: Enhance your training app with additional features like a progress bar or model evaluation.
7. Run the Application: Launch the training app locally and deploy it to the Supervisely platform.
1. Prepare Model Configurations
To enable selection of a model configuration for training, create a models.json file. This JSON file consists of a list of dictionaries, each detailing a specific model configuration. The information from this file will populate a table in your app's GUI, allowing users to select a model.
You can also add information or URLs of pretrained checkpoints and weights to enable fine-tuning of existing models.
📄 See source file for the RT-DETRv2 models.json
Example models.json
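For illustration, here is a minimal sketch of such a file in the spirit of the RT-DETRv2 example; the model name, metric columns, URL, and paths below are placeholders, not the actual RT-DETRv2 entries:

```json
[
    {
        "Model": "RT-DETRv2-S",
        "dataset": "COCO",
        "AP_val": 48.1,
        "Params (M)": 20,
        "meta": {
            "task_type": "object detection",
            "model_name": "RT-DETRv2-S",
            "model_files": {
                "checkpoint": "https://example.com/checkpoints/rtdetrv2_s.pth",
                "config": "rtdetrv2_s_coco.yml"
            }
        }
    }
]
```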
Example GUI preview:
Table Fields
Each dictionary item in models.json represents a single model as a row, with all its fields, except for the meta field, acting as columns. You can customize these fields to display the necessary information.
Technical Field (meta)
Each model configuration must have a meta field. This field is not displayed in the table but contains essential information required during training to properly build the model.
Here are the supported fields:

- (required) task_type: A computer vision task type (e.g., object detection).
- (required) model_name: Model configuration name.
- (required) model_files: A dict with the files needed to load the model, such as model weights and a config file. You can extend it with additional files if needed.
- (optional) checkpoint: Path or URL to the model weights to enable fine-tuning.
- (optional) config: Path to the model configuration file.
- (optional) Any additional files required by your model can be added to the model_files dictionary.
2. Prepare Hyperparameters
Define your default hyperparameters and save them to a .yaml file (e.g., hyperparameters.yaml). The path to this file is then passed to the TrainApp for training configuration.

You can access the hyperparameters later in the code by using train.hyperparameters in your training logic.
📄 See source file for the RT-DETRv2 hyperparameters.yaml
Example hyperparameters.yaml
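An illustrative sketch; the actual keys are entirely up to you and your framework, and the RT-DETRv2 file defines its own set:

```yaml
epochs: 80
batch_size: 16
learning_rate: 0.0002
optimizer: AdamW
val_interval: 1
checkpoint_interval: 5
```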
3. Prepare App Options
You can provide additional options to control the GUI layout and behavior. Create an app_options.yaml file to enable or disable features.
📄 See source file for the RT-DETRv2 app_options.yaml
Example app_options.yaml
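A hedged sketch of what such a file might contain. The option keys below are assumptions for illustration only; consult the "Available options" reference for the exact names supported by your SDK version:

```yaml
# Option names are illustrative; check "Available options" for the exact keys
model_benchmark: true             # enable the Model Evaluation Benchmark step
export_onnx_supported: true       # show the ONNX export option in the GUI
export_tensorrt_supported: true   # show the TensorRT export option in the GUI
device_selector: true             # let the user choose a CUDA device
```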
Available options
4. TrainApp Class
Now that you have prepared the necessary files, you can initialize the TrainApp class with the required parameters. The TrainApp class is the core component that manages the training process and provides a user-friendly interface for interacting with the training app.
TrainApp Signature
| Parameter | Type | Description |
| --- | --- | --- |
| framework_name | str | Name of the ML framework used for training |
| models | Union[str, List[dict]] | Path to the models.json file with model configurations, or the equivalent list of dicts |
| hyperparameters | str | Path to the hyperparameters.yaml file |
| app_options | Optional[str] | Path to the app_options.yaml file |
| work_dir | Optional[str] | Local path for storing intermediate files, such as downloaded model files |
Important TrainApp Attributes
- train.work_dir - Path to the working directory; contains intermediate files, such as downloaded model files.
- train.output_dir - Path to the output directory; contains training results, logs, and checkpoints.
Project-related Attributes
- train.project_id - Supervisely project ID
- train.project_name - Project name
- train.project_info - Project information (ProjectInfo object)
- train.project_meta - ProjectMeta object with classes/tags info
- train.project_dir - Project directory path
- train.train_dataset_dir - Training dataset directory path
- train.val_dataset_dir - Validation dataset directory path
- train.sly_project - Supervisely sly.Project object
Model-related Attributes
- train.model_name - Name of the selected model
- train.model_source - Indicates whether the model is pretrained or custom (trained in Supervisely)
- train.model_files - Dictionary containing paths to model files (e.g., checkpoint and optional config)
- train.model_info - Entry from models.json for the selected pretrained model; for a custom model, the experiment info dict
- train.hyperparameters - Dictionary of selected hyperparameters
- train.classes - List of selected class names
- train.device - Selected CUDA device
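For example, inside your training code you can read these attributes directly (the values shown in the comments are illustrative):

```python
print(train.project_id)       # e.g. 1112
print(train.model_name)       # selected in the GUI, e.g. "RT-DETRv2-S"
print(train.hyperparameters)  # dict of hyperparameters selected in the GUI
print(train.device)           # e.g. "cuda:0"
```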
5. Integrate Your Custom Model
Now it's time to integrate your custom model using the TrainApp class. We'll use the RT-DETRv2 model as an example. You can always refer to the source code for Train RT-DETRv2 on GitHub.
5.1 Initialize Your Imports
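A representative set of imports for a training app might look like this; the train_logger path is given later in this guide, and the TrainApp import path should be verified against your SDK version:

```python
import os

import supervisely as sly
from supervisely.nn.training import TrainApp, train_logger

# Framework-specific imports (solver, configs, etc.) would follow here.
```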
5.2 Initialize the TrainApp
Create an instance of the TrainApp by providing the framework name, model configuration file, hyperparameters file, and app options file.
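For example (the file paths are illustrative and relative to the app repository):

```python
base_path = "supervisely_integration/train"  # illustrative repo layout

train = TrainApp(
    framework_name="RT-DETRv2",
    models=f"{base_path}/models.json",
    hyperparameters=f"{base_path}/hyperparameters.yaml",
    app_options=f"{base_path}/app_options.yaml",
)
```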
5.3 Prepare Training Data
The TrainApp gives you access to the Supervisely project containing the train and val datasets. Convert these datasets to the format your framework expects (e.g., COCO) and run your training routine. You can use built-in converters like to_coco, or write a custom converter if required.

Use train.sly_project to access the Supervisely project in the training code.
Data Conversion Example:
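The snippet below is a minimal, self-contained sketch of a custom converter that exports bounding boxes to COCO format. dataset_to_coco is a hypothetical helper written for illustration; in practice the built-in to_coco converter mentioned above is the simpler route (check your SDK version for its exact import path):

```python
import json
import os

import supervisely as sly


def dataset_to_coco(dataset: sly.Dataset, meta: sly.ProjectMeta, save_path: str) -> str:
    """Hypothetical helper: dump one dataset's boxes as a COCO-format JSON."""
    categories = [{"id": i + 1, "name": c.name} for i, c in enumerate(meta.obj_classes)]
    cat_ids = {c["name"]: c["id"] for c in categories}
    images, annotations = [], []
    for img_id, item_name in enumerate(dataset.get_items_names(), start=1):
        ann = dataset.get_ann(item_name, meta)
        h, w = ann.img_size
        images.append({"id": img_id, "file_name": item_name, "height": h, "width": w})
        for label in ann.labels:
            bbox = label.geometry.to_bbox()  # any geometry -> bounding rectangle
            annotations.append({
                "id": len(annotations) + 1,
                "image_id": img_id,
                "category_id": cat_ids[label.obj_class.name],
                "bbox": [bbox.left, bbox.top, bbox.width, bbox.height],
                "area": bbox.width * bbox.height,
                "iscrowd": 0,
            })
    with open(save_path, "w") as f:
        json.dump({"images": images, "annotations": annotations, "categories": categories}, f)
    return save_path


# Convert the train/val datasets downloaded by the TrainApp:
project = train.sly_project
for dataset in project.datasets:
    dataset_to_coco(dataset, project.meta, os.path.join(train.work_dir, f"{dataset.name}.json"))
```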
5.4 Implement Training Routine
All training code should be implemented in a function under the @train.start decorator, which must return an experiment info dictionary.

The returned dictionary should contain the following fields:

- model_name - Name of the model used for training.
- model_files - Dictionary with paths to additional model files (e.g., config). These files, together with the checkpoints, are uploaded to Supervisely Team Files automatically.
- checkpoints - A list of checkpoint paths, or an output directory containing checkpoints. These checkpoints are uploaded to Supervisely Team Files automatically.
- best_checkpoint - Name of the best checkpoint file.

These fields are validated when training completes; if any of them is missing, the training is considered failed.

In this example, the training logic and loop live inside the solver.fit() function.
📄 See source code for the RT-DETRv2 main.py and training logic
Training Routine with the @train.start Decorator
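A skeletal sketch of what such a function can look like. The stub checkpoint below stands in for a real training loop (in the RT-DETRv2 example that loop is solver.fit()), and all paths are illustrative:

```python
import os

import torch  # assuming a PyTorch-based framework


@train.start
def start_training():
    checkpoints_dir = os.path.join(train.output_dir, "checkpoints")
    os.makedirs(checkpoints_dir, exist_ok=True)

    # Real training goes here; in the RT-DETRv2 example this is solver.fit().
    # For illustration, save a stub checkpoint instead.
    torch.save({"epoch": 0}, os.path.join(checkpoints_dir, "best.pth"))

    experiment_info = {
        "model_name": train.model_name,
        "model_files": {"config": "path/to/model_config.yml"},  # illustrative
        "checkpoints": checkpoints_dir,  # directory or list of checkpoint paths
        "best_checkpoint": "best.pth",
    }
    return experiment_info
```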
6. Enhancements
You can enhance your training application by adding additional features like a progress bar or model evaluation. These features provide valuable feedback to the user and help in monitoring the training process.
6.1 Progress Bar with the Train Logger ⏱️
Enhance your training feedback by incorporating a progress bar via the Supervisely train_logger.

Simply import train_logger from supervisely.nn.training and use it in your training loop.
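A sketch of a training loop instrumented with train_logger. The hook names below follow the RT-DETRv2 example; verify them against your SDK version:

```python
from supervisely.nn.training import train_logger


def train_loop(model, train_dataloader, total_epochs):
    train_logger.train_started(total_epochs=total_epochs)
    for epoch in range(total_epochs):
        train_logger.epoch_started(total_steps=len(train_dataloader))
        for batch in train_dataloader:
            loss = model.training_step(batch)  # placeholder for your train step
            train_logger.log_step()  # advance the progress bar by one step
        train_logger.epoch_finished()
    train_logger.train_finished()
```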
6.2 Register the Inference Class for model evaluation 📊
If you plan to use the Model Evaluation Benchmark, you need to implement an Inference class and register it in the TrainApp, as shown below. Refer to the Integrate Custom Inference guide for more information.
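Registration itself is a one-liner once your Inference subclass exists; here RTDETRv2 is a hypothetical class from your own module:

```python
from src.rtdetrv2_inference import RTDETRv2  # hypothetical Inference subclass

train.register_inference_class(RTDETRv2)
```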
7. Run the Application
Now that you've integrated your custom model, you're ready to launch the training application. You can choose to run it locally for testing or deploy it directly to the Supervisely platform. The training app functions like any other Supervisely app, but with a built-in GUI.
Useful links:
Environment variables - Setting up environment variables for Supervisely App development
App development - Basics of Supervisely App development
Hello world! - Simple example of a Supervisely App
Run and debug locally
Prepare environment file before running the app:
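For local runs you typically point the app at your project via a local.env file; the IDs below are placeholders (see the Environment variables guide linked above):

```
TEAM_ID=123
WORKSPACE_ID=456
PROJECT_ID=789
SLY_APP_DATA_DIR="/path/to/app-data"
```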
You can run and debug your training app locally using the following shell command:
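Assuming your entry point is main.py with the TrainApp instance named train, and that your TrainApp version exposes the underlying ASGI app as train.app (adjust if your version differs):

```bash
uvicorn main:train.app --host 0.0.0.0 --port 8000 --ws websockets
```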
After running the app, you can access it at http://localhost:8000.
If you are a VSCode user, you can create a .vscode/launch.json configuration to run and debug your training app locally.
Example launch.json
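A sketch matching the uvicorn command above (entry point and env file names are assumptions about your project layout):

```json
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Run & Debug Train App",
            "type": "debugpy",
            "request": "launch",
            "module": "uvicorn",
            "args": ["main:train.app", "--host", "0.0.0.0", "--port", "8000", "--ws", "websockets"],
            "justMyCode": false,
            "envFile": "${workspaceFolder}/local.env"
        }
    ]
}
```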
Deploy to Supervisely
Releasing your private app to the Supervisely platform is very simple. You will need a git repository with your app code, the Supervisely SDK, and credentials to access the platform. You can refer to Add private app for more information.
Follow the steps below to release your app:
[Optional] git clone your app code if you haven't done it yet:
[Optional] create a Python virtual environment:
Install Supervisely SDK:
Provide your Supervisely credentials in one of the following ways: create a file supervisely.env in the home directory with the required fields, or set environment variables (both options are shown below):
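SERVER_ADDRESS and API_TOKEN are the standard Supervisely credential variables; the token value is a placeholder:

```bash
# Option 1: create ~/supervisely.env with these fields:
# SERVER_ADDRESS="https://app.supervisely.com"
# API_TOKEN="<your-api-token>"

# Option 2: export environment variables directly:
export SERVER_ADDRESS="https://app.supervisely.com"
export API_TOKEN="<your-api-token>"
```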
Run the CLI command to deploy the app:
If the app code is in a subdirectory, specify the path to the app code (the directory with the config.json file):
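For example (the subdirectory path is illustrative, and the flag name may differ across CLI versions; see supervisely release --help):

```bash
# Release from the repository root:
supervisely release

# App code in a subdirectory (the directory containing config.json):
supervisely release --sub-app supervisely_integration/train
```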
Final Steps 📦
After training is completed successfully, the TrainApp will automatically prepare and upload the model and all related files to Supervisely storage. During this finalization phase, the TrainApp executes several important steps to ensure your experiment's outputs are correctly validated, processed, and stored.
Standard Supervisely storage path for the artifacts: /experiments/{project_id}_{project_name}/{task_id}_{framework_name}/
Here's what happens:
1. Validate experiment_info. The system checks the experiment_info dictionary to ensure all required fields (like model name, file paths, etc.) are correct and complete. This step is essential to prevent any missing or incorrect metadata.
2. Postprocess training artifacts. Additional processing is applied to the raw training outputs (e.g., cleaning up temporary files, formatting logs, and aggregating metrics) to generate a standardized set of artifacts.
3. Postprocess training and validation splits. The splits for training and validation data are further refined if needed. This step ensures consistency and prepares the splits for future reference or re-training if necessary.
4. Upload model files and checkpoints to Supervisely storage. All model files provided in the experiment info and all checkpoints (e.g., best, intermediate, and last) are automatically uploaded to Supervisely storage so that they are safely stored and can be accessed later.
5. Create and upload model_meta.json. A metadata file (model_meta.json) is generated, which includes essential details about the model (such as its architecture, training parameters, and version). This file is then uploaded along with the other artifacts.
6. Run model benchmarking (if enabled). If model benchmarking is enabled in your app_options, the system runs automated evaluation and generates a detailed report with metrics and visualizations. This report is then uploaded to the Supervisely platform.
7. Export the model to ONNX and TensorRT (if enabled). For ease of deployment, the model may be automatically exported to additional formats like ONNX and TensorRT. This ensures compatibility with different serving environments.
8. Generate and upload additional training files. Other supplementary files such as logs, experiment reports, and configuration files are packaged and uploaded. These additional artifacts help in future debugging, auditing, or model evaluation.
Training Artifacts
The final training artifacts are organized and stored in the Supervisely storage directory. This directory contains all the necessary files and metadata related to the training experiment.
Output artifacts directory structure:
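The layout typically looks like this, sketched from the files described in this guide (exact contents vary by framework and enabled options):

```
/experiments/{project_id}_{project_name}/{task_id}_{framework_name}/
├── checkpoints/            # best / intermediate / last checkpoints
├── logs/                   # TensorBoard logs
├── export/                 # ONNX / TensorRT exports (if enabled)
├── experiment_info.json
├── model_meta.json
├── app_state.json
├── train_val_split.json
└── hyperparameters.yaml
```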
Experiment Info
Experiment info contains all the necessary information about the training experiment. This file is generated automatically by the TrainApp and is used to store metadata, paths to model files, checkpoints, and other artifacts, making it easy to track and manage the training process.
This file is essential and used by the Supervisely platform to organize and display models in a structured manner in training and serving apps.
Example experiment_info.json
This is an example of the final experiment information file that is generated:
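An illustrative sketch assembled from the fields explained below; all names, IDs, and values are placeholders:

```json
{
    "experiment_name": "42857_Lemons_RT-DETRv2-S",
    "framework_name": "RT-DETRv2",
    "model_name": "RT-DETRv2-S",
    "task_type": "object detection",
    "project_id": 1112,
    "task_id": 42857,
    "model_files": {"config": "model_config.yml"},
    "checkpoints": ["checkpoints/best.pth", "checkpoints/epoch_80.pth"],
    "best_checkpoint": "best.pth",
    "export": {"ONNXRuntime": "export/model.onnx"},
    "app_state": "app_state.json",
    "model_meta": "model_meta.json",
    "train_val_split": "train_val_split.json",
    "hyperparameters": "hyperparameters.yaml",
    "artifacts_dir": "/experiments/1112_Lemons/42857_RT-DETRv2/",
    "datetime": "2025-01-01 12:00:00",
    "evaluation_report": 421234,
    "evaluation_metrics": {"mAP": 0.52},
    "logs": {"type": "tensorboard", "link": "/experiments/1112_Lemons/42857_RT-DETRv2/logs/"}
}
```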
Fields explanation:
All paths listed in experiment_info.json are relative to the artifacts_dir field.
- experiment_name, framework_name, model_name, task_type: General metadata about the training experiment.
- project_id, task_id: IDs used to uniquely identify the project and the training task within Supervisely.
- model_files: Contains paths to the model configuration file(s).
- checkpoints & best_checkpoint: Lists the checkpoints produced during training and indicates the best-performing one.
- export: Shows the export file for ONNX (or TensorRT if enabled).
- app_state: Location of the app_state.json file for debugging and re-runs.
- model_meta: Path to the model meta file; contains the ProjectMeta object in .json format.
- train_val_split: Location of the train_val_split.json file containing the split information.
- hyperparameters: Path to the hyperparameters file used for training.
- artifacts_dir & datetime: Directory where artifacts are stored and the timestamp of the experiment.
- evaluation_report and evaluation_metrics: Information from the model benchmarking process (if run).
- logs: Location and type of training logs for review in the Supervisely interface.
Additional Resources 📚
Export Model to ONNX and TensorRT
If you'd like to export your trained model to the ONNX or TensorRT format, you can easily do so using dedicated decorators. Simply add the @train.export_onnx and @train.export_tensorrt decorators to your export functions. These functions should return the file path of the exported model, and the TrainApp will take care of uploading it to Supervisely storage automatically.
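A sketch of the two hooks. The conversion bodies are framework-specific and omitted, the output paths are illustrative, and whether the decorated functions take arguments may vary, so check the SDK:

```python
import os


@train.export_onnx
def to_onnx():
    onnx_path = os.path.join(train.output_dir, "export", "model.onnx")
    # Framework-specific conversion of the best checkpoint goes here.
    return onnx_path


@train.export_tensorrt
def to_tensorrt():
    engine_path = os.path.join(train.output_dir, "export", "model.engine")
    # Framework-specific TensorRT engine build goes here.
    return engine_path
```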
About GUI Layout
The graphical user interface (GUI) for the training app is a pre-built template based on Supervisely Widgets. Each step is organized into a separate Card that groups related settings together, making it intuitive to navigate through the training settings.
Step 1. Project Options
In this initial step, the app displays the project on which the training is run. It also offers an option to cache the project on the training agent for future use.
Step 2. Train and Val Splits
This step allows you to split your project data into training and validation sets. You can choose among different splitting methods such as random, tags, or by datasets.
Step 3. Classes Selector
Select the classes you want your model to train on. You can choose multiple classes from the provided classes table.
Step 4. Model Selector
Here, you can choose the model you wish to train. Select from pretrained models or your own custom models (trained previously in Supervisely). Once trained, your custom model will automatically appear in the custom models table next time you run the app.
Step 5. Hyperparameters
Set and adjust the training hyperparameters in this step. You can also enable model benchmarking (if enabled and implemented) and export your model to ONNX or TensorRT formats (if enabled and implemented). The hyperparameters are fully customizable using an Editor widget, allowing you to add as many variables and values as needed.
Step 6. Training Process
Enter your experiment name, choose the CUDA device (if enabled), and start the training process. Once training begins, previous steps will be locked to prevent changes during the run.
Step 7. Training Logs
View real-time training logs and progress via a progress bar (if implemented). This step also provides a link to the TensorBoard dashboard for more detailed monitoring (if implemented).
Step 8. Training Artifacts
After training finishes, this step displays the final training artifacts along with links to the stored files. These artifacts are automatically uploaded and organized in the Supervisely storage.
Default artifacts location is: /experiments/{project_id}_{project_name}/{task_id}_{framework_name}/
How to see TensorBoard logs after the training is finished
Once the training process is complete, you can view the TensorBoard logs in the logs directory within the artifacts folder.
The default location for these logs is: /experiments/{project_id}_{project_name}/{task_id}_{framework_name}/logs/
To open and view the log file, you can use the Tensorboard Experiments Viewer app.
Debugging with app_state.json 🐞
The app_state.json file captures the state of the TrainApp right before training begins. This file is incredibly useful for debugging and troubleshooting issues that might arise during training. It allows developers to quickly restart the training process without having to reconfigure all the settings via the GUI.

You can access app_state in the training code by using train.app_state and dump it locally for debugging purposes.
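For example (assuming train.app_state returns a JSON-serializable dict):

```python
import json

with open("app_state.json", "w") as f:
    json.dump(train.app_state, f, indent=4)
```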
Why Use app_state.json?
- Quick Debugging: If something goes wrong during training, you can inspect the app_state.json file to see the exact configuration and quickly re-run the app without manually reselecting settings in the GUI every time.
- State Preservation: It preserves the state of the training app in a human-readable .json format.
How to use app_state.json
Call the load_from_app_state method to load the app state from the app_state.json file after the app is initialized.
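A sketch, assuming the method is exposed via the app's GUI object and accepts the state dict; verify the exact attachment point and argument type in your SDK version:

```python
import json

with open("app_state.json") as f:
    app_state = json.load(f)

train = TrainApp(
    framework_name="RT-DETRv2",
    models="models.json",
    hyperparameters="hyperparameters.yaml",
    app_options="app_options.yaml",
)
train.gui.load_from_app_state(app_state)
```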
Example of app_state.json:
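An illustrative sketch assembled from the field reference below; project ID, class names, and hyperparameter values are placeholders:

```json
{
    "input": {"project_id": 1112},
    "train_val_split": {"method": "random", "split": "train", "percent": 80},
    "classes": ["cat", "dog"],
    "model": {"source": "Pretrained models", "model_name": "RT-DETRv2-S"},
    "hyperparameters": "epochs: 80\nbatch_size: 16\nlearning_rate: 0.0002\n",
    "options": {
        "model_benchmark": {"enable": true, "speed_test": true},
        "cache_project": true,
        "export": {"ONNXRuntime": true, "TensorRT": false}
    }
}
```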
Fields in app_state.json
- input
  - description: Contains the input data for the training app.
  - fields:
    - project_id: The ID of the project to use for training.
- train_val_split
  - description: Configures how the dataset is split into training and validation sets.
  - fields:
    - method: The splitting method (e.g., random, tags, or datasets).
    - If method is random:
      - split: Specifies which split to use (e.g., train or val).
      - percent: The percentage of the dataset to include for training.
    - If method is tags:
      - train_tag: The tag used for the training split.
      - val_tag: The tag used for the validation split.
      - untagged_action: Action for untagged images (options: train, val, or ignore).
    - If method is datasets:
      - train_datasets: IDs of datasets used for training.
      - val_datasets: IDs of datasets used for validation.
- classes
  - description: List of the classes to be used during training. Must match the class names in the project.
- model
  - description: Specifies the model configuration for training.
  - fields:
    - source: The model source, such as Pretrained models or Custom model.
    - If source is Pretrained models:
      - model_name: The name of the pretrained model (should match an entry in models.json).
    - If source is Custom model:
      - task_id: The training session ID containing the custom checkpoint.
      - checkpoint: The name of the custom checkpoint to be used.
- hyperparameters
  - description: Contains the hyperparameters (in .yaml format) used for training.
- options
  - description: Provides additional optional configurations for the training app.
  - fields:
    - model_benchmark: Configuration for model benchmarking.
      - enable: Boolean to enable or disable benchmarking.
      - speed_test: Boolean to enable or disable a speed test.
    - cache_project: Boolean to enable or disable caching of the project.
    - export: Configurations for model export.
      - ONNXRuntime: Enable or disable ONNX model export.
      - TensorRT: Enable or disable TensorRT model export.