Data Folder

Your Supervisely installation uses the DATA_PATH value to configure where it stores persistent data. By default, this value is set to /supervisely/data. This guide explains what data each subfolder contains, whether it needs a fast drive, and whether it can be safely cleaned.

Folder        Avg Size          Fast drive      Can be safely cleaned
db            2 - 10 GB+        required        No
logs          10 MB - 4 GB+     not required    Yes
net-server    1 MB              not required    Almost
proxy_cache   100 MB - 10 GB+   preferable      Yes
rabbitmq      100 MB - 2 GB     preferable      Almost
redis         10 MB - 2 GB      not required    Almost
redis-json    10 MB - 1 GB      not required    Almost
storage       10 GB - 100 GB+   not required    No

.
├── db
├── logs
├── net-server
├── proxy_cache
├── rabbitmq
├── redis
├── redis-json
└── storage
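To compare your own installation against the average sizes listed above, you can measure each subfolder. The following is a minimal sketch, not an official Supervisely tool: the function names are hypothetical, and it assumes DATA_PATH is available as an environment variable (falling back to the default /supervisely/data). Reading some subfolders, such as db, may require root permissions.

```python
import os
from pathlib import Path

# Subfolder names taken from the listing above.
SUBFOLDERS = ["db", "logs", "net-server", "proxy_cache",
              "rabbitmq", "redis", "redis-json", "storage"]


def folder_size_bytes(path: Path) -> int:
    """Sum the sizes of all regular files under path (nested symlinks are not followed)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = Path(root) / name
            try:
                if not file_path.is_symlink():
                    total += file_path.stat().st_size
            except OSError:
                # Files may disappear while services are running; skip them.
                continue
    return total


def data_folder_report(data_path: str) -> dict:
    """Return {subfolder: size in bytes} for the known subfolders that exist."""
    base = Path(data_path)
    return {name: folder_size_bytes(base / name)
            for name in SUBFOLDERS if (base / name).is_dir()}


if __name__ == "__main__":
    # Assumption: DATA_PATH is exported in the environment.
    report = data_folder_report(os.environ.get("DATA_PATH", "/supervisely/data"))
    for name, size in report.items():
        print(f"{name:<12} {size / 1024 ** 2:10.1f} MB")
```

A subfolder that is itself a symlink (for example, storage pointing to another drive) is still measured, because the walk starts from the resolved directory.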

Never point DATA_PATH at a network share (NFS/SMB/ESB/etc.), because this degrades performance significantly. Instead, symlink each folder that doesn't require a fast drive to the network share. In most cases, that is just the "storage" folder.
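The symlink approach can be sketched as follows. This is an illustration under stated assumptions rather than an official procedure: the helper name relocate_and_symlink and the mount point /mnt/share are hypothetical, DATA_PATH is assumed to exist already, and Supervisely is assumed to be stopped while the folder is moved.

```python
import os
import shutil
from pathlib import Path


def relocate_and_symlink(data_path: str, subfolder: str, target_root: str) -> Path:
    """Move data_path/subfolder to target_root/subfolder and leave a symlink behind."""
    source = Path(data_path) / subfolder
    target = Path(target_root) / subfolder

    if source.is_symlink():
        # Already relocated; report where the link points.
        return source.resolve()
    if target.exists():
        raise FileExistsError(f"{target} already exists; refusing to overwrite")

    if source.exists():
        # Move the existing contents to the new location.
        shutil.move(str(source), str(target))
    else:
        # Nothing to move yet; create an empty folder at the new location.
        target.mkdir(parents=True)

    # Replace the original folder with a link to the new location.
    os.symlink(target, source, target_is_directory=True)
    return target


if __name__ == "__main__":
    # Hypothetical paths: adjust both to your installation before running.
    relocate_and_symlink("/supervisely/data", "storage", "/mnt/share")
```

The helper refuses to overwrite an existing target, so it cannot merge into or replace data that is already on the share.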

db

This subfolder is used by the PostgreSQL relational database, the primary database Supervisely uses to store your annotations, users, dataset structures, and so on. The contents of this folder are shared with the postgres Docker container. The size of the database usually does not exceed 10 GB.

It's advised to store this folder on a fast SSD drive. If you store it on a slow HDD drive, you may experience performance issues.

This database does not store your actual images or videos, only URLs or file hashes.

Fast drive: required for the best performance
Can be safely cleaned: No, you will lose all your annotations and projects.

logs

This subfolder is used by Vector, the log parsing and transformation service (vector Docker container). Vector writes the logs into this subfolder in Zstandard-compressed JSON Lines format. Logs can be easily obtained by running the sudo supervisely troubleshoot command.

By default we do not clean this folder automatically.

Fast drive: optional, doesn't affect the performance
Can be safely cleaned: Yes
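Because this folder is not cleaned automatically, you may want to prune old logs yourself. The sketch below is one possible approach, not a Supervisely feature: the function name remove_old_logs is hypothetical, and since the file layout inside logs is not described here, it selects files purely by modification time. It defaults to a dry run so you can review the list before deleting anything.

```python
import time
from pathlib import Path


def remove_old_logs(logs_dir: str, max_age_days: float, dry_run: bool = True) -> list:
    """Return (and, unless dry_run, delete) files older than max_age_days."""
    cutoff = time.time() - max_age_days * 86400  # 86400 seconds per day

    # Collect candidates first so files are not deleted while the tree is being walked.
    candidates = [
        path for path in Path(logs_dir).rglob("*")
        if path.is_file() and not path.is_symlink()
        and path.stat().st_mtime < cutoff
    ]

    if not dry_run:
        for path in candidates:
            path.unlink()
    return candidates


if __name__ == "__main__":
    # Assumption: default DATA_PATH; 30 days is an arbitrary example threshold.
    for old_file in remove_old_logs("/supervisely/data/logs", max_age_days=30):
        print("would remove:", old_file)
```

Pass dry_run=False only after checking that the listed files are ones you no longer need.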

proxy_cache

This subfolder is used by Nginx to cache frequently used assets for fast access, mainly small previews of images and video frames. The size of this folder can be configured via the CACHE_STORAGE_SIZE setting.

Fast drive: preferred, but not required
Can be safely cleaned: Yes

rabbitmq

This subfolder is used by the RabbitMQ message broker as temporary storage for queued tasks. If you clean this folder, running tasks will be stopped and may end up in an invalid state.

Fast drive: preferred, but not required
Can be safely cleaned: Almost

redis & redis-json

These subfolders are used by the Redis cache database. They store temporary data that is also available in the main database (PostgreSQL) but is duplicated for fast access. For example, users' online status is cached there. If you clean these folders, some minor information, such as real-time logs, can be lost.

Fast drive: optional, doesn't affect the performance
Can be safely cleaned: Almost

storage

This subfolder is used by various services to store permanent files, such as images and other assets.

Some examples:

  • Images

  • Videos

  • Point cloud files

  • Model checkpoints

  • Application posters

  • Jupyter notebooks

  • Task data

Usually, we generate a unique file name or use the file hash instead of the original file name.

You can configure Supervisely to use remote storage, such as S3, instead of this folder. In this case this folder will be empty, and actual files will be stored as blob objects in the selected cloud.

Inside this folder you will find two subfolders, *-public and *-private. These names are legacy and do not reflect the actual privacy of the contents: both folders are completely private and not publicly accessible.

Fast drive: completely optional, required in very rare cases
Can be safely cleaned: No, you will lose all your images, videos, and other assets.
