Remote Storage

In Enterprise Edition you can not only store files on a hard drive, but also connect Azure Blob Storage, Google Cloud Storage or any S3-compatible storage (e.g. AWS S3).

You can upload files from your PC to the connected cloud storage, or use files that are already in cloud storage as a source (without duplicating them).

How we store files

Supervisely uses DATA_PATH from .env (defaults to /supervisely/data) to keep caches, the database and so on. Here we are interested in the storage subfolder, where generated content such as uploaded images or neural networks is stored.

You can find two subfolders here:

  • <something>-public/

  • <something>-private/

That's because we maintain the same structure in local storage as we would in remote storage, where those two folders are buckets or containers. You may notice that one has "public" in its name, but that only reflects the kind of data we store in it. Both buckets are private and do not provide anonymous read access.

Configure Supervisely to use S3 compatible storage (Amazon S3, Minio)

Edit the .env configuration file. You can find it by running the supervisely where command.

Change STORAGE_PROVIDER from http (local hard drive) to minio (S3 storage backend).

You also need to provide the STORAGE_ACCESS_KEY and STORAGE_SECRET_KEY credentials along with the endpoint of your S3 storage.

For example, here are settings for Amazon S3:

  • STORAGE_ENDPOINT=s3.amazonaws.com

  • STORAGE_PORT=443

So in the end, here is what your .env settings could look like:

Execute sudo supervisely up -d to apply the new settings.

If you're working with large files (4GB+) you might also want to add permission for "s3:ListBucketMultipartUploads" at the bucket level, so Supervisely can initiate multipart uploads for larger artifacts.

Configure Supervisely to use Azure Blob Storage

Edit the .env configuration file. You can find it by running the supervisely where command.

Change STORAGE_PROVIDER from http (local hard drive) to azure (Azure storage backend).

You also need to provide the STORAGE_ACCESS_KEY (your storage account name) and STORAGE_SECRET_KEY (secret key) credentials along with the endpoint of your blob storage.

Here is what your .env settings could look like:

Execute sudo supervisely up -d to apply the new settings.

Configure Supervisely to use Google Cloud Storage

Edit the .env configuration file. You can find it by running the supervisely where command.

Change STORAGE_PROVIDER from http (local hard drive) to google (GCS backend).

You also need to set STORAGE_CREDENTIALS_PATH to the credentials file generated by Google.

Here is what your .env settings could look like:

Now create docker-compose.override.yml in the directory you reach with cd $(sudo supervisely where):

Execute sudo supervisely up -d to apply the new settings.

Migration from local storage

Now, copy your current storage to S3. As mentioned before, because we maintain the same structure in the local filesystem, copying is enough.

We suggest using minio/mc to copy the files.

Run minio/mc docker image and execute the following commands:

Finally, restart services to apply new configuration: supervisely up -d.

Keys from IAM Role

IAM Roles are only supported for AWS S3.

For Enterprise Edition:

If you want to use an IAM Role, specify STORAGE_IAM_ROLE=<role_name> in the .env file. The STORAGE_ACCESS_KEY and STORAGE_SECRET_KEY variables can then be omitted.
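The variables named in the provider sections and in the IAM Role rule above can be collected into a small pre-flight check to run before supervisely up -d. This is only a sketch: parse_env, missing_storage_keys and REQUIRED are hypothetical names, not part of Supervisely. It also assumes that Azure reads its endpoint from STORAGE_ENDPOINT and that an unset STORAGE_PROVIDER means http.

```python
from typing import Dict, List

# Variables each STORAGE_PROVIDER value needs, as named in the sections above.
# Assumption: the Azure endpoint is read from STORAGE_ENDPOINT, like the S3 one.
REQUIRED: Dict[str, List[str]] = {
    "http": [],
    "minio": ["STORAGE_ENDPOINT", "STORAGE_PORT",
              "STORAGE_ACCESS_KEY", "STORAGE_SECRET_KEY"],
    "azure": ["STORAGE_ENDPOINT", "STORAGE_ACCESS_KEY", "STORAGE_SECRET_KEY"],
    "google": ["STORAGE_CREDENTIALS_PATH"],
}
KEY_PAIR = {"STORAGE_ACCESS_KEY", "STORAGE_SECRET_KEY"}


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    env: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def missing_storage_keys(env: Dict[str, str]) -> List[str]:
    """Return the storage variables that are required but unset or empty."""
    # Assumption: an unset STORAGE_PROVIDER means local storage (http).
    provider = env.get("STORAGE_PROVIDER", "http")
    if provider not in REQUIRED:
        raise ValueError(f"unknown STORAGE_PROVIDER: {provider!r}")
    required = list(REQUIRED[provider])
    # With an IAM Role (AWS S3 only) the access/secret key pair can be omitted.
    if provider == "minio" and env.get("STORAGE_IAM_ROLE"):
        required = [key for key in required if key not in KEY_PAIR]
    return [key for key in required if not env.get(key)]
```

For example, calling missing_storage_keys(parse_env(text)) on a file that sets only STORAGE_PROVIDER=minio, STORAGE_ENDPOINT and STORAGE_PORT reports the two missing key variables, and reports nothing once STORAGE_IAM_ROLE is set.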

For Community Edition or Enterprise Edition (if deployed without AWS):

If you have a bucket with data and you want to connect to it securely from outside AWS, there are two ways: sharing an access key and secret key pair, or using IAM Roles Anywhere. This section describes how to do it with IAM Roles Anywhere.

Steps to configure IAM Roles Anywhere:

1. Generate certificates and keys.

We've prepared two bash scripts for you. Download cert.zip and extract it. Update the genCACert.sh and genCert.sh scripts with your values for the DURATION_DAYS, CN, OU and O variables. You can set any values you want.

Generate the master certificate and key (they will be used in the AWS trust anchor):

Generate certificates and keys for the client (Supervisely):

2. Create a trust anchor.

Open the AWS Console, go to the Roles Anywhere service and create a trust anchor.


Copy the contents of the ca.crt file generated earlier and paste them into the External certificate bundle field.

3. Create an IAM role.

Before you can create a profile, you need an IAM role. Refer to the AWS documentation for more information.

Once you have created the IAM role, go to the IAM role trust policy settings and add a new trust relationship (you can copy it from the created trust anchor).


4. Create a profile.

Return to the Roles Anywhere service and create a new profile.


Select the IAM role you created earlier.


5. Configure remote storage settings in Supervisely.

Open the remote storage settings in Supervisely, switch to the IAM Anywhere tab and fill in the fields.


In the certificate field, you need to paste the content of the company.pem file. Please note that the content must be base64 encoded. You can get it by running the following command:

In the signing private key field, you need to paste the content of the company.key file. As with the certificate, the content must be base64 encoded. You can get it by running the following command:
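The two encoding steps above can be sketched in Python, for cases where a command-line base64 tool is not at hand. The helper name base64_file is hypothetical. It returns the whole file as one unwrapped base64 line, on the assumption that the settings form expects a single-line value.

```python
import base64
from pathlib import Path


def base64_file(path: str) -> str:
    """Return the file's bytes as a single-line base64 string."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


# Usage, with the files produced in step 1 in the current directory:
#   print(base64_file("company.pem"))   # value for the certificate field
#   print(base64_file("company.key"))   # value for the signing private key field
```

Treat the encoded private key like the key itself: avoid leaving it in shell history or shared logs.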

Don't forget to add the S3 bucket name and save the settings.

Frontend caching

Since AWS and Azure can be quite pricey under heavy reads, image caching is enabled by default.

If an image is not in the previews cache but is in the storage cache, its preview is generated from the storage cache and put into the previews cache. The image is not fetched from the remote server again.
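The lookup order described above can be pictured as a two-level cache. The following is only an illustration, not Supervisely's implementation: get_preview and all its parameters are hypothetical, plain dictionaries stand in for the caches, and the behaviour on a full miss (fetch from the remote once, then keep a copy in the storage cache) is an assumption.

```python
from typing import Callable, Dict


def get_preview(
    image_id: str,
    previews_cache: Dict[str, bytes],
    storage_cache: Dict[str, bytes],
    fetch_remote: Callable[[str], bytes],
    make_preview: Callable[[bytes], bytes],
) -> bytes:
    """Return a preview, touching the remote storage only on a full miss."""
    # Level 1: the preview itself is already cached.
    if image_id in previews_cache:
        return previews_cache[image_id]
    # Full miss (assumed behaviour): read the original from the remote once
    # and keep it in the storage cache for later requests.
    if image_id not in storage_cache:
        storage_cache[image_id] = fetch_remote(image_id)
    # Level 2: build the preview from the cached original, no remote read.
    preview = make_preview(storage_cache[image_id])
    previews_cache[image_id] = preview
    return preview
```

The point of the second level is that regenerating a preview costs local CPU time only, whereas every remote read is billed by the cloud provider.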

Here are the default values (you can alter them via docker-compose.override.yml file):

If you already have some files on Amazon S3/Google Cloud Storage/Azure Storage and you don't want to upload and store those files in Supervisely, you can use the "Links" plugin to link the files to the Supervisely server.

Instead of uploading the actual files (e.g. images), you upload one or more .txt files that contain a list of URLs to your files. If your URLs are publicly available (e.g. a link looks like https://s3-us-west-2.amazonaws.com/test1/abc and you can open it directly in your web browser), then you can stop reading and start uploading.

If your files are protected, however, you will need to provide credentials in the instance settings or create a configuration file manually.

Azure SAS Token minimal permissions

File system provider

If you want to use images from the host where your Supervisely instance is running, you can use the "File system" provider.

  • Folder path on the server - path to the folder on the host server that will be mounted

  • Storage ID (bucket) - identifier of the mounted folder. It will be used in links to the mounted folder

For instance, for the example above, when you want to add a new asset (image or video) whose local path on your hard drive is /data/datasets/persons/image1.jpg, use the following format in the API, SDK or corresponding application: fs://local-datasets/persons/image1.jpg
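The path-to-link mapping above can be sketched as a small helper. The function name to_fs_link is hypothetical. The mounted folder /data/datasets and the storage ID local-datasets are inferred from the example path and link, so treat them as assumptions about the original configuration.

```python
from pathlib import PurePosixPath


def to_fs_link(local_path: str, mounted_folder: str, storage_id: str) -> str:
    """Build an fs:// link for a file located inside the mounted folder.

    Raises ValueError if local_path is not under mounted_folder.
    """
    relative = PurePosixPath(local_path).relative_to(PurePosixPath(mounted_folder))
    return f"fs://{storage_id}/{relative.as_posix()}"


link = to_fs_link(
    "/data/datasets/persons/image1.jpg",  # file on the host
    "/data/datasets",                     # assumed "Folder path on the server"
    "local-datasets",                     # assumed "Storage ID (bucket)"
)
print(link)  # fs://local-datasets/persons/image1.jpg
```

Using relative_to rather than string slicing means that a path outside the mounted folder fails loudly instead of producing a link that points nowhere.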

Manual configuration

If you are brave enough, you can create configuration files manually:

Example configuration file:

Links file structure:

Links file example:

Create a new file docker-compose.override.yml in the directory you reach with cd $(sudo supervisely where):

Then execute the following to apply the changes:

Google Cloud Storage secret file example, docker-compose.override.yml:

Migrating existing projects to Cloud Storage

If you want to migrate only some of the projects that exist in the Supervisely storage to the linked cloud, you can achieve this using the following code snippet.

The code snippet:

  • Is designed to change links only for entities that are not linked yet, that is, entities stored in Supervisely storage.

  • Will change links only when all entities are uploaded to remote storage.

  • Can be run again in case of failure. It will not re-upload entities that are already uploaded to remote storage.

  • Saves nested datasets in remote storage as a flat structure. All datasets will be placed in the project directory.

  • Will not delete entities from Supervisely storage after migration.

Function to use in your code: migrate_project(project: Union[sly.ProjectInfo, int])

Remember to configure the REMOTE_BUCKET and MIGRATION_DIR constants in the code snippet before use.


If you need to keep the nested dataset structure in remote storage

You can modify the script to create nested directories in the remote storage. To do this, you need to change the remote path of the entity to include the dataset name. For that, you can replace api.dataset.get_tree(...) with api.dataset.get_list(...) and iterate over the tree. Then, you can modify the remote path of the entity to include the nested dataset ID.

If you have already uploaded entities to remote storage

You can simply set remote links for them. There are two ways:

  1. Create your own entities_map that matches the structure used in the code above, and redefine it in the Global Variables section.

  2. Use the SDK API methods with lists of entity IDs and remote links:

    • ImageApi(...).set_remote(...)

    • VideoApi(...).set_remote(...)

For better performance, you can use the sly.batched function to split the lists of entities and remote links into batches. Batches of no more than 1000 items are recommended.
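The batching advice above can be sketched without depending on the SDK. Here a plain standard-library generator stands in for sly.batched, and the set_remote method is passed in as a callable. The helper names batched_pairs and set_remote_in_batches are hypothetical, and the sketch assumes set_remote takes a list of entity IDs followed by a list of links, as described in the list above.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def batched_pairs(
    ids: Sequence[int], links: Sequence[str], batch_size: int = 1000
) -> Iterator[Tuple[List[int], List[str]]]:
    """Yield aligned (ids, links) batches of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if len(ids) != len(links):
        raise ValueError("ids and links must have the same length")
    for start in range(0, len(ids), batch_size):
        end = start + batch_size
        yield list(ids[start:end]), list(links[start:end])


def set_remote_in_batches(
    set_remote: Callable[[List[int], List[str]], object],
    ids: Sequence[int],
    links: Sequence[str],
    batch_size: int = 1000,
) -> int:
    """Call set_remote once per batch and return the number of calls made."""
    calls = 0
    for id_batch, link_batch in batched_pairs(ids, links, batch_size):
        set_remote(id_batch, link_batch)
        calls += 1
    return calls


# Usage with the SDK (assumed call shape, not executed here):
#   set_remote_in_batches(api.image.set_remote, image_ids, remote_links)
```

Slicing both lists with the same indices keeps each ID paired with its own link, which is the property that matters most when the lists are split.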
