Status and monitoring
Agent Status
Supervisely Agents have 3 possible statuses:
Running - The agent has established a proper connection with the server and can perform tasks.
Waiting - The agent is either not deployed or can't connect to the server.
Network Issue - The connection between the agent and the server is unstable.
This information is available on the main Cluster page and on the individual agent pages.
The troubleshooting steps below cover the most common causes of the latter two statuses:
Waiting
The agent is either not deployed or can't connect to the server. If the agent is not deployed, follow the deployment guide.
If the agent can't connect to the server, connect to the server via SSH and check the container logs.
Common causes include:
Incorrect SERVER_ADDRESS - make sure that the address the agent is using for communication is reachable and correct.
Page not found 404 - most likely caused by a 301 HTTP -> HTTPS redirect. Either change the SERVER_ADDRESS on the server or the SERVER_ADDRESS for the agent in the Start -> Team Cluster - Instructions -> Advanced settings.
ECONNREFUSED - check whether you need to set HTTP_PROXY/HTTPS_PROXY.
certificate verify failed - if you're using a self-signed certificate, combine your system CA certificates and your custom certificate into a single file, then provide the path to that file in the advanced settings, CA cert section.
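For the "certificate verify failed" case, combining the CA certificates can be sketched as follows. This is a minimal illustration rather than an official Supervisely tool: the function name combine_ca_bundles and all file paths are hypothetical, and it assumes both inputs are PEM-encoded text files.

```python
from pathlib import Path

PEM_HEADER = "-----BEGIN CERTIFICATE-----"


def combine_ca_bundles(system_bundle: str, custom_cert: str, output: str) -> int:
    """Concatenate two PEM files into one bundle.

    Returns the number of certificates written to `output`.
    All three paths are supplied by the caller (hypothetical names).
    """
    parts = []
    for source in (system_bundle, custom_cert):
        text = Path(source).read_text(encoding="utf-8").strip()
        # Fail early if a file is clearly not PEM (for example, a DER binary).
        if PEM_HEADER not in text:
            raise ValueError(f"{source} does not look like a PEM certificate file")
        parts.append(text)

    # Each certificate block must start on its own line.
    combined = "\n".join(parts) + "\n"
    Path(output).write_text(combined, encoding="utf-8")
    return combined.count(PEM_HEADER)
```

A call might look like combine_ca_bundles("/etc/ssl/certs/ca-certificates.crt", "my-ca.pem", "combined-ca.pem"). The first path is the usual system bundle location on Debian and Ubuntu and is only an assumption here; other distributions store it elsewhere. The resulting file is the one to reference in the CA cert section.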
Network Issue
To debug Supervisely NET client you can use the following command:
Supervisely NET server is unreachable - make sure that the address and port are correct, and check that your firewall is not blocking access.
subnet conflict - the default Supervisely NET subnet is 10.8.0.1/16. If the server where the agent is deployed uses a conflicting subnet, you need to change either the other subnets or the Supervisely NET subnet. The Supervisely NET subnet can only be configured on the Supervisely server itself.
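To check for a subnet conflict, you can compare the subnets in use on the agent's machine against the Supervisely NET subnet. The sketch below uses Python's standard ipaddress module; the function name find_conflicts and the example subnet list are hypothetical, and collecting the real list of local subnets (for example from the machine's routing table) is left to you.

```python
import ipaddress

# Default Supervisely NET subnet as stated in this guide.
SUPERVISELY_NET_DEFAULT = "10.8.0.1/16"


def find_conflicts(local_subnets, vpn_subnet=SUPERVISELY_NET_DEFAULT):
    """Return the local subnets that overlap with the Supervisely NET subnet."""
    # strict=False accepts a host address such as 10.8.0.1/16
    # and normalizes it to the network 10.8.0.0/16.
    vpn = ipaddress.ip_network(vpn_subnet, strict=False)
    conflicts = []
    for subnet in local_subnets:
        net = ipaddress.ip_network(subnet, strict=False)
        # Only compare networks of the same IP version.
        if net.version == vpn.version and net.overlaps(vpn):
            conflicts.append(str(net))
    return conflicts


if __name__ == "__main__":
    # Hypothetical subnets found on the agent's machine.
    example = ["192.168.1.0/24", "172.17.0.0/16", "10.8.5.0/24"]
    print(find_conflicts(example))
```

Any subnet returned by the function overlaps with Supervisely NET, so either that subnet or the Supervisely NET subnet has to change.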
Agent Monitoring
After deployment, your node should appear in the dashboard with the status "Running", which means that you can run tasks on that agent. Click the node name (generated automatically) to see hardware info, the output of the "htop" and "nvidia-smi" (if available) commands, and other system information for monitoring the node's health.
Open the node dashboard
Go to the "Cluster" page. Click on the name of the relevant node.
Node dashboard
General information
Local Agent storage information
Nvidia-smi
htop
All information refreshes automatically every few seconds.
Manage local Agent storage
Clean your data carefully, especially the NN weights. Before removing NN weights from the folder, check that all important models have been uploaded to the Supervisely server.
You can clean cached images, stored NN weights, and the temporary task directory. The agent storage section shows the current amount of used space.
To clean a directory, choose the corresponding action.