# Model Evaluation Benchmark

## Overview

After training a model, it is crucial to evaluate it on a validation set to assess its performance. Supervisely provides a built-in evaluation benchmark that **automatically** runs after each model training experiment. The benchmark includes a large set of evaluation tools, visualizations, and metrics to help you understand how well your model actually performs.

**Supervisely offers the most advanced evaluation tool on the market!**

![Benchmark Dashboard](https://1080806899-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-M4BHwRbuyIoH-xoF3Gv%2Fuploads%2Fgit-blob-9745814bc8c23952a063171a13730d83e2340876%2Fbenchmark-dashboard-75.png?alt=media)

### Supported Task Types

The Evaluation Benchmark currently supports the following **task types**:

* [Object Detection](https://docs.supervisely.com/neural-networks/model-evaluation-benchmark/object-detection)
* [Instance Segmentation](https://docs.supervisely.com/neural-networks/model-evaluation-benchmark/instance-segmentation)
* [Semantic Segmentation](https://docs.supervisely.com/neural-networks/model-evaluation-benchmark/semantic-segmentation)

Follow the links above for detailed documentation on each task type.

## How to Open Evaluation Results

After training, you can open the evaluation results either from the training session (application) where the model was trained or from the [**Experiments**](https://docs.supervisely.com/neural-networks/training/experiments) table in Supervisely. The latter is usually more convenient because the Experiments table organizes all your training experiments in one place. To open the evaluation results, find the experiment you are interested in and click the **Evaluation Report** link.

![Open Evaluation from Experiments](https://1080806899-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-M4BHwRbuyIoH-xoF3Gv%2Fuploads%2Fgit-blob-1a000441536836864f2f599d980615d61e8f5ad2%2Fopen-evaluation-from-table-2.png?alt=media)

## Compare Evaluation Results

You can also compare different models that were trained on the same dataset side-by-side. This is also possible from the **Experiments** table: select two or more experiments, then click the **Compare** button in the dropdown menu.

![Compare Experiments](https://1080806899-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-M4BHwRbuyIoH-xoF3Gv%2Fuploads%2Fgit-blob-30e59f0b680b80c9a090f7d230df34881e765311%2Fcompare-training-metrics1.jpg?alt=media)

## Run Evaluation & Comparison Manually via Apps

The evaluation runs automatically after each training experiment unless you disabled it in the training configuration. However, you can also run the evaluation manually on any model and dataset using the following app:

* [Evaluator for Model Benchmark](https://ecosystem.supervisely.com/apps/model-benchmark)

In this case, you first need to deploy your model manually using the corresponding serving app (e.g., **Serve YOLO**, **Serve DEIM**). Then launch the **Evaluator for Model Benchmark** app, select the deployed model and the Ground Truth project, and click **Evaluate**. The app will run your model on the selected dataset and generate an evaluation report with all the metrics and visualizations.
