Custom Metrics ====================== This guide explains how to create custom metrics using the NeuroBench framework. Metrics are categorized into two types: **Static Metrics** and **Workload Metrics**. Each type has a specific purpose and use case, and this document provides examples for defining metrics for each type. Metric Types ------------ The following metric types are available: - **Static Metrics**: Evaluate fixed properties of the model, such as the number of parameters. - **Workload Metrics**: Evaluate the model's performance during inference execution. By default, workload metrics will averaged across batched inference. Metrics like classification accuracy can be joined this way. Other workload metrics may depend on all data and inferences and cannot be averaged across batches, such as an R^2 score. These workload metrics should be *AccumulatedMetrics*, which is a subclass of workload metrics which stores performance over multiple batches and computes final metric values once data is processed. Static Metric Abstract Base Class ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. automodule:: neurobench.metrics.abstract.static_metric :members: :undoc-members: :show-inheritance: Workload Metric Abstract Base Class ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. automodule:: neurobench.metrics.abstract.workload_metric :members: WorkloadMetric :undoc-members: :exclude-members: requires_hooks Accumulated Metric Abstract Base Class ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. automodule:: neurobench.metrics.abstract.workload_metric :members: AccumulatedMetric :undoc-members: :exclude-members: requires_hooks Defining Custom Metrics ----------------------- Here’s how you can define your own metrics for each type: Static Metrics ^^^^^^^^^^^^^^ Static metrics are used to evaluate fixed properties of a model, such as the total number of parameters. Example: .. code-block:: python from neurobench.metrics.abstract import StaticMetric class ParameterCountMetric(StaticMetric): """ Metric to count the total number of parameters in a model. """ def __call__(self, model): return sum(p.numel() for p in model.parameters()) Workload Metrics ^^^^^^^^^^^^^^^^ Workload metrics (which are not Accumulated) evaluate the model’s performance for each batch of input data, and the result is averaged across batches. Example: .. code-block:: python from neurobench.metrics.abstract import WorkloadMetric class AccuracyMetric(WorkloadMetric): """ Metric to compute accuracy for a single batch. """ def __call__(self, model, preds, data): inputs, labels = data correct = (preds.argmax(dim=1) == labels).sum().item() return correct / len(labels) * 100 Accumulated Metrics ^^^^^^^^^^^^^^^^^^^ Accumulated metrics are a type of workload metric which stores performance information over multiple batches and compute a final result. These should be used when the metric should not be averaged across batches. Example: .. code-block:: python from neurobench.metrics.abstract import AccumulatedMetric class MaximumLossMetric(AccumulatedMetric): """ Metric to compute the maximum loss over multiple batches. """ def __init__(self): super().__init__() self.max_loss = 0.0 self.num_batches = 0 def __call__(self, model, preds, data): _, labels = data loss = loss_function(preds, labels).item() if loss > self.max_loss: self.max_loss = loss self.num_batches += 1 def compute(self): return self.max_loss def reset(self): self.max_loss = 0.0 self.num_batches = 0 Plugging Metrics into the Benchmark ^^^^^^^^^^^^^^^^^^^ The `Benchmark` class expects metrics to be passed as lists grouped by type. You also need to configure any `postprocessors` if required by your model. Here’s an example of integrating both built-in and custom metrics into the `Benchmark` class: Example: .. code-block:: python from neurobench.metrics import Benchmark from neurobench.postprocessors import ChooseMaxCount # Define datasets and the model model = Model() # Replace with your model instance test_set_loader = DataLoader() # Replace with your DataLoader instance # Define postprocessors (if applicable) postprocessors = [ChooseMaxCount()] # Define metrics static_metrics = [ParameterCountMetric] # Replace with your custom static metric. Do not initialize the classes. workload_metrics = [AccuracyMetric, MaximumLossMetric] # Replace with your custom workload and accumulated metrics. Do not initialize the classes. # Create the Benchmark instance benchmark = Benchmark( model=model, dataloader=test_set_loader, postprocessors=postprocessors, metrics=[static_metrics, workload_metrics] ) # Run the benchmark results = benchmark.run(verbose=True) # Access results print(results) Summary of Custom Metrics ^^^^^^^^^^^^^^^^^^^^^^^^^ The following table summarizes the available metric types: .. list-table:: Metric Types :header-rows: 1 * - Metric Type - Base Class - Key Methods - Use Case * - Static Metric - ``StaticMetric`` - ``__call__`` - Evaluate static properties of the model. * - Workload Metric - ``WorkloadMetric`` - ``__call__`` - Averages over batched inference. * - Accumulated Metric - ``AccumulatedMetric`` - ``__call__``, ``compute``, ``reset`` - Accumulates performance info over multiple batches.