neurobench.benchmarks

Benchmark

class neurobench.benchmarks.Benchmark(model: NeuroBenchModel, dataloader: torch.utils.data.DataLoader | None, preprocessors: List[NeuroBenchPreProcessor | Callable[[Tuple[torch.Tensor, torch.Tensor]], Tuple[torch.Tensor, torch.Tensor]]] | None, postprocessors: List[NeuroBenchPostProcessor | Callable[[torch.Tensor], torch.Tensor]] | None, metric_list: List[List[Type[StaticMetric | WorkloadMetric]]])[source]

Bases: object

Top-level benchmark class for running benchmarks.

__init__(model: NeuroBenchModel, dataloader: torch.utils.data.DataLoader | None, preprocessors: List[NeuroBenchPreProcessor | Callable[[Tuple[torch.Tensor, torch.Tensor]], Tuple[torch.Tensor, torch.Tensor]]] | None, postprocessors: List[NeuroBenchPostProcessor | Callable[[torch.Tensor], torch.Tensor]] | None, metric_list: List[List[Type[StaticMetric | WorkloadMetric]]])[source]
Parameters:
  • model – A NeuroBenchModel.

  • dataloader – A PyTorch DataLoader.

  • preprocessors – A list of NeuroBenchPreProcessors or callable functions (e.g. lambda) with matching interfaces.

  • postprocessors – A list of NeuroBenchPostProcessors or callable functions (e.g. lambda) with matching interfaces.

  • metric_list – A list of two lists of metric classes to run: the first holds StaticMetric classes, the second holds WorkloadMetric classes.

run(quiet: bool = False, verbose: bool = False, dataloader: torch.utils.data.DataLoader | None = None, preprocessors: NeuroBenchPreProcessor | Callable[[Tuple[torch.Tensor, torch.Tensor]], Tuple[torch.Tensor, torch.Tensor]] | None = None, postprocessors: NeuroBenchPostProcessor | Callable[[torch.Tensor], torch.Tensor] | None = None, device: str | None = None) Dict[str, Any][source]

Runs batched evaluation of the benchmark.

Parameters:
  • dataloader (Optional) – override DataLoader for this run.

  • preprocessors (Optional) – override preprocessors for this run.

  • postprocessors (Optional) – override postprocessors for this run.

  • quiet (bool, default=False) – If True, output is suppressed.

  • verbose (bool, default=False) – If True, metrics for each batch will be printed. If False (default), metrics are accumulated and printed after all batches are processed.

  • device (Optional) – use device for this run (e.g. ‘cuda’ or ‘cpu’).

Returns:

A dictionary of results.

Return type:

Dict[str, Any]
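
The batched evaluation flow can be pictured with a small, library-free sketch. This is not NeuroBench's implementation: `run_sketch`, the identity "model", and the plain lists standing in for tensors are all hypothetical, and weighting per-batch metric values by batch size is an assumption about how results might be combined.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Batch = Tuple[List[float], List[int]]  # (data, labels); lists stand in for tensors


def run_sketch(
    model: Callable[[List[float]], List[float]],
    dataloader: Sequence[Batch],
    preprocessors: List[Callable[[Batch], Batch]],
    postprocessors: List[Callable[[List[float]], List[int]]],
    metrics: Dict[str, Callable[[List[int], Batch], float]],
) -> Dict[str, float]:
    """Hypothetical batched evaluation loop (not the library's code)."""
    totals = {name: 0.0 for name in metrics}
    n_samples = 0
    for batch in dataloader:
        for pre in preprocessors:    # each maps (data, labels) -> (data, labels)
            batch = pre(batch)
        preds = model(batch[0])
        for post in postprocessors:  # each maps predictions -> predictions
            preds = post(preds)
        size = len(batch[0])
        for name, metric in metrics.items():
            # Assumed: per-batch values are averaged with batch-size weights.
            totals[name] += metric(preds, batch) * size
        n_samples += size
    return {name: total / n_samples for name, total in totals.items()}


def accuracy(preds: List[int], batch: Batch) -> float:
    labels = batch[1]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


loader = [([0.2, 0.9], [0, 1]), ([0.7, 0.1, 0.6], [1, 0, 0])]
results = run_sketch(
    model=lambda xs: xs,  # identity "model"
    dataloader=loader,
    preprocessors=[lambda b: ([2 * x for x in b[0]], b[1])],
    postprocessors=[lambda ps: [int(p > 1.0) for p in ps]],
    metrics={"accuracy": accuracy},
)
print(results)
```

Here the preprocessor doubles the data and the postprocessor thresholds the outputs into class labels, mirroring the callable interfaces listed for preprocessors and postprocessors.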

save_benchmark_results(file_path: str, file_format: Literal['json', 'csv', 'txt'] = 'json') None[source]

Save benchmark results to a specified file in the chosen format.

Parameters:
  • file_path (str) – Path to the output file (excluding the extension). The method automatically appends the appropriate extension based on the chosen file format.

  • file_format (Literal["json", "csv", "txt"], default="json") –

    The format in which the results should be saved. Supported formats:

    • "json": Saves the results as a JSON file with formatted indentation.

    • "csv": Saves the results as a CSV file with keys as headers and values as the first row.

    • "txt": Saves the results as a plain text file with one key-value pair per line.

Raises:

ValueError – If the provided file_format is not one of the supported formats (“json”, “csv”, “txt”).
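
To make the three formats concrete, here is a minimal stand-in written with the standard library only. `save_results_sketch` is a hypothetical function, not the method itself. It assumes a flat results dictionary, and the `key: value` separator used for the text format is also an assumption.

```python
import csv
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def save_results_sketch(results: Dict[str, Any], file_path: str, file_format: str = "json") -> Path:
    """Hypothetical stand-in illustrating the three documented formats."""
    if file_format not in ("json", "csv", "txt"):
        raise ValueError(f"Unsupported file format: {file_format!r}")
    out = Path(f"{file_path}.{file_format}")  # extension is appended automatically
    if file_format == "json":
        out.write_text(json.dumps(results, indent=4))
    elif file_format == "csv":
        with out.open("w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(results.keys())    # keys as headers
            writer.writerow(results.values())  # values as the first row
    else:
        # One "key: value" pair per line; the separator is an assumption.
        out.write_text("\n".join(f"{k}: {v}" for k, v in results.items()) + "\n")
    return out


with tempfile.TemporaryDirectory() as tmp:
    path = save_results_sketch({"Footprint": 1024, "ClassificationAccuracy": 0.85}, f"{tmp}/results", "csv")
    print(path.name)
    print(path.read_text())
```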

to_nir(dummy_input: torch.Tensor, filename: str, **kwargs) None[source]

Exports the model to the NIR (Neural Intermediate Representation) format.

Parameters:
  • dummy_input (torch.Tensor) – A sample input tensor that matches the input shape of the model. This is required for tracing the model during export.

  • filename (str) – The file path where the exported NIR file will be saved.

  • **kwargs – Additional keyword arguments passed to the export_to_nir function for customization during the export process.

Raises:

ValueError – If the installed version of snntorch is less than 0.9.0.

to_onnx(dummy_input: torch.Tensor, filename: str, **kwargs) None[source]

Exports the model to the ONNX (Open Neural Network Exchange) format.

Parameters:
  • dummy_input (torch.Tensor) – A sample input tensor that matches the input shape of the model. This tensor is required for tracing the model during the export process.

  • filename (str) – The file path where the ONNX model will be saved, including the .onnx extension.

  • **kwargs – Additional keyword arguments passed to the torch.onnx.export function for customization during the export process.

Raises:

RuntimeError – If an error occurs during the ONNX export process.

Workload Metrics

class neurobench.metrics.workload.ActivationSparsity[source]

Bases: WorkloadMetric

Sparsity of model activations.

Calculated as the number of zero activations over the total number of activations, across all layers, timesteps, and samples in the data.
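
The ratio can be illustrated in plain Python. The function name and the input layout (one flat list of recorded activations per layer) are assumptions made for this sketch. The library itself gathers activations from the model.

```python
from typing import List


def activation_sparsity_sketch(layer_activations: List[List[float]]) -> float:
    """Zero activations divided by all activations, pooled over every layer.

    `layer_activations` is a hypothetical record: one list of values per
    layer, already concatenated over timesteps and samples.
    """
    zeros = sum(1 for layer in layer_activations for a in layer if a == 0)
    total = sum(len(layer) for layer in layer_activations)
    return zeros / total if total else 0.0


# Two layers: 3 of 4 and 1 of 4 activations are zero, so 4 of 8 overall.
print(activation_sparsity_sketch([[0, 0, 1, 0], [1, 0, 1, 1]]))  # 0.5
```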

__call__(model, preds, data)[source]

Compute activation sparsity.

Parameters:
  • model – A NeuroBenchModel.

  • preds – A tensor of model predictions.

  • data – A tuple of data and labels.

Returns:

Activation sparsity

Return type:

float

__init__()[source]

Initialize the ActivationSparsity metric.

class neurobench.metrics.workload.ActivationSparsityByLayer[source]

Bases: AccumulatedMetric

Layer-wise sparsity of model activations.

Calculated per layer as the number of zero activations over the number of activations in that layer, over all timesteps and samples in the data.

__call__(model, preds, data)[source]

Compute activation sparsity layer by layer.

Parameters:
  • model – A NeuroBenchModel.

  • preds – A tensor of model predictions.

  • data – A tuple of data and labels.

Returns:

Activation sparsity

Return type:

float

__init__()[source]

Initialize the ActivationSparsityByLayer metric.

compute()[source]

Compute the activation sparsity layer by layer.

reset()[source]

Reset the metric.
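
The accumulate, compute, and reset cycle of this metric might look like the following sketch. `SparsityByLayerSketch` is a hypothetical class with a simplified call signature (it takes recorded activations directly), and returning a per-layer dictionary from `compute()` is an assumption.

```python
from typing import Dict, List


class SparsityByLayerSketch:
    """Hypothetical accumulate/compute/reset pattern; not the library's class."""

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        # Drop all running tallies so a new evaluation starts clean.
        self.zeros: Dict[str, int] = {}
        self.totals: Dict[str, int] = {}

    def __call__(self, batch_activations: Dict[str, List[float]]) -> None:
        # Add this batch's zero and total counts to the running tallies.
        for layer, acts in batch_activations.items():
            self.zeros[layer] = self.zeros.get(layer, 0) + sum(a == 0 for a in acts)
            self.totals[layer] = self.totals.get(layer, 0) + len(acts)

    def compute(self) -> Dict[str, float]:
        return {layer: self.zeros[layer] / self.totals[layer] for layer in self.totals}


metric = SparsityByLayerSketch()
metric({"fc1": [0, 1, 0, 0], "fc2": [1, 1]})
metric({"fc1": [1, 1, 0, 0], "fc2": [0, 1]})
print(metric.compute())  # {'fc1': 0.625, 'fc2': 0.25}
```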

class neurobench.metrics.workload.ClassificationAccuracy[source]

Bases: WorkloadMetric

Classification accuracy of the model predictions.

__call__(model, preds, data)[source]

Compute classification accuracy.

Parameters:
  • model – A NeuroBenchModel.

  • preds – A tensor of model predictions.

  • data – A tuple of data and labels.

Returns:

Classification accuracy

Return type:

float

__init__()[source]

Initialize the ClassificationAccuracy metric.

class neurobench.metrics.workload.CocoMap[source]

Bases: AccumulatedMetric

COCO mean average precision.

Measured for event data based on Perot2020, Supplementary B (https://arxiv.org/abs/2009.13436):
  • Skips first 0.5s of each sequence

  • Bounding boxes with diagonal size smaller than 60 pixels are ignored
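
The two filtering rules above can be sketched as a simple predicate. Everything named here is hypothetical: the field names `t`, `w`, and `h`, the assumption that timestamps are in microseconds measured from the start of the sequence, and the choice to keep boxes whose diagonal is exactly 60 pixels.

```python
import math
from typing import Dict, List

SKIP_US = 500_000       # first 0.5 s, assuming timestamps in microseconds
MIN_DIAGONAL_PX = 60.0  # boxes with a smaller diagonal are ignored


def filter_boxes_sketch(boxes: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Keep boxes that pass both rules; field names `t`, `w`, `h` are hypothetical."""
    return [
        b for b in boxes
        if b["t"] >= SKIP_US and math.hypot(b["w"], b["h"]) >= MIN_DIAGONAL_PX
    ]


boxes = [
    {"t": 100_000, "w": 80, "h": 80},  # dropped: inside the first 0.5 s
    {"t": 600_000, "w": 30, "h": 40},  # dropped: diagonal is 50 px
    {"t": 600_000, "w": 60, "h": 80},  # kept: diagonal is 100 px
]
kept = filter_boxes_sketch(boxes)
print(len(kept))  # 1
```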

__call__(model, preds, data)[source]

Accumulate predictions and ground truth detections over batches.

Parameters:
  • model – A NeuroBenchModel.

  • preds – A tensor of model predictions.

  • data – A tuple of data and labels.

Returns:

COCO mean average precision.

Return type:

float

__init__()[source]

Initialize the CocoMap metric.

Raises:

ImportError – If the metavision_ml and metavision_sdk_core packages are not installed on a supported platform.

compute()[source]

Compute COCO mAP using accumulated data.

Returns:

COCO mean average precision.

Return type:

float

reset()[source]

Reset metric state.

Clears all accumulated detections and reinitializes the CocoEvaluator.

class neurobench.metrics.workload.MSE[source]

Bases: WorkloadMetric

Mean squared error of the model predictions.
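
As a reminder of the quantity involved, a minimal plain-Python version of mean squared error is shown below. `mse_sketch` is a hypothetical helper operating on flat sequences, not the metric class, which works on tensors.

```python
from typing import Sequence


def mse_sketch(preds: Sequence[float], labels: Sequence[float]) -> float:
    """Mean of squared differences between predictions and labels."""
    if len(preds) != len(labels):
        raise ValueError("preds and labels must have the same length")
    return sum((p - y) ** 2 for p, y in zip(preds, labels)) / len(preds)


print(mse_sketch([1.0, 2.0, 4.0], [1.0, 3.0, 2.0]))  # (0 + 1 + 4) / 3
```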

__call__(model, preds: torch.Tensor, data: torch.Tensor) float[source]

Compute mean squared error.

Parameters:
  • model – A NeuroBenchModel.

  • preds – A tensor of model predictions.

  • data – A tuple of data and labels.

Returns:

Mean squared error.

Return type:

float

__init__()[source]

Initialize the MSE metric.

class neurobench.metrics.workload.MembraneUpdates[source]

Bases: AccumulatedMetric

Membrane potential updates metric.

This metric computes the number of membrane potential updates occurring during the forward pass of the model. The updates are tracked per neuron, per layer.
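
One way to picture the count is to compare each neuron's membrane potential before and after a step. The sketch below assumes that an "update" means any change in potential. The helper name and the data layout are hypothetical, and the library may define an update differently.

```python
from typing import List


def count_membrane_updates_sketch(before: List[float], after: List[float]) -> int:
    """Count neurons whose membrane potential changed in one step.

    Assumption: an "update" is any neuron whose potential differs before
    and after the step.
    """
    return sum(1 for b, a in zip(before, after) if a != b)


# One layer of four neurons over two timesteps for a single sample.
steps = [
    ([0.0, 0.0, 0.5, 0.2], [0.1, 0.0, 0.5, 0.0]),  # neurons 0 and 3 change
    ([0.1, 0.0, 0.5, 0.0], [0.3, 0.2, 0.5, 0.0]),  # neurons 0 and 1 change
]
n_samples = 1
total_updates = sum(count_membrane_updates_sketch(b, a) for b, a in steps)
print(total_updates / n_samples)  # 4.0
```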

__call__(model, preds, data)[source]

Accumulate the number of membrane updates for each model forward pass.

Parameters:
  • model – A NeuroBenchModel.

  • preds – A tensor of model predictions.

  • data – A tuple of data and labels.

Returns:

Number of membrane potential updates.

Return type:

float

__init__()[source]

Initialize the MembraneUpdates metric.

compute()[source]

Compute the total membrane updates normalized by the number of samples.

Returns:

The total number of updates to each neuron’s membrane potential within the model, aggregated across all neurons and normalized by the number of samples processed.

Return type:

float

reset()[source]

Reset the metric state for a new evaluation.

class neurobench.metrics.workload.R2[source]

Bases: AccumulatedMetric

R2 Score of the model predictions.

Currently implemented for 2D output only.
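
For reference, the R2 score of one output dimension is 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below assumes that the score for 2D output is the average of the two per-dimension scores. That averaging and the helper name are assumptions, not taken from the library.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def r2_sketch(preds: List[Point], labels: List[Point]) -> float:
    """Average of the per-dimension R2 scores for 2D outputs (an assumption)."""
    scores = []
    for dim in range(2):
        y = [lab[dim] for lab in labels]
        p = [pred[dim] for pred in preds]
        mean_y = sum(y) / len(y)
        ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, p))  # residual sum of squares
        ss_tot = sum((yi - mean_y) ** 2 for yi in y)          # total sum of squares
        scores.append(1.0 - ss_res / ss_tot)
    return sum(scores) / 2


labels = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0)]
print(r2_sketch(labels, labels))  # perfect predictions give 1.0
```

Because the label mean depends on every label, the score can only be finalized once all batches have been seen, which is why the metric accumulates state. The sketch divides by zero if the labels are constant along a dimension.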

__call__(model, preds, data)[source]
Parameters:
  • model – A NeuroBenchModel.

  • preds – A tensor of model predictions.

  • data – A tuple of data and labels.

Returns:

R2 Score.

Return type:

float

__init__()[source]

Initialize metric state.

Must hold memory of all labels seen so far.

compute()[source]

Compute the R2 score using accumulated data.

reset()[source]

Reset metric state.

class neurobench.metrics.workload.SMAPE[source]

Bases: WorkloadMetric

Symmetric mean absolute percentage error of the model predictions.
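
Several conventions for sMAPE exist. The sketch below uses a 0 to 200 scale, which is one common choice, and should not be read as a statement of what the library computes. The helper name and the handling of terms where both values are zero are assumptions.

```python
from typing import Sequence


def smape_sketch(preds: Sequence[float], labels: Sequence[float]) -> float:
    """sMAPE on a 0 to 200 scale: 200 * mean(|p - y| / (|p| + |y|)).

    Terms with |p| + |y| == 0 are counted as zero error in this sketch,
    which is an assumption.
    """
    terms = [
        abs(p - y) / (abs(p) + abs(y)) if (abs(p) + abs(y)) > 0 else 0.0
        for p, y in zip(preds, labels)
    ]
    return 200.0 * sum(terms) / len(terms)


print(smape_sketch([1.0, 3.0], [1.0, 1.0]))  # 200 * (0 + 0.5) / 2 = 50.0
```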

__call__(model, preds: torch.Tensor, data: torch.Tensor) float[source]

Compute symmetric mean absolute percentage error.

Parameters:
  • model – A NeuroBenchModel.

  • preds – A tensor of model predictions.

  • data – A tuple of data and labels.

Returns:

Symmetric mean absolute percentage error.

Return type:

float

__init__()[source]

Initialize the SMAPE metric.

class neurobench.metrics.workload.SynapticOperations[source]

Bases: AccumulatedMetric

Number of synaptic operations.

This metric computes the number of Multiply-Accumulate operations (MACs) for Artificial Neural Networks (ANN) and Accumulation operations (ACs) for Spiking Neural Networks (SNN).
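
The distinction between MACs and ACs can be illustrated for a single fully connected layer. This sketch rests on several assumptions that are not stated in the text: an operation counts as effective only when both the input and the weight are nonzero, an input made up only of 0s and 1s marks the operations as ACs, and the dense count covers every input-weight pair. `linear_synops_sketch` is a hypothetical helper.

```python
from typing import Dict, List


def linear_synops_sketch(weights: List[List[float]], inputs: List[float]) -> Dict[str, int]:
    """Count synaptic operations for one fully connected layer and one input.

    Assumptions: effective operations need a nonzero input and a nonzero
    weight; an all-binary input means the operations are accumulates (ACs);
    "Dense" counts every input-weight pair regardless of zeros.
    """
    dense = sum(len(row) for row in weights)
    effective = sum(
        1 for row in weights for w, x in zip(row, inputs) if w != 0 and x != 0
    )
    spiking = all(x in (0, 1) for x in inputs)
    return {
        "Effective_MACs": 0 if spiking else effective,
        "Effective_ACs": effective if spiking else 0,
        "Dense": dense,
    }


weights = [[0.5, 0.0, -1.0], [0.2, 0.3, 0.0]]  # 2 outputs x 3 inputs
spike_ops = linear_synops_sketch(weights, [1, 0, 1])         # spike input
real_ops = linear_synops_sketch(weights, [0.7, 0.0, 2.0])    # real-valued input
print(spike_ops)
print(real_ops)
```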

__call__(model, preds, data)[source]

Accumulate the Multiply-Accumulate (MAC) operations or Accumulation (AC) operations during the forward pass.

This method accumulates the operations based on the model’s connections, and differentiates between ANN (MACs) and SNN (ACs) operations based on the spiking activity.

Parameters:
  • model – A NeuroBenchModel.

  • preds – A tensor of model predictions.

  • data – A tuple of data and labels.

  • inputs – A tensor of model inputs.

Returns:

Multiply-accumulates.

Return type:

float

__init__()[source]

Initialize SynapticOperations metric.

compute()[source]

Compute the average number of operations per sample.

Returns:

A dictionary containing:

  • "Effective_MACs": The average MACs per sample.

  • "Effective_ACs": The average ACs per sample.

  • "Dense": The average total synaptic operations per sample.

Return type:

dict

reset()[source]

Reset the metric state for a new evaluation.

Clears all accumulated values for MAC, AC, synaptic operations, and the total number of samples.

Static Metrics

class neurobench.metrics.static.ConnectionSparsity[source]

Bases: StaticMetric

Sparsity of model connections between layers.

Based on the number of zeros in supported layers; other layers are not taken into account in the computation. Supported layers:

  • Linear

  • Conv1d, Conv2d, Conv3d

  • RNN, RNNBase, RNNCell

  • LSTM, LSTMBase, LSTMCell

  • GRU, GRUBase, GRUCell
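
The computation amounts to a ratio of zero weights to all weights. The sketch below uses a hypothetical mapping from layer names to flattened weights and assumes that only supported layers were placed in it. It is an illustration, not the library's code.

```python
from typing import Dict, List


def connection_sparsity_sketch(layer_weights: Dict[str, List[float]]) -> float:
    """Zero weights over all weights across the given layers, rounded to 3 decimals.

    `layer_weights` maps a hypothetical layer name to its flattened weights.
    """
    zeros = sum(1 for ws in layer_weights.values() for w in ws if w == 0)
    total = sum(len(ws) for ws in layer_weights.values())
    return round(zeros / total, 3) if total else 0.0


print(connection_sparsity_sketch({"fc1": [0.0, 0.4, 0.0], "fc2": [0.1, 0.0, 0.2]}))  # 0.5
```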

__call__(model)[source]

Compute connection sparsity.

Parameters:

model – A NeuroBenchModel.

Returns:

Connection sparsity, rounded to 3 decimals.

Return type:

float

class neurobench.metrics.static.Footprint[source]

Bases: StaticMetric

A metric that counts the memory footprint of a model.
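
A memory footprint of this kind is often obtained by multiplying element counts by element sizes. The sketch below assumes the result is in bytes and that both parameters and buffers are counted. Neither detail is stated here, and the helper and tensor names are hypothetical.

```python
from math import prod
from typing import Dict, Tuple

# Hypothetical description of a model: tensor name -> (shape, bytes per element).
TensorSpec = Tuple[Tuple[int, ...], int]


def footprint_bytes_sketch(params: Dict[str, TensorSpec], buffers: Dict[str, TensorSpec]) -> int:
    """Sum of element count times element size over parameters and buffers."""
    specs = list(params.values()) + list(buffers.values())
    return sum(prod(shape) * size for shape, size in specs)


params = {"fc.weight": ((10, 4), 4), "fc.bias": ((10,), 4)}  # 4-byte (float32) values
buffers = {"bn.running_mean": ((10,), 4)}
print(footprint_bytes_sketch(params, buffers))  # (40 + 10 + 10) * 4 = 240
```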

__call__(model)[source]

Count the memory footprint of a model.

Parameters:

model – A NeuroBenchModel.

Returns:

Memory footprint of the model.

Return type:

float

class neurobench.metrics.static.ParameterCount[source]

Bases: StaticMetric

A metric that counts the number of parameters in a model.

__call__(model)[source]

Count the number of parameters in a model.

Parameters:

model – A NeuroBenchModel.

Returns:

Number of parameters in the model.

Return type:

int