neurobench.benchmarks

Benchmark

class neurobench.benchmarks.benchmark.Benchmark(model, dataloader, preprocessors, postprocessors, metric_list)[source]

Bases: object

Top-level benchmark class for running benchmarks.

run(quiet=False, verbose: bool = False, dataloader=None, preprocessors=None, postprocessors=None, device=None)[source]

Runs batched evaluation of the benchmark.

Parameters:

dataloader (optional) – override DataLoader for this run.
preprocessors (optional) – override preprocessors for this run.
postprocessors (optional) – override postprocessors for this run.
quiet (bool, default=False) – If True, output is suppressed.
verbose (bool, default=False) – If True, metrics for each batch are printed. If False (default), metrics are accumulated and printed after all batches are processed.
device (optional) – device to use for this run (e.g. ‘cuda’ or ‘cpu’).

Returns:

A dictionary of results.

Return type:

dict
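The batched evaluation described above can be sketched in plain Python. This is an illustrative sketch, not the library's implementation: `run_batched`, the toy model, the callable signatures for pre- and postprocessors, and the choice to weight each batch score by its batch size are all assumptions made for the example.

```python
def run_batched(model, batches, preprocessors, postprocessors, metrics, verbose=False):
    """Hypothetical sketch of a batched evaluation loop (not the library code)."""
    totals = {name: 0.0 for name in metrics}
    n_samples = 0
    for inputs, labels in batches:
        # Preprocessors transform the (inputs, labels) pair before inference.
        for pre in preprocessors:
            inputs, labels = pre((inputs, labels))
        preds = model(inputs)
        # Postprocessors transform the raw model output.
        for post in postprocessors:
            preds = post(preds)
        batch_size = len(inputs)
        for name, metric in metrics.items():
            score = metric(model, preds, (inputs, labels))
            if verbose:
                print(f"batch {name}: {score}")
            # Assumption: each batch score is weighted by its batch size.
            totals[name] += score * batch_size
        n_samples += batch_size
    return {name: total / n_samples for name, total in totals.items()}


def accuracy(model, preds, data):
    # Same (model, preds, data) signature as the workload metrics below.
    labels = data[1]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Toy model: predicts 1 for positive inputs, else 0.
toy_model = lambda xs: [1 if x > 0 else 0 for x in xs]
toy_batches = [([1, -2, 3], [1, 0, 0]), ([4], [1])]
results = run_batched(toy_model, toy_batches, [], [], {"accuracy": accuracy})
```

With size weighting, the result equals the accuracy over all four samples rather than the mean of the two per-batch accuracies.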

Workload Metrics

class neurobench.benchmarks.workload_metrics.AccumulatedMetric[source]

Bases: object

Abstract class for a metric which must save state between batches.

compute()[source]

Compute the metric score using all accumulated data.

Returns: the final accumulated metric.
Return type: result

reset()[source]

Reset the metric state.

This is called when the benchmark is run again, e.g. on the FSCIL task the benchmark is run at the end of each session.
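A minimal sketch of the save-state-between-batches pattern follows. `RunningMeanAbsError` is a hypothetical class written for illustration and does not subclass the library's `AccumulatedMetric`. The per-batch `__call__(model, preds, data)` hook is an assumption modeled on the signature of the function metrics in this module.

```python
class RunningMeanAbsError:
    """Hypothetical stateful metric following the compute()/reset() pattern."""

    def __init__(self):
        self.reset()

    def __call__(self, model, preds, data):
        # Accumulate state from one batch; data is an (inputs, labels) tuple.
        labels = data[1]
        self.abs_error_sum += sum(abs(p - y) for p, y in zip(preds, labels))
        self.count += len(labels)
        return self.compute()

    def compute(self):
        # Final score over everything seen since the last reset().
        return self.abs_error_sum / self.count if self.count else 0.0

    def reset(self):
        # Clear all accumulated state so the metric can be reused.
        self.abs_error_sum = 0.0
        self.count = 0


metric = RunningMeanAbsError()
metric(None, [1.0, 2.0], ([0, 0], [1.0, 4.0]))  # absolute errors 0 and 2
metric(None, [3.0], ([0], [2.0]))               # absolute error 1
score = metric.compute()                        # mean over all three samples
metric.reset()
```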

class neurobench.benchmarks.workload_metrics.COCO_mAP[source]

Bases: AccumulatedMetric

COCO mean average precision.

Measured for event data based on Perot2020, Supplementary B (https://arxiv.org/abs/2009.13436):

Skips the first 0.5 s of each sequence.
Ignores bounding boxes with a diagonal smaller than 60 pixels.

compute()[source]: Compute COCO mAP using accumulated data.

reset()[source]: Reset metric state.

neurobench.benchmarks.workload_metrics.MSE(model, preds, data)[source]

Mean squared error of the model predictions.

Parameters:

model – A NeuroBenchModel.
preds – A tensor of model predictions.
data – A tuple of data and labels.

Returns:

Mean squared error.

Return type:

float
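As a reference for the quantity being reported, mean squared error over flat lists can be written as below. This is a plain-Python sketch of the standard formula, not the library function, which operates on tensors. The helper name `mse` is chosen for the example.

```python
def mse(preds, labels):
    """Mean of squared differences between predictions and labels."""
    return sum((p - y) ** 2 for p, y in zip(preds, labels)) / len(labels)


# Squared errors are 0, 4 and 4, so the mean is 8/3.
error = mse([1.0, 2.0, 3.0], [1.0, 4.0, 5.0])
```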

neurobench.benchmarks.workload_metrics.activation_sparsity(model, preds, data)[source]

Sparsity of model activations.

Calculated as the number of zero activations divided by the total number of activations, over all layers, timesteps, and samples in data.

Parameters:

model – A NeuroBenchModel.
preds – A tensor of model predictions.
data – A tuple of data and labels.

Returns:

Activation sparsity.

Return type:

float
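The ratio described above can be illustrated on recorded activations. This is a sketch only: `zero_fraction` and the layout of one flat list per layer are assumptions for the example. How the library records activations is not shown here.

```python
def zero_fraction(layer_activations):
    """Zero activations divided by total activations, pooled over all layers."""
    zeros = sum(a == 0 for layer in layer_activations for a in layer)
    total = sum(len(layer) for layer in layer_activations)
    return zeros / total


# Layer 1 has 3 zeros out of 4, layer 2 has 4 zeros out of 6: 7 of 10 overall.
sparsity = zero_fraction([[0, 1, 0, 0], [1, 1, 0, 0, 0, 0]])
```

Pooling counts across layers is not the same as averaging per-layer sparsities: the layers here have sparsities 0.75 and about 0.667, while the pooled value is 0.7.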

neurobench.benchmarks.workload_metrics.classification_accuracy(model, preds, data)[source]

Classification accuracy of the model predictions.

Parameters:

model – A NeuroBenchModel.
preds – A tensor of model predictions.
data – A tuple of data and labels.

Returns:

Classification accuracy.

Return type:

float

neurobench.benchmarks.workload_metrics.detect_activations_connections(model)[source]: Register hooks or other operations that should be called before running a benchmark.

neurobench.benchmarks.workload_metrics.number_neuron_updates(model, preds, data)[source]

Number of times each neuron type is updated.

Parameters:

model – A NeuroBenchModel.
preds – A tensor of model predictions.
data – A tuple of data and labels.

Returns:

A dictionary mapping each neuron type to its number of updates.

Return type:

dict

class neurobench.benchmarks.workload_metrics.r2[source]

Bases: AccumulatedMetric

R2 Score of the model predictions.

Currently implemented for 2D output only.

compute()[source]: Compute r2 score using accumulated data.

reset()[source]: Reset metric state.

neurobench.benchmarks.workload_metrics.sMAPE(model, preds, data)[source]

Symmetric mean absolute percentage error of the model predictions.

Parameters:

model – A NeuroBenchModel.
preds – A tensor of model predictions.
data – A tuple of data and labels.

Returns:

Symmetric mean absolute percentage error.

Return type:

float
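One common definition of sMAPE is sketched below. The scale factor of 200, which gives a range of 0 to 200, is an assumption; other conventions scale to 0 to 100, and the convention the library uses should be checked in its source.

```python
def smape(preds, labels):
    """Symmetric MAPE in percent, using the 0-200 convention (assumption).

    Undefined when a prediction and its label are both zero.
    """
    ratios = [abs(p - y) / (abs(p) + abs(y)) for p, y in zip(preds, labels)]
    return 200.0 * sum(ratios) / len(ratios)


score = smape([110.0, 90.0], [100.0, 100.0])
```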

class neurobench.benchmarks.workload_metrics.synaptic_operations[source]

Bases: AccumulatedMetric

Number of synaptic operations.

Counts MACs (multiply-accumulates) for ANNs and ACs (accumulates) for SNNs.

compute()[source]

Compute the metric score using all accumulated data.

Returns: the final accumulated metric.
Return type: result

reset()[source]

Reset the metric state.

This is called when the benchmark is run again, e.g. on the FSCIL task the benchmark is run at the end of each session.
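The distinction between MACs and ACs can be illustrated for a single dense layer. With binary spike inputs, each active connection needs only an accumulate; with real-valued inputs it needs a multiply-accumulate. The sketch below is an assumption-laden illustration: `count_synaptic_ops`, the rule of counting only pairs where both the input and the weight are nonzero, and the binary-input test for choosing the label are not taken from the library.

```python
def count_synaptic_ops(inputs, weights):
    """Count operations with nonzero input and nonzero weight for one dense layer.

    inputs: list of input activations; weights[i][j] connects input i to output j.
    Returns (kind, count): "AC" if all inputs are 0/1 spikes, else "MAC".
    """
    ops = 0
    for x, row in zip(inputs, weights):
        if x != 0:
            # Only nonzero weights of an active input contribute an operation.
            ops += sum(w != 0 for w in row)
    kind = "AC" if all(x in (0, 1) for x in inputs) else "MAC"
    return kind, ops


weights = [[0.5, 0.0], [0.2, 0.3], [0.0, 0.0]]
snn_ops = count_synaptic_ops([1, 0, 1], weights)        # binary spikes
ann_ops = count_synaptic_ops([0.7, 0.1, 0.0], weights)  # real-valued inputs
```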

Static Metrics

neurobench.benchmarks.static_metrics.connection_sparsity(model)[source]

Sparsity of model connections between layers. Based on the number of zeros in supported layers; other layers are not taken into account in the computation.

Supported layers:

Linear
Conv1d, Conv2d, Conv3d
RNN, RNNBase, RNNCell
LSTM, LSTMBase, LSTMCell
GRU, GRUBase, GRUCell

Parameters: model – A NeuroBenchModel.
Returns: Connection sparsity, rounded to 3 decimals.
Return type: float
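The computation can be sketched on plain lists of weights. `weight_zero_fraction` and the flat-list layout are hypothetical. The example assumes the weights passed in already come from supported layers only, and it pools zeros across layers before rounding to 3 decimals.

```python
def weight_zero_fraction(layer_weights):
    """Zero weights divided by total weights, pooled over layers, rounded to 3 decimals."""
    zeros = sum(w == 0 for weights in layer_weights for w in weights)
    total = sum(len(weights) for weights in layer_weights)
    return round(zeros / total, 3)


# 4 zero weights out of 7 in total.
value = weight_zero_fraction([[0.0, 0.4, 0.0], [0.1, 0.0, 0.0, 0.2]])
```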

neurobench.benchmarks.static_metrics.footprint(model)[source]

Memory footprint of the model.

Parameters: model – A NeuroBenchModel.
Returns: Model size in bytes.
Return type: float

neurobench.benchmarks.static_metrics.parameter_count(model)[source]

Number of parameters in the model.

Parameters: model – A NeuroBenchModel.
Returns: Number of parameters.
Return type: int
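The difference between a parameter count and a memory footprint can be shown on a toy tensor inventory. Everything here is hypothetical: the record fields, the helper names, and in particular the assumption that the footprint counts non-parameter buffers as well as parameters.

```python
def count_parameters(tensors):
    """Total number of elements across tensors marked as parameters."""
    return sum(t["numel"] for t in tensors if t["is_parameter"])


def footprint_bytes(tensors):
    """Total bytes across all tensors, including buffers (assumption)."""
    return sum(t["numel"] * t["bytes_per_element"] for t in tensors)


tensors = [
    {"name": "fc.weight", "numel": 20, "bytes_per_element": 4, "is_parameter": True},
    {"name": "fc.bias", "numel": 2, "bytes_per_element": 4, "is_parameter": True},
    {"name": "bn.running_mean", "numel": 2, "bytes_per_element": 4, "is_parameter": False},
]
n_params = count_parameters(tensors)
n_bytes = footprint_bytes(tensors)
```

Under these assumptions the footprint depends on the data type of each tensor, so quantizing weights changes the footprint while leaving the parameter count unchanged.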