neurobench.preprocessing

MFCC

class neurobench.preprocessing.mfcc.MFCCPreProcessor(sample_rate: int = 16000, n_mfcc: int = 40, dct_type: int = 2, norm: str = 'ortho', log_mels: bool = False, melkwargs: dict | None = None, device=None)[source]

Bases: NeuroBenchPreProcessor

Computes MFCCs on a dataset using torchaudio.transforms.MFCC.

The call expects loaded .wav data and targets as a tuple (data, targets), and assumes the same sample_rate for all samples in data.

__call__(dataset)[source]

Executes the MFCC computation on the dataset.

Parameters:

dataset (tuple) – A tuple of (data, targets).

Returns:

results – A tuple (data, targets), with the MFCC transform applied to data and the targets passed through from the dataset unchanged.

Return type:

tuple

__init__(sample_rate: int = 16000, n_mfcc: int = 40, dct_type: int = 2, norm: str = 'ortho', log_mels: bool = False, melkwargs: dict | None = None, device=None)[source]

Initializes the pre-processor. The MFCC parameters mirror torchaudio.transforms.MFCC.

Parameters:
  • sample_rate (int, optional) – Sample rate of the audio signal. Defaults to 16000.

  • n_mfcc (int, optional) – Number of MFCC coefficients to retain. Defaults to 40.

  • dct_type (int, optional) – Type of DCT (discrete cosine transform) to use. Defaults to 2.

  • norm (str, optional) – Norm to use for the DCT. Defaults to 'ortho'.

  • log_mels (bool, optional) – Whether to use log-mel spectrograms instead of decibel-scaled ones. Defaults to False.

  • melkwargs (dict, optional) – Arguments for torchaudio.transforms.MelSpectrogram. Defaults to None.

  • device (torch.device, optional) – A torch.device used by PyTorch for the computation. Defaults to None.

static dataset_validity_check(dataset)[source]

Checks if dataset is a tuple with length two.
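Since the docstrings above only state the (data, targets) calling convention, it can be sketched with a minimal pure-Python stand-in. SketchPreProcessor and its doubling transform are hypothetical names for illustration, not part of NeuroBench; the real class applies torchaudio.transforms.MFCC to the data.

```python
# Minimal sketch of the NeuroBenchPreProcessor call convention used by
# MFCCPreProcessor. "SketchPreProcessor" and the doubling transform are
# hypothetical stand-ins; the real class applies torchaudio.transforms.MFCC.
class SketchPreProcessor:
    def __init__(self, transform):
        self.transform = transform

    @staticmethod
    def dataset_validity_check(dataset):
        # Mirrors the documented check: input must be a (data, targets) tuple.
        if not isinstance(dataset, tuple) or len(dataset) != 2:
            raise TypeError("Expected dataset to be a tuple of (data, targets)")

    def __call__(self, dataset):
        self.dataset_validity_check(dataset)
        data, targets = dataset
        # The transform touches only the data; targets pass through unchanged.
        return self.transform(data), targets


pre = SketchPreProcessor(lambda xs: [x * 2 for x in xs])
data, targets = pre(([1, 2, 3], ["yes", "no", "stop"]))
```

The key point of the convention is that targets are never modified: a pre-processor transforms only the first element of the tuple.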

Speech2Spikes

Speech2Spikes License Copyright © 2023 Accenture.

Speech2Spikes is made available under a proprietary license that permits using, copying, sharing, and making derivative works from Speech2Spikes and its source code for academics/non-commercial purposes only, as long as the above copyright notice and this permission notice are included in all copies of the software.

All distribution of Speech2Spikes in any form (source or executable), including any derivative works that you create or to which you contribute, must be under the terms of this license. You must inform recipients that any form of Speech2Spikes and its derivatives is governed by the terms of this license, and how they can obtain a copy of this license and a copy of the source code of Speech2Spikes. You may not attempt to alter or restrict the recipients’ rights in any form. If you are interested to use Speech2Spikes and/or develop derivatives for commercial purposes, licenses can be purchased from Accenture, please contact neuromorphic_inquiries@accenture.com for more information.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

You agree to indemnify and hold Accenture harmless from and against all liabilities, claims and suits and to pay all costs and expenses thereby incurred, including reasonable legal fees and court costs, arising out of, caused by or in any way connected with your use of Speech2Spikes.

The original code can be found at: https://github.com/Accenture/speech2spikes

class neurobench.preprocessing.speech2spikes.S2SPreProcessor(device=None, transpose=True, log_offset=1e-06)[source]

Bases: NeuroBenchPreProcessor

Manages the conversion from raw audio into spikes and stores the required conversion parameters.

__call__(batch)[source]

Converts raw audio data to spikes using the Speech2Spikes algorithm (https://doi.org/10.1145/3584954.3584995).

Parameters:

batch – A tuple of data and corresponding targets (data_tensor, targets)

Returns:

spikes – A PyTorch int8 tensor of spikes of shape (batch, timesteps, …). targets – A tensor of the corresponding targets.

Return type:

tuple of tensors

Todo

Add support for cumulative sum of features

__init__(device=None, transpose=True, log_offset=1e-06)[source]
Parameters:
  • device (torch.device, optional) – A torch.Device used by PyTorch for the computation. Defaults to None.

  • transpose (bool, optional) – Whether to transpose the input tensor before processing. If the input tensor is of shape (batch, channels, timesteps), this should be True. Defaults to True.

  • log_offset (float, optional) – A small value added to the MelSpectrogram output before the log is applied. Defaults to 1e-06.

configure(threshold=1, **spec_kwargs)[source]

Allows the user to configure parameters of the S2S class and the MelSpectrogram transform from torchaudio.

Go to (https://pytorch.org/audio/main/generated/torchaudio.transforms.MelSpectrogram.html) for more information on the available transform parameters.

Parameters:
  • threshold (float) – The difference between the residual and signal that will be considered an increase or decrease. Defaults to 1.

  • **spec_kwargs – Keyword arguments passed to torchaudio’s MelSpectrogram.
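As a rough sketch of this configuration pattern: ConfigSketch and its default kwargs below are assumptions for illustration only; in the real class the keyword arguments are forwarded to torchaudio.transforms.MelSpectrogram.

```python
class ConfigSketch:
    """Hypothetical stand-in illustrating how configure() can merge user
    overrides with existing MelSpectrogram settings."""

    def __init__(self):
        self.threshold = 1.0
        # Assumed defaults, for illustration only.
        self.spec_kwargs = {"sample_rate": 16000, "n_mels": 20}

    def configure(self, threshold=1, **spec_kwargs):
        self.threshold = threshold
        # Merge overrides; in S2SPreProcessor these keyword arguments would
        # be forwarded to torchaudio.transforms.MelSpectrogram.
        self.spec_kwargs.update(spec_kwargs)


cfg = ConfigSketch()
cfg.configure(threshold=2.0, n_mels=40, hop_length=160)
```

Settings not mentioned in the call (here sample_rate) keep their previous values, so configure() can be called repeatedly to tweak individual parameters.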

neurobench.preprocessing.speech2spikes.tensor_to_events(batch, threshold=1, device=None)[source]

Converts a batch of continuous signals to binary spikes via delta modulation (https://en.wikipedia.org/wiki/Delta_modulation).

Parameters:
  • batch (Tensor) – PyTorch tensor of shape (…, timesteps).

  • threshold (float) – The difference between the residual and signal that will be considered an increase or decrease. Defaults to 1.

  • device (torch.device, optional) – A torch.Device used by PyTorch for the computation. Defaults to None.

Returns:

A PyTorch int8 tensor of events of shape (…, timesteps).

Return type:

Tensor

Todo

Add support for using multiple channels for polarity instead of signs.
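The delta-modulation scheme described above can be sketched in plain NumPy. tensor_to_events_sketch is an illustrative re-implementation under stated assumptions, not the library function: it tracks a running residual per signal and emits a +1 or -1 event whenever the signal moves at least one threshold away from it.

```python
import numpy as np


def tensor_to_events_sketch(batch, threshold=1.0):
    """Illustrative delta-modulation encoder (not the library function).

    Tracks a per-signal residual; when the signal rises at least
    `threshold` above the residual it emits +1 and raises the residual
    by `threshold`, and symmetrically emits -1 when it falls.
    """
    # Expects shape (..., timesteps); a 1-D signal is treated as one channel.
    batch = np.atleast_2d(np.asarray(batch, dtype=np.float64))
    events = np.zeros(batch.shape, dtype=np.int8)
    residual = batch[..., 0].copy()
    for t in range(1, batch.shape[-1]):
        diff = batch[..., t] - residual
        up = diff >= threshold
        down = diff <= -threshold
        events[..., t][up] = 1
        events[..., t][down] = -1
        # Step the residual one threshold in the direction of each event.
        residual += threshold * (up.astype(np.float64) - down.astype(np.float64))
    return events
```

For example, tensor_to_events_sketch([[0., 2., 2., 0.]]) yields [[0, 1, 1, -1]]: the jump to 2 produces two +1 events because the residual catches up only one threshold per timestep, and the drop back produces a single -1.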