src.detect.detect_onsets module#

Automatically detects note and beat onsets in the source-separated tracks for each item in the corpus.

src.detect.detect_onsets.process_item(corpus_json_name: str, corpus_item: dict, generate_click: bool, item_queue) None#

Process one item from the corpus, used in parallel contexts (i.e. called with joblib.Parallel)

src.detect.detect_utils module#

Utility classes, functions, and variables used in the onset detection process.

class src.detect.detect_utils.ClickTrackMaker(audio: array, **kwargs)#

clicks_from_onsets(freq, onsets, **kwargs) array#

Renders detected onsets to a click sound with a given frequency

generate_audio(onsets_list: list[numpy.array]) array#

Renders detected onsets to a click sound and combines with the original audio.

Takes in a list of reference onset arrays, converts these to audible clicks, applies a bandpass filter (to make telling different onsets apart easier), filters the original audio to the frequencies considered when detecting onsets, then combines filtered original audio + click to a new audio track.


onsets_list (list[np.array]): a list containing arrays of detected onsets


np.array: the click audio track that can be rendered to a file using soundfile, Librosa, etc.
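
A minimal sketch of the click-rendering idea described above, not the ClickTrackMaker implementation: each onset time becomes a short decaying sine burst at a chosen frequency, which is then summed with the original signal. The names `render_clicks` and `mix`, the sample rate, the click length, and the gain are illustrative assumptions; plain Python lists stand in for NumPy arrays, and the bandpass filtering step is omitted.

```python
import math


def render_clicks(onsets, freq=1000.0, sr=8000, duration=2.0, click_len=0.02):
    """Render onset times (in seconds) as short decaying sine bursts."""
    n_samples = int(duration * sr)
    out = [0.0] * n_samples
    n_click = int(click_len * sr)
    for onset in onsets:
        start = int(round(onset * sr))
        for i in range(n_click):
            idx = start + i
            if idx >= n_samples:
                break  # the click would run past the end of the track
            # Linear fade-out so the burst does not end abruptly
            envelope = 1.0 - i / n_click
            out[idx] += envelope * math.sin(2 * math.pi * freq * i / sr)
    return out


def mix(audio, clicks, click_gain=0.5):
    """Sum the original audio with the click track, sample by sample."""
    return [a + click_gain * c for a, c in zip(audio, clicks)]


silence = [0.0] * 16000  # hypothetical 2-second silent "track" at 8 kHz
clicks = render_clicks([0.5, 1.0, 1.5])
combined = mix(silence, clicks)
```

Using a different `freq` per onset list is what would make several sets of onsets distinguishable by ear in the combined track.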

order = 20#
class src.detect.detect_utils.OnsetMaker(corpus_name: str = 'corpus_bill_evans', item: dict | None = None, **kwargs)#

Automatically detect onset and beat positions for each instrument in a single item in the corpus.

_get_channel_override_fpath(name: str, fpath: str) str#

Gets the filepath for an item, with any channel overrides specified.

For instance, if we wish to use only the left channel for the double bass (and have specified “bass”: “l” in the “channel_overrides” dictionary for this item in the corpus), this function will return the correct filepath pointing to the source-separated left channel file.


name (str): the name of the instrument
fpath (str): the default filepath for the item (i.e. stereo audio)


str: the overridden filepath if this is required and present locally, or the default (stereo) filepath if not

_load_audio(**kwargs) dict#

Loads audio as a time-series array for all instruments + the raw mix.

Wrapper around librosa.load, called when the class instance is constructed in order to load audio for all instruments in the required format. Keyword arguments are passed on to librosa.load.


**kwargs: passed to librosa.load


dict: each key-value pair corresponds to the loaded audio for one instrument, as an array


UserWarning: when a greater portion of a track than given in OnsetMaker.silence_threshold is silent

beat_track_rnn(starting_min: int = 100, starting_max: int = 300, use_nonoptimised_defaults: bool = False, audio_start: int = 0, audio_cutoff: int = 0, passes: int = 1, **kwargs) array#

Tracks the position of crotchet beats in the full mix of a track using recurrent neural networks.

Wrapper around RNNDownBeatProcessor and DBNDownBeatTrackingProcessor from madmom.features.downbeats that allows for per-instrument defaults and multiple passes. A ‘pass’ refers to taking the detected crotchets from one run of the network, cleaning the results, extracting features from the cleaned array (e.g. minimum and maximum tempi), then creating a new network using these features and repeating the estimation process. This narrows the range of tempo values that can be detected and substantially increases the accuracy of the detected crotchets over several passes.


starting_min (int, optional): the minimum possible tempo (in BPM) to use for the first pass, defaults to 100
starting_max (int, optional): the maximum possible tempo (in BPM) to use for the first pass, defaults to 300
use_nonoptimised_defaults (bool, optional): use default parameters over optimised, defaults to False
audio_start (int, optional): start reading audio from this point (in total seconds)
audio_cutoff (int, optional): stop reading audio after this point (in total seconds)
passes (int, optional): the number of passes of the processor to use, defaults to 1
**kwargs: passed to madmom.features.downbeats.DBNDownBeatTrackingProcessor


np.array: an array of detected crotchet beat positions from the final pass
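
The step between passes can be sketched roughly as follows. This is an illustration under assumptions, not the OnsetMaker code: `tempo_bounds_from_pass` is a hypothetical helper, the outlier rule (keeping inter-beat intervals within a tolerance of the median) is only one possible way of "cleaning" the results, and the beat tracker itself is not run.

```python
import statistics


def tempo_bounds_from_pass(beats, tolerance=0.2):
    """Derive (min_bpm, max_bpm) for the next pass from one pass of beat times.

    Inter-beat intervals further than `tolerance` (as a proportion) from the
    median interval are treated as tracking errors and discarded.
    """
    intervals = [b - a for a, b in zip(beats, beats[1:])]
    median = statistics.median(intervals)
    clean = [i for i in intervals if abs(i - median) <= tolerance * median]
    tempi = [60.0 / i for i in clean]
    return min(tempi), max(tempi)


# Hypothetical first-pass output: steady beats near 0.5 s plus one missed beat
first_pass = [0.0, 0.5, 1.01, 1.5, 2.5, 3.0, 3.49]
min_bpm, max_bpm = tempo_bounds_from_pass(first_pass)
# These narrower bounds would stand in for starting_min / starting_max next pass
```

Here the initial 100 to 300 BPM search range shrinks to roughly 118 to 122 BPM, which is the narrowing effect the description refers to.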

compare_onset_detection_accuracy(ref: array | None = None, fname: str | None = None, instr: str | None = None, onsets: list[numpy.array] | None = None, onsets_name: list[str] | None = None, audio_cutoff: int | None = None, window: float | None = None, **kwargs) dict#

Evaluates onset detection algorithm against reference onsets.

For every onset detected by an algorithm, attempt to match it to the nearest onset in a reference set (usually obtained from manual annotation). Then construct a summary dictionary containing statistics relating to the precision, recall, and accuracy of the detection. For more information on the evaluation procedure, see mir_eval.onset.f_measure.

At least one of ref or fname must be passed: ref must be an array of onset times, in seconds; fname must be a path to a text file containing onset times, with one onset per line. If both ref and fname are passed (don’t do this), ref will take priority.


ref (np.array): an array of reference onsets (in seconds) to use for evaluation
fname (str): the file path to a reference set of onsets, one onset per line
instr (str): the name of an instrument or track
onsets (list[np.array]): a list of arrays, each array should be the results from one algorithm
onsets_name (list[str]): a list of names that should match with the algorithm results in onsets
window (float): the size of the window used for matching each onset to a reference
audio_cutoff (int, optional): stop reading audio after this point (in total seconds)
**kwargs: additional key-value pairs passed to the returned summary dictionary


dict: each dictionary contains summary statistics for one evaluation
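
The window-based evaluation can be illustrated with a simplified sketch. It is not the method's implementation: mir_eval.onset.f_measure finds an optimal one-to-one matching, whereas the hypothetical `f_measure` below matches greedily, which can give slightly different counts when onsets are densely packed.

```python
def f_measure(reference, estimated, window=0.05):
    """Greedy precision/recall/F-score for onset times in seconds.

    Each estimated onset can be matched to at most one reference onset,
    and only if it lies within +/- `window` seconds of it.
    """
    if not reference or not estimated:
        return {"precision": 0.0, "recall": 0.0, "f_score": 0.0}
    unmatched = sorted(estimated)
    hits = 0
    for ref in sorted(reference):
        # Nearest still-unmatched estimate to this reference onset
        nearest = min(unmatched, key=lambda est: abs(est - ref), default=None)
        if nearest is not None and abs(nearest - ref) <= window:
            hits += 1
            unmatched.remove(nearest)
    precision = hits / len(estimated)
    recall = hits / len(reference)
    f_score = 0.0 if hits == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f_score": f_score}


ref = [0.50, 1.00, 1.50, 2.00]   # hypothetical manual annotations
est = [0.52, 1.20, 1.49]         # hypothetical detections
summary = f_measure(ref, est, window=0.05)
```

Two of the three detections fall within 50 ms of a reference onset, so precision is 2/3, recall is 2/4, and the F-score is 4/7.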

detection_note_values = {'left': 0.03125, 'right': 0.0625}#
static extract_downbeats(beat_timestamps: array, beat_positions: array) tuple[numpy.array, numpy.array]#

Takes in arrays of beat onsets and bar positions and returns the downbeats of each bar

finalize_output() None#

Finalizes the output by cleaning up leftover files and setting any final attributes

generate_click_track(instr: str, *args) None#

Renders detected onsets to a click sound and outputs, combined with the original audio.


instr (str): the name of the instrument to render audio from
*args (np.array): arrays of detected onsets to render to audio



generate_matched_onsets_dictionary(beats: array, onsets_list: list[numpy.array] | None = None, instrs_list: list | None = None, **kwargs) dict#

Matches onsets from multiple instruments with crotchet beat positions and returns a dictionary.

Wrapper function for OnsetMaker.match_onsets_and_beats. onsets_list should be a list of arrays corresponding to onset positions tracked from multiple source-separated instruments. These will then be sent individually to OnsetMaker.match_onsets_and_beats and matched with the provided beats array, then returned as the values in a dictionary, where the keys are identifiers passed in instrs_list (or numerical values, if this iterable is not passed). Any **kwargs will be passed to OnsetMaker.match_onsets_and_beats.

>>> om = OnsetMaker()
>>> bea = np.array([0, 0.5, 1.0, 1.5])
>>> ons = [
...     np.array([0.1, 0.6, 1.25, 1.55]),
...     np.array([0.05, 0.45, 0.95, 1.45]),
... ]
>>> instrs = ['instr1', 'instr2']
>>> print(om.generate_matched_onsets_dictionary(
...     beats=bea, onsets_list=ons, instrs_list=instrs, use_hard_threshold=True, threshold=0.1)
... )
{
    'beats': array([0. , 0.5, 1. , 1.5]),
    'instr1': array([0.1 , 0.6 ,  nan, 1.55]),
    'instr2': array([0.05, 0.45, 0.95, 1.45])
}

beats (np.array): iterable containing crotchet beat positions, typically tracked from the full mix
onsets_list (list[np.array]): iterable containing arrays of onset positions
instrs_list (list[str]): iterable containing names of instruments
**kwargs: arbitrary keyword arguments, passed to OnsetMaker.match_onsets_and_beats


dict: keys are instrument names, values are matched arrays


AttributeError: if neither onsets_list nor instrs_list is passed

static get_nonsilent_sections(aud: array, thresh: float = 1, **kwargs) array#

Returns the sections of a track which are not silent.

Wrapper function for librosa.effects.split that returns slices of a given audio track that are not silent. Slices are only considered not silent if their duration is above a reference threshold, given in seconds: this is to prevent the parsing of many small slices of audio.


aud (np.array): array of audio, read in during construction of the OnsetMaker class
thresh (float): value in seconds used when parsing slices
**kwargs: arbitrary keyword arguments, passed to librosa.effects.split


np.array: rows corresponding to sections of non-silent audio
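
The duration filter can be sketched as below. librosa.effects.split returns non-silent intervals as (start, end) pairs in samples; the conversion to seconds and the helper name `keep_long_sections` are assumptions made for illustration, and the intervals are written out by hand rather than computed from audio.

```python
def keep_long_sections(intervals, sr, thresh=1.0):
    """Keep (start, end) sample intervals lasting longer than `thresh` seconds,
    returning them as (start, end) pairs in seconds."""
    kept = []
    for start, end in intervals:
        duration = (end - start) / sr
        if duration > thresh:
            kept.append((start / sr, end / sr))
    return kept


# Hypothetical non-silent intervals in samples, at a 44.1 kHz sample rate
intervals = [(0, 220500), (230000, 240000), (441000, 661500)]
sections = keep_long_sections(intervals, sr=44100, thresh=1.0)
```

The middle interval lasts only about 0.23 seconds, so it is dropped and two five-second sections remain.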

static get_signal_to_noise_ratio(audio: array, axis: int = 0, ddof: int = 0) float#
get_silent_track_percent(aud: array | None = None, silent: array | None = None, **kwargs) float#

Returns the fraction of a track which is silent.


aud (np.array): array of audio, read in during construction of the OnsetMaker class
silent (np.array): array of non-silent audio slices, returned from OnsetMaker.get_nonsilent_sections
**kwargs: arbitrary keyword arguments, passed to OnsetMaker.get_nonsilent_sections


float: the fraction of a track which is silent, e.g. 1 == a completely silent track


AttributeError: if neither aud nor silent is passed
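
As a rough sketch of the calculation, assuming the non-silent slices are (start, end) pairs in seconds and that the total track duration is known: the silent fraction is whatever the slices do not cover. The helper `silent_fraction` is hypothetical and takes the duration directly rather than an audio array.

```python
def silent_fraction(non_silent, total_duration):
    """Fraction of a track (0 to 1) not covered by any non-silent section."""
    sounding = sum(end - start for start, end in non_silent)
    return 1.0 - sounding / total_duration


non_silent = [(0.0, 5.0), (10.0, 15.0)]  # seconds
fraction = silent_fraction(non_silent, total_duration=20.0)
# Compare against the silence_threshold value listed for OnsetMaker (1/3)
exceeds_threshold = fraction > 1 / 3
```

Half of this 20-second example is silent, which is above the listed silence_threshold and is the situation in which a UserWarning is documented for _load_audio.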

static get_spectral_flatness(audio: array) array#
match_onsets_and_beats(beats: array, onsets: array | None = None, instr: str | None = None, use_hard_threshold: bool = False, detection_note_values: dict | None = None) array#

Matches event onsets with crotchet beat locations.

For every beat in the iterable beats, find the closest onset in the iterable onsets within a given window. If no onset can be found within this window, set the matched onset to NaN. The window can either be a hard, fixed value (by setting use_hard_threshold), or flexible and dependent on a particular rhythmic value within the underlying tempo (set using the detection_note_values class attribute). The latter option is recommended and used as the default, given that hard thresholds for matching onsets at one tempo may not be appropriate for other tempi.

>>> om = OnsetMaker()
>>> bea = np.array([0, 0.5, 1.0, 1.5])
>>> ons = np.array([0.1, 0.6, 1.25, 1.55])
>>> print(om.match_onsets_and_beats(beats=bea, onsets=ons, use_hard_threshold=True, threshold=0.1))
np.array([0.1 0.6 nan 1.55])
>>> om = OnsetMaker()
>>> om.tempo = 160
>>> bea = np.array([0, 0.5, 1.0, 1.5])
>>> ons = np.array([0.1, 0.6, 1.25, 1.55])
>>> print(om.match_onsets_and_beats(beats=bea, onsets=ons, use_hard_threshold=False))
np.array([nan nan nan 1.55])

beats (np.ndarray): iterable containing crotchet beat positions, typically tracked from the full mix
onsets (np.ndarray): iterable containing onset positions, typically tracked from a source separated file
instr (str): the name of an instrument, to be used if onsets is not provided
use_hard_threshold (bool): whether to use a hard or tempo-dependent (default) threshold for matching onsets
detection_note_values (dict): dictionary of note values to use either side of crotchet beat, e.g. 1/32, 1/8


np.array: the matched onset array, with shape == len(beats)


AttributeError: if neither onsets nor instr is provided
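
The tempo-dependent window can be sketched as follows. This is an interpretation rather than the library code: it assumes the note values are fractions of a whole note (four crotchets), with the 'left' value applied before the beat and the 'right' value after it, which reproduces the second example above at 160 BPM. The function name `match_with_tempo_window` is hypothetical, and this simple version does not stop one onset from being matched to two beats.

```python
import math


def match_with_tempo_window(beats, onsets, tempo, left=1 / 32, right=1 / 16):
    """For each beat, return the nearest onset inside a tempo-scaled window,
    or NaN when no onset falls inside it."""
    whole_note = 4 * 60.0 / tempo  # seconds, assuming 4 crotchets per whole note
    before, after = left * whole_note, right * whole_note
    matched = []
    for beat in beats:
        candidates = [o for o in onsets if -before <= o - beat <= after]
        if candidates:
            matched.append(min(candidates, key=lambda o: abs(o - beat)))
        else:
            matched.append(math.nan)
    return matched


beats = [0, 0.5, 1.0, 1.5]
onsets = [0.1, 0.6, 1.25, 1.55]
result = match_with_tempo_window(beats, onsets, tempo=160)
```

At 160 BPM a whole note lasts 1.5 seconds, so the window runs from about 47 ms before each beat to about 94 ms after it. Only the onset at 1.55 falls inside its window, and the other three beats are matched to NaN.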

metre_from_annotated_downbeat(timestamps_arr: array) array#

Constructs an array of metre positions from a known downbeat and time signature

onset_detect(instr: str, aud: array | None = None, env: array | None = None, units: str = 'time', use_nonoptimised_defaults: bool = False, **kwargs) array#

Detects onsets in an audio signal.

Wrapper around librosa.onset.onset_detect that enables per-instrument defaults to be used. Arguments passed as kwargs should be accepted by librosa.onset.onset_detect, except for rms: set this to True to use a custom energy function when backtracking detected onsets to local minima. Other keyword arguments overwrite current per-instrument defaults.


instr (str): the name of the instrument to detect onsets in
aud (np.array, optional): an audio time-series to detect onsets in
env (np.array, optional): the envelope to use when detecting onsets
units (str, optional): the units to return detected onsets in, defaults to ‘time’
use_nonoptimised_defaults (bool, optional): whether to use default parameters, defaults to False
**kwargs: additional keyword arguments passed to librosa.onset.onset_detect


np.array: the position of detected onsets
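
The override behaviour described above (keyword arguments replacing per-instrument defaults) amounts to a dictionary merge. The sketch below assumes that pattern; the default values in `OPTIMISED_DEFAULTS` and the helper `resolve_params` are invented for illustration, although `wait` and `delta` are genuine librosa.onset.onset_detect parameters.

```python
# Hypothetical optimised defaults; the real values are not shown in this reference
OPTIMISED_DEFAULTS = {
    "piano": {"wait": 3, "delta": 0.1},
    "bass": {"wait": 5, "delta": 0.2},
}


def resolve_params(instr, use_nonoptimised_defaults=False, **kwargs):
    """Merge per-instrument defaults with caller overrides (overrides win)."""
    if use_nonoptimised_defaults:
        base = {}  # fall back to the wrapped function's own defaults
    else:
        base = dict(OPTIMISED_DEFAULTS.get(instr, {}))
    base.update(kwargs)
    return base


params = resolve_params("piano", delta=0.05)
plain = resolve_params("piano", use_nonoptimised_defaults=True, delta=0.05)
```

In the first call `delta` is overridden while `wait` keeps its per-instrument value; in the second only the explicitly passed argument survives.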

onset_strength(instr: str, aud: array | None = None, use_nonoptimised_defaults: bool = False, **kwargs) array#

Generates an onset strength envelope for a given instrument

Wrapper around librosa.onset.onset_strength that allows for the use of per-instrument defaults. Any **kwargs should be accepted by this function, and can be passed to override optimised per-instrument defaults.


instr (str): the name of the instrument to generate an onset strength envelope for
aud (np.array, optional): an audio time-series array to generate the envelope for
use_nonoptimised_defaults (bool, optional): whether to use default parameters, defaults to False
**kwargs: any additional keyword arguments must be accepted by librosa.onset.onset_strength


np.array: the onset strength envelope as an array

process_mixed_audio(generate_click: bool) None#

Process the raw audio mix, i.e. with all tracks together.

This is the central function for running processing on the mixed audio. It will generate an onset envelope, detect crotchets within it using both predominant local pulse estimation and recurrent neural networks, compare the detections to a reference file (if this exists), and generate a click track (if this is required). This function should be called before OnsetMaker.process_separated_audio, to ensure that the crotchet beat positions are present before matching these to onsets detected in the source-separated tracks.


generate_click (bool): whether to generate an audio click track

process_separated_audio(generate_click: bool, remove_silence: bool = True) None#

Process the separated audio for all of our individual instruments (piano, bass, drums)

This is the central function for running processing on each source-separated audio file. It will generate an onset envelope, detect onsets within it, remove onsets from when the track was silent, compare the detections to a reference file (if this exists), generate a click track (if this is required), and match the detected onsets to the nearest crotchet beat. This function must be called AFTER OnsetMaker.process_mixed_audio, to ensure that the crotchet beat positions have been detected correctly in the raw audio mix.


generate_click (bool): whether to generate an audio click track
remove_silence (bool): whether to remove onsets from portions of a track deemed to be silent by librosa

remove_onsets_in_silent_passages(onsets: array, instr: str | None = None, silent: array | None = None, **kwargs) array#

Removes onsets if they occurred during a portion of a track which was silent.

For a given array of event onsets and a given array of non-silent audio slice timestamps, returns only those onsets which occurred during a portion of an audio track deemed not to be silent. This prevents any spurious onsets detected by Librosa from being included in an analysis.

>>> om = OnsetMaker()
>>> non_silent = np.array(
...     [
...         [0, 5],
...         [10, 15]
...     ]
... )
>>> ons_ = np.array([0.1, 0.6, 5.5, 12.5, 17.5])
>>> print(om.remove_onsets_in_silent_passages(onsets=ons_, silent=non_silent))
array([0.1, 0.6, 12.5])

onsets (np.array): an array of event onsets
instr (str): the name of an instrument
silent (np.array): an array of non-silent audio slices, returned from OnsetMaker.get_nonsilent_sections
**kwargs: arbitrary keyword arguments, passed to OnsetMaker.get_nonsilent_sections


np.array: an array of onset timestamps with those occurring during a silent slice removed
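
The filtering logic behind the example above might look like the following dependency-free sketch. The helper `keep_onsets_in_sections` is hypothetical, and treating section boundaries as inclusive is an assumption.

```python
def keep_onsets_in_sections(onsets, non_silent):
    """Return only the onsets falling inside at least one (start, end) section."""
    return [
        onset for onset in onsets
        if any(start <= onset <= end for start, end in non_silent)
    ]


non_silent = [(0, 5), (10, 15)]
onsets = [0.1, 0.6, 5.5, 12.5, 17.5]
kept = keep_onsets_in_sections(onsets, non_silent)
```

The onsets at 5.5 and 17.5 lie outside both non-silent sections and are removed, matching the documented output.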


AttributeError: if neither silent nor instr is passed

return_converged_paramaters() tuple[dict, dict]#
silence_threshold = 0.3333333333333333#
top_db = {'bass': 30, 'drums': 60, 'piano': 40}#
window = 0.05#
src.detect.detect_utils.bandpass_filter(audio: array, lowcut: int, highcut: int, order: int = 30, pad_len: float = 1.0, fade_dur: float = 0.5) array#

Applies a bandpass filter with given low and high cut frequencies to an audio signal.


audio (np.array): the audio array to filter
lowcut (int): the lower frequency to filter
highcut (int): the higher frequency to filter
order (int): the sharpness of the filter, defaults to 30
pad_len (float): the number of seconds to pad the audio by, defaults to 1
fade_dur (float): the length of time to fade the audio in and out by


np.array: the filtered audio array

src.detect.detect_utils.calculate_tempo(pass_: ndarray) float#

Extract the average tempo from an array of times corresponding to crotchet beat positions
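
One common way to compute such an average, shown here as an assumption rather than as this function's exact formula, is to take the mean inter-beat interval and convert it to beats per minute. The helper name `average_tempo` is hypothetical.

```python
from statistics import mean


def average_tempo(beats):
    """Average tempo in BPM from crotchet beat times given in seconds."""
    intervals = [b - a for a, b in zip(beats, beats[1:])]
    return 60.0 / mean(intervals)


tempo = average_tempo([0.0, 0.5, 1.0, 1.5, 2.0])
```

Beats spaced half a second apart correspond to 120 BPM. Averaging the per-interval tempi instead of the intervals would give a slightly different figure when the spacing is uneven.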

src.detect.detect_utils.create_silent_clicktrack(csvpath: str, outputdir: str = 'c:\\python projects\\jazz-corpus-analysis/beats.wav', cutoff: int | None = None) None#

Creates a click track containing only clicks from csvpath, no source audio

src.detect.optimise_detection_parameters module#

Optimises the parameters used in onset detection by running a large-scale search using ground truth files

class src.detect.optimise_detection_parameters.OptimizeBeatTrack(json_name: str, items: dict, **kwargs)#

Bases: Optimizer

Optimizes the OnsetMaker.beat_track_rnn function

analyze_track(item: dict, **kwargs) dict#

Detect beats in one track using a given combination of parameters.

args = [('threshold', <class 'float'>, 0, 1, 0.05), ('transition_lambda', <class 'float'>, 0, 500, 5), ('passes', <class 'int'>, 1, 5, 2)]#
audio_cutoff = 60#
csv_name: str = ''#
static enable_logger() Logger#
get_f_score(onsetmaker) float#

Returns F-score between detected onsets and manual annotation file

instr = 'mix'#
joblib_backend = 'threading'#
log_iteration(cached_ids: list, f_scores: list) None#

Log the results from a single iteration

lookup_results_from_cache(params: dict) tuple[list, list]#

Returns lists of IDs and F-scores for tracks that have already been processed with this set of parameters

n_jobs = 1#
objective_function(x: ndarray, _) float#

Objective function for maximising F-score of detected onsets

return_kwargs(x: ndarray) dict#

Formats arguments from NLopt into the required keyword argument format
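
A sketch of what this formatting step could involve, assuming each tuple in args reads as (name, type, lower bound, upper bound, initial value); that reading, the clipping, and the helper name `to_kwargs` are not confirmed by this reference. An optimizer such as NLopt works on a flat vector of floats, so each entry has to be named and cast back to its intended type.

```python
ARGS = [
    ("threshold", float, 0, 1, 0.05),
    ("transition_lambda", float, 0, 500, 5),
    ("passes", int, 1, 5, 2),
]


def to_kwargs(x, args=ARGS):
    """Map a flat vector of floats onto named, correctly typed keyword arguments."""
    kwargs = {}
    for value, (name, typ, lower, upper, _initial) in zip(x, args):
        clipped = min(max(value, lower), upper)  # keep within the listed bounds
        # Integer parameters are rounded rather than truncated
        kwargs[name] = int(round(clipped)) if typ is int else typ(clipped)
    return kwargs


kwargs = to_kwargs([0.25, 120.0, 2.6])
```

The resulting dictionary could then be unpacked into a call such as beat_track_rnn(**kwargs), with `passes` rounded to a whole number.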

run_optimization() tuple[dict, float]#

Runs optimization in NLopt

class src.detect.optimise_detection_parameters.OptimizeOnsetDetect(json_name: str, items: dict, instr: str, **kwargs)#

Bases: Optimizer

Optimizes the OnsetMaker.onset_detect and OnsetMaker.onset_strength functions for an instrument

analyze_track(item: dict, **kwargs) dict#

Detect onsets in one track using a given combination of parameters.

args = [('max_size', <class 'int'>, 1, 200, 5), ('wait', <class 'int'>, 0, 100, 5), ('delta', <class 'float'>, 0, 1, 0.05), ('pre_max', <class 'int'>, 0, 100, 5), ('post_max', <class 'int'>, 1, 100, 5), ('pre_avg', <class 'int'>, 0, 100, 5), ('post_avg', <class 'int'>, 1, 100, 5)]#
audio_cutoff = None#
csv_name: str = ''#
static enable_logger() Logger#
get_f_score(onsetmaker) float#

Returns F-score between detected onsets and manual annotation file

joblib_backend = 'threading'#
log_iteration(cached_ids: list, f_scores: list) None#

Log the results from a single iteration

lookup_results_from_cache(params: dict) tuple[list, list]#

Returns lists of IDs and F-scores for tracks that have already been processed with this set of parameters

n_jobs = -1#
objective_function(x: ndarray, _) float#

Objective function for maximising F-score of detected onsets

return_kwargs(x: ndarray) dict#

Formats arguments from NLopt into the required keyword argument format

run_optimization() tuple[dict, float]#

Runs optimization in NLopt

class src.detect.optimise_detection_parameters.Optimizer(json_name: str, items: list[dict], instr: str, args: list[tuple], **kwargs)#

Base class for non-linear optimization of parameters

analyze_track(item: dict, **kwargs) dict#

Placeholder for analysis function, run in parallel; overridden in child classes

audio_cutoff = None#
csv_name = ''#
static enable_logger() Logger#
get_f_score(onsetmaker) float#

Returns F-score between detected onsets and manual annotation file

joblib_backend = 'threading'#
log_iteration(cached_ids: list, f_scores: list) None#

Log the results from a single iteration; overridden in child classes

lookup_results_from_cache(params: dict) tuple[list, list]#

Returns lists of IDs and F-scores for tracks that have already been processed with this set of parameters

n_jobs = -1#
objective_function(x: ndarray, _) float#

Objective function for maximising F-score of detected onsets

return_kwargs(x: ndarray) dict#

Formats arguments from NLopt into the required keyword argument format

run_optimization() tuple[dict, float]#

Runs optimization in NLopt

src.detect.optimise_detection_parameters.optimize_beat_tracking(json_name: str, tracks: list[dict], **kwargs) None#

Central function for optimizing beat tracking across all reference tracks


json_name (str): the name of the corpus we’re using
tracks (list[dict]): the metadata for the tracks to be used in optimization
**kwargs: passed onto optimization class

src.detect.optimise_detection_parameters.optimize_onset_detection(json_name: str, tracks: list[dict], **kwargs) None#

Central function for optimizing onset detection across all reference tracks and instrument stems


json_name (str): the name of the corpus we’re using
tracks (list[dict]): the metadata for the tracks to be used in optimization
**kwargs: passed onto optimization class

