src.detect package#

Submodules#

src.detect.midi_utils module#

Utility classes, functions, and variables used when working with MIDI files.

class src.detect.midi_utils.Interval(firstnote: Note, secondnote: Note)#

Bases: object

Used to extract interval information from two Note objects occurring at different times

class src.detect.midi_utils.MIDIMaker(item: dict, **kwargs)#

Bases: object

Create MIDI for a single instrument (defaults to piano)

INSTR = 'piano'#
convert_to_midi() dict#

Convert processed audio into MIDI

finalize_output(dirpath: str | None = None, filename: str = 'piano_midi.mid') None#

Finalize output by saving processed MIDI into the correct directory

static pitch_correction(audio: array) array#

Pitch-shift given audio to A=440 Hz

preprocess_audio(filter_audio: bool = False, pitch_correction: bool = True) array#

Preprocess audio by filtering and/or applying pitch correction
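
As a rough usage sketch (not a documented workflow): the call order below is assumed from the method names, and the contents of the item metadata dictionary are not documented on this page.

>>> # Illustrative sketch: `item` is assumed to be one corpus metadata dictionary
>>> mm = MIDIMaker(item)
>>> audio = mm.preprocess_audio(filter_audio=False, pitch_correction=True)
>>> midi = mm.convert_to_midi()
>>> mm.finalize_output(filename='piano_midi.mid')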

class src.detect.midi_utils.MelodyMaker(midi_fpath: str, beats: array, downbeats: array, tempo: float, time_signature: int)#

Bases: object

Extracts the melody from a MIDI file using the skyline algorithm and provides functions for chunking it into measures

MIDDLE_C = 60#
SHORTEST_RHYTHM = 0.015625#
TIME_THRESH = 0.01#
static _extract_highest_note(notes: list[pretty_midi.containers.Note]) Generator[Note, None, None]#
static _quantize_notes_in_beat(beat1: float, beat2: float, notes: list[pretty_midi.containers.Note], num_ticks: int = 8) Generator[Note, None, None]#

Quantize notes within a beat to the nearest 64th note (default)

_remove_iois_below_threshold(notes: list[pretty_midi.containers.Note]) list[pretty_midi.containers.Note]#
_remove_pitches_below_threshold(notes: list[pretty_midi.containers.Note]) list[pretty_midi.containers.Note]#
chunk_melody(notes: list[src.detect.midi_utils.Note | src.detect.midi_utils.Interval] | None = None, chunk_measures: int = 4, overlapping_chunks: bool = True) list[tuple[src.detect.midi_utils.Note]]#

Chunks a melody into slices corresponding to a given number of measures; consecutive chunks can overlap

extract_intervals(melody_notes: list[src.detect.midi_utils.Note]) Generator[Interval, None, None]#

Extracts intervals from a sequence of melody notes

extract_melody()#

Applies the skyline algorithm to extract the melody from the loaded MIDI (see the usage sketch below)

load_midi(midi_fpath) Instrument#
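
The sketch below shows one plausible way of chaining the methods above; the beat and downbeat arrays are toy values, and the assumption that extract_melody yields Note objects is inferred from the other signatures rather than guaranteed.

>>> import numpy as np
>>> mm = MelodyMaker(
>>>     midi_fpath='piano_midi.mid',                 # hypothetical path to a transcribed piano MIDI file
>>>     beats=np.array([0.0, 0.5, 1.0, 1.5]),
>>>     downbeats=np.array([0.0]),
>>>     tempo=120.0,
>>>     time_signature=4,
>>> )
>>> melody = list(mm.extract_melody())               # skyline melody, assumed to be Note objects
>>> intervals = list(mm.extract_intervals(melody))   # consecutive Interval objects
>>> chunks = mm.chunk_melody(notes=melody, chunk_measures=4, overlapping_chunks=True)
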
class src.detect.midi_utils.Note(note)#

Bases: Note

Extends pretty_midi.Note with a few additional properties

property duration#
get_duration()#

Get the duration of the note in seconds.

src.detect.midi_utils.group_onsets(onsets: numpy.array, window: float = 0.05, keep_func: callable = <function amin>) array#

Group near-simultaneous onsets within a given window.

Parameters:

onsets (np.array): the array of onsets to group
window (float, optional): the window to use for grouping, defaults to 0.05 seconds
keep_func (callable, optional): the function used to select an onset to keep from the group, defaults to np.min

Returns:

np.array: the grouped array

Examples:
>>> x = np.array([0.01, 0.05, 0.06, 0.07, 0.96, 1.00, 1.05, 1.06, 1.06])
>>> group_onsets(x)
np.array([0.01, 0.07, 0.96, 1.05])
>>> x = np.array([0.01, 0.05, 0.06, 0.07, 0.96, 1.00, 1.05, 1.06, 1.06])
>>> group_onsets(x, keep_func=np.mean)
np.array([0.04 , 0.07 , 0.98 , 1.055])

src.detect.onset_utils module#

Utility classes, functions, and variables used in the onset detection process.

class src.detect.onset_utils.ClickTrackMaker(audio: array, **kwargs)#

Bases: object

clicks_from_onsets(freq, onsets, **kwargs) array#

Renders detected onsets to a click sound with a given frequency

generate_audio(onsets_list: list[numpy.array]) array#

Renders detected onsets to a click sound and combines with the original audio.

Takes in a list of reference onset arrays, converts these to audible clicks, applies a bandpass filter (to make it easier to tell the different onsets apart), filters the original audio to the frequencies considered when detecting onsets, then combines the filtered original audio and the clicks into a new audio track (a usage sketch follows at the end of this class).

Arguments:

onsets_list (list[np.array]): a list containing arrays of detected onsets

Returns:

np.array: the click audio track that can be rendered to a file using soundfile, Librosa, etc.

order = 20#
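
A minimal sketch of rendering clicks over an audio array and writing the result with soundfile; the 44100 Hz sample rate and the synthetic input audio are assumptions for illustration.

>>> import numpy as np
>>> import soundfile as sf
>>> audio = np.zeros(44100 * 10)                             # 10 seconds of stand-in (silent) audio
>>> ctm = ClickTrackMaker(audio)
>>> onsets_list = [np.array([0.5, 1.0, 1.5]), np.array([0.25, 0.75, 1.25])]
>>> click_track = ctm.generate_audio(onsets_list)
>>> sf.write('clicks.wav', click_track, 44100)               # render to disk, 44100 Hz assumed
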
class src.detect.onset_utils.OnsetMaker(item: dict | None = None, **kwargs)#

Bases: object

Automatically detect onset and beat positions for each instrument in a single item in the corpus.

_get_channel_override_fpath(name: str, fpath: str) str#

Gets the filepath for an item, with any channel overrides specified.

For instance, if we wish to use only the left channel for the double bass (and have specified "bass": "l" in the "channel_overrides" dictionary for this item in the corpus), this function will return the correct filepath pointing to the source-separated left channel file.

Arguments:

name (str): the name of the instrument
fpath (str): the default filepath for the item (i.e. stereo audio)

Returns:

str: the overridden filepath if this is required and present locally, or the default (stereo) filepath if not

_load_audio(**kwargs) dict#

Loads audio as a time-series array for all instruments + the raw mix.

Wrapper around librosa.load, called when the class instance is constructed in order to load audio for all instruments in the required format. Keyword arguments are passed on to librosa.load.

Arguments:

**kwargs: passed to librosa.load

Returns:

dict: each key-value pair corresponds to the loaded audio for one instrument, as an array

Raises:

UserWarning: when the proportion of a track that is silent exceeds OnsetMaker.silence_threshold

beat_track_rnn(starting_min: int = 100, starting_max: int = 300, use_nonoptimised_defaults: bool = False, audio_start: int = 0, audio_cutoff: int | None = None, passes: int = 1, **kwargs) array#

Tracks the position of crotchet beats in the full mix of a track using recurrent neural networks.

Wrapper around RNNDownBeatProcessor and DBNDownBeatTrackingProcessor from madmom.features.downbeat that allows for per-instrument defaults and multiple passes. A 'pass' refers to taking the detected crotchets from one run of the network, cleaning the results, extracting features from the cleaned array (e.g. minimum and maximum tempi), then creating a new network using these features and repeating the estimation process. This narrows down the range of tempo values that can be detected and substantially increases the accuracy of the detected crotchets over several passes.

Arguments:

starting_min (int, optional): the minimum possible tempo (in BPM) to use for the first pass, defaults to 100
starting_max (int, optional): the maximum possible tempo (in BPM) to use for the first pass, defaults to 300
use_nonoptimised_defaults (bool, optional): use default parameters over optimised, defaults to False
audio_start (int, optional): start reading audio from this point (in total seconds)
audio_cutoff (int, optional): stop reading audio after this point (in total seconds)
passes (int, optional): the number of passes of the processor to use, defaults to 1
**kwargs: passed to madmom.features.downbeat.DBNDownBeatTrackingProcessor

Returns:

np.array: an array of detected crotchet beat positions from the final pass
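
A hedged sketch of the multi-pass procedure described above; `item` is assumed to be one corpus metadata dictionary, and three passes are an arbitrary choice.

>>> om = OnsetMaker(item)                    # `item` assumed to be one corpus metadata dictionary
>>> beats = om.beat_track_rnn(
>>>     starting_min=100,
>>>     starting_max=300,
>>>     passes=3,                            # detect, clean, then re-estimate with a narrowed tempo range
>>> )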

compare_downbeats(y_pred: array) dict#

Compares accuracy of downbeat detection

compare_onset_detection_accuracy(ref: array | None = None, fname: str | None = None, onsets: array | None = None, audio_cutoff: int | None = None, window: float | None = None) dict#

Evaluates onset detection algorithm against reference onsets.

For every onset detected by an algorithm, attempt to match it to the nearest onset in a reference set (usually obtained from manual annotation). Then, construct a summary dictionary containing statistics relating to the precision, recall, and accuracy of the detection. For more information on the evaluation procedure, see mir_eval.onset.f_measure.

At least one of ref or fname must be passed: ref must be an array of onset times, in seconds; fname must be a path to a text file containing onset times, with one onset per line. If both ref and fname are passed (don’t do this), ref will take priority.

Arguments:

ref (np.array): an array of reference onsets (in seconds) to use for evaluation
fname (str): the file path to a reference set of onsets, one onset per line
onsets (np.array): an array of onsets, beats, etc. to use for evaluation
window (float): the size of the window used for matching each onset to a reference
audio_cutoff (int, optional): stop reading audio after this point (in total seconds)

Returns:

dict: a dictionary containing summary statistics for the evaluation
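
A minimal sketch of evaluating detected onsets against a manual reference; the keys of the returned summary dictionary are not asserted here, since they are not documented on this page.

>>> import numpy as np
>>> om = OnsetMaker()
>>> reference = np.array([0.10, 0.55, 1.02, 1.48])    # manually annotated onsets, in seconds
>>> detected = np.array([0.11, 0.54, 1.05, 1.50])
>>> summary = om.compare_onset_detection_accuracy(ref=reference, onsets=detected, window=0.05)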

detection_note_values = {'left': 0.03125, 'right': 0.0625}#
static extract_downbeats(beat_timestamps: array, beat_positions: array) tuple[numpy.array, numpy.array]#

Takes in arrays of beat timestamps and metrical positions and returns the downbeats (i.e. the first beat of each bar)

finalize_output() None#

Finalizes the output by cleaning up leftover files and setting any final attributes

static format_arg(val)#
generate_click_track(instr: str, *args) None#

Renders detected onsets to a click sound and outputs, combined with the original audio.

Arguments:

instr (str): the name of the instrument to render audio from
*args (np.array): arrays of detected onsets to render to audio

Returns:

None

generate_matched_onsets_dictionary(beats: array, onsets_list: list[numpy.array] | None = None, instrs_list: list | None = None, **kwargs) dict#

Matches onsets from multiple instruments with crotchet beat positions and returns a dictionary.

Wrapper function for OnsetMaker.match_onsets_and_beats. onsets_list should be a list of arrays corresponding to onset positions tracked from multiple source-separated instruments. These will then be sent individually to OnsetMaker.match_onsets_and_beats and matched with the provided beats array, then returned as the values in a dictionary, where the keys are identifiers passed in instrs_list (or numerical values, if this iterable is not passed). Any **kwargs will be passed to OnsetMaker.match_onsets_and_beats.

Examples:
>>> om = OnsetMaker()
>>> bea = np.array([0, 0.5, 1.0, 1.5])
>>> ons = [
>>>     np.array([0.1, 0.6, 1.25, 1.55]),
>>>     np.array([0.05, 0.45, 0.95, 1.45]),
>>> ]
>>> instrs = ['instr1', 'instr2']
>>> print(om.generate_matched_onsets_dictionary(
>>>     beats=bea, onsets_list=ons, instrs_list=instrs, use_hard_threshold=True, threshold=0.1)
>>> )
{
    'beats': array([0. , 0.5, 1. , 1.5]),
    'instr1': array([0.1 , 0.6 ,  nan, 1.55]),
    'instr2': array([0.05, 0.45, 0.95, 1.45])
}
Arguments:

beats (np.array): iterable containing crotchet beat positions, typically tracked from the full mix
onsets_list (list[np.array]): iterable containing arrays of onset positions
instrs_list (list[str]): iterable containing names of instruments
**kwargs: arbitrary keyword arguments, passed to OnsetMaker.match_onsets_and_beats

Returns:

dict: keys are instrument names, values are matched arrays

Raises:

AttributeError: if neither onsets_list nor instrs_list is passed

static get_nonsilent_sections(aud: array, thresh: float = 1, **kwargs) array#

Returns the sections of a track which are not silent.

Wrapper function for librosa.effects.split that returns slices of a given audio track that are not silent. Slices are only considered not silent if their duration is above a reference threshold, given in seconds: this is to prevent the parsing of many small slices of audio.

Arguments:

aud (np.array): array of audio, read in during construction of the OnsetMaker class
thresh (float): value in seconds used when parsing slices
**kwargs: arbitrary keyword arguments, passed to librosa.effects.split

Returns:

np.array: rows corresponding to sections of non-silent audio

static get_signal_to_noise_ratio(audio: array, axis: int = 0, ddof: int = 0) float#
get_silent_track_percent(aud: array | None = None, silent: array | None = None, **kwargs) float#

Returns the fraction of a track which is silent.

Arguments:

aud (np.array): array of audio, read in during construction of the OnsetMaker class
silent (np.array): array of non-silent audio slices, returned from OnsetMaker.get_nonsilent_sections
**kwargs: arbitrary keyword arguments, passed to OnsetMaker.get_nonsilent_sections

Returns:

float: the fraction of a track which is silent, e.g. 1 == a completely silent track

Raises:

AttributeError: if neither aud nor silent is passed
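
A sketch of combining the two silence helpers above on a synthetic signal (half silence, half noise); the exact fraction returned depends on the underlying librosa.effects.split parameters, so the value in the final comment is only an expectation.

>>> import numpy as np
>>> sr = 44100
>>> audio = np.concatenate([np.zeros(sr * 5), np.random.uniform(-0.5, 0.5, sr * 5)])
>>> non_silent = OnsetMaker.get_nonsilent_sections(audio, thresh=1)
>>> om = OnsetMaker()
>>> silent_fraction = om.get_silent_track_percent(aud=audio, silent=non_silent)   # roughly 0.5 expected here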

static get_spectral_flatness(audio: array) array#
match_onsets_and_beats(beats: array, onsets: array | None = None, instr: str | None = None, use_hard_threshold: bool = False, detection_note_values: dict | None = None) array#

Matches event onsets with crotchet beat locations.

For every beat in the iterable beats, find the closest proximate onset in the iterable onsets, within a given window. If no onset can be found within this window, set the matched onset to NaN. The window can either be a hard, fixed value (by setting use_hard_threshold) or flexible and dependent on a particular rhythmic value within the underlying tempo (set using the detection_note_values class attribute). The latter option is recommended and used as the default, given that hard thresholds appropriate for matching onsets at one tempo may not be appropriate for other tempi.

Examples:
>>> om = OnsetMaker()
>>> bea = np.array([0, 0.5, 1.0, 1.5])
>>> ons = np.array([0.1, 0.6, 1.25, 1.55])
>>> print(om.match_onsets_and_beats(beats=bea, onsets=ons, use_hard_threshold=True, threshold=0.1))
np.array([0.1 0.6 nan 1.55])
>>> om = OnsetMaker()
>>> om.tempo = 160
>>> bea = np.array([0, 0.5, 1.0, 1.5])
>>> ons = np.array([0.1, 0.6, 1.25, 1.55])
>>> print(om.match_onsets_and_beats(beats=bea, onsets=ons, use_hard_threshold=False))
np.array([nan nan nan 1.55])
Arguments:

beats (np.ndarray): iterable containing crotchet beat positions, typically tracked from the full mix
onsets (np.ndarray): iterable containing onset positions, typically tracked from a source-separated file
instr (str): the name of an instrument, to be used if onsets is not provided
use_hard_threshold (bool): whether to use a hard or tempo-dependent (default) threshold for matching onsets
detection_note_values (dict): dictionary of note values to use either side of the crotchet beat, e.g. 1/32, 1/8

Returns:

np.array: the matched onset array, with shape == len(beats)

Raises:

AttributeError: if neither onsets nor instr is provided

onset_detect_cnn(instr: str, use_nonoptimised_defaults: bool = False, **kwargs)#

Wrapper around CNNOnsetProcessor from madmom package that allows custom peak picking parameters.

Arguments:

instr (str): the name of the instrument to detect onsets in
use_nonoptimised_defaults (bool, optional): whether to use default parameters, defaults to False
**kwargs: additional keyword arguments passed to librosa.onset.onset_detect

Returns:

np.array: the position of detected onsets, in seconds
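
A short sketch of per-instrument onset detection; any peak-picking keyword arguments would be passed through as described above, and `item` is again an assumed corpus metadata dictionary.

>>> om = OnsetMaker(item)
>>> piano_onsets = om.onset_detect_cnn('piano')                                  # optimised parameters by default
>>> drums_onsets = om.onset_detect_cnn('drums', use_nonoptimised_defaults=True)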

process_mixed_audio(generate_click: bool) None#

Process the raw audio mix, i.e. with all tracks together.

This is the central function for running processing on the mixed audio. It will generate an onset envelope, detect crotchets within it using both predominant local pulse estimation and recurrent neural networks, compare the detections to a reference file (if this exists), and generate a click track (if this is required). This function should be called before OnsetMaker.process_separated_audio, to ensure that the crotchet beat positions are present before matching these to onsets detected in the source-separated tracks.

Parameters:

generate_click (bool): whether to generate an audio click track

process_separated_audio(generate_click: bool, remove_silence: bool = True) None#

Process the separated audio for all of our individual instruments (piano, bass, drums)

This is the central function for running processing on each source-separated audio file. It will generate an onset envelope, detect onsets within it, remove onsets from when the track was silent, compare the detections to a reference file (if this exists), generate a click track (if this is required), and match the detected onsets to the nearest crotchet beat. This function must be called AFTER OnsetMaker.process_mixed_audio, to ensure that the crotchet beat positions have been detected correctly in the raw audio mix.

Parameters:

generate_click (bool): whether to generate an audio click track
remove_silence (bool): whether to remove onsets from portions of a track deemed to be silent by librosa
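
The required calling order described above, sketched over one corpus item (metadata contents assumed).

>>> om = OnsetMaker(item)                                        # `item` assumed to be one corpus metadata dictionary
>>> om.process_mixed_audio(generate_click=False)                 # crotchet beats must be detected in the full mix first
>>> om.process_separated_audio(generate_click=False, remove_silence=True)
>>> om.finalize_output()                                         # clean up leftover files and set final attributes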

remove_onsets_in_silent_passages(onsets: array, instr: str | None = None, silent: array | None = None, **kwargs) array#

Removes onsets if they occurred during a portion of a track which was silent.

For a given array of event onsets and a given array of non-silent audio slice timestamps, returns only those onsets which occurred during a portion of an audio track deemed not to be silent. This prevents any spurious onsets detected by Librosa from being included in an analysis.

Examples:
>>> om = OnsetMaker()
>>> non_silent = np.array(
>>>     [
>>>         [0, 5],
>>>         [10, 15]
>>>     ]
>>> )
>>> ons_ = np.array([0.1, 0.6, 5.5, 12.5, 17.5])
>>> print(om.remove_onsets_in_silent_passages(onsets=ons_, silent=non_silent))
array([0.1, 0.6, 12.5])
Arguments:

onsets (np.array): an array of event onsets
instr (str): the name of an instrument
silent (np.array): an array of non-silent audio slices, returned from OnsetMaker.get_nonsilent_sections
**kwargs: arbitrary keyword arguments, passed to OnsetMaker.get_nonsilent_sections

Returns:

np.array: an array of onset timestamps with those occurring during a silent slice removed

Raises:

AttributeError: if neither silent nor instr is passed

return_converged_parameters_cnn()#
save_annotations(dirpath: str | None = None)#

Saves all annotations from a given OnsetMaker instance inside their own folder

top_db = {'bass': 30, 'drums': 60, 'piano': 40}#
window = 0.05#
src.detect.onset_utils.bandpass_filter(audio: array, lowcut: int, highcut: int, order: int = 30, pad_len: float = 1.0, fade_dur: float = 0.5, sample_rate: float = 44100) array#

Applies a bandpass filter with given low and high cut frequencies to an audio signal.

Arguments:

audio (np.array): the audio array to filter
lowcut (int): the lower frequency to filter
highcut (int): the higher frequency to filter
order (int): the sharpness of the filter, defaults to 30
pad_len (float): the number of seconds to pad the audio by, defaults to 1
fade_dur (float): the length of time to fade the audio in and out by, defaults to 0.5
sample_rate (float): sample rate to use for processing audio, defaults to the project default (44100)

Returns:

np.array: the filtered audio array
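
An illustrative call on a synthetic signal; the cut-off frequencies are arbitrary choices for demonstration.

>>> import numpy as np
>>> sr = 44100
>>> t = np.linspace(0, 2, sr * 2, endpoint=False)
>>> sig = np.sin(2 * np.pi * 440 * t) + np.sin(2 * np.pi * 60 * t)              # 440 Hz tone plus 60 Hz hum
>>> filtered = bandpass_filter(sig, lowcut=110, highcut=880, sample_rate=sr)    # should retain the 440 Hz component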

src.detect.onset_utils.calculate_tempo(pass_: ndarray) float#

Extract the average tempo from an array of times corresponding to crotchet beat positions
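
For example, assuming the tempo is derived from the mean inter-beat interval (an assumption about the implementation):

>>> import numpy as np
>>> crotchets = np.array([0.0, 0.5, 1.0, 1.5, 2.0])    # beats spaced 0.5 seconds apart
>>> tempo = calculate_tempo(crotchets)                  # expected to be close to 120 BPM under that assumption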

src.detect.onset_utils.create_silent_clicktrack(csvpath: str, outputdir: str = 'c:\\python projects\\jazz-corpus-analysis/beats.wav', cutoff: int | None = None) None#

Creates a click track containing only clicks from csvpath, no source audio

src.detect.optimise_detection_parameters module#

Optimises the parameters used in onset detection by running a large-scale search using ground truth files

class src.detect.optimise_detection_parameters.OptimizeBeatTrackRNN(items: dict, **kwargs)#

Bases: Optimizer

Optimizes the OnsetMaker.beat_track_rnn function

analyze_track(item: dict, **kwargs) dict#

Detect beats in one track using a given combination of parameters.

args = [('threshold', <class 'float'>, 0, 1, 0.05), ('transition_lambda', <class 'float'>, 0, 500, 5), ('passes', <class 'int'>, 1, 5, 3)]#
audio_cutoff = 60#
csv_name: str = ''#
static enable_logger() Logger#
get_f_score(onsetmaker) float#

Returns F-score between detected onsets and manual annotation file

instr = 'mix'#
joblib_backend = 'threading'#
log_iteration(cached_ids: list, f_scores: list) None#

Log the results from a single iteration

lookup_results_from_cache(params: dict) tuple[list, list]#

Returns lists of IDs and F-scores for tracks that have already been processed with this set of parameters

n_jobs = 1#
objective_function(x: ndarray, _) float#

Objective function for maximising F-score of detected onsets

return_kwargs(x: ndarray) dict#

Formats arguments from NLopt into the required keyword argument format

run_optimization() tuple[dict, float]#

Runs optimization in NLopt

class src.detect.optimise_detection_parameters.OptimizeOnsetDetectCNN(items: dict, instr: str, **kwargs)#

Bases: Optimizer

Optimizes the OnsetMaker.onset_detect_cnn function for a single instrument

analyze_track(item: dict, **kwargs) dict#

Detect onsets in one track using a given combination of parameters.

args = [('threshold', <class 'float'>, 0, 10, 0.54), ('smooth', <class 'float'>, 0, 2, 0.05), ('pre_avg', <class 'float'>, 0, 2, 0), ('post_avg', <class 'float'>, 0, 2, 0), ('pre_max', <class 'float'>, 0, 2, 0.01), ('post_max', <class 'float'>, 0, 2, 0.01)]#
audio_cutoff = None#
csv_name: str = ''#
static enable_logger() Logger#
fps = 100#
get_f_score(onsetmaker) float#

Returns F-score between detected onsets and manual annotation file

joblib_backend = 'threading'#
log_iteration(cached_ids: list, f_scores: list) None#

Log the results from a single iteration

lookup_results_from_cache(params: dict) tuple[list, list]#

Returns lists of IDs and F-scores for tracks that have already been processed with this set of parameters

n_jobs = -1#
objective_function(x: ndarray, _) float#

Objective function for maximising F-score of detected onsets

return_kwargs(x: ndarray) dict#

Formats arguments from NLopt into the required keyword argument format

run_optimization() tuple[dict, float]#

Runs optimization in NLopt

class src.detect.optimise_detection_parameters.Optimizer(items: list[dict], instr: str, args: list[tuple], **kwargs)#

Bases: object

Base class for non-linear optimization of parameters

analyze_track(item: dict, **kwargs) dict#

Placeholder for analysis function, run in parallel; overridden in child classes

audio_cutoff = None#
csv_name = ''#
static enable_logger() Logger#
get_f_score(onsetmaker) float#

Returns F-score between detected onsets and manual annotation file

joblib_backend = 'threading'#
log_iteration(cached_ids: list, f_scores: list) None#

Log the results from a single iteration; overridden in child classes

lookup_results_from_cache(params: dict) tuple[list, list]#

Returns lists of IDs and F-scores for tracks that have already been processed with this set of parameters

n_jobs = -1#
objective_function(x: ndarray, _) float#

Objective function for maximising F-score of detected onsets

return_kwargs(x: ndarray) dict#

Formats arguments from NLopt into the required keyword argument format

run_optimization() tuple[dict, float]#

Runs optimization in NLopt

src.detect.optimise_detection_parameters.optimize_beat_tracking(tracks: list[dict], **kwargs) None#

Central function for optimizing beat tracking across all reference tracks

Arguments:

tracks (list[dict]): the metadata for the tracks to be used in optimization
**kwargs: passed onto optimization class

src.detect.optimise_detection_parameters.optimize_onset_detection_cnn(tracks: list[dict], **kwargs) None#

Central function for optimizing onset detection across all reference tracks and instrument stems

Arguments:

tracks (list[dict]): the metadata for the tracks to be used in optimization
**kwargs: passed onto optimization class
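
A hedged sketch of driving both optimizers; how `corpus_tracks` (a list of metadata dictionaries with ground-truth annotations) is obtained is not covered on this page, and any keyword arguments are simply forwarded to the optimizer classes.

>>> # `corpus_tracks` is assumed to be a list of corpus metadata dictionaries with reference annotations
>>> optimize_beat_tracking(corpus_tracks)
>>> optimize_onset_detection_cnn(corpus_tracks)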

src.detect.process_dataset module#

Process note onsets and piano MIDI for every track in the corpus

src.detect.process_dataset.process_item(corpus_item: dict, generate_click: bool) OnsetMaker#

Process one item from the corpus, used in parallel contexts (i.e. called with joblib.Parallel)
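
As the description notes, process_item is intended for parallel execution; a minimal sketch with joblib follows (loading of the corpus itself is assumed).

>>> from joblib import Parallel, delayed
>>> # `corpus` is assumed to be an iterable of corpus metadata dictionaries
>>> onset_makers = Parallel(n_jobs=-1)(
>>>     delayed(process_item)(corpus_item, generate_click=False) for corpus_item in corpus
>>> )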

Module contents#