src.detect package#
Submodules#
src.detect.midi_utils module#
Utility classes, functions, and variables for working with MIDI files.
- class src.detect.midi_utils.Interval(firstnote: Note, secondnote: Note)#
Bases: object
Used to extract info from two Note objects occurring separately
- class src.detect.midi_utils.MIDIMaker(item: dict, **kwargs)#
Bases: object
Create MIDI for a single instrument (defaults to piano)
- INSTR = 'piano'#
- convert_to_midi() dict #
Convert processed audio into MIDI
- finalize_output(dirpath: str | None = None, filename: str = 'piano_midi.mid') None #
Finalize output by saving processed MIDI into the correct directory
- static pitch_correction(audio: array) array #
Pitch-shift given audio to A=440 Hz
- preprocess_audio(filter_audio: bool = False, pitch_correction: bool = True) array #
Preprocess audio by filtering and/or applying pitch correction
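Taken together, the MIDIMaker methods above suggest a pipeline like the following. This is a hypothetical usage sketch only: the schema of the item dictionary and the output directory are assumptions, not documented corpus details.

# Hypothetical usage sketch: the contents of `item` are an assumption
from src.detect.midi_utils import MIDIMaker

item = {"fname": "example_track"}  # assumed minimal corpus entry
maker = MIDIMaker(item)
audio = maker.preprocess_audio(filter_audio=False, pitch_correction=True)
midi = maker.convert_to_midi()
maker.finalize_output(dirpath="./output", filename="piano_midi.mid")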
- class src.detect.midi_utils.MelodyMaker(midi_fpath: str, beats: array, downbeats: array, tempo: float, time_signature: int)#
Bases: object
Extracts a melody from MIDI using the skyline algorithm, and provides functions for chunking it into measures
- MIDDLE_C = 60#
- SHORTEST_RHYTHM = 0.015625#
- TIME_THRESH = 0.01#
- static _extract_highest_note(notes: list[pretty_midi.containers.Note]) Generator[Note, None, None] #
- static _quantize_notes_in_beat(beat1: float, beat2: float, notes: list[pretty_midi.containers.Note], num_ticks: int = 8) Generator[Note, None, None] #
Quantize notes within a beat to the nearest 64th note (default)
- _remove_iois_below_threshold(notes: list[pretty_midi.containers.Note]) list[pretty_midi.containers.Note] #
- _remove_pitches_below_threshold(notes: list[pretty_midi.containers.Note]) list[pretty_midi.containers.Note] #
- chunk_melody(notes: list[src.detect.midi_utils.Note | src.detect.midi_utils.Interval] | None = None, chunk_measures: int = 4, overlapping_chunks: bool = True) list[tuple[src.detect.midi_utils.Note]] #
Chunks a melody into slices corresponding to a given number of measures (consecutive chunks may overlap)
- extract_intervals(melody_notes: list[src.detect.midi_utils.Note]) Generator[Interval, None, None] #
Extracts intervals from a sequence of melody notes
- extract_melody()#
Applies skyline algorithm to extract melody from MIDI
- load_midi(midi_fpath) Instrument #
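A hypothetical usage sketch for MelodyMaker follows. The beat and downbeat arrays would normally come from onset detection and are fabricated here; extract_melody is assumed to yield Note objects, per the skyline docstring above.

import numpy as np
from src.detect.midi_utils import MelodyMaker

beats = np.arange(0.0, 8.0, 0.5)       # crotchet positions at 120 BPM (fabricated)
downbeats = np.arange(0.0, 8.0, 2.0)   # one downbeat per 4/4 bar (fabricated)
mm = MelodyMaker(
    midi_fpath="piano_midi.mid", beats=beats, downbeats=downbeats,
    tempo=120.0, time_signature=4,
)
melody = list(mm.extract_melody())             # skyline melody, assumed to yield Note objects
intervals = list(mm.extract_intervals(melody))
chunks = mm.chunk_melody(melody, chunk_measures=4, overlapping_chunks=True)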
- class src.detect.midi_utils.Note(note)#
Bases: Note
Extends pretty_midi.Note with a few additional properties
- property duration#
- get_duration()#
Get the duration of the note in seconds.
- src.detect.midi_utils.group_onsets(onsets: numpy.array, window: float = 0.05, keep_func: callable = np.min) array #
Group near-simultaneous onsets within a given window.
- Parameters:
onsets (np.array): the array of onsets to group
window (float, optional): the window to use for grouping, defaults to 0.05 seconds
keep_func (callable, optional): the function used to select an onset to keep from the group, defaults to np.min
- Returns:
np.array: the grouped array
- Examples:
>>> x = np.array([0.01, 0.05, 0.06, 0.07, 0.96, 1.00, 1.05, 1.06, 1.06])
>>> group_onsets(x)
np.array([0.01, 0.07, 0.96, 1.05])
>>> x = np.array([0.01, 0.05, 0.06, 0.07, 0.96, 1.00, 1.05, 1.06, 1.06])
>>> group_onsets(x, keep_func=np.mean)
np.array([0.04 , 0.07 , 0.98 , 1.055])
src.detect.onset_utils module#
Utility classes, functions, and variables used in the onset detection process.
- class src.detect.onset_utils.ClickTrackMaker(audio: array, **kwargs)#
Bases: object
- clicks_from_onsets(freq, onsets, **kwargs) array #
Renders detected onsets to a click sound with a given frequency
- generate_audio(onsets_list: list[numpy.array]) array #
Renders detected onsets to a click sound and combines with the original audio.
Takes in a list of reference onset arrays, converts these to audible clicks, applies a bandpass filter (to make different onsets easier to tell apart), filters the original audio to the frequencies considered when detecting onsets, then combines the filtered original audio and the clicks into a new audio track.
- Arguments:
onsets_list (list[np.array]): a list containing arrays of detected onsets
- Returns:
np.array: the click audio track that can be rendered to a file using soundfile, Librosa, etc.
- order = 20#
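A hypothetical usage sketch for ClickTrackMaker: render detected onsets over the source audio as clicks and write the result with soundfile. The input file, sample rate, and onset values are illustrative assumptions.

import numpy as np
import soundfile as sf
from src.detect.onset_utils import ClickTrackMaker

audio, sr = sf.read("track.wav")           # assumed source audio file
onsets = np.array([0.5, 1.0, 1.5, 2.0])    # detected onsets, in seconds (fabricated)
ctm = ClickTrackMaker(audio)
click_audio = ctm.generate_audio([onsets])
sf.write("track_clicks.wav", click_audio, sr)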
- class src.detect.onset_utils.OnsetMaker(item: dict | None = None, **kwargs)#
Bases: object
Automatically detect onset and beat positions for each instrument in a single item in the corpus.
- _get_channel_override_fpath(name: str, fpath: str) str #
Gets the filepath for an item, with any channel overrides specified.
For instance, if we wish to use only the left channel for the double bass (and have specified "bass": "l" in the "channel_overrides" dictionary for this item in the corpus), this function will return the correct filepath pointing to the source-separated left channel file.
- Arguments:
name (str): the name of the instrument
fpath (str): the default filepath for the item (i.e. stereo audio)
- Returns:
str: the overridden filepath if this is required and present locally, or the default (stereo) filepath if not
- _load_audio(**kwargs) dict #
Loads audio as a time-series array for all instruments + the raw mix.
Wrapper around librosa.load, called when the class instance is constructed in order to load audio for all instruments in the required format. Keyword arguments are passed on to librosa.load.
- Arguments:
**kwargs: passed to librosa.load
- Returns:
dict: each key-value pair corresponds to the loaded audio for one instrument, as an array
- Raises:
UserWarning: when the proportion of a track that is silent exceeds OnsetMaker.silence_threshold
- beat_track_rnn(starting_min: int = 100, starting_max: int = 300, use_nonoptimised_defaults: bool = False, audio_start: int = 0, audio_cutoff: int | None = None, passes: int = 1, **kwargs) array #
Tracks the position of crotchet beats in the full mix of a track using recurrent neural networks.
Wrapper around RNNDownBeatProcessor and DBNDownBeatTrackingProcessor from madmom.features.downbeat that allows for per-instrument defaults and multiple passes. A 'pass' refers to taking the detected crotchets from one run of the network, cleaning the results, extracting features from the cleaned array (e.g. minimum and maximum tempi), then creating a new network using these features and repeating the estimation process. This narrows down the range of tempo values that can be detected and substantially increases the accuracy of detected crotchets over several passes.
- Arguments:
starting_min (int, optional): the minimum possible tempo (in BPM) to use for the first pass, defaults to 100
starting_max (int, optional): the maximum possible tempo (in BPM) to use for the first pass, defaults to 300
use_nonoptimised_defaults (bool, optional): use default parameters instead of optimised ones, defaults to False
audio_start (int, optional): start reading audio from this point (in total seconds)
audio_cutoff (int, optional): stop reading audio after this point (in total seconds)
passes (int, optional): the number of passes of the processor to use, defaults to 1
**kwargs: passed to madmom.features.downbeat.DBNDownBeatTrackingProcessor
- Returns:
np.array: an array of detected crotchet beat positions from the final pass
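A hypothetical multi-pass call is sketched below: each pass narrows the detectable tempo range, as described above. The corpus item dictionary passed to OnsetMaker is an assumption.

from src.detect.onset_utils import OnsetMaker

om = OnsetMaker(item={"fname": "example_track"})   # item schema is assumed
crotchets = om.beat_track_rnn(starting_min=100, starting_max=300, passes=3)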
- compare_downbeats(y_pred: array) dict #
Compares accuracy of downbeat detection
- compare_onset_detection_accuracy(ref: array | None = None, fname: str | None = None, onsets: array | None = None, audio_cutoff: int | None = None, window: float | None = None) dict #
Evaluates onset detection algorithm against reference onsets.
For every onset detected by an algorithm, attempt to match these to the nearest onset in a reference set (usually obtained from manual annotation). Then, construct a summary dictionary, containing summary statistics relating to the precision, recall, and accuracy of the detection. For more information on the evaluation procedure, see mir_eval.onset.f_measure.
At least one of ref or fname must be passed: ref must be an array of onset times, in seconds; fname must be a path to a text file containing onset times, with one onset per line. If both ref and fname are passed (don’t do this), ref will take priority.
- Arguments:
ref (np.array): an array of reference onsets (in seconds) to use for evaluation
fname (str): the file path to a reference set of onsets, one onset per line
onsets (np.array): an array of onsets, beats, etc. to use for evaluation
window (float): the size of the window used for matching each onset to a reference
audio_cutoff (int, optional): stop reading audio after this point (in total seconds)
- Returns:
dict: summary statistics for the evaluation
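A hypothetical evaluation sketch: compare algorithm output against a manual annotation file (one onset time per line). The item schema, annotation path, and window value are illustrative assumptions.

from src.detect.onset_utils import OnsetMaker

om = OnsetMaker(item={"fname": "example_track"})   # item schema is assumed
detected = om.onset_detect_cnn("piano")
stats = om.compare_onset_detection_accuracy(
    fname="annotations/piano_onsets.txt", onsets=detected, window=0.05,
)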
- detection_note_values = {'left': 0.03125, 'right': 0.0625}#
- static extract_downbeats(beat_timestamps: array, beat_positions: array) tuple[numpy.array, numpy.array] #
Takes in arrays of beat onsets and bar positions and returns the downbeats of each bar
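A minimal sketch of the equivalent NumPy operation, assuming madmom-style output where a beat position of 1 marks the first beat of a bar:

import numpy as np

timestamps = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
positions = np.array([1, 2, 3, 4, 1])
downbeat_times = timestamps[positions == 1]   # -> array([0., 2.])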
- finalize_output() None #
Finalizes the output by cleaning up leftover files and setting any final attributes
- static format_arg(val)#
- generate_click_track(instr: str, *args) None #
Renders detected onsets to a click sound and outputs it combined with the original audio.
- Arguments:
instr (str): the name of the instrument to render audio from
*args (np.array): arrays of detected onsets to render to audio
- Returns:
None
- generate_matched_onsets_dictionary(beats: array, onsets_list: list[numpy.array] | None = None, instrs_list: list | None = None, **kwargs) dict #
Matches onsets from multiple instruments with crotchet beat positions and returns a dictionary.
Wrapper function for OnsetMaker.match_onsets_and_beats. onsets_list should be a list of arrays corresponding to onset positions tracked from multiple source-separated instruments. These will then be sent individually to OnsetMaker.match_onsets_and_beats and matched with the provided beats array, then returned as the values in a dictionary, where the keys are identifiers passed in instrs_list (or numerical values, if this iterable is not passed). Any **kwargs will be passed to OnsetMaker.match_onsets_and_beats.
- Examples:
>>> om = OnsetMaker()
>>> bea = np.array([0, 0.5, 1.0, 1.5])
>>> ons = [
...     np.array([0.1, 0.6, 1.25, 1.55]),
...     np.array([0.05, 0.45, 0.95, 1.45]),
... ]
>>> instrs = ['instr1', 'instr2']
>>> print(om.generate_matched_onsets_dictionary(
...     beats=bea, onsets_list=ons, instrs_list=instrs, use_hard_threshold=True, threshold=0.1
... ))
{
    'beats': array([0. , 0.5, 1. , 1.5]),
    'instr1': array([0.1 , 0.6 ,  nan, 1.55]),
    'instr2': array([0.05, 0.45, 0.95, 1.45])
}
- Arguments:
beats (np.array): iterable containing crotchet beat positions, typically tracked from the full mix
onsets_list (list[np.array]): iterable containing arrays of onset positions
instrs_list (list[str]): iterable containing names of instruments
**kwargs: arbitrary keyword arguments, passed to OnsetMaker.match_onsets_and_beats
- Returns:
dict: keys are instrument names, values are matched arrays
- Raises:
AttributeError: if neither onsets_list nor instrs_list is passed
- static get_nonsilent_sections(aud: array, thresh: float = 1, **kwargs) array #
Returns the sections of a track which are not silent.
Wrapper function for librosa.effects.split that returns slices of a given audio track that are not silent. Slices are only considered not silent if their duration is above a reference threshold, given in seconds: this is to prevent the parsing of many small slices of audio.
- Arguments:
aud (np.array): array of audio, read in during construction of the OnsetMaker class
thresh (float): value in seconds used when parsing slices
**kwargs: arbitrary keyword arguments, passed to librosa.effects.split
- Returns:
np.array: rows corresponding to sections of non-silent audio
- static get_signal_to_noise_ratio(audio: array, axis: int = 0, ddof: int = 0) float #
- get_silent_track_percent(aud: array | None = None, silent: array | None = None, **kwargs) float #
Returns the fraction of a track which is silent.
- Arguments:
aud (np.array): array of audio, read in during construction of the OnsetMaker class
silent (np.array): array of non-silent audio slices, returned from OnsetMaker.get_nonsilent_sections
**kwargs: arbitrary keyword arguments, passed to OnsetMaker.get_nonsilent_sections
- Returns:
float: the fraction of a track which is silent, e.g. 1 == a completely silent track
- Raises:
AttributeError: if neither aud nor silent is passed
- static get_spectral_flatness(audio: array) array #
- match_onsets_and_beats(beats: array, onsets: array | None = None, instr: str | None = None, use_hard_threshold: bool = False, detection_note_values: dict | None = None) array #
Matches event onsets with crotchet beat locations.
For every beat in the iterable beats, find the closest proximate onset in the iterable onsets, within a given window. If no onset can be found within this window, set the matched onset to NaN. The window can either be a hard, fixed value (by setting use_hard_threshold) or flexible and dependent on a particular rhythmic value within the underlying tempo (set using the detection_note_values class attribute). The latter option is recommended and used as a default, given that hard thresholds for matching onsets at one tempo may not be appropriate for other tempi.
- Examples:
>>> om = OnsetMaker()
>>> bea = np.array([0, 0.5, 1.0, 1.5])
>>> ons = np.array([0.1, 0.6, 1.25, 1.55])
>>> print(om.match_onsets_and_beats(beats=bea, onsets=ons, use_hard_threshold=True, threshold=0.1))
np.array([0.1 0.6 nan 1.55])
>>> om = OnsetMaker()
>>> om.tempo = 160
>>> bea = np.array([0, 0.5, 1.0, 1.5])
>>> ons = np.array([0.1, 0.6, 1.25, 1.55])
>>> print(om.match_onsets_and_beats(beats=bea, onsets=ons, use_hard_threshold=False))
np.array([nan nan nan 1.55])
- Arguments:
beats (np.ndarray): iterable containing crotchet beat positions, typically tracked from the full mix
onsets (np.ndarray): iterable containing onset positions, typically tracked from a source separated file
instr (str): the name of an instrument, to be used if onsets is not provided
use_hard_threshold (bool): whether to use a hard or tempo-dependent (default) threshold for matching onsets
detection_note_values (dict): dictionary of note values to use either side of crotchet beat, e.g. 1/32, 1/8
- Returns:
np.array: the matched onset array, with shape == len(beats)
- Raises:
AttributeError: if neither onsets nor instr is provided
- onset_detect_cnn(instr: str, use_nonoptimised_defaults: bool = False, **kwargs)#
Wrapper around CNNOnsetProcessor from madmom package that allows custom peak picking parameters.
- Arguments:
instr (str): the name of the instrument to detect onsets in
use_nonoptimised_defaults (bool, optional): whether to use default parameters, defaults to False
**kwargs: additional keyword arguments passed to librosa.onset.onset_detect
- Returns:
np.array: the position of detected onsets, in seconds
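A plausible sketch of one madmom pipeline this method could wrap is below; the peak-picking values are illustrative, not the class's optimised per-instrument defaults, and the input file path is an assumption.

from madmom.features.onsets import CNNOnsetProcessor, OnsetPeakPickingProcessor

activation = CNNOnsetProcessor()("piano_stem.wav")                       # frame-wise onset activations
onsets = OnsetPeakPickingProcessor(fps=100, threshold=0.54)(activation)  # onset times, in seconds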
- process_mixed_audio(generate_click: bool) None #
Process the raw audio mix, i.e. with all tracks together.
This is the central function for running processing on the mixed audio. It will generate an onset envelope, detect crotchets within it using both predominant local pulse estimation and recurrent neural networks, compare the detections to a reference file (if this exists), and generate a click track (if this is required). This function should be called before OnsetMaker.process_separated_audio, to ensure that the crotchet beat positions are present before matching these to onsets detected in the source-separated tracks.
- Parameters:
generate_click (bool): whether to generate an audio click track
- process_separated_audio(generate_click: bool, remove_silence: bool = True) None #
Process the separated audio for all of our individual instruments (piano, bass, drums)
This is the central function for running processing on each source-separated audio file. It will generate an onset envelope, detect onsets within it, remove onsets from when the track was silent, compare the detections to a reference file (if this exists), generate a click track (if this is required), and match the detected onsets to the nearest crotchet beat. This function must be called AFTER OnsetMaker.process_mixed_audio, to ensure that the crotchet beat positions have been detected correctly in the raw audio mix.
- Parameters:
generate_click (bool): whether to generate an audio click track
remove_silence (bool): whether to remove onsets from portions of a track deemed to be silent by librosa
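A hypothetical end-to-end call order follows: the mixed audio must be processed first so that crotchet positions exist before matching separated-instrument onsets. The item schema is an assumption.

from src.detect.onset_utils import OnsetMaker

om = OnsetMaker(item={"fname": "example_track"})   # item schema is assumed
om.process_mixed_audio(generate_click=False)
om.process_separated_audio(generate_click=False, remove_silence=True)
om.finalize_output()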
- remove_onsets_in_silent_passages(onsets: array, instr: str | None = None, silent: array | None = None, **kwargs) array #
Removes onsets if they occurred during a portion of a track which was silent.
For a given array of event onsets and a given array of non-silent audio slice timestamps, returns only those onsets which occurred during a portion of an audio track deemed not to be silent. This prevents any spurious onsets detected by Librosa from being included in an analysis.
- Examples:
>>> om = OnsetMaker()
>>> non_silent = np.array(
...     [
...         [0, 5],
...         [10, 15]
...     ]
... )
>>> ons_ = np.array([0.1, 0.6, 5.5, 12.5, 17.5])
>>> print(om.remove_onsets_in_silent_passages(onsets=ons_, silent=non_silent))
array([0.1, 0.6, 12.5])
- Arguments:
onsets (np.array): an array of event onsets
instr (str): the name of an instrument
silent (np.array): an array of non-silent audio slices, returned from OnsetMaker.get_nonsilent_sections
**kwargs: arbitrary keyword arguments, passed to OnsetMaker.get_nonsilent_sections
- Returns:
np.array: an array of onset timestamps with those occurring during a silent slice removed
- Raises:
AttributeError: if neither silent nor instr is passed
- return_converged_parameters_cnn()#
- save_annotations(dirpath: str | None = None)#
Saves all annotations from a given OnsetMaker instance inside their own folder
- top_db = {'bass': 30, 'drums': 60, 'piano': 40}#
- window = 0.05#
- src.detect.onset_utils.bandpass_filter(audio: array, lowcut: int, highcut: int, order: int = 30, pad_len: float = 1.0, fade_dur: float = 0.5, sample_rate: float = 44100) array #
Applies a bandpass filter with given low and high cut frequencies to an audio signal.
- Arguments:
audio (np.array): the audio array to filter
lowcut (int): the lower frequency to filter
highcut (int): the higher frequency to filter
order (int): the sharpness of the filter, defaults to 30
pad_len (float): the number of seconds to pad the audio by, defaults to 1
fade_dur (float): the length of time to fade the audio in and out by, defaults to 0.5
sample_rate (float): sample rate to use for processing audio, defaults to the project default (44100)
- Returns:
np.array: the filtered audio array
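A minimal scipy-based sketch of the same idea (a Butterworth bandpass designed as second-order sections) is shown below; the project's own implementation may differ in its padding and fade handling, which are omitted here.

import numpy as np
from scipy.signal import butter, sosfiltfilt

def simple_bandpass(audio: np.ndarray, lowcut: int, highcut: int,
                    order: int = 30, sample_rate: float = 44100) -> np.ndarray:
    # design the bandpass in second-order sections for numerical stability
    sos = butter(order, [lowcut, highcut], btype="bandpass", fs=sample_rate, output="sos")
    # zero-phase filtering, so onset positions are not shifted in time
    return sosfiltfilt(sos, audio)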
- src.detect.onset_utils.calculate_tempo(pass_: ndarray) float #
Extract the average tempo from an array of times corresponding to crotchet beat positions
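A likely equivalent computation, assuming the tempo is the mean inter-beat interval converted to beats per minute (an assumption; the project implementation may differ in detail):

import numpy as np

beats = np.array([0.0, 0.5, 1.0, 1.5, 2.0])   # crotchets 0.5 s apart
tempo = 60 / np.diff(beats).mean()            # -> 120.0 BPM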
- src.detect.onset_utils.create_silent_clicktrack(csvpath: str, outputdir: str = 'c:\\python projects\\jazz-corpus-analysis/beats.wav', cutoff: int | None = None) None #
Creates a click track containing only clicks from csvpath, no source audio
src.detect.optimise_detection_parameters module#
Optimises the parameters used in onset detection by running a large-scale search using ground truth files
- class src.detect.optimise_detection_parameters.OptimizeBeatTrackRNN(items: dict, **kwargs)#
Bases: Optimizer
Optimizes the OnsetMaker.beat_track_rnn function
- analyze_track(item: dict, **kwargs) dict #
Detect beats in one track using a given combination of parameters.
- args = [('threshold', <class 'float'>, 0, 1, 0.05), ('transition_lambda', <class 'float'>, 0, 500, 5), ('passes', <class 'int'>, 1, 5, 3)]#
- audio_cutoff = 60#
- csv_name: str = ''#
- static enable_logger() Logger #
- get_f_score(onsetmaker) float #
Returns F-score between detected onsets and manual annotation file
- instr = 'mix'#
- joblib_backend = 'threading'#
- log_iteration(cached_ids: list, f_scores: list) None #
Log the results from a single iteration
- lookup_results_from_cache(params: dict) tuple[list, list] #
Returns lists of IDs and F-scores for tracks that have already been processed with this set of parameters
- n_jobs = 1#
- objective_function(x: ndarray, _) float #
Objective function for maximising F-score of detected onsets
- return_kwargs(x: ndarray) dict #
Formats arguments from NLopt into the required keyword argument format
- run_optimization() tuple[dict, float] #
Runs optimization in NLopt
- class src.detect.optimise_detection_parameters.OptimizeOnsetDetectCNN(items: dict, instr: str, **kwargs)#
Bases: Optimizer
Optimizes the OnsetMaker.onset_detect_cnn function for a single instrument
- analyze_track(item: dict, **kwargs) dict #
Detect onsets in one track using a given combination of parameters.
- args = [('threshold', <class 'float'>, 0, 10, 0.54), ('smooth', <class 'float'>, 0, 2, 0.05), ('pre_avg', <class 'float'>, 0, 2, 0), ('post_avg', <class 'float'>, 0, 2, 0), ('pre_max', <class 'float'>, 0, 2, 0.01), ('post_max', <class 'float'>, 0, 2, 0.01)]#
- audio_cutoff = None#
- csv_name: str = ''#
- static enable_logger() Logger #
- fps = 100#
- get_f_score(onsetmaker) float #
Returns F-score between detected onsets and manual annotation file
- joblib_backend = 'threading'#
- log_iteration(cached_ids: list, f_scores: list) None #
Log the results from a single iteration
- lookup_results_from_cache(params: dict) tuple[list, list] #
Returns lists of IDs and F-scores for tracks that have already been processed with this set of parameters
- n_jobs = -1#
- objective_function(x: ndarray, _) float #
Objective function for maximising F-score of detected onsets
- return_kwargs(x: ndarray) dict #
Formats arguments from NLopt into the required keyword argument format
- run_optimization() tuple[dict, float] #
Runs optimization in NLopt
- class src.detect.optimise_detection_parameters.Optimizer(items: list[dict], instr: str, args: list[tuple], **kwargs)#
Bases: object
Base class for non-linear optimization of parameters
- analyze_track(item: dict, **kwargs) dict #
Placeholder for analysis function, run in parallel; overridden in child classes
- audio_cutoff = None#
- csv_name = ''#
- static enable_logger() Logger #
- get_f_score(onsetmaker) float #
Returns F-score between detected onsets and manual annotation file
- joblib_backend = 'threading'#
- log_iteration(cached_ids: list, f_scores: list) None #
Log the results from a single iteration; overridden in child classes
- lookup_results_from_cache(params: dict) tuple[list, list] #
Returns lists of IDs and F-scores for tracks that have already been processed with this set of parameters
- n_jobs = -1#
- objective_function(x: ndarray, _) float #
Objective function for maximising F-score of detected onsets
- return_kwargs(x: ndarray) dict #
Formats arguments from NLopt into the required keyword argument format
- run_optimization() tuple[dict, float] #
Runs optimization in NLopt
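A hypothetical sketch of the NLopt loop the Optimizer classes wrap. The args tuples above appear to follow a (name, type, lower bound, upper bound, initial guess) pattern; the algorithm choice (SBPLX, a derivative-free local optimizer) and the stand-in objective are assumptions.

import nlopt
import numpy as np

# (name, type, lower bound, upper bound, initial guess), mirroring the args tuples above
args = [("threshold", float, 0, 1, 0.05), ("passes", int, 1, 5, 3)]
opt = nlopt.opt(nlopt.LN_SBPLX, len(args))
opt.set_lower_bounds([a[2] for a in args])
opt.set_upper_bounds([a[3] for a in args])
# stand-in objective: the real one computes the mean F-score across all tracks
opt.set_max_objective(lambda x, grad: float(np.mean(x)))
opt.set_maxeval(100)
best_params = opt.optimize(np.array([a[4] for a in args], dtype=float))
best_score = opt.last_optimum_value()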
- src.detect.optimise_detection_parameters.optimize_beat_tracking(tracks: list[dict], **kwargs) None #
Central function for optimizing beat tracking across all reference tracks
- Arguments:
tracks (list[dict]): the metadata for the tracks to be used in optimization
**kwargs: passed on to the optimization class
- src.detect.optimise_detection_parameters.optimize_onset_detection_cnn(tracks: list[dict], **kwargs) None #
Central function for optimizing onset detection across all reference tracks and instrument stems
- Arguments:
tracks (list[dict]): the metadata for the tracks to be used in optimization
**kwargs: passed on to the optimization class
src.detect.process_dataset module#
Process note onsets and piano MIDI for every track in the corpus
- src.detect.process_dataset.process_item(corpus_item: dict, generate_click: bool) OnsetMaker #
Process one item from the corpus, used in parallel contexts (i.e. called with joblib.Parallel)
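A hypothetical parallel driver matching the joblib.Parallel usage the docstring describes; the corpus entries are placeholders for real corpus metadata.

from joblib import Parallel, delayed
from src.detect.process_dataset import process_item

corpus = [{"fname": "example_track"}]   # assumed minimal corpus entries
makers = Parallel(n_jobs=-1)(
    delayed(process_item)(item, generate_click=False) for item in corpus
)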