src.features package#

Submodules#

src.features.features_utils module#

Utility classes, functions, and variables used in the feature extraction process

class src.features.features_utils.BaseExtractor#

Bases: object

Base feature extraction class, with some methods that are useful for all classes

static count_nonzero(x) → int#: Simple wrapper around np.count_nonzero that removes NaN values from an array

static get_between(arr, i1, i2) → array#: From an array arr, get all onsets between an upper and lower bound i1 and i2 respectively

static quantile25(x) → float#: Simple wrapper around np.nanquantile with arguments set

static quantile75(x) → float#: Simple wrapper around np.nanquantile with arguments set
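The helpers above are thin NaN-aware wrappers around NumPy. A minimal sketch of how they might look is given below. It assumes that quantile25 and quantile75 fix the quantile at 0.25 and 0.75, and that get_between treats both bounds as inclusive; neither detail is stated in the docstrings, and the `_sketch` names are illustrative only.

```python
import numpy as np


def count_nonzero_sketch(x):
    """Count non-zero entries after dropping NaN.

    np.count_nonzero on its own treats NaN as non-zero, hence the filter.
    """
    x = np.asarray(x, dtype=float)
    return int(np.count_nonzero(x[~np.isnan(x)]))


def get_between_sketch(arr, i1, i2):
    """Return the values of arr lying between i1 and i2 (inclusive: an assumption)."""
    arr = np.asarray(arr, dtype=float)
    return arr[(arr >= i1) & (arr <= i2)]


def quantile25_sketch(x):
    """First quartile, ignoring NaN (q=0.25 is an assumption based on the name)."""
    return float(np.nanquantile(x, 0.25))


def quantile75_sketch(x):
    """Third quartile, ignoring NaN (q=0.75 is an assumption based on the name)."""
    return float(np.nanquantile(x, 0.75))
```

Fixing the arguments in small named functions like these makes them easy to pass around as summary functions without lambdas.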

static truncate_df(arr: DataFrame | Series, low: float, high: float, col: str | None = None, fill_nans: bool = False) → DataFrame#

Truncate a dataframe or series between a low and high threshold.

Args:
    arr (pd.DataFrame | pd.Series): dataframe to truncate
    low (float): lower boundary for truncating
    high (float): upper boundary for truncating. Must be greater than low.
    col (str): array to use when truncating. Must be provided if isinstance(arr, pd.DataFrame)
    fill_nans (bool, optional): whether to replace values outside low and high with np.nan

Raises:
    AssertionError: if high < low

Returns:
    pd.DataFrame
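The truncation described above can be sketched with pandas as follows. This is an illustration under stated assumptions, not the package's implementation: the bounds are taken as inclusive, and with fill_nans=True whole rows outside the range are set to NaN rather than dropped. The name `truncate_sketch` is hypothetical, and unlike the documented method it returns a Series when given a Series.

```python
import numpy as np
import pandas as pd


def truncate_sketch(arr, low, high, col=None, fill_nans=False):
    """Keep only the rows whose value lies between low and high."""
    # The documented method raises AssertionError if high < low
    assert high >= low, "high must not be smaller than low"
    # For a dataframe, the comparison is made on one column
    values = arr[col] if isinstance(arr, pd.DataFrame) else arr
    keep = values.between(low, high)  # inclusive bounds: an assumption
    if fill_nans:
        out = arr.copy()
        out.loc[~keep] = np.nan  # blank out rows outside the range, keep the shape
        return out
    return arr[keep]
```

Filling with NaN instead of dropping rows keeps the index aligned with other per-beat arrays, which may be why the option exists.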

update_summary_dict(array_names, arrays, *args, **kwargs) → None#: Update our summary dictionary with values from this feature. Can be overridden!

src.features.melody_features module#

Utility classes, functions, and variables used in extracting melodic features

class src.features.melody_features.ContourExtractor(my_notes: list[src.detect.midi_utils.Note])#

Bases: BaseExtractor

static count_nonzero(x) → int#: Simple wrapper around np.count_nonzero that removes NaN values from an array

static get_between(arr, i1, i2) → array#: From an array arr, get all onsets between an upper and lower bound i1 and i2 respectively

static huron_contour(pitches: list[int]) → str#

static quantile25(x) → float#: Simple wrapper around np.nanquantile with arguments set

static quantile75(x) → float#: Simple wrapper around np.nanquantile with arguments set

static truncate_df(arr: DataFrame | Series, low: float, high: float, col: str | None = None, fill_nans: bool = False) → DataFrame#

Truncate a dataframe or series between a low and high threshold.

Args:
    arr (pd.DataFrame | pd.Series): dataframe to truncate
    low (float): lower boundary for truncating
    high (float): upper boundary for truncating. Must be greater than low.
    col (str): array to use when truncating. Must be provided if isinstance(arr, pd.DataFrame)
    fill_nans (bool, optional): whether to replace values outside low and high with np.nan

Raises:
    AssertionError: if high < low

Returns:
    pd.DataFrame

update_summary_dict(array_names, arrays, *args, **kwargs) → None#: Update our summary dictionary with values from this feature. Can be overridden!

class src.features.melody_features.IntervalExtractor(my_notes: list[src.detect.midi_utils.Note])#

Bases: BaseExtractor

static count_nonzero(x) → int#: Simple wrapper around np.count_nonzero that removes NaN values from an array

static get_between(arr, i1, i2) → array#: From an array arr, get all onsets between an upper and lower bound i1 and i2 respectively

static modal_interval(intervals) → int#

static quantile25(x) → float#: Simple wrapper around np.nanquantile with arguments set

static quantile75(x) → float#: Simple wrapper around np.nanquantile with arguments set

static truncate_df(arr: DataFrame | Series, low: float, high: float, col: str | None = None, fill_nans: bool = False) → DataFrame#

Truncate a dataframe or series between a low and high threshold.

Args:
    arr (pd.DataFrame | pd.Series): dataframe to truncate
    low (float): lower boundary for truncating
    high (float): upper boundary for truncating. Must be greater than low.
    col (str): array to use when truncating. Must be provided if isinstance(arr, pd.DataFrame)
    fill_nans (bool, optional): whether to replace values outside low and high with np.nan

Raises:
    AssertionError: if high < low

Returns:
    pd.DataFrame

update_summary_dict(array_names, arrays, *args, **kwargs) → None#: Update our summary dictionary with values from this feature. Can be overridden!

class src.features.melody_features.MelodyChunkManager(extractor, mm: MelodyMaker, **kwargs)#

Bases: BaseExtractor

For a given MelodyMaker instance, applies the given extractor to all chunks and averages the results

NOTE_LB = 2#

static count_nonzero(x) → int#: Simple wrapper around np.count_nonzero that removes NaN values from an array

static get_between(arr, i1, i2) → array#: From an array arr, get all onsets between an upper and lower bound i1 and i2 respectively

static quantile25(x) → float#: Simple wrapper around np.nanquantile with arguments set

static quantile75(x) → float#: Simple wrapper around np.nanquantile with arguments set

static truncate_df(arr: DataFrame | Series, low: float, high: float, col: str | None = None, fill_nans: bool = False) → DataFrame#

Truncate a dataframe or series between a low and high threshold.

Args:
    arr (pd.DataFrame | pd.Series): dataframe to truncate
    low (float): lower boundary for truncating
    high (float): upper boundary for truncating. Must be greater than low.
    col (str): array to use when truncating. Must be provided if isinstance(arr, pd.DataFrame)
    fill_nans (bool, optional): whether to replace values outside low and high with np.nan

Raises:
    AssertionError: if high < low

Returns:
    pd.DataFrame

update_summary_dict(array_names, arrays, *args, **kwargs)#: Applies all the functions in summary_funcs to each array of values from the base extractor

class src.features.melody_features.PitchExtractor(my_notes: list[src.detect.midi_utils.Note])#

Bases: BaseExtractor

static count_nonzero(x) → int#: Simple wrapper around np.count_nonzero that removes NaN values from an array

static get_between(arr, i1, i2) → array#: From an array arr, get all onsets between an upper and lower bound i1 and i2 respectively

static quantile25(x) → float#: Simple wrapper around np.nanquantile with arguments set

static quantile75(x) → float#: Simple wrapper around np.nanquantile with arguments set

static truncate_df(arr: DataFrame | Series, low: float, high: float, col: str | None = None, fill_nans: bool = False) → DataFrame#

Truncate a dataframe or series between a low and high threshold.

Args:
    arr (pd.DataFrame | pd.Series): dataframe to truncate
    low (float): lower boundary for truncating
    high (float): upper boundary for truncating. Must be greater than low.
    col (str): array to use when truncating. Must be provided if isinstance(arr, pd.DataFrame)
    fill_nans (bool, optional): whether to replace values outside low and high with np.nan

Raises:
    AssertionError: if high < low

Returns:
    pd.DataFrame

update_summary_dict(array_names, arrays, *args, **kwargs) → None#: Update our summary dictionary with values from this feature. Can be overridden!

class src.features.melody_features.TonalityExtractor(my_notes: list[src.detect.midi_utils.Note])#

Bases: BaseExtractor

MAJ_PROFILE = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]#

MIN_PROFILE = [6.33, 2.68, 3.52, 5.38, 2.6, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]#

static count_nonzero(x) → int#: Simple wrapper around np.count_nonzero that removes NaN values from an array

static get_between(arr, i1, i2) → array#: From an array arr, get all onsets between an upper and lower bound i1 and i2 respectively

krumhansl_schmuckler(notes: list[str]) → dict#

Implements the Krumhansl-Schmuckler key-finding algorithm.

Returns a dictionary of tonalities and corresponding correlation coefficients for a given list of notes.
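As a rough illustration of the key-finding step, the sketch below correlates a pitch-class count vector with each of the 24 rotations of the MAJ_PROFILE and MIN_PROFILE values listed above. It assumes that notes are spelled with sharps only and that each note counts once (no duration weighting); the function name, the key labels, and the PITCH_CLASSES list are illustrative and not taken from the package.

```python
import numpy as np

# Profile values copied from the class attributes documented above
MAJ_PROFILE = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MIN_PROFILE = [6.33, 2.68, 3.52, 5.38, 2.6, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]
# Sharps-only spelling is an assumption made for this sketch
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def krumhansl_schmuckler_sketch(notes):
    """Map each candidate key to the correlation between its profile and the input."""
    counts = np.zeros(12)
    for name in notes:
        counts[PITCH_CLASSES.index(name)] += 1
    scores = {}
    for tonic in range(12):
        for mode, profile in (("major", MAJ_PROFILE), ("minor", MIN_PROFILE)):
            # Rotate so that the tonic weight (index 0) lines up with this tonic
            rotated = np.roll(profile, tonic)
            r = np.corrcoef(counts, rotated)[0, 1]
            scores[f"{PITCH_CLASSES[tonic]} {mode}"] = float(r)
    return scores


scores = krumhansl_schmuckler_sketch(list("CDEFGAB"))
best_key = max(scores, key=scores.get)
```

The key with the highest coefficient is the usual estimate; for the seven notes of a C major scale this sketch selects "C major".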

static quantile25(x) → float#: Simple wrapper around np.nanquantile with arguments set

static quantile75(x) → float#: Simple wrapper around np.nanquantile with arguments set

static truncate_df(arr: DataFrame | Series, low: float, high: float, col: str | None = None, fill_nans: bool = False) → DataFrame#

Truncate a dataframe or series between a low and high threshold.

Args:
    arr (pd.DataFrame | pd.Series): dataframe to truncate
    low (float): lower boundary for truncating
    high (float): upper boundary for truncating. Must be greater than low.
    col (str): array to use when truncating. Must be provided if isinstance(arr, pd.DataFrame)
    fill_nans (bool, optional): whether to replace values outside low and high with np.nan

Raises:
    AssertionError: if high < low

Returns:
    pd.DataFrame

update_summary_dict(array_names, arrays, *args, **kwargs) → None#: Update our summary dictionary with values from this feature. Can be overridden!

src.features.rhythm_features module#

Utility classes, functions, and variables used in extracting rhythmic features

class src.features.rhythm_features.Asynchrony(my_beats: Series, their_beats: DataFrame | Series)#

Bases: BaseExtractor

Extracts various features relating to asynchrony of onsets.

Many of these features rely on the definitions established in the onsetsync package (Eerola & Clayton, 2023), and are ported to Python here with minimal changes.

static count_nonzero(x) → int#: Simple wrapper around np.count_nonzero that removes NaN values from an array

extract_asynchronies(my_beats: Series, their_beats: DataFrame | Series) → dict#: Extract asynchrony between an instrument of interest and all other instruments and calculate functions

static get_between(arr, i1, i2) → array#: From an array arr, get all onsets between an upper and lower bound i1 and i2 respectively

static groupwise_asynchronization(asynchronies: Series) → float#: Extract the root-mean-square (RMS) of the pairwise asynchronizations.

static mean_absolute_asynchrony(asynchronies: Series) → float#: Extract the mean of all unsigned asynchrony values.

static mean_pairwise_asynchrony(asynchronies: Series) → float#: Extract the mean of all signed asynchrony values.

mean_relative_asynchrony(my_beats, their_beats: Series | DataFrame) → float#: Extract the mean position of an instrument’s onsets relative to the average position of the group

static pairwise_asynchronization(asynchronies: Series) → float#

Extract the standard deviation of the asynchronies of a pair of instruments.

Eerola & Clayton (2023) use the sample standard deviation rather than the population standard deviation, so the correction term ddof in np.nanstd is set to 1 to match their definition.

Parameters:
    asynchronies (np.array): the onset time differences between two instruments

Returns:
    float
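The effect of the ddof correction, together with the signed and unsigned means documented above, can be sketched with NumPy. The `_sketch` function names are illustrative; only np.nanstd, np.nanmean, and the ddof argument come from NumPy itself.

```python
import numpy as np


def pairwise_asynchronization_sketch(asynchronies):
    """Sample standard deviation (ddof=1) of the asynchronies, ignoring NaN."""
    return float(np.nanstd(asynchronies, ddof=1))


def mean_absolute_asynchrony_sketch(asynchronies):
    """Mean of the unsigned asynchronies, ignoring NaN."""
    return float(np.nanmean(np.abs(asynchronies)))


def mean_pairwise_asynchrony_sketch(asynchronies):
    """Mean of the signed asynchronies, ignoring NaN."""
    return float(np.nanmean(asynchronies))


# Example onset time differences in seconds, with one missing match
example = np.array([0.01, -0.02, 0.03, np.nan])
sample_sd = pairwise_asynchronization_sketch(example)
population_sd = float(np.nanstd(example))  # NumPy default, ddof=0
```

With ddof=1 the divisor is n - 1 rather than n, so the sample value is always slightly larger than the NumPy default for the same data.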

static quantile25(x) → float#: Simple wrapper around np.nanquantile with arguments set

static quantile75(x) → float#: Simple wrapper around np.nanquantile with arguments set

static truncate_df(arr: DataFrame | Series, low: float, high: float, col: str | None = None, fill_nans: bool = False) → DataFrame#

Truncate a dataframe or series between a low and high threshold.

Args:
    arr (pd.DataFrame | pd.Series): dataframe to truncate
    low (float): lower boundary for truncating
    high (float): upper boundary for truncating. Must be greater than low.
    col (str): array to use when truncating. Must be provided if isinstance(arr, pd.DataFrame)
    fill_nans (bool, optional): whether to replace values outside low and high with np.nan

Raises:
    AssertionError: if high < low

Returns:
    pd.DataFrame

update_summary_dict(array_names, arrays, *args, **kwargs) → None#: Update our summary dictionary with values from this feature. Can be overridden!

class src.features.rhythm_features.BeatUpbeatRatio(my_onsets, my_beats, clean_outliers: bool = True)#

Bases: BaseExtractor

Extract various features related to beat-upbeat ratios (BURs)

HIGH_THRESH = 4#

LOW_THRESH = 0.25#

static count_nonzero(x) → int#: Simple wrapper around np.count_nonzero that removes NaN values from an array

extract_burs(my_onsets: array, my_beats: array, use_log_burs: bool = False) → DataFrame#

Extracts beat-upbeat ratio (BUR) values from an array of onsets.

The beat-upbeat ratio is introduced in [1] as a concept for analyzing the individual amount of ‘swing’ in two consecutive eighth note beat durations. It is calculated simply by dividing the duration of the first, ‘long’ eighth note beat by the second, ‘short’ beat. A BUR value of 2 indicates ‘perfect’ swing, i.e. a triplet quarter note followed by a triplet eighth note, while a BUR of 1 indicates ‘even’ eighth note durations.

Arguments:
    my_onsets (np.array, optional): the array of raw onsets.
    my_beats (np.array, optional): the array of crotchet beat positions.
    use_log_burs (bool, optional): whether to use the log2 of inter-onset intervals to calculate BURs, as employed in [2]. Defaults to False.

Returns:
    np.array: the calculated BUR values

References:

[1]: Benadon, F. (2006). Slicing the Beat: Jazz Eighth-Notes as Expressive Microrhythm. Ethnomusicology, 50(1), 73-98.
[2]: Corcoran, C., & Frieler, K. (2021). Playing It Straight: Analyzing Jazz Soloists’ Swing Eighth-Note Distributions with the Weimar Jazz Database. Music Perception, 38(4), 372–385.
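The BUR definition above can be sketched in plain Python. This is a simplified illustration: it assumes that a BUR is only defined when exactly one onset falls strictly between two consecutive beats, and that the `low` and `high` arguments play the role of LOW_THRESH and HIGH_THRESH as outlier bounds. Both points are assumptions, and the function name is hypothetical.

```python
import math


def extract_burs_sketch(onsets, beats, use_log_burs=False, low=0.25, high=4.0):
    """Return one BUR (or NaN) for each pair of consecutive beats."""
    burs = []
    for b1, b2 in zip(beats, beats[1:]):
        # Candidate upbeats: onsets strictly inside the current beat
        between = [o for o in onsets if b1 < o < b2]
        if len(between) != 1:
            burs.append(math.nan)  # no single upbeat, so no BUR
            continue
        upbeat = between[0]
        # Duration of the first ("long") eighth divided by the second ("short")
        bur = (upbeat - b1) / (b2 - upbeat)
        if not low <= bur <= high:
            burs.append(math.nan)  # treated as an outlier in this sketch
            continue
        burs.append(math.log2(bur) if use_log_burs else bur)
    return burs


beats = [0.0, 1.0, 2.0, 3.0]
onsets = [0.0, 2 / 3, 1.0, 1.5, 2.0, 3.0]
burs = extract_burs_sketch(onsets, beats)
log_burs = extract_burs_sketch(onsets, beats, use_log_burs=True)
```

Here the first beat is swung (BUR close to 2), the second is even (BUR of 1), and the third has no upbeat and yields NaN. On the log2 scale these become roughly 1 and 0, which makes even eighths the zero point.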

static get_between(arr, i1, i2) → array#: From an array arr, get all onsets between an upper and lower bound i1 and i2 respectively

static quantile25(x) → float#: Simple wrapper around np.nanquantile with arguments set

static quantile75(x) → float#: Simple wrapper around np.nanquantile with arguments set

static truncate_df(arr: DataFrame | Series, low: float, high: float, col: str | None = None, fill_nans: bool = False) → DataFrame#

Truncate a dataframe or series between a low and high threshold.

Args:
    arr (pd.DataFrame | pd.Series): dataframe to truncate
    low (float): lower boundary for truncating
    high (float): upper boundary for truncating. Must be greater than low.
    col (str): array to use when truncating. Must be provided if isinstance(arr, pd.DataFrame)
    fill_nans (bool, optional): whether to replace values outside low and high with np.nan

Raises:
    AssertionError: if high < low

Returns:
    pd.DataFrame

update_summary_dict(array_names, arrays, *args, **kwargs) → None#: Update our summary dictionary with values from this feature. Can be overridden!

class src.features.rhythm_features.IOIComplexity(my_onsets: array, downbeats: array, tempo: float, time_signature: int, bar_period: int = 4)#

Bases: BaseExtractor

Extracts features relating to the complexity and density of inter-onset intervals.

_bin_ioi(ioi: float) → float#: Bins an IOI as a proportion of a quarter note at the given time signature

_get_summary_dict() → dict#: Gets summary variables for this feature

alphabet = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']#

bin_iois(my_onsets: array, downbeats: array) → list#: Bins all IOIs within my_onsets according to the beats in downbeats

col_names = ['bar_range', 'lz77', 'n_onsets']#

static count_nonzero(x) → int#: Simple wrapper around np.count_nonzero that removes NaN values from an array

extract_complexity(binned_iois: array) → Generator#: Extracts complexity scores for all inter-onset intervals in binned_iois

fracs = [1, 0.5, 0.4166666666666667, 0.375, 0.3333333333333333, 0.25, 0.16666666666666666, 0.125, 0.08333333333333333, 0]#
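One plausible reading of the fracs and alphabet attributes is that each IOI is snapped to the nearest listed proportion and then encoded as the letter at the same index. The sketch below illustrates that reading only; the pairing of the two lists, the nearest-value rule, and the `reference_duration` parameter are all assumptions rather than documented behaviour.

```python
import numpy as np

# Same values as the fracs attribute above, written as fractions
FRACS = [1, 1 / 2, 5 / 12, 3 / 8, 1 / 3, 1 / 4, 1 / 6, 1 / 8, 1 / 12, 0]
ALPHABET = list("abcdefghij")


def bin_ioi_sketch(ioi, reference_duration):
    """Snap an IOI to the nearest proportion and return (proportion, letter).

    reference_duration is a hypothetical parameter: the duration (in seconds)
    that corresponds to a proportion of 1.
    """
    proportion = ioi / reference_duration
    idx = int(np.argmin(np.abs(np.array(FRACS) - proportion)))
    return FRACS[idx], ALPHABET[idx]
```

Encoding each binned IOI as a single letter turns a bar of rhythm into a short string, which is a convenient input for a compression-based complexity measure.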

static get_between(arr, i1, i2) → array#: From an array arr, get all onsets between an upper and lower bound i1 and i2 respectively

static lz77_compress(data: array, window_size: int = 4096) → list#: Runs the LZ77 compression algorithm over the input data, with given window_size
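For orientation, a textbook LZ77 encoder is sketched below. It is not the package's implementation: the token format (offset, length, next symbol) and the decoder are assumptions added to make the example self-checking, and only the `window_size` parameter mirrors the documented signature.

```python
def lz77_compress_sketch(data, window_size=4096):
    """Encode data as a list of (offset, length, next_symbol) tokens."""
    i, tokens = 0, []
    while i < len(data):
        best_len, best_off = 0, 0
        # Search the sliding window for the longest match starting at i
        for j in range(max(0, i - window_size), i):
            k = 0
            # Stop one short of the end so a literal symbol always remains
            while i + k < len(data) - 1 and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:
                best_len, best_off = k, i - j
        tokens.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return tokens


def lz77_decompress_sketch(tokens):
    """Invert lz77_compress_sketch (used here only to check the round trip)."""
    out = []
    for offset, length, symbol in tokens:
        for _ in range(length):
            out.append(out[-offset])  # copy one symbol at a time so overlaps work
        out.append(symbol)
    return out


repetitive = lz77_compress_sketch("abcabcabc")
varied = lz77_compress_sketch("abcdefghi")
```

A repetitive sequence needs fewer tokens than a varied one of the same length, so the token count can serve as a simple complexity score.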

static quantile25(x) → float#: Simple wrapper around np.nanquantile with arguments set

static quantile75(x) → float#: Simple wrapper around np.nanquantile with arguments set

static truncate_df(arr: DataFrame | Series, low: float, high: float, col: str | None = None, fill_nans: bool = False) → DataFrame#

Truncate a dataframe or series between a low and high threshold.

Args:
    arr (pd.DataFrame | pd.Series): dataframe to truncate
    low (float): lower boundary for truncating
    high (float): upper boundary for truncating. Must be greater than low.
    col (str): array to use when truncating. Must be provided if isinstance(arr, pd.DataFrame)
    fill_nans (bool, optional): whether to replace values outside low and high with np.nan

Raises:
    AssertionError: if high < low

Returns:
    pd.DataFrame

update_summary_dict(array_names, arrays, *args, **kwargs) → None#: Update our summary dictionary with values from this feature. Can be overridden!

class src.features.rhythm_features.PhaseCorrection(my_beats: Series, their_beats: DataFrame | Series | None = None, order: int = 1, **kwargs)#

Bases: BaseExtractor

Extract various features related to phase correction

Args:
    my_beats (pd.Series): onsets of instrument to model
    their_beats (pd.DataFrame | pd.Series, optional): onsets of other instrument(s), defaults to None
    order (int, optional): the order of the model to create, defaults to 1 (i.e. 1st-order model, no lagged terms)
    iqr_filter (bool, optional): whether to apply an iqr filter to data, defaults to False
    difference_iois (bool, optional): whether to take the first difference of IOI values, defaults to True
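A first-order linear phase correction model can be sketched as an ordinary least-squares fit. The version below assumes one partner and regresses each inter-onset interval on the previous interval and on the asynchrony at the beat between them. That model form, the coefficient names, and the use of np.linalg.lstsq are assumptions for illustration; the sketch also omits the IQR filtering and IOI differencing options listed above.

```python
import numpy as np


def fit_phase_correction_sketch(my_beats, their_beats):
    """Fit next_ioi = intercept + self_coupling * prev_ioi + coupling * asynchrony."""
    my_beats = np.asarray(my_beats, dtype=float)
    their_beats = np.asarray(their_beats, dtype=float)
    iois = np.diff(my_beats)
    prev_ioi = iois[:-1]   # interval ending at beat t
    next_ioi = iois[1:]    # interval starting at beat t
    # Asynchrony (partner minus self) at the beat shared by both intervals
    asynchrony = (their_beats - my_beats)[1:-1]
    design = np.column_stack([np.ones_like(prev_ioi), prev_ioi, asynchrony])
    coefs, *_ = np.linalg.lstsq(design, next_ioi, rcond=None)
    names = ["intercept", "self_coupling", "coupling"]
    return {name: float(c) for name, c in zip(names, coefs)}
```

The coupling coefficient indicates how strongly a performer adjusts to the partner: a positive value means the next interval lengthens when the partner was late relative to the performer.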

static count_nonzero(x) → int#: Simple wrapper around np.count_nonzero that removes NaN values from an array

extract_model_coefficients() → Generator#: Extracts coefficients from linear phase correction model and format them correctly

format_array(arr: array, iqr_filter: bool | None = None, difference_iois: bool | None = None, standardize: bool | None = None) → Series#: Applies formatting to a single array used in creating the model

format_async_arrays(their_beats: Series | DataFrame | None, my_beats: Series) → DataFrame#: Format our asynchrony columns

generate_model(my_beats: Series, their_beats: DataFrame | Series | None) → RegressionResultsWrapper#: Generate the phase correction linear regression model

static get_between(arr, i1, i2) → array#: From an array arr, get all onsets between an upper and lower bound i1 and i2 respectively

static quantile25(x) → float#: Simple wrapper around np.nanquantile with arguments set

static quantile75(x) → float#: Simple wrapper around np.nanquantile with arguments set

shifter(arr: array) → Generator#: Shift an input array by the required number of beats and return a generator

truncate(my_beats, their_beats) → tuple#: Truncates our input data between given low and high thresholds

static truncate_df(arr: DataFrame | Series, low: float, high: float, col: str | None = None, fill_nans: bool = False) → DataFrame#

Truncate a dataframe or series between a low and high threshold.

Args:
    arr (pd.DataFrame | pd.Series): dataframe to truncate
    low (float): lower boundary for truncating
    high (float): upper boundary for truncating. Must be greater than low.
    col (str): array to use when truncating. Must be provided if isinstance(arr, pd.DataFrame)
    fill_nans (bool, optional): whether to replace values outside low and high with np.nan

Raises:
    AssertionError: if high < low

Returns:
    pd.DataFrame

update_summary_dict(array_names, arrays, *args, **kwargs) → None#: Update our summary dictionary with values from this feature. Can be overridden!

class src.features.rhythm_features.ProportionalAsynchrony(summary_df: DataFrame, my_instr_name: str, metre_col: str = 'metre_manual')#

Bases: BaseExtractor

Extracts features relating to the proportional asynchrony between performers.

LOWER_BOUND = 0.03125#

REF_INSTR = 'drums'#

UPPER_BOUND = 0.0625#

static _extract_async_stats(mean_async: array, my_instr_name: str) → dict#: Extracts asynchrony stats from all pairwise combinations of instruments and returns a dictionary

_extract_proportional_durations(summary_df: DataFrame) → Generator#: Extracts proportional beat values for all instruments

_format_async_df(async_df: DataFrame) → DataFrame#: Coerces asynchrony dataframe into correct format

static count_nonzero(x) → int#: Simple wrapper around np.count_nonzero that removes NaN values from an array

static get_between(arr, i1, i2) → array#: From an array arr, get all onsets between an upper and lower bound i1 and i2 respectively

static quantile25(x) → float#: Simple wrapper around np.nanquantile with arguments set

static quantile75(x) → float#: Simple wrapper around np.nanquantile with arguments set

static truncate_df(arr: DataFrame | Series, low: float, high: float, col: str | None = None, fill_nans: bool = False) → DataFrame#

Truncate a dataframe or series between a low and high threshold.

Args:
    arr (pd.DataFrame | pd.Series): dataframe to truncate
    low (float): lower boundary for truncating
    high (float): upper boundary for truncating. Must be greater than low.
    col (str): array to use when truncating. Must be provided if isinstance(arr, pd.DataFrame)
    fill_nans (bool, optional): whether to replace values outside low and high with np.nan

Raises:
    AssertionError: if high < low

Returns:
    pd.DataFrame

update_summary_dict(array_names, arrays, *args, **kwargs) → None#: Update our summary dictionary with values from this feature. Can be overridden!

class src.features.rhythm_features.RollingIOISummaryStats(my_onsets: Series, downbeats, order: int = 4, **kwargs)#

Bases: IOISummaryStats

Extracts the statistics in IOISummaryStats on a rolling basis; the window defaults to a length of 4 bars

static binary_entropy(iois: Series) → float#: Extract the Shannon entropy from an iterable

static count_nonzero(x) → int#: Simple wrapper around np.count_nonzero that removes NaN values from an array

extract_rolling_statistics(my_onsets: Series, downbeats: array, **kwargs) → dict#: Extract rolling summary statistics across the given bar period

static get_between(arr, i1, i2) → array#: From an array arr, get all onsets between an upper and lower bound i1 and i2 respectively

static lempel_ziv_complexity(iois: Series) → float#: Extract complexity from a binary sequence using the Lempel-Ziv compression algorithm

static npvi(iois: Series) → float#: Extract the normalised pairwise variability index (nPVI) from an iterable
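The nPVI and Shannon entropy measures named above have standard definitions, sketched here in plain Python. The function names are illustrative, and the sketch does not handle NaN values or sequences shorter than two intervals, which the package may treat differently.

```python
import math
from collections import Counter


def npvi_sketch(iois):
    """Normalised pairwise variability index of a sequence of durations.

    nPVI = 100 / (m - 1) * sum(|d_k - d_{k+1}| / ((d_k + d_{k+1}) / 2))
    """
    terms = [abs(a - b) / ((a + b) / 2) for a, b in zip(iois, iois[1:])]
    return 100 * sum(terms) / len(terms)


def shannon_entropy_sketch(symbols):
    """Shannon entropy (in bits) of the distribution of values in symbols."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

Identical successive durations give an nPVI of 0, and larger contrasts between neighbouring durations push the value towards its upper limit of 200.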

static quantile25(x) → float#: Simple wrapper around np.nanquantile with arguments set

static quantile75(x) → float#: Simple wrapper around np.nanquantile with arguments set

static truncate_df(arr: DataFrame | Series, low: float, high: float, col: str | None = None, fill_nans: bool = False) → DataFrame#

Truncate a dataframe or series between a low and high threshold.

Args:
    arr (pd.DataFrame | pd.Series): dataframe to truncate
    low (float): lower boundary for truncating
    high (float): upper boundary for truncating. Must be greater than low.
    col (str): array to use when truncating. Must be provided if isinstance(arr, pd.DataFrame)
    fill_nans (bool, optional): whether to replace values outside low and high with np.nan

Raises:
    AssertionError: if high < low

Returns:
    pd.DataFrame

update_summary_dict(array_names, arrays, *args, **kwargs) → None#: Update our summary dictionary with values from this feature. Can be overridden!

class src.features.rhythm_features.TempoSlope(my_beats: Series)#

Bases: BaseExtractor

Extract features related to tempo slope, i.e. instantaneous tempo change (in beats-per-minute) per second

static count_nonzero(x) → int#: Simple wrapper around np.count_nonzero that removes NaN values from an array

static extract_tempo_slope(my_beats: array, my_bpms: array) → RegressionResultsWrapper | None#: Create the tempo slope regression model
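The tempo slope regression can be approximated with a straight-line fit of BPM against beat time. The sketch below uses np.polyfit as a stand-in for the regression model that the documented method returns, and gives back a (slope, intercept) tuple instead of a RegressionResultsWrapper. The NaN handling and the None return for too few points are assumptions.

```python
import numpy as np


def tempo_slope_sketch(my_beats, my_bpms):
    """Return (slope, intercept) of BPM regressed on beat time, or None."""
    my_beats = np.asarray(my_beats, dtype=float)
    my_bpms = np.asarray(my_bpms, dtype=float)
    # Drop beats where either the timestamp or the tempo is missing
    valid = ~(np.isnan(my_beats) | np.isnan(my_bpms))
    if valid.sum() < 2:
        return None  # a line cannot be fitted to fewer than two points
    slope, intercept = np.polyfit(my_beats[valid], my_bpms[valid], deg=1)
    return float(slope), float(intercept)


times = np.arange(0, 10, 0.5)
result = tempo_slope_sketch(times, 120 + 0.5 * times)
```

The slope is in beats-per-minute per second, so this synthetic performance, which speeds up by 0.5 BPM every second from a starting tempo of 120 BPM, gives a slope of about 0.5.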

static get_between(arr, i1, i2) → array#: From an array arr, get all onsets between an upper and lower bound i1 and i2 respectively

static quantile25(x) → float#: Simple wrapper around np.nanquantile with arguments set

static quantile75(x) → float#: Simple wrapper around np.nanquantile with arguments set

static truncate_df(arr: DataFrame | Series, low: float, high: float, col: str | None = None, fill_nans: bool = False) → DataFrame#

Truncate a dataframe or series between a low and high threshold.

Args:
    arr (pd.DataFrame | pd.Series): dataframe to truncate
    low (float): lower boundary for truncating
    high (float): upper boundary for truncating. Must be greater than low.
    col (str): array to use when truncating. Must be provided if isinstance(arr, pd.DataFrame)
    fill_nans (bool, optional): whether to replace values outside low and high with np.nan

Raises:
    AssertionError: if high < low

Returns:
    pd.DataFrame

update_summary_dict(array_names, arrays, *args, **kwargs) → None#: Update the summary dictionary with tempo slope and drift coefficients

src.features.rhythm_features.get_beats_from_matched_onsets(summary_dict: dict) → DataFrame#: Gets mean beat timestamps from a summary_dict marked by more than two instruments in the trio

src.features.simulations_utils module#

Classes used for creating ensemble coordination simulations from the phase correction model

class src.features.simulations_utils.Simulation(params_dict, n_beats: int = 100, tempo: int = 120)#

Bases: object

Creates a single simulated performance with given params_dict

static _format_dict(python_dict: dict) → Dict#: Converts a Python dictionary into a type that can be utilised by Numba

_get_async_cls() → Asynchrony#: Gets all src.features.rhythm_features.Asynchrony classes for all instruments

_get_async_rms() → float#: Gets root-mean-square of all pairwise asynchrony values

_get_bpm_values()#: Gets beats-per-minute values from the simulation dataframe

_get_initial_data(init_instr: str) → Dict#: Gets initial starter data for use when creating the simulation

static _simulation_dispatcher(data_: tuple, params_: tuple) → tuple#: Creates one simulated performance, optimized with numba

run_simulation()#: Dispatcher function for a single simulation

starting_onset = 0#

class src.features.simulations_utils.SimulationManager(coupling_params, tempo: int = 120, n_sims: int = 500, n_beats: int = 100, n_jobs: int = -1)#

Bases: object

Manager for creating and handling multiple Simulation instances.

backend = 'threads'#

get_mean_bpm() → Series#: Returns average BPM value of all simulations in this simulation manager

get_mean_rms() → float#: Returns average RMS asynchrony value from all simulations in this simulation manager

get_rms_values() → array#: Returns all RMS asynchrony values from all simulations in this simulation manager

run_simulations()#: Runs all simulations and returns the SimulationManager instance

verbosity = 5#
