src.features package#
Submodules#
src.features.features_utils module#
Utility classes, functions, and variables used in the feature extraction process
- class src.features.features_utils.BaseExtractor#
Bases:
object
Base feature extraction class, with some methods that are useful for all classes
- static count_nonzero(x) int #
Simple wrapper around np.count_nonzero that removes NaN values from an array
- static get_between(arr, i1, i2) array #
From an array arr, get all onsets between an upper and lower bound i1 and i2 respectively
- static quantile25(x) float #
Simple wrapper around np.nanquantile with arguments set
- static quantile75(x) float #
Simple wrapper around np.nanquantile with arguments set
- static truncate_df(arr: DataFrame | Series, low: float, high: float, col: str | None = None, fill_nans: bool = False) DataFrame #
Truncate a dataframe or series between a low and high threshold.
- Args:
arr (pd.DataFrame | pd.Series): dataframe to truncate
low (float): lower boundary for truncating
high (float): upper boundary for truncating. Must be greater than low.
col (str): array to use when truncating. Must be provided if isinstance(arr, pd.DataFrame)
fill_nans (bool, optional): whether to replace values outside low and high with np.nan
- Raises:
AssertionError: if high < low
- Returns:
pd.DataFrame
- update_summary_dict(array_names, arrays, *args, **kwargs) None #
Update our summary dictionary with values from this feature. Can be overridden!
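The NaN-aware helpers above can be sketched roughly as follows. This is an illustrative reimplementation, not the package source: the quantile level of 0.25 is inferred from the method name, and get_between is assumed to take i1 as the lower bound and i2 as the upper bound, both inclusive.

```python
import numpy as np


def count_nonzero(x) -> int:
    """Count non-zero entries after dropping NaN values.

    np.count_nonzero on its own treats NaN as non-zero, hence the filter.
    """
    arr = np.asarray(x, dtype=float)
    return int(np.count_nonzero(arr[~np.isnan(arr)]))


def quantile25(x) -> float:
    """NaN-aware lower quartile (q=0.25 is an assumption based on the name)."""
    return float(np.nanquantile(x, 0.25))


def get_between(arr, i1, i2):
    """Return the values of arr inside [i1, i2] (inclusive bounds assumed)."""
    arr = np.asarray(arr, dtype=float)
    return arr[(arr >= i1) & (arr <= i2)]
```

The quantile75 helper would presumably follow the same pattern with q=0.75.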
src.features.melody_features module#
Utility classes, functions, and variables used in extracting melodic features
- class src.features.melody_features.ContourExtractor(my_notes: list[src.detect.midi_utils.Note])#
Bases:
BaseExtractor
- static count_nonzero(x) int #
Simple wrapper around np.count_nonzero that removes NaN values from an array
- static get_between(arr, i1, i2) array #
From an array arr, get all onsets between an upper and lower bound i1 and i2 respectively
- static huron_contour(pitches: list[int]) str #
- static quantile25(x) float #
Simple wrapper around np.nanquantile with arguments set
- static quantile75(x) float #
Simple wrapper around np.nanquantile with arguments set
- static truncate_df(arr: DataFrame | Series, low: float, high: float, col: str | None = None, fill_nans: bool = False) DataFrame #
Truncate a dataframe or series between a low and high threshold.
- Args:
arr (pd.DataFrame | pd.Series): dataframe to truncate
low (float): lower boundary for truncating
high (float): upper boundary for truncating. Must be greater than low.
col (str): array to use when truncating. Must be provided if isinstance(arr, pd.DataFrame)
fill_nans (bool, optional): whether to replace values outside low and high with np.nan
- Raises:
AssertionError: if high < low
- Returns:
pd.DataFrame
- update_summary_dict(array_names, arrays, *args, **kwargs) None #
Update our summary dictionary with values from this feature. Can be overridden!
- class src.features.melody_features.IntervalExtractor(my_notes: list[src.detect.midi_utils.Note])#
Bases:
BaseExtractor
- static count_nonzero(x) int #
Simple wrapper around np.count_nonzero that removes NaN values from an array
- static get_between(arr, i1, i2) array #
From an array arr, get all onsets between an upper and lower bound i1 and i2 respectively
- static modal_interval(intervals) int #
- static quantile25(x) float #
Simple wrapper around np.nanquantile with arguments set
- static quantile75(x) float #
Simple wrapper around np.nanquantile with arguments set
- static truncate_df(arr: DataFrame | Series, low: float, high: float, col: str | None = None, fill_nans: bool = False) DataFrame #
Truncate a dataframe or series between a low and high threshold.
- Args:
arr (pd.DataFrame | pd.Series): dataframe to truncate
low (float): lower boundary for truncating
high (float): upper boundary for truncating. Must be greater than low.
col (str): array to use when truncating. Must be provided if isinstance(arr, pd.DataFrame)
fill_nans (bool, optional): whether to replace values outside low and high with np.nan
- Raises:
AssertionError: if high < low
- Returns:
pd.DataFrame
- update_summary_dict(array_names, arrays, *args, **kwargs) None #
Update our summary dictionary with values from this feature. Can be overridden!
- class src.features.melody_features.MelodyChunkManager(extractor, mm: MelodyMaker, **kwargs)#
Bases:
BaseExtractor
For a given MelodyMaker instance, applies the given extractor to all chunks and averages the results
- NOTE_LB = 2#
- static count_nonzero(x) int #
Simple wrapper around np.count_nonzero that removes NaN values from an array
- static get_between(arr, i1, i2) array #
From an array arr, get all onsets between an upper and lower bound i1 and i2 respectively
- static quantile25(x) float #
Simple wrapper around np.nanquantile with arguments set
- static quantile75(x) float #
Simple wrapper around np.nanquantile with arguments set
- static truncate_df(arr: DataFrame | Series, low: float, high: float, col: str | None = None, fill_nans: bool = False) DataFrame #
Truncate a dataframe or series between a low and high threshold.
- Args:
arr (pd.DataFrame | pd.Series): dataframe to truncate
low (float): lower boundary for truncating
high (float): upper boundary for truncating. Must be greater than low.
col (str): array to use when truncating. Must be provided if isinstance(arr, pd.DataFrame)
fill_nans (bool, optional): whether to replace values outside low and high with np.nan
- Raises:
AssertionError: if high < low
- Returns:
pd.DataFrame
- update_summary_dict(array_names, arrays, *args, **kwargs)#
Applies all the functions in summary_funcs to each array of values from the base extractor
- class src.features.melody_features.PitchExtractor(my_notes: list[src.detect.midi_utils.Note])#
Bases:
BaseExtractor
- static count_nonzero(x) int #
Simple wrapper around np.count_nonzero that removes NaN values from an array
- static get_between(arr, i1, i2) array #
From an array arr, get all onsets between an upper and lower bound i1 and i2 respectively
- static quantile25(x) float #
Simple wrapper around np.nanquantile with arguments set
- static quantile75(x) float #
Simple wrapper around np.nanquantile with arguments set
- static truncate_df(arr: DataFrame | Series, low: float, high: float, col: str | None = None, fill_nans: bool = False) DataFrame #
Truncate a dataframe or series between a low and high threshold.
- Args:
arr (pd.DataFrame | pd.Series): dataframe to truncate
low (float): lower boundary for truncating
high (float): upper boundary for truncating. Must be greater than low.
col (str): array to use when truncating. Must be provided if isinstance(arr, pd.DataFrame)
fill_nans (bool, optional): whether to replace values outside low and high with np.nan
- Raises:
AssertionError: if high < low
- Returns:
pd.DataFrame
- update_summary_dict(array_names, arrays, *args, **kwargs) None #
Update our summary dictionary with values from this feature. Can be overridden!
- class src.features.melody_features.TonalityExtractor(my_notes: list[src.detect.midi_utils.Note])#
Bases:
BaseExtractor
- MAJ_PROFILE = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]#
- MIN_PROFILE = [6.33, 2.68, 3.52, 5.38, 2.6, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]#
- static count_nonzero(x) int #
Simple wrapper around np.count_nonzero that removes NaN values from an array
- static get_between(arr, i1, i2) array #
From an array arr, get all onsets between an upper and lower bound i1 and i2 respectively
- krumhansl_schmuckler(notes: list[str]) dict #
Implements the Krumhansl-Schmuckler key-finding algorithm.
Returns a dictionary of tonalities and corresponding correlation coefficients for a given list of notes.
- static quantile25(x) float #
Simple wrapper around np.nanquantile with arguments set
- static quantile75(x) float #
Simple wrapper around np.nanquantile with arguments set
- static truncate_df(arr: DataFrame | Series, low: float, high: float, col: str | None = None, fill_nans: bool = False) DataFrame #
Truncate a dataframe or series between a low and high threshold.
- Args:
arr (pd.DataFrame | pd.Series): dataframe to truncate
low (float): lower boundary for truncating
high (float): upper boundary for truncating. Must be greater than low.
col (str): array to use when truncating. Must be provided if isinstance(arr, pd.DataFrame)
fill_nans (bool, optional): whether to replace values outside low and high with np.nan
- Raises:
AssertionError: if high < low
- Returns:
pd.DataFrame
- update_summary_dict(array_names, arrays, *args, **kwargs) None #
Update our summary dictionary with values from this feature. Can be overridden!
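As a rough illustration of the Krumhansl-Schmuckler method listed for TonalityExtractor, the sketch below correlates a pitch-class count vector with each rotation of the MAJ_PROFILE and MIN_PROFILE values shown above. It is not the package implementation: the sharp-only note spelling, the use of raw counts rather than durations, and the helper names pearson and PITCH_CLASSES are all assumptions made for the example.

```python
MAJ_PROFILE = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MIN_PROFILE = [6.33, 2.68, 3.52, 5.38, 2.6, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]
# Hypothetical spelling convention: sharps only, one name per pitch class
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pearson(x, y) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def krumhansl_schmuckler(notes: list[str]) -> dict:
    """Map each of the 24 candidate keys to its profile correlation."""
    counts = [notes.count(pc) for pc in PITCH_CLASSES]
    scores = {}
    for tonic in range(12):
        # Rotate the counts so that index 0 lines up with the candidate tonic
        rotated = counts[tonic:] + counts[:tonic]
        scores[f"{PITCH_CLASSES[tonic]} major"] = pearson(rotated, MAJ_PROFILE)
        scores[f"{PITCH_CLASSES[tonic]} minor"] = pearson(rotated, MIN_PROFILE)
    return scores


melody = ["C", "D", "E", "F", "G", "A", "B", "C", "E", "G", "C"]
scores = krumhansl_schmuckler(melody)
best_key = max(scores, key=scores.get)  # "C major" for this melody
```

The key with the highest coefficient is usually taken as the estimate, while the full dictionary keeps the information needed to judge how ambiguous that estimate is.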
src.features.rhythm_features module#
Utility classes, functions, and variables used in extracting rhythmic features
- class src.features.rhythm_features.Asynchrony(my_beats: Series, their_beats: DataFrame | Series)#
Bases:
BaseExtractor
Extracts various features relating to asynchrony of onsets.
Many of these features rely on the definitions established in the onsetsync package (Eerola & Clayton, 2023), and are ported to Python here with minimal changes.
- static count_nonzero(x) int #
Simple wrapper around np.count_nonzero that removes NaN values from an array
- extract_asynchronies(my_beats: Series, their_beats: DataFrame | Series) dict #
Extract asynchronies between an instrument of interest and all other instruments, then calculate summary functions on them
- static get_between(arr, i1, i2) array #
From an array arr, get all onsets between an upper and lower bound i1 and i2 respectively
- static groupwise_asynchronization(asynchronies: Series) float #
Extract the root-mean-square (RMS) of the pairwise asynchronizations.
- static mean_absolute_asynchrony(asynchronies: Series) float #
Extract the mean of all unsigned asynchrony values.
- static mean_pairwise_asynchrony(asynchronies: Series) float #
Extract the mean of all signed asynchrony values.
- mean_relative_asynchrony(my_beats, their_beats: Series | DataFrame) float #
Extract the mean position of an instrument’s onsets relative to the average position of the group
- static pairwise_asynchronization(asynchronies: Series) float #
Extract the standard deviation of the asynchronies of a pair of instruments.
Eerola & Clayton (2023) use the sample standard deviation rather than the population standard deviation, so the correction term ddof in np.nanstd is set to 1 to match their definition.
- Parameters:
asynchronies (np.array): the onset time differences between two instruments
- Returns:
float
- static quantile25(x) float #
Simple wrapper around np.nanquantile with arguments set
- static quantile75(x) float #
Simple wrapper around np.nanquantile with arguments set
- static truncate_df(arr: DataFrame | Series, low: float, high: float, col: str | None = None, fill_nans: bool = False) DataFrame #
Truncate a dataframe or series between a low and high threshold.
- Args:
arr (pd.DataFrame | pd.Series): dataframe to truncate
low (float): lower boundary for truncating
high (float): upper boundary for truncating. Must be greater than low.
col (str): array to use when truncating. Must be provided if isinstance(arr, pd.DataFrame)
fill_nans (bool, optional): whether to replace values outside low and high with np.nan
- Raises:
AssertionError: if high < low
- Returns:
pd.DataFrame
- update_summary_dict(array_names, arrays, *args, **kwargs) None #
Update our summary dictionary with values from this feature. Can be overridden!
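The summary statistics described for Asynchrony could be computed along the following lines. This is a minimal sketch over a single array of signed asynchronies, assuming NaN marks unmatched onsets; in particular, taking the RMS directly over the input values in groupwise_asynchronization is an assumption, and the package may aggregate differently.

```python
import numpy as np


def pairwise_asynchronization(asynchronies) -> float:
    """Sample standard deviation (ddof=1) of the asynchronies, ignoring NaN."""
    return float(np.nanstd(asynchronies, ddof=1))


def mean_absolute_asynchrony(asynchronies) -> float:
    """Mean of the unsigned asynchrony values."""
    return float(np.nanmean(np.abs(asynchronies)))


def mean_pairwise_asynchrony(asynchronies) -> float:
    """Mean of the signed asynchrony values."""
    return float(np.nanmean(asynchronies))


def groupwise_asynchronization(asynchronies) -> float:
    """Root-mean-square of the supplied values (input layout is assumed)."""
    return float(np.sqrt(np.nanmean(np.square(asynchronies))))


# Signed onset differences in seconds; NaN stands for a beat with no match
asyncs = [0.01, -0.02, 0.03, float("nan")]
signed_mean = mean_pairwise_asynchrony(asyncs)
unsigned_mean = mean_absolute_asynchrony(asyncs)
```

Because signed values can cancel out, the signed mean is smaller in magnitude than the unsigned mean whenever the instrument is sometimes ahead and sometimes behind.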
- class src.features.rhythm_features.BeatUpbeatRatio(my_onsets, my_beats, clean_outliers: bool = True)#
Bases:
BaseExtractor
Extract various features related to beat-upbeat ratios (BURs)
- HIGH_THRESH = 4#
- LOW_THRESH = 0.25#
- static count_nonzero(x) int #
Simple wrapper around np.count_nonzero that removes NaN values from an array
- extract_burs(my_onsets: array, my_beats: array, use_log_burs: bool = False) DataFrame #
Extracts beat-upbeat ratio (BUR) values from an array of onsets.
The beat-upbeat ratio is introduced in [1] as a concept for analyzing the individual amount of ‘swing’ in two consecutive eighth note beat durations. It is calculated simply by dividing the duration of the first, ‘long’ eighth note beat by the second, ‘short’ beat. A BUR value of 2 indicates ‘perfect’ swing, i.e. a triplet quarter note followed by a triplet eighth note, while a BUR of 1 indicates ‘even’ eighth note durations.
- Arguments:
my_onsets (np.array, optional): the array of raw onsets.
my_beats (np.array, optional): the array of crotchet beat positions.
use_log_burs (bool, optional): whether to use the base-2 logarithm (log2) of inter-onset intervals to calculate BURs, as employed in [2]. Defaults to False.
- Returns:
pd.DataFrame: the calculated BUR values
- References:
- [1]: Benadon, F. (2006). Slicing the Beat: Jazz Eighth-Notes as Expressive Microrhythm. Ethnomusicology, 50(1), 73–98.
- [2]: Corcoran, C., & Frieler, K. (2021). Playing It Straight: Analyzing Jazz Soloists’ Swing Eighth-Note Distributions with the Weimar Jazz Database. Music Perception, 38(4), 372–385.
- static get_between(arr, i1, i2) array #
From an array arr, get all onsets between an upper and lower bound i1 and i2 respectively
- static quantile25(x) float #
Simple wrapper around np.nanquantile with arguments set
- static quantile75(x) float #
Simple wrapper around np.nanquantile with arguments set
- static truncate_df(arr: DataFrame | Series, low: float, high: float, col: str | None = None, fill_nans: bool = False) DataFrame #
Truncate a dataframe or series between a low and high threshold.
- Args:
arr (pd.DataFrame | pd.Series): dataframe to truncate
low (float): lower boundary for truncating
high (float): upper boundary for truncating. Must be greater than low.
col (str): array to use when truncating. Must be provided if isinstance(arr, pd.DataFrame)
fill_nans (bool, optional): whether to replace values outside low and high with np.nan
- Raises:
AssertionError: if high < low
- Returns:
pd.DataFrame
- update_summary_dict(array_names, arrays, *args, **kwargs) None #
Update our summary dictionary with values from this feature. Can be overridden!
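The beat-upbeat ratio calculation described for extract_burs can be illustrated with a short sketch. It assumes that each beat position coincides with a played onset and that exactly one onset (the upbeat) falls between consecutive beats; other cases are returned as None. The thresholds reuse the LOW_THRESH and HIGH_THRESH values listed above, and the list-based return type is a simplification of the documented DataFrame.

```python
import math

LOW_THRESH, HIGH_THRESH = 0.25, 4  # values listed for BeatUpbeatRatio


def extract_burs(onsets, beats, use_log_burs=False, clean_outliers=True):
    """Return one BUR (or None) for every pair of consecutive beats."""
    burs = []
    for b1, b2 in zip(beats, beats[1:]):
        # Onsets falling strictly between the two beats
        between = [o for o in onsets if b1 < o < b2]
        if len(between) != 1:
            burs.append(None)  # no single upbeat, so the ratio is undefined
            continue
        upbeat = between[0]
        # Duration of the first ("long") eighth divided by the second ("short")
        bur = (upbeat - b1) / (b2 - upbeat)
        if clean_outliers and not LOW_THRESH <= bur <= HIGH_THRESH:
            burs.append(None)  # outside the plausible range
            continue
        burs.append(math.log2(bur) if use_log_burs else bur)
    return burs


beats = [0.0, 1.0, 2.0, 3.0]
onsets = [0.0, 0.75, 1.0, 1.5, 2.0, 2.96875, 3.0]
burs = extract_burs(onsets, beats)  # [3.0, 1.0, None]
```

In the usage example the second pair of eighth notes is perfectly even (BUR of 1), and the third is discarded because its ratio of 31 exceeds HIGH_THRESH.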
- class src.features.rhythm_features.IOIComplexity(my_onsets: array, downbeats: array, tempo: float, time_signature: int, bar_period: int = 4)#
Bases:
BaseExtractor
Extracts features relating to the complexity and density of inter-onset intervals.
- _bin_ioi(ioi: float) float #
Bins an IOI as a proportion of a quarter note at the given time signature
- _get_summary_dict() dict #
Gets summary variables for this feature
- alphabet = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']#
- bin_iois(my_onsets: array, downbeats: array) list #
Bins all IOIs within my_onsets according to the beats in downbeats
- col_names = ['bar_range', 'lz77', 'n_onsets']#
- static count_nonzero(x) int #
Simple wrapper around np.count_nonzero that removes NaN values from an array
- extract_complexity(binned_iois: array) Generator #
Extracts complexity scores for all inter-onset intervals in binned_iois
- fracs = [1, 0.5, 0.4166666666666667, 0.375, 0.3333333333333333, 0.25, 0.16666666666666666, 0.125, 0.08333333333333333, 0]#
- static get_between(arr, i1, i2) array #
From an array arr, get all onsets between an upper and lower bound i1 and i2 respectively
- static lz77_compress(data: array, window_size: int = 4096) list #
Runs the LZ77 compression algorithm over the input data, with given window_size
- static quantile25(x) float #
Simple wrapper around np.nanquantile with arguments set
- static quantile75(x) float #
Simple wrapper around np.nanquantile with arguments set
- static truncate_df(arr: DataFrame | Series, low: float, high: float, col: str | None = None, fill_nans: bool = False) DataFrame #
Truncate a dataframe or series between a low and high threshold.
- Args:
arr (pd.DataFrame | pd.Series): dataframe to truncate
low (float): lower boundary for truncating
high (float): upper boundary for truncating. Must be greater than low.
col (str): array to use when truncating. Must be provided if isinstance(arr, pd.DataFrame)
fill_nans (bool, optional): whether to replace values outside low and high with np.nan
- Raises:
AssertionError: if high < low
- Returns:
pd.DataFrame
- update_summary_dict(array_names, arrays, *args, **kwargs) None #
Update our summary dictionary with values from this feature. Can be overridden!
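For readers unfamiliar with LZ77, the sketch below shows a textbook version of the algorithm named in lz77_compress, operating on a sequence of symbols such as the letters in alphabet. It is an illustration only: the (offset, length, next symbol) token format and the lz77_decompress helper are assumptions, and the package's output format may differ.

```python
def lz77_compress(data, window_size: int = 4096) -> list:
    """Encode data as (offset, match_length, next_symbol) triples."""
    data = list(data)
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_off = 0, 0
        # Search the sliding window for the longest match starting at i
        for j in range(max(0, i - window_size), i):
            length = 0
            # Keep one symbol in reserve so every token ends with a literal
            while i + length < len(data) - 1 and data[j + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        tokens.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return tokens


def lz77_decompress(tokens) -> list:
    """Hypothetical inverse, used here only to check the round trip."""
    out = []
    for offset, length, symbol in tokens:
        for _ in range(length):
            out.append(out[-offset])  # copy from `offset` symbols back
        out.append(symbol)
    return out


repetitive = lz77_compress("abababab")  # [(0, 0, 'a'), (0, 0, 'b'), (2, 5, 'b')]
varied = lz77_compress("abcdefgh")      # eight literal tokens
```

A repetitive rhythm compresses to fewer tokens than a varied one of the same length, which is why the length of the compressed output can serve as a complexity score.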
- class src.features.rhythm_features.PhaseCorrection(my_beats: Series, their_beats: DataFrame | Series | None = None, order: int = 1, **kwargs)#
Bases:
BaseExtractor
Extract various features related to phase correction
- Args:
my_beats (pd.Series): onsets of instrument to model
their_beats (pd.DataFrame | pd.Series, optional): onsets of other instrument(s), defaults to None
order (int, optional): the order of the model to create, defaults to 1 (i.e. 1st-order model, no lagged terms)
iqr_filter (bool, optional): whether to apply an iqr filter to data, defaults to False
difference_iois (bool, optional): whether to take the first difference of IOI values, defaults to True
- static count_nonzero(x) int #
Simple wrapper around np.count_nonzero that removes NaN values from an array
- extract_model_coefficients() Generator #
Extracts coefficients from linear phase correction model and format them correctly
- format_array(arr: array, iqr_filter: bool | None = None, difference_iois: bool | None = None, standardize: bool | None = None) Series #
Applies formatting to a single array used in creating the model
- format_async_arrays(their_beats: Series | DataFrame | None, my_beats: Series) DataFrame #
Format our asynchrony columns
- generate_model(my_beats: Series, their_beats: DataFrame | Series | None) RegressionResultsWrapper #
Generate the phase correction linear regression model
- static get_between(arr, i1, i2) array #
From an array arr, get all onsets between an upper and lower bound i1 and i2 respectively
- static quantile25(x) float #
Simple wrapper around np.nanquantile with arguments set
- static quantile75(x) float #
Simple wrapper around np.nanquantile with arguments set
- shifter(arr: array) Generator #
Shift an input array by the required number of beats and return a generator
- truncate(my_beats, their_beats) tuple #
Truncates our input data between given low and high thresholds
- static truncate_df(arr: DataFrame | Series, low: float, high: float, col: str | None = None, fill_nans: bool = False) DataFrame #
Truncate a dataframe or series between a low and high threshold.
- Args:
arr (pd.DataFrame | pd.Series): dataframe to truncate
low (float): lower boundary for truncating
high (float): upper boundary for truncating. Must be greater than low.
col (str): array to use when truncating. Must be provided if isinstance(arr, pd.DataFrame)
fill_nans (bool, optional): whether to replace values outside low and high with np.nan
- Raises:
AssertionError: if high < low
- Returns:
pd.DataFrame
- update_summary_dict(array_names, arrays, *args, **kwargs) None #
Update our summary dictionary with values from this feature. Can be overridden!
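A first-order linear phase correction model of the kind PhaseCorrection describes can be sketched as an ordinary least-squares fit. The exact regression terms used by the package are not given here, so the formulation below (next inter-onset interval predicted from the previous interval and the asynchrony to one partner) is an assumption, as are the function name and coefficient labels. It uses np.linalg.lstsq rather than the statsmodels wrapper that generate_model returns.

```python
import numpy as np


def fit_phase_correction(next_ioi, prev_ioi, asynchrony) -> dict:
    """Fit next_ioi = intercept + self_coupling * prev_ioi + coupling * asynchrony.

    All three inputs are assumed to be aligned beat by beat and free of NaN.
    """
    prev_ioi = np.asarray(prev_ioi, dtype=float)
    asynchrony = np.asarray(asynchrony, dtype=float)
    # Design matrix: constant term, own previous interval, asynchrony to partner
    X = np.column_stack([np.ones_like(prev_ioi), prev_ioi, asynchrony])
    coefs, *_ = np.linalg.lstsq(X, np.asarray(next_ioi, dtype=float), rcond=None)
    return dict(zip(["intercept", "self_coupling", "coupling"], coefs))


# Synthetic, noise-free data generated from known coefficients
rng = np.random.default_rng(0)
prev = rng.normal(0.5, 0.02, 50)   # previous inter-onset intervals (seconds)
asy = rng.normal(0.0, 0.01, 50)    # asynchrony to the partner (seconds)
nxt = 0.1 + 0.8 * prev - 0.3 * asy
fit = fit_phase_correction(nxt, prev, asy)
```

With several partners, one asynchrony column per instrument would be added to the design matrix, giving one coupling coefficient per partner.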
- class src.features.rhythm_features.ProportionalAsynchrony(summary_df: DataFrame, my_instr_name: str, metre_col: str = 'metre_manual')#
Bases:
BaseExtractor
Extracts features relating to the proportional asynchrony between performers.
- LOWER_BOUND = 0.03125#
- REF_INSTR = 'drums'#
- UPPER_BOUND = 0.0625#
- static _extract_async_stats(mean_async: array, my_instr_name: str) dict #
Extracts asynchrony stats from all pairwise combinations of instruments and returns a dictionary
- _extract_proportional_durations(summary_df: DataFrame) Generator #
Extracts proportional beat values for all instruments
- _format_async_df(async_df: DataFrame) DataFrame #
Coerces asynchrony dataframe into correct format
- static count_nonzero(x) int #
Simple wrapper around np.count_nonzero that removes NaN values from an array
- static get_between(arr, i1, i2) array #
From an array arr, get all onsets between an upper and lower bound i1 and i2 respectively
- static quantile25(x) float #
Simple wrapper around np.nanquantile with arguments set
- static quantile75(x) float #
Simple wrapper around np.nanquantile with arguments set
- static truncate_df(arr: DataFrame | Series, low: float, high: float, col: str | None = None, fill_nans: bool = False) DataFrame #
Truncate a dataframe or series between a low and high threshold.
- Args:
arr (pd.DataFrame | pd.Series): dataframe to truncate
low (float): lower boundary for truncating
high (float): upper boundary for truncating. Must be greater than low.
col (str): array to use when truncating. Must be provided if isinstance(arr, pd.DataFrame)
fill_nans (bool, optional): whether to replace values outside low and high with np.nan
- Raises:
AssertionError: if high < low
- Returns:
pd.DataFrame
- update_summary_dict(array_names, arrays, *args, **kwargs) None #
Update our summary dictionary with values from this feature. Can be overridden!
- class src.features.rhythm_features.RollingIOISummaryStats(my_onsets: Series, downbeats, order: int = 4, **kwargs)#
Bases:
IOISummaryStats
Extracts the statistics in IOISummaryStats on a rolling basis. The window defaults to 4 bars in length.
- static binary_entropy(iois: Series) float #
Extract the Shannon entropy from an iterable
- static count_nonzero(x) int #
Simple wrapper around np.count_nonzero that removes NaN values from an array
- extract_rolling_statistics(my_onsets: Series, downbeats: array, **kwargs) dict #
Extract rolling summary statistics across the given bar period
- static get_between(arr, i1, i2) array #
From an array arr, get all onsets between an upper and lower bound i1 and i2 respectively
- static lempel_ziv_complexity(iois: Series) float #
Extract complexity from a binary sequence using the Lempel-Ziv compression algorithm
- static npvi(iois: Series) float #
Extract the normalised pairwise variability index (nPVI) from an iterable
- static quantile25(x) float #
Simple wrapper around np.nanquantile with arguments set
- static quantile75(x) float #
Simple wrapper around np.nanquantile with arguments set
- static truncate_df(arr: DataFrame | Series, low: float, high: float, col: str | None = None, fill_nans: bool = False) DataFrame #
Truncate a dataframe or series between a low and high threshold.
- Args:
arr (pd.DataFrame | pd.Series): dataframe to truncate
low (float): lower boundary for truncating
high (float): upper boundary for truncating. Must be greater than low.
col (str): array to use when truncating. Must be provided if isinstance(arr, pd.DataFrame)
fill_nans (bool, optional): whether to replace values outside low and high with np.nan
- Raises:
AssertionError: if high < low
- Returns:
pd.DataFrame
- update_summary_dict(array_names, arrays, *args, **kwargs) None #
Update our summary dictionary with values from this feature. Can be overridden!
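Two of the statistics named above, the normalised pairwise variability index and Shannon entropy, have standard definitions that can be sketched in a few lines. The functions below follow those textbook definitions and take plain lists; how the package preprocesses inter-onset intervals before calling its own versions is not shown here, and the name shannon_entropy is chosen for the example.

```python
import math
from collections import Counter


def npvi(iois) -> float:
    """Normalised pairwise variability index of successive durations."""
    pairs = list(zip(iois, iois[1:]))
    # Absolute difference of each adjacent pair, normalised by the pair mean
    total = sum(abs(a - b) / ((a + b) / 2) for a, b in pairs)
    return 100 * total / len(pairs)


def shannon_entropy(symbols) -> float:
    """Shannon entropy (in bits) of the symbol distribution in an iterable."""
    symbols = list(symbols)
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


even = npvi([0.5, 0.5, 0.5])           # 0.0 for perfectly even durations
uneven = npvi([1.0, 0.5])              # about 66.7
bits = shannon_entropy([0, 1, 0, 1])   # 1.0 bit for a balanced binary sequence
```

Both functions need at least two values (nPVI) or one value (entropy) to be defined, so a rolling window with too few onsets would have to be skipped or filled with NaN.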
- class src.features.rhythm_features.TempoSlope(my_beats: Series)#
Bases:
BaseExtractor
Extract features related to tempo slope, i.e. instantaneous tempo change (in beats-per-minute) per second
- static count_nonzero(x) int #
Simple wrapper around np.count_nonzero that removes NaN values from an array
- static extract_tempo_slope(my_beats: array, my_bpms: array) RegressionResultsWrapper | None #
Create the tempo slope regression model
- static get_between(arr, i1, i2) array #
From an array arr, get all onsets between an upper and lower bound i1 and i2 respectively
- static quantile25(x) float #
Simple wrapper around np.nanquantile with arguments set
- static quantile75(x) float #
Simple wrapper around np.nanquantile with arguments set
- static truncate_df(arr: DataFrame | Series, low: float, high: float, col: str | None = None, fill_nans: bool = False) DataFrame #
Truncate a dataframe or series between a low and high threshold.
- Args:
arr (pd.DataFrame | pd.Series): dataframe to truncate
low (float): lower boundary for truncating
high (float): upper boundary for truncating. Must be greater than low.
col (str): array to use when truncating. Must be provided if isinstance(arr, pd.DataFrame)
fill_nans (bool, optional): whether to replace values outside low and high with np.nan
- Raises:
AssertionError: if high < low
- Returns:
pd.DataFrame
- update_summary_dict(array_names, arrays, *args, **kwargs) None #
Update the summary dictionary with tempo slope and drift coefficients
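Tempo slope as defined above (change in beats-per-minute per second) amounts to the slope of a straight line fitted to instantaneous tempo against time. The sketch below computes that slope directly with the least-squares formula instead of the regression wrapper the class returns; the function name and the choice to timestamp each tempo value at the later beat of its pair are assumptions for illustration.

```python
def tempo_slope(beats) -> float:
    """Least-squares slope of instantaneous BPM against beat time (BPM per second)."""
    iois = [b2 - b1 for b1, b2 in zip(beats, beats[1:])]
    bpms = [60 / ioi for ioi in iois]   # instantaneous tempo for each interval
    times = beats[1:]                   # timestamp each tempo at the later beat
    n = len(times)
    mean_t, mean_b = sum(times) / n, sum(bpms) / n
    sxy = sum((t - mean_t) * (b - mean_b) for t, b in zip(times, bpms))
    sxx = sum((t - mean_t) ** 2 for t in times)
    return sxy / sxx


steady = tempo_slope([0.0, 0.5, 1.0, 1.5, 2.0])      # 0.0: constant 120 BPM
rushing = tempo_slope([0.0, 0.5, 0.98, 1.44, 1.88])  # positive: intervals shrink
```

A positive slope indicates acceleration over the performance and a negative slope indicates slowing.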
- src.features.rhythm_features.get_beats_from_matched_onsets(summary_dict: dict) DataFrame #
Gets mean beat timestamps from a summary_dict marked by more than two instruments in the trio
src.features.simulations_utils module#
Classes used for creating ensemble coordination simulations from the phase correction model
- class src.features.simulations_utils.Simulation(params_dict, n_beats: int = 100, tempo: int = 120)#
Bases:
object
Creates a single simulated performance with given params_dict
- static _format_dict(python_dict: dict) Dict #
Converts a Python dictionary into a type that can be utilised by Numba
- _get_async_cls() Asynchrony #
Gets src.features.rhythm_features.Asynchrony instances for all instruments
- _get_async_rms() float #
Gets root-mean-square of all pairwise asynchrony values
- _get_bpm_values()#
Gets beats-per-minute values from the simulation dataframe
- _get_initial_data(init_instr: str) Dict #
Gets initial starter data for use when creating the simulation
- static _simulation_dispatcher(data_: tuple, params_: tuple) tuple #
Creates one simulated performance, optimized with numba
- run_simulation()#
Dispatcher function for a single simulation
- starting_onset = 0#
- class src.features.simulations_utils.SimulationManager(coupling_params, tempo: int = 120, n_sims: int = 500, n_beats: int = 100, n_jobs: int = -1)#
Bases:
object
Manager for creating and handling multiple Simulation instances.
- backend = 'threads'#
- get_mean_bpm() Series #
Returns average BPM value of all simulations in this simulation manager
- get_mean_rms() float #
Returns average RMS asynchrony value from all simulations in this simulation manager
- get_rms_values() array #
Returns all RMS asynchrony values from all simulations in this simulation manager
- run_simulations()#
Runs all simulations and returns the SimulationManager instance
- verbosity = 5#
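To give a feel for what an ensemble coordination simulation does, the toy example below simulates two performers who each correct a fixed fraction of their mutual asynchrony on every beat, then measures the RMS asynchrony. It is deliberately much simpler than the Simulation and SimulationManager classes: there is no noise, no Numba, only two performers, and the names simulate_duo and rms_asynchrony are hypothetical. The tempo and n_beats defaults mirror the documented signatures.

```python
def simulate_duo(coupling=0.25, n_beats=100, tempo=120, initial_offset=0.05):
    """Simulate onsets for two performers with symmetric linear phase correction."""
    period = 60 / tempo  # seconds per beat
    a, b = [initial_offset], [0.0]
    for _ in range(n_beats - 1):
        async_ab = a[-1] - b[-1]  # positive when performer A is late
        # Each performer shortens or lengthens the next interval to close the gap
        a.append(a[-1] + period - coupling * async_ab)
        b.append(b[-1] + period + coupling * async_ab)
    return a, b


def rms_asynchrony(a, b) -> float:
    """Root-mean-square of the beat-by-beat onset differences."""
    diffs = [x - y for x, y in zip(a, b)]
    return (sum(d * d for d in diffs) / len(diffs)) ** 0.5


coupled = rms_asynchrony(*simulate_duo(coupling=0.25))
uncoupled = rms_asynchrony(*simulate_duo(coupling=0.0))
```

With coupling of 0.25 the asynchrony halves on every beat, so the coupled RMS is far below the uncoupled value, which stays at the initial 50 ms offset. A manager class would repeat such runs many times with random noise and average the results.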