src.analyse.analysis_utils

Utility functions, classes, and constants for analysis and modelling

Functions

`append_zoom_array`(perf_df, zoom_arr[, onset_col])	Appends a column to a dataframe showing the approximate amount of latency applied by AV-Manip for each event in a performance
`average_bpms`(df1, df2[, window_size, elap, bpm])	Returns a list of averaged BPMs from two performances.
`create_model_list`(df, avg_groupers[, md])	Subset a dataframe of per-condition results and return a list of statsmodels regression outputs for use in a table.
`create_one_simulation`(keys_data, drms_data, ...)	Create data for one simulation, using numba optimisations.
`extract_event_density`(bpm, raw)	Appends a column to performance dataframe showing number of actual notes per extracted crotchet
`extract_interpolated_beats`(c)	Extracts the number of beats in the performance that required interpolation in REAPER.
`extract_npvi`(s)	Extracts the normalised pairwise variability index (nPVI) from a column of IOIs
`extract_pairwise_asynchrony`(keys_nn, drms_nn)	Extracts pairwise asynchrony from two matched dataframes.
`generate_df`(data[, iqr_range, threshold, ...])	Create dataframe from MIDI performance data, either cleaned (just crotchet beats) or raw.
`generate_tempo_slopes`(raw_data)	Returns average tempo slope coefficients for all performances as list of tuples in the form (trial, block, latency, jitter, avg. slope coefficient).
`iqr_filter`(col, df[, iqr_range])	Filter duration values below a certain quartile to remove extraneous MIDI notes not cleaned in REAPER
`load_data`(input_filepath)	Loads all pickled data from the processed data folder
`load_from_disc`(output_dir[, filename])	Try and load models from disc
`log_model`(md[, logger])	Helper function to log metadata for a particular model in our GUI, if we've passed a logger function
`log_simulation`(sim[, logger])	Helper function to log metadata for a particular simulation in our GUI, if we've passed a logger function
`reg_func`(df, xcol, ycol)	Calculates linear regression between two given columns, returns results table.
`resample`(perf[, func, col, resample_window, ...])	Resamples an individual performance dataframe to get mean of every second.
`return_average_coeffs`(coeffs)	Returns list of tuples containing average coefficient for keys/drums performance in a single trial. Tuples take the form of those in generate_tempo_slopes, i.e. (trial, block, latency, jitter, avg. slope coefficient)
`return_coeff_from_sm_output`(results)	Formats the table returned by statsmodels to return only the regression coefficient as an integer
`test_stationary`(array)	Tests whether data is stationary; if not, returns the data with the first difference calculated
`zip_same_conditions_together`(raw_data)	Iterates through raw data and zips keys/drums data from the same performance together. Returns a list of zip objects, each element of which is a tuple containing

src.analyse.analysis_utils.append_zoom_array(perf_df: DataFrame, zoom_arr: array, onset_col: str = 'onset') → DataFrame: Appends a column to a dataframe showing the approximate amount of latency applied by AV-Manip for each event in a performance

src.analyse.analysis_utils.average_bpms(df1: DataFrame, df2: DataFrame, window_size: int = 8, elap: str = 'elapsed', bpm: str = 'bpm') → DataFrame: Returns a list of averaged BPMs from two performances. Data is grouped by every second in a performance.

src.analyse.analysis_utils.create_model_list(df, avg_groupers: list, md='correction_partner_onset~C(latency)+C(jitter)+C(instrument)') → list: Subset a dataframe of per-condition results and return a list of statsmodels regression outputs for use in a table. By default, the regression will average results from the same condition across multiple measures. This can be overridden by setting the averaging argument to False.

src.analyse.analysis_utils.create_one_simulation(keys_data: Dict, drms_data: Dict, keys_params: Dict, drms_params: Dict, keys_noise, drms_noise, lat: ndarray, beats: int) → tuple: Create data for one simulation, using numba optimisations. This function is defined outside of the Simulation class to enable instances of the Simulation class to be pickled.

src.analyse.analysis_utils.extract_event_density(bpm: DataFrame, raw: DataFrame) → DataFrame: Appends a column to performance dataframe showing number of actual notes per extracted crotchet

src.analyse.analysis_utils.extract_interpolated_beats(c: array) → tuple[int, int]: Extracts the number of beats in the performance that required interpolation in REAPER. This was usually due to a performer ‘pushing’ ahead a crotchet beat by a swung quaver, or due to an implied metric modulation.

src.analyse.analysis_utils.extract_npvi(s: Series) → float: Extracts the normalised pairwise variability index (nPVI) from a column of IOIs
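
For reference, the nPVI is conventionally defined as 100 times the mean, over successive interval pairs, of the absolute difference between the two intervals divided by their mean. The sketch below illustrates that standard formula only; it is not taken from this module. The name `npvi_sketch` is hypothetical, and it takes a plain sequence of IOIs with no missing values rather than a Series.

```python
def npvi_sketch(iois):
    """Hypothetical helper: nPVI of a sequence of inter-onset intervals."""
    vals = [float(v) for v in iois]
    if len(vals) < 2:
        raise ValueError("nPVI needs at least two IOIs")
    # Normalised absolute difference for each pair of successive intervals
    terms = [abs(a - b) / ((a + b) / 2) for a, b in zip(vals, vals[1:])]
    # Mean of the pairwise terms, scaled by 100
    return 100 * sum(terms) / len(terms)
```

Perfectly even IOIs give an nPVI of 0, and the value grows as neighbouring intervals become more unequal.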

src.analyse.analysis_utils.extract_pairwise_asynchrony(keys_nn: DataFrame, drms_nn: DataFrame) → float

Extracts pairwise asynchrony from two matched dataframes.

Rasch (2015) defines pairwise asynchrony as the root-mean-square of the standard deviations of the onset time differences for all pairs of voice parts. We can calculate this for each condition, using the nearest-neighbour model for both the keyboard and drummer.
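
A minimal sketch of the calculation described above, under assumptions of my own: each nearest-neighbour match is reduced to a list of onset time differences (one list per instrument), the sample standard deviation is used, and the function name `pairwise_asynchrony_sketch` is hypothetical. The module's actual function takes two DataFrames and may differ in detail.

```python
import math
import statistics

def pairwise_asynchrony_sketch(keys_async, drms_async):
    """Hypothetical helper: RMS of the SDs of two sets of onset differences."""
    # Sample standard deviation of the onset differences for each matching
    sds = [statistics.stdev(keys_async), statistics.stdev(drms_async)]
    # Root-mean-square of those standard deviations
    return math.sqrt(sum(sd ** 2 for sd in sds) / len(sds))
```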

src.analyse.analysis_utils.generate_df(data: array, iqr_range: tuple = (0.05, 0.95), threshold: float = 0, keep_pitch_vel: bool = False) → DataFrame: Create dataframe from MIDI performance data, either cleaned (just crotchet beats) or raw. Optional keyword arguments: iqr_range: upper and lower quartile to clean IOI values by; threshold: value to remove IOI timings below; keep_pitch_vel: keep pitch and velocity columns.

src.analyse.analysis_utils.generate_tempo_slopes(raw_data: list) → list[tuple]: Returns average tempo slope coefficients for all performances as list of tuples in the form (trial, block, latency, jitter, avg. slope coefficient). Deprecated?

src.analyse.analysis_utils.iqr_filter(col: str, df: DataFrame, iqr_range: tuple = (0.05, 0.95)) → Series: Filter duration values below a certain quartile to remove extraneous MIDI notes not cleaned in REAPER
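
One plausible reading of this filter, sketched with pandas: values of `col` falling outside the two quantiles in `iqr_range` are replaced with NaN and the resulting Series is returned. The name `iqr_filter_sketch` is hypothetical, and whether the real function masks, drops, or scales the bounds is an assumption here.

```python
import pandas as pd

def iqr_filter_sketch(col, df, iqr_range=(0.05, 0.95)):
    """Hypothetical helper: mask values outside the given quantile range."""
    lower = df[col].quantile(iqr_range[0])
    upper = df[col].quantile(iqr_range[1])
    # Keep values inside [lower, upper]; everything else becomes NaN
    return df[col].where(df[col].between(lower, upper))

# Usage: one very short and one very long duration among regular ones
example = pd.DataFrame(
    {"ioi": [0.01, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 5.0]}
)
filtered = iqr_filter_sketch("ioi", example)
```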

src.analyse.analysis_utils.load_data(input_filepath: str) → list: Loads all pickled data from the processed data folder

src.analyse.analysis_utils.load_from_disc(output_dir: str, filename: str = 'phase_correction_mds.p') → list: Try and load models from disc

src.analyse.analysis_utils.log_model(md, logger=None) → None: Helper function to log metadata for a particular model in our GUI, if we’ve passed a logger function

src.analyse.analysis_utils.log_simulation(sim, logger=None) → None: Helper function to log metadata for a particular simulation in our GUI, if we’ve passed a logger function

src.analyse.analysis_utils.reg_func(df: DataFrame, xcol: str, ycol: str) → RegressionResults: Calculates linear regression between two given columns, returns results table. Deprecated.

src.analyse.analysis_utils.resample(perf: DataFrame, func=<function nanmean>, col: str = 'my_onset', resample_window: str = '1s', interpolate: bool = True) → DataFrame: Resamples an individual performance dataframe to get the mean of every second.
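
A rough sketch of per-second resampling with pandas, assuming onset times in `col` are in seconds. The name `resample_sketch` and the `bpm` column are hypothetical, and the built-in `mean` stands in for the `func` argument; the module's own implementation may differ.

```python
import pandas as pd

def resample_sketch(perf, col="my_onset", resample_window="1s", interpolate=True):
    """Hypothetical helper: mean of each column within fixed time windows."""
    timed = perf.copy()
    # Index by onset time as a timedelta so pandas can bin the rows
    timed.index = pd.to_timedelta(perf[col], unit="s")
    out = timed.resample(resample_window).mean()
    if interpolate:
        # Fill windows that contained no events by linear interpolation
        out = out.interpolate()
    return out

# Usage: no events fall between 2 s and 3 s, so that window is interpolated
perf = pd.DataFrame(
    {"my_onset": [0.0, 0.5, 1.0, 1.5, 3.0], "bpm": [120, 122, 118, 120, 110]}
)
resampled = resample_sketch(perf)
```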

src.analyse.analysis_utils.return_average_coeffs(coeffs: list) → list[tuple]: Returns list of tuples containing average coefficient for keys/drums performance in a single trial. Tuples take the form of those in generate_tempo_slopes, i.e. (trial, block, latency, jitter, avg. slope coefficient)

src.analyse.analysis_utils.return_coeff_from_sm_output(results: RegressionResults) → int: Formats the table returned by statsmodels to return only the regression coefficient as an integer

src.analyse.analysis_utils.test_stationary(array: Series) → Series: Tests whether data is stationary; if not, returns the data with the first difference calculated

src.analyse.analysis_utils.zip_same_conditions_together(raw_data: list) → list[zip]: Iterates through raw data and zips keys/drums data from the same performance together. Returns a list of zip objects, each element of which is a tuple containing