src.analyse package

Submodules

src.analyse.analysis_utils module

Utility functions, classes, and constants for analysis and modelling

src.analyse.analysis_utils.append_zoom_array(perf_df: DataFrame, zoom_arr: array, onset_col: str = 'onset') → DataFrame: Appends a column to a dataframe showing the approximate amount of latency applied by AV-Manip to each event in a performance

src.analyse.analysis_utils.average_bpms(df1: DataFrame, df2: DataFrame, window_size: int = 8, elap: str = 'elapsed', bpm: str = 'bpm') → DataFrame: Returns a list of averaged BPMs from two performances. Data is grouped by every second in a performance.

src.analyse.analysis_utils.create_model_list(df, avg_groupers: list, md='correction_partner_onset~C(latency)+C(jitter)+C(instrument)') → list: Subset a dataframe of per-condition results and return a list of statsmodels regression outputs for use in a table. By default, the regression will average results from the same condition across multiple measures. This can be overridden by setting the averaging argument to False.

src.analyse.analysis_utils.create_one_simulation(keys_data: Dict, drms_data: Dict, keys_params: Dict, drms_params: Dict, keys_noise, drms_noise, lat: ndarray, beats: int) → tuple: Create data for one simulation, using numba optimisations. This function is defined outside of the Simulation class to enable instances of the Simulation class to be pickled.

src.analyse.analysis_utils.extract_event_density(bpm: DataFrame, raw: DataFrame) → DataFrame: Appends a column to performance dataframe showing number of actual notes per extracted crotchet

src.analyse.analysis_utils.extract_interpolated_beats(c: array) → tuple[int, int]: Extracts the number of beats in the performance that required interpolation in REAPER. This was usually due to a performer ‘pushing’ ahead a crotchet beat by a swung quaver, or due to an implied metric modulation.

src.analyse.analysis_utils.extract_npvi(s: Series) → float: Extracts the normalised pairwise variability index (nPVI) from a column of IOIs
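A minimal sketch of the standard nPVI calculation, assuming the IOIs arrive as a plain sequence of floats. The function name `npvi` and the handling of missing values are assumptions for illustration; this is not necessarily how extract_npvi is implemented.

```python
import numpy as np


def npvi(iois):
    """Normalised pairwise variability index of a sequence of IOIs.

    Hypothetical helper: missing values are simply dropped first, which is
    an assumption and not necessarily what extract_npvi does.
    """
    x = np.asarray(iois, dtype=float)
    x = x[~np.isnan(x)]
    if x.size < 2:
        return float("nan")
    # Absolute difference of each successive pair, normalised by the pair mean
    diffs = np.abs(x[1:] - x[:-1])
    means = (x[1:] + x[:-1]) / 2
    return float(100 * np.mean(diffs / means))


even = npvi([0.5, 0.5, 0.5, 0.5])          # perfectly even IOIs
alternating = npvi([1.0, 3.0, 1.0, 3.0])   # strongly alternating IOIs
```

Perfectly even IOIs give an nPVI of 0, and larger contrasts between successive intervals give larger values.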

src.analyse.analysis_utils.extract_pairwise_asynchrony(keys_nn: DataFrame, drms_nn: DataFrame) → float

Extracts pairwise asynchrony from two matched dataframes.

Rasch (2015) defines pairwise asynchrony as the root-mean-square of the standard deviations of the onset time differences for all pairs of voice parts. We can calculate this for each condition, using the nearest-neighbour model for both the keyboard and drummer.

src.analyse.analysis_utils.generate_df(data: array, iqr_range: tuple = (0.05, 0.95), threshold: float = 0, keep_pitch_vel: bool = False) → DataFrame: Create dataframe from MIDI performance data, either cleaned (just crotchet beats) or raw.

Optional keyword arguments:

- iqr_range: Upper and lower quartile to clean IOI values by.
- threshold: Value to remove IOI timings below.
- keep_pitch_vel: Keep pitch and velocity columns.

src.analyse.analysis_utils.generate_tempo_slopes(raw_data: list) → list[tuple]: Returns average tempo slope coefficients for all performances as list of tuples in the form (trial, block, latency, jitter, avg. slope coefficient). Deprecated?

src.analyse.analysis_utils.iqr_filter(col: str, df: DataFrame, iqr_range: tuple = (0.05, 0.95)) → Series: Filter duration values below a certain quartile to remove extraneous MIDI notes not cleaned in REAPER

src.analyse.analysis_utils.load_data(input_filepath: str) → list: Loads all pickled data from the processed data folder

src.analyse.analysis_utils.load_from_disc(output_dir: str, filename: str = 'phase_correction_mds.p') → list: Try and load models from disc

src.analyse.analysis_utils.log_model(md, logger=None) → None: Helper function to log metadata for a particular model in our GUI, if we’ve passed a logger function

src.analyse.analysis_utils.log_simulation(sim, logger=None) → None: Helper function to log metadata for a particular simulation in our GUI, if we’ve passed a logger function

src.analyse.analysis_utils.reg_func(df: DataFrame, xcol: str, ycol: str) → RegressionResults: Calculates linear regression between two given columns, returns results table. Deprecated.

src.analyse.analysis_utils.resample(perf: ~pandas.core.frame.DataFrame, func=<function nanmean>, col: str = 'my_onset', resample_window: str = '1s', interpolate: bool = True) → DataFrame: Resamples an individual performance dataframe to get mean of every second.
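A minimal sketch of per-second resampling with pandas, assuming onsets are stored in seconds in a `my_onset` column. The function name `resample_sketch`, the `bpm` column and the index name `elapsed` are illustrative assumptions, not taken from the package.

```python
import pandas as pd


def resample_sketch(perf, col="my_onset", window="1s", interpolate=True):
    """Hypothetical stand-in for resample(): mean of every window."""
    out = perf.copy()
    # Assumption: onsets are in seconds, so they convert directly to timedeltas
    out.index = pd.to_timedelta(out[col], unit="s")
    out.index.name = "elapsed"
    out = out.resample(window).mean()
    if interpolate:
        # Fill windows that contained no onsets by linear interpolation
        out = out.interpolate(limit_direction="both")
    return out


perf = pd.DataFrame({
    "my_onset": [0.0, 0.5, 1.0, 1.5, 3.2],
    "bpm": [120.0, 122.0, 118.0, 120.0, 110.0],
})
res = resample_sketch(perf)
```

In this toy example the window starting at two seconds contains no onsets, so its value is interpolated from its neighbours.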

src.analyse.analysis_utils.return_average_coeffs(coeffs: list) → list[tuple]: Returns list of tuples containing average coefficient for keys/drums performance in a single trial. Tuples take the form of those in generate_tempo_slopes, i.e. (trial, block, latency, jitter, avg. slope coefficient)

src.analyse.analysis_utils.return_coeff_from_sm_output(results: RegressionResults) → int: Formats the table returned by statsmodels to return only the regression coefficient as an integer

src.analyse.analysis_utils.test_stationary(array: Series) → Series: Tests if data is stationary, if not returns data with first difference calculated

src.analyse.analysis_utils.zip_same_conditions_together(raw_data: list) → list[zip]: Iterates through raw data and zips keys/drums data from the same performance together. Returns a list of zip objects, each element of which is a tuple containing

src.analyse.phase_correction_models module

Code for generating phase correction models

class src.analyse.phase_correction_models.PhaseCorrectionModel(c1_keys: list, c2_drms: list, **kwargs)

Bases: object

A linear phase correction model for a single performance (keys and drums).

static _append_zoom_array(perf_df: DataFrame, zoom_arr: array, onset_col: str = 'my_onset') → DataFrame: Appends a column to a dataframe showing the approximate amount of latency applied by AV-Manip to each event in a performance

_apply_elliptic_envelope(delayed_arr: ndarray, nn_np: ndarray) → DataFrame: Applies an EllipticEnvelope filter to data to extract outliers and rematch or set them to missing. Numba isn’t used here as it isn’t supported by EllipticEnvelope and sklearn.

static _cleaning_for_180ms(delayed_arr: ndarray, nn_np: ndarray) → ndarray: Applies specialised cleaning using bins to performances with 180ms of latency. Optimised with numba.

_create_higher_order_phase_correction_models(df: DataFrame, endog: str = 'my_next_ioi_diff', exog_vars: tuple[str] = ('my_prev_ioi_diff', 'asynchrony')) → list: Creates a list of higher order phase correction models, including a greater number of lags of the asynchrony and previous IOI terms

_create_phase_correction_model(df: DataFrame, md: Optional[str] = None): Create the linear phase correction model

_create_summary_dictionary(c: dict, md, nn: DataFrame, rn: int, higher_order_md: list): Creates a dictionary of summary statistics, used when analysing all models.

_extract_asynchrony_third_person(async_col: str = 'asynchrony_third_person', subset: Optional[int] = None) → float: Extracts asynchrony experienced by an imagined third person joined to the Zoom call

static _extract_pairwise_asynchrony(nn, asynchrony_col: str = 'asynchrony')

Extract pairwise asynchrony as a float (in milliseconds, as is standard for this unit in the literature)

Method:

- Carry out the nearest-neighbour matching, and get a series of asynchrony values for both musicians (i.e. keys -> drums with delay, drums -> keys with delay);
- Square all of these values;
- Get the overall mean (here we collapse both arrays down to a single value);
- Take the square root of this mean.
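The root-mean-square steps above can be sketched as follows. This assumes the two asynchrony series are already matched and expressed in seconds; the function name `pairwise_asynchrony_ms` is hypothetical.

```python
import numpy as np


def pairwise_asynchrony_ms(keys_async, drms_async):
    """Root-mean-square of both musicians' asynchronies, in milliseconds.

    Assumption: inputs are asynchronies in seconds, one array per musician.
    """
    both = np.concatenate([
        np.asarray(keys_async, dtype=float),
        np.asarray(drms_async, dtype=float),
    ])
    # Square, take the overall mean (ignoring missing matches), then the root
    rms_seconds = np.sqrt(np.nanmean(both ** 2))
    return float(rms_seconds * 1000)


value = pairwise_asynchrony_ms([0.03, -0.04], [0.04, -0.03])
```

Squaring before averaging means that positive and negative asynchronies do not cancel each other out.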

_extract_pairwise_asynchrony_with_standard_deviations(async_col: str = 'asynchrony')

Extract pairwise asynchrony using the standard deviation of the asynchrony

Method:

- Join both nearest-neighbour dataframes in order to match asynchrony values together;
- Square all of these values;
- Get the overall mean (here we collapse both arrays down to a single value);
- Take the square root of this mean;
- Repeat the join process with the other dataframe as left join, and get the pairwise asynchrony again;
- Take the mean of both (this is to prevent issues with the dataframe join process).

_extract_tempo_slope(subset: Optional[int] = None)

Extracts tempo slope, in the form of a float representing BPM change per second.

Method:

- Resample both dataframes to get the mean BPM value every second;
- Concatenate these dataframes together, and get the mean BPM by both performers each second;
- Compute a regression of this mean BPM against the overall elapsed time and extract the coefficient.
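A minimal sketch of the tempo slope steps, assuming both performers' BPM values have already been resampled to one value per second and are indexed by elapsed seconds. It uses `numpy.polyfit` for the regression, which is a substitution for illustration; the name `tempo_slope_sketch` is hypothetical.

```python
import numpy as np
import pandas as pd


def tempo_slope_sketch(keys_bpm, drms_bpm):
    """Slope (BPM change per second) of the mean BPM of both performers.

    Assumption: each input is a Series of per-second BPM values whose index
    is the elapsed time in seconds.
    """
    # Mean BPM across both performers for every second
    avg = pd.concat([keys_bpm, drms_bpm], axis=1).mean(axis=1).dropna()
    # First-degree least-squares fit of mean BPM against elapsed seconds
    slope, _intercept = np.polyfit(
        avg.index.to_numpy(dtype=float), avg.to_numpy(), deg=1
    )
    return float(slope)


seconds = np.arange(10)
keys = pd.Series(120 + 0.5 * seconds, index=seconds)
drms = pd.Series(118 + 0.5 * seconds, index=seconds)
slope = tempo_slope_sketch(keys, drms)
```

A positive slope indicates acceleration over the performance, a negative slope deceleration.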

_format_df_for_model(df: DataFrame) → DataFrame: Coerces a dataframe into the format required for the phase-correction model, including setting required columns

_generate_df(data: list, threshold: float = 0.25) → tuple[pandas.core.frame.DataFrame, int]: Create dataframe, append zoom array, and add a column with our delayed onsets. This latter column replicates the performance as it would have been heard by our partner.

_get_contamination_value_from_json(default: Optional[float] = None) → float

_get_rolling_coefficients(nn_df: DataFrame, func=None, ind_var: str = 'latency', dep_var: str = 'my_prev_ioi', cov: str = 'their_prev_ioi') → list

Centralised function for calculating the relationship between IOI and latency variability. Takes in a single independent variable, a dependent variable, and a covariate, as well as a function to apply to these.

Method:

- Get rolling standard deviation values for all passed variables;
- Lag these values according to the maximum lag attribute passed when creating the class instance;
- Apply a function (defaults to regression) onto the lagged and non-lagged variables and return the results.
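The rolling and lagging steps can be sketched as follows, assuming the dataframe already holds one row per second. The function name `lagged_rolling_std`, the `window` and `max_lag` parameters, and the `_lagN` suffixes are assumptions for illustration.

```python
import pandas as pd


def lagged_rolling_std(df, cols, window=4, max_lag=2):
    """Rolling standard deviations plus lagged copies of them.

    Assumption: df has one row per second, so shifting by N rows lags the
    values by N seconds.
    """
    roll = df[list(cols)].rolling(window, min_periods=2).std().add_suffix("_std")
    # One shifted copy of every rolling column for each lag up to max_lag
    lags = [
        roll.shift(lag).add_suffix(f"_lag{lag}")
        for lag in range(1, max_lag + 1)
    ]
    return pd.concat([roll, *lags], axis=1)


df = pd.DataFrame({
    "my_prev_ioi": [0.50, 0.52, 0.48, 0.50, 0.55, 0.45],
    "latency": [0.00, 0.09, 0.18, 0.09, 0.00, 0.18],
})
lagged = lagged_rolling_std(df, ("my_prev_ioi", "latency"))
```

The resulting frame can then be passed to a regression or partial correlation of the non-lagged columns against the lagged ones.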

_get_rolling_standard_deviation_values(nn_df: DataFrame, cols: tuple[str] = ('my_prev_ioi', 'their_prev_ioi', 'latency')) → DataFrame: Extracts the rolling standard deviation of values within a given window size, then resamples to get the mean value for every second.

_iqr_filter(df: DataFrame, col: str) → DataFrame: Applies an inter-quartile range filter to set outlying values for a particular column to missing.
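One plausible reading of an inter-quartile range filter is sketched below. The quantile bounds, the multiplier `k` and the name `iqr_filter_sketch` are all assumptions; the actual thresholds used by _iqr_filter are not stated here.

```python
import numpy as np
import pandas as pd


def iqr_filter_sketch(df, col, iqr_range=(0.25, 0.75), k=1.5):
    """Set values of `col` lying far outside the quantile range to NaN.

    Assumption: "outlying" means more than k * IQR beyond either quantile.
    """
    out = df.copy()
    q_lo = out[col].quantile(iqr_range[0])
    q_hi = out[col].quantile(iqr_range[1])
    spread = q_hi - q_lo
    inside = out[col].between(q_lo - k * spread, q_hi + k * spread)
    # Values failing the test become missing rather than being dropped
    out[col] = out[col].where(inside)
    return out


perf = pd.DataFrame({"ioi": [0.50, 0.51, 0.49, 0.50, 0.52, 2.00]})
cleaned = iqr_filter_sketch(perf, "ioi")
```

Setting outliers to missing, rather than removing the rows, keeps the remaining columns aligned with the original events.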

_lag_rolling_values(roll: DataFrame, cols: tuple[str] = ('my_prev_ioi_std', 'their_prev_ioi_std', 'latency_std')) → DataFrame: Shifts rolling values by a given number of seconds and concatenates together with the original dataframe.

_match_onsets(live_arr: ndarray, delayed_arr: ndarray, zoom_arr: ndarray) → DataFrame: For a single performer, matches each of their live onsets with the closest delayed onset from their partner.

static _nearest_neighbour(live_arr, delayed_arr, empty_arr): Carry out the nearest-neighbour matching. Optimised with numba.
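A minimal, non-optimised sketch of nearest-neighbour onset matching using NumPy broadcasting rather than numba. The name `nearest_neighbour_sketch` and the two-column output layout are assumptions for illustration.

```python
import numpy as np


def nearest_neighbour_sketch(live_arr, delayed_arr):
    """Pair each live onset with the closest delayed onset from the partner.

    Assumption: both inputs are 1-D sequences of onset times in seconds.
    """
    live = np.asarray(live_arr, dtype=float)
    delayed = np.asarray(delayed_arr, dtype=float)
    # Distance from every live onset to every delayed onset, then the
    # column index of the smallest distance in each row
    idx = np.abs(live[:, None] - delayed[None, :]).argmin(axis=1)
    return np.column_stack([live, delayed[idx]])


matched = nearest_neighbour_sketch([0.0, 0.5, 1.0], [0.04, 0.48, 1.10, 1.6])
```

The full distance matrix costs memory proportional to the product of both array lengths, which is acceptable for a sketch but is one reason a loop-based, compiled version may be preferred in practice.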

_partial_corr_shifted_rolling_variables(lagged: DataFrame, dep_var: str = 'my_prev_ioi_std', ind_var: str = 'latency_std', cov_var: str = 'their_prev_ioi_std') → list: Gets the partial correlation between dep_var and ind_var, controlling for covariate cov_var

static _regress_shifted_rolling_variables(lagged: DataFrame, dep_var: str = 'my_prev_ioi_std', ind_var: str = 'latency_std', cov_var: str = 'their_prev_ioi_std') → list[float]: Creates a regression model of lagged variables vs non-lagged variables and extracts coefficients.

static _remove_duplicate_matches(nn_np: ndarray) → ndarray: Filters onsets for duplicate matches, then keeps whichever match is closest to median asynchrony time.
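A hedged sketch of duplicate removal, assuming a two-column array of (live onset, matched onset) pairs. Here the losing duplicates are set to missing, which is an assumption; the name `remove_duplicate_matches_sketch` is hypothetical.

```python
import numpy as np


def remove_duplicate_matches_sketch(nn):
    """Keep, for each repeated match, the row closest to the median asynchrony.

    Assumption: column 0 holds live onsets, column 1 the matched onsets.
    """
    out = np.asarray(nn, dtype=float).copy()
    asyncs = out[:, 1] - out[:, 0]
    dist = np.abs(asyncs - np.nanmedian(asyncs))
    for value in np.unique(out[~np.isnan(out[:, 1]), 1]):
        rows = np.flatnonzero(out[:, 1] == value)
        if rows.size > 1:
            keep = rows[np.argmin(dist[rows])]
            # All other rows sharing this match lose it
            out[rows[rows != keep], 1] = np.nan
    return out


nn = [[0.0, 0.05], [0.5, 0.55], [0.9, 1.05], [1.0, 1.05], [1.5, 1.55]]
cleaned_nn = remove_duplicate_matches_sketch(nn)
```

In the example, two live onsets were matched to the same delayed onset at 1.05; the pairing whose asynchrony is nearest the median is retained.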

static _return_granger_causality(nn: DataFrame, maxlag: int = 1) → dict: Calculates Granger causality between time series

src.analyse.phase_correction_models.generate_phase_correction_models(raw_data: list, output_dir: str, logger=None, force_rebuild: bool = False) → tuple[list[src.analyse.phase_correction_models.PhaseCorrectionModel], str]: Generates all phase correction models. Returns the models and a string for logging

src.analyse.questionnaire module

Unused code stub for analysis of questionnaire data

class src.analyse.questionnaire.InterRaterReliability(**kwargs)

Bases: object

Deprecated

_recode_categorical_variables(s: Series) → Series: Replaces an ordinal variable from 1-9 with three categories, in accordance with those in the questionnaire

_return_cohen_kappa(grp: groupby, var: str) → float: Returns Cohen’s kappa value

_return_interclass_correlation(grp: groupby, var: str) → float: Returns the interclass correlation, in the form of Pearson’s R statistic

_return_kendall_w(grp: groupby, var: str) → float: Returns Kendall’s W value

format_df() → DataFrame: Called from outside the class, returns a dataframe grouped by class containing all the required summary stats

class src.analyse.questionnaire.TestRetestReliability(**kwargs)

Bases: object

Deprecated

format_df_for_regplot(var: str) → DataFrame: Returns a dataframe that can be provided to RegPlotSingle class to create a regression plot of scores of one variable across both measures, for all trials

format_trr_df() → DataFrame: Returns a dataframe of test-retest reliability scores for all variables, stratified by individual duos

src.analyse.questionnaire.questionnaire_analysis(raw_data: list[list], output_dir: str) → None: Unused

src.analyse.run_analysis module

Central file for running all analysis functions, called by run.cmd

src.analyse.simulations module

Code for generating simulations using phase correction models

class src.analyse.simulations.Simulation(pcm: PhaseCorrectionModel, num_simulations: int = 500, **kwargs)

Bases: object

Creates a number of simulated performances from a given phase correction model.

The number of simulations defaults to 500, the same number used in Jacoby et al. (2021).

static _append_timestamps_to_latency_array(latency_array, offset: int = 8, resample_rate: float = 0.75) → array: Appends timestamps showing the onset time for each value in the latency array applied to a performance

static _convert_musician_parameters_dict_to_numba(python_dict: dict) → Dict: Converts a Python dictionary into a type that can be utilised by Numba

_create_summary_dictionary() → dict: Creates a summary dictionary with important simulation parameters

_format_simulated_data(data: Dict) → DataFrame: Formats data from one simulation by creating a dataframe, adding in the timedelta column, and resampling to get the mean IOI (defaults to every second)

static _get_average_var_for_one_simulation(all_perf: list[pandas.core.frame.DataFrame], var: str = 'my_next_ioi') → DataFrame: Concatenate all simulations together and get the row-wise average (i.e. avg IOI every second)

static _get_number_of_beats_for_simulation(kp, dp) → int: Averages the total number of beats across both keys and drums, then rounds up to the nearest integer.

static _get_raw_musician_parameters(init: DataFrame) → dict: Gets necessary simulation parameters from pandas dataframe and converts to a dictionary

_get_rolling_standard_deviation_values(df: DataFrame, cols: tuple[str] = ('my_prev_ioi',)) → DataFrame

_initialise_empty_data(iois: tuple[float] = (0.5, 0.5), onset: float = 8) → Dict: Initialise an empty numba dictionary of string-array pairs, for storing data from one simulation in.

_modify_musician_parameters_by_simulation_type(input_data): Modifies a simulated musician’s parameters according to the given simulation type

create_all_simulations() → None: Run the simulations and create a list of dataframes for each individual performer

get_average_ioi_variability(func=<function nanmean>, **kwargs) → float

Returns the average IOI variability for all simulations.

Method:

- For every simulation, get the median IOI standard deviation value over the window size;
- Calculate the mean of all of these values.
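The two steps above can be sketched as follows, assuming each simulation is available as a plain sequence of IOIs. The function name `average_ioi_variability` and the `window` parameter are hypothetical.

```python
import numpy as np
import pandas as pd


def average_ioi_variability(simulations, window=4, func=np.nanmean):
    """Mean (by default) of each simulation's median rolling IOI std.

    Assumption: `simulations` is a list of 1-D IOI sequences in seconds.
    """
    medians = [
        pd.Series(iois, dtype=float).rolling(window).std().median()
        for iois in simulations
    ]
    return float(func(medians))


steady = [0.5] * 8
alternating = [0.4, 0.6] * 4
variability = average_ioi_variability([steady, alternating])
```

Taking the median within each simulation limits the influence of short unstable passages before the values are averaged across simulations.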

get_average_pairwise_asynchrony(func=<function nanmean>, async_col: str = 'asynchrony', **kwargs) → float: Gets the average pairwise asynchrony (in milliseconds!) across all simulated performances

get_average_tempo_slope(func=<function nanmean>, **kwargs) → float

Returns the average tempo slope for all simulations.

Method:

- For every simulation, zip the corresponding keys and drums performance together;
- Then, get the average IOI for every second across both keys and drums (this is straightforward, because we resampled to average IOI per second in _format_simulated_data);
- Convert average IOI to average BPM by dividing 60 by the IOI, then regress against elapsed seconds;
- Extract the slope coefficient, take the median across all simulations, and return.
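A minimal sketch of these steps, assuming each simulation is a pair of per-second IOI arrays with no missing values. It uses `numpy.polyfit` for the regression, and the names `slope_for_one_simulation` and `average_tempo_slope` are hypothetical. BPM is computed as 60 divided by the IOI in seconds.

```python
import numpy as np


def slope_for_one_simulation(keys_ioi, drms_ioi):
    """BPM change per second for one simulated keys/drums pair.

    Assumption: both arrays hold one mean IOI (in seconds) per elapsed second.
    """
    mean_ioi = np.mean(np.vstack([keys_ioi, drms_ioi]), axis=0)
    bpm = 60.0 / mean_ioi  # an IOI of 0.5 s corresponds to 120 BPM
    elapsed = np.arange(bpm.size, dtype=float)
    slope, _intercept = np.polyfit(elapsed, bpm, deg=1)
    return float(slope)


def average_tempo_slope(simulations, func=np.median):
    """Summarise the slopes of all simulations with `func`."""
    return float(func([slope_for_one_simulation(k, d) for k, d in simulations]))


t = np.arange(20)
steady = 60.0 / np.full(20, 120.0)      # constant 120 BPM
speeding = 60.0 / (120.0 + 0.5 * t)     # accelerates by 0.5 BPM per second
sims = [(steady, steady), (speeding, speeding), (speeding, speeding)]
result = average_tempo_slope(sims)
```

The summary function is passed in as an argument here so that the median described above can be swapped for another statistic.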

get_simulation_data_for_plotting(plot_individual: bool = True, plot_average: bool = True, var: str = 'my_next_ioi', timespan: tuple = (7, 101)) → tuple: Wrangles simulation data into a format that can be plotted and returns.

src.analyse.simulations.generate_phase_correction_simulations_for_coupling_parameters(mds: list[src.analyse.phase_correction_models.PhaseCorrectionModel], output_dir: str, logger=None, force_rebuild: bool = False, num_simulations: int = 500) → tuple[list[src.analyse.simulations.Simulation], str]: Create simulated performances across a range of artificial coupling parameters for every phase correction model

src.analyse.simulations.generate_phase_correction_simulations_for_individual_conditions(mds: list[src.analyse.phase_correction_models.PhaseCorrectionModel], output_dir: str, logger=None, force_rebuild: bool = False, num_simulations: int = 500) → list[src.analyse.simulations.Simulation]: Create simulated performances using the coupling within every individual performance.

Module contents

Scripts to turn final dataset into models and simulations