src.analyse package
Submodules
src.analyse.analysis_utils module
Utility functions, classes, and constants for analysis and modelling
- src.analyse.analysis_utils.append_zoom_array(perf_df: DataFrame, zoom_arr: array, onset_col: str = 'onset') DataFrame
Appends a column to a dataframe showing the approximate amount of latency applied by AV-Manip to each event in a performance
- src.analyse.analysis_utils.average_bpms(df1: DataFrame, df2: DataFrame, window_size: int = 8, elap: str = 'elapsed', bpm: str = 'bpm') DataFrame
Returns a list of averaged BPMs from two performances. Data is grouped by every second in a performance.
- src.analyse.analysis_utils.create_model_list(df, avg_groupers: list, md='correction_partner_onset~C(latency)+C(jitter)+C(instrument)') list
Subset a dataframe of per-condition results and return a list of statsmodels regression outputs for use in a table. By default, the regression will average results from the same condition across multiple measures. This can be overridden by setting the averaging argument to False.
- src.analyse.analysis_utils.create_one_simulation(keys_data: Dict, drms_data: Dict, keys_params: Dict, drms_params: Dict, keys_noise, drms_noise, lat: ndarray, beats: int) tuple
Create data for one simulation, using numba optimisations. This function is defined outside of the Simulation class to enable instances of the Simulation class to be pickled.
- src.analyse.analysis_utils.extract_event_density(bpm: DataFrame, raw: DataFrame) DataFrame
Appends a column to a performance dataframe showing the number of actual notes per extracted crotchet
- src.analyse.analysis_utils.extract_interpolated_beats(c: array) tuple[int, int]
Extracts the number of beats in the performance that required interpolation in REAPER. This was usually due to a performer ‘pushing’ ahead a crotchet beat by a swung quaver, or due to an implied metric modulation.
- src.analyse.analysis_utils.extract_npvi(s: Series) float
Extracts the normalised pairwise variability index (nPVI) from a column of IOIs
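The nPVI calculation can be sketched in plain Python as below. This uses the commonly cited definition of the index (mean absolute difference between successive intervals, each normalised by the mean of the pair, multiplied by 100); whether extract_npvi follows exactly this form is an assumption, and the name npvi is hypothetical.

```python
def npvi(iois):
    """Normalised pairwise variability index of a sequence of IOIs.

    Hypothetical helper: assumes the commonly cited nPVI definition,
    not necessarily the exact form used by extract_npvi.
    """
    pairs = list(zip(iois[:-1], iois[1:]))
    if not pairs:
        raise ValueError("need at least two IOIs")
    # Each term: absolute difference of neighbouring IOIs over their mean
    terms = [abs(a - b) / ((a + b) / 2) for a, b in pairs]
    return 100 * sum(terms) / len(terms)


print(npvi([0.5, 0.5, 0.5]))  # perfectly even IOIs give 0.0
```

Perfectly isochronous IOIs give an nPVI of zero; larger values indicate greater contrast between successive intervals.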
- src.analyse.analysis_utils.extract_pairwise_asynchrony(keys_nn: DataFrame, drms_nn: DataFrame) float
Extracts pairwise asynchrony from two matched dataframes.
Rasch (2015) defines pairwise asynchrony as the root-mean-square of the standard deviations of the onset time differences for all pairs of voice parts. We can calculate this for each condition, using the nearest-neighbour model for both the keyboard and drummer.
- src.analyse.analysis_utils.generate_df(data: array, iqr_range: tuple = (0.05, 0.95), threshold: float = 0, keep_pitch_vel: bool = False) DataFrame
Create dataframe from MIDI performance data, either cleaned (just crotchet beats) or raw. Optional keyword arguments: iqr_range: upper and lower quartile to clean IOI values by; threshold: value below which IOI timings are removed; keep_pitch_vel: keep pitch and velocity columns.
- src.analyse.analysis_utils.generate_tempo_slopes(raw_data: list) list[tuple]
Returns average tempo slope coefficients for all performances as list of tuples in the form (trial, block, latency, jitter, avg. slope coefficient). Deprecated?
- src.analyse.analysis_utils.iqr_filter(col: str, df: DataFrame, iqr_range: tuple = (0.05, 0.95)) Series
Filter duration values below a certain quartile to remove extraneous MIDI notes not cleaned in REAPER
- src.analyse.analysis_utils.load_data(input_filepath: str) list
Loads all pickled data from the processed data folder
- src.analyse.analysis_utils.load_from_disc(output_dir: str, filename: str = 'phase_correction_mds.p') list
Try to load models from disc
- src.analyse.analysis_utils.log_model(md, logger=None) None
Helper function to log metadata for a particular model in our GUI, if we’ve passed a logger function
- src.analyse.analysis_utils.log_simulation(sim, logger=None) None
Helper function to log metadata for a particular simulation in our GUI, if we’ve passed a logger function
- src.analyse.analysis_utils.reg_func(df: DataFrame, xcol: str, ycol: str) RegressionResults
Calculates linear regression between two given columns, returns results table. Deprecated.
- src.analyse.analysis_utils.resample(perf: ~pandas.core.frame.DataFrame, func=<function nanmean>, col: str = 'my_onset', resample_window: str = '1s', interpolate: bool = True) DataFrame
Resamples an individual performance dataframe to get mean of every second.
- src.analyse.analysis_utils.return_average_coeffs(coeffs: list) list[tuple]
Returns a list of tuples containing the average coefficient for keys/drums performance in a single trial. Tuples take the form of those in generate_tempo_slopes, i.e. (trial, block, latency, jitter, avg. slope coefficient)
- src.analyse.analysis_utils.return_coeff_from_sm_output(results: RegressionResults) int
Formats the table returned by statsmodels to return only the regression coefficient as an integer
- src.analyse.analysis_utils.test_stationary(array: Series) Series
Tests whether data is stationary. If it is not, returns the data with the first difference calculated
- src.analyse.analysis_utils.zip_same_conditions_together(raw_data: list) list[zip]
Iterates through raw data and zips keys/drums data from the same performance together. Returns a list of zip objects, each element of which is a tuple containing
src.analyse.phase_correction_models module
Code for generating phase correction models
- class src.analyse.phase_correction_models.PhaseCorrectionModel(c1_keys: list, c2_drms: list, **kwargs)
Bases: object
A linear phase correction model for a single performance (keys and drums).
- static _append_zoom_array(perf_df: DataFrame, zoom_arr: array, onset_col: str = 'my_onset') DataFrame
Appends a column to a dataframe showing the approximate amount of latency applied by AV-Manip to each event in a performance
- _apply_elliptic_envelope(delayed_arr: ndarray, nn_np: ndarray) DataFrame
Applies an EllipticEnvelope filter to data to extract outliers and rematch or set them to missing. Numba isn’t used here as it isn’t supported by EllipticEnvelope and sklearn.
- static _cleaning_for_180ms(delayed_arr: ndarray, nn_np: ndarray) ndarray
Applies specialised cleaning using bins to performances with 180ms of latency. Optimised with numba.
- _create_higher_order_phase_correction_models(df: DataFrame, endog: str = 'my_next_ioi_diff', exog_vars: tuple[str] = ('my_prev_ioi_diff', 'asynchrony')) list
Creates a list of higher order phase correction models, including a greater number of lags of the asynchrony and previous IOI terms
- _create_phase_correction_model(df: DataFrame, md: Optional[str] = None)
Create the linear phase correction model
- _create_summary_dictionary(c: dict, md, nn: DataFrame, rn: int, higher_order_md: list)
Creates a dictionary of summary statistics, used when analysing all models.
- _extract_asynchrony_third_person(async_col: str = 'asynchrony_third_person', subset: Optional[int] = None) float
Extracts asynchrony experienced by an imagined third person joined to the Zoom call
- static _extract_pairwise_asynchrony(nn, asynchrony_col: str = 'asynchrony')
Extract pairwise asynchrony as a float (in milliseconds, as is standard for this unit in the literature)
Method:
  - Carry out the nearest-neighbour matching, and get a series of asynchrony values for both musicians (i.e. keys -> drums with delay, drums -> keys with delay);
  - Square all of these values;
  - Get the overall mean (here we collapse both arrays down to a single value);
  - Take the square root of this mean.
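The root-mean-square steps above can be sketched as follows. This is a minimal illustration, not the package's implementation: the function name is hypothetical, and it assumes the asynchrony values are held in seconds and so need multiplying by 1000 to report milliseconds.

```python
import math


def pairwise_asynchrony_ms(keys_async, drms_async):
    """Root-mean-square of both musicians' asynchronies, in milliseconds.

    Hypothetical helper: assumes inputs are asynchronies in seconds,
    already produced by the nearest-neighbour matching.
    """
    # Collapse both series into one, skipping missing (NaN) values
    values = [a for a in list(keys_async) + list(drms_async) if not math.isnan(a)]
    mean_square = sum(a * a for a in values) / len(values)
    return math.sqrt(mean_square) * 1000


rms = pairwise_asynchrony_ms([0.03, -0.03], [0.03, -0.03])
print(round(rms, 6))  # 30.0
```

Because the values are squared before averaging, asynchronies of opposite sign do not cancel each other out.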
- _extract_pairwise_asynchrony_with_standard_deviations(async_col: str = 'asynchrony')
Extract pairwise asynchrony using the standard deviation of the asynchrony
Method:
  - Join both nearest-neighbour dataframes in order to match asynchrony values together;
  - Square all of these values;
  - Get the overall mean (here we collapse both arrays down to a single value);
  - Take the square root of this mean;
  - Repeat the join process with the other dataframe as left join, and get the pairwise asynchrony again;
  - Take the mean of both (this is to prevent issues with the dataframe join process).
- _extract_tempo_slope(subset: Optional[int] = None)
Extracts tempo slope, in the form of a float representing BPM change per second.
Method:
  - Resample both dataframes to get the mean BPM value every second.
  - Concatenate these dataframes together, and get the mean BPM by both performers each second.
  - Compute a regression of this mean BPM against the overall elapsed time and extract the coefficient.
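A dependency-free sketch of the tempo slope method is given below. The function names are hypothetical, the per-second binning stands in for a dataframe resample, and restricting the average to seconds where both performers have data is an assumption rather than documented behaviour.

```python
from collections import defaultdict


def mean_bpm_per_second(onsets, bpms):
    """Average the BPM values whose onset falls within each whole second."""
    bins = defaultdict(list)
    for onset, bpm in zip(onsets, bpms):
        bins[int(onset)].append(bpm)
    return {sec: sum(v) / len(v) for sec, v in bins.items()}


def tempo_slope(keys, drms):
    """keys, drms: (onsets, bpms) pairs. Returns BPM change per second."""
    k = mean_bpm_per_second(*keys)
    d = mean_bpm_per_second(*drms)
    # Assumption: only use seconds where both performers have data
    secs = sorted(set(k) & set(d))
    ys = [(k[s] + d[s]) / 2 for s in secs]
    # Ordinary least-squares slope of mean BPM against elapsed seconds
    mx, my = sum(secs) / len(secs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in secs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(secs, ys))
    return sxy / sxx


perf = ([0.0, 0.5, 1.0, 1.5, 2.0, 2.5], [120, 120, 121, 121, 122, 122])
print(tempo_slope(perf, perf))  # 1.0
```

A positive coefficient indicates acceleration over the performance and a negative one indicates slowing. At least two distinct seconds of data are needed for the slope to be defined.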
- _format_df_for_model(df: DataFrame) DataFrame
Coerces a dataframe into the format required for the phase correction model, including setting required columns
- _generate_df(data: list, threshold: float = 0.25) tuple[pandas.core.frame.DataFrame, int]
Create dataframe, append zoom array, and add a column with our delayed onsets. This latter column replicates the performance as it would have been heard by our partner.
- _get_contamination_value_from_json(default: Optional[float] = None) float
- _get_rolling_coefficients(nn_df: DataFrame, func=None, ind_var: str = 'latency', dep_var: str = 'my_prev_ioi', cov: str = 'their_prev_ioi') list
Centralised function for calculating the relationship between IOI variability and latency variability. Takes in a single independent variable, a single dependent variable, and a covariate, as well as a function to apply to these.
Method:
  - Get rolling standard deviation values for all passed variables.
  - Lag these values according to the maximum lag attribute passed when creating the class instance.
  - Apply a function (defaults to regression) onto the lagged and non-lagged variables and return the results.
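The first two steps (rolling standard deviation, then lagging) can be illustrated with plain lists. Both helper names are hypothetical, and using the sample standard deviation over full windows only is an assumption about the windowing behaviour.

```python
import statistics


def rolling_std(values, window):
    """Sample standard deviation over a trailing window (window >= 2).

    Positions without a full window are left as None.
    """
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(statistics.stdev(chunk) if len(chunk) == window else None)
    return out


def lag(values, n):
    """Shift values forward by n steps, padding the start with None."""
    return ([None] * n + list(values))[:len(values)]


latency_std = rolling_std([1.0, 2.0, 3.0, 5.0], window=2)
print(lag(latency_std, 1))  # each row now holds the previous step's value
```

Pairing a lagged series with its non-lagged counterpart row by row gives the inputs for the final step, where a regression or partial correlation is applied.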
- _get_rolling_standard_deviation_values(nn_df: DataFrame, cols: tuple[str] = ('my_prev_ioi', 'their_prev_ioi', 'latency')) DataFrame
Extracts the rolling standard deviation of values within a given window size, then resamples to get the mean value for every second.
- _iqr_filter(df: DataFrame, col: str) DataFrame
Applies an inter-quartile range filter to set outlying values for a particular column to missing.
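A minimal sketch of this kind of filter is shown below. The function name is hypothetical, and the 0.05 and 0.95 bounds are borrowed from the iqr_range default listed for generate_df; whether _iqr_filter uses the same bounds is an assumption.

```python
import statistics


def quantile_filter(values, lower=0.05, upper=0.95):
    """Replace values outside the [lower, upper] quantile range with None.

    Hypothetical helper: bounds are assumed, not taken from _iqr_filter.
    """
    # 99 cut points: cuts[i - 1] is the i-th percentile
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    lo = cuts[round(lower * 100) - 1]
    hi = cuts[round(upper * 100) - 1]
    return [v if lo <= v <= hi else None for v in values]


filtered = quantile_filter(list(range(1, 102)))
print(filtered.count(None))  # 10
```

Setting outliers to missing rather than dropping them keeps the result the same length as the input, so it can still be aligned with the other columns.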
- _lag_rolling_values(roll: DataFrame, cols: tuple[str] = ('my_prev_ioi_std', 'their_prev_ioi_std', 'latency_std')) DataFrame
Shifts rolling values by a given number of seconds and concatenates together with the original dataframe.
- _match_onsets(live_arr: ndarray, delayed_arr: ndarray, zoom_arr: ndarray) DataFrame
For a single performer, matches each of their live onsets with the closest delayed onset from their partner.
- static _nearest_neighbour(live_arr, delayed_arr, empty_arr)
Carry out the nearest-neighbour matching. Optimised with numba.
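Nearest-neighbour matching of onsets can be sketched without numba as below. The function name and return shape are hypothetical, and the sign convention for asynchrony (live onset minus delayed onset) is an assumption.

```python
import bisect


def nearest_neighbour(live, delayed):
    """Pair each live onset with the closest delayed partner onset.

    Returns (live_onset, matched_delayed_onset, asynchrony) tuples.
    Assumes delayed is non-empty.
    """
    delayed = sorted(delayed)
    matches = []
    for onset in live:
        i = bisect.bisect_left(delayed, onset)
        # Candidates are the delayed onsets either side of the insertion point
        candidates = delayed[max(0, i - 1): i + 1]
        closest = min(candidates, key=lambda d: abs(d - onset))
        matches.append((onset, closest, onset - closest))
    return matches


for row in nearest_neighbour([1.0, 2.0], [0.9, 1.6, 2.05]):
    print(row)
```

Two live onsets can end up matched to the same delayed onset under this scheme, which is why a separate step for removing duplicate matches is needed.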
- _partial_corr_shifted_rolling_variables(lagged: DataFrame, dep_var: str = 'my_prev_ioi_std', ind_var: str = 'latency_std', cov_var: str = 'their_prev_ioi_std') list
Gets the partial correlation between dep_var and ind_var, controlling for covariate cov_var
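For a single covariate, the first-order partial correlation can be computed from three pairwise Pearson correlations. The sketch below uses that textbook formula; the helper names are hypothetical, and it returns a single value rather than the list the method is documented to return.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(dep, ind, cov):
    """Correlation between dep and ind, controlling for cov."""
    r_xy = pearson(dep, ind)
    r_xz = pearson(dep, cov)
    r_yz = pearson(ind, cov)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# cov is uncorrelated with both variables, so the result equals pearson(dep, ind)
print(partial_corr([1, 2, 3, 4], [2, 1, 4, 3], [1, -1, -1, 1]))
```

When the covariate is uncorrelated with both variables, the partial correlation reduces to the ordinary correlation, as in the example.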
- static _regress_shifted_rolling_variables(lagged: DataFrame, dep_var: str = 'my_prev_ioi_std', ind_var: str = 'latency_std', cov_var: str = 'their_prev_ioi_std') list[float]
Creates a regression model of lagged variables vs non-lagged variables and extracts coefficients.
- static _remove_duplicate_matches(nn_np: ndarray) ndarray
Filters onsets for duplicate matches, then keeps whichever match is closest to median asynchrony time.
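One way to read this description is sketched below. The function name and the data layout (a list of live/delayed pairs) are hypothetical, and setting the losing matches to None is an assumption: the source does not say what happens to the duplicates that are not kept.

```python
import statistics


def remove_duplicate_matches(matches):
    """matches: list of (live_onset, delayed_onset) pairs.

    Where several live onsets share a delayed onset, keep the pair whose
    asynchrony is closest to the median asynchrony and unmatch the rest.
    """
    median_async = statistics.median(live - delayed for live, delayed in matches)
    best = {}
    for live, delayed in matches:
        dist = abs((live - delayed) - median_async)
        if delayed not in best or dist < best[delayed][0]:
            best[delayed] = (dist, live)
    kept = {(live, delayed) for delayed, (_, live) in best.items()}
    # Assumption: losing duplicates keep their row but lose their match
    return [(live, delayed if (live, delayed) in kept else None)
            for live, delayed in matches]


pairs = [(1.0, 0.95), (1.5, 1.45), (1.9, 1.45), (2.0, 1.95)]
print(remove_duplicate_matches(pairs))
```

In the example, the onsets at 1.5 and 1.9 both match the delayed onset at 1.45; only the first is kept because its asynchrony is closer to the median.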
- static _return_granger_causality(nn: DataFrame, maxlag: int = 1) dict
Calculates Granger causality between time series
- src.analyse.phase_correction_models.generate_phase_correction_models(raw_data: list, output_dir: str, logger=None, force_rebuild: bool = False) tuple[list[src.analyse.phase_correction_models.PhaseCorrectionModel], str]
Generates all phase correction models. Returns the models and a string for logging
src.analyse.questionnaire module
Unused code stub for analysis of questionnaire data
- class src.analyse.questionnaire.InterRaterReliability(**kwargs)
Bases: object
Deprecated
- _recode_categorical_variables(s: Series) Series
Replaces an ordinal variable from 1-9 with three categories, in accordance with those in the questionnaire
- _return_cohen_kappa(grp: groupby, var: str) float
Returns Cohen’s kappa value
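For reference, Cohen's kappa for two raters can be computed from observed and chance-expected agreement. The sketch below is a standalone illustration with a hypothetical function name; it works on two plain sequences of ratings rather than the groupby object the method takes.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical ratings.

    Undefined (division by zero) when chance-expected agreement is 1.
    """
    n = len(rater_a)
    # Proportion of items on which the raters agree
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance, from each rater's marginal frequencies
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    return (observed - expected) / (1 - expected)


print(cohen_kappa(["lo", "lo", "hi", "hi"], ["lo", "hi", "hi", "hi"]))  # 0.5
```

A value of 1 indicates perfect agreement, and 0 indicates agreement no better than chance.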
- _return_interclass_correlation(grp: groupby, var: str) float
Returns the interclass correlation, in the form of Pearson’s R statistic
- _return_kendall_w(grp: groupby, var: str) float
Returns Kendall’s W value
- format_df() DataFrame
Called from outside the class, returns a dataframe grouped by class containing all the required summary stats
- class src.analyse.questionnaire.TestRetestReliability(**kwargs)
Bases: object
Deprecated
- format_df_for_regplot(var: str) DataFrame
Returns a dataframe that can be provided to the RegPlotSingle class to create a regression plot of scores of one variable across both measures, for all trials
- format_trr_df() DataFrame
Returns a dataframe of test-retest reliability scores for all variables, stratified by individual duos
- src.analyse.questionnaire.questionnaire_analysis(raw_data: list[list], output_dir: str) None
Unused
src.analyse.run_analysis module
Central file for running all analysis functions, called by run.cmd
src.analyse.simulations module
Code for generating simulations using phase correction models
- class src.analyse.simulations.Simulation(pcm: PhaseCorrectionModel, num_simulations: int = 500, **kwargs)
Bases: object
Creates a given number of simulated performances from a given phase correction model.
The number of simulations defaults to 500, the same number as in Jacoby et al. (2021).
- static _append_timestamps_to_latency_array(latency_array, offset: int = 8, resample_rate: float = 0.75) array
Appends timestamps showing the onset time for each value in the latency array applied to a performance
- static _convert_musician_parameters_dict_to_numba(python_dict: dict) Dict
Converts a Python dictionary into a type that can be utilised by Numba
- _create_summary_dictionary() dict
Creates a summary dictionary with important simulation parameters
- _format_simulated_data(data: Dict) DataFrame
Formats data from one simulation by creating a dataframe, adding in the timedelta column, and resampling to get the mean IOI (defaults to every second)
- static _get_average_var_for_one_simulation(all_perf: list[pandas.core.frame.DataFrame], var: str = 'my_next_ioi') DataFrame
Concatenate all simulations together and get the row-wise average (i.e. avg IOI every second)
- static _get_number_of_beats_for_simulation(kp, dp) int
Averages the total number of beats across both keys and drums, then rounds the result up to the nearest integer.
- static _get_raw_musician_parameters(init: DataFrame) dict
Gets necessary simulation parameters from pandas dataframe and converts to a dictionary
- _get_rolling_standard_deviation_values(df: DataFrame, cols: tuple[str] = ('my_prev_ioi',)) DataFrame
- _initialise_empty_data(iois: tuple[float] = (0.5, 0.5), onset: float = 8) Dict
Initialise an empty numba dictionary of string-array pairs, for storing data from one simulation in.
- _modify_musician_parameters_by_simulation_type(input_data)
Modifies a simulated musician’s parameters according to the given simulation type
- create_all_simulations() None
Run the simulations and create a list of dataframes for each individual performer
- get_average_ioi_variability(func=<function nanmean>, **kwargs) float
Returns the average IOI variability for all simulations.
Method:
  - For every simulation, get the median IOI standard deviation value over the window size.
  - Calculate the mean of all of these values.
- get_average_pairwise_asynchrony(func=<function nanmean>, async_col: str = 'asynchrony', **kwargs) float
Gets the average pairwise asynchrony (in milliseconds!) across all simulated performances
- get_average_tempo_slope(func=<function nanmean>, **kwargs) float
Returns the average tempo slope for all simulations.
Method:
  - For every simulation, zip the corresponding keys and drums performances together.
  - Get the average IOI for every second across both keys and drums. This is straightforward, because we resampled to average IOI per second in _format_simulated_data.
  - Convert average IOI to average BPM by dividing 60 by the IOI, then regress against elapsed seconds.
  - Extract the slope coefficient, take the median across all simulations, and return it.
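The conversion and aggregation steps can be sketched as below. Note that BPM is 60 divided by the IOI in seconds. The function names are hypothetical, each input list is assumed to hold one mean IOI per elapsed second, and the median is used for the final aggregation as the method describes.

```python
import statistics


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    xs, ys = list(xs), list(ys)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def simulation_tempo_slope(keys_ioi, drms_ioi):
    """keys_ioi, drms_ioi: mean IOI in seconds for each elapsed second."""
    mean_ioi = [(k + d) / 2 for k, d in zip(keys_ioi, drms_ioi)]
    bpm = [60 / ioi for ioi in mean_ioi]  # BPM = 60 / IOI
    return ols_slope(range(len(bpm)), bpm)


def median_tempo_slope(simulations):
    """simulations: iterable of (keys_ioi, drms_ioi) pairs."""
    return statistics.median(simulation_tempo_slope(k, d) for k, d in simulations)


steady = [0.5, 0.5, 0.5]                 # constant 120 BPM
speeding = [60 / 120, 60 / 121, 60 / 122]  # gaining 1 BPM each second
print(median_tempo_slope([(steady, steady), (speeding, speeding), (speeding, speeding)]))
```

An IOI of 0.5 seconds corresponds to 120 BPM, so shrinking IOIs produce a positive slope.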
- get_simulation_data_for_plotting(plot_individual: bool = True, plot_average: bool = True, var: str = 'my_next_ioi', timespan: tuple = (7, 101)) tuple
Wrangles simulation data into a format that can be plotted and returns it.
- src.analyse.simulations.generate_phase_correction_simulations_for_coupling_parameters(mds: list[src.analyse.phase_correction_models.PhaseCorrectionModel], output_dir: str, logger=None, force_rebuild: bool = False, num_simulations: int = 500) tuple[list[src.analyse.simulations.Simulation], str]
Create simulated performances across a range of artificial coupling parameters for every phase correction model
- src.analyse.simulations.generate_phase_correction_simulations_for_individual_conditions(mds: list[src.analyse.phase_correction_models.PhaseCorrectionModel], output_dir: str, logger=None, force_rebuild: bool = False, num_simulations: int = 500) list[src.analyse.simulations.Simulation]
Create simulated performances using the coupling within every individual performance.
Module contents
Scripts to turn final dataset into models and simulations