Cambridge Jazz Trio Database

Explore the database interactively! Code
View features extracted from a sample track! Code

Code Documentation

The Cambridge Jazz Trio Database🎹🎻🥁 is a dataset composed of about 16 hours of jazz performances with associated onset and beat annotations created by an automated signal processing pipeline.

The recordings were made between 1947 and 2015. The annotations were generated by an automated signal processing pipeline: two source separation models [1, 2] are applied to isolate a stem for every instrument in each recording, event onsets are detected in these stems [3], and each onset is matched with the nearest quarter-note beat tracked in the audio mixture [4]. The pipeline was validated against an equivalent set of human-generated annotations for 10% of the dataset, achieving an average F-score of 0.86.
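The matching step can be illustrated with a minimal sketch. This is not the database's actual code: the function names, the 0.1-second tolerance window, and the rule of keeping only the closest onset per beat are all assumptions made for illustration.

```python
from bisect import bisect_left


def nearest_beat(onset, beats):
    """Return the index of the beat closest in time to `onset` (beats sorted, non-empty)."""
    i = bisect_left(beats, onset)
    if i == 0:
        return 0
    if i == len(beats):
        return len(beats) - 1
    # Pick whichever neighbour is closer; ties go to the earlier beat.
    return i - 1 if onset - beats[i - 1] <= beats[i] - onset else i


def match_onsets_to_beats(onsets, beats, window=0.1):
    """Map each beat time to its closest onset within `window` seconds, or None.

    `window` is a hypothetical tolerance, not a value taken from the database.
    """
    if not beats:
        return {}
    matched = {beat: None for beat in beats}
    for onset in onsets:
        beat = beats[nearest_beat(onset, beats)]
        if abs(onset - beat) > window:
            continue  # too far from its nearest beat to be matched
        current = matched[beat]
        if current is None or abs(onset - beat) < abs(current - beat):
            matched[beat] = onset
    return matched


# Toy data in seconds: quarter-note beats from the mixture, onsets from one stem.
beats = [0.5, 1.0, 1.5, 2.0]
onsets = [0.48, 0.74, 1.03, 1.95, 2.04]
print(match_onsets_to_beats(onsets, beats))
# {0.5: 0.48, 1.0: 1.03, 1.5: None, 2.0: 2.04}
```

In this sketch, beats with no onset inside the window map to `None`, which keeps the output aligned with the beat grid even where an instrument is silent.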

All recordings are of jazz piano trio ensembles and feature one of 10 different jazz pianists. In roughly half of the recordings, the pianist is one of the top-10 most prolific and popular musicians in the piano trio format, identified through large-scale scraping of MusicBrainz and Last.FM data. In all other recordings, the pianist is Bill Evans, widely acknowledged as one of the most influential jazz pianists of all time.

This split enables models to be trained on both multi-class ("which pianist is it?", using only the top-10 pianist recordings) and binary ("is the pianist Bill Evans?", using all the data) classification problems.
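As a rough sketch of how the two label sets might be derived from track metadata: the record layout below (`track`, `pianist`, `subset` fields) and the placeholder pianist names are hypothetical and do not reflect the database's actual schema.

```python
# Hypothetical metadata records; field names and values are illustrative only.
tracks = [
    {"track": "t1", "pianist": "Pianist A", "subset": "top10"},
    {"track": "t2", "pianist": "Pianist B", "subset": "top10"},
    {"track": "t3", "pianist": "Bill Evans", "subset": "evans"},
    {"track": "t4", "pianist": "Pianist A", "subset": "top10"},
]


def multiclass_labels(tracks):
    """Label only the top-10 subset, one integer class per pianist."""
    subset = [t for t in tracks if t["subset"] == "top10"]
    classes = sorted({t["pianist"] for t in subset})
    index = {name: i for i, name in enumerate(classes)}
    return [(t["track"], index[t["pianist"]]) for t in subset], classes


def binary_labels(tracks, target="Bill Evans"):
    """Label every track: 1 if the pianist is `target`, else 0."""
    return [(t["track"], int(t["pianist"] == target)) for t in tracks]


multi, classes = multiclass_labels(tracks)
print(classes)                # ['Pianist A', 'Pianist B']
print(multi)                  # [('t1', 0), ('t2', 1), ('t4', 0)]
print(binary_labels(tracks))  # [('t1', 0), ('t2', 0), ('t3', 1), ('t4', 0)]
```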

Listen to the beats tracked in a performance of “Emily” (1970) by the Bill Evans Trio below: