Pitch Estimation of Choir Music using Deep Learning Strategies: from Solo to Unison Recordings

Title: Pitch Estimation of Choir Music using Deep Learning Strategies: from Solo to Unison Recordings
Publication Type: Master Thesis
Year of Publication: 2017
Authors: Cuesta, H.
Abstract: The goals of this thesis are to create new datasets for studying aspects of choir singing, with a focus on unison performances, and to investigate data-driven methods for the automatic pitch estimation of a cappella choir singing performances. Choral music is polyphonic and involves multiple singers, typically grouped into four main voices (soprano, alto, tenor and bass). Multi-pitch estimation is challenging because of the variety of acoustic scenarios (from solo singers to big choirs) and the lack of annotated datasets for training and evaluation, especially for the polyphonic case. In particular, we focus on building models of pitch from monophonic and unison recordings. To do so, we first build a dataset of choir singing that contains different types of performances: solo singers, unison, and four-part choir. Then, we train several deep learning architectures to extract pitch information from monophonic singing voice signals, and afterwards adapt them to model unison performances. The models for monophonic pitch estimation perform on par with state-of-the-art methods and in some cases outperform them, especially in the mid-frequency range. The model for unison choir is capable of predicting the average pitch of a unison performance and its dispersion with an average accuracy of 70%, although its accuracy and generalization capabilities are limited by the size of the dataset. The presented models provide a first step towards the automatic transcription of choir singing recordings, and the unison model is a useful resource for choir singing synthesis.
Final publication: https://doi.org/10.5281/zenodo.1108524