Audio Source Separation for Music in Low-latency and High-latency Scenarios

Marxer, R.

Title	Audio Source Separation for Music in Low-latency and High-latency Scenarios
Publication Type	PhD Thesis
Year of Publication	2013
University	Universitat Pompeu Fabra
Authors	Marxer, R.
Academic Department	Department of Information and Communication Technologies
Date Published	09/2013
City	Barcelona
Abstract	The source separation problem in digital signal processing consists of finding the original signals that were mixed together into a set of mixture signals. Solutions to this problem have been extensively studied for the specific case of musical signals; however, their application to real-world practical situations remains infrequent. Depending on the scenario, there are two main obstacles to their widespread adoption. In some cases the main limitation is their high latency and computational requirements; in others, the quality of the results is still unacceptable. There has been extensive work on improving the quality of music separation, but few studies have been devoted to low-latency, low-computational-cost separation of monaural music signals. We propose specific methods to address the issues of each scenario independently. First, we focus on methods with low computational cost and low latency. We propose the use of Tikhonov regularization as a method for spectrum decomposition in the low-latency context. We compare it to existing techniques in pitch and multipitch estimation and tracking tasks, which are a crucial step in many separation methods. We then use the proposed spectrum decomposition method in low-latency music separation tasks targeting singing voice, bass and drums. Second, we develop methods that improve separation results with respect to existing state-of-the-art methods at the expense of higher computational cost and latency. We propose several high-latency, computationally complex methods that improve the separation of singing voice by modeling components that are often not accounted for, such as breathiness and consonants. Finally, we explore the use of temporal correlations and human annotations to enhance the separation of drums and complex polyphonic music signals.
Final publication	http://hdl.handle.net/10803/123808
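The abstract names Tikhonov regularization as the spectrum decomposition used in the low-latency scenario. The sketch below illustrates only the generic technique, not the thesis's implementation. Each magnitude-spectrum frame v is approximated as B x, where B is a fixed basis of spectral templates. The gains x solve min ||Bx - v||^2 + lam ||x||^2, whose closed form x = (B^T B + lam I)^-1 B^T v allows the projection matrix to be computed once offline, so each frame costs a single matrix-vector product. The harmonic-comb basis, the function names, the value of lam, and the clipping of negative gains to zero are all hypothetical choices made for illustration.

```python
import numpy as np


def harmonic_basis(f0s, n_bins, sample_rate, n_fft, n_harmonics=8, width=1.0):
    """Hypothetical basis: one column per candidate f0, built from Gaussian
    peaks at the first n_harmonics multiples of f0 with 1/h amplitude decay."""
    bin_hz = sample_rate / n_fft
    freqs = np.arange(n_bins) * bin_hz  # centre frequency of each bin in Hz
    B = np.zeros((n_bins, len(f0s)))
    for j, f0 in enumerate(f0s):
        for h in range(1, n_harmonics + 1):
            fh = h * f0
            if fh >= freqs[-1]:
                break  # skip harmonics beyond the last bin
            B[:, j] += np.exp(-0.5 * ((freqs - fh) / (width * bin_hz)) ** 2) / h
    return B / np.linalg.norm(B, axis=0, keepdims=True)  # unit-norm columns


def tikhonov_projector(B, lam):
    """Precompute P = (B^T B + lam I)^-1 B^T once, outside the real-time loop."""
    k = B.shape[1]
    return np.linalg.solve(B.T @ B + lam * np.eye(k), B.T)


def decompose_frame(P, v):
    """Per-frame gains via one matrix-vector product; negative gains are set
    to zero (an assumption, since spectral magnitudes are non-negative)."""
    return np.maximum(P @ v, 0.0)


if __name__ == "__main__":
    sr, n_fft = 16000, 2048
    f0s = np.arange(100.0, 800.0, 10.0)  # illustrative candidate pitches in Hz
    B = harmonic_basis(f0s, n_fft // 2 + 1, sr, n_fft)
    P = tikhonov_projector(B, lam=0.1)  # computed once, before streaming
    # Synthetic magnitude frame: the 220 Hz template (f0s[12]) plus slight noise.
    rng = np.random.default_rng(0)
    v = 2.0 * B[:, 12] + 0.01 * np.abs(rng.standard_normal(B.shape[0]))
    gains = decompose_frame(P, v)
    print("strongest candidate (Hz):", f0s[np.argmax(gains)])
```

Reading the index of the largest gain as a pitch estimate, as in the demo, is a deliberately crude stand-in for the pitch and multipitch tracking the abstract mentions; overlapping templates, such as octave-related candidates, can make it unreliable. The latency benefit comes from the fact that no iterative optimization runs per frame.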