Low-Latency Instrument Separation in Polyphonic Audio Using Timbre Models

Publication Type Conference Paper
Year of Publication 2012
Conference Name Latent Variable Analysis and Signal Separation - 10th International Conference, LVA/ICA
Authors Marxer, R., Janer, J., & Bonada, J.
Pagination 314-321
Conference Start Date 12/03/2012
Publisher Springer Berlin / Heidelberg
Conference Location Tel Aviv, Israel
ISBN Number 978-3-642-28550-9
Abstract This research focuses on the removal of the singing voice in polyphonic audio recordings under real-time constraints. The method is based on time-frequency binary masks that combine spectral-bin classification (by azimuth, phase difference and absolute frequency) with harmonic-derived masks. For the harmonic-derived masks, a pitch likelihood estimation technique based on Tikhonov regularization is proposed. A method for target instrument pitch tracking makes use of supervised timbre models. This approach runs in real time on off-the-shelf computers with latency below 250 ms. The method was compared to a state-of-the-art offline Non-negative Matrix Factorization (NMF) technique and to the ideal binary mask separation. For the evaluation we used a dataset of multi-track versions of professional audio recordings.
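As a rough illustration of the ideal binary mask used as the reference in the comparison, the sketch below marks each time-frequency bin as belonging to the voice when the voice magnitude exceeds the accompaniment magnitude, then removes the voice by keeping only the remaining bins of the mixture. It is not the paper's implementation: the function names (`ideal_binary_mask`, `remove_target`), the strict "greater than" dominance rule and the toy integer magnitudes are all assumptions made for the example, and a real system would operate on STFT frames rather than on small lists.

```python
def ideal_binary_mask(target_mag, residual_mag):
    """Return a 0/1 mask per bin: 1 where the target magnitude dominates.

    Both inputs are lists of frames, each frame a list of bin magnitudes.
    The strict '>' rule is an assumption made for this sketch.
    """
    return [
        [1 if t > r else 0 for t, r in zip(t_frame, r_frame)]
        for t_frame, r_frame in zip(target_mag, residual_mag)
    ]


def remove_target(mixture, mask):
    """Zero the bins assigned to the target, keeping everything else."""
    return [
        [x * (1 - m) for x, m in zip(x_frame, m_frame)]
        for x_frame, m_frame in zip(mixture, mask)
    ]


# Toy magnitudes (hypothetical values): 2 frames x 3 frequency bins.
voice = [[9, 1, 4], [2, 8, 0]]
accompaniment = [[2, 5, 4], [6, 1, 3]]

# Simplification: treat the mixture magnitude as the sum of the two sources.
mixture = [
    [v + a for v, a in zip(v_frame, a_frame)]
    for v_frame, a_frame in zip(voice, accompaniment)
]

voice_mask = ideal_binary_mask(voice, accompaniment)
accompaniment_estimate = remove_target(mixture, voice_mask)
```

The mask is called ideal because it requires the isolated sources, which are only available when multi-track recordings exist; it therefore serves as an upper reference for binary-mask methods rather than as a usable separation technique.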
preprint/postprint document files/publications/lowlatency.pdf