Da-TACOS: A dataset for cover song identification and understanding

Yesiler, F; Tralie, C; Correya, A; Silva, DF; Tovstogan, P; Gómez, E; Serra, X.

Title	Da-TACOS: A dataset for cover song identification and understanding
Publication Type	Conference Paper
Year of Publication	2019
Conference Name	International Society for Music Information Retrieval (ISMIR)
Authors	Yesiler, F., Tralie, C., Correya, A., Silva, D. F., Tovstogan, P., Gómez, E., & Serra, X.
Conference Start Date	4 November 2019
Conference Location	Delft, Netherlands
Abstract	This paper focuses on Cover Song Identification (CSI), an important research challenge in content-based Music Information Retrieval (MIR). Although the task is interesting and challenging in both academic and industrial settings, several limitations hinder the advancement of current approaches. We specifically address two of them in the present study. First, few publicly available datasets exist for this task, and there is no publicly available benchmark set that is widely used among researchers for comparative algorithm evaluation. Second, most algorithms are neither publicly shared nor reproducible, which limits the comparison of approaches. To overcome these limitations, we propose Da-TACOS, a DaTAset for COver Song Identification and Understanding, together with two frameworks for feature extraction and benchmarking that facilitate reproducibility. Da-TACOS contains 25K songs represented by unique editorial metadata plus 9 low- and mid-level features pre-computed with open-source libraries, and is divided into two subsets. The Cover Analysis subset contains audio features (e.g., key, tempo) that can serve to study how musical characteristics vary across cover songs. The Benchmark subset contains the set of features frequently used in CSI research, e.g., chroma, MFCCs, and beat onsets. Moreover, we provide initial benchmarking results for a selection of state-of-the-art CSI algorithms using our dataset, and, for reproducibility, we share a GitHub repository containing the feature extraction and benchmarking frameworks.
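The abstract refers to benchmarking CSI algorithms but does not spell out how results are scored. A common way to score a CSI benchmark is mean average precision (MAP) over a pairwise distance matrix. Each track serves as a query, the other tracks are ranked by distance, and tracks from the same clique (the same underlying work) count as relevant. The sketch below assumes that setup. The function name, its inputs, and the toy matrix are hypothetical illustrations and are not taken from the Da-TACOS benchmarking framework.

```python
import numpy as np


def mean_average_precision(dist, labels):
    """Hypothetical MAP scorer for cover song identification.

    dist[i, j] is a distance between tracks i and j (smaller = more similar).
    labels[i] is the clique (work) ID of track i. Every track is used as a
    query against all other tracks; same-clique tracks are the relevant ones.
    """
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    n = len(labels)
    aps = []
    for q in range(n):
        # Rank every other track by ascending distance to the query.
        others = np.array([j for j in range(n) if j != q])
        order = others[np.argsort(dist[q, others], kind="stable")]
        relevant = labels[order] == labels[q]
        if not relevant.any():
            continue  # the query has no cover in the collection; skip it
        hits = np.cumsum(relevant)          # relevant items seen so far
        ranks = np.arange(1, len(order) + 1)
        # Average the precision values measured at each relevant position.
        aps.append((hits[relevant] / ranks[relevant]).mean())
    return float(np.mean(aps))


# Toy example: two cliques of two tracks each. Every query's cover is
# ranked second, behind one non-cover, so each AP is 1/2.
labels = [0, 0, 1, 1]
dist = np.array([[0, 3, 1, 5],
                 [3, 0, 5, 1],
                 [1, 5, 0, 3],
                 [5, 1, 3, 0]], dtype=float)
print(mean_average_precision(dist, labels))  # 0.5
```

Queries without any cover in the collection are skipped rather than scored as zero. This is one of several reasonable conventions, and a real benchmark may handle such "noise" tracks differently.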
Preprint/postprint document	http://hdl.handle.net/10230/42771