|Abstract||Research corpora are fundamental for the computational study of music, and defining the criteria for designing them is a research task in itself. These corpora need to be well suited to the specific research problems to be addressed. Since these research problems are also shaped by the musical, cultural and other specific aspects of the music traditions under study, the research corpora should take these specificities into account.
In this paper we address the problems of creating corpora for computational research on Arab-Andalusian music and consider several criteria relevant to their design. We focus on the problems that arise while annotating the corpora, specifically the language issues surrounding this art music tradition.
Following these criteria, we created a research corpus consisting of audio recordings with their corresponding metadata, lyrics and music scores. So far we have gathered 338 recordings from three different Arab-Andalusian music schools of Morocco, covering most of the musical modes, rhythms and forms of this art music tradition. The Arab-Andalusian corpus is accessible to the research community from a central online repository. Moreover, the audio recordings of this corpus are freely available through the Internet Archive repository. The corpus can be used to generate test datasets, which can serve as ground truth for evaluating several computational research tasks.