New datasets released
We have just released two datasets from two papers that will be presented at ISMIR this year.
The Semantic Artist Similarity dataset consists of two corpora of artist entities. Each artist comes with a biography text and a list of the top-10 most similar artists within the same corpus, which serves as ground truth. One corpus contains 268 artists and the other 2,336 artists, both gathered from Last.fm. The smaller corpus is mapped to the MIREX Audio and Music Similarity evaluation dataset, so that its similarity judgments can be used as ground truth. For the larger corpus, we use the similarity between artists as provided by the Last.fm API.
Oramas, S., Sordo, M., Espinosa-Anke, L., & Serra, X. (2015). A Semantic-based Approach for Artist Similarity. 16th International Society for Music Information Retrieval Conference.
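As an illustration of how top-10 ground-truth lists like these might be used, here is a minimal sketch that scores a similarity system with precision at k. It is not necessarily the evaluation used in the paper. The function names, the dictionary layout, and the artist identifiers are all assumptions made for this example and do not reflect the dataset's actual file format.

```python
from typing import Dict, List


def precision_at_k(predicted: List[str], relevant: List[str], k: int = 10) -> float:
    """Fraction of the first k predicted artists that appear in the ground truth."""
    if k <= 0:
        raise ValueError("k must be positive")
    relevant_set = set(relevant)
    hits = sum(1 for artist in predicted[:k] if artist in relevant_set)
    return hits / k


def mean_precision_at_k(
    predictions: Dict[str, List[str]],
    ground_truth: Dict[str, List[str]],
    k: int = 10,
) -> float:
    """Average precision at k over every artist that has a ground-truth list."""
    scores = [
        precision_at_k(predictions.get(artist, []), similar, k)
        for artist, similar in ground_truth.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy data: keys and values are placeholder artist identifiers.
toy_ground_truth = {
    "artist_a": ["artist_b", "artist_c"],
    "artist_b": ["artist_a", "artist_c"],
}
toy_predictions = {
    "artist_a": ["artist_b", "artist_x"],
    "artist_b": ["artist_x", "artist_y"],
}

toy_score = mean_precision_at_k(toy_predictions, toy_ground_truth, k=2)
print(f"Mean precision at 2: {toy_score:.2f}")
```

With the real data, the ground-truth dictionary would hold the ten similar artists listed for each artist, and k would typically be set to 10 or lower.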
- FlaBase: A Flamenco Music Knowledge Base
The ultimate aim of FlaBase is to gather all editorial, biographical and musicological information about flamenco music that is available online. Its content is the result of curating and extracting data from several sources (Wikipedia, MusicBrainz and flamenco websites). FlaBase is stored in JSON format. This first release contains information about 1,102 artists, 74 palos (flamenco genres), 2,860 albums, 13,311 tracks, and 771 Andalusian locations.
Oramas, S., Gómez, F., Gómez, E., & Mora, J. (2015). FlaBase: Towards the Creation of a Flamenco Music Knowledge Base. 16th International Society for Music Information Retrieval Conference.