Entity Linking for the Music Domain
ELMD: Entity Linking for the Music Domain Dataset
ELMD is a corpus of annotated named entities from the music domain, built from a collection of about 13k Last.fm artist biographies. Entities are linked to DBpedia through a voting system among different state-of-the-art Entity Linking systems (ELVIS), with a precision of at least 0.94. In addition, by setting a higher confidence threshold, it is possible to obtain a subset of ELMD that prioritizes precision at the expense of recall.
ELMD 2.0
Over the last few months we have reviewed and expanded ELMD as follows:
- Most of the entities are now also linked to MusicBrainz (mapping retrieved through the Last.fm API).
- More annotations have been added by propagating existing annotations throughout the document in which they were found, assuming they appear in a one-sense-per-discourse fashion.
- New output formats have been added: NIF and GATE
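The propagation step in the list above can be illustrated with a minimal Python sketch. This is not the code used to build ELMD 2.0: the function name `propagate_annotations`, the use of document-level character offsets, and the exact-string matching strategy are all assumptions made for illustration. The idea is simply that, under one sense per discourse, every other exact mention of an already linked surface form in the same document receives the same link.

```python
import re


def propagate_annotations(text, annotations):
    """Copy each annotation to every other exact mention of its surface form.

    Assumes offsets are relative to `text` and that endChar is exclusive
    (both are assumptions of this sketch).
    """
    # Remember the first annotation seen for each surface form.
    by_surface = {}
    for ann in annotations:
        surface = text[ann["startChar"]:ann["endChar"]]
        by_surface.setdefault(surface, ann)

    covered = [(a["startChar"], a["endChar"]) for a in annotations]
    result = list(annotations)

    # Try longer surface forms first so they take precedence over substrings.
    for surface in sorted(by_surface, key=len, reverse=True):
        source = by_surface[surface]
        pattern = r"(?<!\w)" + re.escape(surface) + r"(?!\w)"
        for match in re.finditer(pattern, text):
            start, end = match.span()
            # Skip mentions that overlap an existing or newly added annotation.
            if any(start < e and s < end for s, e in covered):
                continue
            result.append(dict(source, startChar=start, endChar=end))
            covered.append((start, end))

    return sorted(result, key=lambda a: a["startChar"])


# Illustrative input, not taken from the dataset.
text = "Nirvana formed in 1987. Nirvana released Nevermind in 1991."
annotations = [{
    "startChar": 0,
    "endChar": 7,
    "uri": "http://dbpedia.org/resource/Nirvana_(band)",
    "category": "Artist",
}]
propagated = propagate_annotations(text, annotations)
for ann in propagated:
    print(ann["startChar"], ann["endChar"], ann["uri"])
```

Exact matching keeps the sketch conservative: it never links a surface form that was not already annotated somewhere in the document, so any error it introduces comes from the one-sense-per-discourse assumption itself.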
Category | Annotations | Entities |
---|---|---|
All | 144,593 | 63,902 |
Artist | 112,524 | 39,131 |
Album | 18,701 | 15,064 |
Track | 9,203 | 7,832 |
Label | 4,165 | 1,875 |

Share of annotations and entities linked to each knowledge base:

Linked to | Annotations | Entities |
---|---|---|
DBpedia | 58.6% | 49.1% |
MusicBrainz | 93.6% | 91.1% |
Both | 57.2% | 47% |
None | 5% | 9.2% |
ELMD 2.0 is available in the following formats: JSON, XML, NIF, and GATE.
In the JSON version, every biography is stored in a separate document and split into sentences. For every sentence, annotations are stored as a list of entities with the following fields: startChar, endChar, uri (DBpedia URI), mbid (MusicBrainz ID), category (Artist/Album/Track/Label), and lastfm_url (Last.fm URL). Track and Album entities may have an additional mbid_artist field, which provides the artist's MusicBrainz ID.
In the XML version, entities are annotated inside the text using the category of the entity as the XML tag, with three attributes: dbp (DBpedia URI), mb (MusicBrainz ID), and lfm (Last.fm URL).
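As a rough illustration of how these two formats might be read, the Python sketch below extracts (mention, category, MusicBrainz ID) triples from one sentence in each format. The entity field names and XML attribute names come from the description above. Everything else is an assumption: the enclosing `text` and `entities` keys, the `<sentence>` wrapper element, the exclusive `endChar`, the placeholder MusicBrainz IDs, and the helper names `mentions_from_json` and `mentions_from_xml`. Check the downloaded files for the actual layout.

```python
import json
import xml.etree.ElementTree as ET

CATEGORIES = {"Artist", "Album", "Track", "Label"}

# Hypothetical sentence record; "text" and "entities" are assumed key names.
sentence_json = """
{
  "text": "Nevermind is the second album by Nirvana.",
  "entities": [
    {"startChar": 0, "endChar": 9, "category": "Album",
     "uri": "http://dbpedia.org/resource/Nevermind",
     "mbid": "placeholder-album-mbid",
     "mbid_artist": "placeholder-artist-mbid",
     "lastfm_url": "https://www.last.fm/music/Nirvana/Nevermind"},
    {"startChar": 33, "endChar": 40, "category": "Artist",
     "uri": "http://dbpedia.org/resource/Nirvana_(band)",
     "mbid": "placeholder-artist-mbid",
     "lastfm_url": "https://www.last.fm/music/Nirvana"}
  ]
}
"""

# Hypothetical XML fragment; the <sentence> wrapper is an assumption.
sentence_xml = (
    '<sentence>'
    '<Album dbp="http://dbpedia.org/resource/Nevermind" '
    'mb="placeholder-album-mbid" '
    'lfm="https://www.last.fm/music/Nirvana/Nevermind">Nevermind</Album>'
    ' is the second album by '
    '<Artist dbp="http://dbpedia.org/resource/Nirvana_(band)" '
    'mb="placeholder-artist-mbid" '
    'lfm="https://www.last.fm/music/Nirvana">Nirvana</Artist>.'
    '</sentence>'
)


def mentions_from_json(sentence):
    """Return (surface form, category, MusicBrainz ID) per entity."""
    text = sentence["text"]
    return [
        (text[e["startChar"]:e["endChar"]], e["category"], e.get("mbid"))
        for e in sentence["entities"]
    ]


def mentions_from_xml(xml_string):
    """Return (surface form, category, MusicBrainz ID) per entity tag."""
    root = ET.fromstring(xml_string)
    return [
        (el.text, el.tag, el.get("mb"))
        for el in root.iter()
        if el.tag in CATEGORIES
    ]


json_mentions = mentions_from_json(json.loads(sentence_json))
xml_mentions = mentions_from_xml(sentence_xml)
print(json_mentions)
print(xml_mentions)
```

Using `e.get("mbid")` rather than `e["mbid"]` reflects the fact that not every entity is linked to MusicBrainz, so the field may be missing or empty for some annotations.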
The NIF version contains the whole dataset in a single file, following the NIF 2.0 specification.
The original ELMD 1.0 is also available for download here.
ELVIS (Entity Linking Framework Voting and Integration System), the source code used to generate ELMD 1.0 and 2.0, is also available for download here: https://github.com/sergiooramas/elvis
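The general idea of voting among several entity linking systems, with a confidence threshold that trades recall for precision, can be sketched in a few lines of Python. This is a simplified illustration and not the ELVIS algorithm: the function name `vote`, the input layout (one dictionary per system, mapping a character span to a URI), and the definition of confidence as the fraction of systems that agree are all assumptions. The actual scoring is in the repository linked above.

```python
from collections import Counter


def vote(system_outputs, min_agreement=0.5):
    """Keep, for each span, the most voted URI if enough systems agree.

    system_outputs: list of dicts mapping (start, end) -> URI, one per system.
    min_agreement: minimum fraction of systems that must propose the URI.
    """
    n_systems = len(system_outputs)

    # Count how many systems propose each URI for each span.
    candidates = {}
    for output in system_outputs:
        for span, uri in output.items():
            candidates.setdefault(span, Counter())[uri] += 1

    linked = {}
    for span, votes in candidates.items():
        uri, count = votes.most_common(1)[0]
        confidence = count / n_systems
        if confidence >= min_agreement:
            linked[span] = (uri, confidence)
    return linked


# Illustrative outputs from three hypothetical systems.
nirvana = "http://dbpedia.org/resource/Nirvana_(band)"
buddhism = "http://dbpedia.org/resource/Nirvana"
nevermind = "http://dbpedia.org/resource/Nevermind"
outputs = [
    {(0, 7): nirvana, (24, 33): nevermind},
    {(0, 7): nirvana},
    {(0, 7): buddhism},
]

lenient = vote(outputs, min_agreement=0.5)
strict = vote(outputs, min_agreement=1.0)
print(lenient)
print(strict)
```

Raising `min_agreement` discards links that only some systems support, which is the precision-over-recall trade-off described for the high-confidence subset of ELMD.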
ELMDist: A vector space model with words and MusicBrainz entities
In addition, word vectors have been trained on ELMD 2.0 using word2vec. The vectors can be downloaded here. The code to retrain the vectors is available here.
Scientific References
Please cite the following paper if using ELVIS or any of the datasets (ELMD 1.0 and 2.0).
(2016).
Please cite the following paper if using ELMDist.
ELMDist: A vector space model with words and MusicBrainz entities. Workshop on Semantic Deep Learning (SemDeep), collocated with ESWC 2017.
(2017).