START Conference Manager    

The impact of quality with respect to the use of controlled vocabularies in metadata

Javier Nogueras-Iso, Javier Lacasta, Gilles Falquet and F. Javier Zarazaga-Soria

(Submission #163)


Abstract

To facilitate the discovery and monitoring of spatial resources, the INSPIRE Directive encourages the use of controlled vocabularies through its metadata regulation. In particular, for the Keyword metadata element, the regulation requires at least one keyword from the General Multilingual Environmental Thesaurus (GEMET) to describe the spatial data theme, as defined by the INSPIRE annexes. Moreover, the technical guidelines for the metadata implementing rules recommend a minimum of two keywords in addition to the mandatory one, selected if possible from controlled vocabularies such as GEMET, EUROVOC or AGROVOC.

Despite these recommendations, current holdings of metadata records make little use of these vocabularies. For instance, we analyzed the use of GEMET in the metadata catalogue of the Spanish Spatial Data Infrastructure (IDEE). Although GEMET is supposed to be the main thesaurus for the classification of resources in this context, only 25% of the metadata records in 2015 contained a keyword with an explicit reference to "GEMET – INSPIRE Spatial Data Themes", and just 0.5% of records included a GEMET concept that differed from the INSPIRE themes.

One possible reason that cataloguers do not use thesauri properly could be specific issues in the quality of the thesauri themselves. Therefore, we have analyzed the quality of GEMET properties and relations in English, French and Spanish. The analysis method is based on a lexical and syntactic analysis of the concept labels, and a structural and semantic analysis of the properties and relations. To determine whether the meaning of the BT/NT (broader term/narrower term) relations is correct, we aligned the thesaurus with the WordNet and DOLCE ontologies and compared the original relations with the ones provided in these models. Although the overall quality of the thesaurus is high, some issues should be noted: definitions of concepts are only provided in English; some Related Term (RT) relations (6%) are not informative, as they connect concepts already in the same BT/NT hierarchy; and an estimated 24% of BT/NT relations are not semantically correct.
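
The structural check on RT relations can be illustrated with a minimal sketch. It is not the implementation used in the analysis: it assumes that "already in the same BT/NT hierarchy" means that one concept is a direct or indirect broader term of the other, and the function names and toy concepts are hypothetical, invented for illustration.

```python
def ancestors(concept, broader):
    """Return every concept reachable from `concept` by following BT links."""
    seen = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        for parent in broader.get(current, ()):
            if parent not in seen:  # also guards against cycles
                seen.add(parent)
                stack.append(parent)
    return seen


def redundant_related(related, broader):
    """Return the RT pairs in which one concept is a BT ancestor of the other.

    Assumption: this is one possible reading of "same BT/NT hierarchy".
    """
    redundant = []
    for a, b in related:
        if b in ancestors(a, broader) or a in ancestors(b, broader):
            redundant.append((a, b))
    return redundant


# Toy data invented for illustration; not taken from GEMET.
broader = {
    "river": ["watercourse"],
    "watercourse": ["surface water"],
    "lake": ["surface water"],
}
related = [("river", "surface water"), ("river", "lake")]

flagged = redundant_related(related, broader)
share = len(flagged) / len(related)  # fraction of non-informative RT relations
```

In this toy example only the pair ("river", "surface water") is flagged, because "surface water" is an indirect broader term of "river"; "river" and "lake" are siblings and their RT relation is kept.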

We cannot guarantee that an increase in GEMET quality will improve the use of controlled vocabularies in metadata, but it seems sensible to make an effort in the following tasks: provide definitions in more languages, refine RT and BT/NT relations, and possibly connect GEMET concepts to INSPIRE spatial data themes through additional relations.

Categories

Topic Area:  [2.2] Technologies and tools required to deliver INSPIRE
Abstract Type:  Oral Presentation

Additional fields

Comments:   Metadata, controlled vocabularies, thesaurus quality, INSPIRE
