Controlled Vocabulary

Dave Remsen, Dag Endresen (GBIF) & Gregor Hagedorn, Andreas Plank (JKI)

KEY RESOURCES

Vocabulary Management Task Group (VoMaG) home page
Proposed charter for a TAG Vocabulary Management Task Group (VOMAG)
Download the draft charter

Knowledge Organisation Systems (KOS) range from simple vocabularies (lists of words), glossaries and dictionaries through classification schemes (including taxonomies) to thesauri and ontologies. They are, in essence, the link between the group of letters that forms a word and the meaning behind that word. For example, some form of KOS is needed so that a computer knows that "banana" is a kind of "fruit". KOS thus facilitate the organisation, management and retrieval of information, and they are an essential foundation for data interoperability.
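The "banana" is a kind of "fruit" example can be sketched in a few lines of Python. This is only an illustration of the idea, not a description of any GBIF or ViBRANT service: the toy term list, the `BROADER` mapping and the `is_kind_of` function are all hypothetical names invented here.

```python
# Hypothetical toy vocabulary: each term points to its broader term.
# None marks a top-level term with nothing broader above it.
BROADER = {
    "banana": "fruit",
    "apple": "fruit",
    "fruit": "food",
    "food": None,
}


def is_kind_of(term, ancestor, broader=BROADER):
    """Return True if `ancestor` can be reached from `term` via broader links."""
    seen = set()  # guards against accidental cycles in the mapping
    current = broader.get(term)
    while current is not None and current not in seen:
        if current == ancestor:
            return True
        seen.add(current)
        current = broader.get(current)
    return False


print(is_kind_of("banana", "fruit"))  # True
print(is_kind_of("banana", "food"))   # True, via "fruit"
print(is_kind_of("fruit", "banana"))  # False
```

A plain word list could not answer these questions; it is the explicit relation between terms that lets a machine do so.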

ViBRANT needs a flexible, user-friendly ontology management environment, enabling users to create, define, extend and share their own terms and concepts, providing options for discussions and annotation, while supporting re-use of terms from standard ontologies wherever possible.

ViBRANT will explore two approaches: extending the functionality of the existing GBIF vocabulary services, and developing a wiki-based collaborative community interface. The aim is to make the construction and application of vocabularies a simple process for both users and machines, as discussed on the associated page "Data aggregation".

GBIF Vocabularies Service
Species-ID, a wiki-based collaborative community interface

 

Working groups and Recommendations

Several GBIF working groups have discussed best practices for the use of vocabularies and ontologies.

One of these, the GBIF task group on Metadata Implementation, recommended the use of controlled vocabularies in which the terms are identified by persistent identifiers (unique reference numbers):

  •  “The identifier of the vocabulary should use existing identifiers from other registries where possible. If one does not exist, then GBIF should construct and publish the identifier.” (…) “At present there is no well-defined and consistent means of referencing an identifier of a vocabulary or a vocabulary term. The proposed GBIF registry should provide an unambiguous citation method for each vocabulary and the terms they contain.”
    (…) “Some vocabularies will be global in use but some will be domain specific. To ensure compatibility across all metadata records, it is important that users use the appropriate and community agreed vocabularies”.
    (Jones et al., 2010, pages 22-23).
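The recommendation above, that both a vocabulary and each of its terms carry their own identifier, can be sketched as follows. This is a minimal illustration under stated assumptions and not the GBIF registry design: the `Term` and `Vocabulary` classes, their methods and the `example.org` URIs are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    identifier: str  # persistent identifier of the term (hypothetical URI)
    label: str       # human-readable label
    vocabulary: str  # identifier of the vocabulary the term belongs to


class Vocabulary:
    """A controlled vocabulary whose terms are looked up by identifier."""

    def __init__(self, identifier):
        self.identifier = identifier
        self._terms = {}

    def add(self, local_name, label):
        # Assumption: term identifiers are built from the vocabulary identifier.
        term = Term(f"{self.identifier}/{local_name}", label, self.identifier)
        if term.identifier in self._terms:
            raise ValueError(f"identifier already in use: {term.identifier}")
        self._terms[term.identifier] = term
        return term

    def is_valid(self, identifier):
        """Return True if the identifier names a term in this vocabulary."""
        return identifier in self._terms

    def resolve(self, identifier):
        """Return the Term for an identifier; raises KeyError if unknown."""
        return self._terms[identifier]


# Hypothetical vocabulary and term, for illustration only.
vocab = Vocabulary("http://example.org/vocab/basis-of-record")
specimen = vocab.add("specimen", "Preserved specimen")

# A metadata record stores the identifier, not the free-text label.
record = {"basisOfRecord": specimen.identifier}
print(vocab.is_valid(record["basisOfRecord"]))       # True
print(vocab.resolve(record["basisOfRecord"]).label)  # Preserved specimen
```

Storing the identifier rather than the label means a record can be checked against the vocabulary, and it stays unambiguous even if the label is later reworded or translated.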

     

Another, the GBIF task group on Persistent Identifiers, recommended the following action by GBIF:

  • “take a leadership role in encouraging the use of metadata vocabularies for information in the GBIF data portal and extending the role of the data portal by hosting resources related to the use of identifiers, such as the TDWG vocabularies”. (Cryer et al., 2010, page 14, recommendation 12).

     

The GBIF ‘Beginner’s Guide to Persistent Identifiers’ says:

  • “It is also very important to reuse, where appropriate, the vocabularies and schemas that other communities have developed, to aid interoperability and save reinventing the wheel.”

  • “Because biodiversity informatics is a fairly specialized area of expertise, it is likely that a large proportion of the vocabularies and ontologies required for this domain will need to be developed within this community” (Richards et al., 2011, page 20).

The GBIF task group for Knowledge Organisation Systems (KOS) made recommendations in a white paper on the use of vocabularies and ontologies for biodiversity informatics (Catapano et al., 2011).

DCMI 2011

Dublin Core Metadata Initiative - pre-conference on vocabulary management


GBIF represented the biodiversity informatics community at the Dublin Core annual conference for metadata and vocabulary management in The Hague, Netherlands. Best practices for maintaining a federated KOS with a common vocabulary of terms used by a decentralised network were discussed. The most important lesson learnt was that communities in domains other than biodiversity informatics are also only just starting to address many of the same issues we face. Best practices for maintaining a federated KOS are under development in the Dublin Core (DCMI) community, and the early experiences of the biodiversity informatics community will provide important input to this process.


TDWG 2011 KOS Symposium

Taxonomic Databases Working Group

Darwin Core - a glossary of terms

GBIF organised a special symposium at the Biodiversity Information Standards (TDWG) 2011 conference. A series of presentations introduced recent activities related to KOS in the GBIF work programme, the current status and history of the TDWG vocabularies, and the management of the Darwin Core set of terms.

GBIF KOS architecture: overview of the proposed architecture of the resources repository and the term vocabularies

The Audubon Core: a standard for multimedia biodiversity data

The increasing ease of creating media objects (still and moving images, sound, etc.) and distributing them in digital form through online channels is affecting the scientific workflow. Multimedia files are becoming increasingly important as vouchers of scientific information. The Audubon Core, developed by the joint GBIF and TDWG multimedia task group, is a set of vocabularies designed to represent metadata for biodiversity multimedia resources and collections. The standardisation process is nearing completion. Because standardising this information is important for the purposes of ViBRANT, JKI and GBIF have invested resources in bringing the Audubon Core to its present public review stage. The ViBRANT site "Species-ID" hosts the vocabulary.

Bibliography


[1] Jones, M.B., N. Bertrand, J. Holetschek, V. Hutchison, B.C.-J. Ko, A. Suarez-Mayorga, M. Meaux, W. Ulate, D. Watts, T. Robertson, and E. O Tuama (2010). Report of the GBIF metadata implementation framework task group (MIFTG). Global Biodiversity Information Facility (GBIF), Copenhagen.

[2] Cryer, P., R. Hyam, C. Miller, N. Nicolson, E. O Tuama, R. Page, J. Rees, G. Riccardi, K. Richards, and R. White (2010). Adoption of persistent identifiers for biodiversity informatics: Recommendations of the GBIF LSID GUID task group, 6 November 2009. Global Biodiversity Information Facility (GBIF), Copenhagen.

[3] Richards, K., R. White, N. Nicolson, and R. Pyle (2011). A beginner’s guide to persistent identifiers, version 1.0. Released on 9 February 2011. Global Biodiversity Information Facility (GBIF), Copenhagen.

[4] Catapano, T., D. Hobern, H. Lapp, R.A. Morris, N. Morrison, N. Noy, M. Schildhauer, and D. Thau (2011). Recommendations for the use of knowledge organization systems by GBIF. Released on 4 February 2011. Global Biodiversity Information Facility (GBIF), Copenhagen.

[5] O Tuama, E., D.T.F. Endresen, and D. Remsen (2011). Establishing a support infrastructure for Knowledge Organisation Systems (KOS) in biodiversity informatics.