Data Standards

Yde de Jong (UvA) & Walter Berendsohn (FU-BGBM)

KEY RESOURCES

Berendsohn, Walter, Anton Güntsch, Niels Hoffmann, Andreas Kohlbecker, Katja Luther, and Andreas Müller. "Biodiversity information platforms: From standards to interoperability." ZooKeys 150 (2011): 71-87. http://dx.doi.org/10.3897/zookeys.150.2166
Baker, Edward, and Ellinor Michel. "Data standards, sense and stability: Scratchpads, the ICZN and ZooBank." ZooKeys 150 (2011): 167-176. http://dx.doi.org/10.3897/zookeys.150.2248

Standardised cross-platform integration of taxonomic data.

Sharing data is essential to collaboration and, more broadly, to advancing our understanding of the natural world. In practice, when we work with multiple systems, databases and networks, one of the most serious bottlenecks is integrating the data each of them provides. Yet integrated data is exactly what we need in order to analyse, visualise and publish the products of our labour.

Data standards provide a consistent representation of data and enable different sources to be combined. They do this by providing the rules for structuring information, so that data entered into a system can be read, sorted, indexed, retrieved, communicated between systems, and shared. In a very real sense, data standards define the lingua franca that makes data sharing and scientific advance possible.
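As a small illustration of why shared rules matter, the sketch below maps records from two hypothetical sources, which use different local field names, onto common Darwin Core term names (scientificName, scientificNameAuthorship, taxonRank) and then combines and sorts them. The source layouts, the mapping tables and the helper names are all assumptions made for this example; real mappings are usually much richer.

```python
from typing import Dict, List, Tuple

# Shared vocabulary: real Darwin Core term names.
SCIENTIFIC_NAME = "scientificName"
AUTHORSHIP = "scientificNameAuthorship"
RANK = "taxonRank"

# Hypothetical local-field -> standard-term mappings for two sources.
SOURCE_A_MAP = {"name": SCIENTIFIC_NAME, "author": AUTHORSHIP, "rank": RANK}
SOURCE_B_MAP = {"taxon": SCIENTIFIC_NAME, "auth": AUTHORSHIP, "level": RANK}

Record = Dict[str, str]


def to_standard(record: Record, mapping: Dict[str, str]) -> Record:
    """Rename a record's local fields to standard terms; unmapped fields are dropped."""
    return {mapping[k]: v for k, v in record.items() if k in mapping}


def combine(sources: List[Tuple[List[Record], Dict[str, str]]]) -> List[Record]:
    """Normalise every record of every (records, mapping) pair, then sort by name."""
    combined = [to_standard(r, m) for records, m in sources for r in records]
    return sorted(combined, key=lambda r: r.get(SCIENTIFIC_NAME, ""))


# Hypothetical input data from two differently structured sources.
source_a = [{"name": "Cichorium intybus", "author": "L.", "rank": "species"}]
source_b = [{"taxon": "Cichorium endivia", "auth": "L.", "level": "species", "notes": "x"}]

merged = combine([(source_a, SOURCE_A_MAP), (source_b, SOURCE_B_MAP)])
for row in merged:
    print(row[SCIENTIFIC_NAME], row[AUTHORSHIP])
```

Once both sources speak the same vocabulary, sorting, indexing and exchange can be written once against the standard terms rather than once per source.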

Software is being developed to ensure that all data entered or managed in ViBRANT are compatible with, and available to, other research and publishing infrastructures. Specifically, we are targeting current biodiversity information platforms including Scratchpads, the CyberPlatform, EoL, PESI, GBIF and Species-ID.

Bringing it all together: some examples

PESI (Pan-European Species directories Infrastructure) integrates the all-taxon registers of Europe into a single, authoritative checklist of European plant and animal species. In ViBRANT, PESI will couple its networking activities with Scratchpad users to facilitate the production of regional checklists and taxonomic catalogues. An improved interoperability infrastructure is being built for this purpose.
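To give a feel for what "integrating registers into a single checklist" involves at the data level, here is a deliberately naive sketch that merges several name lists into one entry per name and records which registers contain each name. The register names and the whitespace/case normalisation rule are assumptions for illustration only; this is not PESI's actual procedure, which relies on expert reconciliation of names and concepts rather than string matching.

```python
from typing import Dict, List


def normalise(name: str) -> str:
    """Collapse whitespace and ignore case so trivially different spellings match."""
    return " ".join(name.split()).lower()


def merge_registers(registers: Dict[str, List[str]]) -> List[dict]:
    """Merge registers into one checklist entry per normalised name.

    Each entry keeps the first spelling encountered (with whitespace tidied)
    and lists, alphabetically, the registers in which the name occurs.
    """
    checklist: Dict[str, dict] = {}
    for register, names in registers.items():
        for name in names:
            entry = checklist.setdefault(
                normalise(name), {"name": " ".join(name.split()), "sources": set()}
            )
            entry["sources"].add(register)
    # Sort entries by their normalised key for a stable, alphabetical checklist.
    return [
        {"name": e["name"], "sources": sorted(e["sources"])}
        for _, e in sorted(checklist.items())
    ]


# Hypothetical registers; note the inconsistent spelling of the first name.
registers = {
    "register_A": ["Apis mellifera", "Bombus terrestris"],
    "register_B": ["apis  mellifera", "Vespa crabro"],
}
for entry in merge_registers(registers):
    print(entry["name"], entry["sources"])
```

Recording the contributing registers for every entry keeps the provenance of each name visible, which matters when a checklist claims to be authoritative.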

In ViBRANT, the interoperability between Scratchpads and the Common Data Model (CDM) will be improved. The CDM is the domain model for the core components of the CyberPlatform and is based primarily on the ontology of Biodiversity Information Standards (TDWG, originally the Taxonomic Databases Working Group). It describes the data commonly handled in the CyberPlatform and therefore covers taxonomic names and concepts; literature references; authors; specimens (including types); structured descriptive data; and species-related content of any kind (e.g. locality, observations, economic use or conservation status).
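The entities listed above can be pictured as a small object model. The following is a heavily simplified, hypothetical sketch using Python dataclasses; the class and field names are our own and are not the CDM Library's actual classes or API. It does reflect one idea central to taxonomic data models: a taxon concept is a name as used "sec." (according to) a particular reference, to which specimens and descriptive data can be attached.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Person:
    name: str


@dataclass
class Reference:
    title: str
    authors: List[Person]
    year: int


@dataclass
class TaxonName:
    scientific_name: str
    authorship: str
    rank: str


@dataclass
class Specimen:
    catalogue_number: str
    type_status: Optional[str] = None  # e.g. "holotype"; None if not a type


@dataclass
class DescriptionElement:
    feature: str  # e.g. "distribution", "conservation status"
    value: str


@dataclass
class Taxon:
    """A taxon concept: a name as used (sec.) in a particular reference."""

    name: TaxonName
    sec: Reference
    specimens: List[Specimen] = field(default_factory=list)
    descriptions: List[DescriptionElement] = field(default_factory=list)

    def label(self) -> str:
        """Return a 'Name Authorship sec. FirstAuthor (year)' label."""
        first = self.sec.authors[0].name if self.sec.authors else "anon."
        return (f"{self.name.scientific_name} {self.name.authorship} "
                f"sec. {first} ({self.sec.year})")


# Hypothetical example objects.
ref = Reference("Example flora", [Person("Smith")], 2009)
taxon = Taxon(TaxonName("Cichorium intybus", "L.", "species"), ref)
taxon.specimens.append(Specimen("EX-0001"))
taxon.descriptions.append(DescriptionElement("distribution", "Europe"))
print(taxon.label())
```

Separating the name from the concept lets the same name appear in several references with different circumscriptions without the records colliding.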

Figure: Taxonomic workflow in the CyberPlatform (Kohlbecker et al., 2009).
Figure: Some data of the CDM.
Figure: Cichorieae Portal. The Cichorieae data portal makes full use of the CDM Library, a component of the CyberPlatform.

To enable data exchange between Scratchpads, the CDM, the Encyclopedia of Life (EoL) and the Global Biodiversity Information Facility (GBIF), ViBRANT will use the Darwin Core Archive (DwC-A) format developed by GBIF. DwC-A was designed for publishing biodiversity data and is essentially a set of text files plus a simple descriptor file (the metafile, meta.xml) that describes how the component files are organised. Default data content types include organism names, species information, factual data, distribution, media and literature.
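A minimal sketch of that structure, assuming a single core file of taxon records: the code writes a tab-delimited taxon.txt and a meta.xml metafile declaring the Taxon row type and the Darwin Core terms of each column, then zips both in memory. The column choice and the function name build_dwca are assumptions for illustration; a real archive would typically also carry extension files and an EML metadata document.

```python
import csv
import io
import zipfile

# Metafile for one core file. The literal "\t" and "\n" escape sequences are
# how meta.xml expresses the delimiters, hence the doubled backslashes here.
META_XML = """<?xml version="1.0" encoding="UTF-8"?>
<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n"
        fieldsEnclosedBy="" ignoreHeaderLines="1"
        rowType="http://rs.tdwg.org/dwc/terms/Taxon">
    <files><location>taxon.txt</location></files>
    <id index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/taxonRank"/>
  </core>
</archive>
"""


def build_dwca(rows):
    """Return the bytes of a zip holding meta.xml and a tab-delimited taxon.txt.

    Each row is [taxonID, scientificName, taxonRank]; values are assumed not
    to contain tabs, newlines or double quotes.
    """
    text = io.StringIO()
    writer = csv.writer(text, delimiter="\t", lineterminator="\n")
    writer.writerow(["taxonID", "scientificName", "taxonRank"])  # header line
    writer.writerows(rows)

    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("meta.xml", META_XML)  # str data is stored as UTF-8
        zf.writestr("taxon.txt", text.getvalue())
    return buf.getvalue()


archive = build_dwca([
    ["1", "Cichorium intybus L.", "species"],
    ["2", "Cichorium endivia L.", "species"],
])
```

Because the metafile carries the column-to-term mapping, the data files themselves can stay as plain delimited text that any spreadsheet or database can produce.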

Figure: The Darwin Core Archive (DwC-A) Assistant and a metafile.
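On the consuming side, the metafile is what lets a platform interpret an archive it has never seen before. The sketch below, using only Python's standard library, reads a core definition from meta.xml (location, id column, delimiter, header lines and field terms) and turns the core file's lines into dictionaries keyed by short term names. The sample strings and function names are hypothetical, and the parser handles only the simple case of one core with unquoted fields; defaults follow our reading of the Darwin Core text guide (comma delimiter, no header lines).

```python
import xml.etree.ElementTree as ET

NS = {"dwca": "http://rs.tdwg.org/dwc/text/"}


def parse_meta(meta_xml: str):
    """Return (location, id_index, {index: term}, delimiter, header_lines) for the core."""
    core = ET.fromstring(meta_xml).find("dwca:core", NS)
    location = core.find("dwca:files/dwca:location", NS).text
    id_index = int(core.find("dwca:id", NS).get("index"))
    terms = {int(f.get("index")): f.get("term") for f in core.findall("dwca:field", NS)}
    # meta.xml stores escape sequences such as \t literally; decode them.
    delimiter = core.get("fieldsTerminatedBy", ",").encode().decode("unicode_escape")
    header_lines = int(core.get("ignoreHeaderLines", "0"))
    return location, id_index, terms, delimiter, header_lines


def read_core(meta_xml: str, core_text: str):
    """Turn core file text into dicts keyed by the short Darwin Core term name."""
    _, id_index, terms, delimiter, header_lines = parse_meta(meta_xml)
    records = []
    for line in core_text.splitlines()[header_lines:]:
        if not line:
            continue  # skip blank lines
        cells = line.split(delimiter)
        record = {"id": cells[id_index]}
        for index, term in terms.items():
            record[term.rsplit("/", 1)[-1]] = cells[index]
        records.append(record)
    return records


# Hypothetical sample metafile (raw string keeps "\t" literal, as in meta.xml).
SAMPLE_META = r"""<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core fieldsTerminatedBy="\t" ignoreHeaderLines="1"
        rowType="http://rs.tdwg.org/dwc/terms/Taxon">
    <files><location>taxon.txt</location></files>
    <id index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/taxonRank"/>
  </core>
</archive>"""
SAMPLE_CORE = "taxonID\tscientificName\ttaxonRank\n1\tCichorium intybus L.\tspecies\n"

records = read_core(SAMPLE_META, SAMPLE_CORE)
print(records)
```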


Future work

To link ViBRANT further with existing and emerging infrastructures in biodiversity and ecosystem science, the forthcoming activities are:

  • further improve the functionality of the Darwin Core Archive module;
  • implement a CDM datastore as a ViBRANT index, so that users can perform cross-platform searches and software systems can access the ViBRANT universe;
  • further develop software that improves data exchange between biodiversity platforms (Scratchpads, MediaWiki, PESI) by using the CDM-based database;
  • integrate descriptive data standards such as SDD (Structured Descriptive Data) across the ViBRANT environment.
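The planned CDM-based index is future work, so the following is only a toy illustration of the general idea behind cross-platform search, not the intended implementation: an inverted index maps name tokens to (platform, record id) pairs, so that one query returns matching records from every contributing platform. The class name, platform labels and record identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Key = Tuple[str, str]  # (platform, record id)


class CrossPlatformIndex:
    """Toy inverted index from lower-cased name tokens to (platform, record id)."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[Key]] = defaultdict(set)
        self._names: Dict[Key, str] = {}

    def add(self, platform: str, record_id: str, name: str) -> None:
        """Register a record and index every whitespace-separated token of its name."""
        key = (platform, record_id)
        self._names[key] = name
        for token in name.lower().split():
            self._postings[token].add(key)

    def search(self, query: str) -> List[Tuple[str, str, str]]:
        """Return sorted (platform, id, name) for records containing all query tokens."""
        tokens = query.lower().split()
        if not tokens:
            return []
        # .get avoids creating empty postings for unknown tokens.
        hits = set.intersection(*(self._postings.get(t, set()) for t in tokens))
        return sorted((p, i, self._names[(p, i)]) for p, i in hits)


# Hypothetical records contributed by different platforms.
index = CrossPlatformIndex()
index.add("Scratchpads", "sp-1", "Cichorium intybus")
index.add("PESI", "pesi-7", "Cichorium intybus L.")
index.add("EoL", "eol-3", "Cichorium endivia")
print(index.search("cichorium intybus"))
```

Keeping the platform in every posting means results stay traceable to their source system, which is what distinguishes a cross-platform index from a simple merged copy.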