Data Standards
Yde de Jong (UvA) & Walter Berendsohn (FU-BGBM)
KEY RESOURCES |
---|
"Biodiversity information platforms: From standards to interoperability." ZooKeys 150 (2011): 71-87. http://dx.doi.org/10.3897/zookeys.150.2166 |
"Data standards, sense and stability: Scratchpads, the ICZN and ZooBank." ZooKeys 150 (2011): 167-176. http://dx.doi.org/10.3897/zookeys.150.2248 |
Standardised cross-platform integration of taxonomic data.
Sharing data is essential to collaboration and, more broadly, to advancing our understanding of the natural world. In practice, when working with multiple systems, databases and networks, one of the most serious bottlenecks is the need to integrate the data that each provides before we can analyse, visualise and publish the products of our labour.
Data standards provide a consistent representation of data and enable different sources to be combined by providing the rules for structuring information, so that data entered into a system can be read, sorted, indexed, retrieved, communicated between systems, and shared. In a very real sense, data standards define the lingua franca that makes data sharing and scientific advance possible.
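As a minimal illustration of that idea, the sketch below renames the fields of records from two hypothetical sources to shared Darwin Core term names so that they can be combined and sorted together. The source names, the local field names (`species`, `lat`, `taxon_name` and so on) and the sample records are invented for illustration; only the target terms (`scientificName`, `decimalLatitude`, `decimalLongitude`) come from Darwin Core.

```python
# Hypothetical per-source mappings from local field names to Darwin Core terms.
FIELD_MAPS = {
    "source_a": {
        "species": "scientificName",
        "lat": "decimalLatitude",
        "lon": "decimalLongitude",
    },
    "source_b": {
        "taxon_name": "scientificName",
        "latitude": "decimalLatitude",
        "longitude": "decimalLongitude",
    },
}


def to_standard(record, source):
    """Rename the fields of one record to the shared standard terms."""
    mapping = FIELD_MAPS[source]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def combine(batches):
    """Merge records from several sources into one list sorted by name."""
    merged = [
        to_standard(record, source)
        for source, records in batches.items()
        for record in records
    ]
    return sorted(merged, key=lambda record: record["scientificName"])


# Invented sample records, one per source.
batches = {
    "source_a": [{"species": "Lactuca sativa", "lat": 52.45, "lon": 13.30}],
    "source_b": [{"taxon_name": "Cichorium intybus", "latitude": 52.36, "longitude": 4.95}],
}
combined = combine(batches)
```

Once both sources use the same term names, sorting, indexing and retrieval can be written once rather than once per source.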
Software is being developed to ensure that all data entered or managed in ViBRANT are compatible with, and available to, other research and publishing infrastructures. Specifically, we are targeting current biodiversity information platforms including Scratchpads, CyberPlatform, EoL, PESI, GBIF and Species-ID.
Bringing it all together: some examples
PESI (Pan-European Species directories Infrastructure) integrates all-taxon registers in Europe into a single, authoritative checklist for plant and animal species in Europe. In ViBRANT, PESI will couple its networking activities with Scratchpad users to facilitate the production of regional checklists and taxonomic catalogues. An improved interoperability infrastructure is being built.
In ViBRANT, the interoperability between Scratchpads and the Common Data Model (CDM) will be improved. The CDM is the domain model for the core components of the CyberPlatform and is primarily based on the ontology of Biodiversity Information Standards (TDWG, formerly the Taxonomic Databases Working Group). The CDM describes the data commonly handled in the CyberPlatform and therefore covers taxonomic names and concepts; literature references; authors; specimens (including types); structured descriptive data; and species-related content of any kind (e.g. locality, observations, economic use or conservation status).
[Figure: Some data of the CDM]
[Figure: The Cichorieae data portal makes full use of the CDM Library, a component of the CyberPlatform]
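To make the scope of the CDM more concrete, the following is a heavily simplified sketch of how names, concepts, references and species-related facts can be kept apart in a data model. All class and field names here are hypothetical and chosen for illustration; they are not the actual classes of the CDM Library, and the reference citation is invented.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Reference:
    """A literature reference, reduced to a citation string."""
    citation: str


@dataclass(frozen=True)
class TaxonName:
    """A scientific name with its authorship."""
    name: str
    authorship: str


@dataclass
class Fact:
    """Species-related content, e.g. locality or conservation status."""
    kind: str
    text: str


@dataclass
class TaxonConcept:
    """A name as used in a particular reference, plus attached facts."""
    name: TaxonName
    according_to: Reference
    facts: List[Fact] = field(default_factory=list)

    def label(self) -> str:
        # "sec." marks the reference in whose sense the name is used.
        return f"{self.name.name} {self.name.authorship} sec. {self.according_to.citation}"


concept = TaxonConcept(
    name=TaxonName("Cichorium intybus", "L."),
    according_to=Reference("Example Checklist 2011"),  # invented reference
)
concept.facts.append(Fact(kind="distribution", text="Europe"))
```

Keeping the name separate from the concept lets the same name appear in several concepts, each tied to its own reference.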
To enable the exchange of data between Scratchpads, CDM, Encyclopedia of Life (EoL) and the Global Biodiversity Information Facility (GBIF), ViBRANT will use the Darwin Core Archive (DwC-A) format developed by GBIF. DwC-A was designed for publishing biodiversity data and is essentially a set of text files plus a simple header file describing the component file organisation. Default data content types include organism names, species information, factual data, distribution, media and literature.
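The structure described above can be sketched with the Python standard library alone: one tab-delimited text file plus an XML descriptor that maps each column to a Darwin Core term, zipped together. This is a minimal sketch, not the output of the ViBRANT Darwin Core Archive module. It assumes the descriptor is named `meta.xml`, as in the Darwin Core text guidelines, and the two sample rows and the helper name `build_archive` are invented for illustration.

```python
import io
import zipfile

DWC = "http://rs.tdwg.org/dwc/terms/"

# Descriptor ("header") file: names the core data file and maps columns to terms.
META_XML = f"""<?xml version="1.0" encoding="UTF-8"?>
<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n"
        ignoreHeaderLines="1" rowType="{DWC}Taxon">
    <files><location>taxon.txt</location></files>
    <id index="0"/>
    <field index="1" term="{DWC}scientificName"/>
    <field index="2" term="{DWC}taxonRank"/>
  </core>
</archive>
"""


def build_archive(rows):
    """Return the bytes of a zip archive holding taxon.txt and meta.xml."""
    lines = ["taxonID\tscientificName\ttaxonRank"]
    lines += ["\t".join(row) for row in rows]
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("taxon.txt", "\n".join(lines) + "\n")
        archive.writestr("meta.xml", META_XML)
    return buffer.getvalue()


# Invented sample rows: (taxonID, scientificName, taxonRank).
data = build_archive([
    ("1", "Cichorium intybus", "species"),
    ("2", "Lactuca sativa", "species"),
])
names = zipfile.ZipFile(io.BytesIO(data)).namelist()
```

Further content types, such as distribution or media, would be added as extension files declared in the same descriptor.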
The Darwin Core Archive Assistant, a metafile
Future work
To further link ViBRANT with existing and newly emerging infrastructures in biodiversity and ecosystem science, the forthcoming activities are:
- further improve the functionality of the Darwin Core Archive module;
- implement a CDM datastore as a ViBRANT index so that users can perform cross-platform searches and software systems can access the ViBRANT universe;
- further develop software that improves data exchange between biodiversity platforms (Scratchpads, MediaWiki, PESI) by using the CDM-based database;
- integrate descriptive features such as SDD (Structured Descriptive Data) standards across the ViBRANT environment.