Wednesday, June 21, 2017

2017 GBIF Ebbe Nielsen Challenge

For the third year GBIF is running its Ebbe Nielsen Challenge. Developers and data scientists have three months to create and submit tools capable of liberating species records from open data repositories for scientific discovery and reuse. Here some more details:

This year's Challenge will seek to leverage the growth of open data policies among scientific journals and research funders, which require researchers to make the data underlying their findings publicly available. Adoption of these policies represents an important first step toward increasing openness, transparency and reproducibility across all scientific domains, including biodiversity-related research.

To abide by these requirements, researchers often deposit datasets in public open-access repositories. Potential users are then able to find and access the data through repositories as well as data aggregators like OpenAIRE and DataONE. Many of these datasets are already structured in tables that contain the basic elements of biodiversity information needed to build species occurrence records: scientific names, dates, and geographic locations, among others.

However, the practices adopted by most repositories, funders and journals do not yet encourage the use of standardized formats. This approach significantly limits the interoperability and reuse of these datasets. As a result, the wider reuse of data implied if not stated by many open data policies falls short, even in cases where open licensing designations (like those provided through Creative Commons) seem to encourage it.

The challenge
The 2017 GBIF Ebbe Nielsen Challenge seeks submissions that repurpose these datasets and adapting them into the Darwin Core Archive format (DwC-A), the interoperable and reusable standard that powers the publication of almost 800 million species occurrence records from the nearly 1,000 worldwide institutions now active in the GBIF network.

The 2017 Ebbe Nielsen Challenge will task developers and data scientists to create web applications, scripts or other tools that automate the discovery and extraction of relevant biodiversity data from open data repositories. Such tools might generate datasets ready for publication on by:

  • Automating searches of open data available in public repositories
  • Effectively mining the information needed to generate checklists, species occurrence and sampling-event datasets (e.g. scientific names, date and location of occurrence et al.) from datasets in these repositories
  • Mapping datasets’ column headings and/or contents with standardized Darwin Core terms
  • Routinely converting the reformatted data into Darwin Core archive formats ready for publication through

No comments:

Post a Comment