News from the workshop (3-7/11/2017) in Göttingen, Germany
Background/History: The EPD was designed before email and the WWW as a Paradox database, with a master copy on one particular computer and one person able to make changes to the database and distribute copies of it. The old system uses DOS programs to upload sites, and the Paradox format itself is outdated. Discussions on how to continue with the database system started after the open meeting in 2007. At that time, our North American colleagues started to develop a database structure that could hold not only pollen data but also data from other palaeoecological investigations (e.g. diatoms, faunal remains etc.). This platform, Neotoma, was developed as a database that allows several people to upload and change datasets, which has many advantages, including better development of the database. Therefore, the active EPD community, with Richard Bradshaw as Chairperson, made the decision to become a constituent database of Neotoma. This decision was not challenged during the 2016 open EPD meeting, where the bylaws were changed to be compatible with Neotoma. Although the EPD has formally been a constituent database of Neotoma for some time now, the living database remains a Paradox version on a computer in southern France.
The process: The data are migrated from the EPD to Neotoma by uploading copies of individual datasets using a steward version of the Tilia program. One major obstacle for the data transfer was a long list of taxa names that were only used in Europe. Eric Grimm has now included the European taxa names and thus facilitated the migration of the database. Eric Grimm has also put a copy of the EPD onto a server from which we download the data to Tilia. In the process, the MADCAP models are associated with the datasets and uploaded to Neotoma along with them. The process gives a good opportunity to check for inconsistencies in the data. However, the priority now is the transfer, and minor problems that cannot be fixed immediately are flagged to be solved later. We divided the pollen diagrams by country and the full list of initial assignments can be downloaded here. The minimum checking of the data before uploading includes checking the position of the site and a general scroll through the counts looking for obvious problems. The relevant citations are parsed out if the data steward has the time. A core top is assumed to be modern if the uncertainty in that assumption is less than 250 years. Workshop participants experimented with the download and upload process for sites and found a few problems, some of which could be solved on site while others will be addressed by Eric.
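The minimum checks described above can be written down as a short sketch. This is an illustration only and not part of Tilia or Neotoma: the `SiteRecord` class, its field names and the `pre_upload_check` function are hypothetical, and the position check is reduced to a plain coordinate range test, which is an assumption about what "checking the position" could involve. Only the 250-year threshold comes from the text.

```python
from dataclasses import dataclass, field

# Threshold stated in the text: the core top counts as modern if the
# uncertainty in that assumption is less than 250 years.
CORE_TOP_MAX_UNCERTAINTY_YEARS = 250


@dataclass
class SiteRecord:
    """Hypothetical minimal record for one site (not a Tilia/Neotoma type)."""
    name: str
    latitude: float
    longitude: float
    core_top_uncertainty_years: float
    flags: list = field(default_factory=list)


def core_top_is_modern(uncertainty_years):
    """Apply the 250-year rule for treating the core top as modern."""
    return uncertainty_years < CORE_TOP_MAX_UNCERTAINTY_YEARS


def pre_upload_check(site):
    """Flag problems for later instead of blocking the transfer."""
    # Assumed position check: coordinates must lie in the valid range.
    if not (-90 <= site.latitude <= 90 and -180 <= site.longitude <= 180):
        site.flags.append("position outside valid coordinate range")
    if not core_top_is_modern(site.core_top_uncertainty_years):
        site.flags.append("core top not assumed modern")
    return site.flags


if __name__ == "__main__":
    site = SiteRecord("Example site", 51.5, 9.9, 100)
    print(pre_upload_check(site))
```

Problems are collected as flags rather than raised as errors, mirroring the decision that the transfer takes priority and minor issues are solved later.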
Decisions: A large number of EPD sites made their way to Neotoma via the Global Pollen Database (GPD) before the year 2000. During the last 10 years, datasets in the EPD were checked and errors amended (e.g. MADCAP activities), while the GPD datasets remained unchanged. We discussed the pros and cons of two options: checking for changes between the two datasets and retaining the data already in Neotoma, or overwriting all EPD data with what is currently in the “living” EPD. The decision was made to take the latter option and overwrite existing EPD data in Neotoma. The workload was divided between the participants according to regional knowledge. Each participant has to check and move between 200 and 500 sites.
We also discussed age models for the new and old sites in the EPD and decided to continue with the “classical age models” and the star classification of uncertainties (see here), which Eric will implement in Neotoma. This choice was motivated by the fact that database applications will mainly need a single sample age. Moreover, we aim to analyse pollen accumulation rates for datasets in the EPD where that is possible, and these are more robust when based on general trends in sedimentation rates rather than on sudden changes in sedimentation at a radiocarbon date. In a second step, we will build Bayesian models for all data in the EPD, benefiting from a review of control points and a general vetting of the age-depth relationship. Simon Brewer volunteered to orchestrate the construction of new age models following the same procedure as the previous exercise: review control points -> two age models are constructed (linear and smooth spline) -> models are evaluated against the pollen diagram and limits for extrapolation set.
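To illustrate why accumulation rates are sensitive to the choice of age model, here is a minimal sketch of the linear case only. The control points, the function names and the concentration value are invented for illustration and are not EPD data or the workshop's code. With linear interpolation between control points, the deposition time (years per cm) changes abruptly at every control point, and the accumulation rate derived from it jumps accordingly. A smooth spline, not shown here, would spread that change over a range of depths.

```python
from bisect import bisect_right

# Hypothetical control points: (depth in cm, age in cal yr BP), sorted by depth.
CONTROL_POINTS = [(0.0, -50.0), (100.0, 1000.0), (200.0, 3000.0), (300.0, 3500.0)]


def _segment(depth, points):
    """Return the pair of control points that brackets the given depth."""
    depths = [d for d, _ in points]
    if not depths[0] <= depth <= depths[-1]:
        # Limits for extrapolation are set in a separate step.
        raise ValueError("depth outside the range of the control points")
    i = min(bisect_right(depths, depth), len(points) - 1)
    return points[i - 1], points[i]


def linear_age(depth, points=CONTROL_POINTS):
    """Age at a depth from linear interpolation between control points."""
    (d0, a0), (d1, a1) = _segment(depth, points)
    return a0 + (a1 - a0) * (depth - d0) / (d1 - d0)


def deposition_time(depth, points=CONTROL_POINTS):
    """Years per cm in the segment containing the depth."""
    (d0, a0), (d1, a1) = _segment(depth, points)
    return (a1 - a0) / (d1 - d0)


def accumulation_rate(concentration, depth, points=CONTROL_POINTS):
    """Grains per cm^2 per year: concentration (grains/cm^3) / (yr/cm)."""
    return concentration / deposition_time(depth, points)


if __name__ == "__main__":
    # Same concentration on either side of the control point at 200 cm.
    for depth in (150.0, 250.0):
        print(depth, linear_age(depth), accumulation_rate(10000.0, depth))
```

In this made-up example the same pollen concentration gives a fourfold difference in accumulation rate on either side of the control point at 200 cm, purely as a result of the piecewise-linear model.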
Oliver Heiri, representing the Alpine Pollen Database (ALPADABA), discussed possible future scenarios for that database with Thomas Giesecke. Currently most of the publicly available ALPADABA sites are already in the EPD, while the living version of that database is maintained as a Paradox database in Bern. Willy Tinner and Oliver decided that there would be advantages to maintaining the database in Neotoma in the future. However, since the database is supported by local funds, it is necessary to maintain it as a separate entity. The easiest solution is for ALPADABA to become a constituent database of Neotoma while sharing stewardship with the EPD (e.g. sites in ALPADABA can be curated by EPD stewards and vice versa).
Near Future: We hope to complete the data migration by spring 2018. During this time new sites will not be uploaded to the EPD or Neotoma but will be made available via the internet with a link on the EPD page. If all goes to plan, the downloadable version 2017-10-31 of the EPD will be the last Paradox version of the EPD, and new versions will be maintained in Neotoma. However, we aim to extract an EPD-only version from Neotoma at least four times a year and to continue exchanging individual datasets with PANGAEA. The EPD will maintain its identity, documented by the web page, the blog, Twitter and other community activity.
Participants: Graciela Gil Romera, Walter Finsinger, Martin Theuerkauf, Steffen Wolters, Petr Kuneš, Simon C. Brewer, Michelle Leydet, Eric Grimm, Oliver Heiri, Thomas Giesecke