EPD workshop, Göttingen (Germany) 30-31 March 2023

Boosting Neotoma through the European Pollen Database to gain a better understanding of long-term whole ecosystem change

Aims

The main aim of this INQUA-funded workshop is to start the process of creating a multi-proxy European representation in the Neotoma database (www.neotomadb.org). This will harness the power of Neotoma as a tool for multi-proxy data analyses, populate Neotoma further and encourage collaborative research.

In view of this long-term objective, we will introduce European palaeoecologists to the Neotoma database, develop ideas for cross-continental multi-proxy data analysis, and develop a COST application to pursue this community effort. We would be pleased if we could stimulate populating the Neotoma database with already existing palaeoecological data sets for Europe. To meet these aims, attendees will be introduced to the Neotoma data-extraction and data-handling toolboxes, and will also receive training to join the active Neotoma Data Stewards community.

EPD Open Science Meeting and workshops. Prague, 1st-3rd June, 2022.

Latest news, accommodation details and practicals.

We are looking forward to meeting you in Prague! Before our meeting, you may find this document useful. In it you will find details about the venue, transport, accommodation, lunches, posters and more! We would like to draw your attention to some final details:

1. For those of you whose accommodation is supported by PAGES, you can see your accommodation assignment here. However, we advise you to come first to the reception desk on Wednesday 1st at 1pm, so that we can give you all the information needed to get there and update the room numbers if necessary. In addition, at this welcome desk we will provide you with a public transport ticket, valid for three days, to travel between the conference venue and the accommodation area (ca. 45 minutes by public transport; all details are in the document mentioned above).

2. We have now finalized the workshop assignments; see here. The Chronology and Tilia workshops will run only once (Thursday and Friday respectively). The workshop organizer will contact you if anything is required before the conference. Please note that all workshops require you to bring your own laptop, and for most you need to have R and RStudio installed. If you are interested in following the Tilia workshop, note that Tilia is a Windows program and can only be installed on a Windows PC or in a Windows emulator.

3. If you are presenting a poster, remember to prepare a single slide summarizing it and be ready to give a 1-minute flash talk on it. You will have to hand in this slide at the welcome desk on Wednesday the 1st from 1 to 2pm.

4. During the poster session we invite you to bring sweets, snacks or drinks typical of your country of origin to share with the rest of us.

5. Those of you attending the BBQ on Friday, be ready to pay at the welcoming desk (either in CZK or EUR) from 5 to 10€ to cover the food costs – there is not a fixed amount and we leave it to your own appetite 🙂 Drinks can be bought on location.

6. Here you have some additional information about where to eat, nightlife and overnight public transport.

What is this meeting about?

We are excited to invite you to an in-person Open Science Meeting in Prague between the 1st and 3rd of June 2022. With this meeting we aim to maintain and foster the community of palaeoecologists using the EPD and Neotoma for data storage and analysis. Therefore, we would like to bring together palynologists and researchers using different proxies that may be archived in Neotoma (macrofossils, sedaDNA, charcoal, geochemistry, isotopes).

We are preparing three exciting days filled with interesting keynotes, workshops and discussions. Attendees will have the opportunity to showcase their research during a poster session with a short plenary oral highlight of the main findings. We have planned some social events, such as an open-air barbecue and an art and science tour. The venue will be the Faculty of Science of the beautiful Charles University in Prague.

Please note that we will only have a live meeting if the Covid situation allows us to meet in person in Prague and will cancel otherwise. Please organize your travel arrangements accordingly.

Registration is now closed; the deadline for registration was 1st of May 2022. Should you have any questions, please write an email to:
michelle.leydet@univ-amu.fr

We strongly recommend that participants register, but attendance will be possible without prior registration. In the latter case, joining a workshop will only be possible if there is still space, and booking accommodation through the EPD organizing committee will no longer be possible.

1. Program

To explore the most up-to-date program, please click here.

2. Confirmed speakers

Jack Williams “Why Neotoma? Supporting Science at Local to Global Scales”
Danielle Schreve “Move, adapt or die: Late Pleistocene and early Holocene vertebrate datasets from Britain.”
Jan Kolář: “Plants or humans of the past? Do we really need to choose what to study?”
Elizabeth Dietze: “The future of the Global Paleofire Database”.
Kasia Marcisz: “Testate amoebae in multi-proxy palaeoecological studies”
Cindy De Jonge: “Biomarker lipids as temperature proxies: what are global calibrations hiding?”
Inger Greve Alsos “High taxonomic resolution of sedimentary ancient DNA allows detailed reconstruction of postglacial species arrival and ecosystem build-up”
Leeli Amon: “The last deglaciation and local-scale vegetation dynamics”
Thomas Giesecke (you can also check here) “Introduction and historical perspectives of the EPD”
Ulrike Herzschuh “Sedimentary ancient DNA – from population to ecosystem-level reconstructions”
Patrik Mráz: “Herbarium collections as a tool for tracking global change”
Heikki Seppä: “Identifying problems using continental-scale modern pollen databases for climate reconstructions”

3. Workshops

(Each participant can attend two workshops)

Chronology-building using the EPD by Petr Kuneš and Graciela Gil-Romera.
Tilia: individual sites and data upload by Michelle Leydet and Graciela Gil-Romera.
Quantitative land-cover reconstructions by Martin Theuerkauf and Vojtěch Abraham.
Neotoma R by Socorro Domínguez.
Non Pollen Palynomorphs by Lyudmila Shumilovskikh.
CharAnalysis for R by Walter Finsinger.
An introduction to quantitative climate reconstructions in R by Basil Davies.

An obituary to Eric Grimm by the EPD community

On November 15 we lost a good friend, a brilliant scientist and pioneer in striving for open shared scientific data. Eric Grimm’s passing leaves a large empty space in our hearts.

Eric is known worldwide to the palynological community as the maker of the “Tilia” program to manage pollen data and produce pollen diagrams. He was a key driver behind the creation of pollen databases; he helped establish our European Pollen Database and supported our work over the years. His knowledge of different aspects of palaeoecology, plant taxonomy and computer programming was combined in his contribution and lasting legacy, the palaeoecological database Neotoma.

We also remember Eric for his passionate discussions of new ideas and findings. He supported palynologists worldwide in countless workshops, teaching them how to best create pollen diagrams, understand their data and work with pollen databases. He showed patience and enthusiasm for individual problems and was indefatigable in finding solutions.

Beyond this, Eric enriched many workshop evenings with stories of fieldwork and science history based on his immense and irreplaceable experience. He will be greatly missed by our community as a leader, a scientist and a friend.

Accessing the pollen counts

For any analysis using the EPD, it’s likely you will want to access the pollen counts, either to plot a diagram or so that they can be included in some larger, macro-ecological study. The goal of this post is to create an object for a single pollen record, containing counts, variable names and taxa group names, and sample depths for a site, Dallican Water, published by Bennett et al. (1997). The entity number for this site is 196.

The R object we will create (what I am calling the count object) can then be used in subsequent analyses and plotting.

The EPD is structured in such a way that the pollen counts are stored in the p_counts table (see dependencies here). In a previous post we already created an R version of the database, including this table and others, called myEPD. Therefore, we can access the p_counts table from this version of the database using the code below. Inspecting the structure in R, we can see that the p_counts table is a thin (long-format) table with 4 columns: the entity number in the first (the crucial code that relates many of the pollen-count tables to one another); the sample code in the second (relating to the sample code in the p_sample table); a variable code in the third (relating to the pollen type, whose name is stored in the p_vars table); and then the actual count for each variable.

Below we load our R version of the EPD, extract the p_counts table and inspect it. Then we extract only the rows that contain counts for Dallican Water (entity number = 196).

load('~/Documents/EPDr/epdWorkshop_dir/myEPD.RData')
p_counts <- myEPD[["p_counts"]]
head(p_counts)

##   e_ sample_ var_ count
## 1  1       1   32    15
## 2  1       1   66     1
## 3  1       1  137     1
## 4  1       1  141     7
## 5  1       1  146     2
## 6  1       1  185     3

# Access the Dallican Water (Bennett 1997) dataset using the correct entity number
eNum <- 196
counts.thin <- p_counts[p_counts$e_== eNum,]

One thing to note is that the data are not structured like a typical sample x species matrix. We can use the reshape2 package to convert the thin p_counts subset into the more common sample x species matrix (useful for, e.g., plotting pollen diagrams or conducting ordinations).

smSp <- reshape2::dcast(counts.thin, sample_ ~ var_, value.var = "count")

# convert NAs (zero counts) to numeric Zero
smSp[is.na(smSp)] <- 0
# get the sample numbers and then remove this column
sample_ <- smSp[, 1]
smSp <- smSp[,-1]

# rename the rows as the sample numbers
row.names(smSp) <- sample_
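
If you want to see what this reshaping step does without loading the EPD, here is a minimal sketch on a toy table. The values are invented, and it uses base R's tapply() as a dependency-free stand-in for reshape2::dcast(); it is an illustration of the idea, not the code used elsewhere in this post.

```r
# Toy thin table laid out like p_counts (all values are invented)
toy_thin <- data.frame(
  e_      = c(196, 196, 196, 196),
  sample_ = c(1, 1, 2, 2),
  var_    = c(32, 66, 32, 95),
  count   = c(15, 1, 7, 4)
)

# One row per sample, one column per variable code;
# sample/variable pairs with no record come out as NA
toy_wide <- tapply(toy_thin$count,
                   list(toy_thin$sample_, toy_thin$var_),
                   sum)

# As in the post, a missing record is treated as a zero count
toy_wide[is.na(toy_wide)] <- 0
toy_wide <- as.data.frame(toy_wide)
print(toy_wide)
```

The row names are the sample codes and the column names are the variable codes, which is the same layout as smSp above.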

At the moment the variable names (e.g. pollen types) are also just a variable code. You have to cross-reference these using the p_vars table to get the actual names. We can also access the taxa group IDs for the different variables (e.g. Trees/Shrubs, Herbs, Exotics, from the p_group table), and the sample depths (from the p_sample table). This is done below:

p_vars <- myEPD[["p_vars"]]
p_group <- myEPD[["p_group"]]
p_sample <- myEPD[["p_sample"]]

# Get the variable names
vars <- colnames(smSp)
spWant <- match(vars, p_vars$var_)
colnames(smSp) <- p_vars$varname[spWant]
taxa.names <- colnames(smSp)

# Get the groups as well
idWant <- match(vars, p_group$var)
taxa.groupid <- p_group$groupid[idWant] 
    
# Get the depths
p_sampleWant <- p_sample[p_sample$e_ == eNum,]
depths <- p_sampleWant[,c("sample_", "depthcm")]
row.names(depths) <- depths$sample_
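
The name lookup above relies on match(), which returns the row position of each code in the lookup table. A minimal sketch on a toy lookup table shows the behaviour, including what happens when a code is missing. The table toy_vars and the code "999" are made up for illustration; the three code/name pairs follow the Dallican Water output shown later in this post.

```r
# Toy lookup table laid out like p_vars
toy_vars <- data.frame(
  var_    = c(32, 66, 95),
  varname = c("Alnus", "Artemisia", "Betula"),
  stringsAsFactors = FALSE
)

# Column names come back from the reshape as character codes;
# "999" is a hypothetical code that is absent from the lookup table
codes <- c("66", "32", "999")

# match() gives the row of each code in toy_vars, or NA if it is not found
idx <- match(codes, toy_vars$var_)
found_names <- toy_vars$varname[idx]
print(found_names)

# Codes without a name can be flagged before renaming any columns
unmatched <- codes[is.na(idx)]
print(unmatched)
```

Checking for NA in the result is a cheap way to catch variable codes that have no entry in the lookup table.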

Finally, we want to store all of this as one object that can be used in subsequent analysis. Since the different R objects we have created are of different dimensions, it is useful to use a list. So we create an object called counts, with all the relevant R objects stored as individual items in it.

This object now stores all the relevant pollen count information, including entity number, counts, depths, and variable names and types.

# make the object
counts <- list(core_number = eNum,
            taxa_names = taxa.names,
            pvarCode = vars,
            taxa_groupid = taxa.groupid,
            sample_ = sample_,
            counts = smSp,
            depths = depths
        )

If doing multiple analyses, we can easily turn all this into a function that enables us to get the pollen counts from any site with a known entity number. The following function getCountInfo does this. Note that I added an extra bit of code to return NA if no pollen counts are found for that site.

getCountInfo <- function(eNum, EPDfile = myEPD){
    # eNum = e_ number in EPD
    # EPDfile = an R object containing the EPD files
    require("reshape2")
    
    # define the tables that you need to use
    # (all taken from EPDfile, so the function works with any copy of the database)
    p_counts <- EPDfile[["p_counts"]]
    p_vars <- EPDfile[["p_vars"]]
    p_group <- EPDfile[["p_group"]]
    p_sample <- EPDfile[["p_sample"]]

    # Get the counts you want
    counts.thin <- p_counts[p_counts$e_== eNum,]
    if (nrow(counts.thin) == 0) {
        # no counts for this entity: return the same fields, filled with NA
        counts <- list(core_number = eNum,
                taxa_names = NA,
                pvarCode = NA,
                taxa_groupid = NA,
                sample_ = NA,
                counts = NA,
                depths = NA
            )
        return(counts)
    }

    # Needs to be reshaped into a sample by species matrix 
    smSp <- dcast(counts.thin, sample_ ~ var_, value.var = "count")
    smSp[is.na(smSp)] <- 0
    sample_ <- smSp[, 1]
    smSp <- smSp[,-1]
    row.names(smSp) <- sample_
    
    # Get the taxa names for the full, unadjusted dataset
    vars <- colnames(smSp)
    spWant <- match(vars, p_vars$var_)
    colnames(smSp) <- p_vars$varname[spWant]
    taxa.names <- colnames(smSp)
    
    # Get the groups as well
    idWant <- match(vars, p_group$var)
    taxa.groupid <- p_group$groupid[idWant] 
    
    # Get the depths
    p_sampleWant <- p_sample[p_sample$e_ == eNum,]
    depths <- p_sampleWant[,c("sample_", "depthcm")]
    row.names(depths) <- depths$sample_
    
    # make the object
    counts <- list(core_number = eNum,
                taxa_names = taxa.names,
                pvarCode = vars,
                taxa_groupid = taxa.groupid,
                sample_ = sample_,
                counts = smSp,
                depths = depths
            )
    return(counts)
}

This can be run using the following one line of code to extract the data for Dallican Water.

dallican <- getCountInfo(eNum = 196, EPDfile = myEPD)

## Loading required package: reshape2

str(dallican, max.level = 1)

## List of 7
##  $ core_number : num 196
##  $ taxa_names  : chr [1:85] "Achillea-type" "Alnus" "Artemisia" "Betula" ...
##  $ pvarCode    : chr [1:85] "8" "32" "66" "95" ...
##  $ taxa_groupid: chr [1:85] "HERB" "TRSH" "HERB" "TRSH" ...
##  $ sample_     : int [1:80] 1 2 3 4 5 6 7 8 9 10 ...
##  $ counts      :'data.frame':    80 obs. of  85 variables:
##  $ depths      :'data.frame':    80 obs. of  2 variables:
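
As a rough sketch of how a list like this might be used afterwards, the code below computes pollen percentages and a summed tree/shrub percentage per sample. It runs on a small toy object with invented counts rather than on the real Dallican Water data. It also assumes every taxon is included in the pollen sum, which is an analytical choice this post does not cover.

```r
# Toy object with the same field names as the count object (values invented)
toy <- list(
  core_number  = 196,
  taxa_names   = c("Alnus", "Artemisia", "Betula"),
  taxa_groupid = c("TRSH", "HERB", "TRSH"),
  counts       = data.frame(Alnus     = c(15, 7),
                            Artemisia = c(1, 0),
                            Betula    = c(4, 13),
                            row.names = c("1", "2"))
)

# Percentages: divide each count by the total of its sample (row)
pct <- 100 * sweep(as.matrix(toy$counts), 1, rowSums(toy$counts), "/")

# Summed tree/shrub percentage per sample, using the group IDs to pick columns
trsh_pct <- rowSums(pct[, toy$taxa_groupid == "TRSH", drop = FALSE])
print(round(pct, 1))
print(trsh_pct)
```

For an entity with no counts, the counts field is NA rather than a data frame, so a check such as is.data.frame(x$counts) before this step avoids errors.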

In the next post we will see how to align the published age models from Giesecke et al. (2014) with the sample depths in the pollen count object we just created here.

Accessing the EPD from R

Until the EPD migrates to Neotoma, the most up-to-date version of the EPD is stored on the EPD website here. Once the migration is complete, the Neotoma R package can be used to access much of the information. Until then, it is best to use the version stored on the EPD website.

At this web address, there are three different formats: Paradox, Microsoft Access and Postgres. I don’t know much about Paradox, and I had trouble accessing the Microsoft Access version from my Mac (I have heard that other people have problems with Windows machines as well), so one solution is to use the Postgres version. With this you can set up the EPD as a Postgres server on your own computer. Diego Nieto Lugilde, the writer of the EPD_R package, has some advice on how to get this to work on Windows machines here. I have some notes on how to do the same thing on a Mac.

This can still be a challenge for some people, however, who might not have the correct access rights on their computer. In response to this, Richard Telford published a slightly different solution on his blog, which involves setting up an SQLite database. It involves downloading the MS Access file and converting it to an SQLite file using mdbtools. You will have to install mdbtools from the command line on your computer first, then follow the code he used on his blog here.

Once this is complete, it is possible to access the database as follows. In the code below, for example, I am connecting to the version of the database running as a server on my computer, then accessing the relevant tables required. Incidentally, this connection will also be used if you are using Diego’s EPD R package.

# driver object from the RPostgreSQL package, passed to DBI::dbConnect()
driver <- RPostgreSQL::PostgreSQL()
database <- "epd"
host <- "localhost"
user <- "epd"
password <- "epdpassword"

con <- DBI::dbConnect(driver, dbname = database,
        host = host, user = user, password = password)

# lists all the tables in the EPD
DBI::dbListTables(con)

# Obtains a number of tables I have found useful as R objects
pSample <- DBI::dbReadTable(con, "p_sample")
pCounts <- DBI::dbReadTable(con, "p_counts")
pGroup <- DBI::dbReadTable(con, "p_group")
siteLoc <- DBI::dbReadTable(con, "siteloc")
entity <- DBI::dbReadTable(con, "entity")
p_entity <- DBI::dbReadTable(con, "p_entity")
groups <- DBI::dbReadTable(con, "groups")
workers <- DBI::dbReadTable(con, "workers")
publ <- DBI::dbReadTable(con, "publ")
publent <- DBI::dbReadTable(con, "publent")
pVars <- DBI::dbReadTable(con, "p_vars")
siteInfo <- DBI::dbReadTable(con, "siteinfo")
siteDesc <- DBI::dbReadTable(con, "sitedesc")

# close the connection once all tables have been read
DBI::dbDisconnect(con)
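
If you went down the SQLite route instead, the same DBI calls should apply once you have a connection. The sketch below is only an illustration: it assumes the RSQLite package is installed, and it uses an in-memory database with one invented toy table so that it runs on its own. With a real converted file you would pass the path to that file in place of ":memory:".

```r
# In-memory SQLite database standing in for a converted EPD file
sqlite_con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

# Toy table laid out like p_counts (values invented)
DBI::dbWriteTable(sqlite_con, "p_counts",
                  data.frame(e_      = c(1L, 196L),
                             sample_ = c(1L, 1L),
                             var_    = c(32L, 32L),
                             count   = c(15, 7)))

# The same two DBI calls used for the Postgres connection
sqlite_tables <- DBI::dbListTables(sqlite_con)
p_counts_sqlite <- DBI::dbReadTable(sqlite_con, "p_counts")

DBI::dbDisconnect(sqlite_con)
print(sqlite_tables)
print(p_counts_sqlite)
```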

Since some of these DBI connections are quite slow, I have found it most efficient to store these tables as my own .RData file of the database called myEPD. I save this file on my computer here and this is what will be used in all subsequent posts.

# Combines these tables into the myEPD object
myEPD <- list(p_sample= pSample, 
                p_vars = pVars, 
                p_counts = pCounts,
                p_group = pGroup,
                siteloc = siteLoc,
                entity = entity,
                p_entity = p_entity,
                groups = groups,
                workers = workers,
                publ = publ,
                publent = publent,
                siteinfo = siteInfo,
                sitedesc = siteDesc
                )

# Save the object on your directory somewhere
save(myEPD, file = "myEPD.RData")
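
One thing worth knowing about .RData files is that load() restores objects under the names they were saved with. Here is a minimal round-trip sketch using a toy list and a temporary file (both made up for illustration), loading into a separate environment so that nothing in your workspace is overwritten.

```r
# Toy stand-in for the myEPD list (values invented)
toy_epd <- list(
  p_counts = data.frame(e_ = 196, sample_ = 1, var_ = 32, count = 15)
)

# Save to a temporary file rather than a real project path
rdata_path <- tempfile(fileext = ".RData")
save(toy_epd, file = rdata_path)

# load() returns the names of the restored objects;
# a fresh environment keeps them apart from the global workspace
restored <- new.env()
loaded_names <- load(rdata_path, envir = restored)
print(loaded_names)
print(names(restored$toy_epd))
```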

For ease, I’m currently trying to arrange to put this version online so that anyone can bypass these steps and go straight to accessing the database. There will be a number of blog posts following this showing how to use different parts of the EPD using R. If you are interested in contributing your own post, or would like advice on how to perform a particular function, then send an email to here and one member of the team will try to solve the issue (given available time constraints).

In the meantime, if you would like the RData version of the EPD now then post a message below.

Porting EPD into Neotoma, first steps…

News from the workshop (3-7/11/2017) in Göttingen, Germany

Background/History: The EPD was designed in the time before email and WWW as a Paradox database that would have a master copy on one particular computer with one person able to make changes to the database and distribute copies of it. The old system uses DOS programs to upload sites and the Paradox format itself is outdated. Discussions on how to continue with the database system started after the open meeting in 2007.

Workshop & training in Aix-en-Provence

EPD Workshop June 2016, Aix-en-Provence, France

Past global change and Mediterranean biodiversity

A Symposium in memory of Armand Pons (25/01/2013, Marseille, France)

Armand Pons, born in 1931, died in January 2012. This meeting is organised to celebrate a major pioneer of modern palynology and palaeoecology in France and to take stock of the state of the art in the fields he started to explore 50 years ago. Find and download the program at symposium_armand_pons_2013.pdf
