Accessing the pollen counts

For any analysis using the EPD, it’s likely you will want to access the pollen counts, either to plot a diagram or so that they can be included in some larger, macro-ecological study. The goal of this post is to create an object for a single pollen record, containing counts, variable names and taxa group names, and sample depths for a site, Dallican Water, published by Bennett et al. (1997). The entity number for this site is 196.

The R object (what I am calling the count object) we will create then be used in subsequent analyses/ plotting.

The EPD is structured in such a way that the pollen counts are stored in the p_counts table (see dependencies here). In a previous post we already created an R version of the database, including this table and others, called myEPD. Therefore, we can access the p_vars table from this version of the database using the code below. Inspecting the structure in R we can see that the p_vars table is a thin version with 4 columns: Entity number in the first (the crucial code which many tables related to the pollen counts together); sample code in the second (relating to the sample code in the p_depth table); a variable code in the third (relating to the pollen type, whose code is stored in the p_vars table); and then the actual counts for each variable.

Below we load our R version of the EPD, extract the p_count table and inspect it. Then we extract only the rows that contain counds for Dallican Water (Entity Number = 196)

load('~/Documents/EPDr/epdWorkshop_dir/myEPD.RData')
p_counts <- myEPD[["p_counts"]]
head(p_counts)
##   e_ sample_ var_ count
## 1  1       1   32    15
## 2  1       1   66     1
## 3  1       1  137     1
## 4  1       1  141     7
## 5  1       1  146     2
## 6  1       1  185     3
# Access the Dallican Water (Bennett 1997) dataset using the correct entity number
eNum <- 196
counts.thin <- p_counts[p_counts$e_== eNum,]

One thing to note is that the data are not structured like a typical sample x species matrix. We can use the reshape2 package to convert the pvars table into the more common sample x species matrix (useful for, e.g. plotting pollen diagrams, conducting ordinations).

smSp <- reshape2::dcast(counts.thin, sample_ ~ var_, value.var = "count")

# convert NAs (zero counts) to numeric Zero
smSp[is.na(smSp)] <- 0
# get the sample numbers and then remove this column
sample_ <- smSp[, 1]
smSp <- smSp[,-1]

# rename the rows as the sample numbers
row.names(smSp) <- sample_

At the moment the variable names (e.g. pollen types) are also just a sample code. You have to cross reference these using the p_vars table to access them. We can also access the taxa group IDs for the different variables (e.g. Trees/ Shrubs, Herbs, Exotics, from the p_group table), and the sample depths (from the p_sample table). This is done below:

p_vars <- myEPD[["p_vars"]]
p_group <- myEPD[["p_group"]]
p_sample <- myEPD[["p_sample"]]

# Get the variable names
vars <- colnames(smSp)
spWant <- match(vars, p_vars$var_)
colnames(smSp) <- p_vars$varname[spWant]
taxa.names <- colnames(smSp)

# Get the groups as well
idWant <- match(vars, p_group$var)
taxa.groupid <- p_group$groupid[idWant] 
    
# Get the depths
p_sampleWant <- p_sample[p_sample$e_ == eNum,]
depths <- p_sampleWant[,c("sample_", "depthcm")]
row.names(depths) <- depths$sample_

Finally, we want store all of this as one object that can be used in subsequent analysis. Since the different R objects we have created are of different dimensions, it is useful to use a list. So we create an object called counts, with all the relevant R objects stored as individual items in it.

This object now stores all the relevant pollen count information, including entity number, counts, depths, and variable names and types.

# make the object
    counts <- list(core_number = eNum,
                taxa_names = taxa.names,
                pvarCode = vars,
                taxa_groupid = taxa.groupid,
                sample_ = sample_,
                counts = smSp,
                depths = depths
            )

If doing multiple analyses, we can easily turn all this into a function that enables us to get the pollen counts from any site with a known entity number. The following function getCountInfo does this. Note that I added an extra bit of code to return NA if no pollen counts are found for that site.

getCountInfo <- function(eNum, EPDfile = myEPD){
  # eNum = e_ number in EPD
    # EPDfile = an R object containing the EPD files
    require("reshape2")
    
    # define the tables that you need to use
    p_counts <- EPDfile[["p_counts"]]
    p_vars <- EPDfile[["p_vars"]]
    p_group <- EPDfile[["p_group"]]
    p_sample <- myEPD[["p_sample"]]

    # Get the counts you want
    counts.thin <- p_counts[p_counts$e_== eNum,]
    if (nrow(counts.thin) == 0) {
       counts <- list(core_number = eNum,
                taxa_names = NA,
                taxa_groupid = NA,
                sample_ = NA,
                counts = NA,
                depths = NA
            )
    return(counts)
    }

    # Needs to be reshaped into a sample by species matrix 
    smSp <- dcast(counts.thin, sample_ ~ var_, value.var = "count")
    smSp[is.na(smSp)] <- 0
    sample_ <- smSp[, 1]
    smSp <- smSp[,-1]
    row.names(smSp) <- sample_
    
    # Get the taxa names for the full, unadjusted dataset
    vars <- colnames(smSp)
    spWant <- match(vars, p_vars$var_)
    colnames(smSp) <- p_vars$varname[spWant]
    taxa.names <- colnames(smSp)
    
    # Get the groups as well
    idWant <- match(vars, p_group$var)
    taxa.groupid <- p_group$groupid[idWant] 
    
    # Get the depths
    p_sampleWant <- p_sample[p_sample$e_ == eNum,]
    depths <- p_sampleWant[,c("sample_", "depthcm")]
    row.names(depths) <- depths$sample_
    
    # make the object
    counts <- list(core_number = eNum,
                taxa_names = taxa.names,
                pvarCode = vars,
                taxa_groupid = taxa.groupid,
                sample_ = sample_,
                counts = smSp,
                depths = depths
            )
    return(counts)
}

This can be run using the following one line of code to extract the data for Dallican Water.

dallican <- getCountInfo(eNum = 196, EPDfile = myEPD)
## Loading required package: reshape2
str(dallican, max.level = 1)
## List of 7
##  $ core_number : num 196
##  $ taxa_names  : chr [1:85] "Achillea-type" "Alnus" "Artemisia" "Betula" ...
##  $ pvarCode    : chr [1:85] "8" "32" "66" "95" ...
##  $ taxa_groupid: chr [1:85] "HERB" "TRSH" "HERB" "TRSH" ...
##  $ sample_     : int [1:80] 1 2 3 4 5 6 7 8 9 10 ...
##  $ counts      :'data.frame':    80 obs. of  85 variables:
##  $ depths      :'data.frame':    80 obs. of  2 variables:

In the next post we will see how to align the published age models from Giesecke et al. (2014) with the sample depths in the pollen count object we just created here.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s