For any analysis using the EPD, it’s likely you will want to access the pollen counts, either to plot a diagram or so that they can be included in some larger, macro-ecological study. The goal of this post is to create an object for a single pollen record, containing counts, variable names and taxa group names, and sample depths for a site, Dallican Water, published by Bennett et al. (1997). The entity number for this site is 196.
The R object (what I am calling the count
object) we will create then be used in subsequent analyses/ plotting.
The EPD is structured in such a way that the pollen counts are stored in the p_counts
table (see dependencies here). In a previous post we already created an R version of the database, including this table and others, called myEPD
. Therefore, we can access the p_vars
table from this version of the database using the code below. Inspecting the structure in R we can see that the p_vars
table is a thin version with 4 columns: Entity number in the first (the crucial code which many tables related to the pollen counts together); sample code in the second (relating to the sample code in the p_depth
table); a variable code in the third (relating to the pollen type, whose code is stored in the p_vars
table); and then the actual counts for each variable.
Below we load our R version of the EPD, extract the p_count
table and inspect it. Then we extract only the rows that contain counds for Dallican Water (Entity Number = 196)
load('~/Documents/EPDr/epdWorkshop_dir/myEPD.RData')
p_counts <- myEPD[["p_counts"]]
head(p_counts)
## e_ sample_ var_ count
## 1 1 1 32 15
## 2 1 1 66 1
## 3 1 1 137 1
## 4 1 1 141 7
## 5 1 1 146 2
## 6 1 1 185 3
# Access the Dallican Water (Bennett 1997) dataset using the correct entity number
eNum <- 196
counts.thin <- p_counts[p_counts$e_== eNum,]
One thing to note is that the data are not structured like a typical sample x species matrix. We can use the reshape2
package to convert the pvars table into the more common sample x species matrix (useful for, e.g. plotting pollen diagrams, conducting ordinations).
smSp <- reshape2::dcast(counts.thin, sample_ ~ var_, value.var = "count")
# convert NAs (zero counts) to numeric Zero
smSp[is.na(smSp)] <- 0
# get the sample numbers and then remove this column
sample_ <- smSp[, 1]
smSp <- smSp[,-1]
# rename the rows as the sample numbers
row.names(smSp) <- sample_
At the moment the variable names (e.g. pollen types) are also just a sample code. You have to cross reference these using the p_vars
table to access them. We can also access the taxa group IDs for the different variables (e.g. Trees/ Shrubs, Herbs, Exotics, from the p_group
table), and the sample depths (from the p_sample
table). This is done below:
p_vars <- myEPD[["p_vars"]]
p_group <- myEPD[["p_group"]]
p_sample <- myEPD[["p_sample"]]
# Get the variable names
vars <- colnames(smSp)
spWant <- match(vars, p_vars$var_)
colnames(smSp) <- p_vars$varname[spWant]
taxa.names <- colnames(smSp)
# Get the groups as well
idWant <- match(vars, p_group$var)
taxa.groupid <- p_group$groupid[idWant]
# Get the depths
p_sampleWant <- p_sample[p_sample$e_ == eNum,]
depths <- p_sampleWant[,c("sample_", "depthcm")]
row.names(depths) <- depths$sample_
Finally, we want store all of this as one object that can be used in subsequent analysis. Since the different R objects we have created are of different dimensions, it is useful to use a list. So we create an object called counts
, with all the relevant R objects stored as individual items in it.
This object now stores all the relevant pollen count information, including entity number, counts, depths, and variable names and types.
# make the object
counts <- list(core_number = eNum,
taxa_names = taxa.names,
pvarCode = vars,
taxa_groupid = taxa.groupid,
sample_ = sample_,
counts = smSp,
depths = depths
)
If doing multiple analyses, we can easily turn all this into a function that enables us to get the pollen counts from any site with a known entity number. The following function getCountInfo
does this. Note that I added an extra bit of code to return NA
if no pollen counts are found for that site.
getCountInfo <- function(eNum, EPDfile = myEPD){
# eNum = e_ number in EPD
# EPDfile = an R object containing the EPD files
require("reshape2")
# define the tables that you need to use
p_counts <- EPDfile[["p_counts"]]
p_vars <- EPDfile[["p_vars"]]
p_group <- EPDfile[["p_group"]]
p_sample <- myEPD[["p_sample"]]
# Get the counts you want
counts.thin <- p_counts[p_counts$e_== eNum,]
if (nrow(counts.thin) == 0) {
counts <- list(core_number = eNum,
taxa_names = NA,
taxa_groupid = NA,
sample_ = NA,
counts = NA,
depths = NA
)
return(counts)
}
# Needs to be reshaped into a sample by species matrix
smSp <- dcast(counts.thin, sample_ ~ var_, value.var = "count")
smSp[is.na(smSp)] <- 0
sample_ <- smSp[, 1]
smSp <- smSp[,-1]
row.names(smSp) <- sample_
# Get the taxa names for the full, unadjusted dataset
vars <- colnames(smSp)
spWant <- match(vars, p_vars$var_)
colnames(smSp) <- p_vars$varname[spWant]
taxa.names <- colnames(smSp)
# Get the groups as well
idWant <- match(vars, p_group$var)
taxa.groupid <- p_group$groupid[idWant]
# Get the depths
p_sampleWant <- p_sample[p_sample$e_ == eNum,]
depths <- p_sampleWant[,c("sample_", "depthcm")]
row.names(depths) <- depths$sample_
# make the object
counts <- list(core_number = eNum,
taxa_names = taxa.names,
pvarCode = vars,
taxa_groupid = taxa.groupid,
sample_ = sample_,
counts = smSp,
depths = depths
)
return(counts)
}
This can be run using the following one line of code to extract the data for Dallican Water.
dallican <- getCountInfo(eNum = 196, EPDfile = myEPD)
## Loading required package: reshape2
str(dallican, max.level = 1)
## List of 7
## $ core_number : num 196
## $ taxa_names : chr [1:85] "Achillea-type" "Alnus" "Artemisia" "Betula" ...
## $ pvarCode : chr [1:85] "8" "32" "66" "95" ...
## $ taxa_groupid: chr [1:85] "HERB" "TRSH" "HERB" "TRSH" ...
## $ sample_ : int [1:80] 1 2 3 4 5 6 7 8 9 10 ...
## $ counts :'data.frame': 80 obs. of 85 variables:
## $ depths :'data.frame': 80 obs. of 2 variables:
In the next post we will see how to align the published age models from Giesecke et al. (2014) with the sample depths in the pollen count object we just created here.