Until the EPD migrates to Neotoma, the most up-to-date version of the EPD is stored on the EPD website here. Once the migration is complete, the Neotoma R package will provide access to much of this information; until then, it is best to use the version stored on the EPD website.
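To give a taste of what that will eventually look like, here is a minimal sketch using the existing neotoma R package. The bounding box and search parameters below are purely illustrative, and not all EPD records may be available through this route yet.

# Purely illustrative: once the migration is complete, a neotoma
# workflow like this should retrieve pollen records
# (the bounding box over Europe is an arbitrary example)
library(neotoma)
europePollen <- get_dataset(datasettype = "pollen",
                            loc = c(-10, 35, 45, 70))
# download the count data for the first record returned
firstRecord <- get_download(europePollen[[1]])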
At this web address, there are three different formats: Paradox, Microsoft Access and Postgres. I don’t know much about Paradox, and had trouble accessing the Microsoft Access version from my Mac (I have heard that other people have problems with Windows machines as well), so one solution is to use the Postgres version. Here you can set up the EPD as a Postgres server on your own computer. Diego Nieto Lugilde, the writer of the EPD_R package, has some advice on how to get this working on Windows machines here. I have some notes on how to do the same thing for a Mac.
This can still be a challenge for some people, however, who might not have the correct access rights on their computer. In response to this, Richard Telford published a slightly different solution on his blog, which involves setting up an SQLite database. It involves downloading the MS Access file and converting it to an SQLite file using mdbtools. You will have to install mdbtools from the command line on your computer first, then follow the code he used on his blog here. Once you have the SQLite file, connecting to it from R looks something like the sketch below.
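This is only a rough sketch: the file name epd.sqlite is an assumption, so use whatever name your mdbtools conversion actually produced.

# A minimal sketch, assuming the converted file is called "epd.sqlite"
# and sits in your working directory
conSQLite <- DBI::dbConnect(RSQLite::SQLite(), dbname = "epd.sqlite")
DBI::dbListTables(conSQLite)
# tables can then be read with DBI::dbReadTable(), exactly as shown below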
Once this is complete, it is possible to access the database as follows. In the code below, for example, I am connecting to the version of the database running as a server on my computer and then reading in the tables I need. Incidentally, this connection will also be used if you are using Diego’s EPD R package.
driver <- "PostgreSQL"
database <- "epd"
host = "localhost"
user = "epd"
password = "epdpassword"
con <- RPostgreSQL::dbConnect(driver, dbname = database,
host = host, user = user, password = password)
dbListTables(con)
# lists all the tables in the EPD
DBI::dbListTables(con)
# Obtains a number of tables I have found useful as R objects
pSample <- DBI::dbReadTable(con, "p_sample")
pCounts <- DBI::dbReadTable(con, "p_counts")
pGroup <- DBI::dbReadTable(con, "p_group")
siteLoc <- DBI::dbReadTable(con, "siteloc")
entity <- DBI::dbReadTable(con, "entity")
p_entity <- DBI::dbReadTable(con, "p_entity")
groups <- DBI::dbReadTable(con, "groups")
workers <- DBI::dbReadTable(con, "workers")
publ <- DBI::dbReadTable(con, "publ")
publent <- DBI::dbReadTable(con, "publent")
pVars <- DBI::dbReadTable(con, "p_vars")
siteInfo <- DBI::dbReadTable(con, "siteinfo")
siteDesc <- DBI::dbReadTable(con, "sitedesc")
Since some of these DBI connections are quite slow, I have found it most efficient to store these tables as my own .RData file of the database, called myEPD. I save this file on my computer here, and this is what will be used in all subsequent posts.
# Combines these tables into the myEPD object
myEPD <- list(p_sample = pSample,
              p_vars = pVars,
              p_counts = pCounts,
              p_group = pGroup,
              siteloc = siteLoc,
              entity = entity,
              p_entity = p_entity,
              groups = groups,
              workers = workers,
              publ = publ,
              publent = publent,
              siteinfo = siteInfo,
              sitedesc = siteDesc)

# Save the object in your directory somewhere
save(myEPD, file = "myEPD.RData")
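Once the file has been saved, the Postgres connection can be closed, and in any later R session the tables can be loaded straight from the .RData file without touching the database at all (adjust the path to wherever you saved the file):

# Close the connection to the Postgres server
DBI::dbDisconnect(con)

# In a later session, load the stored tables directly
load("myEPD.RData")
names(myEPD)
head(myEPD$p_counts)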
For ease, I’m currently trying to arrange to put this version online so that anyone can bypass these steps and go straight to accessing the database. A number of blog posts will follow this one showing how to use different parts of the EPD in R. If you are interested in contributing your own post, or would like advice on how to perform a particular function, then send an email here and one member of the team will try to solve the issue (within the available time constraints).
In the meantime, if you would like the RData version of the EPD now, please post a message below.