Fast Access to Large Databases in R

R is a useful programming language for statistics. Sometimes, however, we need to handle large databases, and R can be very slow at accessing particular data fields in such a huge database because it normally keeps whole objects in memory. The ff package lets us store large data on disk in a systematic way while still giving fast access to it. For example, you can create ff objects instead of ordinary R objects and save them to a specified file as follows.
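library(ff)
n <- 8e3
# integer vector whose data are stored in the flat file "a.ff"
a <- ff(sample(n, n, TRUE), vmode="integer", length=n, filename="a.ff")
# values 1..255 fit into "ubyte", which needs only 1 byte per element
b <- ff(sample(255, n, TRUE), vmode="ubyte", length=n, filename="b.ff")
# save the objects to "y.RData" and their data files to the archive "y.ffData"
ffsave(a, b, file="y", rootpath="/")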
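Once created, an ff vector can be indexed much like an ordinary R vector while its data stay in a file on disk. The following is a minimal sketch, assuming a recent ff release; the names x, v, first, and total are hypothetical. It reads a few elements, overwrites one, and then sums the whole vector in pieces with ff's chunk() helper, so only part of the data is held in RAM at a time. Because no filename is given, ff puts the data in a temporary file.

```r
library(ff)

set.seed(42)
n <- 8e3
x <- sample(n, n, TRUE)            # ordinary in-RAM integer vector
v <- ff(x, vmode = "integer")      # no filename given: ff uses a temporary file

# Random access: fetch a few elements without reading the whole file into RAM
first <- v[1:5]

# Writing by index goes straight to the file-backed vector
v[2] <- 0L

# Sequential pass: chunk() splits the index range into RAM-sized pieces
total <- 0
for (i in chunk(v)) {
  total <- total + sum(v[i])
}
print(total)
```

The chunked loop is the usual way to process an ff vector that is too large to pull into memory with v[] in one go.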
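This generates four files: "a.ff" and "b.ff", which hold the data and are created by ff(), plus "y.RData" and "y.ffData", which are written by ffsave(). Afterwards, for example in a new R session, you can load the data file and reopen each variable with the following code.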
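library(ff)
load("y.RData")  # restores the ff objects a and b, which are still closed
open.ff(a)       # reattach a to its data file "a.ff"
open.ff(b)       # reattach b to its data file "b.ff"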
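Why is open.ff() needed after load()? The .RData file stores only the ff object's metadata (vmode, length, file name and so on), not the data, so the loaded object starts out closed until it is reattached to its file. The sketch below shows this round trip in a scratch directory. To keep it independent of the .ffData archive, it uses base R's save() instead of ffsave(); the names dir, w, was_open, and values are hypothetical.

```r
library(ff)

# Scratch directory so the sketch leaves the working directory untouched
dir <- tempfile("ffdemo")
dir.create(dir)

# finalizer = "close" keeps the data file when the R object is garbage-collected
w <- ff(1:10, vmode = "integer",
        filename = file.path(dir, "w.ff"), finalizer = "close")
close(w)                                  # flush and release the file handle
save(w, file = file.path(dir, "w.RData")) # stores the ff metadata, not the data
rm(w)

load(file.path(dir, "w.RData"))  # w is back, but still closed
was_open <- is.open(w)
open.ff(w)                       # reattach w to w.ff on disk
values <- w[]
print(values)
```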
If you want to write a function that saves ff objects, you need to be careful about the environment in which those objects are created. For example, let us define the following function.
saveFF <- function(){
  n <- 8e3
  # a and b are created in the function's local environment
  a <- ff(sample(n, n, TRUE), vmode="integer",
          length=n, filename="a.ff")
  b <- ff(sample(255, n, TRUE), vmode="ubyte",
          length=n, filename="b.ff")
  ffsave(a, b, file="y", rootpath="/")
}
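However, this function does not work and produces an error: a and b are created in the function's local environment, whereas ffsave() here needs them to be in the global environment. We can therefore modify the function so that it assigns a and b globally with the <<- operator and tells ffsave() which environment to look in through its envir argument.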
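saveFF <- function(){
  n <- 8e3
  # <<- assigns a and b in the global environment instead of locally
  a <<- ff(sample(n, n, TRUE), vmode="integer",
           length=n, filename="a.ff")
  b <<- ff(sample(255, n, TRUE), vmode="ubyte",
           length=n, filename="b.ff")
  # sys.frame() (default which = 0) is the global environment
  ffsave(a, b, envir=sys.frame(), file="y",
         rootpath="/")
}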
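The fix relies on two pieces of base R behavior, which the following sketch checks in isolation. No ff objects are involved, and make_vars, x_local, x_global, and same_env are hypothetical names. For a function defined at top level, the <<- operator assigns into the global environment rather than the function's local one. In addition, sys.frame() with its default which = 0 returns the global environment even when it is called inside a function.

```r
# Hypothetical helper: one local and one global assignment
make_vars <- function() {
  x_local <- 1      # lives only in the function's own environment
  x_global <<- 2    # not found in enclosing environments, so created globally
  # sys.frame() defaults to which = 0, i.e. the global environment,
  # even though it is called from inside a function
  identical(sys.frame(), globalenv())
}

same_env <- make_vars()
print(same_env)
print(exists("x_local"))
print(exists("x_global"))
```

After the call, x_global exists at top level but x_local does not. This is why the modified saveFF() leaves a and b behind in the global environment, where ffsave() is told to find them.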
The modified saveFF() lets us create and save ff objects simply by calling a custom function. For more information on the ff package, visit this page of the Ryan Wiki.

