Showing posts from January, 2011

Fast Access to Large Databases in R

R is a useful programming language for statistics. Sometimes, we need to handle large databases. Unfortunately, R is very slow at retrieving particular data fields from such a huge database. The ff package lets us store large data on disk in a structured way and still access it quickly. For example, you can create ff objects instead of ordinary R objects and save them to a specified file as follows.

    library(ff)
    n <- 8e3
    # Each ff object is backed by its own file on disk
    a <- ff(sample(n, n, TRUE), vmode="integer", length=n, filename="a.ff")
    b <- ff(sample(255, n, TRUE), vmode="ubyte", length=n, filename="b.ff")
    # Save both objects under the common file name "y"
    ffsave(a, b, file="y", rootpath="/")

This generates four files: "a.ff", "b.ff", "y.RData", and "y.ffData". Afterwards, you can load the data file and open each variable by executing the following code.

    library(ff)
    # Restore the ff objects, then reopen their backing files
    load("y.RData")
    open.ff(a)
    open.ff(b)

When you want to create a function whic...
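As a rough illustration of the on-disk access described in this post, the sketch below reads a small slice of an ff vector and then sums the whole vector one chunk at a time. It is only a minimal example under stated assumptions: the file name "a_demo.ff", the use of tempdir(), and the variable names first_ten and total are hypothetical and are not part of the original post.

```r
library(ff)

# Hypothetical demo data, stored in a temporary directory (assumption)
n <- 8e3
path <- file.path(tempdir(), "a_demo.ff")
a <- ff(sample(n, n, TRUE), vmode = "integer", length = n, filename = path)

# Indexing an ff object returns an ordinary R vector
# containing only the requested elements
first_ten <- a[1:10]

# chunk() splits the index range into pieces; each piece is
# read from disk and summed separately
total <- 0
for (idx in chunk(a)) {
  total <- total + sum(as.numeric(a[idx]))
}

# Close the backing file when finished
close(a)
```

Reading by index or by chunk keeps only the requested part of the data in memory, which is the main reason to use ff objects for data that would not comfortably fit in an ordinary R vector.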