Cannot find sprng 2.0 header file

10 08 2009

Trying to
install.packages("rsprng")
fails because
Cannot find sprng 2.0 header file.

The development package of the SPRNG (Scalable Parallel Random Number Generator) library which provides several RNGs that are suitable for use in parallel computing are missing:

sudo aptitude install libsprng2-dev

fixes it.





Fancy Rugs in Regression Plots

10 08 2009
Additive Regression Model with Customized Rug

Additive Regression Model with Customized Rug

Rugplots along the axes show the distribution of the underlying data in regression model plots. This is particulary useful in connection with additive (nonparametric) models where the plotted smooth function is the exclusive representation of the model in order to assess how much data contributed to the model fit at the different values of the exlanatory variable.

The custom plot.gam() function includes the possibility of such rugs and pointwise conficence intervalls by default.

Adding quartiles to the rugs requires some customization, though. I included the complete code to produce the above plot underneath.

The example for this GAM is borrowed from the excellent book of Alain Zuur et. al. Mixed Effects Models and Extensions in Ecology with R p.55ff. The ISIT data to run the code above is included in the R package AED which can be downloaded from the books website.

Note: the package AED is needed for the example dataset only. It is NOT necessary to use the example code on ones own dataset.

The main points concerning the rugs and quantile lables on the x-axis are:

  1. Plot the coordinate system without lables
  2. plot( ... , axes = FALSE )

  3. Plot the x-axis with x-lables
  4. axis(side = 1 , line = 0.3 , at = 0:5*1000 , tick = TRUE)

  5. Plot rugs – the jitter() is necessary since a lot of datapoints sit on the same values of SampleDepth – so shake them a bit.
  6. axis(side = 1 , line = -0.9 , at = jitter(ISIT$SampleDepth) , labels = F , tick = T , tcl = 0.8 , lwd.ticks = 0.1 , lwd = 0)

  7. Print the lables “1Q”, “Median” and “3Q” or whatever you like to call them on the right position. The line and padj parameter set the position of the text and cex.axis the textsize.
  8. axis(side = 1 , line = -0.8 , at = fivenum(ISIT$SampleDepth)[2:4], lwd = 0 , tick = F , labels = c("1Q","median","3Q"), cex.axis = 0.7, col.axis = "black" , padj = -2.8)

  9. Plot thick tickmarks crossing through the rug cloud at 1Q, median and 3Q
  10. axis(side = 1 , line = -0.8 , at = fivenum(ISIT$SampleDepth)[2:4], lwd = 0 , tick = T, tcl = 1.1 , lwd.ticks = 1 , col.ticks = "black", labels = FALSE)

  11. and finally short thick tickmarks under the text touching the x-axis
  12. axis(side = 1 , line = 0.3, at = fivenum(ISIT$SampleDepth)[2:4], lwd = 0 , tick = T, tcl = 0.2 , lwd.ticks = 1 , col.ticks = "black", labels = FALSE)

Here goes the complete code:
library(mgcv)
library(AED)
data(ISIT)
#
# Fit a univariate GAM model
model <- gam(Sources ~ s(SampleDepth) , data = ISIT)
fit <- predict(model, se = T)$fit
se <- predict(model, se = T)$se.fit
lcl <- fit - 1.96 * se
ucl <- fit + 1.96 * se
#
# open a jpeg
jpeg("FancyRugs.jpg" , width=400, height=400)
#
# set plotting options: 1 plot per page, horizontal labels and textsize
par(mfrow = c(1,1) , las = 1 , cex = 1)
#
# plot coordinatesystem and labels
plot(0 , bty = "n" , type = "n" , xlim = c(0,5000) , ylim = c(-10,50) , xlab = "Depth (m)" , ylab = expression(paste("Number of sources (" , m^-3 , ")")) , axes = FALSE)
#
title(main="Association between number of sources of\nbioluminescent organisms and ocean depth" , cex.main = 0.8)
#
## _____ X-AXIS ______
# x-axis values
axis(side = 1 , line = 0.3 , at = 0:5*1000 , tick = TRUE)
#
# rugs at datapoints
axis(side = 1 , line = -0.9 , at = jitter(ISIT$SampleDepth) , labels = F , tick = T , tcl = 0.8 , lwd.ticks = 0.1 , lwd = 0)
#
# labels at 1Q, median and 3Q
axis(side = 1 , line = -0.8 , at = fivenum(ISIT$SampleDepth)[2:4], lwd = 0 , tick = F , labels = c("1Q","median","3Q"), cex.axis = 0.7, col.axis = "black" , padj = -2.8)
#
# tick marks at 1Q, median and 3Q
axis(side = 1 , line = 0.3, at = fivenum(ISIT$SampleDepth)[2:4], lwd = 0 , tick = T, tcl = 0.2 , lwd.ticks = 1 , col.ticks = "black", labels = FALSE)
#
axis(side = 1 , line = -0.8 , at = fivenum(ISIT$SampleDepth)[2:4], lwd = 0 , tick = T, tcl = 1.1 , lwd.ticks = 1 , col.ticks = "black", labels = FALSE)
#
## _____ Y-AXIS ______
# y-axis values
axis(side = 2 , at = 0:5*10)
#
# rugs at datapoints
axis(side = 2 , line = -0.9 , at = jitter(ISIT$Sources) , labels = F , tick = T , tcl = 0.8 , lwd.ticks = 0.1 , lwd = 0)
#
# labels at 1Q, median and 3Q
axis(side = 2 , line = -0.7 , at = fivenum(ISIT$Sources)[2:4], lwd = 0 , tick = F , labels = c("1Q","median","3Q"), cex.axis = 0.7, col.axis = "black")
#
# thicker tick marks at 1Q, median and 3Q
axis(side = 2 , line = 0.3, at = fivenum(ISIT$Sources)[2:4], lwd = 0 , tick = T, tcl = 0.3 , lwd.ticks = 1 , col.ticks = "black", labels = FALSE , padj = -2)
axis(side = 2 , line = -0.7 , at = fivenum(ISIT$Sources)[2:4], lwd = 0 , tick = T, tcl = 1.1 , lwd.ticks = 1 , col.ticks = "black" , labels = FALSE)
#
# horizontal line marking the intercept = mean(Sources) (for univariate model only)
abline(h=mean(ISIT$Sources), lty=3)
#
# Scatterplot
lines(ISIT$SampleDepth , ISIT$Source , type = "p" , cex = 0.4 , lwd = 0.2 , col = "grey")
#
# plot main figure
lines(ISIT$SampleDepth[order(ISIT$SampleDepth)] , fit[order(ISIT$SampleDepth)] , col = "black" , lwd = 2)
#
# plot lower confidence limit (lcl)
lines(ISIT$SampleDepth[order(ISIT$SampleDepth)] , lcl[order(ISIT$SampleDepth)] , col = "grey" , lwd = 1)
#
# plot upper confidence limit (ucl)
lines(ISIT$SampleDepth[order(ISIT$SampleDepth)] , ucl[order(ISIT$SampleDepth)] , col = "grey" , lwd = 1)
#
# closing the jpg file
dev.off()





Install R Package XML with Ubuntu amd64

8 08 2009

install.packages("XML")

failed while complaining

Cannot find xml2-config

Seth Falcon pointed out library libxml2-dev – Development files for the GNOME XML library – where missing.

sudo aptitude install libxml2-dev

fixed the problem.





Installing RODBC with Ubuntu/Debian amd64

8 08 2009

Trying to install RODBC in Ubuntu with
install.packages("RODBC")
failed throwing an error message
ODBC headers sql.h and sqlext.h not found

A glance at the r-help showed that it had to do with something called unixODBC – an ODBC driver manager.

The package was installed, but not the development package and thus not the headers which R complained about.

Again a not-so-obvious-for-the-newbee-Linux-Unix-Shell-goblish thing. The fix is
sudo aptitude install unixodbc-dev





Open Access .mdb Files with RODBC

7 08 2009

Getting data into R can be done by reading colon separated files (.csv) via the read.table() function. It is also possible to access databases directly and send SQL queries directly from R to the database. This has some advantages: Using Sweave the queries get documented in the analysis report, variable formats are retained.

To install the RODBC package:
install.packages("RODBC")

Open a database connection to an Microsoft Access database file, e.g. “MyDataBase.mdb” sitting in the Folder “C:\ MyPath\MyDataBase.mdb”:
channel <- odbcConnectAccess("C:/MyPath/MyDataBase")
note that the Windows backslashes “\” become slashes “/” in R and the extension “.mdb” is omitted.

Getting the database table “MyTable” into the R dataframe “R.Table”
R.Table <- sqlQuery( channel , paste ("select * from MyTable"))
MyTable can also be a sql query in the Access database.





Correlation matrices

6 08 2009

Correlation matrices are a great way to visualize covariation in large datasets. Thanks to Sarkar (2008) and Friendly (2002), we can borrow these examples. First we need to load a few packages:

library(lattice)
library(ellipse)

We then go on to importing our data:

MyData <- read.table("/FILE/LOCATION/FILE.txt", header=TRUE,sep='\t',quote='')

Then on for some tedious work. We’ll have to manually type in the name (as it named i the dataset!) of each of the variables we’re interested in (unless off course you’ll want to analyze the whole shebang – which often isn’t desirable).

var <- c("var1","var2","var3")

We then proceed to make a correlation matrix with the selected variables vector “var”:

cor.MyData <- cor(MyData[,var],use="pair")

And here’s the money shot; setting the order of the variables in the matrix based on the result of a cluster analysis. This is for easier visual interpretation.

ord <- order.dendrogram(as.dendrogram(hclust(dist(cor.MyData))))

We then can make our Corrgrams. First an easy one:

print(levelplot(cor.MyData[ord,ord],xlab=NULL,ylab=NULL,
at=do.breaks(c(-1.01,1.01),101),scales=list(x=list(rot=90)),
colorkey=list(space="top"),
col.regions=colorRampPalette(c("red","white","blue"))))

Simple correlation matrix

Then on to the more elaborate ones, first a informative one using the ellipse package. We’ll have to write our own panel function here (again, thanks to Sarkar 2008), but this is completly generic, so cut ‘n’ paste should suffice.

panel.corrgram<-function(x,y,z,subscripts,at,
level=0.9,label=FALSE,...
{require('ellipse',quietly=TRUE)
x<-as.numeric(x)[subscripts]
y<-as.numeric(y)[subscripts]
z<-as.numeric(z)[subscripts]
zcol<-level.colors(z,at=at,...)
for(i in seq(along=z)){
ell<-ellipse(z[i],level=level,npoints=50,
scale=c(.2,.2),centre=c(x[i],y[i]))
panel.polygon(ell,col=zcol[i],border=zcol[i],...)
}
if (label)
panel.text(x=x,y=y,lab=100*round(z,2),
cex=0.8,col=ifelse(z<0,'white','black'))
}

To create a *.pdf of your output we’ll run following code (remember your working directory!):

pdf('corrgram_ellipse.pdf',10,10)
levelplot(cor.MyData[ord,ord],
at=do.breaks(c(-1.01,1.01),20),xlab=NULL,
ylab=NULL,colorkey=list(space='top'),
scales=list(x=list(rot=90)),
panel=panel.corrgram,label=TRUE)
dev.off()

Correlation matrix - ellipse

However, if you’re more in the pacman thing, this next one might be a better choice. We’ll start by writing a new panel function:

panel.corrgram.2<-function(x,y,z,
subscripts,at=pretty(z),scale=0.8,...)
{
require('grid',quietly=TRUE)
x<-as.numeric(x)[subscripts]
y<-as.numeric(y)[subscripts]
z<-as.numeric(z)[subscripts]
zcol<-level.colors(z,at=at,...)
for(i in seq(along=z))
{
lims<-range(0,z[i])
tval<-2*base::pi*
seq(from=lims[1],to=lims[2],by=0.01)
grid.polygon(x=x[i]+.5*scale*c(0,sin(tval)),
y=y[i]+.5*scale*c(0,cos(tval)),
default.units='native',
gp=gpar(fill=zcol[i]))
grid.circle(x=x[i],y=y[i],r=.5*scale,
default.units='native')
}
}

Which we’ll export:

pdf('corrgram_pacman.pdf',10,10)
levelplot(cor.MyData[ord,ord],
at=do.breaks(c(-1.01,1.01),101),xlab=NULL,
ylab=NULL,colorkey=list(space='top'),
scales=list(x=list(rot=90)),
panel=panel.corrgram.2,
col.regions=colorRampPalette(c('red','white','blue')))
dev.off()

Correlation matrix - pacman

Of course, changing the output size (and type) is easy. Changing pdf –> png (or the format of your choosing). Size were in these examples 10 by 10 inches.





Installing and Updating R

19 07 2009

These are the recommendations from CRAN on how to install and update R. I have streamlined it for my situtation.

To obtain the latest R packages, add an entry like
deb http://<my.favorite.cran.mirror>/bin/linux/ubuntu hardy/
in your /etc/apt/sources.list file.

In my case (using Ubuntu Hardy 8.04 LTS in Bergen, Norway) this is
deb http://cran.ii.uib.no/bin/linux/ubuntu hardy/

See http://cran.r-project.org/mirrors.html
for the list of CRAN mirrors.

To install the complete R system, use
sudo aptitude update
sudo aptitude install r-recommended ess

ESS is a package to use R via the emacs editor. It is good! As an added convenience to Ubuntu users it is provided as an up-to-date version in the cran repositories.

SECURE APT

The Ubuntu archives on CRAN are signed with the key of “Vincent Goulet
<vincent.goulet@act.ulaval.ca>” with key ID E2A11821. You can fetch
this key with

gpg --keyserver subkeys.pgp.net --recv-key E2A11821

and then feed it to apt-key with
gpg -a --export E2A11821 | sudo apt-key add -

ADMINISTRATION AND MAINTENANCE OF R PACKAGES

The R packages part of the Ubuntu r-base and r-recommended packages
are installed into the directory /usr/lib/R/library. These can be
updated using aptitude with

sudo aptitude update
sudo aptitude upgrade

The other r-cran-* packages shipped with Ubuntu are installed into the
directory /usr/lib/R/site-library.

Installing R packages not provided with Ubuntu first requires tools to
compile the packages from source. These tools are installed via the R
development package with
sudo aptitude install r-base-dev

Then a site administrator can install R packages into the directory
/usr/local/lib/R/site-library by running R as root and using the install.packages() function:
sudo R
install.packages()

A routine update can then be undertaken from R using
update.packages(lib.loc = "/usr/local/lib/R/site-library")





Create LaTex tables from R output

8 07 2009

Creating reasonable layouted LaTeX tables from R output was easier then expected. I should have googled it long ago…

install.packages("xtable")

Lets say you created a tabular output in R called “tab1″, e.g by doing:
data(CO2)
tab1 <- with(CO2, table(Treatment , Type))
tab1

your text output in R would look like

Type
Treatment Quebec Mississippi
nonchilled 21 21
chilled 21 21

Now you would like this or whatever table or data.frame as a nice LaTeX-table, the only thing to do is:
library(xtable)
xtable(tab1)

and the output will be:

% latex table generated in R 2.9.0 by xtable 1.5-5 package
% Wed Jul 08 16:20:54 2009
\begin{table}[ht]
\begin{center}
\begin{tabular}{rrr}
\hline
& Quebec & Mississippi \\
\hline
nonchilled & 21 & 21 \\
chilled & 21 & 21 \\
\hline
\end{tabular}
\end{center}
\end{table}

If you are using Sweave the usage becomes
<< echo = FALSE , results = tex >>
library(xtable)
data(CO2)
with(CO2, xtable(table(Treatment , Type)))
@

and a the result of the R run is a LaTeX document. Another post will give a hint how to paste R graphics into the same Sweave or LaTeX document… later…

The full usage of xtable() is
xtable(x, caption=NULL, label=NULL, align=NULL, digits=NULL, display=NULL)
.. add table captions
xtable(table , caption = "My table caption")
and labels
xtable(table , label = " MyLaTeXlable")
to the LaTeX tables.





Install R packages without internet connection

7 07 2009

You might have no internet connection or as in my case you have one, but the firewall prevents anything beside Windows IE to access the net. Grrr.

Ok. It takes more time, but is possible. Lets have an example:
Installing the package ‘gam’ for R-2.9 on Windows being in Norway (with cran.ii.uib.no as the nearest and fastest “Comprehensive R Archive Network”.

  1. Find another place with internet connection or as in my case, the browser works, so use the browser.
  2. Find the package on the repository corresponding to your operating system and your version of R in our above example here
  3. klick on the link gam_1.0.zip, download it and save it (e.g. using an USB disk). Remember the path to the file (in our example in the folder H:/DATA/).
  4. Open R and write install.packages("H:/DATA/gam_1.0.zip", repos=NULL) at the command line.

Thats it.





Generate LaTeX tables of descriptive statistics

7 07 2009

Using a LOT of time on fiddling together tables of descriptive statistics manually in R did not inspire me to look in the CRAN repositories for a R function doing exactly this… yes a bit stubborn… want to find out myself…

So what I had to do was:

  1. handcode the descriptives for each variable, something like prop.table(xtabs( ~ Variable + Group1 + Group2), c(2,3)),
  2. calculate some statistics for each variable (chi^2, one-way-ANOVA, Fisher-exact..) and attach it to the descriptives,
  3. run the code,
  4. copy the plain text output from R back into the editor and finally
  5. add LaTeX or CSV syntax to get the desired results.

Anyway this is over now. A solution exists, of course (thx Kjetil… again):

The package ‘reporttools’ of Kaspar Rufibach.
install.packages("reporttools", dependecies=TRUE)

The most important functions are well:

tableContinuous( vars = c(bmi, ejectionfraction, systolicBP, diastolicBP) , group = sex , subset = significantstenosis , print.pval = "anova")

and

tableNominal() for nominal variables.

The output is LaTeX and it is possible to specify table captions and lables to the tables in the function call. I will give it a try inside Sweave… but first I have to get it to my Linux machine … which is blocked by the corporate firewall … after a recent Mircrosoft powergrab … after a Virusattack (Conficker) … after running XP unpached.

But thats another story








Follow

Get every new post delivered to your Inbox.