install.packages("XML")
failed with the complaint
Cannot find xml2-config
Seth Falcon pointed out that the library libxml2-dev (development files for the GNOME XML library) was missing.
sudo aptitude install libxml2-dev
fixed the problem.
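The failure only shows up during the source build. A quick check from within R before calling install.packages() can save a round trip. This is a minimal sketch. It assumes, as on Ubuntu, that libxml2-dev puts the xml2-config program on the PATH. The name has_xml2_config() is hypothetical.

```r
# Hypothetical helper: TRUE if the xml2-config program (shipped with
# libxml2-dev on Ubuntu) can be found on the PATH.
has_xml2_config <- function() {
  nzchar(Sys.which("xml2-config"))  # Sys.which() returns "" if not found
}

if (has_xml2_config()) {
  message("xml2-config found: ", Sys.which("xml2-config"))
} else {
  message("xml2-config not found: install libxml2-dev first")
}
```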
Trying to install RODBC on Ubuntu with
install.packages("RODBC")
failed with the error message
ODBC headers sql.h and sqlext.h not found
A glance at the r-help mailing list showed that it had to do with unixODBC, an ODBC driver manager.
The package itself was installed, but not its development package, and thus not the headers R complained about.
Again one of those not-so-obvious things for the Linux/Unix shell newbie. The fix is
sudo aptitude install unixodbc-dev
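The error message names the missing headers, so you can check for them before retrying the installation. This is a minimal sketch. It assumes Ubuntu's default header location, /usr/include. The name odbc_headers_present() is hypothetical.

```r
# Hypothetical helper: report which of the ODBC headers RODBC needs are
# present in include_dir (the default assumes Ubuntu's /usr/include).
odbc_headers_present <- function(include_dir = "/usr/include") {
  headers <- c("sql.h", "sqlext.h")
  setNames(file.exists(file.path(include_dir, headers)), headers)
}

print(odbc_headers_present())
```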
Getting data into R can be done by reading comma-separated files (.csv) with the read.table() function (or its wrapper read.csv()). It is also possible to access databases directly and send SQL queries from R to the database. This has some advantages: with Sweave the queries get documented in the analysis report, and variable formats are retained.
To install the RODBC package:
install.packages("RODBC")
Open a database connection to a Microsoft Access database file, e.g. “MyDataBase.mdb” in the folder “C:\MyPath”:
library(RODBC)
channel <- odbcConnectAccess("C:/MyPath/MyDataBase")
Note that the Windows backslashes “\” become forward slashes “/” in R (a single backslash starts an escape sequence in R strings), and the extension “.mdb” is omitted.
Getting the database table “MyTable” into the R data frame “R.Table”:
R.Table <- sqlQuery(channel, "select * from MyTable")
MyTable can also be a saved query in the Access database.
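Connections should be closed again with odbcClose() when you are done. This minimal sketch wraps connect, query and close in one function. The name read_access_query() is hypothetical. odbcConnectAccess() is only available in RODBC on Windows, so the usage line is commented out.

```r
# Hypothetical helper: connect to an Access file, run one SQL query and
# always close the connection again, even if the query fails.
read_access_query <- function(access_file, sql) {
  if (!requireNamespace("RODBC", quietly = TRUE)) {
    stop("package 'RODBC' is not installed")
  }
  channel <- RODBC::odbcConnectAccess(access_file)
  if (!inherits(channel, "RODBC")) {   # a failed connection returns -1
    stop("could not connect to ", access_file)
  }
  on.exit(RODBC::odbcClose(channel))
  RODBC::sqlQuery(channel, sql)
}

# Usage (Windows only):
# R.Table <- read_access_query("C:/MyPath/MyDataBase", "select * from MyTable")
```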
Creating reasonably laid-out LaTeX tables from R output was easier than expected. I should have googled it long ago…
install.packages("xtable")
Let's say you created a tabular output in R called “tab1”, e.g. by doing:
data(CO2)
tab1 <- with(CO2, table(Treatment , Type))
tab1
The text output in R looks like
            Type
Treatment    Quebec Mississippi
  nonchilled     21          21
  chilled        21          21
Now you would like this (or any other table or data.frame) as a nice LaTeX table. The only thing to do is:
library(xtable)
xtable(tab1)
and the output will be:
% latex table generated in R 2.9.0 by xtable 1.5-5 package
% Wed Jul 08 16:20:54 2009
\begin{table}[ht]
\begin{center}
\begin{tabular}{rrr}
\hline
& Quebec & Mississippi \\
\hline
nonchilled & 21 & 21 \\
chilled & 21 & 21 \\
\hline
\end{tabular}
\end{center}
\end{table}
If you are using Sweave, the code chunk becomes
<<echo=FALSE, results=tex>>=
library(xtable)
data(CO2)
with(CO2, xtable(table(Treatment , Type)))
@
and the result of the Sweave run is a LaTeX document. Another post will give a hint on how to put R graphics into the same Sweave or LaTeX document… later…
The full usage of xtable() is
xtable(x, caption=NULL, label=NULL, align=NULL, digits=NULL, display=NULL)
With these arguments you can add table captions
xtable(tab1, caption = "My table caption")
and labels
xtable(tab1, label = "MyLaTeXlabel")
to the LaTeX tables.
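Caption and label can also be given in one call, and print() on the xtable object controls the output. This minimal sketch rebuilds tab1 so it runs on its own. With print.results = FALSE, print() returns the LaTeX code as a character string instead of printing it, which is handy for writing it to a .tex file.

```r
library(xtable)

data(CO2)
tab1 <- with(CO2, table(Treatment, Type))

# caption and label in one call; the label becomes \label{tab:co2}
xt <- xtable(tab1, caption = "Treatment by plant type in the CO2 data",
             label = "tab:co2")

# return the LaTeX code as a string instead of printing it
latex <- print(xt, print.results = FALSE)
writeLines(latex)
```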
You might have no internet connection or, as in my case, you have one, but the firewall prevents anything besides Windows IE from accessing the net. Grrr.
OK. It takes more time, but it is possible. Let's have an example:
Installing the package ‘gam’ for R 2.9 on Windows while in Norway (with cran.ii.uib.no as the nearest and fastest mirror of the “Comprehensive R Archive Network”): download the Windows binary gam_1.0.zip from the mirror with Internet Explorer, save it locally, e.g. to H:/DATA, and run
install.packages("H:/DATA/gam_1.0.zip", repos=NULL)
at the command line. That's it.
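With repos = NULL, install.packages() does not resolve dependencies, so every package that is needed has to be downloaded by hand. This minimal sketch installs all .zip files collected in one folder. The name install_local_zips() and its install switch are hypothetical.

```r
# Hypothetical helper: install every package .zip that was downloaded by
# hand into one folder, without contacting any repository.
install_local_zips <- function(dir, install = TRUE) {
  zips <- list.files(dir, pattern = "\\.zip$", full.names = TRUE)
  if (install && length(zips) > 0) {
    install.packages(zips, repos = NULL, type = "win.binary")
  }
  invisible(zips)
}

# Usage on Windows, after saving the binaries to H:/DATA:
# install_local_zips("H:/DATA")
```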
I spent a LOT of time fiddling together tables of descriptive statistics manually in R, and it did not occur to me to look in the CRAN repositories for an R function doing exactly this… yes, a bit stubborn… wanted to find out myself…
So what I had to do was things like:
prop.table(xtabs(~ Variable + Group1 + Group2), c(2,3))
Anyway, this is over now. A solution exists, of course (thx Kjetil… again):
The package ‘reporttools’ by Kaspar Rufibach.
install.packages("reporttools", dependencies=TRUE)
The most important functions are
tableContinuous(vars = MyData[, c("bmi", "ejectionfraction", "systolicBP", "diastolicBP")], group = MyData$sex, subset = MyData$significantstenosis, print.pval = "anova")
for continuous variables (vars is a data frame of the variables; here they are assumed to sit in a data frame MyData)
and
tableNominal() for nominal variables.
The output is LaTeX, and table captions and labels can be specified in the function call. I will give it a try inside Sweave… but first I have to get it onto my Linux machine… which is blocked by the corporate firewall… after a recent Microsoft power grab… after a virus attack (Conficker)… after running XP unpatched.
But that's another story.
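For comparison, here is a sketch of the manual base-R route from above: proportions of one variable within each combination of two grouping variables. The built-in mtcars data serves as an illustrative stand-in for Variable, Group1 and Group2.

```r
# xtabs() counts gear for every combination of am and vs;
# prop.table(..., c(2, 3)) divides by the total of each (am, vs) cell,
# so the proportions of gear sum to 1 within every cell.
counts <- xtabs(~ gear + am + vs, data = mtcars)
props  <- prop.table(counts, c(2, 3))
round(100 * props, 1)   # percentages
```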
rgl is an R package for three-dimensional visualisation using OpenGL. It provides functions implementing a new graphics device suitable for visualising R objects in three dimensions with the OpenGL libraries.
It can be installed from the Ubuntu repositories with
sudo aptitude install r-cran-rgl
or on all platforms inside R with
install.packages("rgl")
BUT:
It depends on some GL development libraries that are not installed by default, so the installation may fail with an error message like
missing required header GL/gl.h.
The solution is to install the missing library manually with
sudo aptitude install libglu1-mesa-dev
… one of those inconveniences that keep mainstream users from switching to open-source software, it seems to me.
Generalized additive models are an established tool for modelling nonlinear effects of covariates without much hassle about the form of the association between predictor and response. It is a great, straightforward tool that leaves most of the work to the computer.
The default plotting method produces clean plots for all covariates in the model (or a selection), but they are by no means of presentation quality! That takes customization (colors, understandable axis labels, scaling).
This example shows one customization for an additive regression model with one covariate. The goal was to display the absolute value of the response variable on the y-axis instead of the default “difference from intercept”.
This is only meaningful for a single covariate in the model.
library(gam)
MyGAM1 <- with(MyData[MyData$Strata==1,], gam(Y ~ s(Covariate)))
MyGAM0<- with(MyData[MyData$Strata==0,], gam(Y ~ s(Covariate)))
par( mfcol=c(1,2))
plot(MyGAM0)
plot(MyGAM1)
This is the resulting plot:
[Figure: Stratified Additive Regression Model]
1. Extract the values of the model response from the GAM object:
response1 <- predict(MyGAM1, type="response", se.fit=TRUE)
response0 <- predict(MyGAM0, type="response", se.fit=TRUE)
2. Plot the response values against the covariate (note: this works only with one covariate):
library(pspline)   # provides sm.spline()
par(mfcol=c(1,1))
plot(0, type="n", bty="n", main="Fancy GAM plot", xlab="MyCovariate", ylab="MyResponse", ylim=c(0,60), xlim=c(0,200))
legend("bottomright", bty="n", lwd=5, col=c("green","red"), legend=c("Strata = 0", "Strata = 1"))
lines(sm.spline(MyGAM1$model$Covariate, response1$fit), lwd = 3, col = "red")
lines(sm.spline(MyGAM1$model$Covariate, response1$fit + 1.96 * response1$se.fit), lty = 3, lwd = 2, col = "red")
lines(sm.spline(MyGAM1$model$Covariate, response1$fit - 1.96 * response1$se.fit), lty = 3, lwd = 2, col = "red")
lines(sm.spline(MyGAM0$model$Covariate, response0$fit), lwd = 3, col = "green")
lines(sm.spline(MyGAM0$model$Covariate, response0$fit + 1.96 * response0$se.fit), lty = 3, lwd = 2, col = "green")
lines(sm.spline(MyGAM0$model$Covariate, response0$fit - 1.96 * response0$se.fit), lty = 3, lwd = 2, col = "green")
# intercepts of both models as dashed reference lines
abline(h=MyGAM1$coefficients[1], lty=2, lwd=1, col="red")
abline(h=MyGAM0$coefficients[1], lty=2, lwd=1, col="green")
[Figure: Stratified Additive Regression Model on Response Scale]
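Here is a self-contained sketch of the same response-scale plot, under stated assumptions. mgcv::gam() (a recommended package shipped with R) stands in for the gam package, and MyData is simulated. The points are sorted by the covariate instead of being passed through sm.spline(). The names response_band() and draw_band() are hypothetical helpers.

```r
# Sketch only: mgcv::gam() stands in for the 'gam' package, and MyData
# is simulated here just to make the example runnable.
library(mgcv)

set.seed(1)
n <- 200
MyData <- data.frame(Covariate = runif(2 * n, 0, 200),
                     Strata    = rep(0:1, each = n))
MyData$Y <- with(MyData, 20 + 10 * sin(Covariate / 40) + 5 * Strata +
                   rnorm(2 * n, sd = 3))

# Hypothetical helper: fitted response with an approximate 95% pointwise
# band, sorted by the covariate so that lines() draws one clean curve.
response_band <- function(fit) {
  p   <- predict(fit, type = "response", se.fit = TRUE)
  x   <- fit$model$Covariate
  ord <- order(x)
  est <- as.numeric(p$fit)[ord]
  se  <- as.numeric(p$se.fit)[ord]
  data.frame(x = x[ord], fit = est,
             lower = est - 1.96 * se, upper = est + 1.96 * se)
}

# Hypothetical helper: solid fitted curve plus dotted confidence limits
draw_band <- function(band, col) {
  lines(band$x, band$fit, lwd = 3, col = col)
  lines(band$x, band$lower, lty = 3, lwd = 2, col = col)
  lines(band$x, band$upper, lty = 3, lwd = 2, col = col)
}

MyGAM0 <- gam(Y ~ s(Covariate), data = MyData[MyData$Strata == 0, ])
MyGAM1 <- gam(Y ~ s(Covariate), data = MyData[MyData$Strata == 1, ])
band0 <- response_band(MyGAM0)
band1 <- response_band(MyGAM1)

plot(0, type = "n", bty = "n", main = "Fancy GAM plot",
     xlab = "MyCovariate", ylab = "MyResponse", xlim = c(0, 200),
     ylim = range(c(band0$lower, band0$upper, band1$lower, band1$upper)))
draw_band(band0, "green")
draw_band(band1, "red")
abline(h = coef(MyGAM0)[1], lty = 2, col = "green")  # intercepts
abline(h = coef(MyGAM1)[1], lty = 2, col = "red")
legend("bottomright", bty = "n", lwd = 5, col = c("green", "red"),
       legend = c("Strata = 0", "Strata = 1"))
```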
Update: I have written a much more detailed static page about the additive Cox model: http://rforge.org/plothr/
The page has a download link to the function plotHR(), which does all the work. It is extensively commented, so it should be easy to understand the syntax and modify it for individual purposes.
Therneau et al. refer to the proportional hazards model or Cox regression model as “the workhorse of regression analysis for censored data”. They show how to implement the additive form of this model in SAS and S-PLUS. Hastie and Tibshirani had already mentioned it in 1986 when they introduced Generalized Additive Models (GAM).
I found modelling the functional form of the covariates with smoothing splines extremely useful in regression models for right-censored survival times. And the implementation is absolutely straightforward in R.
The only thing needed is the installation of the R packages “survival” and “pspline”:
install.packages("pspline")
and
install.packages("survival")
In the following code I refer to a dataset “MyData” with a binary status variable “death” and a time-to-event variable “days2death”.
The status variable “death” is usually (though not necessarily) coded 1 if the event of interest occurred to the subject; “days2death” then gives the time to this event.
Visualizing the functional form of a covariate takes the following steps:
1. Fit an additive Cox model with pspline() terms for the continuous covariates.
2. Predict the term of interest, i.e. its log hazard ratios with standard errors.
3. Plot the hazard ratios with confidence limits against the covariate.
Note that there is the termplot() function in R, which gives you the GAM plots after the model fit, so step 3 would not be necessary. BUT: it has a bug and fails when plotting a single covariate, and it does not allow all too much customizing.
This is the R code to achieve the analysis:
library(survival)
library(pspline)   # provides sm.spline(), used for plotting below
surv.death <- Surv(MyData$days2death, MyData$death)
# pspline() inside the formula is the penalized spline term from survival
pham.fit <- coxph(surv.death ~ pspline(EF, df=4) + pspline(Age, df=4) + strata(Sex), data = MyData)
The model above includes the continuous covariates “EF” (ejection fraction) and “Age” and stratifies by “Sex”.
predicted <- predict(pham.fit , type = "terms" , se.fit = TRUE , terms = 1)
“terms = 1” refers to the first model term, “pspline(EF, df=4)”.
First, plot the axes and labels:
plot(0 , xlab="Ejection Fraction" , ylab = "Hazard Ratio" , main = "All-cause Death" , type = "n" , xlim=c(0,100) , ylim=c(0,3))
The range of values on the x-axis (“xlim=c(0,100)”) is chosen manually for this specific covariate; of course it is possible to use something like xlim = c(0, max(MyData$EF)).
Now plot the fitted smoothing spline using the lines() function:
lines( sm.spline(MyData$EF , exp(predicted$fit)) , col = "red" , lwd = 0.8)
Note that the term prediction gives log hazard ratios; therefore exp(predicted$fit) is plotted against the values of the covariate. The sm.spline() function is needed because the points come in the order and density of the underlying dataset, not sorted by the covariate, so a plain lines() call would just produce a chaotic pattern. Alternative:
plot(MyData$EF , exp(predicted$fit) , col = "red" , cex = 0.2)
produces a scatter plot that reflects the distribution of the underlying data. I prefer adding a rug plot at the bottom of the graph to illustrate this (see below).
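The term predictions are already smooth, so sorting the points by the covariate before calling lines() is a lighter alternative to sm.spline(), which additionally smooths. This is a minimal base-R sketch. The name sorted_curve() is hypothetical, and lines() accepts its list(x, y) result directly.

```r
# Hypothetical helper: order the points by x so that lines() connects
# neighbouring covariate values instead of following the row order.
sorted_curve <- function(x, y) {
  ord <- order(x)
  list(x = x[ord], y = as.numeric(y)[ord])
}

# Usage with the objects from above:
# lines(sorted_curve(MyData$EF, exp(predicted$fit)), col = "red", lwd = 0.8)
```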
… upper and lower confidence limits with dashed thinner lines
lines(sm.spline(MyData$EF, exp(predicted$fit + 1.96 * predicted$se.fit)), col = "orange", lty = 2, lwd = 0.4)
and
lines(sm.spline(MyData$EF, exp(predicted$fit - 1.96 * predicted$se.fit)), col = "orange", lty = 2, lwd = 0.4)
… a thin horizontal line at a hazard ratio of 1, to see where the confidence limits cross it:
abline(h = 1, col = "lightgrey", lty = 2, lwd = 0.4)
… tiny tick marks on the x-axis to reflect the distribution of the underlying data (a rug plot):
axis(side = 1, at = MyData$EF, labels = FALSE, tick = TRUE, tcl = 0.4, lwd.ticks = 0.1)
… and some fancy red tick marks for the minimum, lower hinge, median, upper hinge and maximum of the covariate in the dataset:
axis(side = 1, at = fivenum(MyData$EF), labels = FALSE, tick = TRUE, tcl = -0.2, lwd.ticks = 1, col.ticks = "red")
[Figure: Customized smoothing spline fitted to the functional form of a covariate in an additive proportional hazards model]
That's it!
For comparison, the default plot from
termplot(pham.fit, se=TRUE, rug=TRUE)
results in…
[Figure: The default termplot method for fitted smoothing splines]
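To try the whole recipe without MyData, here is a self-contained sketch on the lung dataset that ships with survival. This dataset is an assumption standing in for MyData: age plays the role of EF, and status is coded 1 = censored, 2 = dead, which Surv() accepts. The sketch sorts the points by the covariate instead of using sm.spline(), so pspline is not needed. It is not the plotHR() function mentioned above.

```r
library(survival)

# lung: time in days, status 1 = censored / 2 = dead, age in years
fit <- coxph(Surv(time, status) ~ pspline(age, df = 4) + strata(sex),
             data = lung)

# type = "terms" gives centred log hazard ratio contributions per term;
# strata are not model terms, so column 1 is the pspline(age) term
pred <- predict(fit, type = "terms", se.fit = TRUE)
lhr <- pred$fit[, 1]
se  <- pred$se.fit[, 1]

# sort by the covariate so lines() draws a clean curve
ord <- order(lung$age)
x  <- lung$age[ord]
hr <- exp(lhr[ord])
lo <- exp(lhr[ord] - 1.96 * se[ord])
hi <- exp(lhr[ord] + 1.96 * se[ord])

plot(0, type = "n", xlim = range(x), ylim = c(0, max(hi)),
     xlab = "Age (years)", ylab = "Hazard Ratio", main = "All-cause Death")
lines(x, hr, col = "red", lwd = 0.8)
lines(x, lo, col = "orange", lty = 2, lwd = 0.4)
lines(x, hi, col = "orange", lty = 2, lwd = 0.4)
abline(h = 1, col = "lightgrey", lty = 2, lwd = 0.4)
# rug-style ticks plus red ticks at the five-number summary
axis(side = 1, at = lung$age, labels = FALSE, tcl = 0.4, lwd.ticks = 0.1)
axis(side = 1, at = fivenum(lung$age), labels = FALSE, tcl = -0.2,
     lwd.ticks = 1, col.ticks = "red")
```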