Plot Function For Additive Cox Proportional Hazard Regression

30 10 2009

Usually a Cox-regression is achieved in R by
library(survival)
model <- coxph ( SuvivalObject ~ Covariate1 + Covariate2 + Factor1 + Factor2 , data = Dataset )

The covariates can be enclosed in other funtions:

  • factors should be enclosed by factor()
  • strata, which allow to adjust for a factor without getting an estimate, should be enclosed by strata()
  • non-log linear continuous terms can be enclosed by
    pspline()

In the latter case the model might look like
model <- coxph (SurvivalObject ~ pspline(Covariate1) + Covariate2 + factor(Factor1) + strata(Factor2) , data = Dataset )

The functional form of the covariates (including the factors) can now be plotted with
termplot(model)

Though the termplot() function fails with plotting just one covariate and leaves no cusomization.

The function plotHR() plots the functional form of the desired term: plotHR(model)
plots the first term in the model by default but other terms can be accessed by calling their number (e.g. the second one):
plotHR(model , terms = 2)

In order to use the function you have to “source” it into R. It is the same procedure as calling a package, but using “source” instead of “library”.

Paste the function syntax into a textfile and safe it (as plot.HR.R) on your harddisk, remember the path and include
source("C:Path/to/plotHR_0.5.R")
before using the function.

Download plotHR()

Note: I have rewritten the function several times since I wrote the initial post … using version numbers now …

  • V0.5 – bug fix for the y-scale and slight adjustment of the default plotting colors (paler CI shade and stronger term-line)
  • V0.4 – the y-scale should be logarithmic; a HR of 0.5 (50% reduced Hazard) should show the same distance from HR = 1 as a doubled Hazard (HR = 2); this is now default. The linear scale I used initially is biased in this concern (Hat-tip: Arve Ulvik, Eva Pedersen and Roy Nilsen). The option y.log allows both ways (linear and log-scale); the axis labels denote Hazard Ratio instead of log(HR).

Usage:

plotHR(model , terms = 1 , se = TRUE , rug = TRUE , x.log = FALSE , y.log = TRUE , xlab = "" , ylab = "Hazard Ratio" , main = NULL , xlim = NULL , ylim = NULL, col.term = "#1F78B4", lwd.term = 3, col.se = "#A6CEE3", cex = 1 , axes = TRUE)

  • model – a coxph model
  • terms – integer; the number of the term to plot
  • se – logical TRUE/FALSE; plotting the CI
  • rug – logical TRUE/FALSE; rug plot at x-axis
  • x.log – logical TRUE/FALSE; log-transformed exposure variable
  • y.log – logical TRUE/FALSE; logarithmic y-scale
  • xlab – character; x-axis label
  • ylab – character; y-axis label
  • main – character; main plot title
  • xlim – 2×1 column vector; x-range of plot
  • ylim – 2×1 column vector; y-range of plot
  • col.term – color of HR-curve
  • lwd.term – line width of HR-curve
  • col.se – color of CI (if plotted)
  • cex – numeric; size factor of labels




Matrix Operations in R

21 08 2009

R can do all kinds of matrix calculations, like multiplication, tranposing and calculating the inverse. The following manual was created by Phil Ender.

Note: R wants the data to be entered by columns starting with column one.

The Matrix

# the matrix function
# 1st arg: c(2,3,-2,1,2,2) the values of the elements filling the columns
# 2nd arg: 3 the number of rows
# 3rd arg: 2 the number of columns

> A <- matrix(c(2,3,-2,1,2,2),3,2)
> A

[,1] [,2]
[1,]    2    1
[2,]    3    2
[3,]   -2    2

Is Something a Matrix

> is.matrix(A)

[1] TRUE

> is.vector(A)

[1] FALSE

Multiplication by a Scalar

> c <- 3
> c*A

[,1] [,2]
[1,]    6    3
[2,]    9    6
[3,]   -6    6

Matrix Addition & Subtraction

> B <- matrix(c(1,4,-2,1,2,1),3,2)
> B

[,1] [,2]
[1,]    1    1
[2,]    4    2
[3,]   -2    1

> C <- A + B
> C

[,1] [,2]
[1,]    3    2
[2,]    7    4
[3,]   -4    3

> D <- A - B
> D

[,1] [,2]
[1,]    1    0
[2,]   -1    0
[3,]    0    1

Matrix Multiplication

> D <- matrix(c(2,-2,1,2,3,1),2,3)
> D

[,1] [,2] [,3]
[1,]    2    1    3
[2,]   -2    2    1

> C <- D %*% A
> C

[,1] [,2]
[1,]    1   10
[2,]    0    4

> C <- A %*% D
> C

[,1] [,2] [,3]
[1,]    2    4    7
[2,]    2    7   11
[3,]   -8    2   -4

> D <- matrix(c(2,1,3),1,3)
> D

[,1] [,2] [,3]
[1,]    2    1    3

> C <- D %*% A
> C

[,1] [,2]
[1,]    1   10

> C <- A %*% D

Error in A %*% D : non-conformable arguments

Transpose of a Matrix

> AT <- t(A)
> AT

[,1] [,2] [,3]
[1,]    2    3   -2
[2,]    1    2    2

> ATT <- t(AT)
>ATT

[,1] [,2]
[1,]    2    1
[2,]    3    2
[3,]   -2    2

Common Vectors

Unit Vector

> U <- matrix(1,3,1)
> U

[,1]
[1,]    1
[2,]    1
[3,]    1

Common Matrices

Unit Matrix

Using Stata

> U <- matrix(1,3,2)
> U

[,1] [,2]
[1,]    1    1
[2,]    1    1
[3,]    1    1

Diagonal Matrix

> S <- matrix(c(2,3,-2,1,2,2,4,2,3),3,3)
> S

[,1] [,2] [,3]
[1,]    2    1    4
[2,]    3    2    2
[3,]   -2    2    3

> D <- diag(S)
> D

[1] 2 2 3

> D <- diag(diag(S))
> D

[,1] [,2] [,3]
[1,]    2    0    0
[2,]    0    2    0
[3,]    0    0    3

Identity Matrix

> I <- diag(c(1,1,1))
> I

[,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1

Symmetric Matrix

> C <- matrix(c(2,1,5,1,3,4,5,4,-2),3,3)
> C

[,1] [,2] [,3]
[1,]    2    1    5
[2,]    1    3    4
[3,]    5    4   -2

> CT <- t(C)
> CT

[,1] [,2] [,3]
[1,]    2    1    5
[2,]    1    3    4
[3,]    5    4   -2

Inverse of a Matrix

> A <- matrix(c(4,4,-2,2,6,2,2,8,4),3,3)
> A

[,1] [,2] [,3]
[1,]    4    2    2
[2,]    4    6    8
[3,]   -2    2    4

> # using MASS package

> library(MASS)

> AI <- ginv(A)
> AI

[,1] [,2] [,3]
[1,]  1.0 -0.5  0.5
[2,] -4.0  2.5 -3.0
[3,]  2.5 -1.5  2.0

> # using car package

> library(car)

> AI <- inv(A)
> AI

[,1] [,2] [,3]
[1,]  1.0 -0.5  0.5
[2,] -4.0  2.5 -3.0
[3,]  2.5 -1.5  2.0

> A %*% AI

[,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1

> AI %*% A

[,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1

Inverse & Determinant of a Matrix

> C <- matrix(c(2,1,6,1,3,4,6,4,-2),3,3)
> C

[,1] [,2] [,3]
[1,]    2    1    6
[2,]    1    3    4
[3,]    6    4   -2

> CI <- inv(C)
CI

[,1]        [,2]        [,3]
[1,]  0.2156863 -0.25490196  0.13725490
[2,] -0.2549020  0.39215686  0.01960784
[3,]  0.1372549  0.01960784 -0.04901961

> d <- det(C)
> d

[1] -102

Number of Rows & Columns

> X <- matrix(c(3,2,4,3,2,-2,6,1),4,2)
> X

[,1] [,2]
[1,]    3    2
[2,]    2   -2
[3,]    4    6
[4,]    3    1

> dim(X)

[1] 4 2

> r <- nrow(X)
> r

[1] 4

> c <- ncol(X)
> c

[1] 2

Computing Column & Row Sums

# note the uppercase S

> A <- matrix(c(2,3,-2,1,2,2),3,2)
> A

[,1] [,2]
[1,]    2    1
[2,]    3    2
[3,]   -2    2

> c <- colSums(A)
> c

[1] 3 5

> r <- rowSums(A)
> r

[1] 3 5 0

> a <- sum(A)
> a

[1] 8

Computing Column & Row Means

# note the uppercase M

> cm <- colMeans(A)
> cm

[1] 1.000000 1.666667

> rm <- rowMeans(A)
> rm

[1] 1.5 2.5 0.0

> m <- mean(A)
> m

[1] 1.333333

Horizontal Concatenation

> A
> A

[,1] [,2]
[1,]    2    1
[2,]    3    2
[3,]   -2    2

> B <- matrix(c(1,3,2,1,4,2),3,2)
> B

[,1] [,2]
[1,]    1    1
[2,]    3    4
[3,]    2    2

> C <- cbind(A,B)
> C

[,1] [,2] [,3] [,4]
[1,]    2    1    1    1
[2,]    3    2    3    4
[3,]   -2    2    2    2

Vertical Concatenation (Appending)

> C <- rbind(A,B)
> C

[,1] [,2]
[1,]    2    1
[2,]    3    2
[3,]   -2    2
[4,]    1    1
[5,]    3    4
[6,]    2    2




Densityplot Variations

18 08 2009

Variation I

Densityplot with filled area, quartiles and mean

Densityplot with filled area, quartiles and mean

Just playing around the other day to get the default plot.density() function a bit more like publishing quality. Above is my favorite so far. Further down are two more spartanic versions.

In this connection I also wrote my first function. When I get more customed to R I will package my ideas into an R library – but later…

There is the possibility to call R-functions without embedding the code into your analysis script, but I did not look that up yet, so embedding the following code into your script (at the beginning) and then using
densityplot(Dataset$Coviate)
will do the job, where Dataset and Covariate have to be replaced by the according values of course.

Now the function code:
# function densityplot
densityplot <- function(x , digits = 1 , xlab = "" , ylab = "" , main = "Density" , col = "blue"){
dens <- density(x , na.rm = T)
bins <- as.numeric(cut(dens$x , breaks = fivenum(x)))
i1 <- bins == 1 & !is.na(bins)
i2 <- bins == 2 & !is.na(bins)
i3 <- bins == 3 & !is.na(bins)
i4 <- bins == 4 & !is.na(bins)
x.p1 <- dens$x[i1]; x.p1 <- c(x.p1,max(x.p1),min(x.p1))
x.p2 <- dens$x[i2]; x.p2 <- c(x.p2,max(x.p2),min(x.p2))
x.p3 <- dens$x[i3]; x.p3 <- c(x.p3,max(x.p3),min(x.p3))
x.p4 <- dens$x[i4]; x.p4 <- c(x.p4,max(x.p4),min(x.p4))
y.p1 <- dens$y[i1]; y.p1 <- c(y.p1,0,0)
y.p2 <- dens$y[i2]; y.p2 <- c(y.p2,0,0)
y.p3 <- dens$y[i3]; y.p3 <- c(y.p3,0,0)
y.p4 <- dens$y[i4]; y.p4 <- c(y.p4,0,0)
plot(dens , type = "n" , axes = F, main = main, xlab = xlab , ylab = ylab)
polygon(x.p1,y.p1 , border = F , col = col)
polygon(x.p2,y.p2 , border = F , col = col)
polygon(x.p3,y.p3 , border = F , col = col)
polygon(x.p4,y.p4 , border = F , col = col)
axis(side = 1 , line = 0 , at = fivenum(x, na.rm = T) , label = c("Minimum","Quartile 1", "Median", "Quartile 3", "Maximum"), lwd = 0, cex.axis = 0.6)
axis(side = 1 , line = 1 , at = fivenum(x, na.rm = T))
axis(side = 1 , line = 1 , at = round(mean(x , na.rm = T) , digits = digits) , tcl = 0.4 , label = F)
axis(side = 1 , line = -1.5 , at = round(mean(x , na.rm = T) , digits = digits) , tick = F , cex.axis = 0.6)
axis(side = 1 , line = -2.0 , at = round(mean(x , na.rm = T) , digits = digits) , label = "Mean" , tick = F , cex.axis = 0.6)
}

Variation II

Some might find the colored area to much, although IMHO it puts the focus on the fact that one is looking at areas when ploting a density. But then something similar without the color fill. Not a function just a few lines of code to embed and adjust to the script:

Densityplot with quartiles and mean

Densityplot with quartiles and mean

… and the source…
plot(density(angio$PE_ALDER , na.rm = T), axes = F, main = "Basic densityplot", xlab = "" , ylab = "")
# Add Quartiles
axis(side = 1 , line = 1 , at = fivenum(angio$PE_ALDER, na.rm = T) , label = c("Minimum","Quartile 1", "Median", "Quartile 3", "Maximum"), lwd = 0, cex.axis = 0.6)
axis(side = 1 , line = 2 , at = fivenum(angio$PE_ALDER, na.rm = T))
abline(v = fivenum(angio$PE_ALDER, na.rm = T)[2:4] , lty = 3)
# Mean
axis(side = 1 , line = 2 , at = round(mean(angio$PE_ALDER , na.rm = T) , digits = 2) , tcl = 0.4 , label = F)
axis(side = 1 , line = -0.5 , at = round(mean(angio$PE_ALDER , na.rm = T) , digits = 2) , tick = F)
axis(side = 1 , line = -1.4 , at = round(mean(angio$PE_ALDER , na.rm = T) , digits = 2) , label = "Mean" , tick = F , cex.axis = 0.6)
abline(v = mean(angio$PE_ALDER , na.rm = T) , lty = 4)

Variation III

… finally an even more stripped down version without the mean:
Densityplot1
which was achieved by:
plot(density(angio$PE_ALDER , na.rm = T), axes = F, main = "Basic densityplot", xlab = "Age")
abline(v = fivenum(angio$PE_ALDER, na.rm = T)[2:4] , lty = 3)
axis(side = 1 , line = -1 , at = fivenum(angio$PE_ALDER, na.rm = T) , label = c("Minimum","Quartile 1", "Median", "Quartile 3", "Maximum"), lwd = 0, cex.axis = 0.6)
axis(side = 1 , at = fivenum(angio$PE_ALDER, na.rm = T))





GAM Plot with 95% Confidence Shade

11 08 2009

GAM plot with conficence "shade" and customized rugs
The lightblue shade denoting the 95% pointwise confidence limits of the GAM estimate is a polygon() object in R.

Usually one would plot the GAM model with the default termplot() function and specifiy se=T to get the confidence limits as lines. I showed in a recent post how to plot the fitted GAM smooth and the confidence limits manually.

The same approach can be used to have the confidence limits as a shade. In order to achieve this:

  1. Fit the model
  2. library(mgcv)
    model <- gam( MyResponse ~ MyCovariate , data = MyDataset)

  3. get the estimated values of the fitted smoothing spline
  4. fit <- predict( model , se = TRUE )$fit

  5. and pointwise standard errors
  6. se <- predict( model , se = TRUE)$se.fit

  7. calculate the values of the upper and lower 95%-confidence limits
  8. lcl <- fit - 1.96 * se
    ucl <- fit + 1.96 * se

  9. plot an empty coordinate system with right scaling, labels , titles and so on. I prefer to include axes = FALSE and add the axes manually. See here for an example.
  10. plot( 0 , type = "n" , bty = "n" , xlab = "The x-axis lable" , ylab = "The y-axis lable" , main = "The Title" , xlim = c(0,5) , ylim = c(0,3) )

  11. No it gets a bit tricky with sorting the coordinates in the right order. R provides a very effective way to achieve this, though it is not easily understandable at once. We create two indices on the dataset i.for and i.back which number the dataset according to the independend variable on the x-axis – forward and backward respectively:
  12. i.for <- order( MyDataset$MyCovariate )
    i.back <- order( MyDataset$MyCovariate , decreasing = TRUE )

  13. The points of the shade polygon follow here first the upper confidence line (ucl) forward and then the lower confidence line backward. The coordinates of the shade-polygon are
  14. x.polygon <- c( MyDataset$MyCovariate[i.for] , MyDataset$MyCovariate[i.back]
    y.polygon <- c( ucl[i.for] , lcl[i.back] )

  15. First plot the polygon with the confidence shade – otherwise the GAM plot vanishes behind it.
  16. polygon( x.polygon , y.polygon , col = "#A6CEE3" , border = NA )

  17. now the mainplot afterwards
  18. lines( MyDataset$MyCovariate[i.for] , fit[i.for], col = "#1F78B4" , lwd = 3 )

  19. add a horizontal line marking the intercept ( the mean of the covariate) – note: univariate only. I will post a remark on this later…
  20. abline( h = mean(MyDataset$MyCovariate) , lty = 2 )

      Hope no typos in the above how2. Comment if you find some.





Fancy Rugs in Regression Plots

10 08 2009
Additive Regression Model with Customized Rug

Additive Regression Model with Customized Rug

Rugplots along the axes show the distribution of the underlying data in regression model plots. This is particulary useful in connection with additive (nonparametric) models where the plotted smooth function is the exclusive representation of the model in order to assess how much data contributed to the model fit at the different values of the exlanatory variable.

The custom plot.gam() function includes the possibility of such rugs and pointwise conficence intervalls by default.

Adding quartiles to the rugs requires some customization, though. I included the complete code to produce the above plot underneath.

The example for this GAM is borrowed from the excellent book of Alain Zuur et. al. Mixed Effects Models and Extensions in Ecology with R p.55ff. The ISIT data to run the code above is included in the R package AED which can be downloaded from the books website.

Note: the package AED is needed for the example dataset only. It is NOT necessary to use the example code on ones own dataset.

The main points concerning the rugs and quantile lables on the x-axis are:

  1. Plot the coordinate system without lables
  2. plot( ... , axes = FALSE )

  3. Plot the x-axis with x-lables
  4. axis(side = 1 , line = 0.3 , at = 0:5*1000 , tick = TRUE)

  5. Plot rugs – the jitter() is necessary since a lot of datapoints sit on the same values of SampleDepth – so shake them a bit.
  6. axis(side = 1 , line = -0.9 , at = jitter(ISIT$SampleDepth) , labels = F , tick = T , tcl = 0.8 , lwd.ticks = 0.1 , lwd = 0)

  7. Print the lables “1Q”, “Median” and “3Q” or whatever you like to call them on the right position. The line and padj parameter set the position of the text and cex.axis the textsize.
  8. axis(side = 1 , line = -0.8 , at = fivenum(ISIT$SampleDepth)[2:4], lwd = 0 , tick = F , labels = c("1Q","median","3Q"), cex.axis = 0.7, col.axis = "black" , padj = -2.8)

  9. Plot thick tickmarks crossing through the rug cloud at 1Q, median and 3Q
  10. axis(side = 1 , line = -0.8 , at = fivenum(ISIT$SampleDepth)[2:4], lwd = 0 , tick = T, tcl = 1.1 , lwd.ticks = 1 , col.ticks = "black", labels = FALSE)

  11. and finally short thick tickmarks under the text touching the x-axis
  12. axis(side = 1 , line = 0.3, at = fivenum(ISIT$SampleDepth)[2:4], lwd = 0 , tick = T, tcl = 0.2 , lwd.ticks = 1 , col.ticks = "black", labels = FALSE)

Here goes the complete code:
library(mgcv)
library(AED)
data(ISIT)
#
# Fit a univariate GAM model
model <- gam(Sources ~ s(SampleDepth) , data = ISIT)
fit <- predict(model, se = T)$fit
se <- predict(model, se = T)$se.fit
lcl <- fit - 1.96 * se
ucl <- fit + 1.96 * se
#
# open a jpeg
jpeg("FancyRugs.jpg" , width=400, height=400)
#
# set plotting options: 1 plot per page, horizontal labels and textsize
par(mfrow = c(1,1) , las = 1 , cex = 1)
#
# plot coordinatesystem and labels
plot(0 , bty = "n" , type = "n" , xlim = c(0,5000) , ylim = c(-10,50) , xlab = "Depth (m)" , ylab = expression(paste("Number of sources (" , m^-3 , ")")) , axes = FALSE)
#
title(main="Association between number of sources of\nbioluminescent organisms and ocean depth" , cex.main = 0.8)
#
## _____ X-AXIS ______
# x-axis values
axis(side = 1 , line = 0.3 , at = 0:5*1000 , tick = TRUE)
#
# rugs at datapoints
axis(side = 1 , line = -0.9 , at = jitter(ISIT$SampleDepth) , labels = F , tick = T , tcl = 0.8 , lwd.ticks = 0.1 , lwd = 0)
#
# labels at 1Q, median and 3Q
axis(side = 1 , line = -0.8 , at = fivenum(ISIT$SampleDepth)[2:4], lwd = 0 , tick = F , labels = c("1Q","median","3Q"), cex.axis = 0.7, col.axis = "black" , padj = -2.8)
#
# tick marks at 1Q, median and 3Q
axis(side = 1 , line = 0.3, at = fivenum(ISIT$SampleDepth)[2:4], lwd = 0 , tick = T, tcl = 0.2 , lwd.ticks = 1 , col.ticks = "black", labels = FALSE)
#
axis(side = 1 , line = -0.8 , at = fivenum(ISIT$SampleDepth)[2:4], lwd = 0 , tick = T, tcl = 1.1 , lwd.ticks = 1 , col.ticks = "black", labels = FALSE)
#
## _____ Y-AXIS ______
# y-axis values
axis(side = 2 , at = 0:5*10)
#
# rugs at datapoints
axis(side = 2 , line = -0.9 , at = jitter(ISIT$Sources) , labels = F , tick = T , tcl = 0.8 , lwd.ticks = 0.1 , lwd = 0)
#
# labels at 1Q, median and 3Q
axis(side = 2 , line = -0.7 , at = fivenum(ISIT$Sources)[2:4], lwd = 0 , tick = F , labels = c("1Q","median","3Q"), cex.axis = 0.7, col.axis = "black")
#
# thicker tick marks at 1Q, median and 3Q
axis(side = 2 , line = 0.3, at = fivenum(ISIT$Sources)[2:4], lwd = 0 , tick = T, tcl = 0.3 , lwd.ticks = 1 , col.ticks = "black", labels = FALSE , padj = -2)
axis(side = 2 , line = -0.7 , at = fivenum(ISIT$Sources)[2:4], lwd = 0 , tick = T, tcl = 1.1 , lwd.ticks = 1 , col.ticks = "black" , labels = FALSE)
#
# horizontal line marking the intercept = mean(Sources) (for univariate model only)
abline(h=mean(ISIT$Sources), lty=3)
#
# Scatterplot
lines(ISIT$SampleDepth , ISIT$Source , type = "p" , cex = 0.4 , lwd = 0.2 , col = "grey")
#
# plot main figure
lines(ISIT$SampleDepth[order(ISIT$SampleDepth)] , fit[order(ISIT$SampleDepth)] , col = "black" , lwd = 2)
#
# plot lower confidence limit (lcl)
lines(ISIT$SampleDepth[order(ISIT$SampleDepth)] , lcl[order(ISIT$SampleDepth)] , col = "grey" , lwd = 1)
#
# plot upper confidence limit (ucl)
lines(ISIT$SampleDepth[order(ISIT$SampleDepth)] , ucl[order(ISIT$SampleDepth)] , col = "grey" , lwd = 1)
#
# closing the jpg file
dev.off()





Correlation matrices

6 08 2009

Correlation matrices are a great way to visualize covariation in large datasets. Thanks to Sarkar (2008) and Friendly (2002), we can borrow these examples. First we need to load a few packages:

library(lattice)
library(ellipse)

We then go on to importing our data:

MyData <- read.table("/FILE/LOCATION/FILE.txt", header=TRUE,sep='\t',quote='')

Then on for some tedious work. We’ll have to manually type in the name (as it named i the dataset!) of each of the variables we’re interested in (unless off course you’ll want to analyze the whole shebang – which often isn’t desirable).

var <- c("var1","var2","var3")

We then proceed to make a correlation matrix with the selected variables vector “var”:

cor.MyData <- cor(MyData[,var],use="pair")

And here’s the money shot; setting the order of the variables in the matrix based on the result of a cluster analysis. This is for easier visual interpretation.

ord <- order.dendrogram(as.dendrogram(hclust(dist(cor.MyData))))

We then can make our Corrgrams. First an easy one:

print(levelplot(cor.MyData[ord,ord],xlab=NULL,ylab=NULL,
at=do.breaks(c(-1.01,1.01),101),scales=list(x=list(rot=90)),
colorkey=list(space="top"),
col.regions=colorRampPalette(c("red","white","blue"))))

Simple correlation matrix

Then on to the more elaborate ones, first a informative one using the ellipse package. We’ll have to write our own panel function here (again, thanks to Sarkar 2008), but this is completly generic, so cut ‘n’ paste should suffice.

panel.corrgram<-function(x,y,z,subscripts,at,
level=0.9,label=FALSE,...
{require('ellipse',quietly=TRUE)
x<-as.numeric(x)[subscripts]
y<-as.numeric(y)[subscripts]
z<-as.numeric(z)[subscripts]
zcol<-level.colors(z,at=at,...)
for(i in seq(along=z)){
ell<-ellipse(z[i],level=level,npoints=50,
scale=c(.2,.2),centre=c(x[i],y[i]))
panel.polygon(ell,col=zcol[i],border=zcol[i],...)
}
if (label)
panel.text(x=x,y=y,lab=100*round(z,2),
cex=0.8,col=ifelse(z<0,'white','black'))
}

To create a *.pdf of your output we’ll run following code (remember your working directory!):

pdf('corrgram_ellipse.pdf',10,10)
levelplot(cor.MyData[ord,ord],
at=do.breaks(c(-1.01,1.01),20),xlab=NULL,
ylab=NULL,colorkey=list(space='top'),
scales=list(x=list(rot=90)),
panel=panel.corrgram,label=TRUE)
dev.off()

Correlation matrix - ellipse

However, if you’re more in the pacman thing, this next one might be a better choice. We’ll start by writing a new panel function:

panel.corrgram.2<-function(x,y,z,
subscripts,at=pretty(z),scale=0.8,...)
{
require('grid',quietly=TRUE)
x<-as.numeric(x)[subscripts]
y<-as.numeric(y)[subscripts]
z<-as.numeric(z)[subscripts]
zcol<-level.colors(z,at=at,...)
for(i in seq(along=z))
{
lims<-range(0,z[i])
tval<-2*base::pi*
seq(from=lims[1],to=lims[2],by=0.01)
grid.polygon(x=x[i]+.5*scale*c(0,sin(tval)),
y=y[i]+.5*scale*c(0,cos(tval)),
default.units='native',
gp=gpar(fill=zcol[i]))
grid.circle(x=x[i],y=y[i],r=.5*scale,
default.units='native')
}
}

Which we’ll export:

pdf('corrgram_pacman.pdf',10,10)
levelplot(cor.MyData[ord,ord],
at=do.breaks(c(-1.01,1.01),101),xlab=NULL,
ylab=NULL,colorkey=list(space='top'),
scales=list(x=list(rot=90)),
panel=panel.corrgram.2,
col.regions=colorRampPalette(c('red','white','blue')))
dev.off()

Correlation matrix - pacman

Of course, changing the output size (and type) is easy. Changing pdf –> png (or the format of your choosing). Size were in these examples 10 by 10 inches.





How2plot nicer GAM curves

16 06 2009

Generalized additive models are an established tool to model correlations for nonlinear covariates without too much hazzle with the form of the assosiation between predictor and response. It is a great straightforward tool, which leaves most of the work to the computer.

The default plotting method produces clean plots for all covariates in the model (or a selection) but: They do not have presentation quality by any means! To achieve this one needs customization (colors, understandable axes-labels, scaling).

This example shows a customization variant for a additive regression model with one covariate. The goal was to display the absolute value of the response variable on the y-axis and not the “difference from intercept” which is default.

This is only meaningful for a single covariate in the model.

1 The default plot.gam() method

MyGAM1<- with(MyData[MyData$Strata==1,], gam(Y ~ s(Covariate)))
MyGAM0<- with(MyData[MyData$Strata==0,], gam(Y ~ s(Covariate)))

par( mfcol=c(1,2))
plot(MyGAM0)
plot(MyGAM1)

This is the resulting plot:

Stratified Additive Regression Model

Stratified Additive Regression Model

2 The fancy way

1. Extract the values of the model response from the GAM object:

response1 <- predict(MyGAM1, type="response", se.fit=T)
response0 <- predict(MyGAM0, type="response", se.fit=T)

2. Print the response values against the covariate (note: this works just with one covariate)

par(mfcol=c(1,1))

plot(0, type="n", bty="n", main="Fancy GAM plot", xlab="MyCovariate", ylab="MyResponse", lwd=3,ylim=c(0,60), xlim=c(0,200))
legend("bottomright", bty="n", lwd=5, col=c("green","red"), legend=c("Strata = 0", "Strata = 1"))

lines(sm.spline(MyGAM1$model$Covariate , response1$fit) , lwd = 3 , col = "red")
lines(sm.spline(MyGAM1$model$Covariate , response1$fit+1.96*response1$se) , lty = 3 , lwd = 2 , col = "red")
lines(sm.spline(MyGAM1$model$Covariate , response1$fit-1.96*response1$se) , lty = 3 , lwd = 2 , col = "red")

lines(sm.spline(MyGAM0$model$Covariate , response0$fit) , lwd = 3 , col = "green")
lines(sm.spline(MyGAM0$model$Covariate, response0$fit + 1.96 * response0$se) , lty = 3 , lwd = 2, col = "green")
lines(sm.spline(MyGAM0$model$Covariate, response0$fit - 1.96 * response0$se) , lty = 3 , lwd = 2 , col = "green")

abline(h=gam.dm1$coefficients[1], lty=2, lwd=1, col="red")
abline(h=gam.dm0$coefficients[1], lty=2, lwd=1, col="green")

Stratified Additive Regression Model on Response Scale

Stratified Additive Regression Model on Response Scale





Additive COX-regression

3 06 2009

Therneau et al. refer to the proportional hazards model or COX-regression model as “the workhorse of regression analysis for censored data”. They show how to implement the additive form of this model in SAS and S-pluss; already mentioned by Hastie and Tibshirany in 1986 when introducing Generalized Additive Models (GAM).

I found modelling the functional form of the covariates in a regression model for rightcensored survival times with smoothing splines extremely useful. And the implementation is absolutely straightforward in R.

The only thing needed is the installation of the R-libraries “survival” and “pspline”:

install.packages("pspline")
and
install.packages("survival")

In the following code I will refer to a dataset “MyData” with a binary status variable “death” and a time-to-event variable “days2death”.
The status variable “death” should be (not necessarily) 1 if the event of interesst occured to the subject and “days2death” gives then the time to this event.

Viualizing the functional form of a covariate takes the following steps:

  1. create the survival object of interesst
  2. fit a proportional hazards model with smoothing splines,
  3. predict the functional form of the covariate of interesst and
  4. plot it!

Note that there is the termplot() function in R which gives you the GAM plots after the modelfit, so step 3 would not be necessary – BUT: it has a bug and fails plotting a single covariate; and it does not allow all to much customizing.

This is the R code to achieve the analysis:

1 Create survival object:

surv.death <- Surv(MyData$days2death, MyData$death)

2 Fit proportional hazards model with smoothing splines for continuous covariates:

library(survival)
library(pspline)
pham.fit <- coxph( surv.death ~ pspline(EF, df=4) + pspline(Age, df=4) + strata (Sex, df=4) , data = MyData)

The model above includes the continuous covariates “EF” (ejection fraction) and “Age” and stratifies for “Sex”.

3 Produce the fitted smoothing spline for the first covariate in the above model formula with standard errors

predicted <- predict(pham.fit , type = "terms" , se.fit = TRUE , terms = 1)
“terms=1″ refers to “pspline(EF,df=4)”

4 Plot it

First plotting axes and labels
plot(0 , xlab="Ejection Fraction" , ylab = "Hazard Ratio" , main = "All-cause Death" , type = "n" , xlim=c(0,100) , ylim=c(0,3))
the range of values on the x-axis (“xlim=c(0,100)”) is chosen manually for this specific covariate; of course it is possible to use something like ylim = c( 0 , max(MyData$EF) ).

Now plot the fitted smoothing spline using the lines() function:
lines( sm.spline(MyData$EF , exp(predicted$fit)) , col = "red" , lwd = 0.8)
Note that the term prediction gives log-hazard-ratios; therefore exp(predicted$fit) is plotted against the values of the covariate. The sm.spline() function is necessary since the points of the plot appear in random order and density, according to the underlying dataset; a plain lines() function would produce just a chaotic pattern. Alternative:
plot(MyData$EF , exp(predicted$fit) , col = "red" , cex = 0.2)
produces a scattered plot that reflects the distribution of the underlying data – I do prefer adding a rug-plot on the bottom of the graph to illustrate this (see under).

… upper and lower confidence limits with dashed thinner lines

lines(sm.spline(MyData$EF , exp(predicted$fit + 1.96 * predicted$se)) , col = "orange" , lty = 2 , lwd = 0.4)
and
lines(sm.spline(MyData$EF , exp(predicted$fit - 1.96 * predicted$se)) , col = "orange" , lty = 2 , lwd = 0.4)

… a tiny horizontal line at hazard level 1, do see where the confidence limits cross:
abline( h = 1 , col = "lightgrey" , lty = 2 , lwd = 0.4)

… tiny tickmarks on the x-axes to reflect the distribution of the underlying data:
axis( side = 1 , at = MyData$EF, labels = F , tick = T , tcl = 0.4 , lwd.ticks = 0.1)

… and some fancy red tickmarks to mark minimum, lower hinge, median, upper hinge and maximum of the covariate in the dataset:
axis( side = 1 , at = fivenum(MyData$EF), labels = F , tick = T , tcl = -0.2 , lwd.ticks = 1 , col.ticks = "red")

Fancy customized smoothing spline fitted to the functional form of a covariate in a additive proportional hazard model

Fancy customized smoothing spline fitted to the functional form of a covariate in a additive proportional hazard model

Thats it!

4b) The easy way (works ONLY with MORE then 1 continous covariate) – predicting the terms can be omitted:

termplot(pham.fit, se=T, rug=T)

Resulting in …

The default termplot method for fitted smoothing splines

The default termplot method for fitted smoothing splines