Scientist. Husband. Daddy. --- TOLLE. LEGE

외부자료의 인용에 있어 대한민국 저작권법(28조)과 U.S. Copyright Act (17 USC. §107)에 정의된 "저작권물의 공정한 이용원칙 | the U.S. fair use doctrine" 을 따릅니다. 저작권(© 최광민)이 명시된 모든 글과 번역문들에 대해 (1) 복제-배포, (2) 임의수정 및 자의적 본문 발췌, (3) 무단배포를 위한 화면캡처를 금하며, (4) 인용 시 URL 주소 만을 사용할 수 있습니다. [후원 | 운영] [대문으로] [방명록] [티스토리 (백업)] [신시내티]

R: contour plots for matrix data

Labels: Informatics
Email This BlogThis!Share to X Share to Facebook

Contour Plots of Matrix Data

[c] This page shows how to use R to draw a table or matrix of numericalvalues as a contour plot with an overlayed grid, like the image below,and using level-plots as an alternative.
[Protein-Protein Interaction Potential, Lu et al. 2003 Figure 2]

[Protein-Protein Interaction Potential, Lu et al. 2003 Figure 2]

This plot is Figure 2 from Lu et al 2003, and shows their residue based "Protein-Protein Interaction Potential" between each of the twenty amino acids.
Contour plots are usually used for continuous variables (e.g.latitude and longitude) rather than categorical data (like amino acidtype) as in this case.

The Data

The data shown in this figure is available as a simple table, MULTIPOT_lu.txt

. If you download this file, you can load it into R like so:

potentials <- as.matrix(read.table("MULTIPOT_lu.txt", row.names=1, header=TRUE))

What does this data look like? Well its a 20 by 20 numeric matrix:

> dim(potentials)
[1] 20 20
> rownames(potentials)
 [1] "GLY" "ALA" "SER" "CYS" "VAL" "THR" "ILE" "PRO" "MET" "ASP"
[11] "ASN" "LEU" "LYS" "GLU" "GLN" "ARG" "HIS" "PHE" "TYR" "TRP"
> colnames(potentials)
 [1] "GLY" "ALA" "SER" "CYS" "VAL" "THR" "ILE" "PRO" "MET" "ASP"
[11] "ASN" "LEU" "LYS" "GLU" "GLN" "ARG" "HIS" "PHE" "TYR" "TRP"
> min(potentials)
[1] -4.4
> max(potentials)
[1] 1.9
> potentials["CYS","CYS"]
[1] -4.4

Simple Contour Plot

Now lets use the filled.contour function from the gplots package to display this:

library(gplots) filled.contour(potentials, main="Protein-Protein Interaction Potential")

Well, I don't care for the default pink and blue colour scheme, but more importantly, notice the axes labels run from 0.0 to 1.0, rather than using the row and column names (i.e. three letter amino acid codes). Trying filled.contour(potentials, axes=FALSE, frame.plot=TRUE) will get rid of the axes labels - but you also loose the key axes.

Labeling the Axes

There may be a more concise method, but I wrote my own axes drawing function (which calls axis() twice), and supplied this to the filled.contour() function using the plot.axes argument:

matrix.axes <- function(data) { # Do the rows, las=2 for text perpendicular to the axis x <- (1:dim(data)[1] - 1) / (dim(data)[1] - 1); axis(side=1, at=x, labels=rownames(data), las=2); # Do the columns x <- (1:dim(data)[2] - 1) / (dim(data)[2] - 1); axis(side=2, at=x, labels=colnames(data), las=2); } filled.contour(potentials, plot.axes=matrix.axes(potentials), main="Protein-Protein Interaction Potential")

Much nicer. This brings out another issue - to match the published figure, we need to reorder our matrix...

Permuting the Matrix

I re-ordered the matrix like so:

new.order <- c("ILE", "VAL", "LEU", "PHE","CYS", "MET", "ALA", "GLY", "THR", "SER", "TRP", "TYR", "PRO", "HIS","ASN", "GLN", "ASP", "GLU", "LYS", "ARG") pots2 <- potentials[new.order, new.order] filled.contour(pots2, plot.axes=matrix.axes(pots2), main="Protein-Protein Interaction Potential")

Switching the Colours

We can use the gplots package's function colorpanel()to generate a suitable grey-scale colour scheme to mimic the publishedfigure. I have used seven levels, and choosen to range from white to grey10 (a dark grey):

> library(gplots) > colorpanel(7, "white", "grey10") [1] "#FFFFFF" "#D9D9D9" "#B3B3B3" "#8D8D8D" "#666666" "#404040" "#1A1A1A"

As you can see, that gives seven colours (as hexidemical RGB strings). We can pass that to the filled.contour function like this:

filled.contour(pots2,plot.axes=matrix.axes(pots2), col=colorpanel(7, "white", "grey10"),nlevels=7, main="Protein-Protein Interaction Potential")

Adding Grid Lines

Apart from the key layout, the only remaining big difference to thepublished figure is we don't have the grid lines in our plot... let'shave a go at fixing this with the grid() command:

matrix.axes <- function(data) { # Do the rows, las=2 for text perpendicular to the axis x <- (1:dim(data)[1] - 1) / (dim(data)[1] - 1); axis(side=1, at=x, labels=rownames(data), las=2); # Do the columns x <- (1:dim(data)[2] - 1) / (dim(data)[2] - 1); axis(side=2, at=x, labels=colnames(data), las=2); # Add a solid black grid grid(nx=(dim(data)[1]-1), ny=(dim(data)[2]-1), col="black", lty="solid"); } filled.contour(pots2, plot.axes=matrix.axes(pots2), col=colorpanel(7,"white", "grey10"), nlevels=7, main="Protein-Protein InteractionPotential")

That's now looking pretty close, but if you look carefully thepublished figure seems to determine its contour lines slightlydifferently:
[Protein-Protein Interaction Potential, Lu et al. 2003 Figure 2]

Discussion

Looking at their Figure 2 does beg the question, why did Lu et al. choose to display the amino acids in this order? Quoting from their paper (see references):

The 20×20 residue-based potential constructedfrom DIMER- 1 is plotted in Fig. 2. Hydrophobic residues are attractiveto each other and hydrophilic residues are repulsive. For example, Leuhas an attractive potential with all residues except Lys, Glu, and Asp.Besides the Cys-Cys pair that may form a disulfide bond, the mostattractive pairs are between hydrophobic residues pairs such as Leu,Ilu, and Phe. The most repulsive pairs are Asp-Asp, Lys-Lys, Glu- Glu,Glu-Gly, and Asp-Gly. This pattern is similar to what has been observedin the statistical potential obtained from monomers.

Given this, a sorting based on hydrophobicity would seem sensible. According to Shaun Black,the trend for physiological L-alpha-Amino Acids is: Phe > Leu = Ile> Tyr = Trp > Val > Met > Pro > Cys > Ala > Gly> Thr > Ser > Lys > Gln > Asn > His > Glu > Asp> Arg. Thus:

hydro.order <- c("PHE", "LEU", "ILE", "TYR","TRP", "VAL", "MET", "PRO", "CYS", "ALA", "GLY", "THR", "SER", "LYS","GLN", "ASN", "HIS", "GLU", "ASP", "ARG") pots3 <- potentials[hydro.order, hydro.order] filled.contour(pots3, plot.axes=matrix.axes(pots3), col=colorpanel(7,"white", "grey10"), nlevels=7, main="Protein-Protein InteractionPotential")

This does show that hydrophobicity seems to account for much of thepotential, with Cys and His perhaps as special cases. Tweaking thisorder by hand its possible to get a much more eye-pleasing graph.
For the sake of argument, what ordering does the data itself suggest? We could try this using a heatmap, see this page for details:

heatmap.2(pots2, scale="none", breaks=-5:2,col=colorpanel(7, "white", "grey10"), key=TRUE, symkey=FALSE,density.info="none", trace="none")

Level Plots

The above heatmap image brings up another point - are contour plotsreally suitable for a categorical matrix? What about a "level plot"instead? Internally the heatmap functions use the image() function. As before, getting the axis labelled with the row/column names takes a little work...

library(gplots) # for colorpanel() nr <- dim(pots2)[1] nc <- dim(pots2)[2] #Request a square plot area: par(pty="s") #Plot an image using the row/column numbers as the x/y variables: image(x=1:nr, y=1:nc, z=pots2, axes=FALSE, frame.plot=TRUE,breaks=-5:2, col=colorpanel(7, "white", "grey10"),main="Protein-Protein Interaction Potential", xlab=NA, ylab=NA) #Add axis with the amino acids names, use las=2 for perpendicular text: axis(1, 1:nr, labels=rownames(pots2), las=2) axis(2, 1:nc, labels=colnames(pots2), las=2) #Get a nice border round the image: box() #Add a solid black grid: grid(nx=nr, ny=nc, col="black", lty="solid")

As an alternative, we can use the levelplot() function in the lattice package. I have tweaked the tick marks (to turn them off) and rotated the x-axis labels:

library(gplots) # for colorpanel() library(lattice) # for levelplot() levelplot(pots2, scales=list(tck=0, x=list(rot=90)),col.regions=colorpanel(7, "white", "grey10"),at=c(-5,-4,-3,-2,-1,0,1,2), main="Protein-Protein InteractionPotential", xlab=NULL, ylab=NULL)

One subtle difference between these figures (i.e. image() versus levelplot()) is the treatment of values exactly on the thresholds (e.g. PHE-PHE or VAL-VAL).
P.S. If you use levelplot() inside your own functions, you have to use print(levelplot(...)) to actually produce the graph. See this R-FAQ entry.

References

Lu et al. (2003) Development of Unified Statistical Potentials Describing Protein-Protein Interactions, Biophysical Journal 84(3), p1895-1901 (PMID: 12609891, journal link, supplementary data)

The R Project

Page contact: Peter Cock Last revised: Wed 25 Jul 2007

Labels: Informatics

Scientist. Husband. Daddy. --- TOLLE. LEGE

GoogleSearch

블로그 내부검색

R: contour plots for matrix data

Labels: Informatics
Email This BlogThis!Share to X Share to Facebook

Contour Plots of Matrix Data

The Data

Simple Contour Plot

Labeling the Axes

Permuting the Matrix

Switching the Colours

Adding Grid Lines

Discussion

Level Plots

References

GoogleSearch

블로그 내부검색

R: contour plots for matrix data Labels: Informatics Email ThisBlogThis!Share to XShare to Facebook

Contour Plots of Matrix Data

The Data

Simple Contour Plot

Labeling the Axes

Permuting the Matrix

Switching the Colours

Adding Grid Lines

Discussion

Level Plots

References

R: contour plots for matrix data

Labels: Informatics
Email This BlogThis!Share to X Share to Facebook