-->

GoogleSearch



Scientist. Husband. Daddy. --- TOLLE. LEGE
외부자료의 인용에 있어 대한민국 저작권법(28조)과 U.S. Copyright Act (17 USC. §107)에 정의된 "저작권물의 공정한 이용원칙 | the U.S. fair use doctrine" 을 따릅니다. 저작권(© 최광민)이 명시된 모든 글과 번역문들에 대해 (1) 복제-배포, (2) 임의수정 및 자의적 본문 발췌, (3) 무단배포를 위한 화면캡처를 금하며, (4) 인용 시 URL 주소 만을 사용할 수 있습니다. [후원 | 운영] [대문으로] [방명록] [옛 방명록] [티스토리 (백업)]

이 블로그 검색

R: contour plots for matrix data

라벨:


Contour Plots of Matrix Data

[c] This page shows how to use R to draw a table or matrix of numericalvalues as a contour plot with an overlayed grid, like the image below,and using level-plots as an alternative.
[Protein-Protein Interaction Potential, Lu et al. 2003 Figure 2]
This plot is Figure 2 from Lu et al 2003, and shows their residue based "Protein-Protein Interaction Potential" between each of the twenty amino acids.
Contour plots are usually used for continuous variables (e.g.latitude and longitude) rather than categorical data (like amino acidtype) as in this case.

The Data

The data shown in this figure is available as a simple table, MULTIPOT_lu.txt [TXT]. If you download this file, you can load it into R like so:
potentials <- as.matrix(read.table("MULTIPOT_lu.txt", row.names=1, header=TRUE))
What does this data look like? Well its a 20 by 20 numeric matrix:
> dim(potentials)
[1] 20 20
> rownames(potentials)
 [1] "GLY" "ALA" "SER" "CYS" "VAL" "THR" "ILE" "PRO" "MET" "ASP"
[11] "ASN" "LEU" "LYS" "GLU" "GLN" "ARG" "HIS" "PHE" "TYR" "TRP"
> colnames(potentials)
 [1] "GLY" "ALA" "SER" "CYS" "VAL" "THR" "ILE" "PRO" "MET" "ASP"
[11] "ASN" "LEU" "LYS" "GLU" "GLN" "ARG" "HIS" "PHE" "TYR" "TRP"
> min(potentials)
[1] -4.4
> max(potentials)
[1] 1.9
> potentials["CYS","CYS"]
[1] -4.4

Simple Contour Plot

Now lets use the filled.contour function from the gplots package to display this:
library(gplots)
filled.contour(potentials, main="Protein-Protein Interaction Potential")
[Filled contour plot]
Well, I don't care for the default pink and blue colour scheme, but more importantly, notice the axes labels run from 0.0 to 1.0, rather than using the row and column names (i.e. three letter amino acid codes). Trying filled.contour(potentials, axes=FALSE, frame.plot=TRUE) will get rid of the axes labels - but you also loose the key axes.

Labeling the Axes

There may be a more concise method, but I wrote my own axes drawing function (which calls axis() twice), and supplied this to the filled.contour() function using the plot.axes argument:
matrix.axes <- function(data) {
    # Do the rows, las=2 for text perpendicular to the axis
    x <- (1:dim(data)[1] - 1) / (dim(data)[1] - 1);
    axis(side=1, at=x, labels=rownames(data), las=2);
    # Do the columns
    x <- (1:dim(data)[2] - 1) / (dim(data)[2] - 1);
    axis(side=2, at=x, labels=colnames(data), las=2);
}
filled.contour(potentials, plot.axes=matrix.axes(potentials), main="Protein-Protein Interaction Potential")
[Filled contour plot]
Much nicer. This brings out another issue - to match the published figure, we need to reorder our matrix...

Permuting the Matrix

I re-ordered the matrix like so:
new.order <- c("ILE", "VAL", "LEU", "PHE","CYS", "MET", "ALA", "GLY", "THR", "SER", "TRP", "TYR", "PRO", "HIS","ASN", "GLN", "ASP", "GLU", "LYS", "ARG")
pots2 <- potentials[new.order, new.order]
filled.contour(pots2, plot.axes=matrix.axes(pots2), main="Protein-Protein Interaction Potential")
[Filled contour plot]

Switching the Colours

We can use the gplots package's function colorpanel()to generate a suitable grey-scale colour scheme to mimic the publishedfigure. I have used seven levels, and choosen to range from white to grey10 (a dark grey):
> library(gplots)
> colorpanel(7, "white", "grey10")
[1] "#FFFFFF" "#D9D9D9" "#B3B3B3" "#8D8D8D" "#666666" "#404040" "#1A1A1A"

As you can see, that gives seven colours (as hexidemical RGB strings). We can pass that to the filled.contour function like this:
filled.contour(pots2,plot.axes=matrix.axes(pots2), col=colorpanel(7, "white", "grey10"),nlevels=7, main="Protein-Protein Interaction Potential")
[Greyscale filled contour plot, drawn using R from the R-Project]

Adding Grid Lines

Apart from the key layout, the only remaining big difference to thepublished figure is we don't have the grid lines in our plot... let'shave a go at fixing this with the grid() command:
matrix.axes <- function(data) {
    # Do the rows, las=2 for text perpendicular to the axis
    x <- (1:dim(data)[1] - 1) / (dim(data)[1] - 1);
    axis(side=1, at=x, labels=rownames(data), las=2);
    # Do the columns
    x <- (1:dim(data)[2] - 1) / (dim(data)[2] - 1);
    axis(side=2, at=x, labels=colnames(data), las=2);
    # Add a solid black grid
    grid(nx=(dim(data)[1]-1), ny=(dim(data)[2]-1), col="black", lty="solid");
}
filled.contour(pots2, plot.axes=matrix.axes(pots2), col=colorpanel(7,"white", "grey10"), nlevels=7, main="Protein-Protein InteractionPotential")
[Filled contour plot with grid]
That's now looking pretty close, but if you look carefully thepublished figure seems to determine its contour lines slightlydifferently:
[Protein-Protein Interaction Potential, Lu et al. 2003 Figure 2]

Discussion

Looking at their Figure 2 does beg the question, why did Lu et al. choose to display the amino acids in this order? Quoting from their paper (see references):
The 20×20 residue-based potential constructedfrom DIMER- 1 is plotted in Fig. 2. Hydrophobic residues are attractiveto each other and hydrophilic residues are repulsive. For example, Leuhas an attractive potential with all residues except Lys, Glu, and Asp.Besides the Cys-Cys pair that may form a disulfide bond, the mostattractive pairs are between hydrophobic residues pairs such as Leu,Ilu, and Phe. The most repulsive pairs are Asp-Asp, Lys-Lys, Glu- Glu,Glu-Gly, and Asp-Gly. This pattern is similar to what has been observedin the statistical potential obtained from monomers.
Given this, a sorting based on hydrophobicity would seem sensible. According to Shaun Black,the trend for physiological L-alpha-Amino Acids is: Phe > Leu = Ile> Tyr = Trp > Val > Met > Pro > Cys > Ala > Gly> Thr > Ser > Lys > Gln > Asn > His > Glu > Asp> Arg. Thus:
hydro.order <- c("PHE", "LEU", "ILE", "TYR","TRP", "VAL", "MET", "PRO", "CYS", "ALA", "GLY", "THR", "SER", "LYS","GLN", "ASN", "HIS", "GLU", "ASP", "ARG")
pots3 <- potentials[hydro.order, hydro.order]
filled.contour(pots3, plot.axes=matrix.axes(pots3), col=colorpanel(7,"white", "grey10"), nlevels=7, main="Protein-Protein InteractionPotential")
[Filled contour plot with grid]
This does show that hydrophobicity seems to account for much of thepotential, with Cys and His perhaps as special cases. Tweaking thisorder by hand its possible to get a much more eye-pleasing graph.
For the sake of argument, what ordering does the data itself suggest? We could try this using a heatmap, see this page for details:
heatmap.2(pots2, scale="none", breaks=-5:2,col=colorpanel(7, "white", "grey10"), key=TRUE, symkey=FALSE,density.info="none", trace="none")
[Protein-protein Interaction Potential as a Heatmap]

Level Plots

The above heatmap image brings up another point - are contour plotsreally suitable for a categorical matrix? What about a "level plot"instead? Internally the heatmap functions use the image() function. As before, getting the axis labelled with the row/column names takes a little work...
library(gplots) # for colorpanel()
nr <- dim(pots2)[1]
nc <- dim(pots2)[2]
#Request a square plot area:
par(pty="s")
#Plot an image using the row/column numbers as the x/y variables:
image(x=1:nr, y=1:nc, z=pots2, axes=FALSE, frame.plot=TRUE,breaks=-5:2, col=colorpanel(7, "white", "grey10"),main="Protein-Protein Interaction Potential", xlab=NA, ylab=NA)
#Add axis with the amino acids names, use las=2 for perpendicular text:
axis(1, 1:nr, labels=rownames(pots2), las=2)
axis(2, 1:nc, labels=colnames(pots2), las=2)
#Get a nice border round the image:
box()
[Protein-protein Interaction Potential using the image and axis commands in R]
#Add a solid black grid:
grid(nx=nr, ny=nc, col="black", lty="solid")
[Protein-protein Interaction Potential using the image, axis and grid commands in R]
As an alternative, we can use the levelplot() function in the lattice package. I have tweaked the tick marks (to turn them off) and rotated the x-axis labels:
library(gplots) # for colorpanel()
library(lattice) # for levelplot()
levelplot(pots2, scales=list(tck=0, x=list(rot=90)),col.regions=colorpanel(7, "white", "grey10"),at=c(-5,-4,-3,-2,-1,0,1,2), main="Protein-Protein InteractionPotential", xlab=NULL, ylab=NULL)
[Protein-protein Interaction Potential using the levelplot  command in R]
One subtle difference between these figures (i.e. image() versus levelplot()) is the treatment of values exactly on the thresholds (e.g. PHE-PHE or VAL-VAL).
P.S. If you use levelplot() inside your own functions, you have to use print(levelplot(...)) to actually produce the graph. See this R-FAQ entry.

References

Lu et al. (2003) Development of Unified Statistical Potentials Describing Protein-Protein Interactions, Biophysical Journal 84(3), p1895-1901 (PMID: 12609891, journal link, supplementary data)
Page contact: Peter Cock Last revised: Wed 25 Jul 2007
Back to top of page





라벨:





Scientist. Husband. Daddy. --- TOLLE. LEGE
외부자료의 인용에 있어 대한민국 저작권법(28조)과 U.S. Copyright Act (17 USC. §107)에 정의된 "저작권물의 공정한 이용원칙 | the U.S. fair use doctrine" 을 따릅니다. 저작권(© 최광민)이 명시된 모든 글과 번역문들에 대해 (1) 복제-배포, (2) 임의수정 및 자의적 본문 발췌, (3) 무단배포를 위한 화면캡처를 금하며, (4) 인용 시 URL 주소 만을 사용할 수 있습니다. [후원 | 운영] [대문으로] [방명록] [옛 방명록] [티스토리 (백업)] [신시내티]

-