GoogleSearch
이 블로그 검색
R: contour plots for matrix data
라벨:
Informatics
이메일로 전송BlogThis!Twitter에서 공유Facebook에서 공유
Contour Plots of Matrix Data
[c] This page shows how to use R to draw a table or matrix of numericalvalues as a contour plot with an overlayed grid, like the image below,and using level-plots as an alternative.
This plot is Figure 2 from Lu et al 2003, and shows their residue based "Protein-Protein Interaction Potential" between each of the twenty amino acids.
Contour plots are usually used for continuous variables (e.g.latitude and longitude) rather than categorical data (like amino acidtype) as in this case.
The Data
The data shown in this figure is available as a simple table, MULTIPOT_lu.txt . If you download this file, you can load it into R like so:
potentials <- as.matrix(read.table("MULTIPOT_lu.txt", row.names=1, header=TRUE))What does this data look like? Well its a 20 by 20 numeric matrix:
> dim(potentials)
[1] 20 20
> rownames(potentials)
[1] "GLY" "ALA" "SER" "CYS" "VAL" "THR" "ILE" "PRO" "MET" "ASP"
[11] "ASN" "LEU" "LYS" "GLU" "GLN" "ARG" "HIS" "PHE" "TYR" "TRP"
> colnames(potentials)
[1] "GLY" "ALA" "SER" "CYS" "VAL" "THR" "ILE" "PRO" "MET" "ASP"
[11] "ASN" "LEU" "LYS" "GLU" "GLN" "ARG" "HIS" "PHE" "TYR" "TRP"
> min(potentials)
[1] -4.4
> max(potentials)
[1] 1.9
> potentials["CYS","CYS"]
[1] -4.4
Simple Contour Plot
Now lets use the filled.contour function from the gplots package to display this:
library(gplots)
filled.contour(potentials, main="Protein-Protein Interaction Potential")
Well, I don't care for the default pink and blue colour scheme, but more importantly, notice the axes labels run from 0.0 to 1.0, rather than using the row and column names (i.e. three letter amino acid codes). Trying filled.contour(potentials, axes=FALSE, frame.plot=TRUE) will get rid of the axes labels - but you also loose the key axes.
Labeling the Axes
There may be a more concise method, but I wrote my own axes drawing function (which calls axis() twice), and supplied this to the filled.contour() function using the plot.axes argument:
matrix.axes <- function(data) {
# Do the rows, las=2 for text perpendicular to the axis
x <- (1:dim(data)[1] - 1) / (dim(data)[1] - 1);
axis(side=1, at=x, labels=rownames(data), las=2);
# Do the columns
x <- (1:dim(data)[2] - 1) / (dim(data)[2] - 1);
axis(side=2, at=x, labels=colnames(data), las=2);
}
filled.contour(potentials, plot.axes=matrix.axes(potentials), main="Protein-Protein Interaction Potential")
Much nicer. This brings out another issue - to match the published figure, we need to reorder our matrix...
Permuting the Matrix
I re-ordered the matrix like so:
new.order <- c("ILE", "VAL", "LEU", "PHE","CYS", "MET", "ALA", "GLY", "THR", "SER", "TRP", "TYR", "PRO", "HIS","ASN", "GLN", "ASP", "GLU", "LYS", "ARG")
pots2 <- potentials[new.order, new.order]
filled.contour(pots2, plot.axes=matrix.axes(pots2), main="Protein-Protein Interaction Potential")
Switching the Colours
We can use the gplots package's function colorpanel()to generate a suitable grey-scale colour scheme to mimic the publishedfigure. I have used seven levels, and choosen to range from white to grey10 (a dark grey):
> library(gplots)
> colorpanel(7, "white", "grey10")
[1] "#FFFFFF" "#D9D9D9" "#B3B3B3" "#8D8D8D" "#666666" "#404040" "#1A1A1A"
As you can see, that gives seven colours (as hexidemical RGB strings). We can pass that to the filled.contour function like this:
filled.contour(pots2,plot.axes=matrix.axes(pots2), col=colorpanel(7, "white", "grey10"),nlevels=7, main="Protein-Protein Interaction Potential")
Adding Grid Lines
Apart from the key layout, the only remaining big difference to thepublished figure is we don't have the grid lines in our plot... let'shave a go at fixing this with the grid() command:
matrix.axes <- function(data) {
# Do the rows, las=2 for text perpendicular to the axis
x <- (1:dim(data)[1] - 1) / (dim(data)[1] - 1);
axis(side=1, at=x, labels=rownames(data), las=2);
# Do the columns
x <- (1:dim(data)[2] - 1) / (dim(data)[2] - 1);
axis(side=2, at=x, labels=colnames(data), las=2);
# Add a solid black grid
grid(nx=(dim(data)[1]-1), ny=(dim(data)[2]-1), col="black", lty="solid");
}
filled.contour(pots2, plot.axes=matrix.axes(pots2), col=colorpanel(7,"white", "grey10"), nlevels=7, main="Protein-Protein InteractionPotential")
That's now looking pretty close, but if you look carefully thepublished figure seems to determine its contour lines slightlydifferently:
Discussion
Looking at their Figure 2 does beg the question, why did Lu et al. choose to display the amino acids in this order? Quoting from their paper (see references):
The 20×20 residue-based potential constructedfrom DIMER- 1 is plotted in Fig. 2. Hydrophobic residues are attractiveto each other and hydrophilic residues are repulsive. For example, Leuhas an attractive potential with all residues except Lys, Glu, and Asp.Besides the Cys-Cys pair that may form a disulfide bond, the mostattractive pairs are between hydrophobic residues pairs such as Leu,Ilu, and Phe. The most repulsive pairs are Asp-Asp, Lys-Lys, Glu- Glu,Glu-Gly, and Asp-Gly. This pattern is similar to what has been observedin the statistical potential obtained from monomers.Given this, a sorting based on hydrophobicity would seem sensible. According to Shaun Black,the trend for physiological L-alpha-Amino Acids is: Phe > Leu = Ile> Tyr = Trp > Val > Met > Pro > Cys > Ala > Gly> Thr > Ser > Lys > Gln > Asn > His > Glu > Asp> Arg. Thus:
hydro.order <- c("PHE", "LEU", "ILE", "TYR","TRP", "VAL", "MET", "PRO", "CYS", "ALA", "GLY", "THR", "SER", "LYS","GLN", "ASN", "HIS", "GLU", "ASP", "ARG")
pots3 <- potentials[hydro.order, hydro.order]
filled.contour(pots3, plot.axes=matrix.axes(pots3), col=colorpanel(7,"white", "grey10"), nlevels=7, main="Protein-Protein InteractionPotential")
This does show that hydrophobicity seems to account for much of thepotential, with Cys and His perhaps as special cases. Tweaking thisorder by hand its possible to get a much more eye-pleasing graph.
For the sake of argument, what ordering does the data itself suggest? We could try this using a heatmap, see this page for details:
heatmap.2(pots2, scale="none", breaks=-5:2,col=colorpanel(7, "white", "grey10"), key=TRUE, symkey=FALSE,density.info="none", trace="none")
Level Plots
The above heatmap image brings up another point - are contour plotsreally suitable for a categorical matrix? What about a "level plot"instead? Internally the heatmap functions use the image() function. As before, getting the axis labelled with the row/column names takes a little work...
library(gplots) # for colorpanel()
nr <- dim(pots2)[1]
nc <- dim(pots2)[2]
#Request a square plot area:
par(pty="s")
#Plot an image using the row/column numbers as the x/y variables:
image(x=1:nr, y=1:nc, z=pots2, axes=FALSE, frame.plot=TRUE,breaks=-5:2, col=colorpanel(7, "white", "grey10"),main="Protein-Protein Interaction Potential", xlab=NA, ylab=NA)
#Add axis with the amino acids names, use las=2 for perpendicular text:
axis(1, 1:nr, labels=rownames(pots2), las=2)
axis(2, 1:nc, labels=colnames(pots2), las=2)
#Get a nice border round the image:
box()
#Add a solid black grid:
grid(nx=nr, ny=nc, col="black", lty="solid")
As an alternative, we can use the levelplot() function in the lattice package. I have tweaked the tick marks (to turn them off) and rotated the x-axis labels:
library(gplots) # for colorpanel()
library(lattice) # for levelplot()
levelplot(pots2, scales=list(tck=0, x=list(rot=90)),col.regions=colorpanel(7, "white", "grey10"),at=c(-5,-4,-3,-2,-1,0,1,2), main="Protein-Protein InteractionPotential", xlab=NULL, ylab=NULL)
One subtle difference between these figures (i.e. image() versus levelplot()) is the treatment of values exactly on the thresholds (e.g. PHE-PHE or VAL-VAL).
P.S. If you use levelplot() inside your own functions, you have to use print(levelplot(...)) to actually produce the graph. See this R-FAQ entry.
References
Lu et al. (2003) Development of Unified Statistical Potentials Describing Protein-Protein Interactions, Biophysical Journal 84(3), p1895-1901 (PMID: 12609891, journal link, supplementary data)
Page contact: Peter Cock Last revised: Wed 25 Jul 2007Back to top of page
이메일로 전송BlogThis!Twitter에서 공유Facebook에서 공유
라벨:
Informatics
[c] This page shows how to use R to draw a table or matrix of numericalvalues as a contour plot with an overlayed grid, like the image below,and using level-plots as an alternative. This plot is Figure 2 from Lu et al 2003, and shows their residue based "Protein-Protein Interaction Potential" between each of the twenty amino acids. Contour plots are usually used for continuous variables (e.g.latitude and longitude) rather than categorical data (like amino acidtype) as in this case. The DataThe data shown in this figure is available as a simple table, MULTIPOT_lu.txt . If you download this file, you can load it into R like so:potentials <- as.matrix(read.table("MULTIPOT_lu.txt", row.names=1, header=TRUE)) What does this data look like? Well its a 20 by 20 numeric matrix:> dim(potentials) [1] 20 20 > rownames(potentials) [1] "GLY" "ALA" "SER" "CYS" "VAL" "THR" "ILE" "PRO" "MET" "ASP" [11] "ASN" "LEU" "LYS" "GLU" "GLN" "ARG" "HIS" "PHE" "TYR" "TRP" > colnames(potentials) [1] "GLY" "ALA" "SER" "CYS" "VAL" "THR" "ILE" "PRO" "MET" "ASP" [11] "ASN" "LEU" "LYS" "GLU" "GLN" "ARG" "HIS" "PHE" "TYR" "TRP" > min(potentials) [1] -4.4 > max(potentials) [1] 1.9 > potentials["CYS","CYS"] [1] -4.4 Simple Contour PlotNow lets use the filled.contour function from the gplots package to display this:library(gplots) Well, I don't care for the default pink and blue colour scheme, but more importantly, notice the axes labels run from 0.0 to 1.0, rather than using the row and column names (i.e. three letter amino acid codes). Trying filled.contour(potentials, axes=FALSE, frame.plot=TRUE) will get rid of the axes labels - but you also loose the key axes.filled.contour(potentials, main="Protein-Protein Interaction Potential") Labeling the AxesThere may be a more concise method, but I wrote my own axes drawing function (which calls axis() twice), and supplied this to the filled.contour() function using the plot.axes argument:matrix.axes <- function(data) { Much nicer. This brings out another issue - to match the published figure, we need to reorder our matrix...# Do the rows, las=2 for text perpendicular to the axis x <- (1:dim(data)[1] - 1) / (dim(data)[1] - 1); axis(side=1, at=x, labels=rownames(data), las=2); # Do the columns x <- (1:dim(data)[2] - 1) / (dim(data)[2] - 1); axis(side=2, at=x, labels=colnames(data), las=2); } filled.contour(potentials, plot.axes=matrix.axes(potentials), main="Protein-Protein Interaction Potential") Permuting the MatrixI re-ordered the matrix like so:new.order <- c("ILE", "VAL", "LEU", "PHE","CYS", "MET", "ALA", "GLY", "THR", "SER", "TRP", "TYR", "PRO", "HIS","ASN", "GLN", "ASP", "GLU", "LYS", "ARG") pots2 <- potentials[new.order, new.order] filled.contour(pots2, plot.axes=matrix.axes(pots2), main="Protein-Protein Interaction Potential") Switching the ColoursWe can use the gplots package's function colorpanel()to generate a suitable grey-scale colour scheme to mimic the publishedfigure. I have used seven levels, and choosen to range from white to grey10 (a dark grey):> library(gplots) > colorpanel(7, "white", "grey10") [1] "#FFFFFF" "#D9D9D9" "#B3B3B3" "#8D8D8D" "#666666" "#404040" "#1A1A1A" As you can see, that gives seven colours (as hexidemical RGB strings). We can pass that to the filled.contour function like this: filled.contour(pots2,plot.axes=matrix.axes(pots2), col=colorpanel(7, "white", "grey10"),nlevels=7, main="Protein-Protein Interaction Potential") Adding Grid LinesApart from the key layout, the only remaining big difference to thepublished figure is we don't have the grid lines in our plot... let'shave a go at fixing this with the grid() command:matrix.axes <- function(data) { That's now looking pretty close, but if you look carefully thepublished figure seems to determine its contour lines slightlydifferently:# Do the rows, las=2 for text perpendicular to the axis x <- (1:dim(data)[1] - 1) / (dim(data)[1] - 1); axis(side=1, at=x, labels=rownames(data), las=2); # Do the columns x <- (1:dim(data)[2] - 1) / (dim(data)[2] - 1); axis(side=2, at=x, labels=colnames(data), las=2); # Add a solid black grid grid(nx=(dim(data)[1]-1), ny=(dim(data)[2]-1), col="black", lty="solid"); } filled.contour(pots2, plot.axes=matrix.axes(pots2), col=colorpanel(7,"white", "grey10"), nlevels=7, main="Protein-Protein InteractionPotential") DiscussionLooking at their Figure 2 does beg the question, why did Lu et al. choose to display the amino acids in this order? Quoting from their paper (see references):The 20×20 residue-based potential constructedfrom DIMER- 1 is plotted in Fig. 2. Hydrophobic residues are attractiveto each other and hydrophilic residues are repulsive. For example, Leuhas an attractive potential with all residues except Lys, Glu, and Asp.Besides the Cys-Cys pair that may form a disulfide bond, the mostattractive pairs are between hydrophobic residues pairs such as Leu,Ilu, and Phe. The most repulsive pairs are Asp-Asp, Lys-Lys, Glu- Glu,Glu-Gly, and Asp-Gly. This pattern is similar to what has been observedin the statistical potential obtained from monomers. Given this, a sorting based on hydrophobicity would seem sensible. According to Shaun Black,the trend for physiological L-alpha-Amino Acids is: Phe > Leu = Ile> Tyr = Trp > Val > Met > Pro > Cys > Ala > Gly> Thr > Ser > Lys > Gln > Asn > His > Glu > Asp> Arg. Thus:hydro.order <- c("PHE", "LEU", "ILE", "TYR","TRP", "VAL", "MET", "PRO", "CYS", "ALA", "GLY", "THR", "SER", "LYS","GLN", "ASN", "HIS", "GLU", "ASP", "ARG") This does show that hydrophobicity seems to account for much of thepotential, with Cys and His perhaps as special cases. Tweaking thisorder by hand its possible to get a much more eye-pleasing graph.pots3 <- potentials[hydro.order, hydro.order] filled.contour(pots3, plot.axes=matrix.axes(pots3), col=colorpanel(7,"white", "grey10"), nlevels=7, main="Protein-Protein InteractionPotential") For the sake of argument, what ordering does the data itself suggest? We could try this using a heatmap, see this page for details: heatmap.2(pots2, scale="none", breaks=-5:2,col=colorpanel(7, "white", "grey10"), key=TRUE, symkey=FALSE,density.info="none", trace="none") Level PlotsThe above heatmap image brings up another point - are contour plotsreally suitable for a categorical matrix? What about a "level plot"instead? Internally the heatmap functions use the image() function. As before, getting the axis labelled with the row/column names takes a little work...library(gplots) # for colorpanel() As an alternative, we can use the levelplot() function in the lattice package. I have tweaked the tick marks (to turn them off) and rotated the x-axis labels:nr <- dim(pots2)[1] nc <- dim(pots2)[2] #Request a square plot area: par(pty="s") #Plot an image using the row/column numbers as the x/y variables: image(x=1:nr, y=1:nc, z=pots2, axes=FALSE, frame.plot=TRUE,breaks=-5:2, col=colorpanel(7, "white", "grey10"),main="Protein-Protein Interaction Potential", xlab=NA, ylab=NA) #Add axis with the amino acids names, use las=2 for perpendicular text: axis(1, 1:nr, labels=rownames(pots2), las=2) axis(2, 1:nc, labels=colnames(pots2), las=2) #Get a nice border round the image: box() #Add a solid black grid: grid(nx=nr, ny=nc, col="black", lty="solid") library(gplots) # for colorpanel() One subtle difference between these figures (i.e. image() versus levelplot()) is the treatment of values exactly on the thresholds (e.g. PHE-PHE or VAL-VAL).library(lattice) # for levelplot() levelplot(pots2, scales=list(tck=0, x=list(rot=90)),col.regions=colorpanel(7, "white", "grey10"),at=c(-5,-4,-3,-2,-1,0,1,2), main="Protein-Protein InteractionPotential", xlab=NULL, ylab=NULL) P.S. If you use levelplot() inside your own functions, you have to use print(levelplot(...)) to actually produce the graph. See this R-FAQ entry. ReferencesLu et al. (2003) Development of Unified Statistical Potentials Describing Protein-Protein Interactions, Biophysical Journal 84(3), p1895-1901 (PMID: 12609891, journal link, supplementary data) |
Page contact: Peter Cock Last revised: Wed 25 Jul 2007
Back to top of pageScientist. Husband. Daddy. --- TOLLE. LEGE
외부자료의 인용에 있어 대한민국 저작권법(28조)과 U.S. Copyright Act (17 USC. §107)에 정의된 "저작권물의 공정한 이용원칙 | the U.S. fair use doctrine" 을 따릅니다. 저작권(© 최광민)이 명시된 모든 글과 번역문들에 대해 (1) 복제-배포, (2) 임의수정 및 자의적 본문 발췌, (3) 무단배포를 위한 화면캡처를 금하며, (4) 인용 시 URL 주소 만을 사용할 수 있습니다. [후원 | 운영] [대문으로] [방명록] [옛 방명록] [티스토리 (백업)] [신시내티]
-