



Scientist. Husband. Daddy. --- TOLLE. LEGE
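Quotations from external sources follow the fair use principle for copyrighted works as defined in the Copyright Act of the Republic of Korea (Article 28) and the U.S. Copyright Act (17 U.S.C. §107). For all writings and translations marked with the copyright notice (© 최광민), the following are prohibited: (1) reproduction and distribution, (2) arbitrary modification or selective excerpting of the text, and (3) screen capture for unauthorized distribution; (4) when citing, only the URL may be used.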


limma: design matrix w/ or w/o intercept







 https://stat.ethz.ch/pipermail/bioconductor/2006-April/012825.html

https://stat.ethz.ch/pipermail/bioconductor/2011-June/039777.html


In R, the tilde introduces a model formula; with model.matrix() only the
right hand side of the model equation is given. The default in R is to fit
an intercept in all linear models (which in the context of ANOVA is better
thought of as a 'baseline' group, to which all other groups are compared).

So when you do something like

f = factor(rep(c("A","B"), each = 3))
design = model.matrix(~f)

you are by default setting the 'A' samples as the baseline sample, and
the second coefficient in the model is the B - A comparison.
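One way to check this reading of the coefficients is to fit the model to a single made-up gene. The values in `y` are invented for illustration and do not come from the original thread; this is a minimal sketch using base R only.

```r
# Toy data (hypothetical values): three A samples followed by three B samples
f <- factor(rep(c("A", "B"), each = 3))
y <- c(5.1, 4.9, 5.0, 7.2, 6.8, 7.0)

design <- model.matrix(~f)
colnames(design)              # "(Intercept)" "fB"

fit <- lm(y ~ f)
coef(fit)                     # first: mean of A; second: mean(B) - mean(A)

# Group means computed directly, for comparison with the coefficients
mean_A <- mean(y[f == "A"])
mean_B <- mean(y[f == "B"])
```

The intercept should match `mean_A`, and the `fB` coefficient should match `mean_B - mean_A`.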

To eliminate the intercept, you add either a 0 or a -1 to the right hand
side of the equation:

design = model.matrix(~0+f)

which will then compute the average expression of the A and B samples
separately, so you have to explicitly create a contrasts matrix in order
to compute the B - A contrast.
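As a rough sketch of what "explicitly create a contrast" means, the example below fits the no-intercept design to one made-up gene and applies a contrast vector by hand. The data values and the variable names `contrast` and `b_minus_a` are assumptions for illustration; in limma the same vector would normally be built with makeContrasts() and applied with contrasts.fit().

```r
f <- factor(rep(c("A", "B"), each = 3))
y <- c(5.1, 4.9, 5.0, 7.2, 6.8, 7.0)    # hypothetical values

design <- model.matrix(~0 + f)           # columns fA and fB, no intercept
fit <- lm.fit(design, y)                 # least-squares fit on the design matrix
group_means <- fit$coefficients          # one estimated mean per group

# Contrast vector for B - A, matched to the design columns by name
contrast <- c(fA = -1, fB = 1)
b_minus_a <- sum(contrast * group_means[names(contrast)])
```

Here `group_means` holds the A and B averages, and `b_minus_a` is the same B - A difference that the intercept model reports as its second coefficient.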

Without an intercept you are fitting a cell means model in which you are
estimating the mean expression for each factor level (e.g., the model is
y_ij = u_i + e_ij). In this case, doing the contrasts is quite
straightforward.

With an intercept you are fitting a factor effects model in which all of
the other factor levels are specified in relation to some mean value. In
this case, all the other levels are specified in relation to the mean of
BASE (e.g., the model is y_ij = u. + t_i + e_ij). Here u. is the mean of
the BASE samples, and the t_i are the amounts by which each of the other
group means differs from the BASE mean. Therefore, a comparison against
BASE is given by the t_i coefficient itself (e.g., groupPE), and a
comparison between two non-baseline groups is given by the difference of
their t_i coefficients.
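The relation between the two parameterizations (u_i = u. + t_i) can be checked numerically. The sketch below assumes a factor called `group` with BASE as its first level; the third level `OTHER` and all data values are hypothetical and only serve the illustration.

```r
# Hypothetical three-group layout with BASE as the first (baseline) level
group <- factor(rep(c("BASE", "PE", "OTHER"), each = 2),
                levels = c("BASE", "PE", "OTHER"))
y <- c(4.0, 4.2, 6.1, 5.9, 5.0, 5.2)     # invented values for one gene

u <- coef(lm(y ~ 0 + group))   # cell means: groupBASE, groupPE, groupOTHER
b <- coef(lm(y ~ group))       # factor effects: (Intercept), groupPE, groupOTHER

# A comparison against BASE is a single coefficient of the intercept model
pe_vs_base <- b["groupPE"]

# A comparison between two non-baseline groups is a difference of coefficients
pe_vs_other <- b["groupPE"] - b["groupOTHER"]
```

Both comparisons agree with the corresponding differences of cell means in `u`, which is why the two models give the same answers and differ only in how the contrasts are written.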

See the limmaUsersGuide, and ?formula for more information.
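For orientation, here is a hedged sketch of how both designs might be carried through a limma fit. The expression matrix is simulated noise, the names `expr`, `BvsA`, `est1` and `est0` are made up for this example, and the limma part only runs if the package is installed.

```r
set.seed(1)
f <- factor(rep(c("A", "B"), each = 3))
# Simulated matrix: 20 hypothetical genes (rows) by 6 samples (columns)
expr <- matrix(rnorm(20 * 6), nrow = 20,
               dimnames = list(paste0("gene", 1:20), paste0("s", 1:6)))

# Reference value computed by hand: mean(B) - mean(A) for every gene
manual <- rowMeans(expr[, f == "B"]) - rowMeans(expr[, f == "A"])

if (requireNamespace("limma", quietly = TRUE)) {
  # With intercept: the second coefficient is already B - A
  d1 <- model.matrix(~f)
  fit1 <- limma::eBayes(limma::lmFit(expr, d1))
  est1 <- fit1$coefficients[, "fB"]

  # Without intercept: B - A has to be requested as an explicit contrast
  d0 <- model.matrix(~0 + f)
  cm <- limma::makeContrasts(BvsA = fB - fA, levels = d0)
  fit0 <- limma::eBayes(limma::contrasts.fit(limma::lmFit(expr, d0), cm))
  est0 <- fit0$coefficients[, "BvsA"]
}
```

Under these assumptions `est1`, `est0` and `manual` should agree gene by gene; the choice between the two designs changes how the comparison is specified, not its estimate.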


2. design matrix w/ or w/o intercept


As for '~ 0 + Group' versus '~ Group', the first instance means that you
don't want an intercept term, whereas the second means you do (as that
is the default).

  • design matrix w/o intercept  term
    • model.matrix( ~0 + factor)  
    • I almost always use a cell means model (design matrix without an intercept term). 
    • Cons
      • you cannot make any comparisons without specifying contrasts (which you might be able to do with a factor effects model, where there is an intercept).
    • Pros
      • I don't have to figure out each time which level is being used as the baseline.
  • design matrix w/ intercept term
    • model.matrix( ~factor)
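The point about having to work out the baseline can be made concrete. By default R orders factor levels alphabetically, so the baseline of an intercept model depends on the level names unless it is set explicitly. The variable names below (`g_default`, `g_wt`) are made up for this sketch.

```r
# Without an explicit 'levels' argument, levels are sorted alphabetically
g_default <- factor(c("WT", "WT", "MU", "MU", "MU"))
levels(g_default)                         # "MU" "WT": MU becomes the baseline
colnames(model.matrix(~g_default))        # "(Intercept)" "g_defaultWT"

# relevel() moves the chosen level to the front, making WT the baseline
g_wt <- relevel(g_default, ref = "WT")
levels(g_wt)                              # "WT" "MU"
colnames(model.matrix(~g_wt))             # "(Intercept)" "g_wtMU"

# The cell means design has one column per level, whatever the baseline is
colnames(model.matrix(~0 + g_default))    # "g_defaultMU" "g_defaultWT"
```

With the intercept design the meaning (and sign) of the second coefficient flips with the level order, whereas the no-intercept columns keep the same meaning.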

    As an example, using the two design matrices below, the first model is a
    factor effects model where WT is used as the baseline, so the second
    coefficient gives the difference between MU and WT. For this you don't
    need a contrast, and for this simple comparison it is probably easier.
    If you had two factors and were interested in the interaction, then you
    would have to do the algebra to figure out the contrasts.

    > Group <- factor(c("WT","WT","MU","MU","MU"), levels=c("WT","MU"))
    > Group
    [1] WT WT MU MU MU
    Levels: WT MU
    > design <- model.matrix(~Group)
    > design
      (Intercept) GroupMU
    1           1       0
    2           1       0
    3           1       1
    4           1       1
    5           1       1
    attr(,"assign")
    [1] 0 1
    attr(,"contrasts")
    attr(,"contrasts")$Group
    [1] "contr.treatment"


    The second model simply computes the mean for each factor level (hence
    'cell means model'), so you have to explicitly compute the contrast of
    interest. However, in this case it would be easier to figure out
    an interaction if you have two factors.

    > design2 <- model.matrix(~0+Group)
    > design2
      GroupWT GroupMU
    1       1       0
    2       1       0
    3       0       1
    4       0       1
    5       0       1
    attr(,"assign")
    [1] 1 1
    attr(,"contrasts")
    attr(,"contrasts")$Group
    [1] "contr.treatment"
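The remark about interactions can be illustrated with a small sketch. It assumes a hypothetical second factor (a treatment with levels ctl and trt) crossed with genotype, combines the two into one factor with a level per cell, and writes the interaction as a difference of differences. All names and values are invented for the example.

```r
# Hypothetical 2x2 layout: genotype (WT/MU) by treatment (ctl/trt), two samples per cell
geno <- rep(c("WT", "MU"), each = 4)
trt  <- rep(rep(c("ctl", "trt"), each = 2), times = 2)
cell <- factor(paste(geno, trt, sep = "."),
               levels = c("WT.ctl", "WT.trt", "MU.ctl", "MU.trt"))
y <- c(5.0, 5.2, 6.0, 6.2, 5.1, 4.9, 8.0, 8.2)    # invented values for one gene

# Cell means model: one column, and one estimated mean, per cell
design <- model.matrix(~0 + cell)
colnames(design) <- levels(cell)
means <- lm.fit(design, y)$coefficients

# Interaction = (MU.trt - MU.ctl) - (WT.trt - WT.ctl)
interaction_contrast <- c(WT.ctl = 1, WT.trt = -1, MU.ctl = -1, MU.trt = 1)
interaction_est <- sum(interaction_contrast * means[names(interaction_contrast)])

# The factor effects model reports the same quantity as its interaction term
geno_f <- factor(geno, levels = c("WT", "MU"))
trt_f  <- factor(trt, levels = c("ctl", "trt"))
b <- coef(lm(y ~ geno_f * trt_f))
```

In the cell means form the contrast can be read directly off the cell names; in the factor effects form the same value appears as the `geno_fMU:trt_ftrt` coefficient, but other comparisons between cells require working out which coefficients to add.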

    The tilde is used to specify a model, separating the right hand side
    (explanatory variables) from the left hand side (dependent variable). So
    if you were fitting a model as above, but for just one gene, you would
    do something like

    • lm(gene_expression_values ~ Group)

    However, when you are using model.matrix, you are only specifying the
    right hand side of that equation (e.g., the design matrix), so you just
    use the tilde followed by your explanatory variables.

    For a more complete explanation, see ?formula.
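To make the link between the two uses of the tilde concrete, the sketch below fits one made-up gene with lm() and then repeats the fit from the design matrix alone. The expression values are invented, and `same_design` and `coef_direct` are names chosen for this example.

```r
Group <- factor(c("WT", "WT", "MU", "MU", "MU"), levels = c("WT", "MU"))
gene_expression_values <- c(7.9, 8.1, 9.8, 10.1, 10.1)   # invented values for one gene

# Full formula: response on the left, explanatory variable on the right
fit <- lm(gene_expression_values ~ Group)

# lm() builds its design matrix from the right hand side of the formula
X_from_lm <- model.matrix(fit)
X_direct  <- model.matrix(~Group)
same_design <- isTRUE(all.equal(X_from_lm, X_direct, check.attributes = FALSE))

# Fitting on the design matrix directly reproduces the lm() coefficients
coef_direct <- lm.fit(X_direct, gene_expression_values)$coefficients
```

This mirrors what happens in a limma analysis, where the design matrix is built once from the right hand side and the left hand side is supplied separately as the expression data.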







