banner



Can Limma Be Used With Already Normalized Data

0. Intro

limma is an R package that was originally developed for differential expression (DE) analysis of microarray data.

voom is a function in the limma package that modifies RNA-Seq data for use with limma.

Together they allow fast, flexible, and powerful analyses of RNA-Seq information. Limma-voom is our tool of choice for DE analyses considering information technology:

  • Allows for incredibly flexible model specification (you can include multiple chiselled and continuous variables, allowing incorporation of almost any kind of metadata)

  • Based on simulation studies, maintains the false discovery rate at or below the nominal charge per unit, unlike some other packages

  • Empirical Bayes smoothing of gene-wise standard deviations provides increased ability.

1. Setup

Input data for this example is on the course github page.

First, install the edgeR package if not already installed (which installs limma as a dependency)

            # source("https://bioconductor.org/biocLite.R") # biocLite("edgeR")          

Load the edgeR package (which loads limma as a dependency)

            library(edgeR)          
            ## Loading required parcel: limma          

Read in the counts table

            counts <- read.delim("all_counts.txt", row.names = 1) head(counts)          
            ##           C61 C62 C63 C64 C91  C92 C93 C94 I561 I562 I563 I564 I591 I592 ## AT1G01010 341 371 275 419 400  542 377 372  677  522  455  508  821  466 ## AT1G01020 164  94 176 155 200  183 166 115  172  157  122  152  189  171 ## AT1G03987   0   0   0   0   0    0   0   0    0    0    0    0    0    0 ## AT1G01030  twenty  34  xl  27  28   36  22  xl   20    seven   57   38   25   ten ## AT1G01040 738 487 610 690 945 1033 836 908  857  821  770  751  848  607 ## AT1G03993   ane   0   0   0   0    0   0   1    3    0    1    ane    1    0 ##           I593 I594 I861 I862 I863 I864 I891 I892 I893 I894 ## AT1G01010  691  500  157  473  459  228  590  491  565  496 ## AT1G01020  163  185   46  162  119   53  172  212  169  157 ## AT1G03987    0    0    0    0    0    0    0    0    0    0 ## AT1G01030   17   26   49   17   24   48   27   28   47   32 ## AT1G01040  871  756  361  618  641  439  783  692  768  625 ## AT1G03993    0    0    0    1    0    1    three    2    one    i          

OR read the file directly from the github page:

            counts <- read.delim("https://raw.githubusercontent.com/ucdavis-bioinformatics-training/2018-June-RNA-Seq-Workshop/chief/thursday/all_counts.txt") head(counts)          
            ##           C61 C62 C63 C64 C91 C92 C93 C94 I561 I562 I563 I564 I591 I592 ## AT1G01010 289 317 225 343 325 449 310 299  563  438  380  407  678  386 ## AT1G01020 127  78 142 130 156 146 144  95  138  129   99  118  154  140 ## AT1G03987   0   0   0   0   0   0   0   0    0    0    0    0    0    0 ## AT1G01030  17  25  32  24  22  25  21  35   18    6   46   33   19    8 ## AT1G01040 605 415 506 565 762 854 658 753  704  692  641  601  704  508 ## AT1G03993   i   ane   0   0   0   0   i   1    3    0    1    ane    1    0 ##           I593 I594 I861 I862 I863 I864 I891 I892 I893 I894 ## AT1G01010  567  421  130  411  382  190  501  390  480  407 ## AT1G01020  127  154   35  132   97   46  139  175  137  123 ## AT1G03987    0    0    0    0    0    0    0    0    0    0 ## AT1G01030   xiv   21   37   sixteen   19   38   24   18   37   23 ## AT1G01040  733  614  297  521  542  381  651  573  650  550 ## AT1G03993    0    0    0    1    0    1    3    i    ane    1          

Create DGEList object

            d0 <- DGEList(counts)          

two. Preprocessing

Calculate normalization factors

            d0 <- calcNormFactors(d0) d0          
            ## An object of grade "DGEList" ## $counts ##           C61 C62 C63 C64 C91 C92 C93 C94 I561 I562 I563 I564 I591 I592 ## AT1G01010 289 317 225 343 325 449 310 299  563  438  380  407  678  386 ## AT1G01020 127  78 142 130 156 146 144  95  138  129   99  118  154  140 ## AT1G03987   0   0   0   0   0   0   0   0    0    0    0    0    0    0 ## AT1G01030  17  25  32  24  22  25  21  35   xviii    6   46   33   19    viii ## AT1G01040 605 415 506 565 762 854 658 753  704  692  641  601  704  508 ##           I593 I594 I861 I862 I863 I864 I891 I892 I893 I894 ## AT1G01010  567  421  130  411  382  190  501  390  480  407 ## AT1G01020  127  154   35  132   97   46  139  175  137  123 ## AT1G03987    0    0    0    0    0    0    0    0    0    0 ## AT1G01030   fourteen   21   37   16   19   38   24   xviii   37   23 ## AT1G01040  733  614  297  521  542  381  651  573  650  550 ## 34257 more rows ... ##  ## $samples ##     group lib.size norm.factors ## C61     1 10502901    ane.0404937 ## C62     1  9423745    0.9719425 ## C63     1  9437115    one.0401597 ## C64     1 10415490    1.0360806 ## C91     1 10169158    one.0468854 ## xix more rows ...          

Note: calcNormFactors doesn't normalize the data, it just calculates normalization factors for use downstream.

Filter low-expressed genes

            cutoff <- ane drop <- which(apply(cpm(d0), 1, max) < cutoff) d <- d0[-drib,]  dim(d) # number of genes left          
            ## [1] 21080    24          

"Low-expressed" is subjective and depends on the dataset.

Derive experiment data from the sample names

Our experiment has two factors, cultivar ("C", "I5", or "I8") and time (half dozen or 9)

The sample names are the cultivar, followed past the time, followed by the replicate

            snames <- colnames(counts) # Sample names snames          
            ##  [ane] "C61"  "C62"  "C63"  "C64"  "C91"  "C92"  "C93"  "C94"  "I561" "I562" ## [eleven] "I563" "I564" "I591" "I592" "I593" "I594" "I861" "I862" "I863" "I864" ## [21] "I891" "I892" "I893" "I894"          
            cultivar <- substr(snames, 1, nchar(snames) - 2)  time <- substr(snames, nchar(snames) - i, nchar(snames) - 1) cultivar          
            ##  [one] "C"  "C"  "C"  "C"  "C"  "C"  "C"  "C"  "I5" "I5" "I5" "I5" "I5" "I5" ## [15] "I5" "I5" "I8" "I8" "I8" "I8" "I8" "I8" "I8" "I8"          
            time          
            ##  [ane] "six" "6" "6" "half-dozen" "9" "9" "9" "9" "6" "6" "6" "6" "nine" "ix" "9" "9" "6" ## [18] "6" "6" "6" "9" "9" "ix" "9"          

Create a new variable "grouping" that combines cultivar and time

            group <- interaction(cultivar, time) group          
            ##  [ane] C.6  C.6  C.half dozen  C.6  C.9  C.9  C.nine  C.9  I5.6 I5.6 I5.6 I5.six I5.9 I5.9 ## [15] I5.9 I5.9 I8.6 I8.6 I8.6 I8.6 I8.nine I8.9 I8.9 I8.9 ## Levels: C.6 I5.6 I8.six C.9 I5.9 I8.nine          

Note: y'all can also enter grouping data manually, or read it in from an external file. If you exercise this, it is \(very, very, very\) important that you brand sure the metadata is in the same society every bit the column names of the counts table.

Multidimensional scaling (MDS) plot

            plotMDS(d, col = every bit.numeric(group))          

3. Voom transformation and calculation of variance weights

Specify the model to be fitted. We do this before using voom since voom uses variances of the model residuals (observed - fitted)

            mm <- model.matrix(~0 + group)          

The above specifies a model where each coefficient corresponds to a group mean

Voom

            y <- voom(d, mm, plot = T)          

What is voom doing?

  1. Counts are transformed to log2 counts per 1000000 reads (CPM), where "per million reads" is defined based on the normalization factors nosotros calculated before
  2. A linear model is fitted to the log2 CPM for each gene, and the residuals are calculated
  3. A smoothed curve is fitted to the sqrt(residual standard departure) by average expression (run into red line in plot above)
  4. The smoothed curve is used to obtain weights for each gene and sample that are passed into limma along with the log2 CPMs.

More than details at https://genomebiology.biomedcentral.com/articles/10.1186/gb-2014-15-ii-r29

The above is a "good" voom plot. If your voom plot looks like the below, y'all might want to filter more:

            tmp <- voom(d0, mm, plot = T)          

4. Plumbing fixtures linear models in limma

lmFit fits a linear model using weighted least squares for each cistron:

            fit <- lmFit(y, mm) head(coef(fit))          
            ##           groupC.half dozen groupI5.6 groupI8.6 groupC.9 groupI5.ix groupI8.9 ## AT1G01010 iv.837410 5.3738532  5.065354 5.043214 5.5240004  v.363809 ## AT1G01020 3.530869 3.4993460  iii.212860 iii.689622 3.7209961  3.736297 ## AT1G01030 1.250817 0.9293832  1.559242 1.285596 0.4831707  ane.215591 ## AT1G01040 5.676015 five.9469878  5.778889 half-dozen.182374 v.8641107  5.815498 ## AT1G01050 6.598712 half dozen.5013631  vi.463936 6.619239 six.7532886  six.711772 ## AT1G01060 seven.807988 seven.4624783  7.390741 viii.966047 viii.2706387  8.376129          

Comparisons between groups (log fold-changes) are obtained as contrasts of these fitted linear models:

Specify which groups to compare:

Comparison betwixt times 6 and nine for cultivar I5

            contr <- makeContrasts(groupI5.nine - groupI5.six, levels = colnames(coef(fit))) contr          
            ##            Contrasts ## Levels      groupI5.9 - groupI5.half-dozen ##   groupC.6                      0 ##   groupI5.6                    -1 ##   groupI8.half dozen                     0 ##   groupC.9                      0 ##   groupI5.nine                     ane ##   groupI8.9                     0          

Approximate contrast for each cistron

            tmp <- contrasts.fit(fit, contr)          

Empirical Bayes smoothing of standard errors (shrinks standard errors that are much larger or smaller than those from other genes towards the average standard error) (encounter https://www.degruyter.com/doi/10.2202/1544-6115.1027)

            tmp <- eBayes(tmp)          

What genes are most differentially expressed?

            top.table <- topTable(tmp, sort.by = "P", northward = Inf) head(summit.table, 20)          
            ##                logFC  AveExpr          t      P.Value    adj.P.Val ## AT5G37260  3.0480867 6.964609  24.154998 i.193341e-16 two.515563e-12 ## AT3G02990  ane.6484885 iii.304597  13.605334 8.573977e-12 ix.036971e-08 ## AT2G29500 -five.0342224 5.525802 -12.062120 7.937586e-11 5.577477e-07 ## AT3G24520  1.8715741 five.882965  11.710003 1.360008e-10 7.167244e-07 ## AT3G46230 -6.7068648 4.544494 -11.189910 3.081581e-x 1.299195e-06 ## AT5G65630  i.0468903 vii.553775  10.657591 seven.329270e-10 2.575017e-06 ## AT3G24100 -1.2759156 5.988295 -10.237187 1.485118e-09 4.021633e-06 ## AT1G53540 -7.1847583 4.316965 -x.160183 one.693866e-09 4.021633e-06 ## AT4G21320 -2.8641453 4.705136 -10.152258 1.717016e-09 4.021633e-06 ## AT5G48570 -4.0078194 half dozen.422341 -10.036624 2.094832e-09 4.415906e-06 ## AT5G52882  0.8786555 vii.649621   nine.828722 3.007116e-09 5.398811e-06 ## AT5G02810 -4.8872284 3.419336  -ix.816290 3.073327e-09 5.398811e-06 ## AT1G09140 -1.2469280 7.027052  -9.496095 v.419786e-09 eight.788392e-06 ## AT4G24780  1.9288845 six.979568   9.380927 half-dozen.666788e-09 1.003828e-05 ## AT5G59570  1.5315276 three.005741   9.330689 7.300769e-09 1.026001e-05 ## AT4G11880  2.7436628 2.919357   ix.183289 9.547854e-09 one.245739e-05 ## AT4G25500  one.0517676 6.143738   9.155503 i.004628e-08 1.245739e-05 ## AT1G22770 -3.5882843 3.953036  -9.090404 1.132259e-08 i.326002e-05 ## AT1G31230  1.1574470 6.114996   viii.721535 2.252449e-08 2.499033e-05 ## AT2G27820 -0.8709118 5.735808  -8.680724 2.433146e-08 2.564536e-05 ##                   B ## AT5G37260 26.622628 ## AT3G02990 16.006216 ## AT2G29500 fourteen.645646 ## AT3G24520 xiv.385545 ## AT3G46230 12.519533 ## AT5G65630 12.802559 ## AT3G24100 12.109394 ## AT1G53540 ten.864734 ## AT4G21320 eleven.825499 ## AT5G48570 11.752325 ## AT5G52882 11.430101 ## AT5G02810 10.430908 ## AT1G09140 10.854555 ## AT4G24780 10.651922 ## AT5G59570 10.146392 ## AT4G11880  ix.781987 ## AT4G25500 x.250838 ## AT1G22770  nine.958652 ## AT1G31230  9.459672 ## AT2G27820  9.385693          
  • logFC: log2 fold modify of I5.9/I5.half dozen
  • AveExpr: Average expression across all samples, in log2 CPM
  • t: logFC divided by its standard fault
  • P.Value: Raw p-value (based on t) from test that logFC differs from 0
  • adj.P.Val: Benjamini-Hochberg false discovery rate adjusted p-value
  • B: log-odds that gene is DE (arguably less useful than the other columns)

AT5G37260 has higher expression at time 9 than at time 6 (logFC is positive). AT2G29500 has lower expression at time 9 than at time vi (logFC is negative).

How many DE genes are there?

            length(which(superlative.table$adj.P.Val < 0.05))          
            ## [i] 4680          

Write elevation.tabular array to a file

            pinnacle.table$Gene <- rownames(tiptop.table) elevation.table <- superlative.tabular array[,c("Gene", names(top.tabular array)[1:6])] write.table(top.table, file = "time9_v_time6_I5.txt", row.names = F, sep = "\t", quote = F)          

Let's say we want to compare cultivars C and I5 at time half dozen. The only thing we have to alter is the phone call to makeContrasts:

            contr <- makeContrasts(groupI5.half-dozen - groupC.6, levels = colnames(coef(fit))) tmp <- contrasts.fit(fit, contr) tmp <- eBayes(tmp) top.table <- topTable(tmp, sort.past = "P", n = Inf) caput(meridian.tabular array, 20)          
            ##                logFC    AveExpr          t      P.Value    adj.P.Val ## AT4G12520 -9.3396792  0.5671202 -14.118616 4.276991e-12 nine.015898e-08 ## AT3G30720  5.6550000  3.5130064  11.628491 i.543229e-10 one.626564e-06 ## AT5G26270  2.3722487  4.4417048   9.728356 iii.587045e-09 2.520497e-05 ## AT3G33528 -4.7012468 -1.6468149  -8.175366 vi.441093e-08 3.394456e-04 ## AT3G05955 -iii.7995906 -1.6958534  -vii.210032 4.545141e-07 1.916232e-03 ## AT4G28100 -0.8099448  iv.5746806  -vii.041358 6.477323e-07 i.986777e-03 ## AT1G64795 -four.7147903 -1.1051093  -7.032657 6.597456e-07 ane.986777e-03 ## AT3G25730  1.3872163  3.9538977   6.242201 three.650963e-06 nine.620287e-03 ## AT5G05480 -0.4869486  4.6067884  -v.647317 i.392471e-05 three.005506e-02 ## AT2G18193  1.0100187  3.8571257   5.626785 1.459359e-05 iii.005506e-02 ## AT4G01870  1.6373400  5.6170176   5.595305 i.568338e-05 3.005506e-02 ## AT2G14878 -0.5162040  six.4733722  -5.531829 1.814030e-05 3.186647e-02 ## AT2G06995 -3.2115972 -2.3585729  -5.439207 2.244863e-05 3.640132e-02 ## AT4G15248 -1.6999189  2.1452779  -v.392221 2.501939e-05 3.673680e-02 ## AT1G62280 -i.7551000  2.7349000  -5.373243 2.614099e-05 3.673680e-02 ## AT3G28740  ii.2032568  5.5413479   5.284877 3.207698e-05 4.226142e-02 ## AT5G14370  0.8031693  3.7422520   5.174352 4.147430e-05 4.748115e-02 ## AT2G25737  0.8732454  three.6798171   5.163677 4.251876e-05 4.748115e-02 ## AT2G37760  0.7703328  5.4139388   v.141986 iv.472409e-05 four.748115e-02 ## AT2G30400  1.0087276  two.0253964   5.138887 four.504853e-05 4.748115e-02 ##                     B ## AT4G12520  half-dozen.42577720 ## AT3G30720  nine.54391447 ## AT5G26270 11.00209007 ## AT3G33528  1.56894746 ## AT3G05955  2.19224634 ## AT4G28100  half-dozen.18609294 ## AT1G64795  two.33667151 ## AT3G25730  four.53222502 ## AT5G05480  3.26213083 ## AT2G18193  three.23531092 ## AT4G01870  3.11677860 ## AT2G14878  two.94274140 ## AT2G06995  0.02144583 ## AT4G15248  2.49861111 ## AT1G62280  2.67518742 ## AT3G28740  two.42959865 ## AT5G14370  2.25381821 ## AT2G25737  ii.23281223 ## AT2G37760  two.08993315 ## AT2G30400  2.09909628          
            length(which(top.table$adj.P.Val < 0.05)) # number of DE genes          
            ## [ane] xx          
            acme.tabular array$Gene <- rownames(top.table) peak.table <- top.table[,c("Cistron", names(peak.table)[i:half-dozen])] write.table(acme.table, file = "I5_v_C_time6.txt", row.names = F, sep = "\t", quote = F)          

What if we refit our model every bit a two-factor model (rather than using the grouping variable)?

Create new model matrix:

            mm <- model.matrix(~cultivar*time)          

Nosotros are specifying that model includes effects for cultivar, time, and the cultivar-time interaction (which allows the differences between cultivars to differ across time)

            colnames(mm)          
            ## [1] "(Intercept)"      "cultivarI5"       "cultivarI8"       ## [four] "time9"            "cultivarI5:time9" "cultivarI8:time9"          
            y <- voom(d, mm, plot = F) fit <- lmFit(y, mm) head(coef(fit))          
            ##           (Intercept)  cultivarI5 cultivarI8      time9 cultivarI5:time9 ## AT1G01010    4.837410  0.53644370  0.2279446 0.20580445      -0.05565729 ## AT1G01020    3.530869 -0.03152318 -0.3180096 0.15875297       0.06289715 ## AT1G01030    i.250817 -0.32143420  0.3084243 0.03477863      -0.48099113 ## AT1G01040    5.676015  0.27097286  0.1028739 0.50635951      -0.58923660 ## AT1G01050    6.598712 -0.09734846 -0.1347759 0.02052702       0.23139851 ## AT1G01060    7.807988 -0.34550979 -0.4172467 i.15805850      -0.34989810 ##           cultivarI8:time9 ## AT1G01010       0.09265044 ## AT1G01020       0.36468449 ## AT1G01030      -0.37842909 ## AT1G01040      -0.46975071 ## AT1G01050       0.22730960 ## AT1G01060      -0.17267051          
  • The coefficient cultivarI5 represents the difference in mean expression between cultivar I5 and the reference cultivar (cultivar C), for time 6 (the reference level for time)
  • The coefficient time9 represents the difference in mean expression between time 9 and fourth dimension 6, for cultivar C
  • The coefficient cultivarI5:time9 is the difference between times ix and vi of the differences betwixt cultivars I5 and C (interaction result)

Let'southward estimate the divergence between cultivars I5 and C at fourth dimension 6

            tmp <- contrasts.fit(fit, coef = 2) # Direct examination 2d coefficient tmp <- eBayes(tmp) acme.table <- topTable(tmp, sort.by = "P", n = Inf) head(top.tabular array, 20)          
            ##                logFC    AveExpr          t      P.Value    adj.P.Val ## AT4G12520 -nine.3396792  0.5671202 -14.118616 4.276991e-12 9.015898e-08 ## AT3G30720  5.6550000  3.5130064  xi.628491 1.543229e-x i.626564e-06 ## AT5G26270  2.3722487  4.4417048   9.728356 three.587045e-09 2.520497e-05 ## AT3G33528 -4.7012468 -1.6468149  -8.175366 half-dozen.441093e-08 3.394456e-04 ## AT3G05955 -3.7995906 -ane.6958534  -7.210032 iv.545141e-07 i.916232e-03 ## AT4G28100 -0.8099448  4.5746806  -7.041358 6.477323e-07 one.986777e-03 ## AT1G64795 -iv.7147903 -1.1051093  -seven.032657 half-dozen.597456e-07 i.986777e-03 ## AT3G25730  one.3872163  3.9538977   six.242201 3.650963e-06 9.620287e-03 ## AT5G05480 -0.4869486  4.6067884  -5.647317 ane.392471e-05 3.005506e-02 ## AT2G18193  1.0100187  3.8571257   5.626785 1.459359e-05 3.005506e-02 ## AT4G01870  i.6373400  v.6170176   5.595305 ane.568338e-05 3.005506e-02 ## AT2G14878 -0.5162040  6.4733722  -5.531829 1.814030e-05 3.186647e-02 ## AT2G06995 -3.2115972 -2.3585729  -v.439207 2.244863e-05 3.640132e-02 ## AT4G15248 -i.6999189  ii.1452779  -5.392221 2.501939e-05 3.673680e-02 ## AT1G62280 -ane.7551000  2.7349000  -five.373243 ii.614099e-05 3.673680e-02 ## AT3G28740  2.2032568  5.5413479   5.284877 3.207698e-05 4.226142e-02 ## AT5G14370  0.8031693  3.7422520   5.174352 4.147430e-05 4.748115e-02 ## AT2G25737  0.8732454  3.6798171   five.163677 4.251876e-05 iv.748115e-02 ## AT2G37760  0.7703328  5.4139388   5.141986 4.472409e-05 iv.748115e-02 ## AT2G30400  1.0087276  2.0253964   5.138887 4.504853e-05 4.748115e-02 ##                     B ## AT4G12520  six.42577720 ## AT3G30720  nine.54391447 ## AT5G26270 11.00209007 ## AT3G33528  1.56894746 ## AT3G05955  ii.19224634 ## AT4G28100  six.18609294 ## AT1G64795  2.33667151 ## AT3G25730  four.53222502 ## AT5G05480  iii.26213083 ## AT2G18193  3.23531092 ## AT4G01870  3.11677860 ## AT2G14878  2.94274140 ## AT2G06995  0.02144583 ## AT4G15248  2.49861111 ## AT1G62280  ii.67518742 ## AT3G28740  two.42959865 ## AT5G14370  2.25381821 ## AT2G25737  2.23281223 ## AT2G37760  2.08993315 ## AT2G30400  2.09909628          
            length(which(superlative.tabular array$adj.P.Val < 0.05)) # number of DE genes          
            ## [1] 20          

Nosotros get the same results as with the model where each coefficient corresponded to a group hateful. In essence, these are the same model, so use whichever is almost convenient for what you are estimating.

The interaction furnishings cultivarI5:time9 and cultivarI8:time9 are easier to estimate and test in this setup

            head(coef(fit))          
            ##           (Intercept)  cultivarI5 cultivarI8      time9 cultivarI5:time9 ## AT1G01010    four.837410  0.53644370  0.2279446 0.20580445      -0.05565729 ## AT1G01020    3.530869 -0.03152318 -0.3180096 0.15875297       0.06289715 ## AT1G01030    1.250817 -0.32143420  0.3084243 0.03477863      -0.48099113 ## AT1G01040    5.676015  0.27097286  0.1028739 0.50635951      -0.58923660 ## AT1G01050    vi.598712 -0.09734846 -0.1347759 0.02052702       0.23139851 ## AT1G01060    7.807988 -0.34550979 -0.4172467 ane.15805850      -0.34989810 ##           cultivarI8:time9 ## AT1G01010       0.09265044 ## AT1G01020       0.36468449 ## AT1G01030      -0.37842909 ## AT1G01040      -0.46975071 ## AT1G01050       0.22730960 ## AT1G01060      -0.17267051          
            tmp <- contrasts.fit(fit, coef = 5) # Test cultivarI5:time9 tmp <- eBayes(tmp) superlative.table <- topTable(tmp, sort.by = "P", north = Inf) head(top.table, xx)          
            ##                logFC  AveExpr         t      P.Value    adj.P.Val ## AT5G38200  5.2681384 five.459342 12.169571 six.750884e-11 i.423086e-06 ## AT5G06860  3.9331835 6.449356  7.667986 1.770961e-07 ane.306381e-03 ## AT1G08430  7.3873178 5.946190  seven.640080 1.874141e-07 ane.306381e-03 ## AT1G76990 -0.9683284 8.651885 -7.503011 ii.478901e-07 1.306381e-03 ## AT5G18170  ane.4739946 six.811722  vii.145965 5.197419e-07 2.191232e-03 ## AT5G24030  one.5597102 half-dozen.422377  6.737394 1.238209e-06 3.821427e-03 ## AT1G62280  3.3975934 2.734900  6.652058 one.488488e-06 3.821427e-03 ## AT2G41380  3.7331437 5.866261  6.615737 1.610271e-06 3.821427e-03 ## AT2G45360 -iv.8763311 1.093918 -half-dozen.592316 1.694190e-06 3.821427e-03 ## AT2G46790  3.2686459 3.970742  half dozen.530515 ane.937883e-06 iii.821427e-03 ## AT5G15950  3.2415341 5.516374  6.517393 ane.994103e-06 3.821427e-03 ## AT5G67520 -1.8307730 3.290640 -6.362042 2.802299e-06 4.649936e-03 ## AT1G01650 -i.1996420 5.524074 -half dozen.351573 2.867608e-06 four.649936e-03 ## AT5G15330 -ane.3784824 4.115181 -6.268550 three.444132e-06 5.065418e-03 ## AT1G48100  ane.9296680 2.376553  half dozen.247994 3.604425e-06 5.065418e-03 ## AT4G19160 -2.3207711 7.235367 -6.179190 4.198820e-06 v.356807e-03 ## AT1G05890 -0.6529356 half-dozen.455076 -6.160149 iv.380439e-06 five.356807e-03 ## AT5G56010  i.2941382 eight.452257  6.130758 4.676715e-06 5.356807e-03 ## AT4G37870  0.8880398 8.434194  6.116456 4.828242e-06 5.356807e-03 ## AT1G58110 -0.7654977 5.850866 -6.004302 6.205156e-06 6.540234e-03 ##                   B ## AT5G38200 13.058923 ## AT5G06860  seven.282910 ## AT1G08430  6.287578 ## AT1G76990  7.111815 ## AT5G18170  six.408931 ## AT5G24030  5.581114 ## AT1G62280  4.943495 ## AT2G41380  5.221262 ## AT2G45360  iii.503402 ## AT2G46790  4.956204 ## AT5G15950  5.114242 ## AT5G67520  4.615689 ## AT1G01650  four.778464 ## AT5G15330  4.553861 ## AT1G48100  iii.973489 ## AT4G19160  4.412505 ## AT1G05890  4.372387 ## AT5G56010  iv.309468 ## AT4G37870  4.278843 ## AT1G58110  4.041773          
            length(which(acme.table$adj.P.Val < 0.05))                      
            ## [one] 882          

The log fold change hither is the divergence between cultivarI5 and cultivar C in the log fold changes betwixt times 9 and 6. It is ALSO the difference between times ix and 6 in the log fold changes between cultivarI5 and cultivar C.

A gene for which this interaction effect is meaning is one for which the event of time differs between cultivars, and for which the effect of cultivar differs between times.

5. More complicated models

Specifying a unlike model is simply a thing of changing the calls to model.matrix (and maybe to contrasts.fit).

Let'due south say nosotros have information on the RNA extraction batch:

            batch <- factor(rep(rep(i:2, each = 2), half dozen)) batch          
            ##  [1] 1 1 two 2 1 1 2 2 i 1 2 ii 1 1 2 2 1 1 2 2 1 i ii two ## Levels: 1 2          

To adjust for batch in the analysis, add batch to the end of the telephone call to model matrix. Everything else about the code stays the same:

            mm <- model.matrix(~0 + group + batch) y <- voom(d, mm, plot = F) fit <- lmFit(y, mm) contr <- makeContrasts(groupI5.vi - groupC.6, levels = colnames(coef(fit))) tmp <- contrasts.fit(fit, contr) tmp <- eBayes(tmp) superlative.table <- topTable(tmp, sort.by = "P", n = Inf) head(pinnacle.table, 20)          
            ##                logFC    AveExpr          t      P.Value    adj.P.Val ## AT4G12520 -9.1849480  0.5671202 -13.894622 1.254284e-11 2.644031e-07 ## AT3G30720  5.6553502  3.5130064  11.626230 two.902705e-ten 3.059451e-06 ## AT5G26270  ii.3756632  four.4417048  ten.532157 1.568494e-09 1.102129e-05 ## AT3G33528 -iv.6930275 -one.6468149  -8.114771 1.054174e-07 5.555497e-04 ## AT1G64795 -4.6966105 -1.1051093  -7.585967 2.920305e-07 i.231200e-03 ## AT4G28100 -0.8142969  4.5746806  -7.332425 iv.823635e-07 1.694704e-03 ## AT3G05955 -three.7806772 -ane.6958534  -7.158687 six.837456e-07 two.059051e-03 ## AT1G62280 -ane.7706304  2.7349000  -6.183451 5.223128e-06 1.278365e-02 ## AT2G18193  1.0171979  3.8571257   6.155724 5.544006e-06 1.278365e-02 ## AT3G25730  i.3861828  3.9538977   6.114089 6.064349e-06 1.278365e-02 ## AT4G34138  i.2583243  vi.2839069   5.874446 1.020572e-05 1.955787e-02 ## AT1G67890  0.4262907  five.1117436   5.759619 1.312886e-05 two.306302e-02 ## AT3G63430 -ane.2020149  three.7073650  -5.617984 one.794992e-05 ii.838999e-02 ## AT5G05480 -0.4876253  4.6067884  -5.595805 1.885483e-05 two.838999e-02 ## AT4G15248 -1.7089812  2.1452779  -v.492499 ii.372592e-05 3.079319e-02 ## AT2G14878 -0.5161993  6.4733722  -5.474049 two.472296e-05 3.079319e-02 ## AT2G06995 -3.1599964 -2.3585729  -5.445134 2.637233e-05 3.079319e-02 ## AT4G01870  one.6376531  5.6170176   5.444359 2.641803e-05 3.079319e-02 ## AT1G15380 -0.6846986  7.6756786  -5.421227 two.782070e-05 3.079319e-02 ## AT3G45980 -0.3280034  7.2242831  -5.399375 2.921555e-05 3.079319e-02 ##                    B ## AT4G12520  5.6334654 ## AT3G30720  8.9070420 ## AT5G26270 eleven.6801080 ## AT3G33528  1.1444487 ## AT1G64795  ii.4478920 ## AT4G28100  half-dozen.4756663 ## AT3G05955  1.7078155 ## AT1G62280  4.1194174 ## AT2G18193  4.1644124 ## AT3G25730  4.0704267 ## AT4G34138  iii.5467936 ## AT1G67890  3.3320337 ## AT3G63430  3.0358961 ## AT5G05480  two.9985093 ## AT4G15248  2.5246096 ## AT2G14878  2.6674208 ## AT2G06995 -0.2708462 ## AT4G01870  2.6397139 ## AT1G15380  2.5436455 ## AT3G45980  2.4978614          
            length(which(top.table$adj.P.Val < 0.05))          
            ## [1] 27          

What if we desire to adjust for a continuous variable like RIN score:

            # Generate example RIN data fix.seed(99) RIN <- rnorm(north = 24, mean = seven.v, sd = 1) RIN          
            ##  [1] seven.713963 vii.979658 vii.587829 7.943859 7.137162 7.622674 6.636155 ##  [8] 7.989624 7.135883 6.205758 6.754231 8.421550 viii.250054 4.991446 ## [fifteen] four.459066 vii.500266 7.105981 5.754972 vii.998631 7.770954 eight.598922 ## [22] 8.252513 7.440583 vii.155431          

Model adjusting for RIN score

            mm <- model.matrix(~0 + group + RIN) y <- voom(d, mm, plot = F) fit <- lmFit(y, mm) contr <- makeContrasts(groupI5.vi - groupC.6, levels = colnames(coef(fit))) tmp <- contrasts.fit(fit, contr) tmp <- eBayes(tmp) top.table <- topTable(tmp, sort.by = "P", north = Inf) head(top.table, 20)          
            ##                logFC    AveExpr          t      P.Value    adj.P.Val ## AT3G30720  5.6296669  3.5130064  xviii.264852 8.228016e-14 i.078373e-09 ## AT4G12520 -9.2914672  0.5671202 -18.052498 i.023125e-thirteen 1.078373e-09 ## AT5G26270  2.4136835  4.4417048  10.642693 one.300777e-09 9.140129e-06 ## AT3G33528 -four.9109883 -1.6468149  -9.609516 7.125167e-09 three.754963e-05 ## AT1G64795 -4.7654967 -1.1051093  -vii.254218 v.606683e-07 2.363777e-03 ## AT4G28100 -0.8023348  4.5746806  -vi.994622 9.486720e-07 3.333001e-03 ## AT3G25730  ane.4185849  3.9538977   six.546417 2.402778e-06 7.235795e-03 ## AT4G15248 -1.7173926  2.1452779  -6.178114 5.259531e-06 1.385886e-02 ## AT3G28740  two.3364878  5.5413479   5.900899 9.592908e-06 2.114492e-02 ## AT3G05955 -3.6893043 -1.6958534  -5.880481 ane.003080e-05 2.114492e-02 ## AT4G01870  one.6906107  v.6170176   five.821660 1.141042e-05 2.126536e-02 ## AT1G60750  ane.8497644  3.8830018   5.794730 1.210552e-05 2.126536e-02 ## AT1G17180  1.5714528  4.2264956   v.523743 2.205086e-05 3.575631e-02 ## AT2G18193  ane.0022545  three.8571257   five.472489 ii.472195e-05 3.722420e-02 ## AT5G07870  0.8857028  v.0986202   5.401616 2.896986e-05 3.817990e-02 ## AT3G27940  1.4469855 -0.4298164   5.401474 ii.897905e-05 3.817990e-02 ## AT4G34135  i.0569416  6.3882644   5.238191 4.183945e-05 four.792741e-02 ## AT5G48010  2.1658813  5.3481653   5.225912 iv.301563e-05 4.792741e-02 ## AT5G05480 -0.4793068  four.6067884  -5.191084 4.653849e-05 iv.792741e-02 ## AT4G34138  1.3466558  half-dozen.2839069   v.184933 iv.719053e-05 4.792741e-02 ##                   B ## AT3G30720 17.859502 ## AT4G12520 11.272023 ## AT5G26270 12.140983 ## AT3G33528  3.415722 ## AT1G64795  iii.228510 ## AT4G28100  5.862248 ## AT3G25730  4.969968 ## AT4G15248  4.106498 ## AT3G28740  three.577535 ## AT3G05955  0.636844 ## AT4G01870  three.399840 ## AT1G60750  3.421066 ## AT1G17180  2.815864 ## AT2G18193  two.734741 ## AT5G07870  2.518925 ## AT3G27940  1.914480 ## AT4G34135  2.102290 ## AT5G48010  2.121253 ## AT5G05480  2.095733 ## AT4G34138  i.986345          
            length(which(pinnacle.tabular array$adj.P.Val < 0.05))          
            ## [1] 28          

What if we want to await at the correlation of cistron expression with a continuous variable like pH?

            # Generate example pH data set up.seed(99) pH <- rnorm(north = 24, mean = 8, sd = one.five) pH          
            ##  [1] 8.320944 8.719487 8.131743 eight.665788 7.455743 8.184011 6.704232 ##  [8] 8.734436 7.453825 half dozen.058637 half-dozen.881346 9.382326 9.125082 4.237169 ## [15] 3.438599 8.000399 7.408972 5.382459 viii.747947 8.406431 ix.648382 ## [22] 9.128770 7.910875 seven.483147          

Specify model matrix:

            mm <- model.matrix(~pH) head(mm)          
            ##   (Intercept)       pH ## one           1 8.320944 ## ii           ane viii.719487 ## iii           1 8.131743 ## four           one 8.665788 ## 5           1 7.455743 ## 6           1 8.184011          
            y <- voom(d, mm, plot = F) fit <- lmFit(y, mm) tmp <- contrasts.fit(fit, coef = two) # test "pH" coefficient tmp <- eBayes(tmp) top.tabular array <- topTable(tmp, sort.by = "P", n = Inf) head(top.table, 20)          
            ##                 logFC     AveExpr         t      P.Value adj.P.Val ## AT3G63220  0.10930007  3.72231850  3.984813 0.0005224914 0.9979999 ## AT4G22790 -0.18156791  2.62688789 -three.924496 0.0006095427 0.9979999 ## AT3G25855 -0.18661169  0.39590825 -3.888223 0.0006686278 0.9979999 ## AT5G48060 -0.14380178  3.08645012 -3.729192 0.0010015260 0.9979999 ## AT1G60030 -0.18144388  1.90130680 -3.688498 0.0011101065 0.9979999 ## AT5G50290 -0.19527796  0.87276139 -3.683545 0.0011240846 0.9979999 ## AT1G69588 -0.18176455  ane.07793155 -3.430115 0.0021225677 0.9979999 ## AT1G17250  0.18143048  0.91048376  3.423565 0.0021574364 0.9979999 ## AT5G66350 -0.18602484  0.59789247 -3.376519 0.0024247787 0.9979999 ## AT5G49320 -0.17105888  1.05400394 -3.292867 0.0029815543 0.9979999 ## AT3G17730 -0.16801297  1.56449303 -3.282393 0.0030594390 0.9979999 ## AT5G61420 -0.18139376  5.05638487 -3.280444 0.0030741424 0.9979999 ## AT4G05430 -0.20982687 -0.33505919 -3.272344 0.0031360082 0.9979999 ## AT1G51460 -0.26599783 -0.19515578 -3.249322 0.0033184675 0.9979999 ## AT1G59730 -0.26312342  2.24935746 -3.235196 0.0034354577 0.9979999 ## AT2G38920 -0.25174501  0.02998871 -iii.228661 0.0034909211 0.9979999 ## AT4G11655  0.30473405 -0.14566023  iii.217560 0.0035871102 0.9979999 ## AT1G63020 -0.11884059  3.38076346 -3.210234 0.0036519841 0.9979999 ## AT2G16400 -0.09043316  4.49774646 -3.198184 0.0037611636 0.9979999 ## AT1G17240  0.22499668  1.07480792  three.181368 0.0039188025 0.9979999 ##                    B ## AT3G63220 -0.3281633 ## AT4G22790 -0.5717697 ## AT3G25855 -1.5833958 ## AT5G48060 -0.8575577 ## AT1G60030 -1.1988239 ## AT5G50290 -1.5961103 ## AT1G69588 -one.8900360 ## AT1G17250 -2.2539850 ## AT5G66350 -2.1461291 ## AT5G49320 -2.1077044 ## AT3G17730 -1.9798846 ## AT5G61420 -ane.6446842 ## AT4G05430 -ii.6782487 ## AT1G51460 -2.5644227 ## AT1G59730 -ane.8823323 ## AT2G38920 -2.5302070 ## AT4G11655 -3.0281427 ## AT1G63020 -1.8275637 ## AT2G16400 -i.8149709 ## AT1G17240 -2.5144550          
            length(which(top.table$adj.P.Val < 0.05))          
            ## [1] 0          

In this case, limma is fitting a linear regression model, which here is a straight line fit, with the slope and intercept defined by the model coefficients:

            AT1G60030 <- y$E["AT1G60030",] plot(AT1G60030 ~ pH, ylim = c(0, 3.v)) intercept <- coef(fit)["AT1G60030", "(Intercept)"] gradient <- coef(fit)["AT1G60030", "pH"] abline(a = intercept, b = slope)          

In this example, the log fold alter logFC is the gradient of the line, or the change in gene expression (on the log2 CPM calibration) for each unit increase in pH.

Hither, a logFC of -0.xix means a 0.19 log2 CPM decrease in gene expression for each unit increment in pH, or a 14% decrease on the CPM scale (2^0.19 = ane.14).

6. A bit more on linear models

Limma fits a linear model to each gene.

Linear models include analysis of variance (ANOVA) models, linear regression, and any model of the form

\[Y = \beta_0 + \beta_{1}X_{1} + \beta_{2}X_{ii} + \dots + \beta_{p}X_{p} + \epsilon\] The covariates 10 tin be:

  • a continuous variable (pH, RIN score, age, weight, temperature, etc.)
  • Dummy variables coding a chiselled covariate (similar cultivar, time, and group)

The \(\beta\)'s are unknown parameters to be estimated.

In limma, the \(\beta\)'south are the log fold changes.

The error (residual) term \(\epsilon\) is assumed to be normally distributed with a variance that is constant across the range of the data.

Normally distributed means the residuals come from a distribution that looks like this:

The log2 transformation that voom applies to the counts makes the information "normal plenty", but doesn't completely stabilize the variance:

            tmp <- voom(d, mm, plot = T)          

The log2 counts per million are more variable at lower expression levels. The variance weights calculated past voom accost this situation.

            sessionInfo()          
            ## R version 3.five.0 (2018-04-23) ## Platform: x86_64-w64-mingw32/x64 (64-bit) ## Running under: Windows 10 x64 (build 16299) ##  ## Matrix products: default ##  ## locale: ## [i] LC_COLLATE=English_United States.1252  ## [2] LC_CTYPE=English_United States.1252    ## [3] LC_MONETARY=English_United States.1252 ## [4] LC_NUMERIC=C                           ## [5] LC_TIME=English_United States.1252     ##  ## attached base packages: ## [1] stats     graphics  grDevices utils     datasets  methods   base of operations      ##  ## other attached packages: ## [ane] edgeR_3.22.2 limma_3.36.1 ##  ## loaded via a namespace (and not fastened): ##  [ane] locfit_1.5-9.one  Rcpp_0.12.17    lattice_0.20-35 digest_0.6.fifteen   ##  [five] rprojroot_1.3-2 grid_3.v.0      backports_1.i.ii magrittr_1.5    ##  [9] evaluate_0.10.one stringi_1.1.vii   rmarkdown_1.10  tools_3.5.0     ## [13] stringr_1.three.1   yaml_2.1.19     compiler_3.5.0  htmltools_0.three.6 ## [17] knitr_1.xx          

Can Limma Be Used With Already Normalized Data,

Source: https://ucdavis-bioinformatics-training.github.io/2018-June-RNA-Seq-Workshop/thursday/DE.html

Posted by: haidereverporly.blogspot.com

0 Response to "Can Limma Be Used With Already Normalized Data"

Post a Comment

Iklan Atas Artikel

Iklan Tengah Artikel 1

Iklan Tengah Artikel 2

Iklan Bawah Artikel