airway之workflow
1)airway簡(jiǎn)介
在該workflow中,所用的數(shù)據(jù)集來(lái)自RNA-seq,氣道平滑肌細(xì)胞(airway smooth muscle cells )用氟美松(糖皮質(zhì)激素,抗炎藥)處理。例如,哮喘患者使用糖皮質(zhì)激素來(lái)減少呼吸道炎癥,在該實(shí)驗(yàn)設(shè)計(jì)中,4種細(xì)胞類(lèi)型(airway smooth muscle cell lines )用1微米地塞米松處理18個(gè)小時(shí)。每一個(gè)cell lines都有對(duì)照與處理組(a treated and an untreated sample).關(guān)于實(shí)驗(yàn)設(shè)計(jì)的更多信息可以查看PubMed entry 24926665 ,原始數(shù)據(jù) GEO entry GSE52778.
2)Preparing count matrices(準(zhǔn)備制作表達(dá)矩陣)
這個(gè)矩陣必須是原始的表達(dá)量,不能是任何矯正過(guò)的表達(dá)量(RPKM等)。因?yàn)镈ESeq2的統(tǒng)計(jì)模型在處理原始表達(dá)矩陣會(huì)展示出強(qiáng)大實(shí)力,且會(huì)自動(dòng)處理library size differences。
2.1 Aligning reads to a reference genome
for f in 'cat files'
do
STAR --genomeDir ../STAR/ENSEMBL.homo_sapiens.release-75 --readFilesIn fastq/$f\_1.fastq fastq/$f\_2.fastq --runThreadN 12 --outFileNamePrefix aligned/$f.
done #比對(duì)
for f in 'cat files'
do
samtools view -bS aligned/$f.Aligned.out.sam -o aligned/$f.bam; done #轉(zhuǎn)bam文件
done
因?yàn)榘凑赵墨I(xiàn)跑流程,則數(shù)據(jù)量太大,耗時(shí)間,因此在airway中只用了8個(gè)樣本的一些子集,來(lái)減少運(yùn)行時(shí)間及內(nèi)存。
library("airway")
dir <- system.file("extdata", package="airway", mustWork=TRUE) #查看數(shù)據(jù)集所在路徑
list.files(dir)
csvfile <- file.path(dir,"sample_table.csv") #文件所在絕對(duì)路徑
(sampleTable <- read.csv(csvfile,row.names=1)) #讀取表格
filenames <- file.path(dir, paste0(sampleTable$Run, "_subset.bam"))
file.exists(filenames) #驗(yàn)證這8個(gè)文件在不
library("Rsamtools")
bamfiles <- BamFileList(filenames, yieldSize=2000000) #提取2000000序列,降低時(shí)間和消耗,文件1:用于構(gòu)建assay信息
seqinfo(bamfiles[1]) #查看信息
sampleTable信息:
2.2 Defining gene models( 最主要的一步是構(gòu)建TxDb 對(duì)象)
Next, we need to read in the gene model that will be used for counting reads/fragments.Here,we want to make a list of exons grouped by gene for counting read/fragments.
如果是自己的數(shù)據(jù),首先下載gtf/gff3文件,并用GenomicFeatures 包中的makeTxDbFromGFF()函數(shù)構(gòu)建TxDb object。(TxDb對(duì)象是一個(gè)包含frange-based objects, such as exons, transcripts, and genes)
library("GenomicFeatures")
gtffile <- file.path(dir,"Homo_sapiens.GRCh37.75_subset.gtf") #airway測(cè)試自帶的gtf文件,自己的數(shù)據(jù)需要下載gtf文件
(txdb <- makeTxDbFromGFF(gtffile, format="gtf", circ_seqs=character())) #讀取gtf文件構(gòu)建TxDb object對(duì)象
(ebg <- exonsBy(txdb, by="gene")) #構(gòu)建外顯子groped by gene ,文件2:rowRanges()信息
2.3 Read counting step
上述準(zhǔn)備工作做完之后,統(tǒng)計(jì)表達(dá)量非常簡(jiǎn)單。 利用GenomicAlignments包中的summarizeOverlaps()函數(shù),返回一個(gè)包含多種信息的SummarizedExperiment 對(duì)象。同時(shí)由于數(shù)據(jù)量大,對(duì)于一個(gè)包含30 million aligned reads的單個(gè)文件,summarizeOverlaps()函數(shù)至少需要30 minutes。為了節(jié)省時(shí)間可以利用BiocParallel包中的register()函數(shù)進(jìn)行多核運(yùn)行。
register(SerialParam()) #設(shè)置多核運(yùn)行
se <- summarizeOverlaps(features=ebg, reads=bamfiles, #創(chuàng)建SummarizedExperiment object with counts,上面文件1和2
mode="Union", # describes what kind of read overlaps will be counted.
singleEnd=FALSE, ######表示是pair-end
ignore.strand=TRUE, ##是否是鏈特異性 strand-specific
fragments=TRUE ) ##specify if unpaired reads should be counted
save(se,file='count_matrix.Rdata')
2.4 SummarizedExperiment
上述產(chǎn)生的SummarizedExperiment 對(duì)象,由3部分組成:粉色框是表達(dá)矩陣(matrix of counts,)儲(chǔ)存在assay中,如果是多個(gè)矩陣可以儲(chǔ)存在assays當(dāng)中。淺藍(lán)色框是rowRanges,包含genomic ranges的信息,及gtf中的gene model信息。綠色框包含樣本信息(information about the samples)。
dim(se) ## [1] 20 8 assayNames(se) ## "counts" head(assay(se), 3) #查看表達(dá)矩陣 colSums(assay(se)) rowRanges(se) #rowRanges信息 str(metadata(rowRanges(se))) #metadata是構(gòu)建基因模型的gtf文件信息 colData(se) #查看sample信息,因?yàn)檫€沒(méi)有制作所以目前是空的 (colData(se) <- DataFrame(sampleTable)) #將樣本實(shí)驗(yàn)設(shè)計(jì)信息傳遞給colData(se),文件3:colData信息。涉及到實(shí)驗(yàn)設(shè)計(jì)
-------------------------------------------------------------------------------------------------華麗麗的分割線----------------------------------------------------------------------
At this point, we have counted the fragments which overlap the genes in the gene model we specified。這里統(tǒng)計(jì)完之后,我們可以用一系列的包,進(jìn)行差異分析,包括(limma,edgeR,DESeq2等)。下面我們將用DESeq2。在DESeq2中最主要的是構(gòu)建DESeqDataSet 對(duì)象,可以用上述 SummarizedExperiment objects轉(zhuǎn)化為 DESeqDataSet objects。兩者之間的區(qū)別是:
1)SummarizedExperiment 對(duì)象中的assay slot 被用 DESeq2中的counts accessor function代替, 且 DESeqDataSet class 強(qiáng)制表達(dá)矩陣中的values值為非負(fù)整數(shù)。
2)DESeqDataSet 包含design formula(設(shè)計(jì)公式)。實(shí)驗(yàn)設(shè)計(jì)在最初的分析中被指定,將 告訴DESeq2在分析中如何處理samples。
A second difference is that the DESeqDataSet has an associated design formula. The experimental design is specified at the beginning of the analysis, as it will inform many of the DESeq2 functions how to treat the samples in the analysis (one exception is the size factor estimation, i.e., the adjustment for differing library sizes, which does not depend on
the design formula). The design formula tells which columns in the sample information table (colData) specify the experimental design and how these factors should be used in the analysis
最簡(jiǎn)單的實(shí)驗(yàn)設(shè)計(jì)是 ~ condition,這里condition是 column in colData(dds),指定樣本所在的分組。在airway experiment,這里我們指定實(shí)驗(yàn)設(shè)計(jì)為~ cell + dex表示希望檢測(cè)對(duì)于不同cell line(cell). 進(jìn)行dexamethasonecontrolling (dex,氟美松處理) 的效果。
----------------------------------------------------------------------------------------------------分割完畢-----------------------------------------------------------------------------
3、The DESeqDataSet, sample information, and the design formula
有兩種方法構(gòu)建DESeqDataSet 對(duì)象:
3.1 Starting from SummarizedExperiment( 直接將SummarizedExperiment轉(zhuǎn)化為DESeqDataSet )
以airway自帶數(shù)據(jù)為例:
data("airway")
se <- airway
se$dex <- relevel(se$dex, "untrt") #specify that untrt is the reference level for the dex variable
se$dex
round( colSums(assay(se)) / 1e6, 1 ) #count that uniquely aligned to the genes 。round表示四舍五入,1表示小數(shù)點(diǎn)后一位
colData(se) #因?yàn)閍irway數(shù)據(jù)集制作好了這個(gè)矩陣,自己的數(shù)據(jù)需要用前面的方法制作這個(gè)。
library("DESeq2")
dds <- DESeqDataSet(se, design = ~ cell + dex) #構(gòu)建DESeqDataSet對(duì)象
3.2 Starting from count matrices( 如果你之前沒(méi)有準(zhǔn)備SummarizedExperiment, 只有count matrix 和 sample information ,利用這兩個(gè)文件及DESeqDataSetFromMatrix()函數(shù)構(gòu)建 DESeqDataSet對(duì)象 )
countdata <- assay(se) ##這里從前面的assay中取出表達(dá)矩陣用于演示,countdata: a table with the fragment counts
head(countdata, 3)
coldata <- colData(se) ##這里從前面的assay中取出樣本信息用于演示,coldata: a table with information about the samples
(ddsMat <- DESeqDataSetFromMatrix(countData = countdata,
colData = coldata,
design = ~ cell + dex) #從countdata表達(dá)矩陣和coldata樣本信息構(gòu)建DESeqDataSet對(duì)象
4)Exploratory analysis and visualization(數(shù)據(jù)探索分析及可視化)
4.1 Pre-filtering the dataset
nrow(dds) #64102 dds <- dds[ rowSums(counts(dds)) > 1, ]#remove rows of the DESeqDataSet that have no counts, or only a single count across all samples: nrow(dds) #29391
4.2 The rlog transformation
需要rlog原因:
When the expected amount of variance is approximately the same across different mean values, the data is said to be homoskedastic(同方差). For RRNA-seq raw counts, however, the variance grows with the mean. For example, if one performs PCA directly on a matrix of size-factor-normalized read counts, the result typically depends only on the few most strongly expressed genes because they show the largest absolute differences between samples. A simple and often used strategy to avoid this is to take the logarithm of the normalized count values plus a small pseudocount; however, now the genes with the very lowest counts will tend to dominate the results because, due to the strong Poisson noise inherent to small count values, and the fact that the logarithm amplifies differences for the smallest values, these low count genes will show the strongest relative differences between samples.As a solution, DESeq2 offers transformations for count data that stabilize the variance across the mean. One such transformation is the regularized-logarithm transformation or rlog2. For genes with high counts, the rlog transformation will give similar result to the ordinary log2 transformation of normalized counts. For genes with lower counts, however, the values are shrunken towards the genes’ averages across all samples. Using an empirical Bayesian prior on inter-sample differences in the form of a ridge penalty, the rlog-transformed data then becomes approximately homoskedastic, and can be used directly for computing distances between samples and making PCA plots. Another transformation, the variance stabilizing transformation17, is discussed alongside the rlog in the DESeq2 vignette.
rlog()函數(shù)返回一個(gè)SummarizedExperiment對(duì)象,其中 assay slot包含 rlog-轉(zhuǎn)換后的值。
rld <- rlog(dds, blind=FALSE) head(assay(rld), 3) par( mfrow = c( 1, 2 ) ) dds <- estimateSizeFactors(dds) #估計(jì)size factors to account for sequencing depth plot(log2(counts(dds, normalized=TRUE)[,1:2] + 1),pch=16, cex=0.3) #then specify normalized=TRUE plot(assay(rld)[,1:2],pch=16, cex=0.3) #Sequencing depth correction is done automatically for the rlog method (and for varianceStabilizingTransformation).
普通的log2(左)及rlog2(右)之間的比較
4.3 Sample distances
評(píng)估樣本間關(guān)系(即總體相似度),哪些樣本更相近,哪些更遠(yuǎn),是否符合實(shí)驗(yàn)設(shè)計(jì)?
方法一:用歐式距離(Euclidean distance ),用rlog-transformed data(即normalized之后的矩陣)
sampleDists <- dist( t( assay(rld) ) ) #歐氏距離,為確保所有基因大致相同的contribution用rlog-transformed data sampleDists
樣本間關(guān)系,歐式距離。
library("pheatmap") ##用熱圖展示樣品間關(guān)系
library("RColorBrewer")
sampleDistMatrix <- as.matrix( sampleDists )
rownames(sampleDistMatrix) <- paste( rld$dex, rld$cell, sep="-" )
colnames(sampleDistMatrix) <- NULL
colors <- colorRampPalette( rev(brewer.pal(9, "Blues")) )(255)
pheatmap(sampleDistMatrix,
clustering_distance_rows=sampleDists,
clustering_distance_cols=sampleDists,
col=colors)
方法二:用PoiClaClu 包計(jì)算泊松距離(Poisson Distance),必須是原始表達(dá)矩陣。The PoissonDistance function takes the original count matrix
(not normalized) with samples as rows instead of columns, so we need to transpose the counts in dds.
library("PoiClaClu")
poisd <- PoissonDistance(t(counts(dds))) ##對(duì)數(shù)據(jù)轉(zhuǎn)置
samplePoisDistMatrix <- as.matrix( poisd$dd ) #計(jì)算樣品間關(guān)系
rownames(samplePoisDistMatrix) <- paste( rld$dex, rld$cell, sep="-" )
colnames(samplePoisDistMatrix) <- NULL
pheatmap(samplePoisDistMatrix,
clustering_distance_rows=poisd$dd,
clustering_distance_cols=poisd$dd,
col=colors)
4.4 PCA plot(另一個(gè)展示樣品間關(guān)系的圖形)
一種方法是用DESeq2包中的plotPCA()函數(shù)
plotPCA(rld, intgroup = c("dex", "cell")) #DESeq2包plotPCA()函數(shù)
另一種方法是用ggplot2.需要調(diào)用 plotPCA()返回用于畫(huà)圖的數(shù)據(jù)
(data <- plotPCA(rld, intgroup = c( "dex", "cell"), returnData=TRUE)) #調(diào)用plotPCA生成要畫(huà)圖的data
percentVar <- round(100 * attr(data, "percentVar")) #PC1,pC2
library("ggplot2") #畫(huà)圖
ggplot(data, aes(PC1, PC2, color=dex, shape=cell)) + geom_point(size=3) +
xlab(paste0("PC1: ",percentVar[1],"% variance")) +
ylab(paste0("PC2: ",percentVar[2],"% variance"))
data:
4.4 MDS plot(另一種PCA圖,multidimensional scaling (MDS) function )
在沒(méi)有表達(dá)矩陣值,但只有一個(gè)距離矩陣的情況下使用。This is useful when we don’t have a matrix of data, but only a matrix of distances. Here we compute the MDS for the distances calculated from the rlog transformed counts and plot these。
##用歸一化后的數(shù)據(jù)產(chǎn)生的距離矩陣(calculated from the rlog transformed) mdsData <- data.frame(cmdscale(sampleDistMatrix)) mds <- cbind(mdsData, as.data.frame(colData(rld))) ggplot(mds, aes(X1,X2,color=dex,shape=cell)) + geom_point(size=3) ###用原始數(shù)據(jù)產(chǎn)生的泊松距離矩陣(calculated from the raw) mdsPoisData <- data.frame(cmdscale(samplePoisDistMatrix)) mdsPois <- cbind(mdsPoisData, as.data.frame(colData(dds))) ggplot(mdsPois, aes(X1,X2,color=dex,shape=cell)) + geom_point(size=3)
5)Differential expression analysis(差異分析)
5,1)Running the differential expression pipeline
用DESeq()函數(shù)對(duì)原始表達(dá)矩陣進(jìn)行差異表達(dá)分析(run the differential expression pipeline on the raw counts with a single call to the function DESeq)
dds <- DESeq(dds) #差異表達(dá)分析
5.2)Building the results table
利用result函數(shù)(),將返回一個(gè)數(shù)據(jù)框?qū)ο?包含有 metadata information(每一列的意義)。
(res <- results(dds)) #產(chǎn)生log2 fold change 及p值 mcols(res, use.names=TRUE) #查看meta信息 summary(res) #產(chǎn)看其他信息
每列的含義:
baseMean:is a just the average of the normalized count values, dividing by size factors, taken over all samples in the DESeqDataSet
log2FoldChange: the effect size estimate. It tells us how much the gene’s expression seems to have changed due to treatment with dexamethasone in comparison to untreated samples
lfcSE:the standard error estimate for the log2 fold change estimate,(the effect size estimate has an uncertainty associated with it,)
p value:statistical test , the result of this test is reported as a p value. Remember that a p value indicates the probability that a fold change as strong as the observed one, or even stronger, would be seen under the situation described by the null hypothesis.
padj:adjusted p-value
2種更嚴(yán)格的方法去篩選顯著差異基因:
降低false discovery rate threshold (the threshold on padj in the results table)
提升log2 fold change threshold(from 0 using the lfc Threshold argument of results)
#如果要降低false discovery rate threshold,將要設(shè)置的值傳遞給results(),使用該閾值進(jìn)行獨(dú)立的過(guò)濾: res.05 <- results(dds, alpha=.05) #設(shè)置閾值FDR=0.05 table(res.05$padj < .05) #產(chǎn)看結(jié)果 #如果提升log2 fold change threshold show more substantial changes due to treatment, we simply supply a value on the log2 scale. resLFC1 <- results(dds, lfcThreshold=1) table(resLFC1$padj < 0.1)
Sometimes a subset of the p values in res will be NA (“not available”). This is DESeq’s way of reporting that all counts for this gene were zero, and hence no test was applied. In addition, p values can be assigned NA if the gene was excluded from analysis because it contained an extreme count outlier. For more information, see the outlier detection section of the DESeq2 vignette.
5.2)Other comparisons
可以利用result函數(shù)中的contrast參數(shù)來(lái)選擇任何2組數(shù)據(jù)進(jìn)行比較
results(dds, contrast=c("cell", "N061011", "N61311"))
5.3)Multiple testing(詳細(xì)看文檔理解FDR)
sum(res$pvalue < 0.05, na.rm=TRUE) sum(!is.na(res$pvalue)) sum(res$padj < 0.1, na.rm=TRUE) resSig <- subset(res, padj < 0.1) head(resSig[ order(resSig$log2FoldChange), ]) head(resSig[ order(resSig$log2FoldChange, decreasing=TRUE), ])
5.4)Plotting results
用 plotCounts()函數(shù)查看特定基因的表達(dá)量:需要三個(gè)參數(shù),DESeqDataSet, gene name, and the group over which to plot the counts (Figure 8).
topGene <- rownames(res)[which.min(res$padj)]
plotCounts(dds, gene=topGene, intgroup=c("dex"))
也可以用ggplot2畫(huà)傳統(tǒng)的散點(diǎn)圖
data <- plotCounts(dds, gene=topGene, intgroup=c("dex","cell"), returnData=TRUE)
ggplot(data, aes(x=dex, y=count, color=cell)) +
scale_y_log10() +
geom_point(position=position_jitter(width=.1,height=0), size=3)
ggplot(data, aes(x=dex, y=count, fill=dex)) +
scale_y_log10() +
geom_dotplot(binaxis="y", stackdir="center")
ggplot(data, aes(x=dex, y=count, color=cell, group=cell)) +
scale_y_log10() + geom_point(size=3) + geom_line()
An MA-plot provides a useful overview for an experiment with a two-group comparison:
plotMA(res, ylim=c(-5,5))
可以看到DESeq通過(guò)統(tǒng)計(jì)學(xué)技術(shù),narrowing垂直方向的離散(左部分)
make an MA-plot for the results table in which we raised the log2 fold change threshold,并標(biāo)記除特定的一個(gè)基因:
plotMA(resLFC1, ylim=c(-5,5))
topGene <- rownames(resLFC1)[which.min(resLFC1$padj)]
with(resLFC1[topGene, ], {
points(baseMean, log2FoldChange, col="dodgerblue", cex=2, lwd=2)
text(baseMean, log2FoldChange, topGene, pos=2, col="dodgerblue")
})
histogram of the p values:
hist(res$pvalue[res$baseMean > 1], breaks=0:20/20, col="grey50", border="white")
5.5)Plotting results
提取前20個(gè)highest variance across samples.基因進(jìn)行聚類(lèi)
library("genefilter")
topVarGenes <- head(order(rowVars(assay(rld)),decreasing=TRUE),20)
mat <- assay(rld)[ topVarGenes, ]
mat <- mat - rowMeans(mat)
df <- as.data.frame(colData(rld)[,c("cell","dex")])
pheatmap(mat, annotation_col=df)
5.6)Independent filtering
qs <- c(0, quantile(resLFC1$baseMean[resLFC1$baseMean > 0], 0:6/6))
bins <- cut(resLFC1$baseMean, qs)
levels(bins) <- paste0("~",round(signif(.5*qs[-1] + .5*qs[-length(qs)],2)))
ratios <- tapply(resLFC1$pvalue, bins, function(p) mean(p < .05, na.rm=TRUE))
barplot(ratios, xlab="mean normalized count", ylab="ratio of small p values")
5.7)Annotating and exporting results
因?yàn)檫@里只有Ensembl gene IDs,而其它的名稱(chēng)也非常有用。用 mapIds()函數(shù), 其中column 參數(shù)表示需要哪種信息, multiVals參數(shù)表示如果有多個(gè)對(duì)應(yīng)值的時(shí)候如何選擇.,這里選擇第一個(gè). 調(diào)用兩次函數(shù) add the gene symbol and Entrez ID.
library("AnnotationDbi")
library("org.Hs.eg.db")
columns(org.Hs.eg.db)
res$symbol <- mapIds(org.Hs.eg.db,
keys=row.names(res),
column="SYMBOL", #需要哪種信息
keytype="ENSEMBL",
multiVals="first") #有多個(gè)對(duì)應(yīng)值,這里選第一個(gè)
res$entrez <- mapIds(org.Hs.eg.db,
keys=row.names(res),
column="ENTREZID",
keytype="ENSEMBL",
multiVals="first")
resOrdered <- res[order(res$padj),]
head(resOrdered)
5.8)Annotating and exporting results
resOrderedDF <- as.data.frame(resOrdered)[1:100,]
write.csv(resOrderedDF, file="results.csv") #保存結(jié)果
library("ReportingTools") #生成網(wǎng)頁(yè)版結(jié)果
htmlRep <- HTMLReport(shortName="report", title="My report",
reportDirectory="./report")
publish(resOrderedDF, htmlRep)
url <- finish(htmlRep)
browseURL(url)
5.9)Plotting fold changes in genomic space
在基因組上展示差異結(jié)果
(resGR <- results(dds, lfcThreshold=1, format="GRanges"))
resGR$symbol <- mapIds(org.Hs.eg.db, names(resGR), "SYMBOL", "ENSEMBL") #加基因名
library("Gviz") #用于可視化for plotting the GRanges and associated metadata: the log fold changes due to dexam- ethasone treatment.
window <- resGR[topGene] + 1e6
strand(window) <- "*"
resGRsub <- resGR[resGR %over% window]
naOrDup <- is.na(resGRsub$symbol) | duplicated(resGRsub$symbol)
resGRsub$group <- ifelse(naOrDup, names(resGRsub), resGRsub$symbol)
sig <- factor(ifelse(resGRsub$padj < .1 & !is.na(resGRsub$padj),"sig","notsig"))
options(ucscChromosomeNames=FALSE)
g <- GenomeAxisTrack()
a <- AnnotationTrack(resGRsub, name="gene ranges", feature=sig)
d <- DataTrack(resGRsub, data="log2FoldChange", baseline=0,
type="h", name="log2 fold change", strand="+")
plotTracks(list(g,d,a), groupAnnotation="group", notsig="grey", sig="hotpink")
5.10)Removing hidden batch effects
詳細(xì)看文檔
library("sva")
dat <- counts(dds, normalized=TRUE)
idx <- rowMeans(dat) > 1
dat <- dat[idx,]
mod <- model.matrix(~ dex, colData(dds))
mod0 <- model.matrix(~ 1, colData(dds))
svseq <- svaseq(dat, mod, mod0, n.sv=2)
svseq$sv
par(mfrow=c(2,1),mar=c(3,5,3,1))
stripchart(svseq$sv[,1] ~ dds$cell,vertical=TRUE,main="SV1")
abline(h=0)
stripchart(svseq$sv[,2] ~ dds$cell,vertical=TRUE,main="SV2")
abline(h=0)
ddssva <- dds
ddssva$SV1 <- svseq$sv[,1]
ddssva$SV2 <- svseq$sv[,2]
design(ddssva) <- ~ SV1 + SV2 + dex
ddssva <- DESeq(ddssva)
5.11)Time course experiments
詳細(xì)看文檔
library("fission")
data("fission")
ddsTC <- DESeqDataSet(fission, ~ strain + minute + strain:minute)
ddsTC <- DESeq(ddsTC, test="LRT", reduced = ~ strain + minute)
resTC <- results(ddsTC)
resTC$symbol <- mcols(ddsTC)$symbol
head(resTC[order(resTC$padj),],4)
data <- plotCounts(ddsTC, which.min(resTC$padj),
intgroup=c("minute","strain"), returnData=TRUE)
ggplot(data, aes(x=minute, y=count, color=strain, group=strain)) +
geom_point() + stat_smooth(se=FALSE,method="loess") + scale_y_log10()
resultsNames(ddsTC)
res30 <- results(ddsTC, name="strainmut.minute30", test="Wald")
res30[which.min(resTC$padj),]
furthermore cluster significant genes by their profiles:
betas <- coef(ddsTC)
colnames(betas)
library("pheatmap")
topGenes <- head(order(resTC$padj),20)
mat <- betas[topGenes, -c(1,2)]
thr <- 3
mat[mat < -thr] <- -thr
mat[mat > thr] <- thr
pheatmap(mat, breaks=seq(from=-thr, to=thr, length=101),
cluster_col=FALSE)
總結(jié)
以上是生活随笔為你收集整理的airway之workflow的全部?jī)?nèi)容,希望文章能夠幫你解決所遇到的問(wèn)題。
- 上一篇: android 原生分享界面_这些技巧和
- 下一篇: 昨天看王建硕的blog,在说《胡适日记》