日韩性视频-久久久蜜桃-www中文字幕-在线中文字幕av-亚洲欧美一区二区三区四区-撸久久-香蕉视频一区-久久无码精品丰满人妻-国产高潮av-激情福利社-日韩av网址大全-国产精品久久999-日本五十路在线-性欧美在线-久久99精品波多结衣一区-男女午夜免费视频-黑人极品ⅴideos精品欧美棵-人人妻人人澡人人爽精品欧美一区-日韩一区在线看-欧美a级在线免费观看

歡迎訪問 生活随笔!

生活随笔

當前位置: 首頁 > 编程资源 > 编程问答 >内容正文

编程问答

单细胞数据初步处理 | drop-seq | QC | 质控 | 正则化 normalization

發布時間:2025/3/19 编程问答 27 豆豆
生活随笔 收集整理的這篇文章主要介紹了 单细胞数据初步处理 | drop-seq | QC | 质控 | 正则化 normalization 小編覺得挺不錯的,現在分享給大家,幫大家做個參考.

比對

The raw Drop-seq data was processed with?the standard pipeline (Drop-seq tools version 1.12 from McCarroll laboratory).?Reads were aligned to the ENSEMBL release 84 Mus musculus genome.

10x?Genomics data was processed using the same pipeline as for Drop-seq data,?adjusting the barcode locations accordingly?

我還沒有深入接觸10x和drop-seq的數據,目前的10x數據都是用官網cellranger跑出來的。

質控

We selected cells for downstream processing in each Drop-seq?run, using the quality control metrics output by the Drop-seq tools package9, as?well as metrics derived from the UMI matrix.

1) We first removed cells with a low?number (<700) of unique detected genes. From the remaining cells, we filtered?additional outliers.

2) We removed cells for which the overall alignment rate was?less than the mean minus three standard deviations.

3) We removed cells for which?the total number of reads (after log10 transformation) was not within three standard?deviations of the mean.

4) We removed cells for which the total number of unique?molecules (UMIs, after log10 transformation) was not within three standard deviations of the mean.

5) We removed cells for which the transcriptomic alignment?rate (defined by PCT_USABLE_BASES) was not within three standard deviations?of the mean.

6) We removed cells that showed an unusually high or low number of?UMIs given their number of reads by fitting a loess curve (span= 0.5, degree= 2)?to the number of UMIs with number of reads as predictor (both after log10 transformation). Cells with a residual more than three standard deviations away from?the mean were removed.

7) With the same criteria, we removed cells that showed?an unusually high or low number of genes given their number of UMIs. Of these?filter steps, step 1 removed the majority of cells.

Steps 2 to 7 removed only a small?number of additional cells from each eminence (2% to 4%), and these cells did not?exhibit unique or biologically informative patterns of gene expression.

1. 過濾掉基因數量太少的細胞;

2. 過濾基因組比對太差的細胞;

3. 過濾掉總reads數太少的細胞;

4. 過濾掉UMI太少的細胞;

5. 過濾掉轉錄本比對太少的細胞;

6. 根據統計分析,過濾reads過多或過少的細胞;

7.?根據統計分析,過濾UMI過低或過高的細胞;

注:連過濾都有點統計的門檻,其實也簡單,應該是默認為正態分布,去掉了左右極端值。

還有一個就是簡單的擬合回歸,LOESS Curve Fitting (Local Polynomial Regression)

How to fit a smooth curve to my data in R?

?

正則化

The raw data per Drop-seq run is a UMI count matrix with?genes as rows and cells as columns. The values represent the number of UMIs that?were detected. The aim of normalization is to make these numbers comparable?between cells by removing the effect of sequencing depth and biological sources of?heterogeneity that may confound the signal of interest, in our case cell cycle stage.

目前有很多正則化的方法,但是作者還是自己開發了一個。

正則化就是去掉一些影響因素,使得我們的數據之間可以相互比較。這里就提到了兩個最主要的因素:測序深度和細胞周期。

?

A common approach to correct for sequencing depth is to create a new normalized expression matrix x with (see Fig), in which ci,j is the molecule?count of gene i in cell j and mj is the sum of all molecule counts for cell j. This?approach assumes that ci,j increases linearly with mj, which is true only when the?set of genes detected in each cell is roughly the same.

可以看到常規的正則化方法是不適合的,

However, for Drop-seq, in?which the number of UMIs is low per cell compared to the number of genes?present, the set of genes detected per cell can be quite different. Hence, we?normalize the expression of each gene separately by modelling the UMI counts as?coming from a generalized linear model with negative binomial distribution, the?mean of which can be dependent on technical factors related to sequencing depth.?Specifically, for every gene we model the expected value of UMI counts as a function of the total number of reads assigned to that cell, and the number of UMIs per?detected gene (sum of UMI divided by number of unique detected genes).

這個就有些門檻了,用了廣義線性回歸模型來做正則化。

To solve?the regression problem, we use a generalized linear model (glm function of base?R package) with a regularized overdispersion parameter theta. Regularizing theta?helps us to avoid overfitting which could occur for genes whose variability is mostly?driven by biological processes rather than sampling noise and dropout events. To?learn a regularized theta for every gene, we perform the following procedure.

1) For every gene, obtain an empirical theta using the maximum likelihood?model (theta.ml function of the MASS R package) and the estimated mean vector?that is obtained by a generalized linear model with Poisson error distribution.?

2) Fit a line (loess, span = 0.33, degree = 2) through the variance–mean?UMI count relationship (both log10 transformed) and predict regularized theta?using the fit. The relationship between variance and theta and mean is given by?variance= mean + (mean2/theta).
Normalized expression is then defined as the Pearson residual of the regression?model, which can be interpreted as the number of standard deviations by which?an observed UMI count was higher or lower than its expected value. Unless stated?otherwise, we clip expression to the range [-30, 30] to prevent outliers from?dominating downstream analyses.
?

好的是,代碼人家都給出來了,你去跑跑,就能猜出大致的意思。

# for normalization # regularized overdispersion parameter theta. Regularizing theta helps us to avoid overfitting which could occur for genes whose variability is mostly driven by biological processes rather than sampling noise and dropout events. # divide all genes into 64 bins theta.reg <- function(cm, regressors, min.theta=0.01, bins=64) {b.id <- (1:nrow(cm)) %% max(1, bins, na.rm=TRUE) + 1cat(sprintf('get regularized theta estimate for %d genes and %d cells\n', nrow(cm), ncol(cm)))cat(sprintf('processing %d bins with ca %d genes in each\n', bins, round(nrow(cm)/bins, 0)))theta.estimate <- rep(NA, nrow(cm))# For every gene, obtain an empirical theta using the maximum likelihood model (theta.ml function of the MASS R package)for (bin in sort(unique(b.id))) {sel.g <- which(b.id == bin)bin.theta.estimate <- unlist(mclapply(sel.g, function(i) {# estimated mean vector that is obtained by a generalized linear model with Poisson error distributionas.numeric(theta.ml(cm[i, ], glm(cm[i, ] ~ ., data = regressors, family=poisson)$fitted))}), use.names = FALSE)theta.estimate[sel.g] <- bin.theta.estimatecat(sprintf('%d ', bin))}cat('done\n')raw.mean <- apply(cm, 1, mean)log.raw.mean <- log10(raw.mean)var.estimate <- raw.mean + raw.mean^2/theta.estimate# Fit a line (loess, span = 0.33, degree = 2) through the variance–mean UMI count relationship (both log10 transformed)fit <- loess(log10(var.estimate) ~ log.raw.mean, span=0.33)# predict regularized theta using the fit. The relationship between variance and theta and mean is given by variance= mean + (mean2/theta)theta.fit <- raw.mean^2 / (10^fit$fitted - raw.mean)to.fix <- theta.fit <= min.theta | is.infinite(theta.fit)if (any(to.fix)) {cat('Fitted theta below', min.theta, 'for', sum(to.fix), 'genes, setting them to', min.theta, '\n')theta.fit[to.fix] <- min.theta}names(theta.fit) <- rownames(cm)return(theta.fit) }nb.residuals.glm <- function(y, regression.mat, fitted.theta, gene) {fit <- 0try(fit <- glm(y ~ ., data = regression.mat, family=negative.binomial(theta=fitted.theta)), silent=TRUE)if (class(fit)[1] == 'numeric') {message(sprintf('glm and family=negative.binomial(theta=%f) failed for gene %s; falling back to scale(log10(y+1))', fitted.theta, gene))return(scale(log10(y+1))[, 1])}return(residuals(fit, type='pearson')) }## Main function norm.nb.reg <- function(cm, regressors, min.theta=0.01, bins=64, theta.fit=NA, pr.th=NA, save.theta.fit=c()) {cat('Normalizing data using regularized NB regression\n')cat('explanatory variables:', colnames(regressors), '\n')if (any(is.na(theta.fit))) {theta.fit <- theta.reg(cm, regressors, min.theta, bins)if (is.character(save.theta.fit)) {save(theta.fit, file=save.theta.fit)}}b.id <- (1:nrow(cm)) %% max(1, bins, na.rm=TRUE) + 1cat('Running NB regression\n')res <- matrix(NA, nrow(cm), ncol(cm), dimnames=dimnames(cm))for (bin in sort(unique(b.id))) {sel.g <- rownames(cm)[b.id == bin]expr.lst <- mclapply(sel.g, function(gene) nb.residuals.glm(cm[gene, ], regressors, theta.fit[gene], gene), mc.preschedule = TRUE)# Normalized expression is then defined as the Pearson residual of the regression model, which can be interpreted as the number of standard deviations by which an observed UMI count was higher or lower than its expected value.res[sel.g, ] <- do.call(rbind, expr.lst)cat(sprintf('%d ', bin))}cat('done\n')# clip expression to the range [-30, 30] to prevent outliers from dominating downstream analysesif (!any(is.na(pr.th))) {res[res > pr.th] <- pr.thres[res < -pr.th] <- -pr.th}attr(res, 'theta.fit') <- theta.fitreturn(res) }

  

  

總結

以上是生活随笔為你收集整理的单细胞数据初步处理 | drop-seq | QC | 质控 | 正则化 normalization的全部內容,希望文章能夠幫你解決所遇到的問題。

如果覺得生活随笔網站內容還不錯,歡迎將生活随笔推薦給好友。

主站蜘蛛池模板: 九色视频91 | 美女毛片在线观看 | 久久久少妇 | 欧美毛片免费看 | xxxxx国产 | 日韩免费精品视频 | 嫩草视频网站 | 在线不卡视频 | 国产精品久久久久久免费 | 亚洲av毛片一区二二区三三区 | 奴性白洁会所调教 | 求欧美精品网址 | www国产一区 | 无码人妻精品一区二区三区在线 | 免费萌白酱国产一区二区三区 | 亚洲无限观看 | 久草最新 | 精品少妇一二三区 | 亚州一二区 | 在线视频免费播放 | 中文在线视频观看 | 人人干人人搞 | 日本毛片在线观看 | 午夜三级福利 | 美女一二区 | 精品亚洲一区二区三区 | 久草精品在线观看视频 | 国产美女无遮挡网站 | 国产91在线看 | 色老汉av一区二区三区 | 日韩在线一级片 | 日本福利视频导航 | av观看网 | 精品无码一区二区三区爱欲 | bangbros性欧美18 | 国产精品成久久久久三级 | 日韩精品乱码久久久久久 | 久久偷看各类wc女厕嘘嘘偷窃 | 91在线免费视频 | 成人自拍视频在线观看 | 久久成人免费视频 | 国产永久免费无遮挡 | 微拍福利一区二区 | 狠狠操天天干 | 狠狠躁夜夜躁人 | 国产青草 | 午夜精品福利一区二区蜜股av | 国产精品国产三级国产a | 色四虎 | 日韩免费影院 | 国产精品后入内射日本在线观看 | 91麻豆产精品久久久久久夏晴子 | 四虎永久免费地址 | 精品人妻无码一区二区三区换脸 | 黄色一级一片 | 激情免费视频 | 午夜免费在线 | jizz日本在线播放 | 日本色呦呦| 亚洲精品h | 亚洲一级特黄 | 青青草91久久久久久久久 | 91免费福利 | 欧美影院一区二区三区 | 亚洲二三区 | 性调教学院高h学校 | 四虎永久免费地址 | 亚洲精品激情视频 | 国产伦精品一区二区三区视频1 | 免费av免费观看 | 女王脚交玉足榨精调教 | 在线观看免费看片 | 办公室荡乳欲伦交换bd电影 | 波多野结衣家庭主妇 | 亚洲综合大片69999 | 婷婷中文网 | 狠狠综合一区 | 美女一区 | 国产精品久久视频 | 少妇精品无码一区二区免费视频 | 色噜噜在线 | 久久精品6| 影音先锋中文字幕在线 | 欧美日韩电影一区 | 日本少妇三级 | 五月婷久久 | aaa黄色 | 国产欧美综合视频 | 小情侣高清国产在线播放 | 天天做天天爱天天操 | 国产在线一级 | 可以直接看的毛片 | 无码国精品一区二区免费蜜桃 | 一级空姐毛片 | 成人免费毛片高清视频 | av成人在线观看 | 日本特级黄色大片 | 午夜免费网 | 91好色先生 |