當前位置：首頁 > 编程资源 > 编程问答 >内容正文

编程问答

Seurat的单细胞免疫组库分析来了！

發布時間：2025/3/15 编程问答 15 豆豆

生活随笔收集整理的這篇文章主要介紹了 Seurat的单细胞免疫组库分析来了！小編覺得挺不錯的,現在分享給大家,幫大家做個參考.

使用Seurat進行單細胞VDJ免疫分析

NGS系列文章包括NGS基礎、轉錄組分析?（Nature重磅綜述|關于RNA-seq你想知道的全在這）、ChIP-seq分析?（ChIP-seq基本分析流程）、單細胞測序分析?(重磅綜述：三萬字長文讀懂單細胞RNA測序分析的最佳實踐教程（原理、代碼和評述）)、DNA甲基化分析、重測序分析、GEO數據挖掘（典型醫學設計實驗GEO數據分析 (step-by-step) - Limma差異分析、火山圖、功能富集）等內容。

小劇場：

我：老板，你聽說沒有樓上做的單細胞實驗加了VDJ分析？
老板：VCD分析？哦，，我早就聽過了！
我：老板，是VDJ分析？
老板：VDJ分析，哦對對就是VDJ分析，，，我早就說咱們也應該做了，你看看，又遲了一步，都怪你不提醒我！
我瞟了一眼她桌子上的淘寶頁面（廉價燈芯褲），默默走開了。。。

其實我在介紹clonotypr (令我驚奇的是，當我把推送推出去的當天，我親愛的作者就把該包從github撤了下來啊！)時說明過VDJ免疫分析對免疫及抗體產生的重要意義，這也是為什么現在許多做新冠單細胞分析的都會使用5’端測序聯用VDJ測序分析。

1. 為什么要進行單細胞免疫組庫的分析？

應用方向一：探索腫瘤免疫微環境，輔助免疫治療。

每個人都擁有一個自己的適應性免疫組庫，TCR和BCR通過基因重組和體細胞突變取得多樣性，使得我們身體可以識別和抵御各種內部和外部的入侵者。而腫瘤的發生往往躲避了人體T淋巴細胞而產生、增殖和轉移。

使用10X Genomics ChromiumTM Single Cell Immune Profiling Solution可以捕捉腫瘤發生時的免疫微環境變化，尋找免疫治療的靶點，從而輔助免疫治療更好地抗擊腫瘤。

應用方向二：探索自身免疫性疾病和炎癥性疾病發生機制，輔助疫苗的研究

自身免疫性疾病發生起始和發展的中心環節被認為是抗原特異性T細胞激活導致的，使用10X Genomics ChromiumTM Single Cell Immune Profiling Solution，可以解析自身免疫性疾病的發病機制，從而為疾病的診療提供依據。

應用方向三：移植和免疫重建

器官或者骨髓移植時，經常會誘發宿主的排斥反應，從而發生慢性移植抗宿主病。同種異體反應隨機分布在整個T細胞組庫的交叉反應，因此延遲T細胞恢復和限制的T細胞受體多樣性與異體移植后感染和疾病復發風險增加相關。

而我比較注意的是在疫苗接種前后BCR/TCR CDR3免疫組庫的分析，最近medRxiv上發表的有關新冠的文獻Immune Cell Profiling of COVID-19 Patients in the recovery stage by Single-cell sequencing中對不同BCR/TCR的VDJ重排進行分析，揭示了針對新冠特異的克隆擴增。

2. 免疫組庫主要包括哪幾個方面？

T淋巴細胞（T cell）和B淋巴細胞（B cell）主要負責適應性免疫應答，其抗原識別主要依賴于T細胞受體（T cell recptor, TCR）和B細胞受體（B cell recptor, BCR），這兩類細胞表面分子的共同特點是其多樣性，可以識別多種多樣的抗原分子。BCR的輕鏈和TCRβ鏈由V、D、J、C四個基因片段組成，BCR的重鏈和TCRα鏈由V、J、C三個基因片段組成，這些基因片段在遺傳過程中發生重組、重排，組合成不同的形式，保證了受體多樣性。其中變化最大的就是CDR3區。

3. 10× Genomics VDJ測序進行cellranger后的輸出形式是什么樣的？

Outputs: - Run summary HTML: /home/jdoe/runs/sample345/outs/web_summary.html - Run summary CSV: /home/jdoe/runs/sample345/outs/metrics_summary.csv - All-contig FASTA: /home/jdoe/runs/sample345/outs/all_contig.fasta - All-contig FASTA index: /home/jdoe/runs/sample345/outs/all_contig.fasta.fai - All-contig FASTQ: /home/jdoe/runs/sample345/outs/all_contig.fastq - Read-contig alignments: /home/jdoe/runs/sample345/outs/all_contig.bam - Read-contig alignment index: /home/jdoe/runs/sample345/outs/all_contig.bam.bai - All contig annotations (JSON): /home/jdoe/runs/sample345/outs/all_contig_annotations.json - All contig annotations (BED): /home/jdoe/runs/sample345/outs/all_contig_annotations.bed - All contig annotations (CSV): /home/jdoe/runs/sample345/outs/all_contig_annotations.csv - Filtered contig sequences FASTA: /home/jdoe/runs/sample345/outs/filtered_contig.fasta - Filtered contig sequences FASTQ: /home/jdoe/runs/sample345/outs/filtered_contig.fastq - Filtered contigs (CSV): /home/jdoe/runs/sample345/outs/filtered_contig_annotations.csv - Clonotype consensus FASTA: /home/jdoe/runs/sample345/outs/consensus.fasta - Clonotype consensus FASTA index: /home/jdoe/runs/sample345/outs/consensus.fasta.fai - Clonotype consensus FASTQ: /home/jdoe/runs/sample345/outs/consensus.fastq - Concatenated reference sequences: /home/jdoe/runs/sample345/outs/concat_ref.fasta - Concatenated reference index: /home/jdoe/runs/sample345/outs/concat_ref.fasta.fai - Contig-consensus alignments: /home/jdoe/runs/sample345/outs/consensus.bam - Contig-consensus alignment index: /home/jdoe/runs/sample345/outs/consensus.bam.bai - Contig-reference alignments: /home/jdoe/runs/sample345/outs/concat_ref.bam - Contig-reference alignment index: /home/jdoe/runs/sample345/outs/concat_ref.bam.bai - Clonotype consensus annotations (JSON): /home/jdoe/runs/sample345/outs/consensus_annotations.json - Clonotype consensus annotations (CSV): /home/jdoe/runs/sample345/outs/consensus_annotations.csv - Clonotype info: /home/jdoe/runs/sample345/outs/clonotypes.csv - Barcodes that are declared to be targeted cells: /home/jdoe/runs/sample345/out/cell_barcodes.json - Loupe V(D)J Browser file: /home/jdoe/runs/sample345/outs/vloupe.vloupe

首先我們來看看web.html對整個測序質量的評估：

我們看到在Enrichment中reads映射到VDJ基因的比例為80.7%，其中TRA/TRB代表TCR α/β鏈，map到TRA的比例為24.4%，map到TRB的比例為56%。當然，后面也會有蠻多指標的，比如VDJ注釋，VDJ質量及表達等。。。

當然我們也會有很多表格，其中最重要的表格為contig_annotation和clonotype：

contig_annotation(BCR示例)

上面表格中的IGH和IGK/IGL代表BCR H和BCR L鏈。看到這個表格，我第一反應其實是為什么D區基因（d_gene）多數均為None，主要原因還是D區通常較短又突變較多，因技術限制而常常捕捉不到。數據中也提供了CDR3的蛋白序列和核苷酸序列。

clonotype(TCR示例)

從以上數據可以看出，有部分克隆是由單鏈決定的。

那么如何將VDJ的克隆表型和scRNA-seq結合起來呢？其實大佬已經回答了這個問題：

add_clonotype <- function(tcr_location, seurat_obj){tcr <- read.csv(paste(tcr_folder,"filtered_contig_annotations.csv", sep=""))# Remove the -1 at the end of each barcode.# Subsets so only the first line of each barcode is kept,# as each entry for given barcode will have same clonotype.tcr$barcode <- gsub("-1", "", tcr$barcode)tcr <- tcr[!duplicated(tcr$barcode), ]# Only keep the barcode and clonotype columns.# We'll get additional clonotype info from the clonotype table.tcr <- tcr[,c("barcode", "raw_clonotype_id")]names(tcr)[names(tcr) == "raw_clonotype_id"] <- "clonotype_id"# Clonotype-centric info.clono <- read.csv(paste(tcr_folder,"clonotypes.csv", sep=""))# Slap the AA sequences onto our original table by clonotype_id.tcr <- merge(tcr, clono[, c("clonotype_id", "cdr3s_aa")])# Reorder so barcodes are first column and set them as rownames.tcr <- tcr[, c(2,1,3)]rownames(tcr) <- tcr[,1]tcr[,1] <- NULL# Add to the Seurat object's metadata.clono_seurat <- AddMetaData(object=seurat_obj, metadata=tcr)return(clono_seurat)}

怎么用呢？舉個栗子吧：

數據下載

download.file("https://bioshare.bioinformatics.ucdavis.edu/bioshare/download/iimg5mz77whzzqc/vdj_v1_mm_balbc_pbmc.zip", "vdj_v1_mm_balbc_pbmc.zip")#這是小鼠的PBMC數據

加載R包

library(Seurat) library(cowplot)

加載數據

## Cellranger balbc_pbmc <- Read10X_h5("vdj_v1_mm_balbc_pbmc/vdj_v1_mm_balbc_pbmc_5gex_filtered_feature_bc_matrix.h5")s_balbc_pbmc <- CreateSeuratObject(counts = balbc_pbmc, min.cells = 3, min.features = 200, project = "cellranger")

提取線粒體基因

s_balbc_pbmc$percent.mito <- PercentageFeatureSet(s_balbc_pbmc, pattern = "^mt-")

增加T和B細胞的克隆信息

add_clonotype <- function(tcr_prefix, seurat_obj, type="t"){tcr <- read.csv(paste(tcr_prefix,"filtered_contig_annotations.csv", sep=""))# Remove the -1 at the end of each barcode.（注意，此步驟如果標記使用不同的barcode，比如多了個-1,可以使用 tcr$barcode <- gsub("-1", "", tcr$barcode)進行提取）# Subsets so only the first line of each barcode is kept,# as each entry for given barcode will have same clonotype.tcr <- tcr[!duplicated(tcr$barcode), ]# Only keep the barcode and clonotype columns.# We'll get additional clonotype info from the clonotype table.tcr <- tcr[,c("barcode", "raw_clonotype_id")]names(tcr)[names(tcr) == "raw_clonotype_id"] <- "clonotype_id"# Clonotype-centric info.clono <- read.csv(paste(tcr_prefix,"clonotypes.csv", sep=""))# Slap the AA sequences onto our original table by clonotype_id.tcr <- merge(tcr, clono[, c("clonotype_id", "cdr3s_aa")])names(tcr)[names(tcr) == "cdr3s_aa"] <- "cdr3s_aa"# Reorder so barcodes are first column and set them as rownames.tcr <- tcr[, c(2,1,3)]rownames(tcr) <- tcr[,1]tcr[,1] <- NULLcolnames(tcr) <- paste(type, colnames(tcr), sep="_")# Add to the Seurat object's metadata.clono_seurat <- AddMetaData(object=seurat_obj, metadata=tcr)return(clono_seurat) }s_balbc_pbmc <- add_clonotype("vdj_v1_mm_balbc_pbmc/vdj_v1_mm_balbc_pbmc_t_", s_balbc_pbmc, "t") s_balbc_pbmc <- add_clonotype("vdj_v1_mm_balbc_pbmc/vdj_v1_mm_balbc_pbmc_b_", s_balbc_pbmc, "b") head(s_balbc_pbmc[[]])

我給解釋一下以上function中每一步都在干什么：

首先讀入contig_annotations.csv，并賦給tcr；
去除tcr中重復的barcode，即如果具有相同的barcode，將以第一次出現的barcode為主來去重；
將tcr中的barcode，raw_clonotype_id賦值于tcr；
讀入clonotypes.csv，并賦給clono；
將tcr和colono進行merge（單細胞分析Seurat使用相關的10個問題答疑精選！），并賦給tcr；
將最后得到的，帶有barcode，raw_clonotype_id和colono的tcr對象以metadata的形式加入seurat object中。

發現很多的NA，非常正常啊，不是每個細胞都是T/B cell，然后還列出來了T/B CDR3的蛋白序列。

table(!is.na(s_balbc_pbmc$t_clonotype_id),!is.na(s_balbc_pbmc$b_clonotype_id))

s_balbc_pbmc <- subset(s_balbc_pbmc, cells = colnames(s_balbc_pbmc)[!(!is.na(s_balbc_pbmc$t_clonotype_id) & !is.na(s_balbc_pbmc$b_clonotype_id))]) #刪除同時表達T、B克隆表型的細胞 s_balbc_pbmc

進行常規workflow

s_balbc_pbmc <- subset(s_balbc_pbmc, percent.mito <= 10)s_balbc_pbmc <- subset(s_balbc_pbmc, nCount_RNA >= 500 & nCount_RNA <= 40000) s_balbc_pbmc <- NormalizeData(s_balbc_pbmc, normalization.method = "LogNormalize", scale.factor = 10000) s_balbc_pbmc <- FindVariableFeatures(s_balbc_pbmc, selection.method = "vst", nfeatures = 2000)all.genes <- rownames(s_balbc_pbmc) s_balbc_pbmc <- ScaleData(s_balbc_pbmc, features = all.genes) s_balbc_pbmc <- RunPCA(s_balbc_pbmc, features = VariableFeatures(object = s_balbc_pbmc))

use.pcs = 1:30 s_balbc_pbmc <- FindNeighbors(s_balbc_pbmc, dims = use.pcs) s_balbc_pbmc <- FindClusters(s_balbc_pbmc, resolution = c(0.5)) s_balbc_pbmc <- RunUMAP(s_balbc_pbmc, dims = use.pcs) DimPlot(s_balbc_pbmc, reduction = "umap", label = TRUE)

讓我們看看T細胞的marker的表達情況：

t_cell_markers <- c("Cd3d","Cd3e") FeaturePlot(s_balbc_pbmc, features = t_cell_markers)

比如我知道某個b_cdr3s_aa我感覺特有意思，想在UMAP 圖上進行表示，于是我先把他的蛋白序列找了出來，IGH:CARWGGYGYDGGYFDYW;IGK:CGQSYSYPYTF，然后：

DimPlot(s_balbc_pbmc, cells.highlight = Cells(subset(s_balbc_pbmc, subset = b_cdr3s_aa + == "IGH:CARWGGYGYDGGYFDYW;IGK:CGQSYSYPYTF")))

萬“綠”叢中一點紅！

當然你也可以在metadata中納入更多的TCR/BCR中的有關信息，比如我們把annotation.csv中的chains也納入進來的話，就是這個樣子滴：

p2 <-DimPlot(s_balbc_pbmc,group.by = "t_chain") p2

參考來源

https://github.com/ucdavis-bioinformatics-training/2020-Advanced_Single_Cell_RNA_Seq/blob/master/data_analysis/VDJ_Analysis_fixed.md

你可能還想看

NBT：單細胞轉錄組新降維可視化方法PHATE
復現nature communication PCA原圖|代碼分析（一）
這篇Nature子刊文章的蛋白組學數據PCA分析竟花費了我兩天時間來重現|附全過程代碼
一個R包玩轉單細胞免疫組庫分析，還能與Seurat無縫對接

往期精品(點擊圖片直達文字對應教程)

后臺回復“生信寶典福利第一波”或點擊閱讀原文獲取教程合集

總結

以上是生活随笔為你收集整理的Seurat的单细胞免疫组库分析来了！的全部內容，希望文章能夠幫你解決所遇到的問題。

如果覺得生活随笔網站內容還不錯，歡迎將生活随笔推薦給好友。

上一篇：那个一年发四篇Cell的研究生，后来怎么
下一篇： R变量索引 - 什么时候使用 @或$