Seurat的单细胞免疫组库分析来了!
使用Seurat進行單細胞VDJ免疫分析
NGS系列文章包括NGS基礎、轉錄組分析?(Nature重磅綜述|關于RNA-seq你想知道的全在這)、ChIP-seq分析?(ChIP-seq基本分析流程)、單細胞測序分析?(重磅綜述:三萬字長文讀懂單細胞RNA測序分析的最佳實踐教程 (原理、代碼和評述))、DNA甲基化分析、重測序分析、GEO數據挖掘(典型醫學設計實驗GEO數據分析 (step-by-step) - Limma差異分析、火山圖、功能富集)等內容。
小劇場:
我:老板,你聽說沒有樓上做的單細胞實驗加了VDJ分析?
老板:VCD分析?哦,,我早就聽過了!
我:老板,是VDJ分析?
老板:VDJ分析,哦對對就是VDJ分析,,,我早就說咱們也應該做了,你看看,又遲了一步,都怪你不提醒我!
我瞟了一眼她桌子上的淘寶頁面(廉價燈芯褲),默默走開了。。。
其實我在介紹clonotypr (令我驚奇的是,當我把推送推出去的當天,我親愛的作者就把該包從github撤了下來啊!)時說明過VDJ免疫分析對免疫及抗體產生的重要意義,這也是為什么現在許多做新冠單細胞分析的都會使用5’端測序聯用VDJ測序分析。
1. 為什么要進行單細胞免疫組庫的分析?
應用方向一:探索腫瘤免疫微環境,輔助免疫治療。
每個人都擁有一個自己的適應性免疫組庫,TCR和BCR通過基因重組和體細胞突變取得多樣性,使得我們身體可以識別和抵御各種內部和外部的入侵者。而腫瘤的發生往往躲避了人體T淋巴細胞而產生、增殖和轉移。
使用10X Genomics ChromiumTM Single Cell Immune Profiling Solution可以捕捉腫瘤發生時的免疫微環境變化,尋找免疫治療的靶點,從而輔助免疫治療更好地抗擊腫瘤。
應用方向二:探索自身免疫性疾病和炎癥性疾病發生機制,輔助疫苗的研究
自身免疫性疾病發生起始和發展的中心環節被認為是抗原特異性T細胞激活導致的,使用10X Genomics ChromiumTM Single Cell Immune Profiling Solution,可以解析自身免疫性疾病的發病機制,從而為疾病的診療提供依據。
應用方向三:移植和免疫重建
器官或者骨髓移植時,經常會誘發宿主的排斥反應,從而發生慢性移植抗宿主病。同種異體反應隨機分布在整個T細胞組庫的交叉反應,因此延遲T細胞恢復和限制的T細胞受體多樣性與異體移植后感染和疾病復發風險增加相關。
而我比較注意的是在疫苗接種前后BCR/TCR CDR3免疫組庫的分析,最近medRxiv上發表的有關新冠的文獻Immune Cell Profiling of COVID-19 Patients in the recovery stage by Single-cell sequencing中對不同BCR/TCR的VDJ重排進行分析,揭示了針對新冠特異的克隆擴增。
2. 免疫組庫主要包括哪幾個方面?
T淋巴細胞(T cell)和B淋巴細胞(B cell)主要負責適應性免疫應答,其抗原識別主要依賴于T細胞受體(T cell recptor, TCR)和B細胞受體(B cell recptor, BCR),這兩類細胞表面分子的共同特點是其多樣性,可以識別多種多樣的抗原分子。BCR的輕鏈和TCRβ鏈由V、D、J、C四個基因片段組成,BCR的重鏈和TCRα鏈由V、J、C三個基因片段組成,這些基因片段在遺傳過程中發生重組、重排,組合成不同的形式,保證了受體多樣性。其中變化最大的就是CDR3區。
3. 10× Genomics VDJ測序進行cellranger后的輸出形式是什么樣的?
Outputs: - Run summary HTML: /home/jdoe/runs/sample345/outs/web_summary.html - Run summary CSV: /home/jdoe/runs/sample345/outs/metrics_summary.csv - All-contig FASTA: /home/jdoe/runs/sample345/outs/all_contig.fasta - All-contig FASTA index: /home/jdoe/runs/sample345/outs/all_contig.fasta.fai - All-contig FASTQ: /home/jdoe/runs/sample345/outs/all_contig.fastq - Read-contig alignments: /home/jdoe/runs/sample345/outs/all_contig.bam - Read-contig alignment index: /home/jdoe/runs/sample345/outs/all_contig.bam.bai - All contig annotations (JSON): /home/jdoe/runs/sample345/outs/all_contig_annotations.json - All contig annotations (BED): /home/jdoe/runs/sample345/outs/all_contig_annotations.bed - All contig annotations (CSV): /home/jdoe/runs/sample345/outs/all_contig_annotations.csv - Filtered contig sequences FASTA: /home/jdoe/runs/sample345/outs/filtered_contig.fasta - Filtered contig sequences FASTQ: /home/jdoe/runs/sample345/outs/filtered_contig.fastq - Filtered contigs (CSV): /home/jdoe/runs/sample345/outs/filtered_contig_annotations.csv - Clonotype consensus FASTA: /home/jdoe/runs/sample345/outs/consensus.fasta - Clonotype consensus FASTA index: /home/jdoe/runs/sample345/outs/consensus.fasta.fai - Clonotype consensus FASTQ: /home/jdoe/runs/sample345/outs/consensus.fastq - Concatenated reference sequences: /home/jdoe/runs/sample345/outs/concat_ref.fasta - Concatenated reference index: /home/jdoe/runs/sample345/outs/concat_ref.fasta.fai - Contig-consensus alignments: /home/jdoe/runs/sample345/outs/consensus.bam - Contig-consensus alignment index: /home/jdoe/runs/sample345/outs/consensus.bam.bai - Contig-reference alignments: /home/jdoe/runs/sample345/outs/concat_ref.bam - Contig-reference alignment index: /home/jdoe/runs/sample345/outs/concat_ref.bam.bai - Clonotype consensus annotations (JSON): /home/jdoe/runs/sample345/outs/consensus_annotations.json - Clonotype consensus annotations (CSV): /home/jdoe/runs/sample345/outs/consensus_annotations.csv - Clonotype info: /home/jdoe/runs/sample345/outs/clonotypes.csv - Barcodes that are declared to be targeted cells: /home/jdoe/runs/sample345/out/cell_barcodes.json - Loupe V(D)J Browser file: /home/jdoe/runs/sample345/outs/vloupe.vloupe首先我們來看看web.html對整個測序質量的評估:
我們看到在Enrichment中reads映射到VDJ基因的比例為80.7%,其中TRA/TRB代表TCR α/β鏈 ,map到TRA的比例為24.4%,map到TRB的比例為56%。當然,后面也會有蠻多指標的,比如VDJ注釋,VDJ質量及表達等。。。
當然我們也會有很多表格,其中最重要的表格為contig_annotation和clonotype:
contig_annotation(BCR示例)
上面表格中的IGH和IGK/IGL代表BCR H和BCR L鏈 。看到這個表格,我第一反應其實是為什么D區基因(d_gene)多數均為None,主要原因還是D區通常較短又突變較多,因技術限制而常常捕捉不到。數據中也提供了CDR3的蛋白序列和核苷酸序列。
clonotype(TCR示例)
從以上數據可以看出,有部分克隆是由單鏈決定的。
那么如何將VDJ的克隆表型和scRNA-seq結合起來呢?其實大佬已經回答了這個問題:
add_clonotype <- function(tcr_location, seurat_obj){tcr <- read.csv(paste(tcr_folder,"filtered_contig_annotations.csv", sep=""))# Remove the -1 at the end of each barcode.# Subsets so only the first line of each barcode is kept,# as each entry for given barcode will have same clonotype.tcr$barcode <- gsub("-1", "", tcr$barcode)tcr <- tcr[!duplicated(tcr$barcode), ]# Only keep the barcode and clonotype columns.# We'll get additional clonotype info from the clonotype table.tcr <- tcr[,c("barcode", "raw_clonotype_id")]names(tcr)[names(tcr) == "raw_clonotype_id"] <- "clonotype_id"# Clonotype-centric info.clono <- read.csv(paste(tcr_folder,"clonotypes.csv", sep=""))# Slap the AA sequences onto our original table by clonotype_id.tcr <- merge(tcr, clono[, c("clonotype_id", "cdr3s_aa")])# Reorder so barcodes are first column and set them as rownames.tcr <- tcr[, c(2,1,3)]rownames(tcr) <- tcr[,1]tcr[,1] <- NULL# Add to the Seurat object's metadata.clono_seurat <- AddMetaData(object=seurat_obj, metadata=tcr)return(clono_seurat)}怎么用呢?舉個栗子吧:
數據下載
download.file("https://bioshare.bioinformatics.ucdavis.edu/bioshare/download/iimg5mz77whzzqc/vdj_v1_mm_balbc_pbmc.zip", "vdj_v1_mm_balbc_pbmc.zip")#這是小鼠的PBMC數據加載R包
library(Seurat) library(cowplot)加載數據
## Cellranger balbc_pbmc <- Read10X_h5("vdj_v1_mm_balbc_pbmc/vdj_v1_mm_balbc_pbmc_5gex_filtered_feature_bc_matrix.h5")s_balbc_pbmc <- CreateSeuratObject(counts = balbc_pbmc, min.cells = 3, min.features = 200, project = "cellranger")提取線粒體基因
s_balbc_pbmc$percent.mito <- PercentageFeatureSet(s_balbc_pbmc, pattern = "^mt-")增加T和B細胞的克隆信息
add_clonotype <- function(tcr_prefix, seurat_obj, type="t"){tcr <- read.csv(paste(tcr_prefix,"filtered_contig_annotations.csv", sep=""))# Remove the -1 at the end of each barcode.(注意,此步驟如果標記使用不同的barcode,比如多了個-1,可以使用 tcr$barcode <- gsub("-1", "", tcr$barcode)進行提取)# Subsets so only the first line of each barcode is kept,# as each entry for given barcode will have same clonotype.tcr <- tcr[!duplicated(tcr$barcode), ]# Only keep the barcode and clonotype columns.# We'll get additional clonotype info from the clonotype table.tcr <- tcr[,c("barcode", "raw_clonotype_id")]names(tcr)[names(tcr) == "raw_clonotype_id"] <- "clonotype_id"# Clonotype-centric info.clono <- read.csv(paste(tcr_prefix,"clonotypes.csv", sep=""))# Slap the AA sequences onto our original table by clonotype_id.tcr <- merge(tcr, clono[, c("clonotype_id", "cdr3s_aa")])names(tcr)[names(tcr) == "cdr3s_aa"] <- "cdr3s_aa"# Reorder so barcodes are first column and set them as rownames.tcr <- tcr[, c(2,1,3)]rownames(tcr) <- tcr[,1]tcr[,1] <- NULLcolnames(tcr) <- paste(type, colnames(tcr), sep="_")# Add to the Seurat object's metadata.clono_seurat <- AddMetaData(object=seurat_obj, metadata=tcr)return(clono_seurat) }s_balbc_pbmc <- add_clonotype("vdj_v1_mm_balbc_pbmc/vdj_v1_mm_balbc_pbmc_t_", s_balbc_pbmc, "t") s_balbc_pbmc <- add_clonotype("vdj_v1_mm_balbc_pbmc/vdj_v1_mm_balbc_pbmc_b_", s_balbc_pbmc, "b") head(s_balbc_pbmc[[]])我給解釋一下以上function中每一步都在干什么:
首先讀入contig_annotations.csv,并賦給tcr;
去除tcr中重復的barcode,即如果具有相同的barcode,將以第一次出現的barcode為主來去重;
將tcr中的barcode,raw_clonotype_id賦值于tcr;
讀入clonotypes.csv,并賦給clono;
將tcr和colono進行merge(單細胞分析Seurat使用相關的10個問題答疑精選!),并賦給tcr;
將最后得到的,帶有barcode,raw_clonotype_id和colono的tcr對象以metadata的形式加入seurat object中。
發現很多的NA,非常正常啊,不是每個細胞都是T/B cell,然后還列出來了T/B CDR3的蛋白序列。
table(!is.na(s_balbc_pbmc$t_clonotype_id),!is.na(s_balbc_pbmc$b_clonotype_id))s_balbc_pbmc <- subset(s_balbc_pbmc, cells = colnames(s_balbc_pbmc)[!(!is.na(s_balbc_pbmc$t_clonotype_id) & !is.na(s_balbc_pbmc$b_clonotype_id))]) #刪除同時表達T、B克隆表型的細胞 s_balbc_pbmc進行常規workflow
s_balbc_pbmc <- subset(s_balbc_pbmc, percent.mito <= 10)s_balbc_pbmc <- subset(s_balbc_pbmc, nCount_RNA >= 500 & nCount_RNA <= 40000) s_balbc_pbmc <- NormalizeData(s_balbc_pbmc, normalization.method = "LogNormalize", scale.factor = 10000) s_balbc_pbmc <- FindVariableFeatures(s_balbc_pbmc, selection.method = "vst", nfeatures = 2000)all.genes <- rownames(s_balbc_pbmc) s_balbc_pbmc <- ScaleData(s_balbc_pbmc, features = all.genes) s_balbc_pbmc <- RunPCA(s_balbc_pbmc, features = VariableFeatures(object = s_balbc_pbmc))use.pcs = 1:30 s_balbc_pbmc <- FindNeighbors(s_balbc_pbmc, dims = use.pcs) s_balbc_pbmc <- FindClusters(s_balbc_pbmc, resolution = c(0.5)) s_balbc_pbmc <- RunUMAP(s_balbc_pbmc, dims = use.pcs) DimPlot(s_balbc_pbmc, reduction = "umap", label = TRUE)讓我們看看T細胞的marker的表達情況:
t_cell_markers <- c("Cd3d","Cd3e") FeaturePlot(s_balbc_pbmc, features = t_cell_markers)比如我知道某個b_cdr3s_aa我感覺特有意思,想在UMAP 圖上進行表示,于是我先把他的蛋白序列找了出來,IGH:CARWGGYGYDGGYFDYW;IGK:CGQSYSYPYTF,然后:
DimPlot(s_balbc_pbmc, cells.highlight = Cells(subset(s_balbc_pbmc, subset = b_cdr3s_aa + == "IGH:CARWGGYGYDGGYFDYW;IGK:CGQSYSYPYTF")))萬“綠”叢中一點紅!
當然你也可以在metadata中納入更多的TCR/BCR中的有關信息,比如我們把annotation.csv中的chains也納入進來的話,就是這個樣子滴:
p2 <-DimPlot(s_balbc_pbmc,group.by = "t_chain") p2參考來源
https://github.com/ucdavis-bioinformatics-training/2020-Advanced_Single_Cell_RNA_Seq/blob/master/data_analysis/VDJ_Analysis_fixed.md
你可能還想看
NBT:單細胞轉錄組新降維可視化方法PHATE
復現nature communication PCA原圖|代碼分析(一)
這篇Nature子刊文章的蛋白組學數據PCA分析竟花費了我兩天時間來重現|附全過程代碼
一個R包玩轉單細胞免疫組庫分析,還能與Seurat無縫對接
往期精品(點擊圖片直達文字對應教程)
后臺回復“生信寶典福利第一波”或點擊閱讀原文獲取教程合集
總結
以上是生活随笔為你收集整理的Seurat的单细胞免疫组库分析来了!的全部內容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: 那个一年发四篇Cell的研究生,后来怎么
- 下一篇: R变量索引 - 什么时候使用 @或$