當前位置：首頁 > 编程资源 > 编程问答 >内容正文

编程问答

实验记录 | DNA数据的处理流程

發布時間：2024/3/24 编程问答 38 豆豆

生活随笔收集整理的這篇文章主要介紹了实验记录 | DNA数据的处理流程小編覺得挺不錯的,現在分享給大家,幫大家做個參考.

目前，嘗試對代碼文件進行一步一步的拆解。去看他是怎樣一步一步的做判斷的？

我感覺我的狀態有點不對勁，內心的那種動力沒有了。反而內心會被一些東西困擾，這些東西阻礙了我的前進。我花半個小時，理一下。
我內心困惑什么呢？還是昨天和老媽討論的，以及今天和董舜討論的。
（1）我覺得自己在課題組中被“邊緣化”。

沒有被納入到正式群中
做的事情也是細枝末節
目前也沒有參與到一些正式的成員所參與的一些事情中，比如組會講文獻，匯報課題
我所看到的和實際上的
所有的同學都渴望加入到這個課題組中，讓我產生了一種危機感

這些，讓我不太舒服。我害怕重蹈肖老師的覆轍，我很緊張。所以，現在的我，該怎樣與這些疙瘩和解呢？
（1）首先，明確一點，你現在不過才過來一個星期。和老師互相都不太熟悉。老師，不會把一些重要的任務交給你是最正常不過的事情。如果是我，我會先安排一些簡單的事情，看看她做的怎么樣？時間久了之后，如果這個小姑娘我很喜歡，那么我會交給她一些任務。信任的建立是需要時間的，不可能一開始就居于“核心”的位置，要學會忍耐，要學會蓄光。
（2）你現在著手所做的事情雖然是比較基礎的，但是是與這個課題的主線相關的一條支線。也非常的重要。將來，也很有可能被納入到這個體系之中。所以，不存在“游離”一說。你不必有如此的小情緒。
（3）你現在雖然并未是正式的成員，但是是存在可能性的。因為，你的本科有相關的分析的基礎，你自己并不差的。老師，主要看兩個方面的內容，一方面是你的專業背景與實驗室的契合程度，另一方面就是你的專業能力等其它方面的能力。你自己并不差的。
（4）而且，工作的過程中，你的確感覺到快樂不是。不要去在意最終的結果，而且你來到這里的初心就是多學一點東西，你現在的確正在學習中，這難道不是一件快樂的事情嗎？值得滿足了。
（5）所以，最終，你只需要踏踏實實的安定下來就夠了。“請君看取榖紋平”，踏實的過好每一天，每一天都在進步，都比昨天多學習一點東西，就足夠了。
以上。

我現在卡在什么地方了呢？
（1）首先，我并不理解為什么四個GATK的結果文件為空？
（2）其次，我并不理解為什么最終沒有跑出來mutation calling的結果文件（如例子中展示的）？
這兩個問題，并不是純粹的技術性的問題，而是涉及到整個流程的原理性的問題。需要返回到源代碼，去查看它的處理過程（必然經歷的一個步驟）。

一個人終究要經過丑陋走向美麗的。

所以，我現在就入手去整理它的每一步的分析過程。
一、準備階段
（1）定義過程中使用到的參數
（2）準備程序、參考文件
（3）準備工作文件夾

二、比對階段
非常簡潔，只是使用了一個函數alignment（）。

判斷輸入的文件的類型（這是接下來一切處理步驟的前提）：

exome-seq：rna=0（本次，我們只論述輸入文件為dna的處理情況）
rna-seq：rna=1
deep exome-seq：rna=2

如果輸入的是bam文件，想將其轉換為fastq文件。

序列比對：
/home/xxzhang/workplace/software/bwa/bwa mem -v 1 -t 32 -a -M ./geneome/hg19/hg19.fa ./output/tumor/fastq1.fastq ./output/tumor/fastq2.fastq > ./output/tumor/alignment.sam

mem:使用mem比對算法。
-v:
-a:將所有的比對結果都輸出，包括 single-end 和 unpaired paired-end的 reads，但是這些比對的結果會被標記為次優。

（這里過段時間，找到help文檔的時候再進行注釋）

輸入文件：fastq1.fastq；fastq2.fastq
輸出文件：alignment.sam

add read group（并不是特別懂）:

/home/xxzhang/workplace/software/java/jdk1.8.0_291/bin/java -jar /home/xxzhang/workplace/QBRC/somatic_script/picard.jar AddOrReplaceReadGroups INPUT=./output/tumor/alignment.sam OUTPUT=./output/tumor/rgAdded.bam SORT_ORDER=coordinate RGID=tumor RGLB=tumor RGPL=illumina RGPU=tumor RGSM=tumor CREATE_INDEX=true VALIDATION_STRINGENCY=LENIENT

輸入文件：alignment.sam
輸出文件：rgAdded.bam

參考鏈接：https://broadinstitute.github.io/picard/
這行代碼中使用了工具“AddOrReplaceReadGroups”，下面對這些內容進行介紹：
https://broadinstitute.github.io/picard/command-line-overview.html#AddOrReplaceReadGroups
我不太明白什么是read group?

read group：
Replace read groups in a BAM file.This tool enables the user to replace all read groups in the INPUT file with a single new read group and assign all reads to this read group in the OUTPUT BAM file.

標記重復：
/home/xxzhang/workplace/software/java/jdk1.8.0_291/bin/java -jar /home/xxzhang/workplace/QBRC//somatic_script/picard.jar MarkDuplicates INPUT=./output/tumor/rgAdded.bam OUTPUT=./output/tumor/dupmark.bam CREATE_INDEX=true VALIDATION_STRINGENCY=STRICT REMOVE_SEQUENCING_DUPLICATES=true METRICS_FILE=./output/tumor/dupmark_metrics.txt

輸入文件：rgAdded.bam
輸出文件：dupmark.bam

這行代碼中使用了工具“MarkDuplicates”，這行代碼的意思，顧名思義，就是標記重復。我覺得我的錯誤也不在這個環節。

indel重新比對：

我們的物種是人類，因此使用的比對文件是mills和1000g這兩個文件。
比對前：先對使用過的文檔排一個順序。
#/home/xxzhang/workplace/software/java/jdk1.8.0_291/bin/java -Djava.io.tmpdir=./output/tumor/tmp -jar /home/xxzhang/workplace/QBRC//somatic_script/picard.jar ReorderSam INPUT=./output/tumor/dupmark.bam OUTPUT=./output/tumor/dupmark.bam SEQUENCE_DICTIONARY=./geneome/hg19/hg19.dict CREATE_INDEX=TRUE

這行代碼其實是可有可無。注釋掉，我們真正需要調整順序的是參考的vcf文件。

RealignerTargetCreator
/home/xxzhang/workplace/software/java/jdk1.8.0_291/bin/java -Djava.io.tmpdir=./output/tumor/tmp -jar /home/xxzhang/workplace/QBRC//somatic_script/GenomeAnalysisTK.jar -T RealignerTargetCreator -R ./geneome/hg19/hg19.fa --num_threads 32 -known ./geneome/hg19/hg19.fa_resource/Mills_and_1000G_gold_standard.indels.hg19.vcf -known ./geneome/hg19/hg19.fa_resource/1000G_phase1.snps.high_confidence.hg19.vcf -o ./output/tumor/tumor_intervals.list -I ./output/tumor/dupmark.bam > ./output/tumor/index.out

其實這一步已經有生成結果文件tumor_intervals.txt，只不過不能夠將屏顯轉移到index.out,并生成它。所以，從整體上看，是沒有問題的。

IndelRealigner
/home/xxzhang/workplace/software/java/jdk1.8.0_291/bin/java -Djava.io.tmpdir=./output/tumor/tmp -jar /home/xxzhang/workplace/QBRC//somatic_script/GenomeAnalysisTK.jar -T IndelRealigner --filter_bases_not_stored --disable_auto_index_creation_and_locking_when_reading_rods -R ./geneome/hg19/hg19.fa -known ./geneome/hg19/hg19.fa_resource/Mills_and_1000G_gold_standard.indels.hg19.vcf -known ./geneome/hg19/hg19.fa_resource/1000G_phase1.snps.high_confidence.hg19.vcf -targetIntervals ./output/tumor/tumor_intervals.list -I ./output/tumor/dupmark.bam -o ./output/tumor/realigned.bam >./output/tumor/tumor_realign.out

這一步也生成了結果文件。

base recalibration：

BaseRecalibrator
/home/xxzhang/workplace/software/java/jdk1.8.0_291/bin/java -Djava.io.tmpdir=./output/tumor/tmp -jar /home/xxzhang/workplace/QBRC//somatic_script/GenomeAnalysisTK.jar -T BaseRecalibrator -R ./geneome/hg19/hg19.fa -knownSites ./geneome/hg19/hg19.fa_resource/dbsnp.hg19.vcf -knownSites ./geneome/hg19/hg19.fa_resource/Mills_and_1000G_gold_standard.indels.hg19.vcf -I ./output/tumor/realigned.bam -o ./output/tumor/tumor_bqsr > ./output/tumor/table.out
PrintReads
/home/xxzhang/workplace/software/java/jdk1.8.0_291/bin/java -Djava.io.tmpdir=./output/tumor/tmp -jar /home/xxzhang/workplace/QBRC//somatic_script/GenomeAnalysisTK.jar -T PrintReads -rf NotPrimaryAlignment -R ./geneome/hg19/hg19.fa -I ./output/tumor/realigned.bam -BQSR ./output/tumor/tumor_bqsr -o ./output/tumor/tumor.bam > ./output/tumor/tumor_recal.out

這兩步都沒有問題。

進一步處理bam文件：
/home/xxzhang/workplace/software/sambamba/sambamba index -t 32 ./output/tumor/tumor.bam
/home/xxzhang/miniconda3/bin/bcftools mpileup -I -f ./geneome/hg19/hg19.fa ./output/tumor/tumor.bam > ./output/tumor/tumor.mpileup

三、突變篩選階段
/home/xxzhang/workplace/software/strelka/bin/configureStrelkaGermlineWorkflow.py --bam /home/xxzhang/workplace/QBRC/output/tumor/tumor.bam --referenceFasta ./geneome/hg19/hg19.fa --runDir ./output/strelka/ --exome

輸入文件：tumor.bam（也是上一步生成的）
輸出文件：

/home/xxzhang/workplace/QBRC/output/strelka/runWorkflow.py -m local -j 32

這一步覺得也有問題，為什么只是calling出了一行？對，我覺得問題也在這里。

四、整合vcf文件

過濾vcf文件

/home/xxzhang/workplace/software/R/R-3.6.3/bin/Rscript /home/xxzhang/workplace/QBRC//somatic_script/filter_vcf.R ./output NA

annovar注釋

/home/xxzhang/workplace/QBRC/annovar/table_annovar.pl ./output/germline_mutations.txt /home/xxzhang/workplace/QBRC/annovar//humandb/ -buildver hg19 -out ./output/germline_mutations -remove -protocol refGene,ljb26_all,cosmic70,esp6500siv2_all,exac03,1000g2015aug_all -operation g,f,f,f,f,f -nastring

/home/xxzhang/workplace/software/R/R-3.6.3/bin/Rscript /home/xxzhang/workplace/QBRC//somatic_script/filter_vcf2.R ./output/somatic_mutations.txt ./output/somatic_mutations.hg19_multianno.txt ./output/somatic_mutations_hg19.txt somatic

/home/xxzhang/workplace/QBRC/annovar/annotate_variation.pl -geneanno -dbtype refGene -buildver hg19 ./output/somatic_mutations_hg19.txt /home/xxzhang/workplace/QBRC/annovar//humandb/

/home/xxzhang/workplace/QBRC/annovar/coding_change.pl --includesnp --alltranscript --newevf ./output/somatic_mutations_hg19.txt_tmp.txt ./output/somatic_mutations_hg19.txt.exonic_variant_function /home/xxzhang/workplace/QBRC/annovar//humandb//hg19_refGene.txt /home/xxzhang/workplace/QBRC/annovar//humandb//hg19_refGeneMrna.fa >/dev/null 2>/dev/null

/home/xxzhang/workplace/software/R/R-3.6.3/bin/Rscript /home/xxzhang/workplace/QBRC//somatic_script/add_fs_annotation.R ./output hg19 somatic

注釋內容：用到了什么文件？

總結

以上是生活随笔為你收集整理的实验记录 | DNA数据的处理流程的全部內容，希望文章能夠幫你解決所遇到的問題。

如果覺得生活随笔網站內容還不錯，歡迎將生活随笔推薦給好友。

上一篇：响应式布局实现方案
下一篇： dna计算机量子计算,量子算法、DNA计

日韩av黄I国产麻豆传媒I国产91av视频在线观看I日韩一区二区三区在线看I美女国产在线I麻豆视频国产在线观看I成人黄色短片

编程问答

实验记录 | DNA数据的处理流程

總結