Linux command line: downloading NCBI data with Entrez Direct

Published: 2023/12/13

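I. Installing the software

1. Download the software:

curl ftp://ftp.ncbi.nlm.nih.gov/entrez/entrezdirect/edirect.zip -O (for how to download files with curl, see http://www.cnblogs.com/duhuo/p/5695256.html)

2. Unzip the archive:

unzip edirect.zip

3. Add the directory to PATH and activate the change:

echo 'export PATH=/home/lmt/desktop/edirect/:$PATH' >> ~/.zshrc (replace /home/lmt/desktop/edirect/ with the directory you unzipped into, and pick the startup file that matches your shell; it may be ~/.bashrc instead. Run echo $SHELL to see which shell you use)

source ~/.zshrc (reloads the startup file so the new PATH takes effect)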
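
Which startup file to edit depends on the shell. The following is a minimal sketch, not part of EDirect: the helper name rc_file_for_shell is made up for illustration, only zsh and bash are handled explicitly, and the ~/.profile fallback is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: map a shell path (as printed by `echo $SHELL`)
# to the startup file that should receive the PATH line.
rc_file_for_shell() {
    case "$(basename "$1")" in
        zsh)  echo "$HOME/.zshrc" ;;
        bash) echo "$HOME/.bashrc" ;;
        *)    echo "$HOME/.profile" ;;   # assumed fallback for other shells
    esac
}

# Usage (not run automatically; $HOME/edirect is an assumed install directory):
#   rc=$(rc_file_for_shell "$SHELL")
#   echo 'export PATH=$HOME/edirect:$PATH' >> "$rc"
#   . "$rc"
#   command -v esearch    # prints a path if the shell can now find esearch
```

If command -v esearch prints nothing, the PATH line most likely went into a file that the current shell does not read.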

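II. What Entrez Direct provides

1. esearch: searches a database using the given indexed fields

2. efilter: filters the results of a previous search

3. efetch: downloads the requested data in the specified format

...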
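
The three commands are meant to be chained with pipes, each one consuming the result of the previous step. A rough sketch of such a chain follows; the function name search_filter_fetch and the filter term 'complete genome [TITL]' are illustrative assumptions, and running it needs EDirect on PATH plus network access.

```shell
#!/bin/sh
# Sketch: search the nucleotide database, narrow the hits with an extra
# query term, then fetch whatever remains as FASTA.
search_filter_fetch() {
    query=$1
    esearch -db nucleotide -query "$query" |
        efilter -query 'complete genome [TITL]' |   # assumed example filter
        efetch -format fasta
}

# Usage (not run automatically):
#   search_filter_fetch 'CHN-JS-2014' > filtered.fasta
```

Wrapping the pipeline in a function keeps the database and format in one place, so only the query string changes between runs.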

III. Usage examples

Downloading nucleotide or protein sequences (FASTA format)

esearch -db nucleotide -query 'CHN-JS-2014' | efetch -format fasta > 11.fasta # downloads the complete genome nucleotide sequence

>KP757892.1 Porcine deltacoronavirus isolate CHN-JS-2014, complete genome
ACATGGGGACTAAAGATAAAAATTATAGCATTAGTCTATAATTTTATCTCCCTAGCTTCGCTAGTTCTCT
ACCGACACCAATCCAGGTGCGTCTGCCACCAAGTTGGCTACCCTTTCTAGGGGCGCTTTCGCGCTTGCTC
ACCATTAGATTACCTGGAAACCAGCCATTCAGGTTGGAGTTTCCCCAGGCTCTTTTGTGTGGGCATTAGC

esearch -db nucleotide -query 'CHN-JS-2014' | efetch -format gene_fasta > 22.fasta # downloads the nucleotide sequence of each gene (S, E, M, etc.) as separate records

>lcl|KP757892.1_gene_3 [gene=E] [locus_tag=PDCoV-CHN-JS-2014_gp3] [location=22797..23048]
ATGGTAGTCGACGACTGGGCCGTTACCATCCCTGGACAATATATTATTGCTATACTAGTTGTCATCTGCA
TTGGTGTGGCACTACTTTTTATTAACACTTGCTTAGCTTGTGTTAAATTATTTTACAAGTGCTACCTAGG
GGCAGCATACCTTGTTAGGCCTATTATAGTGTACTACTCCAAGCCGAACCCCGTACCTGAGGATGAGTTT
GTAAAAGTACACCAATTTCCTAGAAACACTCACTATGTCTGA
>lcl|KP757892.1_gene_4 [gene=M] [locus_tag=PDCoV-CHN-JS-2014_gp4] [location=23041..23694]
ATGTCTGACGCAGAAGAGTGGCAAATTATTGTTTTCATTGCGATCATATGGGCGCTTGGCGTCATCCTCC
AAGGAGGCTATGCCACGCGTAATCGTGTGATCTATGTTATTAAACTTATTCTGCTTTGGCTGCTCCAACC
CTTCACCCTAGTGGTGACCATTTGGACCGCAGTTGACAGATCATCTAAGAAGGACGCAGTTTTCATTGTG
TCCATAATTTTTGCCGTACTGACCTTCATATCCTGGGCCAAGTACTGGTATGACTCAATTCGCTTATTAA
TGAAAACCAGATCTGCATGGGCACTCTCACCTGAGAGTAGACTCCTTGCAGGGATTATGGATCCAATGGG
TACATGGAGGTGCATTCCCATCGACCACATGGCTCCAATTCTCACACCAGTCGTTAAGCATGGCAAGCTC

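esearch -db nucleotide -query 'CHN-JS-2014' | efetch -format fasta_cds_aa > 33.fasta # downloads the protein sequence of each gene as separate records (the search runs against the nucleotide database; trying the protein database instead produced an error)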

>lcl|KP757892.1_prot_AKC54443.1_3 [gene=E] [locus_tag=PDCoV-CHN-JS-2014_gp3] [protein=envelope protein] [protein_id=AKC54443.1] [location=22797..23048] [gbkey=CDS]
MVVDDWAVTIPGQYIIAILVVICIGVALLFINTCLACVKLFYKCYLGAAYLVRPIIVYYSKPNPVPEDEF
VKVHQFPRNTHYV
>lcl|KP757892.1_prot_AKC54444.1_4 [gene=M] [locus_tag=PDCoV-CHN-JS-2014_gp4] [protein=membrane protein] [protein_id=AKC54444.1] [location=23041..23694] [gbkey=CDS]
MSDAEEWQIIVFIAIIWALGVILQGGYATRNRVIYVIKLILLWLLQPFTLVVTIWTAVDRSSKKDAVFIV
SIIFAVLTFISWAKYWYDSIRLLMKTRSAWALSPESRLLAGIMDPMGTWRCIPIDHMAPILTPVVKHGKL
KLHGQELANGISVRNPPQDMVIVSPSDTFHYTFKKPVESNNDPEFAVLIYQGDRASNAGLHTITTSKAGD
ARLYKYM
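
Because every record header in this output carries a [gene=...] tag, a single gene can be pulled out of the multi-record file afterwards. A minimal sketch, assuming the header layout shown above; the helper name extract_gene is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print only the FASTA records whose header line
# contains the literal tag [gene=NAME]. Reads FASTA on standard input.
extract_gene() {
    awk -v tag="[gene=$1]" '
        /^>/ { keep = (index($0, tag) > 0) }   # header line: keep this record or not
        keep                                   # print header and sequence lines of kept records
    '
}

# Usage (not run automatically):
#   extract_gene M < 33.fasta > M_protein.fasta
```

index() does a plain substring match, so the square brackets in the tag are not treated as a regular-expression character class.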

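esearch -db nucleotide -query 'CHN-JS-2014' | efetch -format fasta_cds_na > 44.fasta # downloads the nucleotide sequence of each gene (S, E, M, etc.) as separate records; same sequences as 22.fasta, but with more annotation in the headers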

>lcl|KP757892.1_cds_AKC54443.1_3 [gene=E] [locus_tag=PDCoV-CHN-JS-2014_gp3] [protein=envelope protein] [protein_id=AKC54443.1] [location=22797..23048] [gbkey=CDS]
ATGGTAGTCGACGACTGGGCCGTTACCATCCCTGGACAATATATTATTGCTATACTAGTTGTCATCTGCA
TTGGTGTGGCACTACTTTTTATTAACACTTGCTTAGCTTGTGTTAAATTATTTTACAAGTGCTACCTAGG
GGCAGCATACCTTGTTAGGCCTATTATAGTGTACTACTCCAAGCCGAACCCCGTACCTGAGGATGAGTTT
GTAAAAGTACACCAATTTCCTAGAAACACTCACTATGTCTGA
>lcl|KP757892.1_cds_AKC54444.1_4 [gene=M] [locus_tag=PDCoV-CHN-JS-2014_gp4] [protein=membrane protein] [protein_id=AKC54444.1] [location=23041..23694] [gbkey=CDS]
ATGTCTGACGCAGAAGAGTGGCAAATTATTGTTTTCATTGCGATCATATGGGCGCTTGGCGTCATCCTCC
AAGGAGGCTATGCCACGCGTAATCGTGTGATCTATGTTATTAAACTTATTCTGCTTTGGCTGCTCCAACC
CTTCACCCTAGTGGTGACCATTTGGACCGCAGTTGACAGATCATCTAAGAAGGACGCAGTTTTCATTGTG
TCCATAATTTTTGCCGTACTGACCTTCATATCCTGGGCCAAGTACTGGTATGACTCAATTCGCTTATTAA
TGAAAACCAGATCTGCATGGGCACTCTCACCTGAGAGTAGACTCCTTGCAGGGATTATGGATCCAATGGG
TACATGGAGGTGCATTCCCATCGACCACATGGCTCCAATTCTCACACCAGTCGTTAAGCATGGCAAGCTC

Downloading sequences (non-FASTA formats)

esearch -db nucleotide -query 'CHN-JS-2014' | efetch -format gb > 55.gb # the output matches the GenBank record as displayed on the NCBI website

LOCUS       KP757892               25420 bp ss-RNA     linear   VRL 17-DEC-2015
DEFINITION  Porcine deltacoronavirus isolate CHN-JS-2014, complete genome.
ACCESSION   KP757892
VERSION     KP757892.1
KEYWORDS    .
SOURCE      Porcine deltacoronavirus
  ORGANISM  Porcine deltacoronavirus
            Viruses; ssRNA viruses; ssRNA positive-strand viruses, no DNA
            stage; Nidovirales; Coronaviridae; Coronavirinae.
REFERENCE   1  (bases 1 to 25420)
  AUTHORS   Dong,N., Fang,L., Zeng,S., Sun,Q., Chen,H. and Xiao,S.
  TITLE     Porcine Deltacoronavirus in Mainland China
  JOURNAL   Emerging Infect. Dis. 21 (12), 2254-2255 (2015)
   PUBMED   26584185
REFERENCE   2  (bases 1 to 25420)
  AUTHORS   Dong,N., Fang,L., Zeng,S., Sun,Q. and Xiao,S.
  TITLE     Direct Submission
  JOURNAL   Submitted (06-FEB-2015) State Key Laboratory of Agricultural
            Microbiology, Huazhong Agricultural University, 1 Shizishan Street,
            Wuhan, Hubei 430070, China
COMMENT     ##Assembly-Data-START##
            Sequencing Technology :: Sanger dideoxy sequencing
            ##Assembly-Data-END##
FEATURES             Location/Qualifiers
...
     gene            22797..23048
                     /gene="E"
                     /locus_tag="PDCoV-CHN-JS-2014_gp3"
     CDS             22797..23048
                     /gene="E"
                     /locus_tag="PDCoV-CHN-JS-2014_gp3"
                     /codon_start=1
                     /product="envelope protein"
                     /protein_id="AKC54443.1"
                     /translation="MVVDDWAVTIPGQYIIAILVVICIGVALLFINTCLACVKLFYKC
                     YLGAAYLVRPIIVYYSKPNPVPEDEFVKVHQFPRNTHYV"
     gene            23041..23694
                     /gene="M"
...
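
A saved GenBank flat file can be scanned with ordinary text tools. As a rough sketch, the hypothetical helper list_products below prints the value of every /product qualifier; it assumes each value fits on one line, so products that wrap onto a second line are skipped.

```shell
#!/bin/sh
# Hypothetical helper: list the /product="..." values found in a GenBank
# flat file. Only single-line qualifier values are matched.
list_products() {
    sed -n 's/.*\/product="\([^"]*\)".*/\1/p' "$1"
}

# Usage (not run automatically; pass the file you redirected efetch into):
#   list_products your_record.gb
```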

Downloading run information for SRA data

esearch -db sra -query SRP075747 | efetch -format runinfo > runinfo.txt

Run,ReleaseDate,LoadDate,spots,bases,spots_with_mates,avgLength,size_MB,AssemblyName,download_path,Experiment,LibraryName,LibraryStrategy,LibrarySelection,LibrarySource,LibraryLayout,InsertSize,InsertDev,Platform,Model,SRAStudy,BioProject,Study_Pubmed_id,ProjectID,Sample,BioSample,SampleType,TaxID,ScientificName,SampleName,g1k_pop_code,source,g1k_analysis_group,Subject_ID,Sex,Disease,Tumor,Affection_Status,Analyte_Type,Histological_Type,Body_Site,CenterName,Submission,dbgap_study_accession,Consent,RunHash,ReadHash
SRR3589948,2016-09-09 16:27:05,2016-05-26 07:22:58,40008592,4080876384,40008592,102,1812,,https://sra-download.ncbi.nlm.nih.gov/traces/sra40/SRR/003505/SRR3589948,SRX1801292,,RIP-Seq,other,TRANSCRIPTOMIC,PAIRED,0,0,ILLUMINA,Illumina HiSeq 2500,SRP075747,PRJNA323422,2,323422,SRS1468122,SAMN05178619,simple,9606,Homo sapiens,GSM2177715,,,,,,,no,,,,,GEO,SRA429358,,public,D9CB6278FA440C16D04832F947BF338F,165928A89FAE018C75463F7074DADEA8
SRR3589949,2016-09-09 16:27:05,2016-05-26 07:23:43,37825589,3858210078,37825589,102,1664,,https://sra-download.ncbi.nlm.nih.gov/traces/sra40/SRR/003505/SRR3589949,SRX1801293,,RIP-Seq,other,TRANSCRIPTOMIC,PAIRED,0,0,ILLUMINA,Illumina HiSeq 2500,SRP075747,PRJNA323422,2,323422,SRS1468123,SAMN05178620,simple,9606,Homo sapiens,GSM2177716,,,,,,,no,,,,,GEO,SRA429358,,public,4C986EE070A46559AF6F8892378A6E7C,EC2FFDCD9C997BED576391FD3B19CF9E
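
The runinfo output is plain CSV with a header row, so individual columns can be picked out by name. A minimal sketch, assuming no field contains an embedded comma (true for the rows shown above, not guaranteed in general); the helper name runs_and_paths is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the Run and download_path columns of a runinfo
# CSV as tab-separated pairs, locating both columns from the header row.
runs_and_paths() {
    awk -F',' '
        NR == 1 {
            for (i = 1; i <= NF; i++) {
                if ($i == "Run") run = i
                if ($i == "download_path") path = i
            }
            next
        }
        # skip blank lines and any repeated header rows
        run && path && $run != "" && $run != "Run" {
            printf "%s\t%s\n", $run, $path
        }
    ' "$1"
}

# Usage (not run automatically):
#   runs_and_paths runinfo.txt
```

Looking the columns up by header name rather than by fixed position keeps the helper working if the column order changes.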
