xmlBLASTparser

About

xmlBLASTparser is a lightweight PHP library that parses XML-formatted NCBI BLAST output and renders it as a color-coded HTML page. Each database accession number/ID on the page is hyperlinked to its external source database, and each entry in the description summary is anchor-linked to the corresponding alignment section. The complete list of NCBI standard sequence identifiers is tabulated below:

Tag and Identifier Syntax         Identifier Source Description
bbm|integer                       NCBI GenInfo Backbone database identifier
bbs|integer                       NCBI GenInfo Backbone database identifier
dbj|coll-accession|locus          DNA Database of Japan
emb|coll-accession|entry          EBI EMBL Database
gb|coll-accession|locus           NCBI GenBank database
gi|integer                        NCBI GenInfo Integrated Database ("jee-aye")
gim|integer                       NCBI GenInfo Import identifier
gnl|database|idstring             General (user-definable) database and identifier
gp|coll-accession|locus_cds#      GenPept (GenBank protein) identifier
lcl|integer                       Local (user-definable) identifier
oth|accession|name|release        Other (user-definable) identifier*
pat|country|patentid|serialno     Patent sequence identifier
pdb|entry|chainid                 Brookhaven Protein Database
pir|accession|entry               Protein Information Resource International
prf|accession|name                Protein Research Foundation
ref|coll-accession|locus          NCBI RefSeq
sp|coll-accession|locus           SWISS-PROT database
tpd|coll-accession|name           Third party annotation, DDBJ
tpe|coll-accession|name           Third party annotation, EMBL
tpg|coll-accession|name           Third party annotation, GenBank

*The NCBI has discontinued support for "oth" identifiers, but support for them is maintained in xdformat/xdget.
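
To illustrate how the identifier syntax in the table can be read, here is a minimal sketch that splits a compound identifier such as gi|109158070|pdb|2GTS|A into its tagged parts. It is not taken from xmlBLASTparser: the constant SEQID_FIELDS and the function parseSeqId are hypothetical names, and the field counts are simply read off the syntax column above.

```php
<?php
// Number of '|'-separated fields that follow each tag, read off the table above.
// SEQID_FIELDS and parseSeqId() are illustrative names, not part of xmlBLASTparser.
const SEQID_FIELDS = [
    'bbm' => 1, 'bbs' => 1, 'dbj' => 2, 'emb' => 2, 'gb'  => 2,
    'gi'  => 1, 'gim' => 1, 'gnl' => 2, 'gp'  => 2, 'lcl' => 1,
    'oth' => 3, 'pat' => 3, 'pdb' => 2, 'pir' => 2, 'prf' => 2,
    'ref' => 2, 'sp'  => 2, 'tpd' => 2, 'tpe' => 2, 'tpg' => 2,
];

/**
 * Split a compound sequence identifier into a list of
 * ['tag' => string|null, 'fields' => string[]] records.
 */
function parseSeqId(string $id): array
{
    $parts   = explode('|', $id);
    $records = [];
    $i       = 0;
    while ($i < count($parts)) {
        $tag = $parts[$i];
        if (!array_key_exists($tag, SEQID_FIELDS)) {
            // Unknown tag: keep the remainder verbatim and stop.
            $records[] = ['tag' => null, 'fields' => array_slice($parts, $i)];
            break;
        }
        $n         = SEQID_FIELDS[$tag];
        $records[] = ['tag' => $tag, 'fields' => array_slice($parts, $i + 1, $n)];
        $i        += 1 + $n;
    }
    return $records;
}

foreach (parseSeqId('gi|109158070|pdb|2GTS|A') as $record) {
    echo $record['tag'], ': ', implode(', ', $record['fields']), PHP_EOL;
}
// Prints:
// gi: 109158070
// pdb: 2GTS, A
```

A table-driven split like this keeps the tag rules in one place, so the tag found for each part could then be used to choose which external database to link to.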

Usage

xmlBLASTparser parses the XML output of an NCBI BLAST sequence alignment obtained by either of the following methods:

  • NCBI BLAST - Download the alignment in XML format from the NCBI BLAST result page and load it into the xmlBLASTparser PHP file, or fetch it directly from NCBI using the request ID (RID) of the search. For example,
// Option 1: load a result file downloaded from the NCBI BLAST result page
$xml = simplexml_load_file("V07E2YXG014-Alignment.xml") or die("Error: cannot create object");
// Option 2: fetch the result by request ID ($rid must already hold a valid RID)
$out = file_get_contents("https://blast.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Get&FORMAT_TYPE=XML&FORMAT_OBJECT=Alignment&RID=$rid");
$xml = new SimpleXMLElement($out);
  • Standalone NCBI BLAST - Produce the XML output by running one of the standalone NCBI BLAST executables (blastn.exe, blastp.exe, blastx.exe, tblastn.exe, tblastx.exe, etc.) with -outfmt 5, then load the result into the xmlBLASTparser PHP file. For example,
// -outfmt 5 selects XML output; -remote runs the search on the NCBI servers
exec('blastp.exe -db pdb -query seq.fa -remote -outfmt 5 -out out.xml');
// Parse the file into a SimpleXMLElement, as in the NCBI BLAST example
$xml = simplexml_load_file("out.xml") or die("Error: cannot create object");
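
Whichever route produces the XML, it can help to validate the document before rendering it. The following is a minimal sketch, not part of xmlBLASTparser: loadBlastXml is a hypothetical helper that parses a string with SimpleXML, collects libxml errors instead of emitting warnings, and checks for the BlastOutput root element.

```php
<?php
/**
 * Parse BLAST XML from a string and return a SimpleXMLElement.
 * Throws RuntimeException if the XML is malformed or has an unexpected root.
 * Hypothetical helper for illustration only.
 */
function loadBlastXml(string $source): SimpleXMLElement
{
    // Collect parse errors ourselves instead of letting libxml emit warnings.
    $previous = libxml_use_internal_errors(true);
    $xml      = simplexml_load_string($source);
    $messages = array_map(
        function ($error) {
            return trim($error->message);
        },
        libxml_get_errors()
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($xml === false) {
        throw new RuntimeException('Invalid BLAST XML: ' . implode('; ', $messages));
    }
    if ($xml->getName() !== 'BlastOutput') {
        throw new RuntimeException('Unexpected root element: ' . $xml->getName());
    }
    return $xml;
}

try {
    $xml = loadBlastXml(
        '<BlastOutput><BlastOutput_program>blastp</BlastOutput_program></BlastOutput>'
    );
    echo $xml->BlastOutput_program, PHP_EOL; // prints "blastp"
} catch (RuntimeException $e) {
    echo $e->getMessage(), PHP_EOL;
}
```

The same helper could wrap the string returned by file_get_contents in either usage example.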

Input

<?xml version="1.0"?>
<!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" "http://www.ncbi.nlm.nih.gov/dtd/NCBI_BlastOutput.dtd">
<BlastOutput>
  <BlastOutput_program>blastp</BlastOutput_program>
  <BlastOutput_version>BLASTP 2.7.0+</BlastOutput_version>
  <BlastOutput_reference>Stephen F. Altschul, Thomas L. Madden, Alejandro A. Sch&amp;auml;ffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), &quot;Gapped BLAST and PSI-BLAST: a new generation of protein database search programs&quot;, Nucleic Acids Res. 25:3389-3402.</BlastOutput_reference>
  <BlastOutput_db>pdb</BlastOutput_db>
  <BlastOutput_query-ID>Query_93791</BlastOutput_query-ID>
  <BlastOutput_query-def>KDG85104.1 hypothetical protein AE17_03267, partial [Escherichia coli UCI 58]</BlastOutput_query-def>
  <BlastOutput_query-len>82</BlastOutput_query-len>
  <BlastOutput_param>
    <Parameters>
      <Parameters_matrix>BLOSUM62</Parameters_matrix>
      <Parameters_expect>10</Parameters_expect>
      <Parameters_gap-open>11</Parameters_gap-open>
      <Parameters_gap-extend>1</Parameters_gap-extend>
      <Parameters_filter>F</Parameters_filter>
    </Parameters>
  </BlastOutput_param>
<BlastOutput_iterations>
<Iteration>
  <Iteration_iter-num>1</Iteration_iter-num>
  <Iteration_query-ID>Query_93791</Iteration_query-ID>
  <Iteration_query-def>KDG85104.1 hypothetical protein AE17_03267, partial [Escherichia coli UCI 58]</Iteration_query-def>
  <Iteration_query-len>82</Iteration_query-len>
<Iteration_hits>
<Hit>
  <Hit_num>1</Hit_num>
  <Hit_id>gi|109158070|pdb|2GTS|A</Hit_id>
  <Hit_def>Chain A, Structure Of Protein Of Unknown Function Hp0062 From Helicobacter Pylori</Hit_def>
  <Hit_accession>2GTS_A</Hit_accession>
  <Hit_len>86</Hit_len>
  <Hit_hsps>
    <Hsp>
      <Hsp_num>1</Hsp_num>
      <Hsp_bit-score>25.0238</Hsp_bit-score>
      <Hsp_score>53</Hsp_score>
      <Hsp_evalue>6.53601</Hsp_evalue>
      <Hsp_query-from>52</Hsp_query-from>
      <Hsp_query-to>74</Hsp_query-to>
      <Hsp_hit-from>20</Hsp_hit-from>
      <Hsp_hit-to>42</Hsp_hit-to>
      <Hsp_query-frame>0</Hsp_query-frame>
      <Hsp_hit-frame>0</Hsp_hit-frame>
      <Hsp_identity>9</Hsp_identity>
      <Hsp_positive>16</Hsp_positive>
      <Hsp_gaps>0</Hsp_gaps>
      <Hsp_align-len>23</Hsp_align-len>
      <Hsp_qseq>QFKSLMLKELNFVMNYVFTLETW</Hsp_qseq>
      <Hsp_hseq>RFKELLREEVNSLSNHFHNLESW</Hsp_hseq>
      <Hsp_midline>+FK L+ +E+N + N+   LE+W</Hsp_midline>
    </Hsp>
  </Hit_hsps>
</Hit>
<Hit>
  <Hit_num>2</Hit_num>
  <Hit_id>gi|970842266|pdb|5FCD|A</Hit_id>
  <Hit_def>Chain A, Crystal Structure Of Mccd Protein &gt;gi|970842267|pdb|5FCD|B Chain B, Crystal Structure Of Mccd Protein</Hit_def>
  <Hit_accession>5FCD_A</Hit_accession>
  <Hit_len>267</Hit_len>
  <Hit_hsps>
    <Hsp>
      <Hsp_num>1</Hsp_num>
      <Hsp_bit-score>25.409</Hsp_bit-score>
      <Hsp_score>54</Hsp_score>
      <Hsp_evalue>8.26162</Hsp_evalue>
      <Hsp_query-from>61</Hsp_query-from>
      <Hsp_query-to>81</Hsp_query-to>
      <Hsp_hit-from>174</Hsp_hit-from>
      <Hsp_hit-to>194</Hsp_hit-to>
      <Hsp_query-frame>0</Hsp_query-frame>
      <Hsp_hit-frame>0</Hsp_hit-frame>
      <Hsp_identity>10</Hsp_identity>
      <Hsp_positive>14</Hsp_positive>
      <Hsp_gaps>0</Hsp_gaps>
      <Hsp_align-len>21</Hsp_align-len>
      <Hsp_qseq>LNFVMNYVFTLETWYSFFVLR</Hsp_qseq>
      <Hsp_hseq>INFRPNPLWTLEYWHQFFSER</Hsp_hseq>
      <Hsp_midline>+NF  N ++TLE W+ FF  R</Hsp_midline>
    </Hsp>
  </Hit_hsps>
</Hit>
<Hit>
  <Hit_num>3</Hit_num>
  <Hit_id>gi|257097223|pdb|3FX7|A</Hit_id>
  <Hit_def>Chain A, Crystal Structure Of Hypothetical Protein Of Hp0062 From Helicobacter Pylori &gt;gi|257097224|pdb|3FX7|B Chain B, Crystal Structure Of Hypothetical Protein Of Hp0062 From Helicobacter Pylori</Hit_def>
  <Hit_accession>3FX7_A</Hit_accession>
  <Hit_len>94</Hit_len>
  <Hit_hsps>
    <Hsp>
      <Hsp_num>1</Hsp_num>
      <Hsp_bit-score>25.0238</Hsp_bit-score>
      <Hsp_score>53</Hsp_score>
      <Hsp_evalue>9.03233</Hsp_evalue>
      <Hsp_query-from>52</Hsp_query-from>
      <Hsp_query-to>74</Hsp_query-to>
      <Hsp_hit-from>20</Hsp_hit-from>
      <Hsp_hit-to>42</Hsp_hit-to>
      <Hsp_query-frame>0</Hsp_query-frame>
      <Hsp_hit-frame>0</Hsp_hit-frame>
      <Hsp_identity>9</Hsp_identity>
      <Hsp_positive>16</Hsp_positive>
      <Hsp_gaps>0</Hsp_gaps>
      <Hsp_align-len>23</Hsp_align-len>
      <Hsp_qseq>QFKSLMLKELNFVMNYVFTLETW</Hsp_qseq>
      <Hsp_hseq>RFKELLREEVNSLSNHFHNLESW</Hsp_hseq>
      <Hsp_midline>+FK L+ +E+N + N+   LE+W</Hsp_midline>
    </Hsp>
  </Hit_hsps>
</Hit>
</Iteration_hits>
  <Iteration_stat>
    <Statistics>
      <Statistics_db-num>93500</Statistics_db-num>
      <Statistics_db-len>23509168</Statistics_db-len>
      <Statistics_hsp-len>0</Statistics_hsp-len>
      <Statistics_eff-space>0</Statistics_eff-space>
      <Statistics_kappa>0.041</Statistics_kappa>
      <Statistics_lambda>0.267</Statistics_lambda>
      <Statistics_entropy>0.14</Statistics_entropy>
    </Statistics>
  </Iteration_stat>
</Iteration>
</BlastOutput_iterations>
</BlastOutput>
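
As a rough illustration of how a document like the one above can be traversed with SimpleXML, the sketch below collects one summary row per HSP. It is not the xmlBLASTparser implementation: summariseHits is a hypothetical helper, the percent identity is computed here as Hsp_identity divided by Hsp_align-len, and the sample is a trimmed copy of the first hit. Element names containing a hyphen need the brace syntax, for example $hsp->{'Hsp_bit-score'}.

```php
<?php
/**
 * Return one summary row per HSP in a parsed BlastOutput document.
 * Hypothetical helper for illustration only.
 */
function summariseHits(SimpleXMLElement $xml): array
{
    $rows = [];
    // xpath() returns a plain array, so a search with no hits yields no rows.
    foreach ($xml->xpath('//Hit') as $hit) {
        foreach ($hit->xpath('Hit_hsps/Hsp') as $hsp) {
            $identity = (int) (string) $hsp->Hsp_identity;
            $alignLen = (int) (string) $hsp->{'Hsp_align-len'};
            $rows[] = [
                'accession'    => (string) $hit->Hit_accession,
                'bit_score'    => (float) (string) $hsp->{'Hsp_bit-score'},
                'evalue'       => (float) (string) $hsp->Hsp_evalue,
                'identity_pct' => $alignLen > 0
                    ? round(100 * $identity / $alignLen, 1)
                    : 0.0,
            ];
        }
    }
    return $rows;
}

// Trimmed copy of the first hit from the sample input.
$sample = <<<'XML'
<BlastOutput>
  <BlastOutput_iterations><Iteration><Iteration_hits>
    <Hit>
      <Hit_accession>2GTS_A</Hit_accession>
      <Hit_hsps><Hsp>
        <Hsp_bit-score>25.0238</Hsp_bit-score>
        <Hsp_evalue>6.53601</Hsp_evalue>
        <Hsp_identity>9</Hsp_identity>
        <Hsp_align-len>23</Hsp_align-len>
      </Hsp></Hit_hsps>
    </Hit>
  </Iteration_hits></Iteration></BlastOutput_iterations>
</BlastOutput>
XML;

foreach (summariseHits(new SimpleXMLElement($sample)) as $row) {
    printf(
        '%s: %.1f bits, E = %.2f, %.1f%% identity' . PHP_EOL,
        $row['accession'],
        $row['bit_score'],
        $row['evalue'],
        $row['identity_pct']
    );
}
// Prints: 2GTS_A: 25.0 bits, E = 6.54, 39.1% identity
```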

Output

[Image: xmlBLASTparser_v1.1 Output]

Support

Please feel free to send your queries, suggestions, and/or comments related to the xmlBLASTparser program to [email protected] or [email protected].

License

xmlBLASTparser is made available under version 3 of the GNU Lesser General Public License.

Citation

Ashok Kumar, T., and Rajagopal, B. (2017). xmlBLASTparser v1.1 — a PHP based NCBI BLAST XML output parser. International Journal of Advanced Research in Computer Science. 8(8): 230-232. [Abstract] [PDF]