OmicsBox uses the Basic Local Alignment Search Tool (BLAST) to find sequences similar to your query set. Please refer to http://www.ncbi.nlm.nih.gov/BLAST for details on BLAST. Figure 2 shows the BLAST configuration dialog that controls the BLAST step.
BLAST can be performed in OmicsBox in four different ways:
CloudBlast. This is a cloud-based OmicsBox Community Resource for massive sequence alignment tasks. It allows you to execute standard NCBI Blast+ searches directly from within OmicsBox in a dedicated computing cloud. CloudBlast is a high-performance, secure and cost-optimized solution for your analysis. The service is completely independent of the NCBI servers in order to provide fast and reliable sequence alignments. Please see the Run Blast using CloudBLAST section for more information.
QBlast@NCBI. NCBI offers a public service that allows searching molecular sequence databases with the BLAST algorithm. The main advantages of this service are its versatility and that no database maintenance is required. Selecting this option in OmicsBox therefore requires no additional installation.
Local BLAST against your own database. The BLAST+ executables can be used to query a local database of your own. See https://www.blast2go.com/make-own-database-and-blast and the Make Blast Database section for how to prepare your own FASTA database and blast against it locally.
Custom Database CloudBlast. It is possible to run BLAST against a database made of a custom protein fasta file using the OmicsBox Cloud resources.
Figure 1: Select between NCBI, Local, CloudBlast or Custom Database
Blast Configuration Page
E-mail: Your e-mail address, in case you are using the NCBI BLAST web service.
BLAST program: The algorithm you want to use:
blastp - Compares an amino acid query sequence against a protein sequence database.
blastn (-task blastn) - Compares a nucleotide query sequence against a nucleotide sequence database.
blastx - Compares a nucleotide query sequence translated in all reading frames against a protein sequence database. Used to find potential translation products of an unknown nucleotide sequence.
tblastn - Compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames.
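blastn (-task megablast) - Compares a nucleotide query sequence against a nucleotide sequence database, optimized for highly similar sequences.
blastn (-task dc-megablast) - Discontiguous megablast; compares a nucleotide query sequence against a nucleotide sequence database, intended for more dissimilar sequences such as cross-species comparisons.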
BLAST DB: The name of the database to search in (e.g. nr, swissprot, pdb). For a list of possible DBs at NCBI, see http://data.biobam.com/ncbi_blast_dbs_protein.pdf
Taxonomy Filter: Restrict the search to BLAST results from the selected taxonomy.
BLAST expect value: The statistical significance threshold for reporting matches against database sequences. If the statistical significance ascribed to a match is greater than the EXPECT threshold, the match will not be reported. Lower EXPECT thresholds are more stringent, leading to fewer chance matches being reported. Increasing the threshold shows less stringent matches.
Number of BLAST hits: The maximum number of alignments to retrieve (0-100).
BLAST Description Annotator: The BDA finds the best possible description for a new sequence based on a given BLAST result.
Figure 2: Blast Configuration Page
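The combined effect of the BLAST expect value and the Number of BLAST hits settings can be illustrated with a minimal Python sketch. This is not OmicsBox code: the Hit record, the function name and the default values are hypothetical and chosen only for illustration. It shows that a hit is reported only if its e-value does not exceed the threshold, and that the hit list is then truncated to the requested number.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical minimal record for one BLAST hit."""
    hit_id: str
    evalue: float


def report_hits(hits, expect_threshold=1e-3, max_hits=20):
    """Keep hits whose e-value does not exceed the threshold,
    best (lowest e-value) first, limited to max_hits entries.
    The defaults are illustrative, not OmicsBox defaults."""
    kept = [h for h in hits if h.evalue <= expect_threshold]
    kept.sort(key=lambda h: h.evalue)
    return kept[:max_hits]


hits = [Hit("A", 1e-50), Hit("B", 0.5), Hit("C", 1e-5), Hit("D", 1e-10)]
selected = report_hits(hits, expect_threshold=1e-3, max_hits=2)
print([h.hit_id for h in selected])  # ['A', 'D']
```

Lowering expect_threshold removes more hits (more stringent), while raising it lets weaker matches such as hit B through.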
Word size: One of the important parameters governing the sensitivity of BLAST searches is the length of the initial words. The word size is adjustable in blastn and can be reduced from the default value to increase sensitivity. This word size can also be increased to increase the search speed and limit the number of database hits.
Low complexity filter: The BLAST programs employ the SEG algorithm to filter low complexity regions from proteins before executing a database search. The default is ON.
HSP length cutoff: A cutoff value for the minimal length of the first HSP of a BLAST hit, used to exclude hits with only small local alignments from the BLAST result. The given length corresponds to amino acids or nucleotides, depending on the type of BLAST performed.
Filter by description: Filter out BLAST hits by their description.
Figure 3: Advanced Page
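The two hit filters on the Advanced Page can be sketched in Python as follows. This is an illustration under stated assumptions, not the OmicsBox implementation: the dictionary layout, the function name and the cutoff value are hypothetical, and the description filter is assumed here to be a case-insensitive substring match.

```python
def passes_advanced_filters(hit, hsp_length_cutoff=30, excluded_terms=()):
    """Return True if a hit survives both filters.

    hit is a hypothetical dict with 'description' and 'hsp_lengths'
    (alignment lengths of its HSPs, first HSP first).
    """
    first_hsp_length = hit["hsp_lengths"][0]
    if first_hsp_length < hsp_length_cutoff:
        return False  # only a short local alignment: drop the hit
    description = hit["description"].lower()
    # Assumption: a hit is filtered out if any term occurs in its description.
    return not any(term.lower() in description for term in excluded_terms)


hits = [
    {"description": "hypothetical protein", "hsp_lengths": [120, 40]},
    {"description": "heat shock protein 70", "hsp_lengths": [25]},
    {"description": "heat shock protein 90", "hsp_lengths": [210]},
]
kept = [h["description"] for h in hits
        if passes_advanced_filters(h, 30, ["hypothetical"])]
print(kept)  # ['heat shock protein 90']
```

Only the first HSP is checked against the cutoff, following the parameter description above; the first hit is dropped by its description and the second by its short alignment.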
Save Results Page
The results of the BLAST queries can also be saved directly to a file in different formats by selecting the corresponding checkboxes on the BLAST Save Results Page. If the chosen file already exists, new results will be appended. Choose one or more of the following formats to additionally save your BLAST results:
XML2: This is a newer BLAST result format provided by NCBI that can also be loaded into OmicsBox.
XML: It is recommended to save your BLAST results as XML as this format is supported by the OmicsBox Load BLAST Results function.
TXT: Saves the BLAST results of each sequence in text file format.
HTML: For each sequence, a file in HTML format will be saved.
Figure 4: Save Results Page
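Results saved in the XML format can also be inspected outside OmicsBox. The following Python sketch reads hits from a heavily trimmed sample that uses element names of the NCBI BLAST XML output (Iteration, Hit, Hsp). The sample content, the function name and the returned dictionary keys are hypothetical, and a real result file contains many more elements than shown here.

```python
import xml.etree.ElementTree as ET

# Heavily trimmed, made-up sample; real files contain many more elements.
SAMPLE_XML = """<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_query-def>query_1</Iteration_query-def>
      <Iteration_hits>
        <Hit>
          <Hit_id>hypothetical_hit_1</Hit_id>
          <Hit_def>example protein</Hit_def>
          <Hit_hsps>
            <Hsp>
              <Hsp_evalue>1e-30</Hsp_evalue>
              <Hsp_align-len>250</Hsp_align-len>
            </Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""


def read_hits(xml_text):
    """Return one dict per hit, using the first HSP of each hit."""
    root = ET.fromstring(xml_text)
    rows = []
    for iteration in root.iter("Iteration"):
        query = iteration.findtext("Iteration_query-def")
        for hit in iteration.iter("Hit"):
            first_hsp = hit.find("Hit_hsps/Hsp")
            rows.append({
                "query": query,
                "hit_id": hit.findtext("Hit_id"),
                "definition": hit.findtext("Hit_def"),
                "evalue": float(first_hsp.findtext("Hsp_evalue")),
                "align_len": int(first_hsp.findtext("Hsp_align-len")),
            })
    return rows


rows = read_hits(SAMPLE_XML)
print(rows[0]["query"], rows[0]["hit_id"], rows[0]["align_len"])
```

For large result files, a streaming parser such as ET.iterparse would likely be a better fit than loading the whole document at once.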
CloudBlast offers a highly optimized, self-sustained HPC solution to address a very specific need of the OmicsBox community.
CloudBlast is a BLAST service totally independent of the NCBI servers to provide fast and reliable sequence alignments. It consists of a high performance computing cluster dedicated exclusively to Blast searches.
All OmicsBox subscriptions include "Cloud Units" to make use of this resource. They allow you to perform BLAST searches for tens of thousands of sequences within a few days against a large collection of protein databases.
These units correspond directly to the usage of the cluster (used CPU seconds and network traffic/data volume).
Each sequence alignment performed in the system consumes a certain amount of computation time, depending on the sequence length, the BLAST algorithm (blastx, blastp) and the parameters used. The smaller the database you blast against, the more sequences you can analyse with 6,000,000 Cloud Units (see Cloud Usage in the View Menu section for how to monitor the Cloud Units). For example, if you blast against the vertebrate NR subset, you would be able to blast approximately one million (1,000,000) sequences. If you blast against the full NR database, the largest protein database available, you should be able to blast approximately 80,000 sequences (with an average length of 800 nt per sequence). To blast against an NR subset, add the species taxonomy ID.
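The figures above imply a rough per-sequence cost, which the following Python sketch works out. It only restates the approximate numbers quoted in this section; the function names and the assumption that consumption scales linearly with the number of sequences are illustrative, since the actual usage depends on sequence length, algorithm and parameters.

```python
CLOUD_UNITS = 6_000_000

# Approximate sequence counts quoted above for 6,000,000 Cloud Units.
APPROX_SEQUENCES = {"nr": 80_000, "nr_vertebrate_subset": 1_000_000}


def units_per_sequence(database):
    """Rough average Cloud Units consumed per sequence."""
    return CLOUD_UNITS / APPROX_SEQUENCES[database]


def estimated_units(database, n_sequences):
    """Rough estimate, assuming consumption scales linearly."""
    return units_per_sequence(database) * n_sequences


print(units_per_sequence("nr"))                    # 75.0
print(units_per_sequence("nr_vertebrate_subset"))  # 6.0
print(estimated_units("nr", 10_000))               # 750000.0
```

In other words, a search against the full NR database costs roughly 12.5 times more per sequence than one against the vertebrate subset.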
Figure 5: CloudBlast Configuration Page
With Local BLAST you can blast the sequences against your own database. OmicsBox allows creating a Blast database from a FASTA file with the option "Make Blast Database" (see Make Blast Database section). Download and format your database and choose the corresponding folder (see figure 6). Databases have to be formatted for NCBI Blast+.
The main parameters in the Local BLAST Configuration page are very similar to the ones in NCBI and CloudBlast. The main difference is in choosing the database, as OmicsBox expects a .pal or .psq file. On the Advanced Page, under "Run Parameters", it is possible to select the number of threads to be used. This field does not need to be set, as OmicsBox detects the number of threads available on the computer. The Advanced Page section provides a detailed description of each parameter. As in CloudBlast, the BLAST results will be saved in XML file format.
Visit the following tutorial on how to download NCBI pre-formatted databases.
Please cite NCBI when using Local Blast and pre-formatted databases: https://www.ncbi.nlm.nih.gov/books/NBK569850/
Figure 6: Local Blast Configuration Page
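For orientation, the equivalent steps with the NCBI BLAST+ command-line tools can be sketched in Python as below. The sketch only builds and prints the command lines and does not run them. The file names, database name, function names and parameter values are hypothetical, and the source does not state that OmicsBox calls the executables in exactly this way.

```python
import shlex


def makeblastdb_command(fasta_path, db_name, dbtype="prot"):
    """Argument list to format a FASTA file as a BLAST+ database."""
    return ["makeblastdb", "-in", fasta_path, "-dbtype", dbtype, "-out", db_name]


def blast_command(program, query_path, db_name, out_path,
                  evalue=1e-3, max_hits=20, threads=4):
    """Argument list for a local search writing XML output (-outfmt 5).
    The default values are illustrative only."""
    return [
        program,
        "-query", query_path,
        "-db", db_name,
        "-evalue", str(evalue),
        "-max_target_seqs", str(max_hits),
        "-num_threads", str(threads),
        "-outfmt", "5",
        "-out", out_path,
    ]


print(shlex.join(makeblastdb_command("proteins.fasta", "my_db")))
print(shlex.join(blast_command("blastp", "queries.fasta", "my_db", "results.xml")))
```

Formatting a protein FASTA file this way produces, among others, a .psq file, which is one of the file types the Local BLAST page expects when selecting the database.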
Custom Database CloudBlast
OmicsBox offers the possibility to generate your own custom database from a .FASTA file and run Blast on the OmicsBox Cloud.
The database will be generated automatically in the Cloud using the FASTA file and the parameters provided. Running Custom Database CloudBlast consumes Cloud Units. More information on Cloud Units can be found online or in the CloudBlast section.
As the BLAST search progresses, sequences with successful BLAST results change their color on the Main Sequence Table from white to orange and the BLAST result-related columns will be filled. In case no results could be retrieved for a given sequence, this row will turn dark-red.
Figure 7: Show BLAST Results
Individual Blast Results
Right-clicking on a sequence displays the Single Sequence Menu, from which the BLAST results of each sequence can be viewed individually. Show BLAST Results (figure 7) will generate a tab in the Results containing information on the results of the similarity search for the selected sequence. For each of the obtained hits, the following information is given:
Hit id and definition
Gene name assigned to the hit by its accession
e-value of the alignment
Alignment length of the longest HSP
Positive matches of the longest HSP
HSP similarity of the hit
Number of HSPs
Mapped GO terms with their evidence codes
UniProt codes of the hit sequences
For BLAST statistics charts, see the Charts and Statistics page of this user manual.
Figure 8: Individual BLAST Result Table View
Figure 9: Individual BLAST Result in Alignment View
Export Blast Top-Hits
A tab-separated text file containing the BLAST top hit of each sequence can be exported (File → Export → Export Blast Top-Hits).
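The general shape of such a file can be illustrated with a short Python sketch. The column names, the input layout and the rule that the top hit is the one with the lowest e-value are assumptions made for this example; the columns actually written by OmicsBox may differ.

```python
import csv
import io


def export_top_hits(results, handle):
    """Write one tab-separated row per sequence with its best hit.

    results maps a sequence name to a list of (hit_id, evalue) tuples;
    the top hit is assumed to be the one with the lowest e-value.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["sequence", "top_hit", "evalue"])
    for name, hits in results.items():
        if not hits:
            continue  # assumption: sequences without BLAST results are skipped
        hit_id, evalue = min(hits, key=lambda h: h[1])
        writer.writerow([name, hit_id, evalue])


results = {"seq1": [("hitA", 1e-5), ("hitB", 1e-20)], "seq2": []}
buffer = io.StringIO()
export_top_hits(results, buffer)
print(buffer.getvalue())
```

Writing to a file instead of a string buffer only requires passing an open text file handle, for example one opened with newline="".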
Other BLAST Functions
The following functions can be found under functional analysis → Blast:
Remove Blast Results: This option will remove the BLAST results from the selected sequences.
Run Blast-Descriptor-Annotator (BDA): This will run the BDA algorithm. For further details, please see the Blast Configuration Page section.
Recover original Best-Blast-Hit Description: When this option is executed, the sequence description column on the Main Sequence Table will contain the top BLAST hit description instead of the one from the BDA.