Retrieve Blast Top-Hit

Introduction

This feature allows retrieving the sequence information of Top Blast Hits from an OmicsBox project to improve the annotation of a dataset.

A possible use case is a so-called "Double-Blast": the BLAST results of a first run are used to replace the sequence data for a second run with a different set of query sequences. Imagine an RNA-seq dataset with a high percentage of sequences without any alignments against a protein database (e.g. blastx against NR). This feature can be used to select and extract the sequences without hits (the red ones) into a new project. These sequences are first blasted against a set of EST sequences, and the initially unaligned sequences are then replaced with the matching ESTs. Finally, the initial blastx search is repeated against the protein database.
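The "Double-Blast" idea can be sketched in a few lines of plain Python. This is only an illustration of the logic, not OmicsBox code: the function names, the dictionaries and the toy sequences are all hypothetical, and BLAST results are assumed to be available as lists of hit IDs sorted best first.

```python
def split_by_hits(blast_results):
    """Split sequence names into those with and those without BLAST hits."""
    with_hits = [name for name, hits in blast_results.items() if hits]
    without_hits = [name for name, hits in blast_results.items() if not hits]
    return with_hits, without_hits


def replace_with_top_hit(sequences, est_results, est_sequences):
    """Replace each sequence by the sequence of its top EST hit, if it has one."""
    replaced = {}
    for name, seq in sequences.items():
        hits = est_results.get(name, [])
        if hits:
            top_hit_id = hits[0]  # assumption: hits are sorted, best first
            replaced[name] = est_sequences[top_hit_id]
        else:
            replaced[name] = seq  # no EST hit: keep the original sequence
    return replaced


# Toy data (hypothetical): first blastx run against a protein database.
blastx_results = {"seq1": ["P12345"], "seq2": [], "seq3": []}
sequences = {"seq1": "ATGGCC", "seq2": "ATGTTT", "seq3": "ATGAAA"}

# Step 1: extract the sequences without hits.
_, no_hit_names = split_by_hits(blastx_results)
no_hit_sequences = {name: sequences[name] for name in no_hit_names}

# Step 2: these sequences are blasted against ESTs (results given here as toy data).
est_results = {"seq2": ["EST_001"], "seq3": []}
est_sequences = {"EST_001": "ATGTTTGGGCCC"}

# Step 3: replace the unaligned sequences with their top EST hit.
second_run_input = replace_with_top_hit(no_hit_sequences, est_results, est_sequences)
```

The resulting `second_run_input` is what would be submitted to the repeated blastx search against the protein database.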

Run Retrieve Blast Top-Hit

This feature can be found under functional analysis → Blast → Retrieve Blast Top-Hit.

Configuration

Data can be obtained from the NCBI, Ensembl or UniProt web services and either stored in a new project or used to replace the existing IDs/sequences (figure 1).

Retrieve Blast Top-Hit
  • Action: Either replace the sequences in the data set or extract them into a new data set.

  • Sequence Name: It is possible to keep the original sequence names or to rename them to the names in the FASTA file. The latter adds a small note to the sequence description stating the original name.

  • Replace Query With Top-Hit: If checked, the original sequence is replaced by the similar sequence found in the FASTA file. This option is activated by default.

  • Filters Applied to Top-Hit: For each Top-Hit (the first significant alignment from an already performed BLAST), the filters (bottom part of the dialog) are applied and the hit is searched in the corresponding online database.

Figure 1: Retrieve Blast Top-Hit Dialog.
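How the dialog options interact can be illustrated with a minimal Python sketch. It does not reflect the actual OmicsBox implementation: the `Record` and `Hit` classes, the function names and the `action`/`rename` parameters are hypothetical, and the e-value threshold is only an assumed example of a filter, since the concrete filters are defined in the dialog. The online retrieval of the hit sequence is left out; hits are assumed to already carry their sequence.

```python
import dataclasses
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    name: str
    description: str
    sequence: str


@dataclass(frozen=True)
class Hit:
    hit_id: str
    sequence: str
    e_value: float


def apply_top_hit(record, hits, max_e_value=1e-5, rename=False):
    """Return the record with its top-hit sequence, or None if no hit passes the filter."""
    if not hits:
        return None
    top = hits[0]  # assumption: hits are sorted, best first
    if top.e_value > max_e_value:  # assumed example filter
        return None
    if rename:
        # Renaming adds a small note with the original name to the description.
        note = f"(original name: {record.name})"
        description = f"{record.description} {note}".strip()
        return Record(top.hit_id, description, top.sequence)
    return dataclasses.replace(record, sequence=top.sequence)


def retrieve_top_hits(project, blast_hits, action="replace", **options):
    """'replace' keeps all records and swaps in top hits; 'extract' returns only the retrieved ones."""
    updated = {}
    for name, record in project.items():
        new_record = apply_top_hit(record, blast_hits.get(name, []), **options)
        if new_record is not None:
            updated[new_record.name] = new_record
        elif action == "replace":
            updated[name] = record  # no usable top hit: keep the original record
    return updated


# Toy project (hypothetical data).
project = {
    "seq1": Record("seq1", "unknown transcript", "ATGGCC"),
    "seq2": Record("seq2", "", "ATGTTT"),
}
blast_hits = {
    "seq1": [Hit("EST_001", "ATGGCCGGG", 1e-30)],
    "seq2": [Hit("EST_002", "ATGTTTAAA", 0.5)],  # fails the e-value filter
}

extracted = retrieve_top_hits(project, blast_hits, action="extract", rename=True)
replaced = retrieve_top_hits(project, blast_hits, action="replace")
```

In this sketch, `extracted` corresponds to a new data set containing only the retrieved sequences, while `replaced` corresponds to the current data set with the affected sequences swapped in place.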

Results

Depending on the configuration, either a new project is generated or the current one is modified.