Retrieve Blat Top-Hit

Introduction

BLAT is short for “BLAST-like alignment tool” and is similar to BLAST in many ways. The program rapidly scans for relatively short matches (hits) and extends these into high-scoring pairs (HSPs). Unlike BLAST, BLAT builds an index of the database and then scans linearly through the query sequence. In addition, BLAT can trigger extensions on any number of perfect or near-perfect hits. Furthermore, BLAT has special code to handle introns in RNA/DNA alignments.
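The indexing idea can be illustrated with a minimal Python sketch. It is not BLAT's actual implementation: the function names, the toy sequences, and the choice of k = 4 are assumptions made for illustration, and only perfect k-mer matches are looked up. The database is indexed by non-overlapping k-mers, and the query is then scanned linearly through its overlapping k-mers.

```python
from collections import defaultdict

def build_index(database, k):
    """Index the non-overlapping k-mers of every database sequence."""
    index = defaultdict(list)
    for name, seq in database.items():
        for pos in range(0, len(seq) - k + 1, k):
            index[seq[pos:pos + k]].append((name, pos))
    return index

def find_seeds(index, query, k):
    """Scan the query linearly and look up each overlapping k-mer."""
    seeds = []
    for qpos in range(len(query) - k + 1):
        for name, tpos in index.get(query[qpos:qpos + k], ()):
            seeds.append((qpos, name, tpos))
    return seeds

# Toy data (hypothetical): two short reference sequences and one query.
database = {"ref1": "ACGTACGGTTCA", "ref2": "TTTTGGGGCCCC"}
index = build_index(database, k=4)
seeds = find_seeds(index, "GGACGGTT", k=4)
print(seeds)
```

Each seed is a (query position, reference name, reference position) tuple. In a real aligner, such seeds would then be extended into longer alignments.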

Please cite BLAT: Kent WJ. BLAT--the BLAST-like alignment tool. Genome Res. 2002;12(4):656-664. doi:10.1101/gr.229202

In OmicsBox, BLAT is used to replace the query sequences of a dataset with their top hits found in a reference FASTA file.

Run Retrieve Blat Top-Hit

This functionality can be found under Functional Analysis → Blast → Retrieve Blat Top-Hit.

Configuration

This tool creates a BLAT database from a reference FASTA file and then searches it for sequences similar to those in the project.
The following parameters can be configured.

  • Action: Either replace the sequences in the data set or extract them into a new data set.

  • Sequence Name: Keep the original sequence names or rename the sequences to the names in the FASTA file. Renaming adds a short note to the sequence description stating the original name.

  • Replace Query With Top-Hit: If checked, the original sequence is replaced by the most similar sequence found in the FASTA file. This option is activated by default.

  • Reference Fasta: BLAT needs a reference FASTA file which is used to search for similar sequences.

  • Similarity: Filters hits by their similarity to the query sequence.

  • Check for Reverse Strand: If checked, BLAT will also consider the reverse strand when searching for similar sequences.

Figure 1: Retrieve Blat Top-Hit Wizard
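
How a similarity filter and top-hit selection could work is sketched below in Python, using BLAT's tab-separated PSL output format. This is an illustration only and does not reflect how OmicsBox implements the tool. The names top_hits, psl_row and min_identity are hypothetical, identity is simplified to matching bases over aligned bases, and ranking hits by matches minus mismatches is an assumption.

```python
def top_hits(psl_lines, min_identity=0.0):
    """Return {query name: (target name, strand, identity)} for the best hit."""
    best = {}
    for line in psl_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip header lines and anything that is not a 21-column PSL row.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        matches, mismatches, rep_matches = (int(x) for x in fields[:3])
        aligned = matches + mismatches + rep_matches
        if aligned == 0:
            continue
        # Simplified identity: matching bases over aligned bases (assumption).
        identity = (matches + rep_matches) / aligned
        if identity < min_identity:
            continue
        strand, query, target = fields[8], fields[9], fields[13]
        score = matches + rep_matches - mismatches
        if query not in best or score > best[query][0]:
            best[query] = (score, target, strand, identity)
    return {q: (t, s, i) for q, (_, t, s, i) in best.items()}

def psl_row(matches, mismatches, strand, q_name, t_name):
    """Hypothetical helper: build a PSL row with unused columns set to 0."""
    fields = ["0"] * 21
    fields[0], fields[1] = str(matches), str(mismatches)
    fields[8], fields[9], fields[13] = strand, q_name, t_name
    return "\t".join(fields)

rows = [
    psl_row(90, 10, "+", "gene1", "refA"),
    psl_row(98, 2, "-", "gene1", "refB"),   # better hit, on the reverse strand
    psl_row(50, 50, "+", "gene2", "refC"),  # falls below the identity threshold
]
hits = top_hits(rows, min_identity=0.8)
print(hits)
```

In this toy input, gene1 is assigned refB on the reverse strand, while gene2 has no hit above the threshold and would keep its original sequence.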

Results

Depending on the configuration, either a new project is generated or the current one is modified.

A possible use case is a so-called "Double-Blast": the BLAST results of a first run are used to replace the sequence data for a second run against a different set of query sequences.
This tool can also be useful after running Prokaryotic Gene Finding to replace the sequence names retrieved from Glimmer with the top hit from a reference FASTA file.

Visit the online tutorial here to see how to replace the sequence names.