InterProScan Annotation


General

The InterPro annotation functionality in OmicsBox retrieves domain/motif information sequence by sequence. The corresponding GO terms are then transferred to the sequences and merged with any existing GO terms. InterProScan results can be viewed through the Single Sequence Menu (figure 6) and saved in TXT and XML format (figure 5). Sequences turn violet if no other analysis has been run on them before.


Figure 1: InterProScan options

  • Run InterProScan. Start sending sequences to the EBI.
  • Merge InterProScan GOs to Annotation. Add GO terms obtained through motifs/domains to the current annotations.
  • Remove InterProScan. Delete InterProScan results for the selected sequences.

Run InterProScan

There are two options to run InterProScan in OmicsBox, either with CloudIPS or via the public web service at EBI.

CloudIPS is a cloud-based OmicsBox community resource for fast and reliable InterPro analysis of datasets of any size, from small to large. It runs the original InterPro algorithms against up-to-date databases in our dedicated computing cloud. This is a high-performance, secure and cost-optimized solution for your analysis.
The public EMBL-EBI InterPro web service scans your sequences against InterPro's signatures; performance and results depend on the EBI web server.

InterProScan can only be run if the sequence table contains the actual sequence information (loaded from a FASTA file). Take care if you created the project from a BLAST XML file or loaded a .annot file, since such a project may not contain the sequences themselves.
To add sequences to the current OmicsBox project, see the section Add sequences to existing OmicsBox project.

You can save the InterProScan results in different file formats: tab-separated values (TSV), XML (the default output), GFF3, and the input (query) sequence itself (figure 5).
If you are working with nucleotide sequences, OmicsBox translates each one to its longest open reading frame and sends the translation to InterProScan. In this case, exporting the input sequence saves the protein sequence, not the nucleotide one.
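
The translation step can be illustrated with a minimal Python sketch. It is not OmicsBox's implementation: it assumes the standard genetic code, searches all six reading frames, and treats the longest stop-free stretch as the "longest open reading frame" without requiring a start codon. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; unknown codons (e.g. with N) become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def longest_orf_protein(nucleotides):
    """Longest stop-free peptide over all six frames (assumed definition)."""
    seq = nucleotides.upper().replace("U", "T")
    best = ""
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            # Stop codons are translated to '*', so splitting on '*'
            # yields the stop-free stretches of this frame.
            for segment in translate(strand[frame:]).split("*"):
                if len(segment) > len(best):
                    best = segment
    return best


if __name__ == "__main__":
    print(longest_orf_protein("ATGGCCTAAATGAAACCCGGGTTTTAG"))
```

Because the peptide may come from any frame or from the reverse strand, the exported input sequence can look quite different from a naive frame-1 translation of the nucleotide sequence.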

Once InterProScan has finished, the results for each sequence can be viewed via the context menu (figure 6).


Figure 2: InterProScan Configuration


Figure 3: Selection of Member Databases

Figure 4: Selection of Member Databases


Figure 5:  Save InterProScan Results



Figure 6: Show InterProScan Results 


Figure 7: InterProScan Results

Merge InterProScan GOs to Annotation

The GO terms obtained from InterProScan can now be added to the existing annotations based on the BLAST results. This option is available from the InterProScan submenu.
Once the merge has finished, a distribution chart in the Results menu shows the number of GOs that have been added to, or confirmed in, the current annotation results.
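
The difference between "added" and "confirmed" GOs can be sketched in Python as follows. This is an illustration under simplifying assumptions, not the OmicsBox merge itself: annotations are modelled as plain sets of GO IDs per sequence, the GO hierarchy is ignored (the real merge may take it into account), and the function name and the example data are hypothetical.

```python
def merge_go_terms(existing, interpro):
    """Merge InterProScan-derived GO IDs into existing annotations.

    Both arguments map a sequence ID to a set of GO IDs. Returns the
    merged mapping plus counts of added and confirmed terms.
    """
    merged = {}
    added = confirmed = 0
    for seq_id in existing.keys() | interpro.keys():
        current = existing.get(seq_id, set())
        incoming = interpro.get(seq_id, set())
        confirmed += len(current & incoming)  # already annotated
        added += len(incoming - current)      # new for this sequence
        merged[seq_id] = current | incoming
    return merged, {"added": added, "confirmed": confirmed}


if __name__ == "__main__":
    # Hypothetical example data.
    blast_based = {"seq1": {"GO:0005524"}, "seq2": set()}
    from_interpro = {
        "seq1": {"GO:0005524", "GO:0004672"},
        "seq3": {"GO:0003677"},
    }
    merged, counts = merge_go_terms(blast_based, from_interpro)
    print(counts)
```

In this model a term counts as confirmed when InterProScan reports a GO ID that the sequence already carries, and as added when the ID is new for that sequence.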





Figure 8: Merge InterProScan results



Figure 9: Statistics after merging InterProScan to GO Annotation

Statistics

On the submenu of the "Charts" icon, you can select InterProScan statistics to see how many sequences have or lack IPS results and how many sequences have GOs resulting from InterProScan.

  • InterProScan Results: This chart reflects the effect of adding the GO-terms retrieved through the InterProScan results (figure 11).
  • InterProScan Families Distribution: Bar chart representing the number of sequences that belong to a particular IPS family.
  • InterProScan Domains Distribution: Bar chart showing the number of sequences that belong to a particular IPS domain.
  • InterProScan Repeats Distribution: Bar chart reflecting the number of sequences that belong to a particular IPS repeat.
  • InterProScan Sites Distribution: Bar chart representing the number of sequences that belong to a particular IPS site.
  • InterProScan IDs Distribution: Bar chart showing the number of sequences that have been annotated with a given InterProScan ID.
  • InterProScan IDs by Database: Pie chart reflecting the number of sequences per InterProScan ID for a particular InterProScan database. In figure 10 the Pfam database is selected.
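
The distribution charts above all rest on the same kind of count: the number of distinct sequences per InterPro entry. A minimal Python sketch of that count is shown below; the input layout (pairs of sequence ID and InterPro ID), the function name and the example hits are assumptions made for illustration, not the OmicsBox data model.

```python
from collections import Counter


def sequences_per_entry(hits):
    """Count distinct sequences per InterPro ID.

    `hits` is an iterable of (sequence_id, interpro_id) pairs. A sequence
    that matches the same entry several times is counted once.
    """
    unique_pairs = set(hits)
    return Counter(interpro_id for _, interpro_id in unique_pairs)


if __name__ == "__main__":
    # Hypothetical hits; seq1 matches IPR000719 twice.
    example_hits = [
        ("seq1", "IPR000719"),
        ("seq1", "IPR000719"),
        ("seq2", "IPR000719"),
        ("seq2", "IPR011009"),
    ]
    for interpro_id, n in sequences_per_entry(example_hits).most_common():
        print(interpro_id, n)
```

Restricting the hits to one entry type (family, domain, repeat or site) or to one member database before counting would give the corresponding per-type or per-database chart.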


Figure 10: InterProScan Statistics Configuration Window




Figure 11: InterProScan Statistics


Load InterProScan Results

The InterProScan results saved in XML format can be loaded in the current OmicsBox project (File > Load > Load InterProScan Results).

When loading the InterProScan results it is possible to select the input format.

  • Protein - If InterProScan was run inside OmicsBox (OmicsBox translates nucleotide sequences to their longest ORF peptides).
  • Nucleotides - If InterProScan was run on nucleotide sequences with the InterProScan binaries.




Figure 12: Load InterProScan Results