Functional Annotation

EggNOG-Mapper

EggNOG-Mapper is a tool for fast functional annotation of novel sequences (genes or proteins) using precomputed eggNOG-based orthology assignments. Typical applications include the annotation of novel genomes, transcriptomes, or metagenomic gene catalogs. Using orthology predictions for functional annotation is considered more precise than traditional homology searches because it avoids transferring annotations from paralogs (duplicated genes, which are more likely to have diverged in function). The wizard pages are shown in Figures 1 and 2.

The methodology of the tool and its database are described in detail on the eggNOG website: http://eggnogdb.embl.de/#/app/methods

  • Genes or Proteins: A multi-FASTA file containing gene or protein sequences.


Figure 1. EggNOG Mapper wizard: input page.
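
Before loading a file into the wizard, it can help to confirm that it is a well-formed multi-FASTA file. The following is a minimal sketch of such a check, not part of the tool itself; the function name `read_multi_fasta` and the example sequences are hypothetical, and it assumes the first whitespace-delimited token of each header line is the sequence ID.

```python
from typing import Dict, Iterable


def read_multi_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse multi-FASTA text into a {sequence_id: sequence} mapping."""
    records: Dict[str, str] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # Assumption: the ID is the first token after '>'
            header = line[1:].split()
            if not header:
                raise ValueError("FASTA header without an identifier")
            current_id = header[0]
            if current_id in records:
                raise ValueError(f"duplicate sequence ID: {current_id}")
            records[current_id] = ""
        else:
            if current_id is None:
                raise ValueError("sequence data found before the first header")
            # Sequences may be wrapped over several lines; join them
            records[current_id] += line
    return records


# Made-up example input with two records
example = """>gene1 hypothetical protein
MKTAYIAKQR
QISFVKSHFS
>gene2
MSDNELLK
""".splitlines()

sequences = read_multi_fasta(example)
for seq_id, seq in sequences.items():
    print(seq_id, len(seq))
```

A file that parses without errors and yields one entry per expected sequence is a reasonable candidate for the "Genes or Proteins" input.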

  • Taxonomic Scope: Fix the taxonomic scope used for annotation, so only orthologs from a particular clade are used for functional transfer. By default, this is automatically adjusted for every query sequence.
  • Target Orthologs: Define what type of orthologs should be used for functional transfer.
  • GO Evidence: Define what type of GO terms should be used for annotation:
    • experimental = Use only terms inferred from experimental evidence
    • non-electronic = Use only terms that were not assigned by electronic annotation

Figure 2. EggNOG Mapper wizard: configuration page.
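
The effect of the two GO Evidence settings can be illustrated with a small filter over (GO term, evidence code) pairs. This is only a sketch under stated assumptions: `filter_go_terms` is a hypothetical helper, not the tool's own code; the experimental set below contains the standard GO experimental evidence codes, and "electronic" is taken to mean the IEA code. Which codes EggNOG Mapper actually places in each group is not specified here, and the evidence codes attached to the example terms are invented for illustration.

```python
from typing import List, Tuple

# Standard GO experimental evidence codes (assumed to define "experimental")
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}
# IEA = Inferred from Electronic Annotation (assumed to define "electronic")
ELECTRONIC_CODES = {"IEA"}


def filter_go_terms(annotations: List[Tuple[str, str]], mode: str) -> List[str]:
    """Return the GO IDs whose evidence code passes the selected mode."""
    if mode == "experimental":
        return [go for go, code in annotations if code in EXPERIMENTAL_CODES]
    if mode == "non-electronic":
        return [go for go, code in annotations if code not in ELECTRONIC_CODES]
    raise ValueError(f"unknown mode: {mode}")


# Made-up evidence assignments for three GO terms
annotations = [
    ("GO:0005524", "IDA"),  # experimental evidence
    ("GO:0016301", "ISS"),  # curated, but not experimental
    ("GO:0006468", "IEA"),  # electronic annotation
]

print(filter_go_terms(annotations, "experimental"))
print(filter_go_terms(annotations, "non-electronic"))
```

In this sketch the experimental setting is the stricter of the two: every term it keeps is also kept by the non-electronic setting, but not the other way round.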

The result table (Figure 3) summarizes all annotations that could be transferred with EggNOG Mapper. In addition to sorting and filtering, the context menu lets you take a closer look at individual results.


Figure 3. EggNOG Mapper results table.

The annotation details view (Figure 4) provides external links where possible and gives detailed information about the annotated GO terms.

Figure 4. EggNOG Mapper annotation details.


Huerta-Cepas J, Szklarczyk D, Jensen LJ, von Mering C, Bork P (2016). Fast genome-wide functional annotation through orthology assignment by eggNOG-mapper. Submitted.

Huerta-Cepas J et al. (2019). eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Research, 47(D1), D309-D314.

PfamScan

Pfam is a database of protein families. Briefly, each Pfam entry comprises a seed alignment, which is used to build a profile hidden Markov model (HMM) with the HMMER software (http://hmmer.org/). The profile HMM is then searched against a sequence database called pfamseq, and all matches scoring above the curated threshold (chosen carefully to exclude any known false positives) are aligned back to the profile HMM to generate the full alignment. Where possible, each entry is annotated with functional information derived from the literature. To keep the resource sustainable as it scales, pfamseq is derived only from the UniProt Knowledgebase (UniProtKB) sequences that belong to Reference Proteomes, rather than from the entirety of UniProtKB. The PfamScan wizard is shown in Figure 5.

  • Genes or Proteins: A multi-FASTA file containing gene or protein sequences.

Figure 5. PfamScan wizard.
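
The curated-threshold step in the Pfam description above can be pictured as a per-family score filter. The sketch below is illustrative only and does not call HMMER or PfamScan; the `DomainHit` type, the function `apply_family_thresholds`, the family names, the scores, and the threshold values are all hypothetical, and the comparison is treated as inclusive (an assumption of this sketch).

```python
from typing import Dict, List, NamedTuple


class DomainHit(NamedTuple):
    sequence_id: str
    family: str
    bit_score: float


def apply_family_thresholds(
    hits: List[DomainHit], thresholds: Dict[str, float]
) -> List[DomainHit]:
    """Keep only hits that reach the curated threshold of their own family."""
    kept = []
    for hit in hits:
        cutoff = thresholds.get(hit.family)
        # Hits to families without a known threshold are dropped
        if cutoff is not None and hit.bit_score >= cutoff:
            kept.append(hit)
    return kept


# Made-up thresholds: each family has its own curated cutoff
thresholds = {"FAM_A": 25.0, "FAM_B": 40.0}

# Made-up hits
hits = [
    DomainHit("seq1", "FAM_A", 31.2),  # above FAM_A cutoff -> kept
    DomainHit("seq2", "FAM_A", 18.5),  # below FAM_A cutoff -> dropped
    DomainHit("seq3", "FAM_B", 31.2),  # same score, but below FAM_B cutoff
]

kept = apply_family_thresholds(hits, thresholds)
for hit in kept:
    print(hit.sequence_id, hit.family, hit.bit_score)
```

The point of a per-family cutoff is visible in the example: the same score of 31.2 is accepted for one family and rejected for another, because each threshold is chosen for its own family rather than applied globally.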

The result table (Figure 6) summarizes all PfamScan annotations. In addition to sorting and filtering, the context menu lets you take a closer look at individual results.

Figure 6. PfamScan results table.

The annotation details view (Figure 7) provides external links where possible and gives detailed information about the annotated GO terms.

Figure 7. PfamScan annotation details.

The Pfam protein families database in 2019. S. El-Gebali, J. Mistry, A. Bateman, S.R. Eddy, A. Luciani, S.C. Potter, M. Qureshi, L.J. Richardson, G.A. Salazar, A. Smart, E.L.L. Sonnhammer, L. Hirsh, L. Paladin, D. Piovesan, S.C.E. Tosatto, R.D. Finn. Nucleic Acids Research (2019). doi: 10.1093/nar/gky995