Gene Ontology Annotation

Introduction

GO annotation is the process of selecting GO terms from the GO pool obtained by the Mapping step and assigning them to the query sequences. In the current OmicsBox version, this is the core type of functional annotation.

GO annotation is carried out by applying an annotation rule (AR) on the found ontology terms. The rule seeks to find the most specific annotations with a certain level of reliability. This process is adjustable in specificity and stringency.

For each candidate GO term, an annotation score (AS) is computed. The AS is composed of two additive terms.

The first, the direct term (DT), represents the highest hit similarity of this GO term, weighted by a factor corresponding to its evidence code (EC).

The second term (AT) of the AS provides the possibility of abstraction. Abstraction is defined as annotation to a parent node when several child nodes are present in the GO candidate collection. This term multiplies the total number of GOs unified at the node by a user-defined GO weight factor that controls the possibility and strength of abstraction. When the GO weight is set to 0, no abstraction is done.

Finally, the AR selects the lowest term per branch that lies over a user-defined threshold. DT, AT, and the AR are defined in figure 1.

To better understand how the annotation score works, consider the following reasoning. When the EC weight is set to 1 for all ECs (no EC influence) and the GO weight equals 0 (no abstraction), the annotation score equals the maximum similarity value of the hits that carry that GO term, and the sequence is annotated with that GO term if the score is above the given threshold. EC weights lower than 1 mean that higher similarities are required to reach the threshold. A GO weight other than 0 makes it possible for a parent node to reach the threshold even though none of its child nodes does.
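
The reasoning above can be sketched in a few lines of Python. This is only an illustration of the prose description, not the OmicsBox implementation: the function names, the example similarities, and the choice to let a score equal to the threshold pass are assumptions, and the authoritative definition of DT, AT, and the AR remains the one in figure 1.

```python
def annotation_score(hits, ec_weights, n_unified_gos, go_weight):
    """Return AS = DT + AT for one candidate GO term.

    hits: (similarity_percent, evidence_code) pairs of the hits carrying the term.
    """
    # Direct term: best hit similarity after weighting by its evidence code.
    direct_term = max(sim * ec_weights[ec] for sim, ec in hits)
    # Abstraction term: GOs unified at this node times the GO weight
    # (counting follows the prose description; see figure 1 for the exact rule).
    abstraction_term = n_unified_gos * go_weight
    return direct_term + abstraction_term


def is_annotated(score, threshold=55):
    # Assumption: a score equal to the threshold also passes.
    return score >= threshold


neutral = {"IDA": 1.0, "IEA": 1.0}  # all EC weights at 1: no EC influence
leaf_hits = [(80, "IEA"), (60, "IDA")]
# No EC influence and no abstraction: the score is the maximum similarity.
plain = annotation_score(leaf_hits, neutral, n_unified_gos=0, go_weight=0)

# A parent node whose own best hit is too weak, but which unifies two children.
parent_hits = [(50, "IDA")]
without_abstraction = annotation_score(parent_hits, neutral, 2, go_weight=0)
with_abstraction = annotation_score(parent_hits, neutral, 2, go_weight=5)
```

In this made-up case the parent node only reaches the default threshold of 55 once the GO weight is raised above 0.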

The annotation rule provides a general framework for annotation. How annotation actually occurs depends on how the different parameters of the AS are set. These can be adjusted in the Annotation Configuration Dialog (figure 2) and in the Evidence Code Weight Configuration Dialog (figure 3).

Please cite:

Gotz S., Garcia-Gomez JM., Terol J., Williams TD., Nagaraj SH., Nueda MJ., Robles M., Talon M., Dopazo J. and Conesa A. (2008). High-throughput functional annotation and data mining with the Blast2GO suite. Nucleic acids research, 36(10), 3420-35.

Figure 1: OmicsBox Annotation Rule

Run Blast2GO Annotation

The Blast2GO Annotation functionality can be found under functional analysis → Blast2GO Annotation.

Annotation Configuration

  • Annotation Cut-Off (threshold): The annotation rule selects the lowest term per branch that lies over this threshold (default=55).

  • GO-Weight: This is the weight given to the contribution of mapped children terms to the annotation of a parent term (default=5).

  • Filter GO by taxonomy: The filter will remove the Gene Ontology terms known not to be in the given taxonomy using the restrictions defined by Gene Ontology. You can select one of the given options or simply write a taxonomy id.

  • E-Value-Hit-Filter: This value acts as a pre-filter: only GO terms obtained from hits with an e-value lower than the given value are used for annotation and/or shown in a generated graph (default=1.0E-6).

  • Hsp-HitCoverage CutOff: Sets the minimum required coverage between a hit and its HSP. For example, a value of 80 means that the aligned HSP must cover at least 80% of the length of its hit. Only annotations from hits fulfilling this criterion are considered for annotation transfer.

  • Hit Filter: This option allows you to consider only the first N hits during annotation. It works together with the "Only hits with GOs" option.

  • Only hits with GOs: Combined with the "Hit Filter" option, this restricts the Hit Filter to hits that have a GO term candidate.
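
How the hit-related options narrow down the candidate pool can be pictured with the following sketch. It is a hypothetical model, not OmicsBox code: the `Hit` class, the `prefilter` function, and the order in which the filters are applied are assumptions, and it follows the usual BLAST convention that smaller e-values are more significant, so a hit is kept when its e-value does not exceed the cutoff.

```python
from dataclasses import dataclass, field


@dataclass
class Hit:
    evalue: float
    hsp_length: int   # aligned length of the HSP on the hit
    hit_length: int   # full length of the hit sequence
    go_ids: list = field(default_factory=list)


def prefilter(hits, max_evalue=1.0e-6, min_coverage=0, first_n=None,
              only_with_gos=False):
    """Return the hits whose GO terms would remain annotation candidates."""
    candidates = list(hits)  # assumed to be in BLAST rank order
    if only_with_gos:
        # "Only hits with GOs": count only hits that carry GO candidates.
        candidates = [h for h in candidates if h.go_ids]
    if first_n is not None:
        # "Hit Filter": keep the first N (remaining) hits.
        candidates = candidates[:first_n]
    kept = []
    for h in candidates:
        coverage = 100.0 * h.hsp_length / h.hit_length  # HSP-hit coverage in %
        if h.evalue <= max_evalue and coverage >= min_coverage:
            kept.append(h)
    return kept


# Made-up hits for illustration only.
hits = [
    Hit(1e-50, 90, 100, []),
    Hit(1e-30, 85, 100, ["GO:0005524"]),
    Hit(1e-3, 95, 100, ["GO:0003677"]),
    Hit(1e-20, 40, 100, ["GO:0016301"]),
]
kept = prefilter(hits, min_coverage=80, first_n=2, only_with_gos=True)
```

With "Only hits with GOs" enabled, the first hit is skipped before the first two hits are taken, so a different pair of hits reaches the e-value and coverage checks than with the option disabled.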


Figure 2: Annotation Configuration

Evidence Code Weights

By employing evidence codes (ECs), the annotation rule promotes the assignment of annotations with experimental evidence and penalizes electronic annotations or annotations with low traceability.

EC weights can be modified as needed. If no influence by evidence codes is wanted, set them all to 1. Alternatively, to exclude GO annotations of a certain EC (for example IEA), set the weight of this EC to 0.
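
The effect of an EC weight can be made concrete by asking which hit similarity is needed to reach the annotation cut-off. The sketch below assumes a GO weight of 0 (no abstraction), so the score reduces to similarity times EC weight; the weights shown are illustrative values, not necessarily the OmicsBox defaults, and `required_similarity` is a hypothetical helper.

```python
import math


def required_similarity(threshold, ec_weight):
    """Minimum hit similarity needed so that similarity * ec_weight >= threshold."""
    if ec_weight == 0:
        # A weight of 0 excludes the evidence code: no similarity is enough.
        return math.inf
    return threshold / ec_weight


# Illustrative weights only (assumption, not the shipped defaults).
example_weights = {"IDA": 1.0, "ISS": 0.8, "IEA": 0.7}
needed = {ec: required_similarity(55, w) for ec, w in example_weights.items()}
```

With these example values and the default cut-off of 55, an experimentally supported annotation needs a similarity of 55%, while an electronic one needs roughly 78.6%.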

Figure 3: Evidence Code weight configuration

Results

Successful annotation of a query sequence changes the color of that sequence from light-green to blue in the Main Sequence Table, and only the annotated GOs remain in the GO IDs column.

Individual Annotation Results

Annotation results for each sequence can also be visualized on the GO DAG by selecting "Draw Graph of GO-Mapping with Annotation Score" in the context menu. Additionally, the "Change Annotation and Description" option of this menu (figure 4) makes it possible to adjust annotations for a single sequence.
This function edits the annotation of the selected sequence and allows annotations or the sequence description to be typed in or deleted. A manual annotation check-box (see figure 5) is available for marking sequences with manual annotation; such a sequence gets a pink label in the Main Sequence Table.

For Annotation statistic charts see the Charts and Statistics page of this user manual.


Figure 4: Manually change Annotation and Description

Figure 5: Mark Manual Annotation

Export Annotation Results

The annotation results can be exported in a variety of formats. This function is available under File → Export → Export Annotation.

  1. .annot. This is the default option for Annotation export and the exchange annotation format in OmicsBox. Annotations are provided in a three-column fashion. The first column contains the sequence name, the second the annotation code and the third the sequence description. When multiple annotations for the same sequence are available, these come in subsequent rows. GO and EC annotations are exported jointly in the same format.

  2. Genespring format. One row is given per sequence, with three separate columns for Molecular Function, Biological Process, and Cellular Component. GO terms are denoted by their description rather than by their code.

  3. GoStats format. One row is given per sequence, and GO terms are denoted only by integers (the "GO:" prefix and leading zeros are omitted).

  4. WEGO format (native). One row is given per sequence, including sequences without annotated GOs. The GOs belonging to each sequence are appended, separated by tabs. The format corresponds to the "WEGO native format", shown in this example:
    http://wego.genomics.org.cn/docs/input01.lst.

  5. Custom: The export of the annotation file can be customized according to the desired information or column separator (see figure 7).
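
As a rough illustration of the .annot and GoStats descriptions above, the following sketch reads .annot-style text and derives the integer form of the GO codes. It assumes the three columns are tab-separated and that the description may be absent on subsequent rows; the sample rows and the helper names `read_annot` and `gostats_id` are made up for this example.

```python
import csv
import io


def read_annot(handle):
    """Collect annotation codes per sequence from three-column .annot text."""
    annotations = {}
    for row in csv.reader(handle, delimiter="\t"):  # assumption: tab-separated
        if len(row) < 2:
            continue  # skip empty or incomplete lines
        annotations.setdefault(row[0], []).append(row[1])
    return annotations


def gostats_id(go_code):
    """'GO:0005524' -> 5524 (prefix and leading zeros dropped)."""
    return int(go_code.split(":", 1)[1])


# Made-up sample: GO and EC annotations share the same file.
sample = (
    "seq1\tGO:0005524\texample description\n"
    "seq1\tGO:0016301\n"
    "seq1\tEC:2.7.1.1\n"
    "seq2\tGO:0003677\tanother description\n"
)
annotations = read_annot(io.StringIO(sample))
numeric = {seq: [gostats_id(c) for c in codes if c.startswith("GO:")]
           for seq, codes in annotations.items()}
```

Because GO and EC annotations are exported jointly, the sketch filters on the "GO:" prefix before converting codes to integers.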

OmicsBox allows exporting additional annotation file formats.

  1. Export Annotations in GO Annotation File Format (GAF v.2), which is the primary format currently used by the GO Consortium (http://geneontology.org/page/go-annotation-file-formats).

  2. Export Annotation Descriptions.

  3. Export GO Propagation: Exports the GO parents up to the root for the annotated sequences.

  4. Export Sequences per GO (Gene Sets).
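
The last two export options can be pictured with a small sketch: propagation adds every ancestor of an annotated term up to the root, and "Sequences per GO" inverts the sequence-to-GO mapping into gene sets. The parent map and term names below are toy placeholders rather than real GO identifiers, and both functions are hypothetical helpers, not part of OmicsBox.

```python
def ancestors(term, parents):
    """All terms reachable from `term` by following parent links to the root."""
    found = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def sequences_per_go(annotations):
    """Invert {sequence: [GO, ...]} into {GO: {sequence, ...}} gene sets."""
    gene_sets = {}
    for seq, gos in annotations.items():
        for go in gos:
            gene_sets.setdefault(go, set()).add(seq)
    return gene_sets


# Toy DAG (placeholder names): two leaves share one parent below the root.
toy_parents = {"leaf_a": ["mid"], "leaf_b": ["mid"], "mid": ["root"]}
annotations = {"seq1": ["leaf_a"], "seq2": ["leaf_a", "leaf_b"]}

propagated = {
    seq: sorted(set(gos).union(*(ancestors(g, toy_parents) for g in gos)))
    for seq, gos in annotations.items()
}
gene_sets = sequences_per_go(annotations)
```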


Figure 6: Export Annotation Configuration

Figure 7: Export Annotations Custom Configuration

Annotate GOs from Blast Descriptions

This tool looks at every significant alignment (Right-Click → Show Blast Result on a sequence) for each sequence and searches the description lines for GO IDs. These GOs are annotated directly to the sequence if the alignment's similarity passes the desired minimum. Validation can also be applied and is recommended, since it removes intermediate GO terms.
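
The idea can be sketched as a pattern search over hit descriptions. GO identifiers consist of the prefix "GO:" followed by seven digits; everything else here (the function name, the input structure, and the sample descriptions) is assumed for illustration and does not reflect how OmicsBox implements the tool.

```python
import re

GO_ID = re.compile(r"GO:\d{7}")


def gos_from_descriptions(alignments, min_similarity):
    """Collect GO IDs from descriptions of alignments passing the similarity minimum.

    alignments: (description_line, similarity_percent) pairs for one sequence.
    """
    found = []
    for description, similarity in alignments:
        if similarity < min_similarity:
            continue  # alignment too weak to transfer its GOs
        for go_id in GO_ID.findall(description):
            if go_id not in found:  # keep first occurrence, preserve order
                found.append(go_id)
    return found


# Made-up description lines for illustration only.
alignments = [
    ("hypothetical kinase [GO:0016301] [GO:0005524]", 92.0),
    ("hypothetical DNA-binding protein GO:0003677", 48.0),
]
result = gos_from_descriptions(alignments, min_similarity=60)
```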

There are still other annotation functions available in the submenu:

Other Annotation Functions

  • Remove Annotation. Delete Annotation results for the selected sequences.

  • Filter Annotation by GO Taxa

  • Validate Annotations. OmicsBox annotation generates the lowest node annotations. This is not always guaranteed when Annotations have been imported or changed manually. This function can be run to ensure that no parent-child redundancy is present in the annotated set.

  • Remove 1. Level Annotations

  • Annotate GOs from Blast Descriptions. Transfers GOs from the Blast hit descriptions to their sequences.
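
What "Validate Annotations" checks for can be illustrated as follows: any annotated term that is an ancestor of another annotated term is redundant and is dropped, leaving only the lowest nodes. This is a minimal sketch over a toy parent map with placeholder term names; `ancestors` and `validate` are hypothetical helpers, not the OmicsBox implementation.

```python
def ancestors(term, parents):
    """All terms reachable from `term` by following parent links."""
    found = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def validate(annotated, parents):
    """Drop every annotated term that is an ancestor of another annotated term."""
    redundant = set()
    for term in annotated:
        redundant |= ancestors(term, parents)
    return [t for t in annotated if t not in redundant]


# Toy DAG (placeholder names): "mid" is the parent of "leaf".
toy_parents = {"leaf": ["mid"], "mid": ["root"], "other": ["root"]}
validated = validate(["mid", "leaf", "other"], toy_parents)
```

In this toy case "mid" is removed because its child "leaf" is also annotated, while "other" stays because it lies on a separate branch.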