Gene Ontology Graphs

Introduction

The Gene Ontology structure can be described as a graph in which each GO term is a node and the relationships between terms are edges between the nodes. GO is loosely hierarchical: child terms are more specific than their parent terms, and a child term may have more than one parent term. Several types of relationship exist between child and parent terms: is a (is a subtype of), part of, has part, regulates, negatively regulates, and positively regulates. A child that represents a more specific instance of its parent term has an is a relationship to the parent. A child that is a constituent of its parent term has a part of relationship.
The three GO categories (cellular component, biological process, and molecular function) are each represented by a separate root ontology term.
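The graph view described above can be illustrated with a minimal Python sketch. It is not part of OmicsBox: the term IDs are placeholders rather than real GO accessions, and the `EDGES` table and `ancestors` function are hypothetical names chosen for this example. It shows a child term with two parents and how all parent terms can be collected, optionally following only is a edges.

```python
# Toy ontology: child -> list of (relationship, parent).
# The IDs are placeholders for illustration, not real GO accessions.
EDGES = {
    "GO:B": [("is_a", "GO:A")],
    "GO:C": [("is_a", "GO:A")],
    "GO:D": [("is_a", "GO:B"), ("part_of", "GO:C")],  # a child with two parents
}


def ancestors(term, edges, relations=None):
    """Return every term reachable from `term` by following child -> parent edges.

    If `relations` is given, only edges of those relationship types are followed.
    """
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for relation, parent in edges.get(current, []):
            if relations is not None and relation not in relations:
                continue
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


all_parents = ancestors("GO:D", EDGES)
is_a_only = ancestors("GO:D", EDGES, relations={"is_a"})
print(sorted(all_parents))
print(sorted(is_a_only))
```

Restricting the traversal to is a edges drops the branch that is reached only through the part of edge.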

OmicsBox visualizes the hierarchical structure of the Gene Ontology as directed acyclic graphs (DAGs). The integrated graph viewer allows fast navigation and zooming on the GO DAG. OmicsBox also provides several ways to analyze groups of annotated sequences jointly.

The following graphs can be generated in OmicsBox:

  • Make GO Graph: Generates the GO graph of the provided GOs.

  • Make Combined Graph: Generates the GO graph to visualize the annotation results.

  • Make Colored Graph: Generates the GO graph from a text file.

These functions are available under functional analysis → Gene Ontology Graph.

Make GO Graph

The "Make GO Graph" function visualizes any set of GO terms or IDs, which have to be provided by the user (figure 2).


Figure 1: Make GO Graph

Figure 2: Make GO ID Graph

Make Combined Graph

OmicsBox generates combined graphs in which the annotations of a group of sequences are visualized together. This can be used to study the joint biological meaning of a set of sequences, and to visualize results at different stages of the application.
Combined graphs are a good alternative to enrichment analysis when there is no reference set to consider or the number of involved sequences is low.
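The idea of pooling the annotations of several sequences into one graph can be sketched as follows. This is only an illustration under stated assumptions, not the OmicsBox implementation: the term IDs, the sequence names, and the `PARENTS`, `ANNOTATIONS`, `with_ancestors` and `combine` names are all hypothetical. The sketch collects every annotated term together with its parent terms and records which sequences are annotated directly at each node.

```python
# Toy ontology: child -> parents. Placeholder IDs, not real GO accessions.
PARENTS = {
    "GO:B": ["GO:A"],
    "GO:C": ["GO:A"],
    "GO:D": ["GO:B", "GO:C"],
}

# Hypothetical annotation result: sequence name -> directly annotated terms.
ANNOTATIONS = {
    "seq1": ["GO:D"],
    "seq2": ["GO:B"],
    "seq3": ["GO:C", "GO:B"],
}


def with_ancestors(term, parents):
    """Return `term` together with all of its parent terms up to the root."""
    result = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def combine(annotations, parents):
    """Map each node of the combined graph to the sequences annotated directly at it.

    Nodes that appear only as parents of annotated terms get an empty set.
    """
    node_sequences = {}
    for seq_id, terms in annotations.items():
        for term in terms:
            for node in with_ancestors(term, parents):
                node_sequences.setdefault(node, set())
            node_sequences[term].add(seq_id)
    return node_sequences


combined = combine(ANNOTATIONS, PARENTS)
for node in sorted(combined):
    print(node, sorted(combined[node]))
```

In this toy result the root node carries no direct annotation and is present only because it connects the annotated terms.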

Figure 3: Combined graph visualization

Graph Drawing Configuration

The following parameters are available:

  • Graph Title

  • GO Categories

A graph is displayed for each selected Gene Ontology category. From a graph node, OmicsBox allows you to view a tooltip (figure 6), create a subgraph from that specific GO term, or create an ID list of the sequences that have been annotated with that particular GO term (figure 7). The generated ID list can then be used within OmicsBox in the select by sequences feature (see Select Sequences and Functions Section).

Figure 4: The Combined Graph Drawing Configuration dialog, where a graph title can be provided and the GO categories can be chosen

Figure 5: Biological Process Combined Graph

Figure 6: Graph Node Tooltip

Figure 7: Extract Node Information

Results

Graph element legend

Gene Ontology term obtained by mapping that can be associated directly with one or more BLAST hits. (GO-Accession, maximum hit e-value assigned, max. hit similarity assigned, number of hits belonging to this term)

Non-annotated GO term node (GO term name, mean e-value of all hits contributing to this node, max. e-value, max. Similarity, number of Hits contributing to this node, Annotation Algorithm Score)

Annotated GO term node (GO term name, mean e-value of all hits contributing to this node, max. e-value, max. Similarity, number of Hits contributing to this node, Annotation Algorithm Score)

The nodes of the GO graphs are displayed in different shapes:

  • octagon: Annotated GO Terms

  • square: Intermediate GO Terms

  • ellipse: GO Terms linked to a Blast Hit

Graph Side Panel

The generated combined graph is interactive and its parameters can be modified from the side panel.

  • View. This section controls the graph visualization within its area.

    • Zoom: Zooming in and out is supported with the mouse wheel or from the icons.

    • Collapse All: The nodes will collapse and only the root will be visualized.

    • Expand All: The nodes will expand to the original graph visualization.

    • Re-Layout: The whole graph will be re-scaled to adjust to the visualization area.

  • Search. Searches for GO IDs, terms, or descriptions in the Combined Graph.

  • Node Info. This parameter controls the information shown at a node. Possible values are:

    • GO ID: If checked the GO ID will be included in the node.

    • GO Name: The GO Names are shown in the node.

    • GO Description: When checked the GO Description will be included in the node.

    • Nodescore: The node score will be shown in the node.

    • Sequence Names: The names of the sequences annotated at each GO are included in the node. At most 15 names are displayed.

    • Sequences: The number of sequences annotated with that particular GO will be displayed in the node.

  • Layout.

    • Edge Labels: When checked the labels on the edges will be shown.

    • Expand/Collapse Icon: If checked, the icons that represent expand/collapse on the node are displayed.

    • Only "is a" Relations: If the box is checked, only the is a relations between nodes will be displayed.

    • Color: OmicsBox colors nodes in proportion to a parameter of the analysis whose result is visualized on the DAG.

      • Ontology: All nodes will be colored according to the ontology category, Biological Process - green; Molecular Function - blue; Cellular Component - yellow.

      • White: The nodes will turn white.

      • By Nodescore: A Score is computed at each node according to the formula:

        score = Σ seq × α^dist (summed over the child GO terms)

        where seq is the number of different sequences annotated at a child GO term, dist is the distance from the node to that child GO term, and α is the Score alpha parameter. Coloring by Score will highlight areas of high annotation density.

      • By Sequence Count: Node color intensity will be proportional to the number of contributing sequences at the node.

  • Options.

    • Sequence Filter: The minimum number of sequences that must be assigned to a GO node for it to be displayed. This filter controls the number of nodes present in the graph. It is recommended to start the analysis with a high value, which keeps the graph from being overloaded when the total number of sequences is large, and then to adjust the value until you obtain a satisfactory graph. A good starting value is 10% of your total number of sequences.

    • Nodescore Filter: OmicsBox allows the graph size to be adjusted through node filters that depend on the type of graph considered. Only nodes with a Score value higher than the filter will be shown. Use this parameter to thin out nodes of low information content from the GO DAG.

    • Score alpha: The value of the parameter alpha in the Score formula.

    • Restore Defaults: All filters will be set to the default values.

  • Charts. (see next section)

  • Save as. The information present in a Combined Graph can be saved as an image (.png) or in table format. This will generate a .txt file where all information related to each node of the plotted Graph is provided in different columns.

  • Overview. Provides a radar-like view of the graph, which allows adjusting the visible window.

  • Open With. Open the graph information as TreeMap or WordCloud (see following sections).
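The Nodescore and the Nodescore Filter from the side panel can be illustrated with a short Python sketch. It rests on several assumptions that are not confirmed by this section: the score of a node is taken as the sum of seq × alpha^dist over the node itself (distance 0) and all of its descendants, dist is the shortest number of edges, and alpha = 0.5 is an arbitrary illustrative value rather than a documented default. The term IDs, counts, and function names are hypothetical.

```python
from collections import deque

# Toy graph: parent -> children. Placeholder IDs, not real GO accessions.
CHILDREN = {
    "GO:A": ["GO:B", "GO:C"],
    "GO:B": ["GO:D"],
    "GO:C": ["GO:D"],
    "GO:D": [],
}

# Hypothetical number of different sequences annotated directly at each term.
SEQ_COUNT = {"GO:A": 0, "GO:B": 2, "GO:C": 1, "GO:D": 4}


def distances_below(node, children):
    """Breadth-first search: shortest edge distance from `node` to itself and each descendant."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in dist:
                dist[child] = dist[current] + 1
                queue.append(child)
    return dist


def node_score(node, children, seq_count, alpha):
    """Assumed score: sum of seq * alpha**dist over the node and its descendants."""
    return sum(
        seq_count.get(term, 0) * alpha ** d
        for term, d in distances_below(node, children).items()
    )


def filter_nodes(children, seq_count, alpha, min_score):
    """Keep only the nodes whose score is strictly higher than `min_score`."""
    scores = {n: node_score(n, children, seq_count, alpha) for n in children}
    return {n: s for n, s in scores.items() if s > min_score}


scores = {n: node_score(n, CHILDREN, SEQ_COUNT, alpha=0.5) for n in CHILDREN}
visible = filter_nodes(CHILDREN, SEQ_COUNT, alpha=0.5, min_score=3.0)
print(scores)
print(sorted(visible))
```

With alpha below 1, annotations far below a node contribute less to its score, so a smaller alpha concentrates high scores near the annotated terms themselves.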

Figure 8: Combined Graph Side Panel

Graph Charts

GO term associations in a set of sequences can also be analyzed with Pie/Bar Charts. For this analysis, a Combined Graph must have been generated first. Once the graph is visible in the GO Graph panel, several icons are available to visualize the four different types of charts.

Four possibilities are available:

  1. Sequence distribution by GO level (Pie-Chart): This pie chart represents the number of sequences for each Gene Ontology term for a given level. See figure 10.

  2. Sequences per GO terms (Multilevel Pie): This function generates a Pie with the lowest node per branch of the DAG that fulfills the filter condition, i.e. it finds all the lowest nodes with the given number of sequences or Score value and plots them jointly in a Pie representation. See figure 11.

  3. Gene Level (Bar-Chart): A bar chart representing the GO terms according to the number of annotated sequences. See figure 12.

  4. Sequence distribution by GO level (Bar-Chart): This bar chart represents the number of sequences for each Gene Ontology term for a given level. See figure 13.

When any of these functions is called, a table of node counts is generated and displayed in the statistics tab.
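The selection behind the Multilevel Pie, the lowest node per branch that fulfills the filter condition, can be sketched in Python. This is an illustration only: the term IDs, the counts, and the function names are hypothetical, and the sketch assumes that a node is "lowest" when it passes the cutoff and none of its descendants does.

```python
# Toy graph: parent -> children. Placeholder IDs, not real GO accessions.
CHILDREN = {
    "GO:A": ["GO:B", "GO:C"],
    "GO:B": ["GO:D"],
    "GO:C": ["GO:E"],
    "GO:D": [],
    "GO:E": [],
}

# Hypothetical number of sequences per node.
SEQ_COUNT = {"GO:A": 10, "GO:B": 6, "GO:C": 5, "GO:D": 2, "GO:E": 5}


def descendants(node, children):
    """Return all terms below `node` in the graph."""
    found = set()
    stack = list(children.get(node, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(children.get(current, []))
    return found


def lowest_passing_nodes(children, seq_count, cutoff):
    """Return the nodes that reach `cutoff` and have no descendant that also reaches it."""
    passing = {node for node, count in seq_count.items() if count >= cutoff}
    return {
        node: seq_count[node]
        for node in sorted(passing)
        if not (descendants(node, children) & passing)
    }


pie_slices = lowest_passing_nodes(CHILDREN, SEQ_COUNT, cutoff=5)
print(pie_slices)
```

In the toy data, GO:C passes the cutoff but is left out because its child GO:E also passes, so the more specific term becomes the pie slice.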


Figure 9: Combined Graph Pie and Bar-Charts

Figure 10: Sequence distribution by GO level: Pie Chart

Figure 11: Sequence Distribution/GO as Multilevel-Pie (#score or #seq cutoff)

Figure 12: Biological Process Level 2

Figure 13: Sequence distribution by GO level: Bar Chart

WordCloud

A WordCloud is a visual representation of a list of labels. The importance of each word, here a GO term, is represented by its font size, which depends on either the sequence count or the NodeScore of the GO term. The list of words can be limited to a specific Gene Ontology category (BP, CC or MF). The coloring is random. Several options are available to change the graphical appearance, such as the number of words, the orientation and shape of the cloud, and the color scheme.
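How a weight such as the sequence count could translate into a font size can be sketched as follows. The linear scaling, the point sizes, and the `font_sizes` function are assumptions made for illustration; the section does not state which scaling OmicsBox applies.

```python
def font_sizes(weights, min_pt=10, max_pt=48):
    """Scale each weight linearly into the range [min_pt, max_pt] (assumed scaling)."""
    low, high = min(weights.values()), max(weights.values())
    span = high - low
    sizes = {}
    for term, weight in weights.items():
        # If all weights are equal, every term gets the largest size.
        fraction = (weight - low) / span if span else 1.0
        sizes[term] = round(min_pt + fraction * (max_pt - min_pt))
    return sizes


# Hypothetical sequence counts per GO term name.
sizes = font_sizes({"term a": 2, "term b": 6, "term c": 10})
print(sizes)
```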

Figure 14: Convert Graph to Word Cloud

TreeMap

The TreeMap viewer allows visualizing graphs (hierarchical, tree-structured data in general) as a set of nested rectangles. Each branch of the tree is given a rectangle, which is then tiled with smaller rectangles representing sub-branches. The size of each rectangle represents the number of sequences associated with a given GO term or a GO's NodeScore.
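The nested-rectangle idea can be illustrated with a minimal slice-and-dice layout in Python. This is a generic textbook technique and not necessarily the layout algorithm used by the TreeMap viewer. The tree, its sizes, and the `treemap` function are hypothetical, and the example uses a plain tree in which every term has a single parent.

```python
def treemap(node, x, y, width, height, horizontal=True, out=None):
    """Slice-and-dice layout.

    `node` receives the rectangle (x, y, width, height). The rectangle is then
    split among the children in proportion to their sizes, and the split
    direction alternates with each level of the tree.
    """
    if out is None:
        out = {}
    out[node["name"]] = (x, y, width, height)
    children = node.get("children", [])
    total = sum(child["size"] for child in children)
    offset = 0.0
    for child in children:
        share = child["size"] / total if total > 0 else 0.0
        if horizontal:
            treemap(child, x + offset * width, y, share * width, height, False, out)
        else:
            treemap(child, x, y + offset * height, width, share * height, True, out)
        offset += share
    return out


# Hypothetical tree: sizes stand for the number of sequences (or the NodeScore).
TREE = {
    "name": "GO:A", "size": 8, "children": [
        {"name": "GO:B", "size": 6, "children": [
            {"name": "GO:D", "size": 3},
            {"name": "GO:E", "size": 3},
        ]},
        {"name": "GO:C", "size": 2},
    ],
}

layout = treemap(TREE, 0.0, 0.0, 80.0, 60.0)
for name, rect in layout.items():
    print(name, rect)
```

Each rectangle is given as (x, y, width, height), and the rectangles of the children tile the rectangle of their parent.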

Figure 15: A TreeMap representing a Gene Ontology Graph.
The size of the rectangles represents the number of sequences or the NodeScore of each GO term. 

Make Colored Graphs

A GO graph can be generated from a text (.txt) file that contains a list of GOs and the desired color for each of them. It is also possible to label groups of GOs with the same name. Figure 17 shows an example that was created from the following text file:

GO:0000003    6    Group A
GO:0040007    8    Group B
GO:0050896    1    Group B

The text file has to follow a simple structure to be processed correctly. Each line may contain 2 or 3 columns. The first column has to contain a GO ID, the second a number (0.0 to  ), and the optional third column contains a text that will be written into the octagon of the corresponding GO. The columns must be separated by a tab character.
In the example above, Group B has two GO IDs with different values. These GO IDs can also be differentiated by coloring them according to their values. To color the octagons according to the value, select the gradient color on the next page of the color graph configuration window (see figure 18).
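The file structure described above can be checked with a small Python sketch before the file is loaded. The `parse_colored_go_file` function is hypothetical and not part of OmicsBox. It only enforces what the text states (2 or 3 tab-separated columns, a GO ID, and a number of at least 0.0); the "GO:" prefix check is an added assumption, and no upper bound on the number is enforced.

```python
def parse_colored_go_file(text):
    """Parse lines of the form: GO ID <tab> number [<tab> label]."""
    entries = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip empty lines
        columns = line.split("\t")
        if len(columns) not in (2, 3):
            raise ValueError(
                f"line {line_number}: expected 2 or 3 tab-separated columns, "
                f"got {len(columns)}"
            )
        go_id = columns[0].strip()
        if not go_id.startswith("GO:"):  # assumption: IDs use the GO: prefix
            raise ValueError(f"line {line_number}: '{go_id}' is not a GO ID")
        value = float(columns[1])
        if value < 0.0:
            raise ValueError(f"line {line_number}: value must be 0.0 or greater")
        label = columns[2].strip() if len(columns) == 3 else None
        entries.append((go_id, value, label))
    return entries


# The example from the text, written with explicit tab characters.
example = (
    "GO:0000003\t6\tGroup A\n"
    "GO:0040007\t8\tGroup B\n"
    "GO:0050896\t1\tGroup B\n"
)
entries = parse_colored_go_file(example)
for entry in entries:
    print(entry)
```

A line whose columns are separated by spaces instead of tabs is rejected by this sketch, which matches the requirement that the columns be tab-separated.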

Figure 16: Colour Configuration Window

Figure 17: Coloured GO Graph by Group

Figure 18: Coloured GO Graph by Group value

Figure 19: Select Colour to differentiate values within the same group.