Combined Graph

Introduction

OmicsBox generates combined graphs, in which the annotations of a group of sequences are visualized together. They can be used to study the joint biological meaning of a set of sequences and to visualize results at different stages of the application.
Combined graphs are a good alternative to enrichment analysis when there is no reference set to consider or when the number of sequences involved is low.

Figure 1: Combined graph visualization

Graph Drawing Configuration

The following parameters are available:

  • Graph Title

  • GO Categories

A graph is displayed for each selected Gene Ontology category. From a graph node, OmicsBox can show a tooltip (figure 4), create a subgraph starting from that specific GO term, or create an Id list of the sequences annotated with that GO term (figure 5). The generated Id list can then be used within OmicsBox in the select by sequences feature (see Selection Section).
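As a rough illustration of what an Id list per GO term contains, the sketch below inverts a sequence-to-GO annotation mapping. It is not OmicsBox code: the function name `sequences_by_go` and the sequence names are hypothetical, and only direct annotations are considered.

```python
from collections import defaultdict


def sequences_by_go(annotations):
    """Invert {sequence id: [GO ids]} into {GO id: sorted list of sequence ids}."""
    by_go = defaultdict(set)
    for seq_id, go_terms in annotations.items():
        for go_id in go_terms:
            by_go[go_id].add(seq_id)
    return {go_id: sorted(ids) for go_id, ids in by_go.items()}


# Hypothetical annotations for three sequences.
annotations = {
    "seq_1": ["GO:0008152", "GO:0005515"],
    "seq_2": ["GO:0008152"],
    "seq_3": ["GO:0005515"],
}

id_lists = sequences_by_go(annotations)
print(id_lists["GO:0008152"])  # Id list for one GO term
```

The resulting list for a term corresponds to the kind of Id list that can be passed on to a selection by sequence name.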

Figure 2: The Combined Graph Drawing Configuration dialog, where a graph title can be entered and the GO categories can be selected

Figure 3: Biological Process Combined Graph

Figure 4: Graph Node Tooltip

Figure 5: Extract Node Information

Results

Graph element legend

Gene Ontology term obtained by mapping which can directly be associated with one or more BLAST hits. (GO-Accession, maximum hit e-value assigned, max. hit similarity assigned, number of hits belonging to this)

Non-annotated GO term node (GO term name, mean e-value of all hits contributing to this node, max. e-value, max. Similarity, number of Hits contributing to this node, Annotation Algorithm Score)

Annotated GO term node (GO term name, mean e-value of all hits contributing to this node, max. e-value, max. Similarity, number of Hits contributing to this node, Annotation Algorithm Score)

The nodes of the GO graphs are displayed in different shapes:

  • octagon: Annotated GO Terms

  • square: Intermediate GO Terms

  • ellipse: GO Terms linked to a BLAST Hit

Graph Side Panel

The generated combined graph is interactive and its parameters can be modified from the side panel.

  • View. This section controls the graph visualization within its area.

    • Zoom: Zoom in or out with the mouse wheel or with the icons.

    • Collapse All: The nodes will collapse and only the root will be visualized.

    • Expand All: The nodes will expand to the original graph visualization.

    • Re-Layout: The whole graph will be re-scaled to adjust to the visualization area.

  • Search. Searches the Combined Graph for GO IDs, terms or descriptions.

  • Node Info. This parameter controls the information shown at a node. Possible values are:

    • GO ID: If checked the GO ID will be included in the node.

    • GO Name: The GO Names are shown in the node.

    • GO Description: When checked the GO Description will be included in the node.

    • Nodescore: The node score will be shown in the node.

    • Sequence Names: The names of the sequences annotated at each GO are included in the node. At most 15 names are displayed.

    • Sequences: The number of sequences annotated with that particular GO will be displayed in the node.

  • Layout.

    • Edge Labels: When checked the labels on the edges will be shown.

    • Expand/Collapse Icon: If checked, the icons that represent expand/collapse on the node are displayed.

    • Only "is a" Relations: If checked, only the "is a" relations between nodes are displayed.

    • Color: OmicsBox can color nodes in proportion to a parameter of the analysis whose result is visualized on the DAG.

      • Ontology: All nodes are colored according to the ontology category: Biological Process - green; Molecular Function - blue; Cellular Component - yellow.

      • White: The nodes will turn white.

      • By Nodescore: A score is computed at each node according to the formula:

        score = Σ seq × α^dist

        where the sum runs over the child GO terms of the node, seq is the number of different sequences annotated at a child GO term, dist is the distance from the node to that child GO term, and α is the Score Alpha parameter (see Options). Coloring by score highlights areas of high annotation density.

      • By Sequence Count: Node color intensity will be proportional to the number of contributing sequences at the node.

  • Options.

    • Sequence Filter: The minimum number of sequences that must be assigned to a GO node for it to be displayed. This filter controls the number of nodes in the graph. Because a low value can overload the graph when the total number of sequences is large, start with a high value, for example 10% of the total number of sequences, and adjust it until the graph is satisfactory.

    • Nodescore Filter: OmicsBox allows the graph size to be adjusted with node filters that depend on the type of graph. Only nodes with a score higher than the filter value are shown. Use this parameter to thin out low-information nodes from the GO-DAG.

    • Score Alpha: The value of the parameter alpha in the score formula.

    • Restore Defaults: All filters will be set to the default values.

  • Charts. (see next section)

  • Save as. The information in a Combined Graph can be saved as an image (.png) or as a table. The table option generates a .txt file in which the information for each node of the plotted graph is given in separate columns.

  • Overview. Provides a radar-like view of the graph, which allows adjusting the visible window.

  • Open With. Open the graph information as TreeMap or WordCloud (see following sections).

Figure 6: Combined Graph Side Panel
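The node score and the Nodescore Filter described above can be illustrated with a short sketch. It assumes the score has the form Σ seq × α^dist; it further assumes, purely for illustration, that dist is the shortest path length in the DAG and that the node's own sequences count at distance 0. The function names, the toy graph and the alpha value are hypothetical and are not taken from OmicsBox.

```python
from collections import deque


def node_score(dag, seq_counts, node, alpha):
    """Assumed form: sum of seq * alpha**dist over the node and its descendants."""
    # Breadth-first search yields the shortest distance to each reachable term.
    dist = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for child in dag.get(current, ()):
            if child not in dist:
                dist[child] = dist[current] + 1
                queue.append(child)
    return sum(seq_counts.get(term, 0) * alpha ** d for term, d in dist.items())


def visible_nodes(dag, seq_counts, alpha, min_score):
    """Keep only nodes whose score is strictly higher than the filter value."""
    return sorted(n for n in dag if node_score(dag, seq_counts, n, alpha) > min_score)


# Hypothetical toy DAG: parent -> children, plus sequences annotated per term.
dag = {"root": ["a", "b"], "a": ["c"], "b": ["c"], "c": []}
seq_counts = {"root": 0, "a": 2, "b": 1, "c": 4}

print(node_score(dag, seq_counts, "root", alpha=0.5))  # 2*0.5 + 1*0.5 + 4*0.25
print(visible_nodes(dag, seq_counts, alpha=0.5, min_score=2.5))
```

With a smaller alpha, distant annotations contribute less, so general terms near the root score lower and are the first to be removed by the filter.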

Graph Charts

Analysis of GO Term associations in a set of sequences can also be done by Pie/Bar Charts. For this analysis, a Combined Graph must have been generated first. Once the graph is visible in the GO Graph panel you can find several icons to visualize the 4 different types of charts.

Four possibilities are available:

  1. Sequence distribution by GO level (Pie-Chart): This pie chart represents the number of sequences for each Gene Ontology term for a given level. See figure 8.

  2. Sequences per GO terms (Multilevel Pie): This function generates a pie chart from the lowest node of each branch of the DAG that fulfills the filter condition, i.e. it finds all the lowest nodes with the given number of sequences or score value and plots them together in one pie. See figure 9.

  3. Gene Level (Bar-Chart): A bar chart representing the GO terms according to the number of annotated sequences. See figure 10.

  4. Sequence distribution by GO level (Bar-Chart): This bar chart represents the number of sequences for each Gene Ontology term for a given level. See figure 11.

When any of these functions is called, a table of node counts is generated and displayed in the statistics tab.
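The selection behind the Multilevel Pie can be sketched as follows. This is a minimal illustration, not the OmicsBox implementation: it assumes that each node carries a cumulative sequence count (a parent never has fewer sequences than its children), so a node is "lowest" when it passes the cutoff and none of its children do. The function name and the toy data are hypothetical.

```python
def lowest_passing_nodes(dag, counts, cutoff):
    """Return nodes that meet the cutoff while none of their children do."""
    passing = {node for node, count in counts.items() if count >= cutoff}
    return sorted(
        node
        for node in passing
        if not any(child in passing for child in dag.get(node, ()))
    )


# Hypothetical toy DAG (parent -> children) with cumulative sequence counts.
dag = {
    "biological process": ["metabolic process", "cellular process"],
    "metabolic process": ["catabolic process"],
    "cellular process": [],
    "catabolic process": [],
}
counts = {
    "biological process": 20,
    "metabolic process": 12,
    "cellular process": 8,
    "catabolic process": 3,
}

slices = lowest_passing_nodes(dag, counts, cutoff=5)
print({term: counts[term] for term in slices})  # pie slices and their sizes
```

Raising the cutoff moves the slices towards more general terms, and lowering it produces more, and more specific, slices.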


Figure 7: Combined Graph Pie and Bar-Charts

Figure 8: Sequence distribution by GO level: Pie Chart

Figure 9: Sequence Distribution/GO as Multilevel-Pie (#score or #seq cutoff)

Figure 10: Biological Process Level 2

Figure 11: Sequence distribution by GO level: Bar Chart

WordCloud

A WordCloud is a visual representation of a list of labels. The importance of each word, here a GO term, is represented by its font size, which depends on either the sequence count or the NodeScore of the GO term. The list of words can be limited to a specific Gene Ontology category (BP, CC or MF). By default the coloring is random. Several options change the graphical appearance, such as the number of words, the orientation and shape of the cloud, and the color scheme.
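How a weight can be turned into a font size is sketched below. The linear scaling, the size range and the function name `word_sizes` are assumptions made for illustration; the scaling that OmicsBox actually applies is not described here.

```python
def word_sizes(weights, max_words, min_size, max_size):
    """Map the heaviest GO terms to font sizes by linear interpolation."""
    # Keep only the max_words terms with the largest weight.
    top = sorted(weights.items(), key=lambda item: item[1], reverse=True)[:max_words]
    if not top:
        return {}
    low = min(weight for _, weight in top)
    high = max(weight for _, weight in top)
    span = high - low
    sizes = {}
    for term, weight in top:
        # If all weights are equal, every word gets the largest size.
        fraction = (weight - low) / span if span else 1.0
        sizes[term] = round(min_size + fraction * (max_size - min_size))
    return sizes


# Hypothetical weights: sequence counts (or node scores) per GO term.
weights = {"metabolic process": 10, "binding": 5, "membrane": 0, "nucleus": 0}
print(word_sizes(weights, max_words=3, min_size=10, max_size=50))
```

The same function works for either weighting, since only the relative size of the values matters.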

Figure 12: Convert Graph to Word Cloud