As a Data Mining tool, OmicsBox provides various ways for the joint analysis of groups of annotated sequences.
Descriptive analysis. Combined Graph Function
OmicsBox generates combined graphs in which the annotations of a group of sequences are visualized together. This can be used to study the joint biological meaning of a set of sequences. Combined graphs are a good alternative to enrichment analysis when there is no reference set to consider or when the number of sequences involved is low. This function is available under Functional Analysis > Gene Ontology Graphs.
The next images show the Combined Graph Drawing Configuration Dialog, where the following parameters are available:
For each Gene Ontology category, a graph will be displayed. OmicsBox allows extracting information from the graph nodes: showing a tooltip (figure 4), creating a subgraph from that specific GO term, or creating an ID list of the sequences that have been annotated with that particular GO term (figure 5). The generated ID list can then be used within OmicsBox in the select by sequences feature (see the Select Sequences and Functions section).
Figure 1: Combined graph visualization
Figure 2: The Combined Graph Drawing Configuration Dialog, where a graph title can be provided and the GO categories can be chosen
Figure 3: Molecular Function Combined Graph
Figure 4: Graph Node Tooltip
Figure 5: Extract Node Information
Graph Side Panel
The generated combined graph is interactive and its parameters can be modified from the side panel.
View. This section controls the graph visualization within its area.
Collapse All: The nodes will collapse and only the root will be visualized.
Expand All: The nodes will expand to the original graph visualization.
Re-Layout: The whole graph will be re-scaled to adjust to the visualization area.
Search. Searches for GO IDs, terms, or descriptions in the Combined Graph.
Node Info. This parameter controls the information shown at a node. Possible values are:
GO ID: If checked the GO ID will be included in the node.
GO Name: The GO Names are shown in the node.
GO Description: When checked the GO Description will be included in the node.
Nodescore: The node score will be shown in the node.
Sequence Names: The names of the sequences annotated at each GO are included in the node. At most 15 names are displayed.
Sequences: The number of sequences annotated with that particular GO will be displayed in the node.
Edge Labels: When checked the labels on the edges will be shown.
Expand/Collapse Icon: If checked, the icons that represent expand/collapse on the node are displayed.
Only "is a" Relations: If checked, only the "is a" relations between nodes will be displayed.
Ontology: All nodes will be colored according to the ontology category, Biological Process - green; Molecular Function - blue; Cellular Component - yellow.
White: The nodes will turn white.
By Nodescore: A Score is computed at each node according to the formula:
where seq is the number of different sequences annotated at a child GO term and dist is the distance between the node and that child GO term. Coloring by Score highlights areas of high annotation density.
By Sequence Count: Node color intensity will be proportional to the number of contributing sequences at the node.
Sequence Filter: The minimum number of sequences that must be assigned to a GO node for it to be displayed. This filter controls the number of nodes present in the graph. When the total number of sequences is large enough to overload the graph, it is recommended to start the analysis with a high value and then adjust it, depending on the result, until you obtain a satisfactory graph. A reasonable starting point is 10% of your total number of sequences.
Score alpha: The value of the parameter alpha in the Score formula.
Node Score Filter: Only nodes with a Score value higher than the filter will be shown. Use this parameter to thin out nodes of low information content from the GO-DAG.
Restore Defaults: All filters will be set to the default values.
Charts. (see next section)
Save as. The information present in a Combined Graph can be saved as an image (.png) or in table format. This will generate a .txt file where all information related to each node of the plotted Graph is provided in different columns.
Overview. Provides a radar-like view of the graph, which allows adjusting the visible window.
Open With. Open the graph information as TreeMap or WordCloud (see following sections).
Figure 6: Combined Graph Side Panel
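The NodeScore, Score alpha, Sequence Filter and Node Score Filter settings described above can be illustrated with a small sketch. The formula itself is not reproduced in this text, so the sketch assumes the score has the form score(g) = sum of seq(d) * alpha^dist(g, d), taken over the node g itself and all of its descendants d, with dist measured as the shortest number of edges. That form, the toy graph, and the function names are assumptions for illustration only, not OmicsBox's actual implementation.

```python
from collections import deque


def node_scores(children, seq_counts, alpha):
    """Assumed score: sum of seq(d) * alpha**dist over a term and its descendants."""
    scores = {}
    for term in children:
        # Breadth-first search yields the shortest edge distance to each descendant.
        dist = {term: 0}
        queue = deque([term])
        while queue:
            current = queue.popleft()
            for child in children.get(current, ()):
                if child not in dist:
                    dist[child] = dist[current] + 1
                    queue.append(child)
        scores[term] = sum(seq_counts.get(t, 0) * alpha ** d for t, d in dist.items())
    return scores


def filter_nodes(scores, seq_counts, min_seqs, min_score):
    """Keep terms with at least min_seqs sequences and a score above min_score."""
    return sorted(
        t for t in scores
        if seq_counts.get(t, 0) >= min_seqs and scores[t] > min_score
    )


# Hypothetical toy DAG: parent term -> list of child terms.
children = {"root": ["A", "B"], "A": ["C"], "B": ["C"], "C": []}
# Hypothetical number of sequences annotated directly at each term.
seq_counts = {"root": 0, "A": 2, "B": 1, "C": 4}

scores = node_scores(children, seq_counts, alpha=0.5)
shown = filter_nodes(scores, seq_counts, min_seqs=2, min_score=3.0)
```

With an alpha below 1, sequences annotated far below a node contribute less to its score, so raising the Node Score Filter tends to remove general terms near the root first.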
GO term associations in a set of sequences can also be analyzed with Pie and Bar Charts. For this analysis, a Combined Graph must have been generated first. Once the graph is visible in the GO Graph panel, several icons give access to the four different chart types.
Four possibilities are available:
Sequence distribution by GO level (Pie-Chart): This pie chart represents the number of sequences for each Gene Ontology term for a given level. See figure 8.
Sequences per GO terms (Multilevel Pie): This function generates a pie chart with the lowest node per branch of the DAG that fulfils the filter condition, i.e. it finds all the lowest nodes with the given number of sequences or Score value and plots them jointly in a pie representation. See figure 9.
Top 50 GO terms (Bar-Chart): A bar chart representing the GO terms according to the number of annotated sequences. See figure 10.
Sequence distribution by GO level (Bar-Chart): This bar chart represents the number of sequences for each Gene Ontology term for a given level. See figure 11.
When any of these functions are called, a table of node counts is generated and displayed in the statistics tab.
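The "lowest node per branch" selection used by the Multilevel Pie can be sketched as follows. This is only an illustration of the idea as described above: a node is kept if it passes the cutoff and none of its descendants does. The graph, the counts, and the function name are hypothetical, and the sketch uses a sequence-count cutoff only.

```python
def lowest_passing_nodes(children, seq_counts, min_seqs):
    """Return {term: count} for passing terms that have no passing descendant."""
    passing = {t for t, n in seq_counts.items() if n >= min_seqs}

    def has_passing_descendant(term):
        # Iterative depth-first walk over everything below the term.
        stack = list(children.get(term, ()))
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            if node in passing:
                return True
            stack.extend(children.get(node, ()))
        return False

    return {
        t: seq_counts[t]
        for t in sorted(passing)
        if not has_passing_descendant(t)
    }


# Hypothetical toy DAG and per-node sequence counts.
children = {"root": ["A", "B"], "A": ["C", "D"], "B": [], "C": [], "D": []}
seq_counts = {"root": 10, "A": 7, "B": 3, "C": 5, "D": 2}

slices = lowest_passing_nodes(children, seq_counts, min_seqs=3)
```

Here A passes the cutoff but is dropped because its child C also passes, so the pie would show only B and C.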
Figure 7: Combined Graph Pie and Bar-Charts
Figure 8: Sequence distribution by GO level: Pie Chart
Figure 9: Sequence Distribution/GO as Multilevel-Pie (#score or #seq cutoff)
Figure 10: Top 50 GO terms
Figure 11: Sequence distribution by GO level: Bar Chart
A WordCloud is a visual representation of a list of labels. The importance of each word, here a GO term, is represented by its font size, which depends on either the sequence count or the NodeScore of the GO term. The list of words can be limited to a specific Gene Ontology category (BP, CC or MF). The coloring is random. Several options are available to change the graphical appearance, such as the number of words, the orientation and shape of the cloud, and the color scheme.
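As a rough illustration of how a font size can follow the sequence count or NodeScore, the sketch below scales weights linearly between a minimum and a maximum size. The linear scaling, the point sizes, and the example counts are assumptions; the scaling OmicsBox actually applies is not described here.

```python
def font_sizes(weights, min_pt=10.0, max_pt=40.0):
    """Map each term's weight linearly onto the range [min_pt, max_pt]."""
    lo, hi = min(weights.values()), max(weights.values())
    if hi == lo:
        # All terms are equally important: use one mid-range size.
        return {term: (min_pt + max_pt) / 2 for term in weights}
    scale = (max_pt - min_pt) / (hi - lo)
    return {term: min_pt + (w - lo) * scale for term, w in weights.items()}


# Hypothetical sequence counts per GO term.
counts = {"binding": 50, "catalytic activity": 30, "transport": 10}
sizes = font_sizes(counts)
```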
Figure 12: Convert Graph to Word Cloud
The TreeMap viewer allows visualizing graphs (hierarchical, tree-structured data in general) as a set of nested rectangles. Each branch of the tree is given a rectangle, which is then tiled with smaller rectangles representing sub-branches. The size of each rectangle represents the number of sequences associated with a given GO term or a GO's NodeScore.
Figure 13: A TreeMap representing a Gene Ontology Graph. The size of the rectangles represents the number of sequences or the NodeScore of each GO term.
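The nested-rectangle idea behind a TreeMap can be sketched with the simple "slice-and-dice" layout, which splits a rectangle among the children in proportion to their size and alternates the split direction at each level. This is one common layout technique, chosen here for brevity; the layout algorithm used by the OmicsBox viewer is not stated, and the tree and sizes below are hypothetical.

```python
def treemap(node, x, y, w, h, horizontal=True, out=None):
    """Slice-and-dice layout. node = (name, size, children); returns name -> (x, y, w, h)."""
    if out is None:
        out = {}
    name, _size, kids = node
    out[name] = (x, y, w, h)
    total = sum(kid[1] for kid in kids)
    offset = 0.0  # fraction of the parent rectangle already used
    for kid in kids:
        frac = kid[1] / total if total else 0.0
        if horizontal:
            # Split the parent along the x axis, then switch direction below.
            treemap(kid, x + offset * w, y, w * frac, h, False, out)
        else:
            treemap(kid, x, y + offset * h, w, h * frac, True, out)
        offset += frac
    return out


# Hypothetical GO branch: sizes stand for sequence counts.
tree = ("root", 8, [("A", 6, [("A1", 3, []), ("A2", 3, [])]), ("B", 2, [])])
layout = treemap(tree, 0.0, 0.0, 8.0, 4.0)
```

In this example A receives three quarters of the root rectangle and is itself split into two equal halves for A1 and A2.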
Coloured GO Graphs from a text file
A GO graph can be generated from a text (.txt) file that contains a list of GOs and the desired colour for each of them. It is also possible to label groups of GOs with the same name. Figure 15 shows an example that was created from the following text file:
The text file has to follow a simple structure to be processed correctly. Each line may contain 2 or 3 columns. The first column has to contain a GO, the second a number (0.0 to
In the example above, Group B has two GO IDs with different values. These GO IDs can also be differentiated by colouring them according to their values. To colour the octagons according to the value, select the gradient colour on the next page of the colour graph configuration window (see figure 16).
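Before loading such a file, it can help to check that every line has the expected 2 or 3 columns. The sketch below does that under several assumptions that the text above does not confirm: columns are taken to be tab-separated, the optional third column is taken to be the group label, and the allowed range of the number is not checked because it is not fully stated here. The GO IDs, values, and labels in the example are made up for illustration.

```python
def parse_go_colour_file(text):
    """Parse lines of 'GO<TAB>number[<TAB>label]' into (go_id, value, label) tuples."""
    entries = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        cols = line.split("\t")  # assumption: tab-separated columns
        if len(cols) not in (2, 3):
            raise ValueError(f"line {line_no}: expected 2 or 3 columns, got {len(cols)}")
        go_id = cols[0].strip()
        value = float(cols[1])  # raises ValueError if the second column is not a number
        label = cols[2].strip() if len(cols) == 3 else None  # assumed group label
        entries.append((go_id, value, label))
    return entries


# Hypothetical file content, not the example file from this manual.
example = "GO:0008150\t0.2\tGroup A\nGO:0003674\t0.8\n"
entries = parse_go_colour_file(example)
```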
Figure 14: Colour Configuration Window
Figure 15: Coloured GO Graph by Group
Figure 16: Coloured GO Graph by Group value
Figure 17: Select Colour to differentiate values within the same group.
Make GO Graph
The "Make GO Graph" function allows visualizing any set of GO terms or IDs.
Figure 18: Make GO Graph
Figure 19: Make GO ID Graph