Fisher's Exact Test
OmicsBox integrates the FatiGO package for the statistical assessment of annotation differences between two sets of sequences. The package uses Fisher's Exact Test and corrects for multiple testing. For this analysis, all of the sequences involved (though not necessarily only those) must be loaded in the application together with their annotations. The annotations can either be the result of an OmicsBox annotation or be imported from a file (.annot); see Gene Ontology Annotation of this manual.
This functionality can be found under Functional Analysis → Enrichment Analysis → Enrichment Analysis (Fisher's Exact Test). A dialog screen appears (see image below). Test and Reference Sequences can be selected by uploading text files or ID-List .box files containing the lists of sequence IDs for the 2 groups. When there is no reference set selected, the whole dataset present in the project will be taken as reference. A detailed description of each parameter is available by clicking the help icon next to the parameter.
Figure 1: Run Fisher's Exact Test Wizard Dialog
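The way the test and reference sets are resolved can be illustrated with a minimal Python sketch. It is not OmicsBox code: the one-ID-per-line file layout, the function names `parse_id_list` and `resolve_sets`, and the restriction of both lists to IDs present in the project are assumptions made for illustration only.

```python
from typing import Iterable, Optional, Set


def parse_id_list(text: str) -> Set[str]:
    """Parse a plain-text ID list (assumed: one sequence ID per line)."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def resolve_sets(project_ids: Iterable[str],
                 test_ids: Set[str],
                 reference_ids: Optional[Set[str]] = None):
    """Return (test, reference), both restricted to IDs loaded in the project.

    If no reference list is given, the whole project dataset is the reference.
    """
    project = set(project_ids)
    test = test_ids & project
    reference = project if reference_ids is None else reference_ids & project
    return test, reference


# Hypothetical project with five sequences and a two-sequence test list.
project = ["seq1", "seq2", "seq3", "seq4", "seq5"]
test, reference = resolve_sets(project, parse_id_list("seq1\nseq2\n\n"))
```

Here no reference list is passed, so `reference` falls back to every sequence in the project. Whether sequences shared by both sets are removed from the reference is not stated above, so the sketch leaves them in.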
Click on the Run button to start the analysis. It may take a while depending on the number of annotations.
Once the analysis has completed, the results table is shown in a new tab (see image below). It lists the adjusted p-value of each annotation that meets the given threshold. The main columns are:
- Corrected p-value by False Discovery Rate (FDR) control according to Benjamini-Hochberg.
- p-value without multiple testing correction.
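The relation between the uncorrected and the corrected p-value can be sketched in a few lines of Python. This is a generic implementation of the Benjamini-Hochberg step-up adjustment, not the code used by OmicsBox or FatiGO; the function name and the sample p-values are made up for illustration.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value so that the adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four annotations.
raw = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(raw)
```

Each raw p-value is scaled by the number of tests divided by its rank, and the running minimum keeps the adjusted values monotone and capped at 1.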
For further details please refer to the FatiGO publication (Al-Shahrour, F., Díaz-Uriarte, R., and Dopazo, J. (2004). Fatigo: a web tool for finding significant associations of gene ontology terms with groups of genes. Bioinformatics, 20(4):578–580).
Figure 2: Enrichment Results Table
Using the context menu of each row, it is possible to get more details about the annotation and to create an ID-List with the sequences annotated in the Test-Set or the Reference-Set.
The count columns for the test set are read as follows:
- #Test is the number of sequences in the test set that are annotated with the GO term.
- #NotAnnotTest is the number of sequences in the test set that are not annotated with that GO term.
Adding these two numbers gives the total number of annotated sequences in the test set, e.g. for GO:0061135: 9 + 52 = 61.
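To show how such counts feed into Fisher's Exact Test, the following Python sketch computes a p-value from a 2x2 table using only the standard library. The test-set counts (9 and 52) are the ones from the example above; the reference-set counts (20 and 480) are invented for illustration, and the two-sided variant is an assumption, since the sidedness of the test is not stated here.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    a, b: annotated / not annotated with the term in the test set
    c, d: annotated / not annotated with the term in the reference set
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x annotated sequences in the test set.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed
    # one; the small tolerance guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


test_total = 9 + 52  # #Test + #NotAnnotTest for GO:0061135
# Reference counts (20 annotated, 480 not annotated) are hypothetical.
p = fisher_exact_two_sided(9, 52, 20, 480)
```

The p-value is the total probability, under fixed row and column totals, of every table that is at most as likely as the observed one.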
The sidebar lists all actions that can be performed on this enrichment result, including two options for the visual display of the results:
1. Make Enriched Graph (only for GO annotations): use this option to generate a representation on the GO DAG (see image below). Nodes are color-highlighted in proportion to their significance value. The user can choose which type of calculated p-value to use for highlighting and the threshold for filtering out nodes. Additionally, the "Filter intermediate" checkbox hides non-enriched nodes. More options are available in the graph viewer's sidebar. Gene Ontology Graphs of this manual gives further information on the graphical functions in OmicsBox.
Figure 3: Enriched Graph
2. Show Bar Chart: this option generates a bar chart of the percentage of sequences in both the test and the reference set for each annotation in the table (see image below).
Figure 4: Enriched Bar Chart
3. Reduce to Most Specific (only for GO annotations): use this option to remove more general GO terms from the results and get only the most specific terms (with the lowest level in the GO DAG).
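One way to read the Reduce to Most Specific option is that an enriched term is dropped whenever one of its descendants is also enriched. The Python sketch below implements that reading on a toy DAG. It is an interpretation rather than the OmicsBox implementation, and the term names, the `parents` mapping and the function names are hypothetical.

```python
from typing import Dict, Set


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Collect all ancestors of a term by following child -> parent links."""
    seen: Set[str] = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def most_specific(enriched: Set[str], parents: Dict[str, Set[str]]) -> Set[str]:
    """Drop every enriched term that is an ancestor of another enriched term."""
    general: Set[str] = set()
    for term in enriched:
        general |= ancestors(term, parents) & enriched
    return enriched - general


# Toy DAG (hypothetical terms): A is the root, B and D are children of A,
# and C is a child of B.
parents = {"B": {"A"}, "C": {"B"}, "D": {"A"}}
specific = most_specific({"A", "B", "C", "D"}, parents)
```

In this toy example A and B are removed because C (and D, for A) lie below them, leaving only C and D.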
Additionally, as with many other results in OmicsBox, the enrichment results can be displayed in two further ways: the Treemap representation, which compares the most enriched annotations by their size, and the WordCloud representation, which summarises the relevant annotations visually.