Fisher's Exact Test
Fisher’s Exact Test can be used to find GO terms that are over- or under-represented in a set of genes (the test set) with respect to a reference group (the reference set). The test set can be, for example, the differentially expressed genes from a differential expression analysis or a set of genes related to a phenotype of interest. Fisher’s Exact Test uses a contingency-table-based method to examine the association between two kinds of classification.
When the proportion of genes annotated with a given GO term in the test set is significantly higher than the proportion in the reference set, the GO term is reported as over-represented; when it is significantly lower, the term is reported as under-represented.
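As an illustration of the contingency-table idea, the following minimal Python sketch computes one-sided Fisher's Exact Test p-values from the hypergeometric distribution. It is not the FatiGO or OmicsBox code; the function names and the example counts are assumptions made up for this sketch.

```python
from math import comb

def hypergeom_pmf(k, n_test, n_annot, n_total):
    """Probability that exactly k of the n_test test sequences carry the term."""
    return comb(n_annot, k) * comb(n_total - n_annot, n_test - k) / comb(n_total, n_test)

def fisher_one_sided(test_annot, test_not, ref_annot, ref_not):
    """Return (p_over, p_under) for the 2x2 table:

               annotated    not annotated
    test set   test_annot   test_not
    reference  ref_annot    ref_not
    """
    n_test = test_annot + test_not        # size of the test set
    n_annot = test_annot + ref_annot      # sequences carrying the term overall
    n_total = n_test + ref_annot + ref_not
    k_min = max(0, n_test - (n_total - n_annot))
    k_max = min(n_test, n_annot)
    # Over-representation: at least as many annotated test sequences as observed.
    p_over = sum(hypergeom_pmf(k, n_test, n_annot, n_total)
                 for k in range(test_annot, k_max + 1))
    # Under-representation: at most as many annotated test sequences as observed.
    p_under = sum(hypergeom_pmf(k, n_test, n_annot, n_total)
                  for k in range(k_min, test_annot + 1))
    return p_over, p_under

# Hypothetical counts, for illustration only.
p_over, p_under = fisher_one_sided(test_annot=9, test_not=52, ref_annot=20, ref_not=900)
print(f"over-represented p = {p_over:.3g}, under-represented p = {p_under:.3g}")
```

A small `p_over` suggests the term is over-represented in the test set, and a small `p_under` suggests it is under-represented.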
OmicsBox integrates the FatiGO package for the statistical assessment of annotation differences between two sets of sequences. This package uses Fisher's Exact Test and corrects for multiple testing. For this analysis, all of the involved sequences (though not necessarily only those) must be loaded in the application together with their annotations. The annotations can either be the result of an OmicsBox annotation or be imported from a file (.annot); see Gene Ontology Annotation of this manual.
This functionality can be found under Functional Analysis → Enrichment Analysis → Enrichment Analysis (Fisher's Exact Test). A dialog screen appears (see image below). Test and Reference Sequences can be selected by uploading text files or ID-List .box files containing the lists of sequence IDs for the 2 groups. When there is no reference set selected, the whole dataset present in the project will be taken as reference. A detailed description of each parameter is available by clicking the help icon next to the parameter.
The Fisher's Exact Test implementation is sensitive to the direction of the test: sequences that are present in both the test set and the reference set are removed from the reference set, but not from the test set.
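The handling of shared sequences can be pictured as a simple set difference. This is only a sketch of the behaviour described above, with made-up sequence IDs and a hypothetical helper name, not the actual implementation.

```python
def split_sets(test_ids, reference_ids):
    """Keep the test set as is and drop its members from the reference set."""
    test = set(test_ids)
    reference = set(reference_ids) - test  # shared IDs stay only in the test set
    return test, reference

# Hypothetical ID lists, for illustration only.
test, reference = split_sets(
    ["seq1", "seq2", "seq3"],
    ["seq2", "seq3", "seq4", "seq5"],
)
print(sorted(test))       # ['seq1', 'seq2', 'seq3']
print(sorted(reference))  # ['seq4', 'seq5']
```

Because of this asymmetry, swapping the two lists does not simply mirror the result: the sequences that are compared change as well.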
For further details, please refer to the FatiGO publication: Al-Shahrour, F., Díaz-Uriarte, R., and Dopazo, J. (2004). FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes. Bioinformatics, 20(4):578–580.
Figure 1: Run Fisher's Exact Test Wizard Dialog
- Test-set Files. ID-list with sequences belonging to the test-set.
- Reference-set Files. ID-list with sequences belonging to the reference-set.
- Filtering. Only results with a p-value or FDR lower than the filter value will be shown. Note that the FDR is the p-value corrected for multiple testing, so it gives a more reliable indication of statistical significance than the raw p-value when many annotations are tested.
- Two-tailed test. This option tests for both over- and under-representation: the test set is tested against the reference set and vice versa.
- Annotations. You can choose how genes are grouped for the enrichment analysis: by GO term, by Enzyme Code, etc.
Click on the Run button to start the analysis. It may take a while depending on the number of annotations.
Once the analysis has completed, the results table will be shown in a new tab (see image below), listing the annotations whose adjusted p-values pass the given threshold. The main columns are:
- Over- or under-representation. Indicates whether the GO term has been declared over- or under-represented in the test set.
- FDR. The p-value corrected by False Discovery Rate control according to Benjamini-Hochberg.
- P-value. The raw p-value, without multiple testing correction.
Figure 2: Enrichment Results Table
For further information about how p-values are adjusted by FDR according to the Benjamini-Hochberg procedure, please refer to the publication: Benjamini, Y., & Hochberg, Y. (1995). Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society. Series B (Methodological), 57(1), 289-300.
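For readers who want to see the adjustment in concrete terms, here is a minimal Python sketch of the standard Benjamini-Hochberg step-up procedure. It illustrates the published method and is not taken from OmicsBox; the function name and the example p-values are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values, one per tested annotation.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

Each raw p-value is multiplied by the number of tests and divided by its rank, and a running minimum keeps the adjusted values from decreasing as the raw p-values grow.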
Using the context menu of each row, it is possible to get more details about the annotation and to create an ID-List with the sequences annotated in the test set or the reference set.
- #Test is the number of sequences in the test set that are annotated with the GO term.
- #NotAnnotTest is the number of sequences in the test set that are not annotated with that GO term.
Adding these two numbers gives the total number of annotated sequences in your test set, e.g. for GO:0061135: 9 + 52 = 61.
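The relationship between these two counts can be sketched in Python. The sequence IDs, the `OTHER_TERM` placeholder annotation and the helper name below are made up for illustration; only the GO:0061135 counts (9 and 52) come from the example above.

```python
def count_term_in_set(sequence_ids, annotations, go_term):
    """Return (#Test, #NotAnnotTest) for one GO term within a set of sequences."""
    n_with_term = sum(1 for seq in sequence_ids if go_term in annotations.get(seq, ()))
    n_without_term = len(sequence_ids) - n_with_term
    return n_with_term, n_without_term

# Hypothetical test set: 61 annotated sequences, 9 of which carry GO:0061135.
sequence_ids = [f"seq{i}" for i in range(61)]
annotations = {seq: {"OTHER_TERM"} for seq in sequence_ids}  # placeholder annotation
for seq in sequence_ids[:9]:
    annotations[seq].add("GO:0061135")

n_test, n_not_annot = count_term_in_set(sequence_ids, annotations, "GO:0061135")
print(n_test, n_not_annot, n_test + n_not_annot)  # 9 52 61
```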
The sidebar lists all actions that can be performed on this enrichment result, including two options for the visual display of the results:
1. Make Enriched Graph (only for GO annotations): use this option to generate a representation on the GO DAG (see image below). Nodes are color-highlighted proportionally to their significance value. The user can choose which type of calculated p-value to use for highlighting and the threshold for filtering out nodes. Additionally, the Filter intermediate checkbox hides non-enriched nodes. More options are available in the graph viewer's sidebar. See Gene Ontology Graphs of this manual for further information on the graphical functions in OmicsBox.
Figure 3: Enriched Graph
2. Show Bar Chart: this option generates a bar chart of the percentages of sequences in both the test and reference sets for each annotation in the table (see image below).
Figure 4: Enriched Bar Chart
3. Reduce to Most Specific (only for GO annotations): use this option to remove more general GO terms from the results and keep only the most specific terms (those at the lowest level in the GO DAG).
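One plausible way to picture the Reduce to Most Specific step is to drop every enriched term that is an ancestor of another enriched term. The sketch below assumes this interpretation and uses a made-up toy hierarchy with placeholder term names; it is not the OmicsBox implementation.

```python
def ancestors(term, parents):
    """Collect all ancestors of a term in a DAG given as {child: [parents]}."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen

def most_specific(enriched, parents):
    """Remove enriched terms that are ancestors of other enriched terms."""
    enriched = set(enriched)
    general = set()
    for term in enriched:
        general |= ancestors(term, parents) & enriched
    return enriched - general

# Hypothetical toy hierarchy: root -> mid -> child, and root -> other.
parents = {"child": ["mid"], "mid": ["root"], "other": ["root"]}
print(sorted(most_specific(["root", "mid", "child", "other"], parents)))
# ['child', 'other']
```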
Additionally, as with many other results in OmicsBox, the enrichment results can be displayed in two further ways: the Treemap representation, which compares the most enriched annotations by their size, and the WordCloud representation, which summarises the relevant annotations visually.