Gene Set Enrichment Analysis (GSEA)
OmicsBox includes the GSEA computational method, which determines whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states. GSEA considers experiments with genome-wide expression profiles from samples belonging to two classes, labelled 1 or 2. Genes are ranked by the correlation between their expression and the class distinction, using any suitable metric. Given an a priori defined set of genes S (e.g., genes encoding products in a metabolic pathway, located in the same cytogenetic band, or sharing the same GO category), the goal of GSEA is to determine whether the members of S are randomly distributed throughout a ranked list of genes L or found primarily at its extremes (top or bottom).
If there is no association, the genes in S will be uniformly distributed throughout L: this is the null hypothesis of GSEA. If there is an association, the genes in S will accumulate at the top or at the bottom of L. The magnitude of the association is measured by the Enrichment Score (ES) statistic.
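As an illustration of how the running-sum statistic behind ES can be computed, here is a minimal Python sketch based on the definition in Subramanian et al. (2005). It is not OmicsBox's implementation: the function name `enrichment_score`, its arguments and the weight exponent `p` (0 for an unweighted walk, 1 for a walk weighted by the ranking metric) are assumptions made for this example.

```python
def enrichment_score(ranked_genes, metric, gene_set, p=1.0):
    """Return (ES, running_sum) for gene_set over a ranked gene list.

    ranked_genes: gene IDs sorted from most positive to most negative metric.
    metric: dict mapping gene ID -> ranking metric value (e.g. a correlation).
    gene_set: the a priori defined set S.
    p: weight exponent (assumed); p=0 gives the unweighted statistic.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_total, n_hits = len(ranked_genes), sum(hits)
    if n_hits == 0 or n_hits == n_total:
        raise ValueError("S must contain some, but not all, ranked genes")
    # Total weight of the hits, so that all upward steps sum to 1.
    n_r = sum(abs(metric[g]) ** p for g, h in zip(ranked_genes, hits) if h)
    # Each gene not in S lowers the running sum by the same amount.
    miss_step = 1.0 / (n_total - n_hits)
    running, trace = 0.0, []
    for g, h in zip(ranked_genes, hits):
        if h:
            running += abs(metric[g]) ** p / n_r
        else:
            running -= miss_step
        trace.append(running)
    # ES is the maximum deviation from zero of the running sum (sign kept).
    es = max(trace, key=abs)
    return es, trace
```

For instance, with six genes ranked by metric values 3, 2, 1, -1, -2, -3 and S containing the two top-ranked genes, `p=0` gives ES = 1.0, the largest possible value, because every gene in S appears before any gene outside it; if S holds the two bottom-ranked genes instead, ES = -1.0.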
For further details, please refer to the GSEA publication: Subramanian, A., Tamayo, P., Mootha, V. K., Mukherjee, S., Ebert, B. L., Gillette, M. A., Paulovich, A., Pomeroy, S. L., Golub, T. R., Lander, E. S., and Mesirov, J. P. (2005). Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences, 102(43):15545–15550.
For this analysis, the sequences involved (not necessarily only those) must be loaded in the application together with their annotations. These can be the result of an OmicsBox annotation or annotations imported from a file (.annot); see the Gene Ontology Annotation section of this manual.
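GSEA needs one gene set S per GO term, built from the loaded annotations. A minimal sketch of deriving those sets from annotation lines is shown below. It assumes a tab-separated layout with the sequence ID in the first column and an annotation ID in the second, keeping only IDs that start with `GO:`; the function name and the `min_size` filter are hypothetical, and terms are not propagated to their parent terms here.

```python
from collections import defaultdict


def gene_sets_from_annot(lines, min_size=1):
    """Build {GO term: set of sequence IDs} from .annot-style lines.

    Assumed layout (hypothetical for this sketch): sequence ID, TAB,
    annotation ID, optionally followed by more tab-separated columns.
    """
    sets = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip blank or malformed lines and non-GO codes (e.g. EC numbers).
        if len(fields) < 2 or not fields[1].startswith("GO:"):
            continue
        sets[fields[1]].add(fields[0])
    # Drop gene sets that are too small to be tested meaningfully.
    return {term: seqs for term, seqs in sets.items() if len(seqs) >= min_size}
```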
This functionality can be found under Functional Analysis → Enrichment Analysis → Gene Set Enrichment Analysis (GSEA). A dialog screen appears (see image below). A detailed description of each parameter is available by clicking the help icon next to the parameter.
Figure 1: GSEA Dialog
- Rank file. The ranked list of genes, provided as a text file or as an ID-Value-List (.box/.b2g) file containing the sequence IDs and a statistical value for each one.
- Number of permutations. Number of gene set permutations used to assess the statistical significance of the Enrichment Score.
- Enrichment Statistic. While walking down L, GSEA increases a running-sum statistic each time it encounters a gene in S and decreases it each time it encounters a gene not in S. The Enrichment Score (ES) is the maximum deviation from zero reached by this running sum, so it will be close to 0 if the genes in S are randomly distributed throughout L. This option changes the way in which ES is calculated (see the GSEA paper).
- Detailed Results. Number of GO terms for which detailed results are generated.
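The Number of permutations parameter controls how many random gene sets are used to build the null distribution of ES. The sketch below assumes gene set permutation as described in the GSEA paper: random sets of the same size as S are drawn from the ranked list, and the nominal p-value is the fraction of same-sign null scores at least as extreme as the observed ES. Function names, the `seed` argument and the weight exponent `p` are illustrative assumptions, not OmicsBox options.

```python
import random


def running_sum_es(ranked, metric, gene_set, p=1.0):
    """Enrichment score: maximum deviation from zero of the running sum."""
    hits = [g in gene_set for g in ranked]
    n_r = sum(abs(metric[g]) ** p for g, h in zip(ranked, hits) if h)
    miss_step = 1.0 / (len(ranked) - sum(hits))
    running, es = 0.0, 0.0
    for g, h in zip(ranked, hits):
        running += abs(metric[g]) ** p / n_r if h else -miss_step
        if abs(running) > abs(es):
            es = running
    return es


def gene_set_permutation_test(ranked, metric, gene_set, n_perm=1000, p=1.0, seed=0):
    """Return (observed ES, nominal p-value, list of null ES values)."""
    size = len(set(gene_set) & set(ranked))
    if not 0 < size < len(ranked):
        raise ValueError("S must overlap the ranked list without covering it")
    rng = random.Random(seed)
    observed = running_sum_es(ranked, metric, gene_set, p)
    # Null distribution: ES of random gene sets with the same size as S.
    null = [running_sum_es(ranked, metric, set(rng.sample(ranked, size)), p)
            for _ in range(n_perm)]
    # Only null scores with the same sign as the observed ES are compared.
    same_sign = [x for x in null if (x >= 0) == (observed >= 0)]
    as_extreme = sum(1 for x in same_sign if abs(x) >= abs(observed))
    return observed, as_extreme / max(len(same_sign), 1), null
```

More permutations give a finer-grained p-value (the smallest non-zero value is 1 divided by the number of same-sign null scores) at the cost of run time, which is why the analysis can take a while with many permutations.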
Click on the Run button to start the analysis. It may take a while depending on the number of permutations selected.
Once the analysis is completed, the results table is shown in a new tab (see image below), where the adjusted p-values of each annotation passing a given threshold are shown. The main columns are:
- ES (Enrichment Score). Reflects the degree to which a gene set is overrepresented at the top or bottom of the ranked list of genes.
- NES (Normalized Enrichment Score). By normalizing the enrichment score, GSEA accounts for differences in gene set size and in correlations between gene sets and the expression dataset.
- FDR (False Discovery Rate). The estimated probability that a gene set with a given NES represents a false positive finding.
- Nominal p-value. Estimates the statistical significance of the enrichment score for a single gene set.
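NES and FDR can be derived from the null ES values produced by the permutations. A simplified sketch following the definitions in Subramanian et al. (2005) is shown below: each ES and its null scores are divided by the mean of the same-sign null scores for that gene set, and the FDR for a given NES compares the fraction of pooled null NES at least as extreme with the fraction of observed NES at least as extreme. The GSEA software may apply additional adjustments; the function names and dictionary inputs are assumptions.

```python
from statistics import fmean


def normalize(es, null):
    """Scale an observed ES and its null ES values by the same-sign null mean."""
    pos = [x for x in null if x > 0]
    neg = [x for x in null if x < 0]
    pos_mean = fmean(pos) if pos else 1.0
    neg_mean = -fmean(neg) if neg else 1.0

    def scale(x):
        return x / pos_mean if x >= 0 else x / neg_mean

    return scale(es), [scale(x) for x in null]


def nes_and_fdr(observed_es, null_es):
    """observed_es: {set name: ES}; null_es: {set name: list of null ES}."""
    nes, pooled_null = {}, []
    for name, es in observed_es.items():
        nes[name], scaled_null = normalize(es, null_es[name])
        pooled_null.extend(scaled_null)
    fdr = {}
    for name, x in nes.items():
        # Compare only against observed and null NES with the same sign as x.
        obs = [y for y in nes.values() if (y >= 0) == (x >= 0)]
        nul = [y for y in pooled_null if (y >= 0) == (x >= 0)]
        frac_null = sum(abs(y) >= abs(x) for y in nul) / max(len(nul), 1)
        frac_obs = sum(abs(y) >= abs(x) for y in obs) / len(obs)
        fdr[name] = min(1.0, frac_null / frac_obs)
    return nes, fdr
```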
For further details, please refer to the GSEA User Guide.
Figure 2: GSEA result table
Using the context menu of the rows marked with the Details tag, it is possible to get more details about the GO term, including the enrichment statistics, and to create an ID-List with the core enrichment sequences of each GO term.
The sidebar contains all the actions that can be performed on this enrichment result, including several options for the visual display of the results:
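The core enrichment sequences correspond to what the GSEA paper calls the leading-edge subset: the members of S that appear up to the peak of the running sum for a positive ES, or from the peak onwards for a negative ES. A minimal sketch is shown below; the function name and the weight exponent `p` are assumptions, and OmicsBox's own ID-List export may differ in detail.

```python
def core_enrichment(ranked, metric, gene_set, p=1.0):
    """Return the leading-edge members of gene_set, in ranked order."""
    hits = [g in gene_set for g in ranked]
    n_r = sum(abs(metric[g]) ** p for g, h in zip(ranked, hits) if h)
    miss_step = 1.0 / (len(ranked) - sum(hits))
    running, es, peak = 0.0, 0.0, 0
    for i, (g, h) in enumerate(zip(ranked, hits)):
        running += abs(metric[g]) ** p / n_r if h else -miss_step
        # Remember where the running sum deviates most from zero.
        if abs(running) > abs(es):
            es, peak = running, i
    # Positive ES: hits up to the peak; negative ES: hits from the peak on.
    window = ranked[: peak + 1] if es >= 0 else ranked[peak:]
    return [g for g in window if g in gene_set]
```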
1. Make Enriched Graph: use this option to generate a representation on the GO DAG (see image below). Nodes are color-highlighted in proportion to their significance value. The user can choose which type of calculated p-value to use for highlighting and the threshold for filtering out nodes.
Figure 3: Enriched Graph
2. NES vs Significance Chart: this option generates a plot of p-values versus normalized enrichment scores, which provides a quick, visual way to grasp the number of enriched gene sets that are significant (see image below).
Figure 4: NES vs Significance Chart
3. ES Histogram Chart: this option generates a histogram of enrichment scores across gene sets, which provides a quick, visual way to grasp the number of enriched gene sets (see image below).
Figure 5: ES Histogram Chart
4. Reduce to Most Specific: use this option to remove the more general GO terms from the results and keep only the most specific ones (the terms lowest in the GO DAG hierarchy).
Additionally, as with many other results in OmicsBox, it is possible to display the enrichment results in two further ways: the Treemap representation, to compare the most enriched GO terms by their size, and the WordCloud representation, to summarise the relevant GO terms in a compact, visual way.
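Reduce to Most Specific can be understood as removing every result term that is an ancestor of another result term in the GO DAG. The sketch below assumes the DAG is given as a hypothetical `parents` mapping from each term to its direct parents and uses toy term IDs; it illustrates the idea and is not the OmicsBox algorithm.

```python
def ancestors(term, parents):
    """All ancestors of term, following the parents mapping transitively."""
    seen, stack = set(), [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def reduce_to_most_specific(terms, parents):
    """Keep only result terms that have no descendant among the results."""
    terms = set(terms)
    more_general = set()
    for t in terms:
        # Any result term that is an ancestor of t is less specific than t.
        more_general |= ancestors(t, parents) & terms
    return sorted(terms - more_general)
```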