Functional Annotation of Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) proteins.
The protein sequences have been downloaded from NCBI.
Reference genome: Severe acute respiratory syndrome coronavirus 2 ASM985889v2
1- Cloud Blast
Blast Program: blastp-fast
Blast DB: Non-redundant protein sequences (nr v5)
Blast Expectation Value (e-Value): 1.0E-3
Number of Blast Hits: 20
Blast Description Annotator: True
Word Size: 6
Low Complexity Filter: True
HSP Length Cutoff: 33
HSP-Hit Coverage: 0
Filter By Description: No filter
Save XML results in a folder
Consumed 1174 CloudUnits during this execution.
Blast XML files in zip.
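To make the effect of the Blast parameters above more concrete, the following is a minimal Python sketch of how the e-value threshold, the HSP length cutoff and the hit limit could act on a list of alignments. It is an illustration only, not the OmicsBox implementation: the `Hsp` record, the `filter_hits` function and the example hits are all hypothetical, and treating both cutoffs as inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Hsp:
    hit_id: str     # hypothetical identifier of the database hit
    evalue: float   # expectation value reported by Blast
    align_len: int  # HSP alignment length in residues


def filter_hits(hsps, max_evalue=1.0e-3, min_hsp_len=33, max_hits=20):
    """Keep HSPs that pass both cutoffs, best e-value first, capped at max_hits.

    Assumption: both cutoffs are inclusive; the real tool may differ.
    """
    kept = [h for h in hsps if h.evalue <= max_evalue and h.align_len >= min_hsp_len]
    kept.sort(key=lambda h: h.evalue)
    return kept[:max_hits]


# Made-up example records, not taken from the SARS-CoV-2 run.
example = [
    Hsp("hit_A", 1e-50, 120),
    Hsp("hit_B", 5e-2, 200),  # rejected: e-value above 1.0E-3
    Hsp("hit_C", 1e-10, 20),  # rejected: HSP shorter than 33
    Hsp("hit_D", 1e-5, 33),
]
selected = filter_hits(example)
print([h.hit_id for h in selected])
```

With HSP-Hit Coverage set to 0, as in this run, no additional coverage filter would apply, so the sketch omits it.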
2- Cloud InterProScan
Retrieve protein family domains with InterProScan Annotation.
Save InterProScan XML files in a folder
Consumed 194 CloudUnits during this execution.
3- Blast2GO Mapping
Retrieve Gene Ontology terms using Gene Ontology Mapping.
Use latest database version: True
In this example, the latest database version was from June 2021.
The same project with Gene Ontology terms.
4- Blast2GO Annotation
Apply the Gene Ontology Annotation rule to all GO terms.
Annotation CutOff: 55
GO Weight: 5
Filter GO by Taxonomy: No Filter
HSP-Hit Coverage CutOff: 0
Hit Filter: 500
Only hits with GOs: False
Evidence Code Weights: Default Values
The same project with Gene Ontology terms that passed the Annotation rule CutOff.
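The annotation rule is commonly described in the Blast2GO literature as an annotation score made of a direct term (the highest hit similarity, weighted by the evidence code) plus an abstraction term (the number of unified child GO terms minus one, multiplied by the GO Weight). A GO term is kept when its score reaches the Annotation CutOff. The Python sketch below assumes that formulation; the function names and the example numbers are hypothetical, and the exact computation inside OmicsBox may differ.

```python
def annotation_score(similarities_with_ec, n_unified_gos, go_weight=5):
    """Annotation score for one candidate GO term (assumed formulation).

    similarities_with_ec: iterable of (hit similarity in percent,
                          evidence-code weight between 0 and 1).
    n_unified_gos: number of candidate GO terms unified at this node.
    """
    direct_term = max(sim * ec for sim, ec in similarities_with_ec)
    abstraction_term = (n_unified_gos - 1) * go_weight
    return direct_term + abstraction_term


def passes_rule(score, cutoff=55):
    """A GO term is annotated when its score reaches the cutoff."""
    return score >= cutoff


# Made-up example: two supporting hits, three GO terms unified at the node.
score = annotation_score([(60, 0.8), (50, 1.0)], n_unified_gos=3)
print(score, passes_rule(score))
```

With a GO Weight of 5, each additional child term that maps to the same parent raises the parent's score by 5 points, which is how more general terms can pass the cutoff when several specific ones agree.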
5- EggNOG Annotation
Retrieve additional Gene Ontology terms from orthologs by running EggNOG.
Target Orthologs: All
GO Evidence: Non-Electronic
6- Merge EggNOG to Annotation
Merge the Gene Ontology terms retrieved from EggNOG into the existing Annotation.
Seed Ortholog E-Value Filter: 1E-3
Seed Ortholog Bit-Score Filter: 60
7- Merge InterProScan to Annotation
Merge the Gene Ontology terms retrieved from InterProScan into the existing Annotation.
Open the project with Gene Ontology terms merged from EggNOG and InterProScan results.
If no InterProScan results are available in the project, they can be generated or loaded as described in step 2- Cloud InterProScan.
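Conceptually, the two merge steps combine, for each sequence, the GO terms from the existing annotation with those from EggNOG and InterProScan. The Python sketch below shows this as a plain per-sequence set union. It is a simplification under stated assumptions: `merge_annotations`, the sequence names and the GO assignments are hypothetical, and the real merge may additionally apply the seed ortholog filters and remove redundant parent terms.

```python
def merge_annotations(*sources):
    """Union the GO IDs of each sequence across several annotation sources.

    Each source is a dict mapping a sequence ID to an iterable of GO IDs.
    """
    merged = {}
    for source in sources:
        for seq_id, go_ids in source.items():
            merged.setdefault(seq_id, set()).update(go_ids)
    return merged


# Made-up placeholder data, not results from this project.
blast2go = {"seq1": {"GO:0003723"}, "seq2": {"GO:0008233"}}
eggnog = {"seq1": {"GO:0003723", "GO:0016020"}}
interpro = {"seq3": {"GO:0008233"}}

merged = merge_annotations(blast2go, eggnog, interpro)
for seq_id in sorted(merged):
    print(seq_id, sorted(merged[seq_id]))
```

Using sets means a GO term reported by more than one source is stored only once per sequence.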
8- Functional Enrichment Analysis (Fisher’s Exact Test)
In this case, a comparative analysis between SARS-CoV and SARS-CoV-2 is performed to see whether any function is specific to SARS-CoV-2.
The protein sequences of SARS-CoV have been downloaded from NCBI and analyzed with the above pipeline. Both annotated projects (SARS-CoV and SARS-CoV-2) have to be merged into a single project and a test set has to be generated. The test set for the Enrichment Analysis will be the identifiers from SARS-CoV-2.
Two projects can be combined in OmicsBox by adding the results of one to the other.
This is done in the file manager: select both projects, right-click on the first project, and select Merge. All results have to be added, and a new Merged project will open.
Create test set id list
It is possible to create an id list in OmicsBox from an annotated project.
The SARS-CoV-2 project has to be opened in OmicsBox. First, select all sequences with Ctrl + A (Windows and Linux) / Cmd + A (Mac), then right-click on a sequence name and choose “Create ID List of Column: SeqName”. A new tab will open with the sequence identifiers in a single column. Save this list; it will be used as the test set for the Enrichment Analysis.
Open the Merged project in OmicsBox; it will be used as the reference set.
Test-Set Files: SARS-CoV2_idlist.box
Reference-Set Files: false
Do Not Filter: false
Filter Value: 0.01
Filter Mode: P-VALUE
Two Tailed: false
Remove Double IDs: true
Annotations: GO IDs
GO Categories: biological_process,molecular_function,cellular_component
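For readers who want to see what a one-tailed Fisher's Exact Test computes for a single GO term, here is a minimal Python sketch using only the standard library. It assumes that "Two Tailed: false" means testing for over-representation in the test set, and that "Remove Double IDs: true" removes the test-set IDs from the reference before counting. Both are assumptions about the tool's behavior, and the function names and example IDs are hypothetical.

```python
from math import comb, isclose


def contingency(test_ids, reference_ids, annotated_ids, remove_double_ids=True):
    """Build the 2x2 counts for one GO term.

    Returns (test with term, test without, reference with, reference without).
    Assumption: removing double IDs drops test-set IDs from the reference.
    """
    test = set(test_ids)
    ref = set(reference_ids)
    if remove_double_ids:
        ref -= test
    annotated = set(annotated_ids)
    return (len(test & annotated), len(test - annotated),
            len(ref & annotated), len(ref - annotated))


def fisher_one_tailed(test_with, test_without, ref_with, ref_without):
    """Over-representation p-value: P(X >= test_with) under the
    hypergeometric distribution with fixed table margins."""
    n_test = test_with + test_without
    n_with = test_with + ref_with
    n_total = n_test + ref_with + ref_without
    upper = min(n_test, n_with)
    numerator = sum(
        comb(n_with, k) * comb(n_total - n_with, n_test - k)
        for k in range(test_with, upper + 1)
    )
    return numerator / comb(n_total, n_test)


# Made-up toy IDs: the merged project plays the role of the reference.
table = contingency(
    test_ids=["a", "b"],
    reference_ids=["a", "b", "c", "d"],
    annotated_ids=["a", "b"],  # sequences carrying the GO term
)
p_value = fisher_one_tailed(*table)
print(table, p_value)
```

A term would then be reported when its p-value is below the Filter Value (0.01 in this run). With a test set this small the toy p-value stays well above that threshold, which illustrates why few terms can reach significance in small viral proteomes.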
Project containing the results of the functional enrichment analysis. In this case, only one Gene Ontology term is enriched and specific to SARS-CoV-2: “host cell endosome”.