The Microbiome Analysis feature in OmicsBox makes it possible to describe the bacterial diversity of two soda lakes (Salina Preta and Salina Verde) through the Taxonomic Classification Workflow, and to identify the functional genetic potential of their microbial communities.
To generate the metagenomic dataset, two prior steps were first run in OmicsBox: FastQC and Preprocessing.
The metagenomic analysis used a dataset of 12 single-end metagenomic samples collected in two soda lakes, Salina Verde and Salina Preta. Three replicates were taken from each lake at each of two sampling times, morning and afternoon. The metagenomic libraries were prepared with the Nextera XT DNA Sample Preparation Kit and sequenced on the Illumina MiSeq platform with the MiSeq Reagent Kit V3.
Lake, time of sampling: samples
Salina Preta, Morning (10 AM): PMB1, PMB2, PMB3
Salina Preta, Afternoon (3 PM): PAB1, PAB2, PAB3
Salina Verde, Morning (10 AM): VMB1, VMB2, VMB3
Salina Verde, Afternoon (3 PM): VAB1, VAB2, VAB3
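The sample names above appear to follow a fixed pattern, which can be useful when building a metadata table for the 12 samples. The following is a minimal sketch, assuming that the first letter encodes the lake (P for Salina Preta, V for Salina Verde), the second letter the sampling time (M for morning, A for afternoon), and the final digit the replicate. This decoding is inferred from the names and is not stated by the source; `parse_sample` and `all_samples` are hypothetical helper names.

```python
import re
from typing import List, NamedTuple

# Assumed decoding of the sample names; the source only lists the names.
LAKES = {"P": "Salina Preta", "V": "Salina Verde"}
TIMES = {"M": "Morning (10 AM)", "A": "Afternoon (3 PM)"}


class Sample(NamedTuple):
    name: str
    lake: str
    time: str
    replicate: int


def parse_sample(name: str) -> Sample:
    """Split a name such as 'PMB1' into lake, sampling time and replicate."""
    match = re.fullmatch(r"([PV])([MA])B([1-3])", name)
    if match is None:
        raise ValueError(f"unexpected sample name: {name!r}")
    lake, time, replicate = match.groups()
    return Sample(name, LAKES[lake], TIMES[time], int(replicate))


def all_samples() -> List[Sample]:
    """Enumerate the 12 samples: 2 lakes x 2 sampling times x 3 replicates."""
    names = [f"{lake}{time}B{rep}" for lake in "PV" for time in "MA" for rep in (1, 2, 3)]
    return [parse_sample(name) for name in names]


for sample in all_samples():
    print(sample.name, sample.lake, sample.time, sample.replicate, sep="\t")
```

The resulting rows can be pasted into a spreadsheet or used to group samples by lake and sampling time.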
Andreote, A. P., Dini-Andreote, F., Rigonato, J., Machineski, G. S., Souza, B. C., Barbiero, L., ... & Fiore, M. F. (2018). Contrasting the genetic patterns of microbial communities in soda lakes with and without cyanobacterial bloom. Frontiers in Microbiology, 9, 244.
Soda lakes have high levels of sodium carbonates and are characterized by salinity and elevated pH. These ecosystems are found across Africa, Europe, Asia, Australia, North, Central, and South America. Particularly in Brazil, the Pantanal region has a series of hundreds of shallow soda lakes (ca. 600) potentially colonized by a diverse haloalkaliphilic microbial community. Biological information of these systems is still elusive, in particular data on the description of the main taxa involved in the biogeochemical cycling of life-important elements. Here, we used metagenomic sequencing to contrast the composition and functional patterns of the microbial communities of two distinct soda lakes from the sub-region Nhecolândia, state of Mato Grosso do Sul, Brazil. These two lakes differ by permanent cyanobacterial blooms (Salina Verde, green-water lake) and by no record of cyanobacterial blooms (Salina Preta, black-water lake). The dominant bacterial species in the Salina Verde bloom was Anabaenopsis elenkinii. This cyanobacterium altered local abiotic parameters such as pH, turbidity, and dissolved oxygen and consequently the overall structure of the microbial community. In Salina Preta, the microbial community had a more structured taxonomic profile. Therefore, the distribution of metabolic functions in Salina Preta community encompassed a large number of taxa, whereas, in Salina Verde, the functional potential was restrained across a specific set of taxa. Distinct signatures in the abundance of genes associated with the cycling of carbon, nitrogen, and sulfur were found. Interestingly, genes linked to arsenic resistance metabolism were present at higher abundance in Salina Verde and they were associated with the cyanobacterial bloom. Collectively, this study advances fundamental knowledge on the composition and genetic potential of microbial communities inhabiting tropical soda lakes.
The data were downloaded from the MG-RAST server (project mgp10309).
1- Metagenomic Assembly
Metagenomic Assembly (MEGAHIT)
Processed Illumina sequencing data in FASTQ format:
Sequencing Data: single
Minimum Multiplicity: 2
K-mer Sizes: 29,39,59,79,99,119,141
No Mercy K-mers: false
Bubble Level: high
Bubble Merge Level L: 20
Bubble Merge Level S: 0.95
Prune Level: high
Prune Depth: 2
Low Local Ratio: 0.2
Max Tip Length: 2
Disable Local Assembly: false
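For readers who want to see how the settings above relate to the MEGAHIT command line, here is a minimal sketch that builds an equivalent invocation. It is not how OmicsBox runs the assembler internally. The input file names and output directory are hypothetical, and the numeric values passed for the "high" bubble and prune levels are assumptions, since the source gives only the label "high". The two options set to false (No Mercy K-mers, Disable Local Assembly) are represented by omitting the `--no-mercy` and `--no-local` flags.

```python
from typing import List


def megahit_command(reads: List[str], out_dir: str,
                    bubble_level: int, prune_level: int) -> List[str]:
    """Build a MEGAHIT argument list mirroring the parameters listed above."""
    return [
        "megahit",
        "-r", ",".join(reads),                 # Sequencing Data: single
        "--min-count", "2",                    # Minimum Multiplicity
        "--k-list", "29,39,59,79,99,119,141",  # K-mer Sizes
        "--bubble-level", str(bubble_level),   # Bubble Level (numeric value assumed)
        "--merge-level", "20,0.95",            # Bubble Merge Level L and S
        "--prune-level", str(prune_level),     # Prune Level (numeric value assumed)
        "--prune-depth", "2",                  # Prune Depth
        "--low-local-ratio", "0.2",            # Low Local Ratio
        "--max-tip-len", "2",                  # Max Tip Length
        "-o", out_dir,
    ]


# Hypothetical file names for one sample group; levels of 2 are an assumption.
cmd = megahit_command(["PMB1.fastq", "PMB2.fastq", "PMB3.fastq"],
                      "assembly_PMB", bubble_level=2, prune_level=2)
print(" ".join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it on a machine where MEGAHIT is installed.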
Run time varies with the number of reads per sample, from approximately 3 to 6 minutes.
Multifasta files with the assembled contigs for each sample group.
Statistical reports for each assembly.
2- Gene Finding
Gene Finding (FragGeneScan)
Assemblies made in the previous step:
Type of Data: Complete Genomic Sequences.
Model for Input Data: Complete genomic sequences or short sequence reads without sequencing error.
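The two settings above correspond to options of the FragGeneScan wrapper script. The following is a minimal sketch of an equivalent command line, assuming the standalone `run_FragGeneScan.pl` script is used; the input and output names are hypothetical, and this is not a description of how OmicsBox calls the tool.

```python
from typing import List


def fraggenescan_command(contigs_fasta: str, out_prefix: str) -> List[str]:
    """Build a FragGeneScan argument list for assembled contigs."""
    return [
        "run_FragGeneScan.pl",
        f"-genome={contigs_fasta}",
        f"-out={out_prefix}",
        "-complete=1",      # Type of Data: complete genomic sequences
        "-train=complete",  # Model: complete sequences or reads without sequencing error
    ]


# Hypothetical file names for one assembly from the previous step.
print(" ".join(fraggenescan_command("assembly_PMB/final.contigs.fa", "genes_PMB")))
```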
About one minute per FASTA file.
Multifasta files with the nucleotide sequence for each predicted gene.
Multifasta files with the amino acid sequence for each predicted gene.
Reports with the number and length of the predicted genes.
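The number and length of predicted genes can also be checked independently from the multi-FASTA outputs. The sketch below is one simple way to do that, assuming standard FASTA formatting (a header line starting with `>` followed by one or more sequence lines). The function names and the example records are illustrative only and are not taken from the OmicsBox report.

```python
from typing import Dict, Iterable, List


def sequence_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of every record in a FASTA-formatted line stream."""
    lengths: List[int] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            lengths.append(0)          # a new record starts here
        elif lengths:
            lengths[-1] += len(line)   # sequence lines may be wrapped
    return lengths


def summarize(lengths: List[int]) -> Dict[str, float]:
    """Count records and report minimum, maximum and mean length."""
    if not lengths:
        return {"count": 0, "min": 0, "max": 0, "mean": 0.0}
    return {
        "count": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": sum(lengths) / len(lengths),
    }


# Made-up example records; a real run would iterate over an open file instead.
example = """>gene_1
ATGAAA
TAA
>gene_2
ATGTAA
"""
print(summarize(sequence_lengths(example.splitlines())))
```

For a real file, `sequence_lengths(open(path))` works the same way, because the function only iterates over lines.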
3- Functional Annotation
Functional Annotation (PfamScan and EggNOG Mapper)
Functional annotation can also be done with the Blast2GO methodology, which is described elsewhere in this manual.
Multifasta files with amino acid sequences predicted in the previous step:
Target Orthologs: All
GO Evidence: Non-Electronic
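The two EggNOG Mapper settings above have direct counterparts on the `emapper.py` command line. The following is a minimal sketch of such a call, assuming the standalone eggNOG-mapper script is installed; the input and output names are hypothetical, and other options (search mode, database location, CPU count) are left at their defaults.

```python
from typing import List


def emapper_command(proteins_fasta: str, out_prefix: str) -> List[str]:
    """Build an eggNOG-mapper argument list with the two settings listed above."""
    return [
        "emapper.py",
        "-i", proteins_fasta,
        "-o", out_prefix,
        "--target_orthologs", "all",       # Target Orthologs: All
        "--go_evidence", "non-electronic", # GO Evidence: Non-Electronic
    ]


# Hypothetical file names for the proteins predicted in the previous step.
print(" ".join(emapper_command("genes_PMB.faa", "eggnog_PMB")))
```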
10 minutes per sample group
30-35 minutes per sample group
Table that summarizes all PfamScan annotations (Type of motif, HMM information and GO information).
Table that summarizes all annotations that could be transferred with EggNOG Mapper (EggNOG description, GO information and KEGG information).
Report with general information and distribution of different types of motifs.
Report with general information and distribution of different COG categories and Orthologous groups.
More information can be found in this review.