Taxonomic Classification with Kraken 2


The Microbiome Analysis feature in OmicsBox makes it possible to compare the bacterial diversity of two soda lakes (Salina Preta and Salina Verde) with the Taxonomic Classification Workflow, and to identify the functional genetic potential of their microbial communities.

To generate the metagenomic dataset used here, two earlier steps were first run in OmicsBox: FastQC and Preprocessing.

Dataset description

The metagenomic analysis used a dataset of 12 single-end metagenomic samples collected in two soda lakes, Salina Verde and Salina Preta. Three replicates were taken from each lake at each of two times of day, morning and afternoon. The metagenomic libraries were prepared with the Nextera XT DNA Sample Preparation kit and sequenced on the Illumina MiSeq platform with the MiSeq Reagent Kit V3.


Time of sampling (both lakes): Morning (10 AM) and Afternoon (3 PM).




Andreote, A. P., Dini-Andreote, F., Rigonato, J., Machineski, G. S., Souza, B. C., Barbiero, L., ... & Fiore, M. F. (2018). Contrasting the genetic patterns of microbial communities in soda lakes with and without cyanobacterial bloom. Frontiers in Microbiology, 9, 244.


Soda lakes have high levels of sodium carbonates and are characterized by salinity and elevated pH. These ecosystems are found across Africa, Europe, Asia, Australia, North, Central, and South America. Particularly in Brazil, the Pantanal region has a series of hundreds of shallow soda lakes (ca. 600) potentially colonized by a diverse haloalkaliphilic microbial community. Biological information of these systems is still elusive, in particular data on the description of the main taxa involved in the biogeochemical cycling of life-important elements. Here, we used metagenomic sequencing to contrast the composition and functional patterns of the microbial communities of two distinct soda lakes from the sub-region Nhecolândia, state of Mato Grosso do Sul, Brazil. These two lakes differ by permanent cyanobacterial blooms (Salina Verde, green-water lake) and by no record of cyanobacterial blooms (Salina Preta, black-water lake). The dominant bacterial species in the Salina Verde bloom was Anabaenopsis elenkinii. This cyanobacterium altered local abiotic parameters such as pH, turbidity, and dissolved oxygen and consequently the overall structure of the microbial community. In Salina Preta, the microbial community had a more structured taxonomic profile. Therefore, the distribution of metabolic functions in Salina Preta community encompassed a large number of taxa, whereas, in Salina Verde, the functional potential was restrained across a specific set of taxa. Distinct signatures in the abundance of genes associated with the cycling of carbon, nitrogen, and sulfur were found. Interestingly, genes linked to arsenic resistance metabolism were present at higher abundance in Salina Verde and they were associated with the cyanobacterial bloom. Collectively, this study advances fundamental knowledge on the composition and genetic potential of microbial communities inhabiting tropical soda lakes.

Original Data

The data was downloaded from the MG-RAST server: mgp10309

Bioinformatic Analysis


Taxonomic Classification (Kraken 2)


Processed Illumina sequencing data in FASTQ format:

  • f.PAB1.fastq.gz

  • f.PAB2.fastq.gz

  • f.PAB3.fastq.gz

  • f.PMB1.fastq.gz

  • f.PMB2.fastq.gz

  • f.PMB3.fastq.gz

  • f.VAB1.fastq.gz

  • f.VAB2.fastq.gz

  • f.VAB3.fastq.gz

  • f.VMB1.fastq.gz

  • f.VMB2.fastq.gz

  • f.VMB3.fastq.gz
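
The twelve input files fall into the four sample groups named in this use case (PAB, PMB, VAB and VMB), with three replicates each. As a minimal sketch, the grouping can be derived from the file names, assuming they always follow the pattern "f.<GROUP><replicate>.fastq.gz" seen in the list above. The function name `group_samples` is hypothetical and is not part of OmicsBox.

```python
import re
from collections import defaultdict

# Input files as listed in this use case.
FILES = [
    "f.PAB1.fastq.gz", "f.PAB2.fastq.gz", "f.PAB3.fastq.gz",
    "f.PMB1.fastq.gz", "f.PMB2.fastq.gz", "f.PMB3.fastq.gz",
    "f.VAB1.fastq.gz", "f.VAB2.fastq.gz", "f.VAB3.fastq.gz",
    "f.VMB1.fastq.gz", "f.VMB2.fastq.gz", "f.VMB3.fastq.gz",
]

# Assumed naming pattern: "f." + three-letter group + replicate number.
NAME_PATTERN = re.compile(r"^f\.([A-Z]{3})(\d+)\.fastq\.gz$")


def group_samples(filenames):
    """Return {group: [file names sorted by replicate number]}."""
    groups = defaultdict(list)
    for name in filenames:
        match = NAME_PATTERN.match(name)
        if match is None:
            raise ValueError(f"Unexpected file name: {name}")
        group, replicate = match.group(1), int(match.group(2))
        groups[group].append((replicate, name))
    # Sort groups alphabetically and replicates numerically.
    return {
        group: [name for _, name in sorted(items)]
        for group, items in sorted(groups.items())
    }


groups = group_samples(FILES)
for group, names in groups.items():
    print(group, names)
```

A file name that does not match the assumed pattern raises an error instead of being silently skipped, so a mislabeled sample is noticed before classification.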


Enable filter: true

Kraken Confidence Filter: 0.05

Minimum Hit Groups: 2
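
For readers who want to try comparable settings outside OmicsBox, the sketch below assembles a Kraken 2 command line. It assumes that the two filter settings above correspond to the `--confidence` and `--minimum-hit-groups` options of the `kraken2` command-line tool, and that turning the filter off simply omits them; how OmicsBox invokes Kraken 2 internally is not described here. The database path, thread count, output file names and the function `build_kraken2_command` are hypothetical placeholders.

```python
import shlex


def build_kraken2_command(fastq, database, enable_filter=True,
                          confidence=0.05, minimum_hit_groups=2, threads=4):
    """Build (but do not run) a kraken2 command for one gzipped FASTQ file."""
    suffix = ".fastq.gz"
    sample = fastq[:-len(suffix)] if fastq.endswith(suffix) else fastq

    cmd = ["kraken2", "--db", database, "--threads", str(threads)]
    if enable_filter:
        # Assumed mapping of the filter settings listed above.
        cmd += ["--confidence", str(confidence),
                "--minimum-hit-groups", str(minimum_hit_groups)]
    cmd += [
        "--gzip-compressed",
        "--report", f"{sample}.kreport",   # per-taxon summary report
        "--output", f"{sample}.kraken",    # per-read classifications
        fastq,
    ]
    return cmd


# "/path/to/kraken2_db" is a placeholder, not a path from this use case.
command = build_kraken2_command("f.PAB1.fastq.gz", "/path/to/kraken2_db")
print(shlex.join(command))
```

The command is returned as a list so that it can be passed directly to `subprocess.run` without shell quoting issues.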

Execution Time

12 minutes per sample group (PAB, PMB, VAB and VMB)


  • A table listing the taxa that were found and, for each FASTQ file, the number of reads assigned to each taxonomic group.

  • From this table, a PCoA and a Stacked Bar Chart can also be generated to show how well the sample groups separate and which taxonomic groups make up each sample.
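
To illustrate what these two charts are built from, the sketch below converts read counts into relative abundances (the values a stacked bar chart displays) and computes pairwise Bray-Curtis dissimilarities, one common input for a PCoA. The counts, sample names and taxa are made-up placeholders, and Bray-Curtis is an assumption: the distance measure OmicsBox uses is not stated here.

```python
from itertools import combinations

# Hypothetical read counts per taxon; real values come from the result table.
counts = {
    "sample_A": {"Cyanobacteria": 80, "Proteobacteria": 15, "Bacteroidetes": 5},
    "sample_B": {"Cyanobacteria": 10, "Proteobacteria": 60, "Bacteroidetes": 30},
    "sample_C": {"Cyanobacteria": 70, "Proteobacteria": 20, "Bacteroidetes": 10},
}


def relative_abundance(sample_counts):
    """Scale the counts of one sample so that they sum to 1."""
    total = sum(sample_counts.values())
    if total == 0:
        raise ValueError("Sample has no classified reads")
    return {taxon: n / total for taxon, n in sample_counts.items()}


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance profiles (0 = identical)."""
    taxa = set(a) | set(b)
    numerator = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    denominator = sum(a.get(t, 0) + b.get(t, 0) for t in taxa)
    return numerator / denominator if denominator else 0.0


# Stacked bar chart data: one abundance profile per sample.
profiles = {name: relative_abundance(c) for name, c in counts.items()}

# PCoA input: dissimilarity for every pair of samples.
distances = {
    (s1, s2): bray_curtis(profiles[s1], profiles[s2])
    for s1, s2 in combinations(sorted(profiles), 2)
}

for pair, value in distances.items():
    print(pair, round(value, 3))
```

Samples with similar composition get a dissimilarity close to 0 and would plot close together in a PCoA, while samples dominated by different taxa get values closer to 1.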


More information can be found in this review.