Variant Calling using Freebayes


Freebayes is a variant calling tool notable for its support for polyploid genomes.

This algorithm is a Bayesian genetic variant detector designed to find small polymorphisms, specifically SNPs (single-nucleotide polymorphisms), indels (insertions and deletions), MNPs (multi-nucleotide polymorphisms), and complex events (composite insertion and substitution events) smaller than the length of a short-read sequencing alignment.

Run Freebayes for Variant Calling

Freebayes can be found under Genetic Variation → Variant Calling → Freebayes. The wizard consists of three pages and lets you define the input and output options as well as the analysis parameters (Figure 2, Figure 3 and Figure 4).


  • BAM files: alignment files in BAM format. To obtain them, you must align FASTQ files using a DNA-Seq Alignment Strategy, like BWA (highly recommended) or Bowtie 2.

  • Reference Genome: FASTA file with the reference genome.

  • Group Experiment File (optional): tab-delimited file with sample names in one column and population names in another. If this file is provided, the population-based Bayesian inference model is partitioned according to these populations.
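
As an illustration of the Group Experiment File layout, the following minimal sketch formats sample-to-population pairs as tab-delimited lines. The sample and population names are hypothetical, and the column order (sample first, population second) is an assumption based on the description above.

```python
# Hypothetical sample-to-population assignments (illustrative names only).
groups = [
    ("sample_A", "pop1"),
    ("sample_B", "pop1"),
    ("sample_C", "pop2"),
]


def format_group_file(rows):
    """Return one 'sample<TAB>population' line per (sample, population) pair."""
    return "".join(f"{sample}\t{population}\n" for sample, population in rows)


# Print the file content; redirect or write it to a file to use it as input.
print(format_group_file(groups), end="")
```

Saving the printed text to a plain-text file gives a file with one sample per line that can be selected as the Group Experiment File.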


On this page, the parameters for Freebayes can be set.

  • Ploidy: sets the species ploidy for the analysis.

  • Minimum Mapping Quality: exclude alignments from the analysis if their mapping quality is below this value.

  • Minimum Base Quality: exclude alleles from the analysis if their supporting base quality is below this value.

  • Minimum Allele Quality Sum: exclude alleles from the analysis if the sum of the base qualities of the supporting observations is below this value.

  • Minimum Allele Mapping Quality Sum: exclude alleles from the analysis if the sum of the mapping qualities of the alignments behind the supporting observations is below this value.

  • Mismatch Base Quality: minimum base quality a mismatching base must have to be counted as a mismatch.

  • Minimum Alternate Fraction: require at least this fraction of observations supporting an alternate allele within a single individual in order to evaluate the position.

  • Minimum Alternate Count: require at least this number of observations supporting an alternate allele within a single individual in order to evaluate the position.

  • Minimum Coverage: minimum coverage required to process a site.

  • Maximum Coverage: downsample per-sample coverage to this level whenever it is exceeded.

  • Use Mapping Quality: use mapping quality of alleles when calculating data likelihoods.

  • P-value: report sites only if the probability that there is a polymorphism at the site is greater than this value. Note that post-filtering is generally recommended over the use of this parameter.
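
For readers who want to relate these settings to the freebayes command line, the sketch below assembles an argument list from wizard-style values. It is a sketch under assumptions, not a description of how the wizard itself invokes Freebayes: the function name `build_freebayes_command`, the file names, and the default values are hypothetical placeholders, and the long option names are assumed to match the freebayes command-line help, so check them against `freebayes --help` for your version.

```python
import shlex


def build_freebayes_command(reference, bams, vcf_out, *, ploidy=2,
                            min_mapping_quality=1, min_base_quality=0,
                            min_alternate_fraction=0.05, min_alternate_count=2,
                            min_coverage=0, pvar=0.0, populations=None):
    """Assemble a freebayes argument list from wizard-style settings.

    Default values here are placeholders for illustration only.
    """
    cmd = [
        "freebayes",
        "--fasta-reference", reference,                            # Reference Genome
        "--ploidy", str(ploidy),                                   # Ploidy
        "--min-mapping-quality", str(min_mapping_quality),         # Minimum Mapping Quality
        "--min-base-quality", str(min_base_quality),               # Minimum Base Quality
        "--min-alternate-fraction", str(min_alternate_fraction),   # Minimum Alternate Fraction
        "--min-alternate-count", str(min_alternate_count),         # Minimum Alternate Count
        "--min-coverage", str(min_coverage),                       # Minimum Coverage
        "--pvar", str(pvar),                                       # P-value
        "--vcf", vcf_out,                                          # output VCF
    ]
    if populations is not None:
        cmd += ["--populations", populations]                      # Group Experiment File
    return cmd + list(bams)                                        # BAM files go last


# Hypothetical file names; the command is only printed, not executed.
command = build_freebayes_command("reference.fasta", ["sample_A.bam"],
                                  "variants.vcf", ploidy=4)
print(shlex.join(command))
```

Building the command as a list rather than a single string keeps file names with spaces intact if the list is later passed to `subprocess.run`.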


  • Set Name for VCF: VCF filename.

  • Directory to Save the VCF: directory to save the VCF file.


Variant Calling has the following outputs:

  • VCF file with all the variants found.

  • Report with summary details:

    • Information about the resulting VCF: information about the types of variant found and the number of alleles per variant.

    • Adjusted parameters: as you might want to repeat the variant calling with other parameters, it is important to keep this table for reproducibility.

If you repeat the Variant Calling Analysis with BCFtools, keep in mind that Freebayes can separate MNPs from SNPs, whereas BCFtools cannot and records MNPs as separate SNPs. In practice, this difference is of little importance.

  • Distribution charts of different quality variables found in the VCF. These charts can help you decide how to filter the VCF afterwards:

  1. Depth Histogram: in Freebayes, Depth (the DP field) is the total read depth at the locus, that is, the number of reads covering the site. It counts all reads at the site, whether they support the variant, the reference nucleotide, or other variants.

  2. Proportion 'Quality/Depth' Histogram: the QUAL column of a VCF file holds the Phred-scaled probability that the site has no variant. Because this value tends to grow with coverage, it is better to rely on the quality normalized by depth.

  3. Mapping Quality in Alternate Alleles Histogram: the MQM value in the INFO field of a Freebayes VCF file is the mean mapping quality of the observed alternate alleles, that is, of the reads supporting the variant.
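
To make the quality-by-depth idea concrete, the following minimal sketch computes QUAL/DP for a single VCF record and applies a simple filter. It assumes the record has a numeric QUAL and a DP entry in its INFO column. The example record, the function names, and the thresholds are hypothetical and are not recommended cutoffs; suitable values should be read off the histograms in the report.

```python
def parse_info(info_field):
    """Turn 'DP=40;MQM=60' into {'DP': '40', 'MQM': '60'}; bare flags map to True."""
    info = {}
    for entry in info_field.split(";"):
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True
    return info


def quality_by_depth(vcf_line):
    """Return QUAL divided by the total read depth (DP) of one VCF record."""
    fields = vcf_line.rstrip("\n").split("\t")
    qual = float(fields[5])                    # column 6 is QUAL
    depth = int(parse_info(fields[7])["DP"])   # column 8 is INFO
    return qual / depth


def keep_variant(vcf_line, min_qd=10.0, min_depth=10):
    """Keep a record only if depth and QUAL/DP reach the (arbitrary) thresholds."""
    fields = vcf_line.rstrip("\n").split("\t")
    depth = int(parse_info(fields[7])["DP"])
    return depth >= min_depth and quality_by_depth(vcf_line) >= min_qd


# Hypothetical record: QUAL = 600, DP = 40, so QUAL/DP = 15.0
record = "chr1\t100\t.\tA\tG\t600.0\t.\tDP=40;MQM=60\tGT\t0/1"
print(quality_by_depth(record), keep_variant(record))
```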