Variant miscalling demonstration
This recipe is built to demonstrate an interesting case where a two-base deletion in a certain region of the Ebola genome can lead to variant callers systematically miscalling that region.
This recipe requires the bio package. Install it with: pip install bio --upgrade
The recipe proceeds as follows:
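1. Obtains the Ebola genome AF086833 and stores it as REFERENCE.fa
2. Creates a mutated copy of the reference and stores it as GENOME.fa
3. Generates simulated sequencing data from GENOME.fa and aligns it against REFERENCE.fa
4. Aligns GENOME.fa against REFERENCE.fa as a "whole genome alignment"
5. Calls variants with bcftools and freebayes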
The region around these variations is prone to so-called "misalignment", which often produces errors that substantially affect SNP calling accuracy. The mutations present in the data are the following:
AGGGTGGACAACAGAAGAACA
|||||||||||||||-|||--
AGGGTGGACAACAGA-GAA--
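One way to see why this region is hard to call is to read the deletions off the alignment above and then check how many other placements of each deletion yield exactly the same sequence. The sketch below is only an illustration, not part of the recipe: the function names find_deletions and equivalent_starts are hypothetical, and all offsets are 0-based positions within the 21-base snippet shown above, not genome coordinates.

```python
# Reference and query rows copied from the alignment shown above.
REF = "AGGGTGGACAACAGAAGAACA"
QUERY = "AGGGTGGACAACAGA-GAA--"


def find_deletions(ref_row, query_row):
    """Return (start, deleted_bases) for every run of gaps in the query row.

    start is a 0-based offset into the ungapped reference row.
    """
    if len(ref_row) != len(query_row):
        raise ValueError("alignment rows must have the same length")
    deletions = []
    ref_pos = 0
    run_start, run = None, []
    for r, q in zip(ref_row, query_row):
        if q == "-" and r != "-":
            # Reference base with no counterpart in the query: part of a deletion.
            if run_start is None:
                run_start = ref_pos
            run.append(r)
        elif run_start is not None:
            # The gap run just ended; record it.
            deletions.append((run_start, "".join(run)))
            run_start, run = None, []
        if r != "-":
            ref_pos += 1
    if run_start is not None:
        deletions.append((run_start, "".join(run)))
    return deletions


def equivalent_starts(ref, start, length):
    """List every offset where deleting `length` bases gives the same
    sequence as deleting `length` bases at `start`."""
    target = ref[:start] + ref[start + length:]
    return [
        i
        for i in range(len(ref) - length + 1)
        if ref[:i] + ref[i + length:] == target
    ]


deletions = find_deletions(REF, QUERY)
print(deletions)  # [(15, 'A'), (19, 'CA')]
for start, bases in deletions:
    print(bases, equivalent_starts(REF, start, len(bases)))
# A [14, 15]
# CA [18, 19]
```

Within this snippet, each deletion can be placed at two different offsets without changing the resulting sequence. An aligner or variant caller has to pick one placement, and different tools (or different reads) may pick differently, which is one plausible source of the inconsistent calls described here.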
Here is an example: bcftools generates three different variants at this location, freebayes just one. (The calls could be filtered further by quality!)
It does not always happen, but either of the two SNP callers may report additional variants at position 1998 (you may need to run the recipe multiple times to see this).