Variant miscalling demonstration
This recipe demonstrates a case where a two-base deletion in a particular region of the Ebola genome can lead variant callers to systematically miscall that region.
This recipe requires the bio package. Install it with:
pip install bio --upgrade
The recipe proceeds as follows:
GENOME.fa and aligns it against
REFERENCE.fa saves that as
REFERENCE.fa as two genomes and saves that as
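The mutated genome used by the recipe carries a two-base deletion relative to the reference. The sketch below shows that step on a toy sequence, assuming GENOME.fa is simply the reference with two bases removed. The helper names delete_bases and to_fasta, the toy sequence and the deletion position are all hypothetical; the recipe's actual deletion site is not stated here.

```python
def delete_bases(seq: str, pos: int, length: int = 2) -> str:
    """Return seq with `length` bases removed, starting at 1-based position `pos`."""
    if pos < 1 or pos + length - 1 > len(seq):
        raise ValueError("deletion falls outside the sequence")
    start = pos - 1  # convert the 1-based position to a 0-based index
    return seq[:start] + seq[start + length:]


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format a single sequence as FASTA text, wrapping sequence lines at `width`."""
    lines = [f">{name}"]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical toy reference and deletion position, for illustration only.
    reference = "ACGTACGTTTGACCA"
    genome = delete_bases(reference, pos=5, length=2)
    print(to_fasta("GENOME", genome))
```

Reads simulated from such a genome and aligned back to the unmodified reference have to "explain" the two missing bases, which is where the trouble described next begins.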
The region around the deletion is prone to so-called "misalignment", which often produces errors that substantially affect SNP calling accuracy.
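One common way misalignment arises can be illustrated on a toy example, assuming (hypothetically) that the deletion removes one unit of a short AT repeat. The recipe does not say which sequence surrounds its deletion. Inside a repeat, the same two-base deletion can be placed at several equally valid positions, and a read that ends close to the deletion may instead be aligned without a gap, turning one deletion into apparent SNPs. The helper names below are illustrative, not part of the bio package.

```python
def apply_deletion(ref: str, pos: int, length: int) -> str:
    """Remove `length` bases from ref starting at 1-based position `pos`."""
    return ref[:pos - 1] + ref[pos - 1 + length:]


def equivalent_deletion_positions(ref: str, pos: int, length: int = 2) -> list[int]:
    """All 1-based start positions whose deletion gives the same sequence as deleting at `pos`."""
    target = apply_deletion(ref, pos, length)
    return [p for p in range(1, len(ref) - length + 2)
            if apply_deletion(ref, p, length) == target]


def ungapped_mismatches(ref: str, read: str, start: int) -> list[tuple[int, str, str]]:
    """Place `read` on `ref` at 1-based `start` with no gaps; list (pos, ref_base, read_base) mismatches."""
    if start < 1 or start - 1 + len(read) > len(ref):
        raise ValueError("read does not fit on the reference at this start")
    out = []
    for i, base in enumerate(read):
        ref_base = ref[start - 1 + i]
        if base != ref_base:
            out.append((start + i, ref_base, base))
    return out


if __name__ == "__main__":
    ref = "GGATATATCC"                      # hypothetical reference with an AT repeat
    genome = apply_deletion(ref, 3, 2)      # "GGATATCC": one AT unit removed
    print(equivalent_deletion_positions(ref, 3, 2))  # [3, 4, 5, 6, 7]
    print(ungapped_mismatches(ref, genome, 1))       # [(7, 'A', 'C'), (8, 'T', 'C')]
```

The first result shows five equivalent placements for a single deletion; tools such as bcftools norm left-align indels to pick one canonical position. The second shows how the same event, aligned without a gap, looks like two substitutions instead of one deletion.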
In an example run, bcftools reports three different variants at this location, while freebayes reports just one (the calls could be filtered further by quality!).
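Filtering the calls by quality, as suggested above, can be sketched on plain VCF text. This is a minimal illustration using the standard QUAL column (the sixth field); the records and the threshold are made up and are not output from this recipe. bcftools can do the same directly with an include expression such as bcftools view -i 'QUAL>=30'.

```python
def filter_by_qual(vcf_lines: list[str], min_qual: float) -> list[str]:
    """Keep header lines and records whose QUAL is at least `min_qual`.

    Records with a missing QUAL ('.') are dropped.
    """
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            kept.append(line)  # keep meta-information and header lines unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        qual = fields[5]  # QUAL is the sixth VCF column
        if qual != "." and float(qual) >= min_qual:
            kept.append(line)
    return kept


if __name__ == "__main__":
    # Made-up records for illustration only.
    records = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "demo\t100\t.\tA\tC\t12.5\t.\t.",
        "demo\t101\t.\tT\tC\t48.0\t.\t.",
        "demo\t102\t.\tAT\tA\t.\t.\t.",
    ]
    for line in filter_by_qual(records, 30):
        print(line)
```

A quality filter removes weak calls, but it cannot tell a genuine SNP from a well-supported misalignment artifact, so the misplaced calls may survive it.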
This does not always happen, but either of the two SNP callers may also report additional variants at position 1998 (you may need to run the recipe several times to see this).