A curious case of variant calling errors

Variant miscalling demonstration

Updated 10 months ago by Istvan Albert

This recipe is built to demonstrate an interesting case where a two-base deletion in a certain region of the Ebola genome can lead to variant callers systematically miscalling that region.

This recipe requires the bio package. Install it with: pip install bio --upgrade

The recipe proceeds as follows:

  1. downloads the Ebola genome with accession number AF086833
  2. extracts its first 3000 base pairs and calls that REFERENCE.fa
  3. deletes two base pairs from the reference between positions 1998 and 2000 and calls that GENOME.fa
  4. generates simulated data from GENOME.fa, aligns it against REFERENCE.fa, and saves that as align.bam
  5. aligns GENOME.fa against REFERENCE.fa as two whole genomes and saves that as genome.bam
  6. performs SNP calling with bcftools and freebayes
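
Steps 2 and 3 can be sketched in plain Python. This is a minimal illustration, not the recipe's own code: the helper names (first_bases, delete_bases, to_fasta) are hypothetical, a random sequence stands in for the downloaded AF086833 genome, and "between positions 1998 and 2000" is assumed to mean removing the two bases at 1-based positions 1998 and 1999.

```python
import random


def first_bases(seq: str, size: int = 3000) -> str:
    """Keep only the first `size` bases (step 2)."""
    return seq[:size]


def delete_bases(seq: str, start: int, length: int = 2) -> str:
    """Remove `length` bases beginning at 1-based position `start` (step 3)."""
    i = start - 1  # convert to a 0-based index
    return seq[:i] + seq[i + length:]


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format a sequence as FASTA text with fixed-width lines."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return f">{name}\n" + "\n".join(lines) + "\n"


# Stand-in for the real genome: a random 5000-base sequence (assumption).
random.seed(0)
full_genome = "".join(random.choice("ACGT") for _ in range(5000))

reference = first_bases(full_genome)                     # REFERENCE.fa content
genome = delete_bases(reference, start=1998, length=2)   # GENOME.fa content

print(len(reference), len(genome))  # 3000 2998
print(to_fasta("REFERENCE", reference)[:80])
```

The two sequences are identical up to position 1997 and then shifted by two bases, which is exactly the difference the variant callers are later asked to recover.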

The region around the deletion is prone to so-called "misalignment", which often produces errors that substantially affect SNP calling accuracy.
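
One general way a deletion becomes hard to place is when it falls inside a short repeat: removing the bases at several different offsets yields the same sequence, so an aligner has no unique position for the gap. The sketch below shows this on a made-up toy sequence. It is an assumption that a similar effect contributes here; the recipe description does not say what the sequence around position 1998 looks like, and delete_bases is a hypothetical helper.

```python
def delete_bases(seq: str, start: int, length: int = 2) -> str:
    """Remove `length` bases beginning at 1-based position `start`."""
    i = start - 1
    return seq[:i] + seq[i + length:]


# Toy reference (not Ebola): a CA repeat flanked by unique sequence.
toy_reference = "GATTA" + "CA" * 4 + "GGT"

# Delete two bases at every start position that lies within the repeat.
products = {start: delete_bases(toy_reference, start) for start in range(5, 13)}

# All eight deletions give one and the same sequence.
distinct = set(products.values())
print(len(products), "placements ->", len(distinct), "distinct sequence(s)")

# A deletion that reaches past the repeat gives a different sequence.
print(delete_bases(toy_reference, 13) in distinct)
```

When several gap placements are equally good, different reads (and different tools) may settle on different ones, and a caller can then report several variants where only one event occurred.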

For example, bcftools reports three different variants at this location, while freebayes reports just one. (The calls could be filtered further by quality.)

SNP calling accuracy

This does not always happen, but either of the two SNP callers may report additional variants at position 1998 (you may need to run the recipe multiple times).
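
To check what each caller reported around position 1998, and to apply a quality filter, the VCF output can be scanned with a few lines of Python. This is a rough sketch: variants_near is a hypothetical helper, and the records in sample_vcf are invented placeholders for illustration, not output from this recipe.

```python
def variants_near(vcf_text: str, center: int = 1998, window: int = 5,
                  min_qual: float = 0.0):
    """Return (pos, ref, alt, qual) for VCF records within `window` of `center`."""
    hits = []
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        pos, ref, alt, qual = int(fields[1]), fields[3], fields[4], fields[5]
        score = 0.0 if qual == "." else float(qual)  # "." means missing QUAL
        if abs(pos - center) <= window and score >= min_qual:
            hits.append((pos, ref, alt, score))
    return hits


# Placeholder records, invented for illustration only.
sample_vcf = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "demo\t1500\t.\tA\tG\t50\t.\t.",
    "demo\t1997\t.\tCAT\tC\t80\t.\t.",
    "demo\t2001\t.\tT\tC\t12\t.\t.",
])

all_hits = variants_near(sample_vcf)
good_hits = variants_near(sample_vcf, min_qual=30)
print(all_hits)
print(good_hits)
```

Running the same function on the text of each caller's VCF file makes it easy to compare how many records each one places near the deletion, with and without the quality threshold.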
