A curious case of variant calling errors

Variant miscalling demonstration

Updated 2.5 years ago by Istvan Albert

This recipe is built to demonstrate an interesting case where a two-base deletion in a certain region of the Ebola genome can lead to variant callers systematically miscalling that region.

This recipe requires the bio package. Install it with: pip install bio --upgrade

The recipe proceeds as follows:

  1. downloads the Ebola genome corresponding to accession number AF086833
  2. extracts its first 3000 base pairs and calls the result REFERENCE.fa
  3. applies mutations and calls the resulting file GENOME.fa
  4. generates simulated data from GENOME.fa and aligns it against REFERENCE.fa
  5. also aligns GENOME.fa against REFERENCE.fa as a "whole genome alignment"
  6. performs SNP calling with bcftools and freebayes
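Steps 2 and 3 above can be sketched in plain Python. This is only an illustration, not the recipe's actual code: the helper names (take_prefix, delete_bases, to_fasta), the toy sequence, and the deletion position 100 are all hypothetical stand-ins. The real recipe works on the downloaded AF086833 sequence and applies its own mutations.

```python
def take_prefix(seq: str, length: int = 3000) -> str:
    """Return the first `length` bases of a sequence (step 2)."""
    return seq[:length]


def delete_bases(seq: str, pos: int, size: int = 2) -> str:
    """Delete `size` bases starting at 1-based position `pos` (step 3)."""
    start = pos - 1
    if start < 0 or start + size > len(seq):
        raise ValueError("deletion falls outside the sequence")
    return seq[:start] + seq[start + size:]


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format a sequence as a FASTA record with fixed-width lines."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return f">{name}\n" + "\n".join(lines) + "\n"


# Toy stand-in for the downloaded genome (assumption: NOT real Ebola sequence).
full_genome = "ACGT" * 1000

# Step 2: keep the first 3000 bases as the reference.
reference = take_prefix(full_genome, 3000)

# Step 3: apply a two-base deletion at a hypothetical position.
genome = delete_bases(reference, pos=100, size=2)

reference_fasta = to_fasta("REFERENCE", reference)
genome_fasta = to_fasta("GENOME", genome)

print(len(reference), len(genome))
```

The mutated sequence is two bases shorter than the reference, and everything downstream of the deletion is shifted by two positions. That shift is what the aligner has to reconstruct as a gap.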

The region around the variations is prone to so-called "misalignment", which often produces errors that substantially affect SNP calling accuracy. The mutations that are present in the data are the following:


Here is an example: bcftools generates three different variants at the location, while freebayes generates just one (the calls could be filtered further by quality).

This doesn't always happen, but either of the two SNP callers may report additional variants at position 1998 (you may need to run the recipe multiple times).
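One common reason a small deletion gets misaligned is that, in a repetitive stretch, several different gap placements produce exactly the same sequence. The sketch below shows that effect on a toy sequence. Whether this is the mechanism at work in the Ebola region of this recipe is an assumption, and the sequence and function names are hypothetical.

```python
def delete_at(seq: str, start: int, size: int) -> str:
    """Delete `size` bases starting at 0-based index `start`."""
    return seq[:start] + seq[start + size:]


def equivalent_deletion_starts(ref: str, observed: str, size: int) -> list:
    """List every 0-based start where deleting `size` bases from `ref`
    reproduces `observed` exactly."""
    return [
        i
        for i in range(len(ref) - size + 1)
        if delete_at(ref, i, size) == observed
    ]


# Toy reference with a short AT repeat (assumption: illustrative only).
ref = "GGCATATATATCC"

# The "true" event: a two-base deletion at index 5.
observed = delete_at(ref, 5, 2)

# Every placement of the gap that explains the observed sequence equally well.
starts = equivalent_deletion_starts(ref, observed, 2)
print(starts)
```

Because every one of these placements is consistent with the data, different reads (or different tools) may place the gap differently, and the disagreements can surface as spurious mismatches next to the true deletion.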
