Recipe View

This recipe follows the variant calling process in the Biostar Handbook.

1 result • updated 5.6 years ago by Istvan Albert

With default parameters, the recipe downloads the reference genome AF086833 (the Ebola Mayinga strain of 1976) and aligns to it the sequencing data from the 2014 outbreak, deposited as SRA run SRR1972739.

The recipe produces an alignment file and a variant call file for the sequencing run. The results can be viewed relative to the gene feature (GFF) file that the recipe also generates.

Lectures

* coming soon

# Stop on errors. Show all commands.
set -uxeo pipefail

# Set the reference genome accession number.
ACC=AF086833

# Set the SRR run number.
SRR=SRR1972739

# Directory for various reference information.
mkdir -p refs

# The reference genome in GenBank format.
GB=refs/$ACC.gb

# The reference genome in FASTA format.
REF=refs/$ACC.fa

# The reference genome annotation in GFF (General Feature Format).
GFF=refs/$ACC.gff

# Get the reference sequence in GenBank format.
efetch -db=nuccore -format=gb -id=$ACC > $GB

# Reformat GenBank to FASTA.
cat $GB | readseq -p -f fasta > $REF 2>> log.txt

# Reformat GenBank to GFF. Keep only lines that match CDS.
cat $GB | readseq -p -f gff 2>> log.txt | grep CDS > $GFF 

# Index reference for the bwa aligner.
bwa index $REF 2>> log.txt

# Index the reference genome so that it can be loaded into IGV.
samtools faidx $REF

# The directory that stores the sequencing reads.
mkdir -p reads

# Download the first 10,000 spots of the SRR run, split into one file per read of each pair.
fastq-dump -X 10000 --split-files -O reads $SRR

# File names that store the read pairs.
R1=reads/${SRR}_1.fastq
R2=reads/${SRR}_2.fastq

# The alignment file name.
BAM=$SRR.bam

# The variant file name.
VCF=$SRR.vcf

# Align and generate a sorted BAM file.
bwa mem $REF $R1 $R2 2>> log.txt | samtools sort > $BAM

# Index the BAM file.
samtools index $BAM

# Call variants (SNPs and indels) with bcftools.
bcftools mpileup -Ou -f $REF $BAM 2>> log.txt | bcftools call -mv -Ov -o $VCF 2>> log.txt
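
Once the script finishes, a quick sanity check is to count the variant records in the resulting VCF file. The sketch below is not part of the original recipe: the function name `count_variants` is hypothetical, and it relies only on the VCF convention that every header line starts with `#`, so it assumes an uncompressed VCF like the one written with `-Ov` above.

```shell
# Hypothetical helper, not part of the original recipe.
# Prints the number of variant records in an uncompressed VCF file:
# every line that does not start with '#' is counted as one record.
count_variants() {
    # -v inverts the match, -c prints the count of selected lines.
    # grep exits with status 1 when the count is zero, so '|| true'
    # keeps the helper from aborting a script that runs under 'set -e'.
    grep -vc '^#' "$1" || true
}
```

Appending `count_variants $VCF` to the end of the script would print the number of calls made for the run.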

[acc]
label = "Accession number for the reference genome"
display = "TEXTBOX"
value = "AF086833"
help = "Must be an NCBI accession number"

[srr]
label = "SRA run number"
display = "TEXTBOX"
value = "SRR1972739"
help = "Must be an SRR run id number"

[settings]
name = "Simple Variant Calling"
template = "Simple_Variant_Calling_cookbook_4.sh"
image = "Simple_Variant_Calling_cookbook_4.png"
id = 4
recipe_uid = "2abba317"
uid = "2abba317"
help = "This recipe follows the variant calling process in the Biostar Handbook.\n\nWith default parameters, the recipe downloads the reference genome `AF086833` (the Ebola Mayinga strain of 1976) and aligns to it the sequencing data from the 2014 outbreak, deposited as SRA run `SRR1972739`.\n\nThe recipe produces an alignment file and a variant call file for the sequencing run. The results can be viewed relative to the gene feature (GFF) file that the recipe also generates.\n\n#### Lectures\n\n*** coming soon **"
url = "http://localhost:8000"
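
The help string of the `[srr]` field above says the value must be an SRR run id. A script could check that before it downloads anything. The sketch below is an assumption rather than part of the original recipe: the function name `is_srr` is hypothetical, and the rule it enforces (the prefix `SRR` followed only by digits) is a simple reading of that help text.

```shell
# Hypothetical helper, not part of the original recipe.
# Succeeds only if the argument looks like an SRA run id:
# the prefix SRR followed by one or more digits and nothing else.
is_srr() {
    case ${1-} in
        # Reject a bare prefix, or any non-digit after the prefix.
        SRR | SRR*[!0-9]*) return 1 ;;
        # Accept the prefix followed by digits only.
        SRR[0-9]*) return 0 ;;
        # Reject everything else, including an empty value.
        *) return 1 ;;
    esac
}
```

A guard such as `is_srr "$SRR" || exit 1` placed before the `fastq-dump` step would stop the run early on a malformed id.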
