Recipe View

The recipe demonstrates the use of different short read aligners.

2 results • updated 4.0 years ago by Bioinformatics Admin

The recipe demonstrates the use of different short read aligners.

There substantially many more differences between aligners beyond their "performance" or so-called "accuracy". Equally important is how we can customize the alignments and what additional information is produced in each of the alignments.

The recipe generates SAM alignment files with different algorithms bwa, bowtie2 in both single end and paired-end mode.

Downloads the 1976 Ebola - Mayinga Reference Genome
Downloads paired-end sequencing data for the 2014 outbreak
Generates single-end sequence alignments with bwa and bowtie2
Generates paired-end sequence alignments with bwa and bowtie2
Runs samtools flagstat on each of the SAM files.

Readers are invited to investiage the SAM files to identify the differences between aligner outputs.

Lecture

A detailed presentation that explains the steps and rationale for this recipe can be found at:

Lecture for Recipe 4: Short Read Aligners

Please refer to the lecture above for study materials and additional content.

Copy recipe

You need write access to the project to edit.

#
# In this script some of the tools are overly verbose and print lots of useless information
# as they run. This can be quite annoying. In those cases we redirect and append the stdout and 
# stderr into the log.txt file  using:  1>>log.txt 2>>log.txt
# We keep the output in case we need to troubleshoot.
#

# Stop on errors.
set -uex

# UNIX trick: Adding the -x flag will also show the commands as these are executed.
# This can help you track down the errors as these happen.

# There will be many ancillary files.
# Better keep them out of sight.
mkdir -p db

#
# When run from command line use the second lines to set the parameters.
#
# Reference genome accession number.
ACC=AF086833

# The SRR number for the sequencing data.
SRR=SRR1972739

# This will be the reference genome we align against.
REF=db/${ACC}.fa

# It is best to build all variables at the top ot the program.
# That way you see how these are constructed.
READ1=${SRR}_1.fastq
READ2=${SRR}_2.fastq

# Get the reference genome. AF086833  - Mayinga Ebola strain from 1976
efetch -db=nuccore -format=fasta -id=${ACC} > $REF

# Obtain the sequencing data from SRA.
fastq-dump -X 10000 --split-files $SRR 1>> log.txt 2>> log.txt

# Build a bowtie2 index. This only needs to be done once per reference file.
bowtie2-build $REF $REF 1>> log.txt 2>> log.txt

# Build a bwa index. This only needs to be done once per reference file.
bwa index $REF  1>> log.txt 2>> log.txt

# Run the bowtie2 aligner in single end mode.
bowtie2 -x $REF -U $READ1 > bowtie2.SE.sam 2>> log.txt
samtools index bowtie2.SE.sam

# Run the bwa aligner in single end mode.
bwa mem $REF $READ1 > bwa.SE.sam 2>> log.txt
samtools index bwa.SE.sam

# The flagstat tool produces a report on the alignment file..
echo "*** BOWTIE2 SE flagstat:"
samtools flagstat bowtie2.SE.sam

echo "*** BWA SE flagstat:"
samtools flagstat bwa.SE.sam

# Run the bowtie2 aligner in paired end mode.
bowtie2 -x $REF -1 $READ1 -2 $READ2 > bowtie2.PE.sam 2>> log.txt
samtools index bowtie2.PE.sam

# Run the bowtie2 aligner in paired end mode.
bwa mem $REF $READ1 $READ2 > bwa.PE.sam 2>> log.txt
samtools index bwa.PE.sam

echo "*** BOWTIE2 PE flagstat:"
samtools flagstat bowtie2.PE.sam

echo "*** BWA PE flagstat:"
samtools flagstat bwa.PE.sam

You need write access to the original recipe to edit.

Click the buttons on the right to create new fields.

Add text field Add float field Add data field Add checkbox Add dropdown Add upload field Add integer field Add radio button

Name

Recipe display name

Identifier

Unique identifier for the recipe.

Image :

Optional image for the recipe ( 500px Maximum ).

Rank:

Used to order recipes (optional).

The recipe demonstrates the use of different short read aligners.

There substantially many more differences between aligners beyond their "performance" or so-called "accuracy".  Equally important is how we can customize the alignments and what additional information is produced in each of the alignments.

The recipe generates SAM alignment files with different algorithms `bwa`, `bowtie2` in both single end and paired-end mode.

1. Downloads the 1976 Ebola - Mayinga Reference Genome
1. Downloads paired-end sequencing data for the 2014 outbreak
1. Generates single-end sequence alignments with `bwa` and `bowtie2`
1. Generates paired-end sequence alignments with `bwa` and `bowtie2`
1. Runs `samtools flagstat` on each of the SAM files.

Readers are invited to investiage the SAM files to identify the differences between aligner outputs.

#### Lecture

A detailed presentation that explains the steps and rationale for this recipe can be found at:

* [Lecture for Recipe 4: Short Read Aligners][short]

Please refer to the lecture above for study materials and additional content.

[short]:  https://www.biostarhandbook.com/edu/lecture/view/43/

Insert Image

From the web

From your computer

Cancel

Back

Short Read Alignment

Lecture

Insert Image