Bioinformatics Data Analysis

Recipe Description

This recipe demonstrates how to use the bowtie2 short-read aligner to align paired-end reads against a reference genome and produce a sorted, indexed BAM file.
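
The script below calls efetch, bowtie2-build, bowtie2, samtools and fastq-dump. As a minimal sketch (the helper name missing_tools is hypothetical and not part of the recipe), a pre-flight check along these lines could report any tool that is not on the PATH before the run starts:

```bash
# Hypothetical helper: print the name of each listed tool that is not on the PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" > /dev/null 2>&1 || echo "$tool"
    done
}

# The tools this recipe calls.
missing=$(missing_tools efetch bowtie2-build bowtie2 samtools fastq-dump)
if [ -n "$missing" ]; then
    # Left unquoted on purpose so the names are joined onto one line.
    echo "Missing tools:" $missing >&2
fi
```

The check only warns; whether to abort at that point is left to the caller.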

# Stop on errors and on unset variables; print each command as it runs.
set -uex

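# Reference genome accession number. Must be set in the environment.
ACC=${ACC:?set ACC to the reference genome accession number}

# The SRR number for the sequencing data. Must be set in the environment.
SRR=${SRR:?set SRR to the SRR run accession}

# How many reads to unpack. Must be set in the environment.
N=${N:?set N to the number of reads to unpack}

# The reference genome stored locally (a FASTA path, expected under refs/).
REF=${REF:?set REF to the local path of the reference FASTA file}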

# The directory that stores the reference.
mkdir -p refs

# Get the reference genome in FASTA format.
efetch -db nuccore -format fasta -id $ACC > $REF

# Build the bowtie2 index for the reference genome.
bowtie2-build $REF $REF >> log.txt 2>&1

# Build the FASTA index (.fai) for the reference genome; IGV uses it.
samtools faidx $REF

# Obtain the first N spots for the SRR number as paired FASTQ files.
fastq-dump -X $N --split-files $SRR >> log.txt

# The names of the paired read files written by fastq-dump --split-files.
R1=${SRR}_1.fastq
R2=${SRR}_2.fastq

# Run the bowtie2 aligner. Creates a SAM file.
bowtie2 -x $REF -1 $R1 -2 $R2 > $SRR.sam 2>> log.txt

# Sort the SAM file by coordinate and write it in BAM format.
samtools sort $SRR.sam > $SRR.bam

# Index the BAM file.
samtools index $SRR.bam

# Generate an alignment report.
samtools flagstat $SRR.bam > alignment-report.txt
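
The report written by samtools flagstat is plain text, with one line of the form "<passed> + <failed> mapped (<percent>% ...)". As a rough sketch (the helper name mapped_percent is hypothetical and not part of the recipe), the overall mapping rate could be pulled out of the report like this:

```bash
# Hypothetical helper: print the overall mapped percentage from a flagstat report.
# Prints nothing when the report has no percentage (for example, zero reads in total).
mapped_percent() {
    # Keep only the "N + M mapped (P% ..." line; the "primary mapped" line
    # printed by newer samtools versions does not match this pattern.
    sed -n 's/^[0-9][0-9]* + [0-9][0-9]* mapped (\([0-9.]*\)%.*/\1/p' "$1" | head -n 1
}

# Usage, once the recipe has produced the report:
#   mapped_percent alignment-report.txt
```

A low percentage is usually a hint to double-check that the reads and the reference come from the same organism.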

