Bioinformatics Recipe Cookbook

Recipe Description

Perform data quality control on FASTQ data

This is the introductory recipe for the course and for this entire site.

The recipe code will demonstrate the following:

  1. Downloading sequencing data from SRA
  2. Generating a FastQC report on this data
  3. Trimming Illumina adapters from the dataset in paired-end mode
  4. Generating a FastQC report on the trimmed data
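
A quick sanity check after steps 1 and 3 is to count how many reads each FASTQ file holds. The sketch below is not part of the recipe: the helper name count_reads and the two-read demo file are made up for illustration, and it assumes the usual layout of four lines per FASTQ record.

```shell
#!/usr/bin/env bash
# Sketch: count the reads in a FASTQ file.
# count_reads is a hypothetical helper, not part of the recipe.
set -ue

# Count FASTQ records, assuming four lines per record.
count_reads() {
    local lines
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
}

# Build a tiny two-read FASTQ file (made-up data) to try the helper on.
demo=$(mktemp)
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n' > "$demo"

n_demo=$(count_reads "$demo")
echo "reads in demo file: $n_demo"

# Remove the temporary demo file.
rm -f "$demo"
```

Pointing the same helper at the files in the reads directory, before and after trimming, would show roughly how many reads the trimming step kept.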

Lectures

A detailed presentation that explains the steps and rationale of the site and this recipe can be read at:

Please refer to the lectures above for links to the chapters that cover each concept.

Recipe Code

# This recipe downloads sequencing data from SRA
# then performs quality filtering and adapter trimming.

# The SRA run accession to process.
# The website fills in this variable.
SRA=SRR519926

# Stop the script on errors and on undefined variables.
set -ue

# How many sequences to unpack.
N=10000

# Create directory to store the reads in.
mkdir -p reads

# Download the first N read pairs (spots) from SRA.
fastq-dump --split-files -X $N -O reads $SRA

# Make a directory for the fastqc reports
mkdir -p reports

# Run fastqc on the downloaded reads.
fastqc reads/*.fastq -o reports

# Create the adapter file used for trimming.
echo ">illumina" > adapter.fa
echo "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC" >> adapter.fa

# Run trimmomatic on the data.
trimmomatic PE reads/${SRA}_1.fastq reads/${SRA}_2.fastq \
    -baseout reads/${SRA}.trimmed.fq \
    ILLUMINACLIP:adapter.fa:2:30:5 SLIDINGWINDOW:4:20

# Run fastqc on the trimmed data.
fastqc reads/*.fq -o reports

# Delete the fastqc zip files to reduce clutter.
rm -f reports/*.zip
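
Because the recipe stops at the first failing command, it can help to confirm up front that the three programs it calls are installed. The following is a minimal sketch under that assumption. The helper name missing_tools is hypothetical and does not appear in the recipe.

```shell
#!/usr/bin/env bash
# Sketch: preflight check for the tools the recipe calls.
# missing_tools is a hypothetical helper, not part of the recipe.

# Print the name of every given tool that is not found on the PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" > /dev/null 2>&1 || echo "$tool"
    done
}

# The three programs used in the recipe above.
missing=$(missing_tools fastq-dump fastqc trimmomatic)

if [ -n "$missing" ]; then
    # Left unquoted on purpose so the names print on one line.
    echo "Missing tools:" $missing >&2
else
    echo "All tools found."
fi
```

In a real run one would probably exit with a non-zero status when the list is not empty, rather than only printing a message.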
