Recipe View

Alignment Based RNA-Seq

RNA-Seq differential expression with alignments and using three different statistical methods.

1 result • updated 5.6 years ago by Istvan Albert

RNA-Seq differential expression with alignments and using three different statistical methods.

A common RNA-Seq data analysis approach is one where sequencing reads are aligned with a "splice" aware aligner.

In this recipe we run follow the concepts presented in chapter:

https://www.biostarhandbook.com/rnaseq/rnaseq-griffith-data.html

and produce differential expression results with three different statistical methods: deseq1, deseq2 and edgeR

Copy recipe

You need write access to the project to edit.

# Stop on errors. Print commands to standard output.
set -uex

# Download and unpack the data.
#
# We have already organized into the reads/refs folders.
#
URL=http://data.biostarhandbook.com/rnaseq/projects/griffith/griffith-data.tar.gz

# The file that collects the program outputs.
LOG=runlog.txt

# Download and unpack the data on the fly.
curl -s $URL | tar xz

#
# This script will download a subset of data for the Griffith Experiment.
#
# The archive contains sequenced reads of mRNA from biological samples
# mixed with a spike-in control (ERCC92) of known abundances.
#
# The archive also contains the genome for chromosome 22 and the known sequence
# of the spike-in controls.
#
#
# See: https://www.biostarhandbook.com/rnaseq/rnaseq-griffith-data.html
#
#

# Limit the reference genome to chromosome 22 of the human genome.
REF=refs/22.fa

# The root name for the indices.
IDX=refs/22.fa

# Annotations for chromosome 22 of human genome.
GTF=refs/22.gtf

# Build the indices for both the spike-in genome and chromosome 22.
hisat2-build $REF $IDX 1>>$LOG 2>> $LOG

# Build the genome indices for IGV.
samtools faidx $REF


# Output folder for bam files.
BAMDIR=xbam

# Create output folder for bam files.
mkdir -p $BAMDIR

# Note: the flag below is -1 (minus 1) not little L.
# This -1 flag lists the file names with no other information.

# Align against human genome.
ls -1 reads/*.fq | parallel -N 2 "hisat2 $IDX -1 {1} -2 {2} 2>> $LOG" '|' "samtools sort 2>>$LOG"  '>' "$BAMDIR/{1/.}.bam 2>> $LOG"

# Generate the index for each bam file.
ls -1 $BAMDIR/*bam | parallel "samtools index {} 2>> $LOG"

# Count features relative to the human annotations.
# This tool "decides" what type of counts are produced: gene level, transcript level
# The "gene_name" is an attribute set inside the GTF file.  You may use other attributes inside of GTF.
featureCounts -a $GTF -g gene_name -o counts_full.txt  $BAMDIR/U*.bam  $BAMDIR/H*.bam 1>> $LOG 2>> $LOG

# This directory will store the R script that analyze the counts.
mkdir -p code

#
# The reecipe will now run three different differential expression (DE) methods
# on the resulting counts table.
# For each analysis it will create a DE results table and a heatmap.
#

# Download each script into the code directory)
(cd code && curl -sO http://data.biostarhandbook.com/rnaseq/code/deseq1.r)
(cd code && curl -sO http://data.biostarhandbook.com/rnaseq/code/deseq2.r)
(cd code && curl -sO http://data.biostarhandbook.com/rnaseq/code/edger.r)
(cd code && curl -sO http://data.biostarhandbook.com/rnaseq/code/draw-heatmap.r)

# Simplify the counts file to keep only the count columns.
cat counts_full.txt | cut -f 1,7-12 > counts_simplified.txt

#
# Your R system needs to be set up to know about the tools below.
# This setup needs to be done once. See the chapter:
# https://www.biostarhandbook.com/rnaseq/rnaseq-bioconductor.html
#

# Run the DESeq1 method on the simplified count file.
cat counts_simplified.txt | Rscript code/deseq1.r 3x3 > counts_DE_deseq1.results.txt  2> $LOG
cat norm-matrix-deseq1.txt | Rscript code/draw-heatmap.r > counts_DE_deseq1.heatmap.pdf 2>> $LOG

# Run the DESeq2 method on the simplified count file.
cat counts_simplified.txt| Rscript code/deseq2.r 3x3 > counts_DE_deseq2.results.txt  2>> $LOG
cat norm-matrix-deseq2.txt | Rscript code/draw-heatmap.r > counts_DE_deseq2.heatmap.pdf 2>> $LOG

# Run the EdgeR method on the simplified count file.
cat counts_simplified.txt | Rscript code/edger.r 3x3 > counts_DE_edger.results.txt  2>> $LOG
cat norm-matrix-edgeR.txt | Rscript code/draw-heatmap.r > counts_DE_edger.heatmap.pdf 2>> $LOG

You need write access to the original recipe to edit.

Click the buttons on the right to create new fields.

Add text field Add float field Add data field Add checkbox Add dropdown Add upload field Add integer field Add radio button

Edit the content of each interface element.

[settings]
name = "Alignment Based RNA-Seq"
template = "Alignment_Based_RNA-Seq_cookbook_35.sh"
image = "Alignment_Based_RNA-Seq_cookbook_35.png"
id = 35
recipe_uid = "alignrnaseq"
uid = "alignrnaseq"
help = "RNA-Seq differential expression with alignments and using three different statistical methods.\n\nA common RNA-Seq data analysis approach is one where sequencing reads are aligned with a \"splice\" aware aligner.\n\nIn this recipe we run follow the concepts presented in chapter:\n\nhttps://www.biostarhandbook.com/rnaseq/rnaseq-griffith-data.html\n\nand produce differential expression results with three different statistical methods: `deseq1`, `deseq2` and `edgeR`"
url = "http://localhost8000"

You need write access to the original recipe to edit.

Name

Recipe display name

Identifier

Unique identifier for the recipe.

Image :

Optional image for the recipe ( 500px Maximum ).

Rank:

Used to order recipes (optional).

Insert Image

From the web

From your computer

Cancel

Back