Recipe View

The recipe produces pairwise alignments with different software

3 results • updated 5.6 years ago by Istvan Albert

The recipe produces pairwise alignments with different software

"Aligning sequences" appears to be a well-defined concept - in reality, it is everything but. The differences in the way aligner operate as well as what they are able to report can be radically different. Adding to the complexity even the same tool may report more or less information, various formats depending on the customization that we apply. Sometimes, certain answers can only be obtained if one were to choose to run a very specific software in a specific way.

The recipe provides code that takes two sequences (by default the 1976 Ebola genome and the 2014 Ebola genome) and generates alignments with different methods between the two viral genomes.

Downloads both sequences from NCBI
Uses an optimal global sequence aligner
Uses a local aligner
Uses a semi-global aligner

In each case the aligner is asked to produce different outputs, hence ending up with several files for each method.

Lecture

A detailed presentation that explains the steps and rationale for this recipe can be found at:

Lecture for Recipe 3: Understanding Sequence Alignments

Please refer to the lecture above for study materials and additional content.

Copy recipe

You need write access to the project to edit.

# Stop script on any error.
set -ue

#
# This tool requires minimap2 to run.
# If you don't already have it install it with.
#
# conda install minimap2 -y
#

# Two sequence accession numbers
SEQ1=AF086833
SEQ2=KR105328

#
# The default values are comparing
# the Ebola strain from 1972 to one strain observed in 2014
#
# Ebola Virus: Mayinga 1972: AF086833
#SEQ1=AF086833

# Ebola Virus: Makona 2014: KR105328

# Download both sequences from NCBI and store them locally.
# You may want to name them $SEQ1.fa and $SEQ2.fa
# We will keep the names short here for visiblity.
efetch -db nuccore -format fasta -id $SEQ1 > 1.fa
efetch -db nuccore -format fasta -id $SEQ2 > 2.fa

#
# Typically you want to know what the scoring matrix is.
#
# Get the scoring matrix
#
wget -q -nc ftp://ftp.ncbi.nlm.nih.gov/blast/matrices/NUC.4.4

# Generate a global alignment in default (markx0) format.
stretcher -filter -asequence 1.fa -bsequence 2.fa -data NUC.4.4 > alignment.1.stretcher.markx0.txt

#
# The stretcher has the NUC4.4 matrix built into it under the special
# name of EDNAFULL. From now on we will use that.
#

# Generate alignment in the so called markx2 format.
stretcher -filter -asequence 1.fa -bsequence 2.fa -data EDNAFULL -aformat markx2  > alignment.2.stretcher.markx2.txt

# Generate alignment in the SAM format.
stretcher -filter -asequence 1.fa -bsequence 2.fa -data EDNAFULL -aformat sam > alignment.3.stretcher.sam.txt

# Generate a pariwise local alignment with blastn.
# It is surprisingly difficult to tell what scoring matrix
# blastn used. See blastn --help for information.
#

# The default output format of blasn.
blastn -subject 1.fa -query 2.fa > alignment.4.blast.default.txt

# Format alignments in tabular form.
blastn -subject 1.fa -query 2.fa -outfmt 7 > alignment.5.blast.outfmt7.txt

#
# The minimap aligner, this perform a SEMI-GLOBAL alignment.
#
# Run minimap2 configured to produces MD tags.
minimap2 -c --MD 1.fa 2.fa > alignment.6.minimap2.MD.txt

# Run minimap2 configured to produces CS tags.
minimap2 --cs 1.fa 2.fa > alignment.7.minimap2.cs.txt

#
# How to interpret the alignments in one glance.
# See lecture video.
#
# The 23rd column shows the CIGAR string.
echo "*** The alignment as a CIGAR string:"
cat alignment.6.minimap2.MD.txt | cut -f 22

# The 23rd column shows the MD tag.
echo "*** The alignment as an MD tag:"
cat alignment.6.minimap2.MD.txt | cut -f 23

# The 22nd column of the CS tag.
echo "*** The alignment as a CS tag:"
cat alignment.7.minimap2.cs.txt | cut -f 22

You need write access to the original recipe to edit.

Click the buttons on the right to create new fields.

Add text field Add float field Add data field Add checkbox Add dropdown Add upload field Add integer field Add radio button

Edit the content of each interface element.

[seq1]
label = "Accession number for the first sequence"
display = "TEXTBOX"
value = "AF086833"
help = "Must be an NCBI accession number"

[seq2]
label = "Accession number for the second sequence"
display = "TEXTBOX"
value = "KR105328"
help = "Must be an NCBI accession number"

[settings]
name = "Sequence Alignment"
template = "Sequence_Alignment_cookbook_2.sh"
image = "Sequence_Alignment_cookbook_2.png"
id = 2
recipe_uid = "align01"
uid = "align01"
help = "The recipe produces pairwise alignments with different software\n\n\"Aligning sequences\" appears to be a well-defined concept - in reality, it is everything but. The differences in the way aligner operate as well as what they are able to report can be radically different.  Adding to the complexity even the same tool may report more or less information, various formats depending on the customization that we apply. Sometimes, certain answers can only be obtained if one were to choose to run a very specific software in a specific way. \n\nThe recipe provides code that takes two sequences (by default the 1976 Ebola genome and the 2014 Ebola genome)  and generates alignments with different methods between the two viral genomes. \n\n1. Downloads both sequences from NCBI\n1. Uses an optimal global sequence aligner\n1. Uses a local aligner\n1. Uses a semi-global aligner \n\nIn each case the aligner is asked to produce different outputs, hence ending up with several files for each method.\n\n#### Lecture\n\nA detailed presentation that explains the steps and rationale for this recipe can be found at:\n\n* [Lecture for Recipe  3: Understanding Sequence Alignments][recipe3]\n\nPlease refer to the lecture above for study materials and additional content.\n\n[recipe3]: https://www.biostarhandbook.com/edu/lecture/view/42/"
url = "http://localhost8000"

You need write access to the original recipe to edit.

Name

Recipe display name

Identifier

Unique identifier for the recipe.

Image :

Optional image for the recipe ( 500px Maximum ).

Rank:

Used to order recipes (optional).

The recipe produces pairwise alignments with different software

"Aligning sequences" appears to be a well-defined concept - in reality, it is everything but. The differences in the way aligner operate as well as what they are able to report can be radically different.  Adding to the complexity even the same tool may report more or less information, various formats depending on the customization that we apply. Sometimes, certain answers can only be obtained if one were to choose to run a very specific software in a specific way.

The recipe provides code that takes two sequences (by default the 1976 Ebola genome and the 2014 Ebola genome)  and generates alignments with different methods between the two viral genomes.

1. Downloads both sequences from NCBI
1. Uses an optimal global sequence aligner
1. Uses a local aligner
1. Uses a semi-global aligner

In each case the aligner is asked to produce different outputs, hence ending up with several files for each method.

#### Lecture

A detailed presentation that explains the steps and rationale for this recipe can be found at:

* [Lecture for Recipe  3: Understanding Sequence Alignments][recipe3]

Please refer to the lecture above for study materials and additional content.

[recipe3]: https://www.biostarhandbook.com/edu/lecture/view/42/

Insert Image

From the web

From your computer

Cancel

Back