Visualizing large-scale genomic variations in short-read data.
A large-scale variation (over 50 bp) typically does not fit within the sequence of a single short read.
Paired-end sequencing allows us to identify where the reorganization took place. Pairs in an unexpected orientation and templates of unexpected length can together provide the guidance needed to identify the "reorganization" junction points relative to the reference genome.
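As a rough illustration of the two signals named above, the sketch below scans SAM records and reports pairs whose mates lie on the same strand or whose template length exceeds a cutoff. The function name `discordant_pairs` and the default cutoff of 1000 are assumptions made for this example, not part of the recipe, and the check ignores other discordant patterns such as outward-facing pairs.

```shell
# Hypothetical helper: list read pairs with an unexpected orientation or template size.
# Reads SAM text on stdin; the first argument is an assumed template-length cutoff.
discordant_pairs() {
    awk -v max="${1:-1000}" '
        /^@/ { next }                          # skip SAM header lines
        {
            flag = $2
            tlen = $9
            if (tlen < 0) tlen = -tlen         # TLEN is signed; use its absolute value
            paired   = int(flag / 1)  % 2      # 0x1:  read is part of a pair
            unmapped = int(flag / 4)  % 2      # 0x4:  read is unmapped
            mate_un  = int(flag / 8)  % 2      # 0x8:  mate is unmapped
            rev      = int(flag / 16) % 2      # 0x10: read aligned to reverse strand
            mate_rev = int(flag / 32) % 2      # 0x20: mate aligned to reverse strand
            if (!paired || unmapped || mate_un) next
            reason = ""
            if (rev == mate_rev)  reason = "same_strand"
            else if (tlen > max)  reason = "long_template"
            if (reason != "") print $1 "\t" reason
        }'
}
```

With an alignment file at hand, this could be fed from something like `samtools view align.bam | discordant_pairs 1000`, where `align.bam` is a placeholder name.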
This recipe provides code that:
You may manually edit the file specified as $GENOME to modify certain parts. You may also do so programmatically with commands like:
Large deletion. The bases between positions 2000 and 3000 are removed.
cat $REF | seqret --filter -sbegin 1 -send 2000 > part1
cat $REF | seqret --filter -sbegin 3000 > part2
cat part1 part2 | union -filter > $GENOME
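One way to sanity-check an edit like this is to compare the total sequence length before and after; the difference should match the number of bases you meant to remove or add. The helper below is a minimal sketch that uses only awk. The name `fasta_len` is made up for this example, and it assumes `$REF` and `$GENOME` point to ordinary FASTA files.

```shell
# Hypothetical helper: print the total number of sequence characters in a FASTA file,
# skipping header lines that start with ">".
fasta_len() {
    awk '!/^>/ { gsub(/[[:space:]]/, ""); n += length($0) } END { print n + 0 }' "$1"
}

# Report the length change only when both files from the recipe are present.
if [ -f "${REF:-}" ] && [ -f "${GENOME:-}" ]; then
    before=$(fasta_len "$REF")
    after=$(fasta_len "$GENOME")
    echo "reference: $before bp, modified: $after bp, difference: $((before - after)) bp"
fi
```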
Copy number variation. The first 2000 bases are present three times.
cat $REF | seqret --filter -sbegin 1 -send 2000 > part1
cat part1 part1 $REF | union -filter > $GENOME
Swap regions in the genome. The first 5000 bp are moved to the end.
cat $REF | seqret --filter -send 5000 > part1
cat $REF | seqret --filter -sbegin 5001 > part2
cat part2 part1 | union -filter > $GENOME
Reverse complement a region of the genome (1000 to 2000).
cat $REF | seqret --filter -sbegin 1 -send 999 > part1
cat $REF | seqret --filter -sbegin 1000 -send 2000 -sreverse1 > part2
cat $REF | seqret --filter -sbegin 2001 > part3
cat part1 part2 part3 | union -filter > $GENOME
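Because the start and end positions are 1-based and inclusive, adjacent pieces need to end at N and resume at N+1; otherwise the boundary base ends up in both pieces. The toy sketch below mimics the slice, reverse-complement and join steps with awk, rev and tr on a made-up 16-base sequence, so the coordinates can be checked by eye. The helpers `slice` and `revcomp` are illustrative stand-ins, not EMBOSS commands.

```shell
# slice START END: print bases START..END (1-based, inclusive) of a one-line sequence on stdin.
slice() {
    awk -v s="$1" -v e="$2" '{ print substr($0, s, e - s + 1) }'
}

# revcomp: reverse-complement a one-line DNA sequence on stdin.
revcomp() {
    rev | tr 'ACGTacgt' 'TGCAtgca'
}

toy=AACCGGTTACGTAAGG                          # made-up sequence, not real data
left=$(echo "$toy" | slice 1 4)               # bases 1..4, kept as-is
inverted=$(echo "$toy" | slice 5 8 | revcomp) # bases 5..8, reverse-complemented
right=$(echo "$toy" | slice 9 16)             # bases 9..16, kept as-is
echo "$left$inverted$right"                   # prints AACCAACCACGTAAGG
```

The joined result has the same length as the input, which is what an inversion should preserve.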
For more information see: