The following is based on a homework assignment submitted for the Applied Bioinformatics course.
The recipe has been updated and edited for reproducibility but otherwise kept in its original form.
Data for the analysis came from the whole-genome sequencing project of SARS-CoV-2 conducted by the California Department of Public Health. The reference genome, "Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1, complete genome", was downloaded from NCBI.
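As a rough sketch of how the reference download might be scripted, the snippet below builds an NCBI E-utilities efetch URL and retrieves the record as FASTA with curl. The accession NC_045512.2 (the RefSeq record carrying the title quoted above), the helper names efetch_url and fetch_reference, and the output file name are assumptions made for illustration. The original recipe does not state how the file was obtained.

```shell
# Hypothetical helper: print the efetch URL for a nucleotide accession.
# Assumes the record lives in the "nuccore" database and FASTA text is wanted.
efetch_url() {
    acc="$1"
    echo "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=${acc}&rettype=fasta&retmode=text"
}

# Hypothetical helper: download the accession to a FASTA file.
# Requires network access and curl; -f makes curl fail on HTTP errors.
fetch_reference() {
    acc="$1"
    out="$2"
    curl -fsSL "$(efetch_url "$acc")" -o "$out"
}
```

Calling fetch_reference NC_045512.2 genome.fa would then write the reference to genome.fa. Keeping the URL construction in its own function makes it easy to inspect the request before any download happens.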
To install GATK follow the instructions here:
For this recipe we assume GATK was downloaded and stored in
pip install bio --upgrade