GitHub - spilornis-ng/Mitogenome_geneious: A short pipeline to recover mitogenomes from short-read WGS datasets

Introduction

A crude pipeline for mitogenome recovery from low-coverage WGS using reference nuclear+mitogenome or just mitogenome
This pipeline uses Geneious prime
You need trimmed reads for running this pipeline

Checklist of softwares required

While recovering mitogenome from whole genome resequencing data we need to address potential NUMTs in our dataset. To do this we ideally require both nuclear and mitochondrial reference genomes for species of our interest or the next closest sister taxa. However, in most cases reference genomes for non-model species are not available (although this is changing rapidly). In such a case one can recover mitogenome using only mitochondrial reference of the target species or its closest sister, but this would lead to inclusion of NUMTs in the mitogenome assembly and any results of downstream analyses should be interpreted carefully. More or less the pipeline is similar for both the cases, and has been expanded upon below.

Although I have used geneious to generate a consensus and annotate, one can use any open source software to do the same.

Please note that Step 5 is common for both the case scenarios.

Mitogenome recovery using reference nuclear and mitgenome

Step 1 - Download the reference genomes and index (bwa)

From NCBI or other online sources, download the nuclear and mitochondrial reference genome of your species of interest. In case no reference is available we can use a close sister species to our taxa. Merge the nuclear reference and mitochondrial reference. This will allow us to reduce the proportion of NUMTs in our mitochondrial assembly downstream.

cat nuclear_ref.fna mito_ref.fasta > final_ref.fna

Now we index the reference using bwa

bwa index ~/path/to/final_ref.fna

Step 2 - Map the trimmed reads to the indexed reference (BWA and samtools) and extract only reads mapping to the mitogenome region

Now we will map the trimmed reads of our data to the indexed reference we generated in the previous step. To do that we will use bwa.

bwa mem -t 20 -M -R "@RG\tID:\tSM:\tLB:IlluminaWGS\tPL:ILLUMINA" \ #here I have used -t flag to specify cores, -R to add read-group to bam, do not forget to add details to the bam-header
~/path/to/indexed/final_ref.fna \
~/path/to/R1.fq.gz \
~/path/to/R2.fq.gz | \
samtools view -bh - | \
samtools sort -@20 -T tmp -o output_mapped_sorted.bam

We will now extract the reads mapping to only the mitochondrial region. To do that we will first check the header of our mitochondrial reference fasta to know what to look for in the generated bam file.

head /path/to/mito_ref.fasta

>NC_042191.1 Culicicapa ceylonensis mitochondrion, complete genome
GTCCCTGTAGCTTACAAAAAGCATGACACTGAAGATGTCAAGACGGCTGCCATATGCACCCAAGGACAAA
AGACTTAGTCCTAACCTTACTGTTGGTTTTTGCTAGACATATACATGCAAGTATCCGCGCGCCAGTGTAG

The head command will print out the first few lines of the fasta file. Note down the accession number after > (in this case NC_042191.1). We are now going to use samtools to extract the reads mapping to this region.

samtools view output_mapped_sorted.bam "NC_042191.1" > output_mitomapped_sorted.bam

We now have the bam file consisting of only reads mapped to the mitogenome region. Do note that the file size reduces considerably as < 98% of the reads will map to the mitogenome.

Step 3 - Run BAMQC to get an idea of mapping coverage and quality (qualimap)

Using qualimap bamqc we will assess the mapping coverage and quality

qualimap bamqc -bam output_mitomapped_sorted.bam -outfile results.pdf

Inspect the output to understand the coverage and quality of the reads mapped to the mitogenome region. We can now move to Step 5 to generate a consensus mitogenome and annotate.

Mitogenome recovery using mitogenome reference only

Step 1 - Download the mitogenome reference and index (bwa)

From NCBI or other online sources, download the mitochondrial reference genome of your species of interest. In case no reference is available we can use a close sister species to our taxa.

Now we index the reference using bwa

bwa index ~/path/to/mito_ref.fasta

Step 2 - Map the trimmed reads to the indexed reference (BWA and samtools)

Now we will map the trimmed reads of our data to the indexed reference we generated in the previous step. To do that we will use bwa.

bwa mem -t 20 -M -R "@RG\tID:\tSM:\tLB:IlluminaWGS\tPL:ILLUMINA" \ #here I have used -t flag to specify cores, -R to add read-group to bam, do not forget to add details to the bam-header
~/path/to/indexed/mito_ref.fasta/ \
~/path/to/R1.fq.gz \
~/path/to/R2.fq.gz | \
samtools view -bh - | \
samtools sort -@20 -T tmp -o output_mapped_sorted.bam

Step 3 - Run BAMQC to get an idea of mapping coverage and quality (qualimap)

Using qualimap bamqc we will assess the mapping coverage and quality

qualimap bamqc -bam output_mapped_sorted.bam -outfile results.pdf

Inspect the output to understand the coverage and quality of the reads mapped to the reference mitogenome. Note that < 98% of our reads have mapped.

Step 4 - Extract only mapped reads (samtools)

Now we will extract only the reads mapped to the reference mitogenome. To do this we will use samtools

samtools view -b -F 4 -o output_mitomapped_sorted.bam output_mapped_sorted.bam

We are now ready to proceed to the Step 5 to generate consensus mitogenome and annotate.

Step 5 - Using Geneious prime

Import the final bam file into geneious using File > Import > Files

Download the same reference mitogenome (only mitogenome reference) that was used in generating the bam file using NCBI database in the geneious software.

You can now visualize where all the reads map on the reference mitogenome.

Once you get a good idea, you can generate a consensus sequence. To do this select Tools > Generate Consensus Seqeunce

After generating the consensus sequence, transfer the annotations from the reference mitogenome sequence by checking the checkbox under Annotate and Predict tab.

Make sure to use a similarity of 40-50% to ensure that annotations are transfered from your reference to the consensus sequence.

In case we are interested in circularizing the mitogenome we can do it using Sequence > Circularize Sequence

This will give us the final circularized annotated mitogenome which has beene recovered from the whole-genome dataset

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Introduction

Mitogenome recovery using reference nuclear and mitgenome

Step 1 - Download the reference genomes and index (bwa)

Step 2 - Map the trimmed reads to the indexed reference (BWA and samtools) and extract only reads mapping to the mitogenome region

Step 3 - Run BAMQC to get an idea of mapping coverage and quality (qualimap)

Mitogenome recovery using mitogenome reference only

Step 1 - Download the mitogenome reference and index (bwa)

Step 2 - Map the trimmed reads to the indexed reference (BWA and samtools)

Step 3 - Run BAMQC to get an idea of mapping coverage and quality (qualimap)

Step 4 - Extract only mapped reads (samtools)

Step 5 - Using Geneious prime

About

Releases

Packages

spilornis-ng/Mitogenome_geneious

Folders and files

Latest commit

History

Repository files navigation

Introduction

Mitogenome recovery using reference nuclear and mitgenome

Step 1 - Download the reference genomes and index (bwa)

Step 2 - Map the trimmed reads to the indexed reference (BWA and samtools) and extract only reads mapping to the mitogenome region

Step 3 - Run BAMQC to get an idea of mapping coverage and quality (qualimap)

Mitogenome recovery using mitogenome reference only

Step 1 - Download the mitogenome reference and index (bwa)

Step 2 - Map the trimmed reads to the indexed reference (BWA and samtools)

Step 3 - Run BAMQC to get an idea of mapping coverage and quality (qualimap)

Step 4 - Extract only mapped reads (samtools)

Step 5 - Using Geneious prime

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Packages