KrakenParser is a collection of scripts designed to process Kraken2 reports and convert them into CSV format. This pipeline extracts taxonomic abundance data at six levels:
- Phylum
- Class
- Order
- Family
- Genus
- Species
You can run the entire pipeline with a single command, or use the scripts individually depending on your needs.
Example: counts_phylum.csv, parsed from 7 Kraken2 reports of metagenomic samples using KrakenParser:
Sample_id,Euryarchaeota,Euglenozoa,Parabasalia,Apicomplexa,Basidiomycota,Ascomycota,Acidobacteriota,Bdellovibrionota,Chlorobiota,Ignavibacteriota,Planctomycetota,Spirochaetota,Thermotogota,Fusobacteriota,Cyanobacteriota,Mycoplasmatota,Actinomycetota,Pseudomonadota,Bacteroidota,Deferribacterota,Campylobacterota,Thermodesulfobacteriota,Bacillota,Negarnaviricota,Nucleocytoviricota,Uroviricota,Peploviricota
X1,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
X2,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,4,0,0,0,0
X3,11,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,0,0,0,4,0,0,0,0
X4,1313,0,0,0,0,4,0,0,0,0,0,1,2,2,1,3,3,17,33,4,5,4,112,0,0,0,0
X5,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,0,0,0,0
X6,0,0,0,0,0,0,0,0,0,0,1,1,0,1,1,0,0,3,3,0,3,2,13,0,0,0,1
X7,20,1,1,5,1,9,1,6,1,7,1,13,1,3,9,4,10,139,519,0,8,2,81,1,3,1,0
This counts_phylum.csv file is easy to visualize as a relative abundance bar plot!
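For example, here is a minimal Python sketch (standard library only) that converts each sample's row of counts_phylum.csv into percentages, which is what a relative abundance bar plot displays. It assumes the layout shown above: a Sample_id column followed by one integer count column per taxon. The function names read_counts and to_relative are hypothetical helpers, not part of KrakenParser.

```python
import csv
from typing import Dict, Iterable


def read_counts(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Parse a KrakenParser-style CSV: first column is the sample ID, the rest are taxon counts."""
    reader = csv.reader(lines)
    header = next(reader)
    taxa = header[1:]  # everything after the Sample_id column
    counts: Dict[str, Dict[str, int]] = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        counts[row[0]] = {taxon: int(value) for taxon, value in zip(taxa, row[1:])}
    return counts


def to_relative(counts: Dict[str, Dict[str, int]]) -> Dict[str, Dict[str, float]]:
    """Convert each sample's counts to percentages of that sample's total."""
    relative: Dict[str, Dict[str, float]] = {}
    for sample, taxa in counts.items():
        total = sum(taxa.values())
        # A sample with no reads at this level gets all zeros instead of a division by zero.
        relative[sample] = {
            taxon: (100.0 * n / total if total else 0.0) for taxon, n in taxa.items()
        }
    return relative
```

On the real file, open it with newline="" as the csv module recommends: with open("data/counts/csv/counts_phylum.csv", newline="") as f: rel = to_relative(read_counts(f)). Each sample's percentages then add up to 100% (X1 above, for instance, comes out as 100% Euryarchaeota) and can be passed to any plotting library to draw stacked bars.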
To run the full pipeline, use the following command:
KrakenParser --complete -i data/kreports
# Having trouble? Run KrakenParser --complete -h
This will:
- Convert Kraken2 reports to MPA format
- Combine MPA files into a single file
- Extract taxonomic levels into separate text files
- Process extracted text files
- Convert them into CSV format
- The Kraken2 reports must be inside a subdirectory (e.g., data/kreports).
- The script automatically creates output directories and processes the data.
Install KrakenParser from PyPI with:
pip install krakenparser
You can also run each step manually if needed.
KrakenParser --kreport2mpa -i data/kreports -o data/mpa
# Having trouble? Run KrakenParser --kreport2mpa -h
This script converts Kraken2 .kreport files into MPA format using KrakenTools.
KrakenParser --combine_mpa -i data/mpa/* -o data/COMBINED.txt
# Having trouble? Run KrakenParser --combine_mpa -h
This merges multiple MPA files into a single combined file.
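Conceptually, merging lines up the per-sample tables on their lineage strings. Here is a minimal Python sketch of that idea; it is not the KrakenTools combine_mpa.py code. It assumes each MPA line has the form lineage<TAB>count, and it reports a taxon missing from a sample as 0. The function names and the example lineages are hypothetical.

```python
from typing import Dict, List, Sequence, Set


def parse_mpa(lines: Sequence[str]) -> Dict[str, int]:
    """Parse 'lineage<TAB>count' lines from one sample into a dict."""
    table: Dict[str, int] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank and header/comment lines
        lineage, count = line.rsplit("\t", 1)
        table[lineage] = int(count)
    return table


def combine(tables: Sequence[Dict[str, int]]) -> Dict[str, List[int]]:
    """Align per-sample tables on lineage; a lineage absent from a sample counts as 0."""
    lineages: List[str] = []
    seen: Set[str] = set()
    for table in tables:
        for lineage in table:
            if lineage not in seen:  # keep first-seen order for stable output
                seen.add(lineage)
                lineages.append(lineage)
    return {lin: [t.get(lin, 0) for t in tables] for lin in lineages}
```

The merged result has one row per lineage and one count column per sample, in the order the files were given.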
KrakenParser --deconstruct -i data/COMBINED.txt -o data/counts
# Having trouble? Run KrakenParser --deconstruct -h
This step extracts each of the six taxonomic levels (phylum, class, order, family, genus, species) into its own text file, excluding human reads.
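The level-extraction idea can be sketched in Python as follows. It assumes combined MPA lines of the form lineage<TAB>count<TAB>count..., and keeps a line only when its deepest rank carries the requested prefix (for example p__ for phylum). How KrakenParser actually drops human reads is not described here, so the exclude filter below, which discards any lineage containing a given substring, is a hypothetical stand-in, as is the function name.

```python
from typing import Iterable, List, Sequence

# Rank prefixes as they appear in MPA lineage strings (s__, g__, ...).
LEVEL_PREFIXES = {
    "phylum": "p__",
    "class": "c__",
    "order": "o__",
    "family": "f__",
    "genus": "g__",
    "species": "s__",
}


def extract_level(lines: Iterable[str], level: str, exclude: Sequence[str] = ()) -> List[str]:
    """Keep combined-MPA lines whose deepest rank is `level`, minus excluded lineages."""
    prefix = LEVEL_PREFIXES[level]
    kept = []
    for line in lines:
        lineage = line.split("\t", 1)[0]
        deepest = lineage.split("|")[-1]
        if not deepest.startswith(prefix):
            continue  # line belongs to a different rank (or is a header)
        if any(term in lineage for term in exclude):
            continue  # hypothetical human-read filter: drop lineages under an excluded taxon
        kept.append(line)
    return kept
```

Because a lineage string contains all of its ancestors, excluding by an ancestor such as p__Chordata removes every descendant line at all six levels; it would also remove non-human chordates, which is one reason KrakenParser's real rule may differ.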
KrakenParser --process -i data/COMBINED.txt -o data/counts/txt/counts_phylum.txt
# Having trouble? Run KrakenParser --process -h
Repeat for the other five taxonomic levels (class, order, family, genus, species), or wrap KrakenParser --process in a loop!
This script cleans up taxonomic names (removes prefixes, replaces underscores with spaces).
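The cleanup rule can be illustrated with a short Python sketch. It assumes the cleaned label is the deepest rank of the MPA lineage, which matches the column names in counts_phylum.csv above, and that rank prefixes are a single lowercase letter followed by two underscores (s__, g__, ...). The helper names are hypothetical; this is not KrakenParser's own code.

```python
import re

# Matches a rank prefix such as "s__" or "g__" at the start of a name.
_PREFIX = re.compile(r"^[a-z]__")


def clean_name(lineage: str) -> str:
    """Return the deepest taxon of an MPA lineage without its prefix, with spaces for underscores."""
    deepest = lineage.split("|")[-1]
    return _PREFIX.sub("", deepest).replace("_", " ")


def clean_line(line: str) -> str:
    """Clean the lineage column of a data line and keep its tab-separated counts unchanged.

    Apply this to data lines only: a header cell such as "Sample_id" would also
    have its underscore replaced.
    """
    lineage, sep, counts = line.partition("\t")
    return clean_name(lineage) + sep + counts
```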
KrakenParser --txt2csv -i data/counts/txt/counts_phylum.txt -o data/counts/csv/counts_phylum.csv
# Having trouble? Run KrakenParser --txt2csv -h
Repeat for the other five taxonomic levels (class, order, family, genus, species), or wrap KrakenParser --txt2csv in a loop!
This converts the processed text files into structured CSV format.
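One way to follow the loop suggestion for both steps is a small Python driver, sketched below. It reuses exactly the flags and paths from the phylum commands above and only swaps in the level name. It assumes KrakenParser is on your PATH and that you run it from the directory containing data/; LEVELS, level_commands and run_all_levels are illustrative names.

```python
import subprocess
from typing import List

LEVELS = ["phylum", "class", "order", "family", "genus", "species"]


def level_commands(level: str) -> List[List[str]]:
    """Build the --process and --txt2csv commands for one level, mirroring the phylum examples."""
    txt = f"data/counts/txt/counts_{level}.txt"
    csv_path = f"data/counts/csv/counts_{level}.csv"
    return [
        ["KrakenParser", "--process", "-i", "data/COMBINED.txt", "-o", txt],
        ["KrakenParser", "--txt2csv", "-i", txt, "-o", csv_path],
    ]


def run_all_levels() -> None:
    """Run both steps for every level; check=True stops at the first failing command."""
    for level in LEVELS:
        for cmd in level_commands(level):
            subprocess.run(cmd, check=True)
```

Calling run_all_levels() processes all six levels in order; because check=True is set, the first failing command raises subprocess.CalledProcessError instead of silently continuing. A shell for loop over the same six level names works just as well.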
KrakenParser --complete:
- Automates the entire workflow.
- Takes one argument: the path to the Kraken2 reports (data/kreports).
- Runs all the scripts in sequence.

KrakenParser --kreport2mpa:
- Converts Kraken2 reports to MPA format.
- Uses KrakenTools/kreport2mpa.py.

KrakenParser --combine_mpa:
- Combines multiple MPA files into one.
- Uses KrakenTools/combine_mpa.py.

KrakenParser --deconstruct:
- Extracts phylum, class, order, family, genus, and species into separate text files.
- Removes human-related reads.

KrakenParser --process:
- Cleans and formats the extracted taxonomic data.
- Removes prefixes (s__, g__, etc.) and replaces underscores with spaces.

KrakenParser --txt2csv:
- Converts the cleaned text files to CSV.
- Transposes the data so that sample names become rows.
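The transposition step can be sketched in Python. This assumes the cleaned text file is tab-separated with one row per taxon and one column per sample, and that the header row's first cell becomes the first header cell of the CSV (as Sample_id is in counts_phylum.csv above). The function names are illustrative, not KrakenParser's.

```python
from typing import Iterable, List, Sequence


def read_tsv(lines: Iterable[str]) -> List[List[str]]:
    """Split tab-separated lines into rows, skipping blank lines."""
    return [line.rstrip("\n").split("\t") for line in lines if line.strip()]


def transpose(rows: Sequence[Sequence[str]]) -> List[List[str]]:
    """Swap rows and columns so that a taxa-by-samples table becomes samples-by-taxa."""
    if not rows:
        return []
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all rows must have the same number of columns")
    return [list(column) for column in zip(*rows)]
```

Writing the result out is then a single call, csv.writer(f).writerows(transpose(rows)), on a file opened with newline="".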
After running the full pipeline, the output directory will look like this:
data/
├── kreports/        # Input Kraken2 reports
├── mpa/             # Converted MPA files
├── COMBINED.txt     # Merged MPA file
└── counts/
    ├── txt/         # Extracted taxonomic levels in TXT
    │   ├── counts_species.txt
    │   ├── counts_genus.txt
    │   ├── counts_family.txt
    │   └── ...
    └── csv/         # Final CSV output
        ├── counts_species.csv
        ├── counts_genus.csv
        ├── counts_family.csv
        └── ...
KrakenParser provides a simple and automated way to convert Kraken2 reports into usable CSV files for downstream analysis. You can run the full pipeline with a single command or use individual scripts as needed.
For any issues or feature requests, feel free to open an issue on GitHub!
Happy analyzing!