FAQ
TELL US ABOUT IT!!!
- GitHub issue
- Send us a message on Slack
Be sure to include the command used, the config file used, and the Nextflow error message.
Dr. Wick has a wonderful tutorial available at https://github.com/rrwick/Perfect-bacterial-genome-tutorial
We are flattered that some people think we can summarize the answer into something that fits in a GitHub README. In short, the three assemblers do different things and have different issues. Flye is the default because of its popularity.
When four samples were assembled with flye, miniasm, and raven and their GFA files were visualized in Bandage, the assemblies looked very similar. In some cases, however, the genome was closed by one assembler but could not be circularized by another, and sometimes the assemblers produced different numbers of plasmids.
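| | flye | miniasm and minipolish | raven |
|---|---|---|---|
| sample 1 | (Bandage image) | (Bandage image) | (Bandage image) |
| sample 2 | (Bandage image) | (Bandage image) | (Bandage image) |
| sample 3 | (Bandage image) | (Bandage image) | (Bandage image) |
| sample 4 | (Bandage image) | (Bandage image) | (Bandage image) |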
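A similar comparison can be made from the command line without opening Bandage. The sketch below is a hypothetical helper (`summarize_gfa` is not part of Donut Falls) that lists each segment's length in a GFA v1 file and flags segments with a same-orientation link to themselves as circular. Treating such a self-link as "circular" is an assumption about how the assemblers encode circularity, so check it against Bandage for your own data.

```shell
# Requires bash and awk.
# Summarize an assembly graph (GFA v1): segment name, length, and whether the
# segment links back to itself in the same orientation (ASSUMED to mean circular).
summarize_gfa() {
  local gfa="$1"
  awk -F'\t' '
    # S lines: S <name> <sequence> [optional tags such as LN:i:<length>]
    $1 == "S" {
      len = length($3)
      if ($3 == "*") {                 # sequence omitted: fall back to the LN tag
        len = 0
        for (i = 4; i <= NF; i++) {
          if ($i ~ /^LN:i:/) { len = substr($i, 6) + 0 }
        }
      }
      order[++n] = $2
      seg_len[$2] = len
    }
    # L lines: L <from> <from_orient> <to> <to_orient> <overlap>
    $1 == "L" && $2 == $4 && $3 == $5 { circular[$2] = 1 }
    END {
      for (i = 1; i <= n; i++) {
        name = order[i]
        status = (name in circular) ? "circular" : "linear"
        printf "%s\t%d\t%s\n", name, seg_len[name], status
      }
    }
  ' "$gfa"
}
```

Running `summarize_gfa` on each assembler's GFA for the same sample prints one tab-separated line per segment, which gives a quick count of contigs and of those that appear circular.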
If perfection is the goal, please try Trycycler, which is covered on the Trycycler wiki page.
Trycycler is a useful tool that reconciles assemblies generated by other assemblers into a consensus sequence, but it has manual steps. There was an attempt to automate these steps in this workflow, but the resulting assemblies did not reach the desired quality. Other developers more talented than we are may have solved the issues we were experiencing.
In general, longer reads are better.
Currently, most of the QC is performed by NanoPlot, BUSCO, and circulocov.
The greener the better. This plot is produced when the optional sequencing_summary parameter is used (`params.sequencing_summary = "nanopore sequencing summary file"`).

Samples with "few" or "short" reads likely need a different workflow. All reads shorter than 1,000 bp are discarded.
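For illustration, the 1,000 bp cutoff amounts to the filter below. This is a minimal sketch with a hypothetical function name, assuming uncompressed four-line FASTQ records; it is not necessarily how Donut Falls filters reads internally.

```shell
# Requires bash and awk.
# Print only the FASTQ records whose sequence is at least min_len bases long.
# Assumes uncompressed FASTQ with exactly four lines per record.
filter_short_reads() {
  local fastq="$1" min_len="${2:-1000}"
  awk -v min="$min_len" '
    NR % 4 == 1 { header = $0 }        # @read_id line
    NR % 4 == 2 { seq = $0 }           # sequence line
    NR % 4 == 3 { plus = $0 }          # + separator line
    NR % 4 == 0 {                      # quality line closes the record
      if (length(seq) >= min) { print header; print seq; print plus; print $0 }
    }
  ' "$fastq"
}
```

For example, `filter_short_reads sample.fastq 1000 > sample.filtered.fastq` keeps reads of 1,000 bp or longer.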

Contigs with few reads mapping to them should be discarded.
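A sketch of discarding poorly supported contigs, assuming a per-contig read count is available as a two-column, tab-separated table with a header line (contig, mapped reads). Both the table layout and the threshold are hypothetical; this is not the output format of any specific Donut Falls step, and what counts as "few" reads depends on the data.

```shell
# Requires bash and awk.
# HYPOTHETICAL input: header line, then contig<TAB>mapped_reads on each row.
# Prints the names of contigs with at least min_reads mapped reads.
keep_supported_contigs() {
  local table="$1" min_reads="$2"
  awk -F'\t' -v min="$min_reads" 'NR > 1 && $2 + 0 >= min { print $1 }' "$table"
}
```

For example, `keep_supported_contigs counts.tsv 10` prints every contig with at least 10 mapped reads.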
To get a copy of the template config file that Donut Falls uses by default, run:
```shell
nextflow run UPHL-BioNGS/Donut_Falls --config_file true
```
This creates an `edit_me.config` file in the current directory that the End User can edit for their own purposes. This file can be renamed with no penalty.
To use the edited config file, pass it with `-c` on the command line:
```shell
nextflow run UPHL-BioNGS/Donut_Falls -r main -profile singularity --reads reads -c edit_me.config
```
Flye failures mostly have to do with the quality of the nanopore reads and flye's internal workings. Right now, flye errors are ignored by default, but this can be adjusted in a config file. A common error and ways around it are described in this issue thread. Working around it means the End User will need a config file with the specified flye parameters.
Some example lines for a config file:
```groovy
process {
    withName: flye {
        ext.args = "--asm-coverage 50"
    }
}
```
Or something like this:
```groovy
process {
    withName: flye {
        ext.args = "--meta"
    }
}
```
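Conversely, if flye failures should stop the run instead of being ignored, Nextflow's `errorStrategy` directive can be overridden for that process. The sketch below writes such a config file; the file name `flye_errors.config` is arbitrary, and the selector assumes the process is named `flye`, as in the examples above.

```shell
# Write a small Nextflow config that makes a failed flye task stop the run.
# 'terminate' stops immediately; 'finish' lets already-submitted tasks complete first.
cat > flye_errors.config <<'EOF'
process {
    withName: flye {
        errorStrategy = 'terminate'
    }
}
EOF
```

The file is then passed with `-c flye_errors.config` on the `nextflow run` command line, either alone or alongside the other config file.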
flye, miniasm, and raven were chosen because they perform well.
We attempted to add canu, but the assembly took far too long.
We used to include dragonflye in this workflow, but this assembler was dropped because versions later than 1.0.14 stopped working on our system.
We would like to include plassembler or hybracter as options, but the documentation for these requires more effort than we can put in at the moment.
Note: both dragonflye and hybracter already perform many of the steps of Donut Falls and End Users may want to use those tools instead.
If the End User prefers other assemblers, please let us know and we'll work in some options.
Warning: If there isn't a reliable container for the suggested tool, we'll ask the End User to create a container for that tool and contribute it to StaPH-B's docker repositories.
Polishing can be tricky: the goal is to remove errors from nanopore sequencing without introducing new ones. As such, only a minimal amount of polishing is done in this workflow. If there is a desire to add additional polishers or more rounds of polishing, please read about linking to another Nextflow workflow, which is covered on a different wiki page.
If the reads have not been basecalled yet, the End User needs to do basecalling and demultiplexing with guppy first.
Something like the following generates the fastq files:
```shell
# With a config file
guppy_basecaller -i <input path> -s <save path> -c <config file> [options]
# With a flowcell and kit name
guppy_basecaller -i <input path> -s <save path> --flowcell <flowcell name> --kit <kit name>
```
Something like the following generates the demultiplexed fastq files:
```shell
guppy_barcoder -i <input fastq path> -s <save path> --barcode_kits <kit name>
```
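guppy_barcoder typically writes several fastq chunks per barcode, so it can be convenient to merge them into one file per sample before assembly. The helper below is a hypothetical sketch (`merge_barcodes` is not part of Donut Falls or guppy), assuming the barcoder save path holds one `barcode*` subdirectory per barcode containing uncompressed `*.fastq` files; the `unclassified` directory is skipped.

```shell
# Requires bash (uses arrays) and gzip.
# Merge guppy_barcoder output into one gzipped FASTQ per barcode directory.
merge_barcodes() {
  local barcode_dir="$1" out_dir="$2"
  mkdir -p "$out_dir"
  for dir in "$barcode_dir"/barcode*/; do
    [ -d "$dir" ] || continue                 # pattern matched nothing
    local name files
    name=$(basename "$dir")                   # e.g. barcode01
    files=("$dir"*.fastq)
    [ -e "${files[0]}" ] || continue          # no fastq chunks for this barcode
    # Concatenate every chunk for this barcode and compress the result.
    cat "${files[@]}" | gzip > "$out_dir/$name.fastq.gz"
  done
}
```

For example, `merge_barcodes <save path> reads` produces `reads/barcode01.fastq.gz` and so on, giving one file per barcode in a single directory.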
Linking a nextflow workflow to Donut Falls is discussed in a separate wiki page if there is a desire to create a nextflow basecalling workflow that uses Donut Falls for assembly.