diff --git a/README.md b/README.md
index 2cfa013..69d8835 100644
--- a/README.md
+++ b/README.md
@@ -9,7 +9,7 @@ See the preprint here: [Snakemake Workflows for Long-read Bacterial Genome Assem
 | read filtering | assembly | long read polishing | short read polishing | reference-based polishing |
 | --- | --- | --- | --- | --- |
-| [Filtlong](https://github.com/rrwick/Filtlong) | [Flye](https://github.com/fenderglass/Flye)
[raven](https://github.com/lbcb-sci/raven)
[miniasm](https://github.com/lh3/miniasm)
[Unicycler](https://github.com/rrwick/Unicycler)
[Canu](https://github.com/marbl/canu) | [racon](https://github.com/lbcb-sci/racon)
[medaka](https://github.com/nanoporetech/medaka) | [pilon](https://github.com/broadinstitute/pilon/wiki)
[Polypolish](https://github.com/rrwick/Polypolish)
[POLCA](https://github.com/alekseyzimin/masurca#polca) | [Homopolish](https://github.com/ythuang0522/homopolish)
[proovframe](https://github.com/thackl/proovframe) |
+| [Filtlong](https://github.com/rrwick/Filtlong)
[Rasusa](https://github.com/mbhall88/rasusa) | [Flye](https://github.com/fenderglass/Flye)
[raven](https://github.com/lbcb-sci/raven)
[miniasm](https://github.com/lh3/miniasm)
[Unicycler](https://github.com/rrwick/Unicycler)
[Canu](https://github.com/marbl/canu) | [racon](https://github.com/lbcb-sci/racon)
[medaka](https://github.com/nanoporetech/medaka) | [pilon](https://github.com/broadinstitute/pilon/wiki)
[Polypolish](https://github.com/rrwick/Polypolish)
[POLCA](https://github.com/alekseyzimin/masurca#polca) | [Homopolish](https://github.com/ythuang0522/homopolish)
[proovframe](https://github.com/thackl/proovframe) |
 ## Quick start
@@ -114,6 +114,17 @@ The output is written to `fastq-ont/mysample+filtlongMB,,,.fastq`.
 When using any of the Filtlong keywords in a folder name, they must be followed by an underscore, followed by the keyword for the assembler.
+### Rasusa
+The ONT reads can be randomly subsampled prior to the assembly.
+
+The available keywords are:
+
+**`rasusaMB`**
+This will subsample the ONT reads to a total of `m` megabases.
+The output is written to `fastq-ont/mysample+rasusaMB.fastq`.
+
+When using the Rasusa keyword in a folder name, it must be followed by an underscore, followed by the keyword for the assembler.
+
 ### Flye
 Following keywords can be used to run the assembly with Flye:
diff --git a/Snakefile b/Snakefile
index 49e60b0..140aa55 100644
--- a/Snakefile
+++ b/Snakefile
@@ -43,11 +43,13 @@ wildcard_constraints:
 def get_ont_fq(wildcards):
     if "filtlong" in wildcards.sample:
         return "fastq-ont/" + wildcards.sample + ".fastq"
+    elif "rasusa" in wildcards.sample:
+        return "fastq-ont/" + wildcards.sample + ".fastq"
     else:
         return glob("fastq-ont/" + wildcards.sample + ".fastq*")
-# use split("+")[0] here for removing the +filtlong... suffices from sample names for Illumina reads
+# use split("+")[0] here for removing the +filtlong... or +rasusa... suffixes from sample names for Illumina reads
 def get_R1_fq(wildcards):
     return glob("fastq-illumina/" + wildcards.sample.split("+")[0] + "_R1.fastq*")
@@ -161,6 +163,21 @@ rule filtlong:
         """
+rule rasusaMB:
+    conda:
+        "env/conda-rasusa.yaml"
+    threads: 1
+    input:
+        fq=get_ont_fq,
+    output:
+        "fastq-ont/{sample}+rasusaMB{num}.fastq",
+    log:
+        "fastq-ont/{sample}_rasusaMB{num}_log.txt",
+    shell:
+        """
+        rasusa --bases {wildcards.num}m -i {input} -o {output} 2>{log}
+        """
+
 rule filtlongMB:
     threads: 1
     input:
diff --git a/env/conda-rasusa.yaml b/env/conda-rasusa.yaml
new file mode 100644
index 0000000..87943c4
--- /dev/null
+++ b/env/conda-rasusa.yaml
@@ -0,0 +1,9 @@
+name: ont-assembly-snake-rasusa
+channels:
+  - conda-forge
+  - bioconda
+  - defaults
+  - anaconda
+dependencies:
+  - bioconda::rasusa=0.7.1
+
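Note on the sample-name handling in this change: the sketch below is an illustration only, not part of the patch. It mirrors the branching of `get_ont_fq` and the `split("+")[0]` trick from the Snakefile, but takes a plain string instead of Snakemake's `wildcards` object. The function names `resolve_ont_fastq` and `illumina_sample`, and the sample name `mysample+rasusaMB500`, are hypothetical and chosen for this example. Unlike the original, the sketch always returns a list so both branches have the same type.

```python
from glob import glob


def resolve_ont_fastq(sample: str) -> list:
    """Return the ONT FASTQ path(s) for a sample name (hypothetical helper)."""
    if "filtlong" in sample or "rasusa" in sample:
        # Filtered or subsampled read sets are written by a rule under an
        # exact name, so no globbing is needed.
        return ["fastq-ont/" + sample + ".fastq"]
    # Raw reads may carry an extra extension (for example .fastq.gz),
    # so match anything that starts with "<sample>.fastq".
    return glob("fastq-ont/" + sample + ".fastq*")


def illumina_sample(sample: str) -> str:
    """Drop a +filtlong... or +rasusa... suffix to get the Illumina sample name."""
    return sample.split("+")[0]


if __name__ == "__main__":
    name = "mysample+rasusaMB500"  # assumed example: subsample to 500 megabases
    print(resolve_ont_fastq(name))
    print(illumina_sample(name))
```

With a name such as `mysample+rasusaMB500`, the ONT path keeps the suffix, so Snakemake can match it against the `rasusaMB` rule output, while the Illumina lookup falls back to the bare `mysample`.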