diff --git a/README.md b/README.md
index 2cfa013..69d8835 100644
--- a/README.md
+++ b/README.md
@@ -9,7 +9,7 @@ See the preprint here: [Snakemake Workflows for Long-read Bacterial Genome Assem
| read filtering | assembly | long read polishing | short read polishing | reference-based polishing |
| --- | --- | --- | --- | --- |
-| [Filtlong](https://github.com/rrwick/Filtlong) | [Flye](https://github.com/fenderglass/Flye)<br>[raven](https://github.com/lbcb-sci/raven)<br>[miniasm](https://github.com/lh3/miniasm)<br>[Unicycler](https://github.com/rrwick/Unicycler)<br>[Canu](https://github.com/marbl/canu) | [racon](https://github.com/lbcb-sci/racon)<br>[medaka](https://github.com/nanoporetech/medaka) | [pilon](https://github.com/broadinstitute/pilon/wiki)<br>[Polypolish](https://github.com/rrwick/Polypolish)<br>[POLCA](https://github.com/alekseyzimin/masurca#polca) | [Homopolish](https://github.com/ythuang0522/homopolish)<br>[proovframe](https://github.com/thackl/proovframe) |
+| [Filtlong](https://github.com/rrwick/Filtlong)<br>[Rasusa](https://github.com/mbhall88/rasusa) | [Flye](https://github.com/fenderglass/Flye)<br>[raven](https://github.com/lbcb-sci/raven)<br>[miniasm](https://github.com/lh3/miniasm)<br>[Unicycler](https://github.com/rrwick/Unicycler)<br>[Canu](https://github.com/marbl/canu) | [racon](https://github.com/lbcb-sci/racon)<br>[medaka](https://github.com/nanoporetech/medaka) | [pilon](https://github.com/broadinstitute/pilon/wiki)<br>[Polypolish](https://github.com/rrwick/Polypolish)<br>[POLCA](https://github.com/alekseyzimin/masurca#polca) | [Homopolish](https://github.com/ythuang0522/homopolish)<br>[proovframe](https://github.com/thackl/proovframe) |
## Quick start
@@ -114,6 +114,17 @@ The output is written to `fastq-ont/mysample+filtlongMB,,,.fastq`.
When using any of the Filtlong keywords in a folder name, they must be followed by an underscore, followed by the keyword for the assembler.
+### Rasusa
+The ONT reads can be randomly subsampled prior to the assembly.
+
+The available keywords are:
+
+**`rasusaMB`**
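+This will subsample the ONT reads to a total of `m` megabases, where `m` is the number appended directly to the keyword (for example, `rasusaMB500` for 500 megabases).
+The output is written to `fastq-ont/mysample+rasusaMB500.fastq` in that example.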
+
+When using the Rasusa keyword in a folder name, it must be followed by an underscore, followed by the keyword for the assembler.
+
### Flye
Following keywords can be used to run the assembly with Flye:
diff --git a/Snakefile b/Snakefile
index 49e60b0..140aa55 100644
--- a/Snakefile
+++ b/Snakefile
@@ -43,11 +43,13 @@ wildcard_constraints:
def get_ont_fq(wildcards):
if "filtlong" in wildcards.sample:
return "fastq-ont/" + wildcards.sample + ".fastq"
+ elif "rasusa" in wildcards.sample:
+ return "fastq-ont/" + wildcards.sample + ".fastq"
else:
return glob("fastq-ont/" + wildcards.sample + ".fastq*")
-# use split("+")[0] here for removing the +filtlong... suffices from sample names for Illumina reads
+# use split("+")[0] here to strip the +filtlong... or +rasusa... suffixes from sample names for Illumina reads
def get_R1_fq(wildcards):
return glob("fastq-illumina/" + wildcards.sample.split("+")[0] + "_R1.fastq*")
@@ -161,6 +163,21 @@ rule filtlong:
"""
+rule rasusaMB:
+ conda:
+ "env/conda-rasusa.yaml"
+ threads: 1
+ input:
+ fq=get_ont_fq,
+ output:
+ "fastq-ont/{sample}+rasusaMB{num}.fastq",
+ log:
+ "fastq-ont/{sample}_rasusaMB{num}_log.txt",
+ shell:
+ """
+ rasusa --bases {wildcards.num}m -i {input} -o {output} 2>{log}
+ """
+
rule filtlongMB:
threads: 1
input:
diff --git a/env/conda-rasusa.yaml b/env/conda-rasusa.yaml
new file mode 100644
index 0000000..87943c4
--- /dev/null
+++ b/env/conda-rasusa.yaml
@@ -0,0 +1,9 @@
+name: ont-assembly-snake-rasusa
+channels:
+ - conda-forge
+ - bioconda
+ - defaults
+ - anaconda
+dependencies:
+ - bioconda::rasusa=0.7.1
+
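
Review note: the sample-name handling this patch relies on can be checked outside Snakemake with a small standalone sketch. It is not part of the patch. The helper names `resolve_ont_fastq` and `illumina_sample` and the sample name `mysample+rasusaMB500` are hypothetical; the logic mirrors `get_ont_fq` and the `split("+")[0]` convention shown in the Snakefile hunk, with the two identical `filtlong` and `rasusa` branches merged into one condition.

```python
from glob import glob


def resolve_ont_fastq(sample):
    """Standalone mirror of get_ont_fq, taking the sample name directly."""
    if "filtlong" in sample or "rasusa" in sample:
        # Derived read sets are produced by a rule, so return the exact
        # path and let Snakemake schedule the rule that creates it.
        return "fastq-ont/" + sample + ".fastq"
    # Raw reads are looked up on disk and may be compressed (.fastq.gz).
    return glob("fastq-ont/" + sample + ".fastq*")


def illumina_sample(sample):
    """Strip any +filtlong... or +rasusa... suffix to find the Illumina reads."""
    return sample.split("+")[0]


if __name__ == "__main__":
    name = "mysample+rasusaMB500"  # hypothetical sample name
    print(resolve_ont_fastq(name))
    print(illumina_sample(name))
```

Because the subsampled file is requested by exact path, the `rasusaMB` rule matches it with `sample=mysample` and `num=500`, and then calls the input function again on the raw sample name.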