Release v0.1.2 (#13)
leahkemp authored Nov 7, 2024
1 parent cec784a commit 84e09ba
Showing 7 changed files with 392 additions and 301 deletions.
116 changes: 69 additions & 47 deletions README.md
@@ -4,31 +4,33 @@

Pipefaceee.

Nextflow pipeline to align, variant call (SNPs, indels, SVs) and phase long read [ONT](https://nanoporetech.com/) and/or [pacbio](https://www.pacb.com/) HiFi data.
Nextflow pipeline to align, variant call (SNPs, indels, SVs), phase and optionally annotate (SNPs, indels) long read [ONT](https://nanoporetech.com/) and/or [pacbio](https://www.pacb.com/) HiFi data.

Tools and workflows that undertake comparable analyses already exist, but pipeface serves as a central workflow to process long read data (both ONT and pacbio HiFi data). Future plans for pipeface include STR, CNV and tandem repeat calling, as well as the analysis of cohorts.

<p align="center">
<img src="./images/pipeface.png">

## Workflow

### Overview

```mermaid
%%{init: {'theme':'dark'}}%%
flowchart LR
input_data("Input data: \n\n ONT fastq.gz \n and/or \n ONT fastq \n and/or \n ONT uBAM \n and/or \n pacbio HiFi uBAM")
input_data("Input data: <br><br> ONT fastq.gz <br> and/or <br> ONT fastq <br> and/or <br> ONT uBAM <br> and/or <br> pacbio HiFi uBAM")
merging{{"Merge runs (if needed)"}}
alignment{{"bam to fastq conversion (if needed), alignment, sorting"}}
depth{{"Calculate alignment depth"}}
snp_indel_calling{{"SNP/indel variant calling"}}
snp_indel_phasing{{"SNP/indel phasing"}}
snp_indel_annotation{{"SNP/indel annotation (optional - hg38 only)"}}
haplotagging{{"Haplotagging bams"}}
sv_calling{{"Structural variant calling"}}
input_data-.->merging-.->alignment-.->snp_indel_calling-.->snp_indel_phasing-.->haplotagging-.->sv_calling
alignment-.->depth
alignment-.->haplotagging
snp_indel_phasing-.->snp_indel_annotation
```
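
The edges of the overview flowchart can also be read as a dependency graph. The following is a minimal sketch, not part of pipeface itself, that encodes those edges in Python and derives one valid execution order with the standard library's `graphlib`. The step names are copied from the mermaid node identifiers; the `DEPENDS_ON` mapping and the `run_order` helper are illustrative names introduced here.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it depends on, taken from the
# arrows in the overview flowchart (the mapping itself is illustrative).
DEPENDS_ON = {
    "merging": {"input_data"},
    "alignment": {"merging"},
    "depth": {"alignment"},
    "snp_indel_calling": {"alignment"},
    "snp_indel_phasing": {"snp_indel_calling"},
    "snp_indel_annotation": {"snp_indel_phasing"},
    "haplotagging": {"alignment", "snp_indel_phasing"},
    "sv_calling": {"haplotagging"},
}


def run_order(depends_on=DEPENDS_ON):
    """Return one valid order in which the steps could run."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    for step in run_order():
        print(step)
```

The sketch makes two points from the diagram explicit: annotation branches off phasing and nothing downstream waits on it, while haplotagging needs both the aligned bam and the phased variants before SV calling can start.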

@@ -38,45 +40,50 @@ alignment-.->haplotagging
%%{init: {'theme':'dark'}}%%
flowchart LR
ont_data_f1("Sample 1 \n\n Input data: \n\n ONT fastq.gz")
ont_data_f2("Sample 1 \n\n Input data: \n\n ONT fastq.gz")
pacbio_data_f3("Sample 2 \n\n Input data: \n\n Pacbio HiFi uBAM")
pacbio_data_f4("Sample 2 \n\n Input data: \n\n Pacbio HiFi uBAM")
ont_data_f5("Sample 3 \n\n Input data: \n\n ONT fastq")
ont_data_f6("Sample 4 \n\n Input data: \n\n ONT uBAM")
merging_m1{{"Description: merge runs \n\n Main tools: GNU coreutils \n\n Commands: cat"}}
merging_m2{{"Description: merge runs \n\n Main tools: Samtools \n\n Commands: samtools merge"}}
alignment_s1{{"Description: alignment, sorting \n\n Main tools: Minimap2 and Samtools \n\n Commands: minimap2 and samtools sort"}}
alignment_s2{{"Description: alignment, sorting \n\n Main tools: Minimap2 and Samtools \n\n Commands: minimap2 and samtools sort"}}
alignment_s3{{"Description: bam to fastq conversion, alignment, sorting \n\n Main tools: Minimap2 and Samtools \n\n Commands: minimap2 and samtools sort"}}
alignment_s4{{"Description: bam to fastq conversion, alignment, sorting \n\n Main tools: Minimap2 and Samtools \n\n Commands: minimap2 and samtools sort"}}
depth_s1{{"Description: calculate alignment depth \n\n Main tools: Samtools \n\n Commands: samtools depth"}}
depth_s2{{"Description: calculate alignment depth \n\n Main tools: Samtools \n\n Commands: samtools depth"}}
depth_s3{{"Description: calculate alignment depth \n\n Main tools: Samtools \n\n Commands: samtools depth"}}
depth_s4{{"Description: calculate alignment depth \n\n Main tools: Samtools \n\n Commands: samtools depth"}}
snp_indel_calling_s1{{"Description: SNP/indel variant calling \n\n Main tools: Clair3 or DeepVariant (NVIDIA Parabricks) \n\n Commands: run_clair3.sh or pbrun deepvariant"}}
snp_indel_calling_s2{{"Description: SNP/indel variant calling \n\n Main tools: Clair3 or DeepVariant (NVIDIA Parabricks) \n\n Commands: run_clair3.sh or pbrun deepvariant"}}
snp_indel_calling_s3{{"Description: SNP/indel variant calling \n\n Main tools: Clair3 or DeepVariant (NVIDIA Parabricks) \n\n Commands: run_clair3.sh or pbrun deepvariant"}}
snp_indel_calling_s4{{"Description: SNP/indel variant calling \n\n Main tools: Clair3 or DeepVariant (NVIDIA Parabricks) \n\n Commands: run_clair3.sh or pbrun deepvariant"}}
snp_indel_phasing_s1{{"Description: SNP/indel phasing \n\n Main tools: WhatsHap \n\n Commands: whatshap phase"}}
snp_indel_phasing_s2{{"Description: SNP/indel phasing \n\n Main tools: WhatsHap \n\n Commands: whatshap phase"}}
snp_indel_phasing_s3{{"Description: SNP/indel phasing \n\n Main tools: WhatsHap \n\n Commands: whatshap phase"}}
snp_indel_phasing_s4{{"Description: SNP/indel phasing \n\n Main tools: WhatsHap \n\n Commands: whatshap phase"}}
haplotagging_s1{{"Description: haplotagging bams \n\n Main tools: WhatsHap \n\n Commands: whatshap haplotag"}}
haplotagging_s2{{"Description: haplotagging bams \n\n Main tools: WhatsHap \n\n Commands: whatshap haplotag"}}
haplotagging_s3{{"Description: haplotagging bams \n\n Main tools: WhatsHap \n\n Commands: whatshap haplotag"}}
haplotagging_s4{{"Description: haplotagging bams \n\n Main tools: WhatsHap \n\n Commands: whatshap haplotag"}}
sv_calling_s1{{"Description: structural variant calling \n\n Main tools: Sniffles2 and/or cuteSV \n\n Commands: sniffles and/or cuteSV"}}
sv_calling_s2{{"Description: structural variant calling \n\n Main tools: Sniffles2 and/or cuteSV \n\n Commands: sniffles and/or cuteSV"}}
sv_calling_s3{{"Description: structural variant calling \n\n Main tools: Sniffles2 and/or cuteSV \n\n Commands: sniffles and/or cuteSV"}}
sv_calling_s4{{"Description: structural variant calling \n\n Main tools: Sniffles2 and/or cuteSV \n\n Commands: sniffles and/or cuteSV"}}
ont_data_f1("Sample 1 <br><br> Input data: <br><br> ONT fastq.gz")
ont_data_f2("Sample 1 <br><br> Input data: <br><br> ONT fastq.gz")
pacbio_data_f3("Sample 2 <br><br> Input data: <br><br> Pacbio HiFi uBAM")
pacbio_data_f4("Sample 2 <br><br> Input data: <br><br> Pacbio HiFi uBAM")
ont_data_f5("Sample 3 <br><br> Input data: <br><br> ONT fastq")
ont_data_f6("Sample 4 <br><br> Input data: <br><br> ONT uBAM")
merging_m1{{"Description: merge runs <br><br> Main tools: GNU coreutils <br><br> Commands: cat"}}
merging_m2{{"Description: merge runs <br><br> Main tools: Samtools <br><br> Commands: samtools merge"}}
alignment_s1{{"Description: alignment, sorting <br><br> Main tools: Minimap2 and Samtools <br><br> Commands: minimap2 and samtools sort"}}
alignment_s2{{"Description: alignment, sorting <br><br> Main tools: Minimap2 and Samtools <br><br> Commands: minimap2 and samtools sort"}}
alignment_s3{{"Description: bam to fastq conversion, alignment, sorting <br><br> Main tools: Minimap2 and Samtools <br><br> Commands: minimap2 and samtools sort"}}
alignment_s4{{"Description: bam to fastq conversion, alignment, sorting <br><br> Main tools: Minimap2 and Samtools <br><br> Commands: minimap2 and samtools sort"}}
depth_s1{{"Description: calculate alignment depth <br><br> Main tools: Samtools <br><br> Commands: samtools depth"}}
depth_s2{{"Description: calculate alignment depth <br><br> Main tools: Samtools <br><br> Commands: samtools depth"}}
depth_s3{{"Description: calculate alignment depth <br><br> Main tools: Samtools <br><br> Commands: samtools depth"}}
depth_s4{{"Description: calculate alignment depth <br><br> Main tools: Samtools <br><br> Commands: samtools depth"}}
snp_indel_calling_s1{{"Description: SNP/indel variant calling <br><br> Main tools: Clair3 or DeepVariant <br><br> Commands: run_clair3.sh or run_deepvariant"}}
snp_indel_calling_s2{{"Description: SNP/indel variant calling <br><br> Main tools: Clair3 or DeepVariant <br><br> Commands: run_clair3.sh or run_deepvariant"}}
snp_indel_calling_s3{{"Description: SNP/indel variant calling <br><br> Main tools: Clair3 or DeepVariant <br><br> Commands: run_clair3.sh or run_deepvariant"}}
snp_indel_calling_s4{{"Description: SNP/indel variant calling <br><br> Main tools: Clair3 or DeepVariant <br><br> Commands: run_clair3.sh or run_deepvariant"}}
snp_indel_phasing_s1{{"Description: SNP/indel phasing <br><br> Main tools: WhatsHap <br><br> Commands: whatshap phase"}}
snp_indel_phasing_s2{{"Description: SNP/indel phasing <br><br> Main tools: WhatsHap <br><br> Commands: whatshap phase"}}
snp_indel_phasing_s3{{"Description: SNP/indel phasing <br><br> Main tools: WhatsHap <br><br> Commands: whatshap phase"}}
snp_indel_phasing_s4{{"Description: SNP/indel phasing <br><br> Main tools: WhatsHap <br><br> Commands: whatshap phase"}}
snp_indel_annotation_s1{{"Description: SNP/indel annotation (optional - hg38 only) <br><br> Main tools: ensembl-vep <br><br> Commands: vep"}}
snp_indel_annotation_s2{{"Description: SNP/indel annotation (optional - hg38 only) <br><br> Main tools: ensembl-vep <br><br> Commands: vep"}}
snp_indel_annotation_s3{{"Description: SNP/indel annotation (optional - hg38 only) <br><br> Main tools: ensembl-vep <br><br> Commands: vep"}}
snp_indel_annotation_s4{{"Description: SNP/indel annotation (optional - hg38 only) <br><br> Main tools: ensembl-vep <br><br> Commands: vep"}}
haplotagging_s1{{"Description: haplotagging bams <br><br> Main tools: WhatsHap <br><br> Commands: whatshap haplotag"}}
haplotagging_s2{{"Description: haplotagging bams <br><br> Main tools: WhatsHap <br><br> Commands: whatshap haplotag"}}
haplotagging_s3{{"Description: haplotagging bams <br><br> Main tools: WhatsHap <br><br> Commands: whatshap haplotag"}}
haplotagging_s4{{"Description: haplotagging bams <br><br> Main tools: WhatsHap <br><br> Commands: whatshap haplotag"}}
sv_calling_s1{{"Description: structural variant calling <br><br> Main tools: Sniffles2 and/or cuteSV <br><br> Commands: sniffles and/or cuteSV"}}
sv_calling_s2{{"Description: structural variant calling <br><br> Main tools: Sniffles2 and/or cuteSV <br><br> Commands: sniffles and/or cuteSV"}}
sv_calling_s3{{"Description: structural variant calling <br><br> Main tools: Sniffles2 and/or cuteSV <br><br> Commands: sniffles and/or cuteSV"}}
sv_calling_s4{{"Description: structural variant calling <br><br> Main tools: Sniffles2 and/or cuteSV <br><br> Commands: sniffles and/or cuteSV"}}
ont_data_f1-.->merging_m1-.->alignment_s1-.->snp_indel_calling_s1-.->snp_indel_phasing_s1-.->haplotagging_s1-.->sv_calling_s1
ont_data_f2-.->merging_m1
@@ -96,6 +103,11 @@ alignment_s2-.->haplotagging_s2
alignment_s3-.->haplotagging_s3
alignment_s4-.->haplotagging_s4
snp_indel_phasing_s1-.->snp_indel_annotation_s1
snp_indel_phasing_s2-.->snp_indel_annotation_s2
snp_indel_phasing_s3-.->snp_indel_annotation_s3
snp_indel_phasing_s4-.->snp_indel_annotation_s4
```

## Main analyses
@@ -107,10 +119,11 @@ alignment_s4-.->haplotagging_s4
## Main tools

- [Minimap2](https://github.com/lh3/minimap2)
- [Clair3](https://github.com/HKU-BAL/Clair3) OR [DeepVariant](https://github.com/google/deepvariant) (wrapped in [NVIDIA Parabricks](https://docs.nvidia.com/clara/parabricks/latest/))
- [Clair3](https://github.com/HKU-BAL/Clair3) or [DeepVariant](https://github.com/google/deepvariant)
- [WhatsHap](https://github.com/whatshap/whatshap)
- [Sniffles2](https://github.com/fritzsedlazeck/Sniffles) AND/OR [cuteSV](https://github.com/tjiangHIT/cuteSV)
- [Sniffles2](https://github.com/fritzsedlazeck/Sniffles) and/or [cuteSV](https://github.com/tjiangHIT/cuteSV)
- [Samtools](https://github.com/samtools/samtools)
- [ensembl-vep](https://github.com/Ensembl/ensembl-vep)

## Main input files

@@ -119,6 +132,7 @@ alignment_s4-.->haplotagging_s4
- ONT/pacbio HiFi FASTQ (gzipped or uncompressed) or unaligned BAM
- Indexed reference genome
- Clair3 models (if running Clair3)
- [DeepVariant GPU 1.6.1 docker container](https://hub.docker.com/layers/google/deepvariant/1.6.1-gpu/images/sha256-7929c55106d3739daa18d52802913c43af4ca2879db29656056f59005d1d46cb?context=explore) pulled via singularity (if running DeepVariant)

### Optional

@@ -129,18 +143,26 @@ alignment_s4-.->haplotagging_s4

- Aligned, sorted and haplotagged bam
- Clair3 or DeepVariant phased SNP/indel VCF file
- Clair3 or DeepVariant SNP/indel gVCF file
- Clair3 or DeepVariant phased and annotated SNP/indel VCF file (optional - hg38 only)
- Phased Sniffles2 and/or un-phased cuteSV SV VCF file

## Assumptions

- Running pipeline on Australia's [National Computational Infrastructure (NCI)](https://nci.org.au/)
- Access to if89 project on [National Computational Infrastructure (NCI)](https://nci.org.au/)
- Access to xy86 project on [National Computational Infrastructure (NCI)](https://nci.org.au/) (if running variant annotation)
- Access to pipeline dependencies:
- [Nextflow and its Java dependency](https://nf-co.re/docs/usage/installation). Validated to run on:
- Nextflow 24.04.1
- Java version 17.0.2
- Java 17.0.2

*[See the list of software and their versions used by this version of pipeface](./docs/software_versions.txt) as well as the [list of variant databases and their versions](./docs/database_versions.txt) if variant annotation is carried out (assuming the default [nextflow_pipeface.config](./config/nextflow_pipeface.config) file is used).*

## Run it!

See a walkthrough for how to [run pipeface on NCI](./docs/run_on_nci.md).

## Credit

This is a highly collaborative project, with many contributions from the [Genomic Technologies Lab](https://www.garvan.org.au/research/labs-groups/genomic-technologies-lab). Notably, Dr Andre Reis and Dr Ira Deveson are closely involved in the development of this pipeline. The installation and hosting of software used in this pipeline has been, and continues to be, supported by the [Australian BioCommons Tools and Workflows project (if89)](https://australianbiocommons.github.io/ables/if89/).

39 changes: 29 additions & 10 deletions config/nextflow_pipeface.config
@@ -1,11 +1,21 @@

params.vep_db = '/g/data/if89/datalib/vep/112/grch38/'
params.revel_db = '/g/data/xy86/revel/1.3/grch38/new_tabbed_revel_grch38.tsv.gz'
params.gnomad_db = '/g/data/xy86/gnomad/genomes/v4.1.0/gnomad.joint.v4.1.sites.chrall.vcf.gz'
params.clinvar_db = '/g/data/xy86/clinvar/2024-08-25/grch38/clinvar_20240825.vcf.gz'
params.cadd_snv_db = '/g/data/xy86/cadd/1.7/grch38/whole_genome_SNVs.tsv.gz'
params.cadd_indel_db = '/g/data/xy86/cadd/1.7/grch38/gnomad.genomes.r4.0.indel.tsv.gz'
params.spliceai_snv_db = '/g/data/xy86/spliceai/v1.3/grch38/spliceai_scores.raw.snv.hg38.vcf.gz'
params.spliceai_indel_db = '/g/data/xy86/spliceai/v1.3/grch38/spliceai_scores.raw.indel.hg38.vcf.gz'
params.alphamissense_db = '/g/data/xy86/alphamissense/grch38/AlphaMissense_hg38.tsv.gz'

process {

executor = 'pbspro'
project = 'kr68'
storage = 'gdata/if89+scratch/kr68+gdata/kr68+gdata/ox63'
storage = 'gdata/if89+gdata/xy86+scratch/kr68+gdata/kr68+gdata/ox63'
// provide proper access to if89 environmental modules
beforeScript = 'module use -a /g/data/if89/apps/modulefiles'
beforeScript = 'module use -a /g/data/if89/apps/modulefiles && module use -a /g/data/if89/shpcroot/modules'

withName: scrape_settings {
queue = 'normal'
@@ -33,7 +43,7 @@ process {
withName: minimap2 {
queue = 'normal'
cpus = '16'
time = '10h'
time = '14h'
memory = '64GB'
module = 'minimap2/2.28:samtools/1.19'
}
@@ -49,21 +59,30 @@ process {
withName: clair3 {
queue = 'normal'
cpus = '32'
time = '6h'
time = '9h'
memory = '128GB'
module = 'clair3/v1.0.9:htslib/1.16'
module = 'clair3/v1.0.9'
}

withName: deepvariant {
queue = 'gpuvolta'
cpus = '24'
gpus = '2'
time = '6h'
memory = '180GB'
module = 'parabricks/4.2.1:htslib/1.16'
time = '8h'
memory = '192GB'
disk = '80GB'
module = 'singularity'
}

withName: vep_snv {
queue = 'normal'
cpus = '32'
time = '10h'
memory = '128GB'
module = 'singularity:htslib/1.16:ensemblorg/ensembl-vep/release_112.0'
}

withName: 'whatshap_phase_clair3|whatshap_phase_dv|whatshap_haplotag' {
withName: 'whatshap_phase|whatshap_haplotag' {
queue = 'normal'
cpus = '4'
time = '10h'
@@ -87,7 +106,7 @@ process {
module = 'cuteSV/1.0.13:htslib/1.16'
}

withName: 'publish_settings|publish_bam_header|publish_minimap2|publish_clair3|publish_deepvariant|publish_whatshap_phase_clair3|publish_whatshap_phase_dv|publish_whatshap_haplotag|publish_sniffles|publish_cutesv' {
withName: 'publish_settings|publish_bam_header|publish_depth|publish_whatshap_phase|publish_whatshap_haplotag|publish_sniffles|publish_cutesv' {
queue = 'normal'
cpus = '1'
time = '20m'
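
The `params.*_db` entries added at the top of this config point at annotation databases under the if89 and xy86 projects, so a run with annotation enabled presumably fails if any of them is unreadable. The following is a minimal pre-flight sketch, not part of pipeface, that pulls those paths out of the config text with a regular expression and reports the ones that do not exist on the current filesystem. The helper names `database_paths` and `missing_paths` are hypothetical, and the regex assumes the single-quoted `params.<name>_db = '...'` form seen in this diff.

```python
import os
import re

# Matches lines such as: params.vep_db = '/g/data/if89/datalib/vep/112/grch38/'
# (assumes single-quoted values, as in the config shown in this commit).
PARAM_RE = re.compile(r"^\s*params\.(\w+_db)\s*=\s*'([^']*)'", re.MULTILINE)


def database_paths(config_text):
    """Map each params.*_db name to the path assigned to it."""
    return dict(PARAM_RE.findall(config_text))


def missing_paths(paths):
    """Return the sorted names whose paths are absent on this machine."""
    return sorted(name for name, path in paths.items() if not os.path.exists(path))


if __name__ == "__main__":
    with open("config/nextflow_pipeface.config") as handle:
        found = database_paths(handle.read())
    for name in missing_paths(found):
        print(f"not found: {name} -> {found[name]}")
```

Running a check like this on a login node before submitting is one way to notice a missing xy86 membership early; it only tests that a path exists, not that the database version matches `docs/database_versions.txt`.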
5 changes: 3 additions & 2 deletions config/parameters_pipeface.json
@@ -7,7 +7,8 @@
"tandem_repeat": "",
"snp_indel_caller": "",
"sv_caller": "",
"outdir": ""
"annotate": "",
"outdir": "",
"deepvariant_container": ""

}
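
This release adds the `annotate` and `deepvariant_container` keys to the parameters file, so a file written for an earlier release will lack them. The following is a small sketch, not part of pipeface, that reports which of the keys visible in this hunk are absent from a parameters dictionary. Only the keys shown in the hunk are listed; the file contains further keys above the hunk that are not visible here, and whether the pipeline itself rejects a file with missing keys is not shown in this commit. `VISIBLE_KEYS` and `missing_keys` are names introduced for illustration.

```python
import json

# Keys visible in this diff hunk only; the real file has more keys
# above the hunk that are not shown in the commit view.
VISIBLE_KEYS = [
    "tandem_repeat",
    "snp_indel_caller",
    "sv_caller",
    "annotate",
    "outdir",
    "deepvariant_container",
]


def missing_keys(params, required=VISIBLE_KEYS):
    """Return the required keys that are absent from params, in listed order."""
    return [key for key in required if key not in params]


if __name__ == "__main__":
    with open("config/parameters_pipeface.json") as handle:
        params = json.load(handle)
    print(missing_keys(params))
```

The check looks only at key presence; the values each key accepts are documented elsewhere in the repository and are left out of this sketch.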

9 changes: 9 additions & 0 deletions docs/database_versions.txt
@@ -0,0 +1,9 @@
vep-cache: homo_sapiens/112/GRCh38
REVEL database: 1.3
gnomAD database: 4.1.0
ClinVar database: 2024-08-25
CADD SNV database: 1.7/GRCh38
CADD indel database: 1.7/GRCh38
SpliceAI SNV database: 1.3
SpliceAI indel database: 1.3
AlphaMissense database: GRCh38