From 6e7ddde37da33a613f267dc7183fb5e234cb9376 Mon Sep 17 00:00:00 2001
From: kayleemathews
Date: Tue, 13 Feb 2024 14:12:10 -0500
Subject: [PATCH 1/5] add azure JG link

---
 website/docs/Pipelines/JointGenotyping/README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/website/docs/Pipelines/JointGenotyping/README.md b/website/docs/Pipelines/JointGenotyping/README.md
index 98eaeaa858..929f600674 100644
--- a/website/docs/Pipelines/JointGenotyping/README.md
+++ b/website/docs/Pipelines/JointGenotyping/README.md
@@ -25,6 +25,8 @@ The pipeline can be configured to run using one of the following GATK variant fi
 
 The pipeline takes in a sample map file listing GVCF files produced by HaplotypeCaller in GVCF mode and produces a filtered VCF file (with index) containing genotypes for all samples present in the input VCF files. All sites that are present in the input VCF file are retained. Filtered sites are annotated as such in the FILTER field. If you are new to VCF files, see the [file type specification](https://samtools.github.io/hts-specs/VCFv4.2.pdf).
 
+The JointGenotyping pipeline can be adapted to run on Microsoft Azure instead of Google Cloud. For more information, see the [azure-warp-joint-calling GitHub repository](https://github.com/broadinstitute/azure-warp-joint-calling).
+
 ## Set-up
 
 ### JointGenotyping Installation and Requirements

From c724b161fbd0fc69a5e05367fb8094d2af9b1ca0 Mon Sep 17 00:00:00 2001
From: kayleemathews
Date: Tue, 13 Feb 2024 14:36:14 -0500
Subject: [PATCH 2/5] update exome docs

---
 .../Exome_Germline_Single_Sample_Pipeline/README.md | 4 ++--
 .../Exome_Germline_Single_Sample_Pipeline/exome.methods.md | 5 +++--
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/website/docs/Pipelines/Exome_Germline_Single_Sample_Pipeline/README.md b/website/docs/Pipelines/Exome_Germline_Single_Sample_Pipeline/README.md
index 2165bd249d..3f8125c861 100644
--- a/website/docs/Pipelines/Exome_Germline_Single_Sample_Pipeline/README.md
+++ b/website/docs/Pipelines/Exome_Germline_Single_Sample_Pipeline/README.md
@@ -7,7 +7,7 @@ slug: /Pipelines/Exome_Germline_Single_Sample_Pipeline/README
 
 | Pipeline Version | Date Updated | Documentation Author | Questions or Feedback |
 | :----: | :---: | :----: | :--------------: |
-| [ExomeGermlineSingleSample_v3.1.16](https://github.com/broadinstitute/warp/releases?q=ExomeGermlineSingleSample_v3.0.0&expanded=true) | December, 2023 | [Elizabeth Kiernan](mailto:ekiernan@broadinstitute.org) | Please file GitHub issues in WARP or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) |
+| [ExomeGermlineSingleSample_v3.1.17](https://github.com/broadinstitute/warp/releases?q=ExomeGermlineSingleSample_v3.0.0&expanded=true) | February, 2024 | [Elizabeth Kiernan](mailto:ekiernan@broadinstitute.org) | Please file GitHub issues in WARP or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) |
 
 The Exome Germline Single Sample pipeline implements data pre-processing and initial variant calling according to the GATK Best Practices for germline SNP and Indel discovery in human exome sequencing data.
 
@@ -27,7 +27,7 @@ The Exome Germline Single Sample workflow is written in the Workflow Description
 
 ### Software Version Requirements
 
-* [GATK 4.3.0.0](https://github.com/broadinstitute/gatk/releases/tag/4.3.0.0)
+* [GATK 4.5.0.0](https://github.com/broadinstitute/gatk/releases/tag/4.5.0.0)
 * Picard 2.26.10
 * Samtools 1.11
 * Python 3.0
diff --git a/website/docs/Pipelines/Exome_Germline_Single_Sample_Pipeline/exome.methods.md b/website/docs/Pipelines/Exome_Germline_Single_Sample_Pipeline/exome.methods.md
index a66740f22a..a09c96719b 100644
--- a/website/docs/Pipelines/Exome_Germline_Single_Sample_Pipeline/exome.methods.md
+++ b/website/docs/Pipelines/Exome_Germline_Single_Sample_Pipeline/exome.methods.md
@@ -2,13 +2,13 @@
 sidebar_position: 2
 ---
 
-# Exome Germline Single Sample v3.0.0 Methods
+# Exome Germline Single Sample v3.1.17 Methods
 
 The following contains a detailed methods description outlining the pipeline’s process, software, and tools that can be modified for a publication methods section.
 
 ## Detailed Methods
 
-Preprocessing and variant calling was performed using the ExomeGermlineSingleSample 3.0.0 pipeline using Picard 2.23.8, GATK 4.2.2.0, and Samtools 1.11 with default tool parameters unless otherwise specified. All reference files are available in the public [Broad References Google Bucket](https://console.cloud.google.com/storage/browser/gcp-public-data--broad-references/hg38/v0). The pipeline follows GATK Best Practices as previously described ([Van der Auwera & O'Connor, 2020](https://www.oreilly.com/library/view/genomics-in-the/9781491975183/)) as well as the Functional Equivalence specification ([Regier et al., 2018](https://www.nature.com/articles/s41467-018-06159-4)).
+Preprocessing and variant calling was performed using the ExomeGermlineSingleSample 3.1.17 pipeline using Picard 2.26.10, GATK 4.5.0.0, and Samtools 1.11 with default tool parameters unless otherwise specified. All reference files are available in the public [Broad References Google Bucket](https://console.cloud.google.com/storage/browser/gcp-public-data--broad-references/hg38/v0). The pipeline follows GATK Best Practices as previously described ([Van der Auwera & O'Connor, 2020](https://www.oreilly.com/library/view/genomics-in-the/9781491975183/)) as well as the Functional Equivalence specification ([Regier et al., 2018](https://www.nature.com/articles/s41467-018-06159-4)).
 
 ### Pre-processing and QC
 
@@ -31,4 +31,5 @@ Prior to variant calling, the variant calling interval list was split to enable
 The pipeline’s final outputs included metrics, the ValidateSamFile validation reports, an aligned CRAM with index, and a reblocked GVCF containing variant calls with an accompanying index.
 
 ## Previous methods documents
+- [ExomeGermlineSingleSample_v3.0.0](https://github.com/broadinstitute/warp/blob/ExomeGermlineSingleSample_v3.0.0/website/docs/Pipelines/Exome_Germline_Single_Sample_Pipeline/exome.methods.md)
 - [ExomeGermlineSingleSample_v2.4.4](https://github.com/broadinstitute/warp/blob/ExomeGermlineSingleSample_v2.6.0/website/docs/Pipelines/Exome_Germline_Single_Sample_Pipeline/exome.methods.md)
\ No newline at end of file

From 9883b26c8b149e240e253391401ef18bd36442ff Mon Sep 17 00:00:00 2001
From: kayleemathews
Date: Tue, 13 Feb 2024 15:00:37 -0500
Subject: [PATCH 3/5] update illumina docs

---
 .../Illumina_genotyping_array_spec.md | 6 +++---
 .../Pipelines/Illumina_Genotyping_Arrays_Pipeline/README.md | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/website/docs/Pipelines/Illumina_Genotyping_Arrays_Pipeline/Illumina_genotyping_array_spec.md b/website/docs/Pipelines/Illumina_Genotyping_Arrays_Pipeline/Illumina_genotyping_array_spec.md
index 46da522506..1462c4fdb7 100644
--- a/website/docs/Pipelines/Illumina_Genotyping_Arrays_Pipeline/Illumina_genotyping_array_spec.md
+++ b/website/docs/Pipelines/Illumina_Genotyping_Arrays_Pipeline/Illumina_genotyping_array_spec.md
@@ -4,7 +4,7 @@ sidebar_position: 2
 
 # VCF Overview: Illumina Genotyping Array
 
-The [Illumina Genotyping Array Pipeline](https://github.com/broadinstitute/warp/blob/develop/pipelines/broad/genotyping/illumina/IlluminaGenotypingArray.wdl) v1.11.0 pipeline produces a VCF (Variant Call Format) output with data processing and sample-specific genotype information. The VCF follows the format listed in the [VCF 4.2 specification](https://samtools.github.io/hts-specs/VCFv4.2.pdf), but additionally contains fields and attributes that are unique to the Arrays pipeline.
+The [Illumina Genotyping Array Pipeline](https://github.com/broadinstitute/warp/blob/develop/pipelines/broad/genotyping/illumina/IlluminaGenotypingArray.wdl) v1.12.15 pipeline produces a VCF (Variant Call Format) output with data processing and sample-specific genotype information. The VCF follows the format listed in the [VCF 4.2 specification](https://samtools.github.io/hts-specs/VCFv4.2.pdf), but additionally contains fields and attributes that are unique to the Arrays pipeline.
 
 This document describes the Array pipeline’s unique VCF fields and attributes that are not listed in the standard VCF specification. To learn more about the pipeline, see the [Illumina Genotyping Array Pipeline Overview](./README.md).
 
@@ -26,7 +26,7 @@ Each VCF has meta information fields with attributes that generally describe the
 - extendedIlluminaManifestVersion - Version of the ‘extended Illumina manifest’ used by the VCF - generation software.
 - extendedManifestFile - File name of the ‘extended Illumina manifest’ used by the VCF generation software
 - fingerprintGender - Gender (sex) determined using an orthogonal fingerprinting technology, populated by an optional parameter used by the VCF generation software
-- gtcCallRate - GTC call rate of the sample processed that is generated by the autocall/gencall software and represents the fraction of callable loci that had valid calls
+- gtcCallRate - GTC call rate of the sample processed that is generated by the autocall/gencall software and represents the fraction of callable loci that had valid calls; ignores zeroed-out SNPs
 - imagingDate - Creation date for the chip well barcode IDATs (raw image scans)
 - manifestFile - Name of the Illumina manifest (.bpm) file used by the VCF generation software
 - sampleAlias - Sample name
@@ -112,4 +112,4 @@ The remaining attributes describe the cluster definitions provided in the cluste
 - meanX_BB - Mean of normalized X for BB cluster
 - meanY_AA - Mean of normalized Y for AA cluster
 - meanY_AB - Mean of normalized Y for AB cluster
-- meanY_BB - Mean of normalized Y for BB cluster
+- meanY_BB - Mean of normalized Y for BB cluster
\ No newline at end of file
diff --git a/website/docs/Pipelines/Illumina_Genotyping_Arrays_Pipeline/README.md b/website/docs/Pipelines/Illumina_Genotyping_Arrays_Pipeline/README.md
index 0cebb97fea..8eb9bed3b0 100644
--- a/website/docs/Pipelines/Illumina_Genotyping_Arrays_Pipeline/README.md
+++ b/website/docs/Pipelines/Illumina_Genotyping_Arrays_Pipeline/README.md
@@ -7,7 +7,7 @@ slug: /Pipelines/Illumina_Genotyping_Arrays_Pipeline/README
 
 | Pipeline Version | Date Updated | Documentation Author | Questions or Feedback |
 | :----: | :---: | :----: | :--------------: |
-| [Version 1.11.6](https://github.com/broadinstitute/warp/releases) | October, 2021 | [Elizabeth Kiernan](mailto:ekiernan@broadinstitute.org) | Please file GitHub issues in warp or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) |
+| [Version 1.12.15](https://github.com/broadinstitute/warp/releases) | February, 2024 | [Elizabeth Kiernan](mailto:ekiernan@broadinstitute.org) | Please file GitHub issues in warp or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) |
 
 ![The Illumina Genotyping Array Pipeline](./IlluminaGenotyping.png)
 
@@ -121,7 +121,7 @@ The following table provides a summary of the WDL tasks and software tools calle
 | SubsetArrayVCF | [SubsetArrayVCF](https://gatk.broadinstitute.org/hc/en-us/articles/360036362532) | GATK |
 | CollectArraysVariantCallingMetrics | [CollectArraysVariantCallingMetrics](https://gatk.broadinstitute.org/hc/en-us/articles/360037593871) | Picard |
 | SelectVariants | [SelectVariants](https://gatk.broadinstitute.org/hc/en-us/articles/360036362532) | GATK |
-| CheckFingerprint | [CheckFingerprint](https://gatk.broadinstitute.org/hc/en-us/articles/360036358752) | Picard |
+| CheckFingerprintTask | [CheckFingerprint](https://gatk.broadinstitute.org/hc/en-us/articles/360036358752) | Picard |
 | VcfToIntervalList | [VcfToIntervalList](https://gatk.broadinstitute.org/hc/en-us/articles/360036897672) | Picard |
 | GenotypeConcordance | [GenotypeConcordance](https://gatk.broadinstitute.org/hc/en-us/articles/360036348932) | Picard |
 
@@ -176,7 +176,7 @@ DNA fingerprinting helps maintain sample identity and avoid sample swaps. The Il
 
 #### 6. Evaluating an existing fingerprint (optional)
 
-If the genotyping sample already has a corresponding fingerprint VCF file, the workflow can also optionally check the existing fingerprint to confirm sample identity. It uses the CheckFingerPrints task to calculate genotype concordance between the workflow’s genotyping output VCF (final_output_vcf) and the known genotype specified in a fingerprint_genotypes_vcf_file. The workflow returns a boolean for if the sample genotype failed concordance, as well as a Logarithm of Odds (LOD) score for concordance.
+If the genotyping sample already has a corresponding fingerprint VCF file, the workflow can also optionally check the existing fingerprint to confirm sample identity. It uses the CheckFingerprintTask task to calculate genotype concordance between the workflow’s genotyping output VCF (final_output_vcf) and the known genotype specified in a fingerprint_genotypes_vcf_file. The workflow returns a boolean for if the sample genotype failed concordance, as well as a Logarithm of Odds (LOD) score for concordance.
 
 #### 7. Genotype concordance (optional)
 

From 493db0803cc1571a0a950c8402510bdff5f41b0d Mon Sep 17 00:00:00 2001
From: kayleemathews
Date: Tue, 13 Feb 2024 15:42:29 -0500
Subject: [PATCH 4/5] updated imputation docs

---
 website/docs/Pipelines/Imputation_Pipeline/README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/website/docs/Pipelines/Imputation_Pipeline/README.md b/website/docs/Pipelines/Imputation_Pipeline/README.md
index 8d82efbc58..4743d3c1af 100644
--- a/website/docs/Pipelines/Imputation_Pipeline/README.md
+++ b/website/docs/Pipelines/Imputation_Pipeline/README.md
@@ -7,7 +7,7 @@ slug: /Pipelines/Imputation_Pipeline/README
 
 | Pipeline Version | Date Updated | Documentation Author | Questions or Feedback |
 | :----: | :---: | :----: | :--------------: |
-| [Imputation_v1.0.0](https://github.com/broadinstitute/warp/releases?q=Imputation_v1.0.0&expanded=true) | August, 2021 | [Elizabeth Kiernan](mailto:ekiernan@broadinstitute.org) | Please file GitHub issues in warp or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) |
+| [Imputation_v1.1.11](https://github.com/broadinstitute/warp/releases?q=Imputation_v1.0.0&expanded=true) | February, 2024 | [Elizabeth Kiernan](mailto:ekiernan@broadinstitute.org) | Please file GitHub issues in warp or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) |
 
 ## Introduction to the Imputation pipeline
 The Imputation pipeline imputes missing genotypes from either a multi-sample VCF or an array of single sample VCFs using a large genomic reference panel. It is based on the [Michigan Imputation Server pipeline](https://imputationserver.readthedocs.io/en/latest/pipeline/). Overall, the pipeline filters, phases, and performs imputation on a multi-sample VCF. It outputs the imputed VCF along with key imputation metrics.
@@ -54,7 +54,7 @@ For examples of how to specify each input in a configuration file, as well as cl
 | genetics_maps_eagle | Genetic map file for phasing.| File |
 | output_callset_name | Output callset name. | String |
 | split_output_to_single_sample | Boolean to split out the final combined VCF to individual sample VCFs; set to false by default. | Boolean |
-| merge_ssvcf_mem_gb | Memory allocation for MergeSingleSampleVcfs (in GB). | Int |
+| merge_ssvcf_mem_mb | Optional integer specifying memory allocation for MergeSingleSampleVcfs (in MB); default is 3000. | Int |
 | frac_well_imputed_threshold | Threshold for the fraction of well-imputed sites; default set to 0.9. | Float |
 | chunks_fail_threshold | Maximum threshold for the number of chunks allowed to fail; default set to 1. | Float |
 | vcf_suffix | File extension used for the VCF in the reference panel. | String |

From 8cd7e5b63b06b39e86844bc94936e7d722fd59d1 Mon Sep 17 00:00:00 2001
From: Kaylee Mathews <95316074+kayleemathews@users.noreply.github.com>
Date: Thu, 15 Feb 2024 15:50:30 -0500
Subject: [PATCH 5/5] Update website/docs/Pipelines/Exome_Germline_Single_Sample_Pipeline/README.md

Co-authored-by: ekiernan <55763654+ekiernan@users.noreply.github.com>
---
 .../Pipelines/Exome_Germline_Single_Sample_Pipeline/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/website/docs/Pipelines/Exome_Germline_Single_Sample_Pipeline/README.md b/website/docs/Pipelines/Exome_Germline_Single_Sample_Pipeline/README.md
index 3f8125c861..6a22b75667 100644
--- a/website/docs/Pipelines/Exome_Germline_Single_Sample_Pipeline/README.md
+++ b/website/docs/Pipelines/Exome_Germline_Single_Sample_Pipeline/README.md
@@ -7,7 +7,7 @@ slug: /Pipelines/Exome_Germline_Single_Sample_Pipeline/README
 
 | Pipeline Version | Date Updated | Documentation Author | Questions or Feedback |
 | :----: | :---: | :----: | :--------------: |
-| [ExomeGermlineSingleSample_v3.1.17](https://github.com/broadinstitute/warp/releases?q=ExomeGermlineSingleSample_v3.0.0&expanded=true) | February, 2024 | [Elizabeth Kiernan](mailto:ekiernan@broadinstitute.org) | Please file GitHub issues in WARP or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) |
+| [ExomeGermlineSingleSample_v3.1.17](https://github.com/broadinstitute/warp/releases?q=ExomeGermlineSingleSample_v3.0.0&expanded=true) | February, 2024 | Elizabeth Kiernan | Please file GitHub issues in WARP or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) |
 
 The Exome Germline Single Sample pipeline implements data pre-processing and initial variant calling according to the GATK Best Practices for germline SNP and Indel discovery in human exome sequencing data.