Km doc updates (#1206)
* add azure JG link

* update exome docs

* update illumina docs

* updated imputation docs

* Update website/docs/Pipelines/Exome_Germline_Single_Sample_Pipeline/README.md

Co-authored-by: ekiernan <55763654+ekiernan@users.noreply.github.com>

---------

Co-authored-by: ekiernan <55763654+ekiernan@users.noreply.github.com>
kayleemathews and ekiernan authored Feb 16, 2024
1 parent 32bcffe commit 149e049
Showing 5 changed files with 13 additions and 10 deletions.
@@ -2,13 +2,13 @@
sidebar_position: 2
---

-# Exome Germline Single Sample v3.0.0 Methods
+# Exome Germline Single Sample v3.1.17 Methods

The following contains a detailed methods description outlining the pipeline’s process, software, and tools that can be modified for a publication methods section.

## Detailed Methods

-Preprocessing and variant calling was performed using the ExomeGermlineSingleSample 3.0.0 pipeline using Picard 2.23.8, GATK 4.2.2.0, and Samtools 1.11 with default tool parameters unless otherwise specified. All reference files are available in the public [Broad References Google Bucket](https://console.cloud.google.com/storage/browser/gcp-public-data--broad-references/hg38/v0). The pipeline follows GATK Best Practices as previously described ([Van der Auwera & O'Connor, 2020](https://www.oreilly.com/library/view/genomics-in-the/9781491975183/)) as well as the Functional Equivalence specification ([Regier et al., 2018](https://www.nature.com/articles/s41467-018-06159-4)).
+Preprocessing and variant calling was performed using the ExomeGermlineSingleSample 3.1.17 pipeline using Picard 2.26.10, GATK 4.5.0.0, and Samtools 1.11 with default tool parameters unless otherwise specified. All reference files are available in the public [Broad References Google Bucket](https://console.cloud.google.com/storage/browser/gcp-public-data--broad-references/hg38/v0). The pipeline follows GATK Best Practices as previously described ([Van der Auwera & O'Connor, 2020](https://www.oreilly.com/library/view/genomics-in-the/9781491975183/)) as well as the Functional Equivalence specification ([Regier et al., 2018](https://www.nature.com/articles/s41467-018-06159-4)).

### Pre-processing and QC

@@ -31,4 +31,5 @@ Prior to variant calling, the variant calling interval list was split to enable
The pipeline’s final outputs included metrics, the ValidateSamFile validation reports, an aligned CRAM with index, and a reblocked GVCF containing variant calls with an accompanying index.

## Previous methods documents
+- [ExomeGermlineSingleSample_v3.0.0](https://github.com/broadinstitute/warp/blob/ExomeGermlineSingleSample_v3.0.0/website/docs/Pipelines/Exome_Germline_Single_Sample_Pipeline/exome.methods.md)
- [ExomeGermlineSingleSample_v2.4.4](https://github.com/broadinstitute/warp/blob/ExomeGermlineSingleSample_v2.6.0/website/docs/Pipelines/Exome_Germline_Single_Sample_Pipeline/exome.methods.md)
@@ -4,7 +4,7 @@ sidebar_position: 2

# VCF Overview: Illumina Genotyping Array

-The [Illumina Genotyping Array Pipeline](https://github.com/broadinstitute/warp/blob/develop/pipelines/broad/genotyping/illumina/IlluminaGenotypingArray.wdl) v1.11.0 pipeline produces a VCF (Variant Call Format) output with data processing and sample-specific genotype information. The VCF follows the format listed in the [VCF 4.2 specification](https://samtools.github.io/hts-specs/VCFv4.2.pdf), but additionally contains fields and attributes that are unique to the Arrays pipeline.
+The [Illumina Genotyping Array Pipeline](https://github.com/broadinstitute/warp/blob/develop/pipelines/broad/genotyping/illumina/IlluminaGenotypingArray.wdl) v1.12.15 pipeline produces a VCF (Variant Call Format) output with data processing and sample-specific genotype information. The VCF follows the format listed in the [VCF 4.2 specification](https://samtools.github.io/hts-specs/VCFv4.2.pdf), but additionally contains fields and attributes that are unique to the Arrays pipeline.

This document describes the Array pipeline’s unique VCF fields and attributes that are not listed in the standard VCF specification. To learn more about the pipeline, see the [Illumina Genotyping Array Pipeline Overview](./README.md).

@@ -26,7 +26,7 @@ Each VCF has meta information fields with attributes that generally describe the
- extendedIlluminaManifestVersion - Version of the ‘extended Illumina manifest’ used by the VCF - generation software.
- extendedManifestFile - File name of the ‘extended Illumina manifest’ used by the VCF generation software
- fingerprintGender - Gender (sex) determined using an orthogonal fingerprinting technology, populated by an optional parameter used by the VCF generation software
-- gtcCallRate - GTC call rate of the sample processed that is generated by the autocall/gencall software and represents the fraction of callable loci that had valid calls
+- gtcCallRate - GTC call rate of the sample processed that is generated by the autocall/gencall software and represents the fraction of callable loci that had valid calls; ignores zeroed-out SNPs
- imagingDate - Creation date for the chip well barcode IDATs (raw image scans)
- manifestFile - Name of the Illumina manifest (.bpm) file used by the VCF generation software
- sampleAlias - Sample name
@@ -112,4 +112,4 @@ The remaining attributes describe the cluster definitions provided in the cluste
- meanX_BB - Mean of normalized X for BB cluster
- meanY_AA - Mean of normalized Y for AA cluster
- meanY_AB - Mean of normalized Y for AB cluster
-- meanY_BB - Mean of normalized Y for BB cluster
+- meanY_BB - Mean of normalized Y for BB cluster
@@ -7,7 +7,7 @@ slug: /Pipelines/Illumina_Genotyping_Arrays_Pipeline/README

| Pipeline Version | Date Updated | Documentation Author | Questions or Feedback |
| :----: | :---: | :----: | :--------------: |
-| [Version 1.11.6](https://github.com/broadinstitute/warp/releases) | October, 2021 | [Elizabeth Kiernan](mailto:ekiernan@broadinstitute.org) | Please file GitHub issues in warp or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) |
+| [Version 1.12.15](https://github.com/broadinstitute/warp/releases) | February, 2024 | [Elizabeth Kiernan](mailto:ekiernan@broadinstitute.org) | Please file GitHub issues in warp or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) |

![The Illumina Genotyping Array Pipeline](./IlluminaGenotyping.png)

@@ -121,7 +121,7 @@ The following table provides a summary of the WDL tasks and software tools calle
| SubsetArrayVCF | [SubsetArrayVCF](https://gatk.broadinstitute.org/hc/en-us/articles/360036362532) | GATK |
| CollectArraysVariantCallingMetrics | [CollectArraysVariantCallingMetrics](https://gatk.broadinstitute.org/hc/en-us/articles/360037593871) | Picard |
| SelectVariants | [SelectVariants](https://gatk.broadinstitute.org/hc/en-us/articles/360036362532) | GATK |
-| CheckFingerprint | [CheckFingerprint](https://gatk.broadinstitute.org/hc/en-us/articles/360036358752) | Picard |
+| CheckFingerprintTask | [CheckFingerprint](https://gatk.broadinstitute.org/hc/en-us/articles/360036358752) | Picard |
| VcfToIntervalList | [VcfToIntervalList](https://gatk.broadinstitute.org/hc/en-us/articles/360036897672) | Picard |
| GenotypeConcordance | [GenotypeConcordance](https://gatk.broadinstitute.org/hc/en-us/articles/360036348932) | Picard |

@@ -176,7 +176,7 @@ DNA fingerprinting helps maintain sample identity and avoid sample swaps. The Il

#### 6. Evaluating an existing fingerprint (optional)

-If the genotyping sample already has a corresponding fingerprint VCF file, the workflow can also optionally check the existing fingerprint to confirm sample identity. It uses the CheckFingerPrints task to calculate genotype concordance between the workflow’s genotyping output VCF (final_output_vcf) and the known genotype specified in a fingerprint_genotypes_vcf_file. The workflow returns a boolean for if the sample genotype failed concordance, as well as a Logarithm of Odds (LOD) score for concordance.
+If the genotyping sample already has a corresponding fingerprint VCF file, the workflow can also optionally check the existing fingerprint to confirm sample identity. It uses the CheckFingerprintTask task to calculate genotype concordance between the workflow’s genotyping output VCF (final_output_vcf) and the known genotype specified in a fingerprint_genotypes_vcf_file. The workflow returns a boolean for if the sample genotype failed concordance, as well as a Logarithm of Odds (LOD) score for concordance.
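
As a rough illustration of the optional check described above: a LOD score is the base-10 logarithm of a likelihood ratio, here how much more likely the observed genotypes are if the two VCFs come from the same individual than if they do not. The sketch below is not the CheckFingerprintTask implementation; the function names, the input likelihoods, and the pass/fail threshold are all hypothetical.

```python
import math


def lod_score(likelihood_same: float, likelihood_different: float) -> float:
    """Log10 likelihood ratio; positive values favor a sample match."""
    return math.log10(likelihood_same / likelihood_different)


def failed_concordance(lod: float, threshold: float = 0.0) -> bool:
    """Hypothetical rule: the check fails when the LOD falls below the threshold."""
    return lod < threshold


# Illustrative likelihoods only; real values come from the fingerprinting tool.
lod = lod_score(likelihood_same=1e-3, likelihood_different=1e-8)
print(f"LOD = {lod:.1f}, failed = {failed_concordance(lod)}")
```

A strongly negative LOD under this convention would point toward a sample swap, which is the case the boolean output is meant to flag.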

#### 7. Genotype concordance (optional)

4 changes: 2 additions & 2 deletions website/docs/Pipelines/Imputation_Pipeline/README.md
@@ -7,7 +7,7 @@ slug: /Pipelines/Imputation_Pipeline/README

| Pipeline Version | Date Updated | Documentation Author | Questions or Feedback |
| :----: | :---: | :----: | :--------------: |
-| [Imputation_v1.0.0](https://github.com/broadinstitute/warp/releases?q=Imputation_v1.0.0&expanded=true) | August, 2021 | [Elizabeth Kiernan](mailto:ekiernan@broadinstitute.org) | Please file GitHub issues in warp or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) |
+| [Imputation_v1.1.11](https://github.com/broadinstitute/warp/releases?q=Imputation_v1.0.0&expanded=true) | February, 2024 | [Elizabeth Kiernan](mailto:ekiernan@broadinstitute.org) | Please file GitHub issues in warp or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) |

## Introduction to the Imputation pipeline
The Imputation pipeline imputes missing genotypes from either a multi-sample VCF or an array of single sample VCFs using a large genomic reference panel. It is based on the [Michigan Imputation Server pipeline](https://imputationserver.readthedocs.io/en/latest/pipeline/). Overall, the pipeline filters, phases, and performs imputation on a multi-sample VCF. It outputs the imputed VCF along with key imputation metrics.
@@ -54,7 +54,7 @@ For examples of how to specify each input in a configuration file, as well as cl
| genetics_maps_eagle | Genetic map file for phasing.| File |
| output_callset_name | Output callset name. | String |
| split_output_to_single_sample | Boolean to split out the final combined VCF to individual sample VCFs; set to false by default. | Boolean |
-| merge_ssvcf_mem_gb | Memory allocation for MergeSingleSampleVcfs (in GB). | Int |
+| merge_ssvcf_mem_mb | Optional integer specifying memory allocation for MergeSingleSampleVcfs (in MB); default is 3000. | Int |
| frac_well_imputed_threshold | Threshold for the fraction of well-imputed sites; default set to 0.9. | Float |
| chunks_fail_threshold | Maximum threshold for the number of chunks allowed to fail; default set to 1. | Float |
| vcf_suffix | File extension used for the VCF in the reference panel. | String |
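
Because this hunk replaces merge_ssvcf_mem_gb with merge_ssvcf_mem_mb, an inputs file written for the older input would need both its key and its units updated. A minimal sketch of that conversion follows. It assumes a Cromwell-style inputs JSON keyed as `Imputation.<input>` and 1 GB = 1000 MB (chosen to line up with the documented 3000 MB default); the helper name, the key prefix, and the conversion factor are illustrative assumptions, not taken from the pipeline.

```python
import json

OLD_KEY = "Imputation.merge_ssvcf_mem_gb"  # assumed workflow-name prefix
NEW_KEY = "Imputation.merge_ssvcf_mem_mb"
MB_PER_GB = 1000  # assumption: decimal units


def migrate_merge_memory(inputs: dict) -> dict:
    """Return a copy of the inputs with the GB setting expressed in MB."""
    migrated = dict(inputs)
    if OLD_KEY in migrated:
        migrated[NEW_KEY] = int(migrated.pop(OLD_KEY) * MB_PER_GB)
    return migrated


old_inputs = {
    "Imputation.output_callset_name": "example_callset",  # placeholder value
    "Imputation.split_output_to_single_sample": False,
    "Imputation.merge_ssvcf_mem_gb": 3,
}
print(json.dumps(migrate_merge_memory(old_inputs), indent=2))
```

Since the new input is documented as optional, dropping the old key and relying on the default may be enough for many runs.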
2 changes: 2 additions & 0 deletions website/docs/Pipelines/JointGenotyping/README.md
@@ -25,6 +25,8 @@ The pipeline can be configured to run using one of the following GATK variant fi

The pipeline takes in a sample map file listing GVCF files produced by HaplotypeCaller in GVCF mode and produces a filtered VCF file (with index) containing genotypes for all samples present in the input VCF files. All sites that are present in the input VCF file are retained. Filtered sites are annotated as such in the FILTER field. If you are new to VCF files, see the [file type specification](https://samtools.github.io/hts-specs/VCFv4.2.pdf).

+The JointGenotyping pipeline can be adapted to run on Microsoft Azure instead of Google Cloud. For more information, see the [azure-warp-joint-calling GitHub repository](https://github.com/broadinstitute/azure-warp-joint-calling).
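
For readers unfamiliar with the sample map input mentioned above, here is a hedged sketch of how one might be generated. It assumes the two-column, tab-separated layout (sample name, then GVCF path) that GATK's GenomicsDBImport accepts for `--sample-name-map`; the helper name, sample names, and bucket paths are placeholders, and the exact layout this pipeline expects should be confirmed against its input documentation.

```python
import csv
import io


def write_sample_map(samples: dict, handle) -> None:
    """Write one 'sample_name<TAB>gvcf_path' row per sample, sorted by name."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for name, gvcf_path in sorted(samples.items()):
        writer.writerow([name, gvcf_path])


# Placeholder sample names and paths, for illustration only.
samples = {
    "sample_B": "gs://example-bucket/sample_B.g.vcf.gz",
    "sample_A": "gs://example-bucket/sample_A.g.vcf.gz",
}
buffer = io.StringIO()
write_sample_map(samples, buffer)
print(buffer.getvalue(), end="")
```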

## Set-up

### JointGenotyping Installation and Requirements