From 67760deb4f059e5cd573f45a2726f87f4f14e989 Mon Sep 17 00:00:00 2001 From: Manavalan Gajapathy Date: Wed, 1 Mar 2023 16:28:21 -0600 Subject: [PATCH 01/14] update gitlab link --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 3e95001..7316710 100644 --- a/README.md +++ b/README.md @@ -6,7 +6,7 @@ !!! Note In the past life, QuaC repo used to live at [UAB - Gitlab](https://gitlab.rc.uab.edu/center-for-computational-genomics-and-data-science/sciops/pipelines/quac). It was + Gitlab](https://gitlab.rc.uab.edu/center-for-computational-genomics-and-data-science/public/quac). It was migrated to Github in Jan 2023, and the Gitlab version has been archived. From 41d905833344d405db30e9b5054c2f20185ce2cb Mon Sep 17 00:00:00 2001 From: Manavalan Gajapathy Date: Wed, 1 Mar 2023 16:28:43 -0600 Subject: [PATCH 02/14] decouples readme from readthedocs --- docs/index.md | 78 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 77 insertions(+), 1 deletion(-) diff --git a/docs/index.md b/docs/index.md index 563ed56..e369e46 100644 --- a/docs/index.md +++ b/docs/index.md @@ -1 +1,77 @@ -{!README.md!} +# QuaC + +🦆🦆 Don't duck that QC thingy 🦆🦆 + + +!!! Note + + In the past life, QuaC repo used to live at [UAB + Gitlab](https://gitlab.rc.uab.edu/center-for-computational-genomics-and-data-science/public/quac). It was + migrated to Github in Jan 2023, and the Gitlab version has been archived. + + +## What is QuaC? + +QuaC is a snakemake-based pipeline that runs several QC tools for WGS/WES samples and then summarizes their results +using pre-defined, configurable QC thresholds. + +In summary, QuaC performs the following: + +- Runs several QC tools using `BAM` and `VCF` files as input. At our center CGDS, these files are produced as part of + the [small variant caller + pipeline](https://gitlab.rc.uab.edu/center-for-computational-genomics-and-data-science/sciops/pipelines/small_variant_caller_pipeline). 
+- Using [QuaC-Watch](./quac_watch.md) tool, it performs QC checkup based on the expected thresholds for certain QC metrics and summarizes + the results for easier human consumption +- Aggregates QC output as well as QuaC-Watch output using MultiQC, both at the sample level and project level. +- Optionally, above mentioned QuaC-Watch and QC aggregation steps can accept pre-run results from a few QC tools (fastqc, + fastq-screen, picard's markduplicates) when run with flag `--include_prior_qc`. + + +!!! note "CGDS users only" + + * At CGDS, BAM and VCF files produced by the + [small variant caller pipeline](https://gitlab.rc.uab.edu/center-for-computational-genomics-and-data-science/sciops/pipelines/small_variant_caller_pipeline) + are used as input to QuaC. + * Tools fastqc, fastq-screen, and picard's markduplicates, whose outputs are accepted by QuaC when used with + flag `--include_prior_qc`, are produced by this small_variant_caller_pipeline. + +!!! info + + QuaC is built for use with human WGS/WES data. If you would like to use it with non-human data, please modify the pipeline as needed -- especially the thresholds used in QuaC-Watch configs. 
+ + +## QC tools + +### Tools run by QuaC + +QuaC quacks using the tools listed below: + +| Tool | Use | QC Type | +| -------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------- | ---------------------------------------- | +| [Qualimap](http://qualimap.conesalab.org/) | Summarizes several alignment metrics using BAM file | BAM quality | +| [Picard-CollectMultipleMetrics](https://broadinstitute.github.io/picard/command-line-overview.html#CollectMultipleMetrics) | Summarizes alignment metrics from BAM file using several modules | BAM quality | +| [Picard-CollectWgsMetrics](https://broadinstitute.github.io/picard/command-line-overview.html#CollectWgsMetrics) | Collects metrics about coverage and performance using BAM file | BAM quality | +| [mosdepth](https://github.com/brentp/mosdepth) | Fast alignment depth calculation using BAM file | BAM quality | +| [indexcov](https://github.com/brentp/goleft/tree/master/indexcov) | Estimate coverage from BAM index for GS
<br> (*Skipped in exome mode*) | BAM quality | +| [covviz](https://github.com/brwnj/covviz) | Identifies large, coverage-based anomalies for GS using Indexcov output <br>
(*Skipped in exome mode*) | BAM quality | +| [bcftools stats](https://samtools.github.io/bcftools/bcftools.html#stats) | Summarizes VCF file stats | VCF quality | +| [verifybamid](https://github.com/Griffan/VerifyBamID) | Estimates within-species (i.e., cross-sample) contamination using BAM file | Within-species contamination | +| [somalier](https://github.com/brentp/somalier) | Estimation of sex, ancestry and relatedness using BAM file | Sex, ancestry and relatedness estimation | + + +### Optional QC output consumed by QuaC + +Optionally QuaC can also utilize QC results produced by the tools listed below when run with flag `--include_prior_qc`. + + +| Tool | Use | QC Type | +| ------------------------------------------------------------------------------------------------------------ | ------------------------------------------------- | ------------- | +| [fastqc](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) | Performs QC on raw sequence reads data (FASTQ) | FASTQ quality | +| [FastQ Screen](https://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/) | Screens FASTQ for other-species contamination | FASTQ quality | +| [Picard's MarkDuplicates](https://broadinstitute.github.io/picard/command-line-overview.html#MarkDuplicates) | Determines level of read duplication on BAM files | BAM quality | + + +!!! note "CGDS users only" + + * At CGDS, these optional tools were run by our small_variant_caller_pipeline. + From 35e8e214af50254e80253ed0aa82bbb2fd3f45e4 Mon Sep 17 00:00:00 2001 From: Manavalan Gajapathy Date: Wed, 1 Mar 2023 16:30:25 -0600 Subject: [PATCH 03/14] shrink readme to the essentials --- README.md | 57 +++++-------------------------------------------------- 1 file changed, 5 insertions(+), 52 deletions(-) diff --git a/README.md b/README.md index 7316710..026cd56 100644 --- a/README.md +++ b/README.md @@ -3,11 +3,9 @@ 🦆🦆 Don't duck that QC thingy 🦆🦆 -!!! 
Note - - In the past life, QuaC repo used to live at [UAB - Gitlab](https://gitlab.rc.uab.edu/center-for-computational-genomics-and-data-science/public/quac). It was - migrated to Github in Jan 2023, and the Gitlab version has been archived. +> **_NOTE:_** In the past life, QuaC repo used to live at [UAB +> Gitlab](https://gitlab.rc.uab.edu/center-for-computational-genomics-and-data-science/public/quac). It was migrated to +> Github in Jan 2023, and the Gitlab version has been archived. ## What is QuaC? @@ -27,53 +25,8 @@ In summary, QuaC performs the following: fastq-screen, picard's markduplicates) when run with flag `--include_prior_qc`. -!!! note "CGDS users only" - - * At CGDS, BAM and VCF files produced by the - [small variant caller pipeline](https://gitlab.rc.uab.edu/center-for-computational-genomics-and-data-science/sciops/pipelines/small_variant_caller_pipeline) - are used as input to QuaC. - * Tools fastqc, fastq-screen, and picard's markduplicates, whose output are accepted by QuaC when used with - flag `--include_prior_qc`, are produced by this small_variant_caller_pipeline. - -!!! info - - QuaC is built to use with Human WGS/WES data. If you would like to use it with non-human data, please modify the pipeline as needed -- especially the thresholds used in QuaC-Watch configs. 
- - -## QC tools - -### Tools run by QuaC - -QuaC quacks using the tools listed below: - -| Tool | Use | QC Type | -| -------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------- | ---------------------------------------- | -| [Qualimap](http://qualimap.conesalab.org/) | Summarizes several alignment metrics using BAM file | BAM quality | -| [Picard-CollectMultipleMetrics](https://broadinstitute.github.io/picard/command-line-overview.html#CollectMultipleMetrics) | Summarizes alignment metrics from BAM file using several modules | BAM quality | -| [Picard-CollectWgsMetrics](https://broadinstitute.github.io/picard/command-line-overview.html#CollectWgsMetrics) | Collects metrics about coverage and performance using BAM file | BAM quality | -| [mosdepth](https://github.com/brentp/mosdepth) | Fast alignment depth calculation using BAM file | BAM quality | -| [indexcov](https://github.com/brentp/goleft/tree/master/indexcov) | Estimate coverage from BAM index for GS
<br> (*Skipped in exome mode*) | BAM quality | -| [covviz](https://github.com/brwnj/covviz) | Identifies large, coverage-based anomalies for GS using Indexcov output <br>
(*Skipped in exome mode*) | BAM quality | -| [bcftools stats](https://samtools.github.io/bcftools/bcftools.html#stats) | Summarizes VCF file stats | VCF quality | -| [verifybamid](https://github.com/Griffan/VerifyBamID) | Estimates within-species (i.e., cross-sample) contamination using BAM file | Within-species contamination | -| [somalier](https://github.com/brentp/somalier) | Estimation of sex, ancestry and relatedness using BAM file | Sex, ancestry and relatedness estimation | - - -### Optional QC output consumed by QuaC - -Optionally QuaC can also utilize QC results produced by the tools listed below when run with flag `--include_prior_qc`. - - -| Tool | Use | QC Type | -| ------------------------------------------------------------------------------------------------------------ | ------------------------------------------------- | ------------- | -| [fastqc](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) | Performs QC on raw sequence reads data (FASTQ) | FASTQ quality | -| [FastQ Screen](https://www.bioinformatics.babraham.ac.uk/projects/fastq_screen/) | Screens FASTQ for other-species contamination | FASTQ quality | -| [Picard's MarkDuplicates](https://broadinstitute.github.io/picard/command-line-overview.html#MarkDuplicates) | Determines level of read duplication on BAM files | BAM quality | - - -!!! note "CGDS users only" - - * At CGDS, these optional tools were run by our small_variant_caller_pipeline. +> **_NOTE:_** QuaC is built to use with Human WGS/WES data. If you would like to use it with non-human data, please +> modify the pipeline as needed -- especially the thresholds used in QuaC-Watch configs. 
## Documentation From 0b112e62956724ef0105997f036f4e3f3b01e858 Mon Sep 17 00:00:00 2001 From: Manavalan Gajapathy Date: Wed, 1 Mar 2023 16:33:42 -0600 Subject: [PATCH 04/14] updates broken link --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 026cd56..261518e 100644 --- a/README.md +++ b/README.md @@ -18,7 +18,7 @@ In summary, QuaC performs the following: - Runs several QC tools using `BAM` and `VCF` files as input. At our center CGDS, these files are produced as part of the [small variant caller pipeline](https://gitlab.rc.uab.edu/center-for-computational-genomics-and-data-science/sciops/pipelines/small_variant_caller_pipeline). -- Using [QuaC-Watch](./quac_watch.md) tool, it performs QC checkup based on the expected thresholds for certain QC metrics and summarizes +- Using [QuaC-Watch](./docs/quac_watch.md) tool, it performs QC checkup based on the expected thresholds for certain QC metrics and summarizes the results for easier human consumption - Aggregates QC output as well as QuaC-Watch output using MulitQC, both at the sample level and project level. - Optionally, above mentioned QuaC-Watch and QC aggregation steps can accept pre-run results from few QC tools (fastqc, From 3c665d33d07488e22998bb16ae7a3b03c9bd171e Mon Sep 17 00:00:00 2001 From: Manavalan Gajapathy Date: Wed, 1 Mar 2023 16:34:13 -0600 Subject: [PATCH 05/14] ignores url check as valid url still results in error 403 --- docs/input_output.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/docs/input_output.md b/docs/input_output.md index 3b9223d..ff26506 100644 --- a/docs/input_output.md +++ b/docs/input_output.md @@ -2,10 +2,13 @@ ## Input + + Samples belonging to a project are provided as input via `--pedigree` to QuaC in [pedigree file format](https://gatk.broadinstitute.org/hc/en-us/articles/360035531972-PED-Pedigree-format). 
Only the samples that are supplied in pedigree file will be processed by QuaC and all of these samples must belong to the same project. + !!! note "CGDS users only" From 7f0f1a03e618a053d21ba45e0afb757915d1ad83 Mon Sep 17 00:00:00 2001 From: Manavalan Gajapathy Date: Wed, 1 Mar 2023 16:38:24 -0600 Subject: [PATCH 06/14] adds license to readme --- README.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/README.md b/README.md index 261518e..8b08286 100644 --- a/README.md +++ b/README.md @@ -38,3 +38,6 @@ Full documentation, including installation and how to run QuaC, is available at * **Mana**valan Gajapathy +## License + +[GNU GPLv3](./LICENSE) \ No newline at end of file From 39744df527637e5792ba5290672edfd043018ef7 Mon Sep 17 00:00:00 2001 From: Manavalan Gajapathy Date: Wed, 1 Mar 2023 16:42:10 -0600 Subject: [PATCH 07/14] adds contributing to readme --- README.md | 8 +++++++- docs/CONTRIBUTING.md | 2 +- 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 8b08286..86e4767 100644 --- a/README.md +++ b/README.md @@ -38,6 +38,12 @@ Full documentation, including installation and how to run QuaC, is available at * **Mana**valan Gajapathy + ## License -[GNU GPLv3](./LICENSE) \ No newline at end of file +[GNU GPLv3](./LICENSE) + + +## Contributing + +See [here](./docs/CONTRIBUTING.md) contributing guidelines. diff --git a/docs/CONTRIBUTING.md b/docs/CONTRIBUTING.md index 7d94e2c..64351ad 100644 --- a/docs/CONTRIBUTING.md +++ b/docs/CONTRIBUTING.md @@ -1,6 +1,6 @@ # Contributing Guidelines -:grin: :tada: Thank you for taking the time to contribute! :grin: :tada: +😁 🎉 Thank you for taking the time to contribute! 😁 🎉 The following is a set of guidelines for contributing to QuaC. 
From 4466ab91fe982324026877121e7af6e7244c35fd Mon Sep 17 00:00:00 2001 From: Manavalan Gajapathy Date: Wed, 1 Mar 2023 16:44:52 -0600 Subject: [PATCH 08/14] adds snakemake badge --- README.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/README.md b/README.md index 86e4767..1a98346 100644 --- a/README.md +++ b/README.md @@ -1,3 +1,5 @@ +[![Snakemake](https://img.shields.io/badge/snakemake-≥5.6.0-brightgreen.svg?style=flat)](https://snakemake.readthedocs.io) + # QuaC 🦆🦆 Don't duck that QC thingy 🦆🦆 From ff6b998394a73e80404201a21ad717c196e3c674 Mon Sep 17 00:00:00 2001 From: Manavalan Gajapathy Date: Wed, 1 Mar 2023 16:46:24 -0600 Subject: [PATCH 09/14] updates snakemake ver --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 1a98346..0d82315 100644 --- a/README.md +++ b/README.md @@ -1,4 +1,4 @@ -[![Snakemake](https://img.shields.io/badge/snakemake-≥5.6.0-brightgreen.svg?style=flat)](https://snakemake.readthedocs.io) +[![Snakemake](https://img.shields.io/badge/snakemake-6.0.5-brightgreen.svg?style=flat)](https://snakemake.readthedocs.io) # QuaC From 8e789884dd718614a143e44385247a38617aedf0 Mon Sep 17 00:00:00 2001 From: Manavalan Gajapathy Date: Wed, 1 Mar 2023 16:53:16 -0600 Subject: [PATCH 10/14] adds badge to readme --- README.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/README.md b/README.md index 0d82315..59bf115 100644 --- a/README.md +++ b/README.md @@ -1,4 +1,6 @@ [![Snakemake](https://img.shields.io/badge/snakemake-6.0.5-brightgreen.svg?style=flat)](https://snakemake.readthedocs.io) +[![ReadTheDocs](https://readthedocs.org/projects/quac/badge/?version=latest)](https://quac.readthedocs.io/en/stable/) + # QuaC From 5401f7bb6c9c4b0deec218328f430f940b81d67a Mon Sep 17 00:00:00 2001 From: Manavalan Gajapathy Date: Wed, 1 Mar 2023 16:56:24 -0600 Subject: [PATCH 11/14] typo fix --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 
59bf115..ffe9bf7 100644 --- a/README.md +++ b/README.md @@ -50,4 +50,4 @@ Full documentation, including installation and how to run QuaC, is available at ## Contributing -See [here](./docs/CONTRIBUTING.md) contributing guidelines. +See [here](./docs/CONTRIBUTING.md) for contributing guidelines. From f7bf098f891da67bc92e125f43db79e493a6c076 Mon Sep 17 00:00:00 2001 From: Manavalan Gajapathy Date: Wed, 1 Mar 2023 17:00:42 -0600 Subject: [PATCH 12/14] updates changelog --- README.md | 5 +++++ docs/Changelog.md | 5 +++++ 2 files changed, 10 insertions(+) diff --git a/README.md b/README.md index ffe9bf7..5a8db73 100644 --- a/README.md +++ b/README.md @@ -51,3 +51,8 @@ Full documentation, including installation and how to run QuaC, is available at ## Contributing See [here](./docs/CONTRIBUTING.md) for contributing guidelines. + + +## Changelog + +See [here](./docs/Changelog.md) \ No newline at end of file diff --git a/docs/Changelog.md b/docs/Changelog.md index 2924715..e3635bf 100644 --- a/docs/Changelog.md +++ b/docs/Changelog.md @@ -12,6 +12,11 @@ YYYY-MM-DD John Doe ``` --- +2023-03-01 Manavalan Gajapathy + +* Decouples readme.md from readthedocs setup + + 2023-02-28 Manavalan Gajapathy * Adds license From 2ec2ab53fd4adfc9b77873c770dd66c1368f0887 Mon Sep 17 00:00:00 2001 From: Manavalan Gajapathy Date: Wed, 1 Mar 2023 17:14:34 -0600 Subject: [PATCH 13/14] Update docs/index.md Co-authored-by: Brandon M Wilk --- docs/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/index.md b/docs/index.md index e369e46..dd16bbe 100644 --- a/docs/index.md +++ b/docs/index.md @@ -5,7 +5,7 @@ !!! Note - In the past life, QuaC repo used to live at [UAB + In a past life, QuaC used a different remote Git management provider, [UAB Gitlab](https://gitlab.rc.uab.edu/center-for-computational-genomics-and-data-science/public/quac). It was migrated to Github in Jan 2023, and the Gitlab version has been archived. 
From 1dd17571e1cfd307ca78a519e196da339720d1c9 Mon Sep 17 00:00:00 2001 From: Manavalan Gajapathy Date: Wed, 1 Mar 2023 17:14:59 -0600 Subject: [PATCH 14/14] Update README.md Co-authored-by: Brandon M Wilk --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 5a8db73..091bd14 100644 --- a/README.md +++ b/README.md @@ -7,7 +7,7 @@ 🦆🦆 Don't duck that QC thingy 🦆🦆 -> **_NOTE:_** In the past life, QuaC repo used to live at [UAB +> **_NOTE:_** In a past life, QuaC used a different remote Git management provider, [UAB > Gitlab](https://gitlab.rc.uab.edu/center-for-computational-genomics-and-data-science/public/quac). It was migrated to > Github in Jan 2023, and the Gitlab version has been archived.