We provide example data for a test run with three hashed samples of human PBMC cells, so that hashing deconvolution, GEX analysis and ADT analysis can be performed.
The raw test data comprises:
- ADT FASTQ files
- GEX FASTQ files
All config files are ready to use in the testdata subdirectory:
- HashingFile with hashtag barcodes
- featureReferenceFile with all ADT barcodes
Alternatively, we provide the count data generated from the example data described above, which allows skipping the resource-intensive cellranger count and CITE-Seq steps. Note that the test run requires approximately 12 GB of memory to complete successfully.
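As a rough aid for the memory note above, the following sketch compares the machine's total RAM with the stated requirement. It assumes a Linux system that exposes /proc/meminfo; the function name has_enough_memory and the variable REQUIRED_GB are hypothetical and not part of gExcite. Because the kernel reserves some memory, a machine with nominally 12 GB may report slightly less and trigger the warning.

```shell
#!/bin/sh
# Sketch only: warn if total RAM is below the figure stated for the test run.
# REQUIRED_GB mirrors the approximate 12 GB mentioned above (assumption: GiB).
REQUIRED_GB=12

# has_enough_memory TOTAL_KB REQUIRED_GB
# Succeeds (exit status 0) if TOTAL_KB is at least REQUIRED_GB gibibytes.
has_enough_memory() {
    total_kb=$1
    required_kb=$(( $2 * 1024 * 1024 ))
    [ "$total_kb" -ge "$required_kb" ]
}

# /proc/meminfo only exists on Linux; skip the check elsewhere.
if [ -r /proc/meminfo ]; then
    mem_total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    if has_enough_memory "$mem_total_kb" "$REQUIRED_GB"; then
        echo "Memory check passed: ${mem_total_kb} kB of total RAM."
    else
        echo "Warning: ${mem_total_kb} kB of total RAM; the test run may fail." >&2
    fi
else
    echo "Cannot read /proc/meminfo; skipping memory check."
fi
```

The check reads MemTotal, which is the installed memory and not the memory currently free, so it is only a coarse indication.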
To start a quick test run:
- Clone the gExcite git repository and go into the new directory gExcite_pipeline, which will be referred to as "the gExcite working directory" in this documentation.
git clone https://github.com/ETH-NEXUS/gExcite_pipeline.git ;
cd gExcite_pipeline
- Unpack the test data matrices by running the test data preparation bash script in the gExcite working directory.
sh prepare_quick_testrun.sh
The directories results and fastqs, containing the raw count matrices, are now available in the working directory.
- Install Snakemake and mamba on your system
conda create -c bioconda -c conda-forge --name snakemake mamba snakemake ;
conda activate snakemake
- Do a dry-run to test the configuration
snakemake -s workflow/Snakefile --configfile config/config.yaml --use-conda --printshellcmds --dry-run --rerun-triggers mtime
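NOTE: the parameter --rerun-triggers mtime restricts rerun decisions to file modification times, so that only changes to the input data (and not, for example, changes to code or parameters) trigger a rerun of the pipeline.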
- Start the Snakemake workflow
snakemake -s workflow/Snakefile --configfile config/config.yaml --use-conda --printshellcmds --rerun-triggers mtime --cores 1
NOTE: if the pipeline is to be run on a compute cluster with a job scheduling system (e.g. LSF, Slurm), the command needs to be adjusted accordingly. Please refer to the Snakemake documentation on cluster execution for platform-specific details.
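Before the dry-run of the quick test run, it can help to confirm that the preparation script produced the two directories mentioned above. The following is a minimal sketch, meant to be run from the gExcite working directory; the function name check_testrun_dirs is hypothetical and not part of the pipeline.

```shell
#!/bin/sh
# Sketch only: verify that prepare_quick_testrun.sh created its directories.

# check_testrun_dirs WORKDIR
# Succeeds only if WORKDIR contains both the results and fastqs directories.
check_testrun_dirs() {
    workdir=$1
    missing=0
    for dir in results fastqs; do
        if [ ! -d "$workdir/$dir" ]; then
            echo "missing directory: $workdir/$dir" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Check the current directory, assumed to be the gExcite working directory.
if check_testrun_dirs .; then
    echo "Test data found; the dry-run can be started."
else
    echo "Run 'sh prepare_quick_testrun.sh' in the gExcite working directory first." >&2
fi
```

The sketch only checks that the directories exist; it does not inspect their contents.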
To start a full test run that also includes the resource-intensive cellranger count and CITE-Seq steps:
- Clone the gExcite git repository and go into the new directory gExcite_pipeline, which will be referred to as "the gExcite working directory" in this documentation.
git clone https://github.com/ETH-NEXUS/gExcite_pipeline.git ;
cd gExcite_pipeline
- Download the FASTQ files archive and extract it with
unzip gexcite_testdata_fastqs.zip
- Move the extracted directory (named "fastqs") into the gExcite working directory.
- Install Snakemake and mamba on your system
conda create -c bioconda -c conda-forge --name snakemake mamba snakemake ;
conda activate snakemake
- Install the Cellranger software. Follow the instructions on the 10xGenomics installation support page to install cellranger and to add the cellranger binary to your PATH. Download the cellranger references.
- Insert the paths to the available cellranger software and reference directory into the config file config/config.yaml
  - In section [resources], [reference_transcriptome] needs to point to the location of the genomic reference used for the cellranger mapping
  - In sections [tools][cellranger_count_gex] and [tools][cellranger_count_adt], [call] needs to point to the path to the cellranger installation
- Do a dry-run to test the configuration
snakemake -s workflow/Snakefile --configfile config/config.yaml --use-conda --printshellcmds --dry-run
- Start the Snakemake workflow
snakemake -s workflow/Snakefile --configfile config/config.yaml --use-conda --printshellcmds --cores 1
NOTE: if the pipeline is to be run on a compute cluster with a job scheduling system (e.g. LSF, Slurm), the command needs to be adjusted accordingly. Please refer to the Snakemake documentation on cluster execution for platform-specific details.
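The full test run depends on both snakemake and cellranger being callable, and on the fastqs directory being present in the gExcite working directory. The following sketch checks these prerequisites before the dry-run. The function name require_tool is hypothetical; the sketch does not validate the paths entered in config/config.yaml.

```shell
#!/bin/sh
# Sketch only: check the prerequisites of the full test run.

# require_tool NAME
# Succeeds if NAME resolves to a command in the current shell
# (normally an executable found on PATH).
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1 -> $(command -v "$1")"
    else
        echo "not found on PATH: $1" >&2
        return 1
    fi
}

status=0
for tool in snakemake cellranger; do
    require_tool "$tool" || status=1
done

# The extracted FASTQ directory is expected in the current directory,
# assumed to be the gExcite working directory.
if [ ! -d fastqs ]; then
    echo "fastqs directory not found in $(pwd)" >&2
    status=1
fi

if [ "$status" -eq 0 ]; then
    echo "Prerequisites look fine; continue with the dry-run."
else
    echo "Some prerequisites are missing; see the messages above." >&2
fi
```

Run it after activating the snakemake conda environment, since snakemake is only on the PATH inside that environment.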