We provide example data for a test run with three hashed samples of human PBMC cells, so that hashing deconvolution, GEX analysis and ADT analysis can be performed.
The raw test data comprises:
- ADT FASTQ files
- GEX FASTQ files
All config files in the test data are ready to use:
- HashingFile with hashtag barcodes
- featureReferenceFile with all ADT barcodes
As an alternative, we provide the count data generated from the example data described above, so that the resource-intensive cellranger count and CITE-Seq steps can be skipped. Note that the test run requires approximately 12 GB of memory to complete successfully.
To start a quick test run:
- Clone the gExcite git repository and go into the new directory, which will be referred to as "the gExcite working directory" in this documentation.
git clone https://github.com/ETH-NEXUS/gExcite_pipeline.git ;
cd gExcite_pipeline
- Unpack the test data matrices by running the test data preparation bash script in the gExcite working directory.
sh prepare_quick_testrun.sh
The directories results and fastqs, containing the raw count matrices, are now available in the working directory.
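Before continuing, the outcome of the preparation step can be checked. The following is a minimal sketch, assuming it is run from the gExcite working directory; `check_dirs` is a hypothetical helper introduced here for illustration and is not part of gExcite.

```shell
# Sketch: confirm that the expected directories exist.
# check_dirs is a hypothetical helper, not part of the gExcite pipeline.
check_dirs() {
    missing=0
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            echo "found: $dir"
        else
            echo "missing: $dir" >&2
            missing=1
        fi
    done
    # Return 0 only if every directory was found.
    return "$missing"
}

# Run from the gExcite working directory.
check_dirs results fastqs || echo "Re-run prepare_quick_testrun.sh before continuing."
```

The helper returns a non-zero status if any directory is missing, so it can also be used as a guard in front of the Snakemake commands.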
- Install Snakemake and mamba on your system
conda create -c bioconda -c conda-forge --name snakemake mamba snakemake ;
conda activate snakemake
- Do a dry-run to test the configuration
snakemake -s workflow/Snakefile --configfile config/config.yaml --use-conda --printshellcmds --dry-run --rerun-triggers mtime
NOTE: the parameter --rerun-triggers mtime restricts the rerun criteria to file modification times, so that only changes to the input data trigger a rerun of the pipeline.
- Start the Snakemake workflow
snakemake -s workflow/Snakefile --configfile config/config.yaml --use-conda --printshellcmds --rerun-triggers mtime --cores 1
NOTE: if the pipeline should be run on a compute cluster using a job scheduling system (e.g. LSF, Slurm) the command needs to be adjusted accordingly. Please refer to the Snakemake documentation on cluster execution for platform-specific details.
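As an illustration of such an adjustment, the sketch below assembles a Slurm submission command. It is not a tested gExcite configuration: `build_cluster_cmd` is a hypothetical helper, and the job count and sbatch resources are placeholders. It assumes that Snakemake 7 is driven through its generic `--cluster` option, while Snakemake 8 and later use `--executor slurm`, which requires the separately installed snakemake-executor-plugin-slurm package.

```shell
# Sketch only: print a cluster submission command for Slurm.
# build_cluster_cmd is a hypothetical helper; --jobs 10 and the sbatch resources are placeholders.
build_cluster_cmd() {
    major="$1"   # major version of the installed Snakemake, e.g. 7 or 8
    base="snakemake -s workflow/Snakefile --configfile config/config.yaml --use-conda --printshellcmds --rerun-triggers mtime --jobs 10"
    if [ "$major" -ge 8 ]; then
        # Snakemake 8+: assumes snakemake-executor-plugin-slurm is installed
        echo "$base --executor slurm"
    else
        # Snakemake 7: generic cluster submission through a submit command
        echo "$base --cluster 'sbatch --mem=12G --time=04:00:00'"
    fi
}

# Detect the installed major version; fall back to 8 if Snakemake is not on the PATH.
version="$(snakemake --version 2>/dev/null || echo 8)"
build_cluster_cmd "${version%%.*}"
```

The function only prints the command so that it can be reviewed before submission. For LSF with Snakemake 7, the submit command inside `--cluster` would be a bsub call instead of sbatch.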
To start a full test run that also includes the resource-intensive cellranger count and CITE-Seq steps:
- Clone the gExcite git repository and go into the new directory, which will be referred to as "the gExcite working directory" in this documentation.
git clone https://github.com/ETH-NEXUS/gExcite_pipeline.git ;
cd gExcite_pipeline
- Download the FASTQ files archive and extract it with
unzip gexcite_testdata_fastqs.zip
- Move the extracted directory (named "fastqs") into the gExcite working directory.
- Install Snakemake and mamba on your system
conda create -c bioconda -c conda-forge --name snakemake mamba snakemake ;
conda activate snakemake
- Install the Cellranger software. Follow the instructions on the 10xGenomics installation support page to install cellranger and to add the cellranger binary to your PATH. Download the cellranger references.
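Whether the cellranger binary can be resolved through the PATH can be checked with a short test. This is a minimal sketch: `require_tool` is a hypothetical helper introduced here for illustration, and /path/to/cellranger is a placeholder for the actual installation directory.

```shell
# Sketch: check that a tool can be resolved through the PATH.
# require_tool is a hypothetical helper, not part of the gExcite pipeline.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
    else
        echo "$1 not found on PATH" >&2
        return 1
    fi
}

# /path/to/cellranger is a placeholder for the local installation directory.
require_tool cellranger || echo "Add cellranger to the PATH, e.g. export PATH=/path/to/cellranger:\$PATH"
```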
- Insert the paths to the available cellranger software and reference directory into the config file
  - In section [], [reference_transcriptome] needs to point to the location of the genomic reference used for the cellranger mapping
  - In sections [] and [tools], the corresponding entry needs to point to the path to the cellranger installation
- In section [
- Do a dry-run to test the configuration
snakemake -s workflow/Snakefile --configfile config/config.yaml --use-conda --printshellcmds --dry-run
- Start the Snakemake workflow
snakemake -s workflow/Snakefile --configfile config/config.yaml --use-conda --printshellcmds --cores 1
NOTE: if the pipeline should be run on a compute cluster using a job scheduling system (e.g. LSF, Slurm) the command needs to be adjusted accordingly. Please refer to the Snakemake documentation on cluster execution for platform-specific details.