CUT&RUNTools requires Python 2.7, R 3.3, Java, Perl 5 and Slurm job submission environment. It is designed to run on a cluster set up.
Installation also requires GCC to compile some C-based source code.
The following tools should be already installed. Check the corresponding website to see how to install them if not. Special notes for Atactk and UCSC-tools, see below.
In the bracket is the version we have. CUT&RUNTools may work with a lower version of each tool.
- Trimmomatic (0.36) link
- Bowtie2 (2.2.9) link
- Samtools (1.3.1) link
- Picard (2.8.0) link
- MACS2 (2.1.1) link
- MEME (4.12.0) link
- Bedops (2.4.30) link
- Bedtools (2.26.0) link
- Atactk link- we provide special install instructions since we need to patch a source file.
- UCSC-tools link- we provide special install instructions.
Other tools already contained in CUT&RUNTools:
This is a python2 package that determines the enzyme cut frequency matrix. Originally for Tn5 transposase in ATAC-seq, the logic of the tool also applies to other digestions like CUT&RUN. However since original implementation is designed for ATAC-seq it needs to be patched in one small place in the code to make it suitable for CUT&RUN analysis. Otherwise, we estimate that the cut frequency is sometime off by 1 bp in its calculation.
We provide a patch for this problem. Install the patched version of the package by reading atactk.install.sh
. The patches make_cut_matrix.patch
and metrics.py.patch
. Then install by:
source atactk.install.sh
This will use pip to install atactk to the user's home directory (~/.local/bin).
We need two specific tools bedGraph2BigWig and fetchChromSizes, and provide a script ucsc-tools.install
to automatically download and install them from the UCSC genome browser:
source ucsc-tools.install
This will download the two executables in the current directory.
If you are a conda/bioconda user, you can already obtain the pre-requisite softwares above from the bioconda package repository. On a linux system, the anaconda environment might look like the following:
conda info -e
# conda environments:
#
base * /home/qzhu/anaconda2
bedGraphToBigWig /home/qzhu/anaconda2/envs/bedGraphToBigWig
bedops /home/qzhu/anaconda2/envs/bedops
bedtools /home/qzhu/anaconda2/envs/bedtools
bowtie2 /home/qzhu/anaconda2/envs/bowtie2
deeptools /home/qzhu/anaconda2/envs/deeptools
macs2 /home/qzhu/anaconda2/envs/macs2
samtools /home/qzhu/anaconda2/envs/samtools
trimmomatic /home/qzhu/anaconda2/envs/trimmomatic
You still need to follow the custom Atactk and UCSC-tools instructions above.
If you are using Ubuntu Linux 18.04 or above, obtaining and installing the pre-requisites should be in most cases pretty simple.
You can either use the apt-get install
command to install things like bedops
, bedtools
, bowtie2
. For other packages that are not in the Ubuntu package manager,
they can be installed by going to the official channel, downloading the source tar file, extracting it, and doing ./configure --prefix=/usr/local
followed by make && make install
.
On my system, which is the Harvard Medical School O2 Computing Cluster, these pre-requisites have been installed by the system admin (see module spider
on O2), so there is no need to worry about them
if you are a user from the HMS community.
The configuration file tells CutRunTools where to locate the prerequisite tools. This is a JSON file. A sample JSON file is provided below (config.json
). Filling in the information should be pretty easy: in most cases we need to provide the path to the bin
directory of each tool.
{
"Rscriptbin": "/n/app/R/3.3.3/bin",
"pythonbin": "/n/app/python/2.7.12/bin",
"perlbin": "/n/app/perl/5.24.0/bin",
"javabin": "/n/app/java/jdk-1.8u112/bin",
"trimmomaticbin": "/n/app/trimmomatic/0.36/bin",
"trimmomaticjarfile": "trimmomatic-0.36.jar",
"bowtie2bin": "/n/app/bowtie2/2.2.9/bin",
"samtoolsbin": "/n/app/samtools/1.3.1/bin",
"adapterpath": "/home/qz64/cutrun_pipeline/adapters",
"picardbin": "/n/app/picard/2.8.0/bin",
"picardjarfile": "picard-2.8.0.jar",
"macs2bin": "/n/app/macs2/2.1.1.20160309/bin",
"macs2pythonlib": "/n/app/macs2/2.1.1.20160309/lib/python2.7/site-packages",
"kseqbin": "/home/qz64/cutrun_pipeline",
"memebin": "/n/app/meme/4.12.0/bin",
"bedopsbin": "/n/app/bedops/2.4.30",
"bedtoolsbin": "/n/app/bedtools/2.26.0/bin",
"makecutmatrixbin": "/home/qz64/.local/bin",
"bt2idx": "/n/groups/shared_databases/bowtie2_indexes",
"genome_sequence": "/home/qz64/chrom.hg19/hg19.fa",
"extratoolsbin": "/home/qz64/cutrun_pipeline",
"extrasettings": "/home/qz64/cutrun_pipeline",
"input/output": {
"fastq_directory": "/n/scratch2/qz64/Nan_18_aug23/Nan_run_19",
"workdir": "/n/scratch2/qz64/workdir",
"fastq_sequence_length": 42,
"organism_build": "hg19"
},
"motif_finding": {
"num_bp_from_summit": 150,
"num_peaks": 5000,
"total_peaks": 15000,
"motif_scanning_pval": 0.0005,
"num_motifs": 20
},
"cluster": {
"email": "johndoe@gmail.com",
"step_alignment": {
"queue": "short",
"memory": 32000,
"time_limit": "0-12:00"
},
"step_process_bam": {
"queue": "short",
"memory": 32000,
"time_limit": "0-12:00"
},
"step_motif_find": {
"queue": "short",
"memory": 32000,
"time_limit": "0-12:00"
},
"step_footprinting": {
"queue": "short",
"memory": 32000,
"time_limit": "0-12:00"
}
}
}
Pay attention to the the first 20 lines of this file which concern the software installation. The rest is related to an actual analysis (explained in USAGE.md).
To check if the paths are correct and if the softwares in these paths indeed work:
./validate.py config.json --ignore-input-output --software
This could be due to that the versions of Python are inconsistent. The version of Python indicated in the field pythonbin
should be consistent with the version of Python used to install MACS2, and version used to install make_cut_matrix (makecutmatrixbin
).
To confirm this, on our system, we enter
head -n 1 /n/app/macs2/2.1.1.20160309/bin/macs2
#!/n/app/python/2.7.12/bin/python
head -n 1 /home/qz64/.local/bin/make_cut_matrix
#!/n/app/python/2.7.12/bin/python
Use /n/app/python/2.7.12/bin
for the pythonbin
field in config.json
.
Similarly for Perl (perlbin
) and meme-chip (memebin
). Version of Perl used to install meme should be used for perlbin
:
head -n 1 /home/qz64/meme/bin/meme-chip
#!/n/app/perl/5.24.0/bin/perl
Use /n/app/perl/5.24.0/bin
as value for perlbin
in config.json
.
Validate script may flag an error with MEME installation: XML/Simple.pm is not found.
You may encounter this error if the XML PERL module is installed to a custom directory (i.e. home rather than in /usr/
or /usr/local
).
If this is the case, the solution is to set up the Perl library environment variables.
On our system, the custom Perl modules are installed to /home/qz64/perl5-O2/lib
, so we enter:
PERL5LIB="/home/qz64/perl5-O2/lib/perl5":$PERL5LIB
PERL_LOCAL_LIB_ROOT="/home/qz64/perl5-O2":$PERL_LOCAL_LIB_ROOT
export PERL5LIB
export PERL_LOCAL_LIB_ROOT
Then try validate script again.
The genome sequence of a specific organism build (such as hg19, hg38) is required for motif finding. We provide a script assemblies.install
to download this automatically from NCBI. We specifically require repeat-masked version of genome sequence file. See assemblies.install
for more details.
source assemblies.install
In R, run:
install.packages("CENTIPEDE", repos="http://R-Forge.R-project.org")
We first install a special trimmer we wrote kseq
. This tool further trims the reads by 6 nt to get rid of the problem of possible adapter run-through. To install:
source make_kseq_test.sh
Once completed, CutRunTools is installed.
CutRunTools does not need to be installed to any special location such as /usr/bin, or /usr/local/bin.
It can be run directly from the directory containing the CutRunTools scripts.
The main program consists of create_scripts.py
, validate.py
, and a set of scripts in aligned.aug10
, and in macs2.narrow.aug18
which perform the important motif finding and footprinting analyses.
We suggest placing the CUT&RUNTools scripts to a more permanent location (like ~/cutruntools-scripts
or /usr/local/cutruntools-scripts
). For example:
#suppose this is where cutruntools scripts are downloaded to:
pwd
/home/qz64/Downloads/qzhudfci-cutruntools-1806e65e5b28
#copy it in a more permanent location like home directory
mkdir ~/cutruntools-scripts
cp -r /home/qz64/Downloads/qzhudfci-cutruntools-1806e65e5b28/* ~/cutruntools-scripts/.
See USAGE.md for details. Briefly, a user writes a config.json
configuration file for a new analysis. CutRunTools uses this to generate a set of slurm-based job-submission scripts, customized to the user's samples and environment. These job-submission scripts can be directly used by the user to start analyzing his/her Cut&Run samples.
#Example
#Use the config.json from ~/cutruntools-scripts as a template for a new analysis
cp ~/cutruntools-scripts/config.json config.jan4.json
#modify config.jan4.json to your samples and needs
~/cutruntools-scripts/create_scripts.py config.jan4.json
#the SLURM-based job submission scripts would now have been created. See USAGE.md how to continue.