Modules
The Quality Control (QC) module runs `fastqc` on the input `fastq` files. A quality report (in `html` format) is generated in the `aux` output subfolder. More details on how to interpret the report are available here.
The file generation module uncompresses the input `fastq` files and generates both `fastq` and `fasta` files, each in the standard format and in a format with one fragment per line.
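As a rough illustration of the conversions described above, the following Python sketch reads a gzipped FASTQ file and writes a FASTA file plus one-record-per-line variants. The function names, output file suffixes, and the tab-separated one-line layout are assumptions made for this example; the module's actual output format may differ.

```python
import gzip


def parse_fastq(lines):
    """Yield (name, sequence, quality) from FASTQ lines, 4 lines per record."""
    it = iter(lines)
    # zip over the same iterator four times groups the lines in fours;
    # an incomplete trailing record is silently dropped.
    for header, seq, _plus, qual in zip(it, it, it, it):
        yield header.rstrip("\n")[1:], seq.rstrip("\n"), qual.rstrip("\n")


def fasta_record(name, seq):
    """Standard two-line FASTA record."""
    return f">{name}\n{seq}\n"


def one_line_fastq(name, seq, qual):
    """FASTQ record on a single line (tab-separated layout is an assumption)."""
    return f"@{name}\t{seq}\t{qual}\n"


def one_line_fasta(name, seq):
    """FASTA record on a single line (tab-separated layout is an assumption)."""
    return f">{name}\t{seq}\n"


def convert(fastq_gz_path, out_prefix):
    """Write FASTA and one-line FASTQ/FASTA files next to out_prefix."""
    with gzip.open(fastq_gz_path, "rt") as src, \
            open(out_prefix + ".fa", "w") as fa, \
            open(out_prefix + ".oneline.fq", "w") as fq1, \
            open(out_prefix + ".oneline.fa", "w") as fa1:
        for name, seq, qual in parse_fastq(src):
            fa.write(fasta_record(name, seq))
            fq1.write(one_line_fastq(name, seq, qual))
            fa1.write(one_line_fasta(name, seq))
```

One-record-per-line files are convenient for line-oriented tools, since each fragment can be filtered or counted without tracking record boundaries.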
The pattern filtering module runs `scan_for_matches` to identify the reads in the library that contain the pattern specified in the `patterns.tsv` file. With the `scan_for_matches` syntax, it is fairly simple to specify portions of the fragment of known or unknown length and sequence, allowing for mismatches, insertions and/or deletions if needed. More details on the syntax, with examples, are available here. The module also updates the main `summary` output with information on the filter results and the patterns used.
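To make the idea of a read pattern concrete, here is a small Python analogue built on regular expressions. It is not `scan_for_matches` syntax, and the read layout (an 8-nt UMI, a fixed barcode, a cutsite, then genomic sequence) and the sequences used are hypothetical. Unlike `scan_for_matches`, this sketch only accepts exact matches, with no mismatches, insertions or deletions.

```python
import re

# Hypothetical layout: unknown 8-nt UMI, known barcode, known cutsite,
# then a genomic portion of unknown length.
READ_LAYOUT = re.compile(
    r"^(?P<umi>[ACGTN]{8})"
    r"(?P<barcode>CATCACGC)"
    r"(?P<cutsite>AAGCTT)"
    r"(?P<genomic>[ACGTN]+)$"
)


def split_read(seq):
    """Return the named portions of a read, or None if it does not match."""
    match = READ_LAYOUT.match(seq)
    return match.groupdict() if match else None


def filter_reads(seqs):
    """Keep only the reads that contain the expected pattern."""
    return [seq for seq in seqs if split_read(seq) is not None]
```

For example, `split_read("ACGTACGT" + "CATCACGC" + "AAGCTT" + "GGGG")` returns a dictionary with the four portions, while a read with a different barcode returns `None`.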
The alignment module starts by trimming the non-genomic part of the reads, and then performs single- or paired-end alignment with either `bwa` or `bowtie2` (depending on user selection). Then, it uses `samtools` to generate `bam` and `bai` files and to update the main `summary` output with some general, preliminary information on the alignment. It finishes by adding the trimmed linkers back to the `sam` file generated by the alignment.
The alignment filter module removes secondary alignments, chimeras, R2 reads (in case of paired-end sequencing), unmapped reads, low-quality alignments and reads aligned to absent chromosomes (based on user choice). Finally, it resets the position of the alignments to the 5' end of the + strand of the cutsite. It also updates the main `summary` output with more detailed alignment information.
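The filtering criteria above map naturally onto the FLAG, RNAME and MAPQ columns of a SAM record. The sketch below shows one way such a filter could look in Python. The function name, the MAPQ threshold of 30, the use of the supplementary flag to detect chimeras, and the modelling of the chromosome filter as a set of allowed names are all assumptions for illustration, not the module's actual implementation.

```python
# Bit values defined by the SAM specification.
UNMAPPED = 0x4
READ2 = 0x80
SECONDARY = 0x100
SUPPLEMENTARY = 0x800  # assumed here to mark chimeric alignments


def keep_alignment(sam_line, min_mapq=30, allowed_chromosomes=None):
    """Return True if a SAM alignment line passes all filters.

    min_mapq and allowed_chromosomes are hypothetical user choices.
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag, rname, mapq = int(fields[1]), fields[2], int(fields[4])
    # Drop unmapped reads, R2 mates, secondary and supplementary alignments.
    if flag & (UNMAPPED | READ2 | SECONDARY | SUPPLEMENTARY):
        return False
    # Drop low-quality alignments.
    if mapq < min_mapq:
        return False
    # Drop reads on chromosomes outside the allowed set, if one is given.
    if allowed_chromosomes is not None and rname not in allowed_chromosomes:
        return False
    return True
```

Header lines (those starting with `@`) are not handled here and would need to be passed through separately.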
The UMI analysis module performs four operations:
- Group reads that fall exactly on the same genomic coordinate.
- Assign UMIs to the closest cutsite, allowing a maximum distance; reads farther away are considered orphans and discarded. If no cutsite list is provided, this step is skipped.
- Remove UMIs with low read quality and perform a strict de-duplication.
- Generate bed files with number of de-duplicated UMIs per genomic location (i.e., cutsite).
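The four operations listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the input tuple layout, the default distance and quality thresholds, and the function names are hypothetical, and "strict de-duplication" is interpreted here as counting each exact UMI sequence once per location.

```python
from bisect import bisect_left
from collections import defaultdict


def closest_cutsite(pos, cutsites, max_dist):
    """Return the nearest position in the sorted list cutsites, or None
    if the nearest one is farther than max_dist (orphan read)."""
    i = bisect_left(cutsites, pos)
    candidates = cutsites[max(i - 1, 0): i + 1]
    if not candidates:
        return None
    best = min(candidates, key=lambda c: abs(c - pos))
    return best if abs(best - pos) <= max_dist else None


def count_umis(reads, cutsites_by_chrom=None, max_dist=20, min_qual=30):
    """Count de-duplicated UMIs per genomic location.

    reads: iterable of (chrom, pos, umi, umi_quality) tuples (assumed layout).
    cutsites_by_chrom: optional dict mapping chrom to a sorted position list.
    """
    umis = defaultdict(set)
    for chrom, pos, umi, qual in reads:
        if qual < min_qual:
            continue  # low-quality UMI
        if cutsites_by_chrom is not None:
            pos = closest_cutsite(pos, cutsites_by_chrom.get(chrom, []), max_dist)
            if pos is None:
                continue  # orphan: no cutsite within max_dist
        # Grouping by (chrom, pos) and storing a set de-duplicates exact UMIs.
        umis[(chrom, pos)].add(umi)
    return {loc: len(seqs) for loc, seqs in sorted(umis.items())}
```

When no cutsite list is given, the assignment step is skipped and UMIs are counted at the original read coordinates, mirroring the behaviour described in the list.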
Library complexity is estimated using the `preseq` package. Additional information on how to properly set up `preseq` is available in `INSTALL.md`.