Help me debug step 3 in HaploFill #8

Open
gluspaula opened this issue Nov 8, 2024 · 3 comments
@gluspaula

gluspaula commented Nov 8, 2024

Hi,

I need help troubleshooting HaploFill step 3.
I edited my original post for simplicity because I think I solved my own problem. Nevertheless, for the repeats.bed file, should I include the repetitive regions of all the .fasta files: hap1, hap2, and Unplaced?

@noecochetel
Collaborator

Hi,

Sorry for the late answer. Could you please point me to the file that mentions this HaploFill step 3?
HaploFill usually takes the repeat annotation from hap1 and hap2 concatenated into a single file.
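A minimal sketch of that concatenation for BED-format annotations, assuming one repeat BED file per haplotype. The function name `merge_repeat_beds` and the file names in the usage comment are hypothetical. It drops comment, `track`, and `browser` lines and sorts by sequence name and start position; whether HaploFill actually requires sorted input is not stated in this thread, and the Unplaced annotation is left out, following the note above that hap1 and hap2 are what is usually combined.

```bash
#!/usr/bin/env bash
# Hypothetical helper: merge the hap1 and hap2 repeat annotations (BED)
# into the single file given to HaploFill with -r.
merge_repeat_beds() {
  local hap1_bed=$1 hap2_bed=$2 out=$3
  # Drop comment/track/browser lines from both inputs (-h: no filename prefixes),
  # then sort by sequence name and numeric start position.
  grep -hvE '^(#|track|browser)' "$hap1_bed" "$hap2_bed" \
    | LC_ALL=C sort -t $'\t' -k1,1 -k2,2n > "$out"
}

# Usage (file names are placeholders):
#   merge_repeat_beds hap1.repeats.bed hap2.repeats.bed repeats.bed
```

For GFF3 annotations (`--repeats_format GFF3`), the same idea applies, but the `##gff-version 3` header should appear only once in the merged file.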

@noecochetel noecochetel self-assigned this Dec 12, 2024
@gluspaula
Author

Hi, thank you for your response. I figured that part out, but I have a new issue I need help troubleshooting.
It concerns coverage in HaploFill. I keep getting the error "ERROR Some sequences failed the coverage information check." I have tried several option combinations: -C -s; -C --b1 --b2; and --C1 --C2. I have been troubleshooting the first sequence (contig) that triggers the error, and it has coverage information for every single base. When I ran -C -s, I saw that the .bam files are not generated for this particular sequence. When I generate the .bam file for this sequence on its own, I get coverage for all bases, plus the bedtools coverage text file with the same information. I also found that this sequence has several soft and hard clips; I don't know whether that plays a role, or whether the sequences that passed differ in this respect.
Right now I am regenerating the coverage .bam and .txt files using the same minimap, samtools, and bedtools settings that HaploFill.py uses. However, I suspect this still won't work, because I get the same error when I choose -C -s.
Do you have any suggestions that may help me troubleshoot this step?
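Two small diagnostics may help narrow this down. They are a sketch, not part of HaploFill, and assume a sorted, indexed BAM of the reads against each haplotype; the function names and the BAM name in the usage comments are hypothetical. `breadth_per_contig` reads `samtools depth -aa` output (the doubled `-a` also reports reference sequences with no aligned reads) and prints each contig's covered bases, length, and covered fraction. `clipped_alignments_per_contig` counts, per contig, the alignments whose CIGAR string contains a soft (`S`) or hard (`H`) clip, so the failing contig can be compared with contigs that passed.

```bash
#!/usr/bin/env bash
# Diagnostic sketch (not part of HaploFill).
# Input: `samtools depth -aa` output on stdin (columns: contig, position, depth).
# Output: contig, bases with depth > 0, total bases, covered fraction.
breadth_per_contig() {
  LC_ALL=C awk -F'\t' '
    { total[$1]++; if ($3 > 0) covered[$1]++ }
    END {
      for (c in total)
        printf "%s\t%d\t%d\t%.4f\n", c, covered[c] + 0, total[c], (covered[c] + 0) / total[c]
    }'
}

# Input: `samtools view` output on stdin (SAM records without header).
# Output: reference name (column 3) and number of alignments whose CIGAR
# (column 6) contains a soft (S) or hard (H) clip.
clipped_alignments_per_contig() {
  awk -F'\t' '$6 ~ /[SH]/ { n[$3]++ } END { for (c in n) print c "\t" n[c] }'
}

# Usage (BAM name is a placeholder):
#   samtools depth -aa cov.hap1.sorted.bam | breadth_per_contig | sort -t $'\t' -k4,4g | head
#   samtools view cov.hap1.sorted.bam | clipped_alignments_per_contig | sort -t $'\t' -k2,2nr | head
```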

For background, the genome assembly.fasta was generated using Flye --keep-haplotypes --meta with ONT reads with a mean quality score of Q18 and a minimum read length of 10 kb. The final mean coverage reported in flye.log was 46x. I used this FASTA file for HaploSplit, then used HaploBreak to break at the Ns. Now I am running HaploFill.py with the following inputs:
python $HaploFill_path \
  -1 $HAP1 \
  -2 $HAP2 \
  -U $UNPLACED \
  -c $CORRESPONDENCE \
  -r $REPEATS \
  --C1 $COV_HAP1 \
  --C2 $COV_HAP2 \
  -o $OUTPUT_DIR \
  -t $TEMP_FILE \
  --map_threads $THREADS \
  --sequencing_technology ONT
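Because HaploBreak was run after the assembly, one thing worth ruling out is that the sequence names in the FASTA files given to HaploFill no longer match those in the coverage files (for example, if coverage was computed on a different version of the sequences). The sketch below prints FASTA sequence IDs (the first word after `>`) that never appear in column 1 of a tab-separated coverage table. It assumes the `--C1`/`--C2` files have the contig name in their first column, as `bedtools genomecov -d` output does; the exact format HaploFill expects is not confirmed in this thread, and `missing_from_coverage` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical check: list FASTA sequence IDs that are absent from column 1
# of a tab-separated coverage table (assumed format, e.g. contig<TAB>pos<TAB>depth).
missing_from_coverage() {
  local fasta=$1 coverage=$2
  awk -F'\t' '
    # First file (coverage table): remember every contig name it mentions.
    FILENAME == ARGV[1] { seen[$1] = 1; next }
    # Second file (FASTA): take the first word after ">" on each header line.
    /^>/ {
      id = substr($1, 2)
      sub(/[ \t].*/, "", id)
      if (!(id in seen)) print id
    }
  ' "$coverage" "$fasta"
}

# Usage with the variables from the command above:
#   missing_from_coverage "$HAP1" "$COV_HAP1"
#   missing_from_coverage "$HAP2" "$COV_HAP2"
```

An empty result means every FASTA sequence has at least one line in the coverage file; any IDs printed are candidates for the coverage check failure.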

I appreciate any help you can give :)

@noecochetel
Collaborator

I usually run HaploFill with the following command:

python HaploFill.py -1 ${hap1}.fasta -2 ${hap2}.fasta -U ${unplaced}.fasta -c correspondance.txt --exclusion exclusion.txt -r ${repeats}.gff3 --repeats_format GFF3 -o ${haplofill_run} -C --b1 illumina.on.${hap1}.sorted.bam --b2 illumina.on.${hap2}.sorted.bam > ${haplofill_run}.log 2> ${haplofill_run}.err

Here is an example of how to get the paired-end Illumina alignments illumina.on.${hapx}.sorted.bam:

bwa index ${hapx}.fasta 
bwa mem -t 24 ${hapx}.fasta ${reads_1}.fq.gz ${reads_2}.fq.gz | samtools view -bS -T ${hapx}.fasta - | samtools sort -l 9 -@ 24 -m 1500M -o illumina.on.${hapx}.sorted.bam
samtools index illumina.on.${hapx}.sorted.bam
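Once the BAM is indexed, `samtools idxstats` gives a quick way to spot sequences that received no alignments at all. Its columns are sequence name, length, mapped reads, and unmapped reads, and its final `*` line counts reads not placed on any sequence. The filter below is a sketch with a hypothetical function name that prints sequences with zero mapped reads; whether these are the sequences HaploFill flags in its coverage check is an assumption worth testing.

```bash
#!/usr/bin/env bash
# Hypothetical filter: read `samtools idxstats` output on stdin and print the
# reference sequences with zero mapped reads (column 3), skipping the "*" line.
contigs_without_reads() {
  awk -F'\t' '$1 != "*" && $3 == 0 { print $1 }'
}

# Usage, after `samtools index` as above:
#   samtools idxstats illumina.on.${hapx}.sorted.bam | contigs_without_reads
```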

Hope it helps. Please post the content of the error message if you encounter any issues.
