NGM: Header does not match the data #50

jhcaddisfly · 2019-10-08T13:14:00Z

Hello,
I really need help!

I mapped paired-end short reads to a fasta file using ngm and piped the output to samtools.
I used samtools sort to generate a sorted bamfile which I want to index with samtools index.

This is the command I used:
ngm -r HK1_racon1_nanopolished_genome.fa -1 cHK1_1.fq -2 cHK1_2.fq -t64 --max-read-length 150 --affine | samtools sort -l 9 -@ 10 -O BAM -o HK1.paired.sort.bam && samtools index HK1.paired.sort.bam

I got the following error message from samtools:
[bam_sort_core] merging from 310 files and 10 in-memory blocks...
[E::hts_idx_push] Region 589897413..589897563 cannot be stored in a bai index. Try using a csi index with min_shift = 14, n_lvls >= 6
samtools index: failed to create index for "HK1.paired.sort.bam": Numerical result out of range

It seems that one or more of the reads fall outside of the header reference. The maximum length of any reference sequence is 35803611, while the read that triggered the error starts at 589897413. It looks like the longest chromosome in the SQ headers is ~36Mb while the alignments are out to ~590Mb. This would appear to indicate the header does not match the data and therefore the indexing is not working. So, the issue might be with the ngm alignment.
Do you have any tipps?

fritzsedlazeck · 2019-10-08T13:26:07Z

Your command cannot work like this. You are piping Sam output into the sort command.

I would do it stepwise to see if the mapping is a problem or the samtools command etc.

Thanks
Fritz

daviesrob · 2019-10-08T15:32:58Z

samtools sort works fine with SAM input, and has done for quite a long time now.
Related samtools issue is samtools/samtools#1118

fritzsedlazeck · 2019-10-08T19:05:27Z

could you send me the sam file with just the header and the read?
This is very strange... never saw something like this.
Thanks
Fritz

jhcaddisfly · 2019-10-09T09:31:56Z

Thanks for your answer! How can I send it? I can't attach .sam here.

fritzsedlazeck · 2019-10-09T09:47:40Z

true sorry. Please send it to me : fritz.sedlazeck@gmail.com
This should just be a small file. I want to see if there is something going on with the naming of the sequences or so..
Thanks
Fritz

jhcaddisfly · 2019-10-09T12:12:08Z

How can I extract just the header and the read?

daviesrob · 2019-10-09T13:04:04Z

samtools view -H HK1.paired.sort.bam > badread.header will extract the header.
samtools view HK1.paired.sort.bam | grep 589897414 > badread.body should get the read.
cat badread.header badread.body > badread.sam will make a SAM file with the dubious read in it.
It's possible to attach the result to this issue either by compressing it with zip and attaching the .zip file, or (if it's small) by changing the suffix to .txt.

fritzsedlazeck · 2019-10-09T13:18:11Z

samtools view -H HK1.paired.sort.bam > my.sam
samtools view HK1.paired.sort.bam | grep 589897414 >> my.sam

yeah feel free to just post / upload a txt file.
Thanks
Fritz

jhcaddisfly · 2019-10-09T14:30:17Z

Thanks for your answers.
Please find attached the text file with the read and sam header.
my.txt

I also tried to run the commands seperately, but got the same error message.
ngm -r HK1_racon1_nanopolished_genome.fa -1 cHK1_1.fq -2 cHK1_2.fq -o PO2.paired.sam -t64 --max-read-length 150 --affine > mappingpaired.log 2> mappingpaired.err && samtools view -@ 10 -b PO2.paired.sam > PO2.paired.bam && samtools sort -@60 -T PO2.paired -o PO2.sortpaired.bam pilon1_PO2.paired.bam && samtools index PO2.sortpaired.bam

Thanks,

Jacqueline

bioteksampath · 2022-06-13T13:07:10Z

Hi @jhcaddisfly @fritzsedlazeck
wondering, does this issue have been resolved?
I'm getting the same issue.

snf2) [sap223@cnic-base-lgn-19001 wheat_SV]$ samtools index stanley_4hifi.fa.gz.s.bam stanley_4hifi.fa.gz.s.bai
[E::hts_idx_push] Region 536856213..536872358 cannot be stored in a bai index. Try using a csi index with min_shift = 14, n_lvls >= 6
samtools index: failed to create index for "stanley_4hifi.fa.gz.s.bam": Numerical result out of range

`snf2) [sap223@cnic-base-lgn-19001 wheat_SV]$ samtools view -H stanley_4hifi.fa.gz.s.bam `
@HD	VN:1.0	SO:coordinate
@SQ	SN:chr1A	LN:598499322
@SQ	SN:chr1B	LN:710550373
@SQ	SN:chr1D	LN:503069673
@SQ	SN:chr2A	LN:802606345
@SQ	SN:chr2B	LN:804701322
@SQ	SN:chr2D	LN:660890328
@SQ	SN:chr3A	LN:760518506
@SQ	SN:chr3B	LN:853810196
@SQ	SN:chr3D	LN:630146416
@SQ	SN:chr4A	LN:763531415
@SQ	SN:chr4B	LN:705137937
@SQ	SN:chr4D	LN:519741802
@SQ	SN:chr5A	LN:717196012
@SQ	SN:chr5B	LN:726179364
@SQ	SN:chr5D	LN:583791869
@SQ	SN:chr6A	LN:626232364
@SQ	SN:chr6B	LN:725914243
@SQ	SN:chr6D	LN:500231259
@SQ	SN:chr7A	LN:750217655
@SQ	SN:chr7B	LN:773236065
@SQ	SN:chr7D	LN:657874758
@SQ	SN:chrUn	LN:203823449
@PG	ID:ngmlr	PN:nextgenmap-lr	VN:0.2.7	CL:ngmlr -t 32 -r Landmark_v13.fasta -q stanley_4hifi.fa.gz -o stanley_4hifi.fa.gz.sam -x ont

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

NGM: Header does not match the data #50

NGM: Header does not match the data #50

jhcaddisfly commented Oct 8, 2019

fritzsedlazeck commented Oct 8, 2019

daviesrob commented Oct 8, 2019

fritzsedlazeck commented Oct 8, 2019

jhcaddisfly commented Oct 9, 2019

fritzsedlazeck commented Oct 9, 2019

jhcaddisfly commented Oct 9, 2019

daviesrob commented Oct 9, 2019

fritzsedlazeck commented Oct 9, 2019

jhcaddisfly commented Oct 9, 2019

bioteksampath commented Jun 13, 2022 •

edited

Loading

NGM: Header does not match the data #50

NGM: Header does not match the data #50

Comments

jhcaddisfly commented Oct 8, 2019

fritzsedlazeck commented Oct 8, 2019

daviesrob commented Oct 8, 2019

fritzsedlazeck commented Oct 8, 2019

jhcaddisfly commented Oct 9, 2019

fritzsedlazeck commented Oct 9, 2019

jhcaddisfly commented Oct 9, 2019

daviesrob commented Oct 9, 2019

fritzsedlazeck commented Oct 9, 2019

jhcaddisfly commented Oct 9, 2019

bioteksampath commented Jun 13, 2022 • edited Loading

bioteksampath commented Jun 13, 2022 •

edited

Loading