
Finding discrepancies between ngmaster and pubmlst allele calls #27

Open
slvrshot opened this issue Feb 5, 2019 · 13 comments

@slvrshot

slvrshot commented Feb 5, 2019

I was just wondering what might explain this. I have some SPAdes-assembled WGS sequences, and I used the contigs as input for ngmaster. When I upload the same contigs to PubMLST, I get completely different porB and tbpB alleles. Could you maybe shed some light on why this is?

@slvrshot
Author

slvrshot commented Feb 5, 2019

Version 0.4

@andersgs
Contributor

andersgs commented Feb 5, 2019

It could be due to discrepancies between the databases (e.g., if the database you are using with ngmaster is a bit out of date compared to PubMLST), or our BLAST logic might be different. Could you provide the exact alleles reported by ngmaster and by PubMLST? The contig sequences for us to test would also be terrific.

@slvrshot
Author

slvrshot commented Feb 6, 2019

Thanks @andersgs. Here is a link: https://1drv.ms/u/s!Ah_Q0DzXKdX-gaRGGilJERAc2FVGvA

I moved over to the latest version of ngmaster and updated the database.

The attached contigs were reported as ST 14121 in the original paper, but ngmaster calls them ST 14122 (the porB call differs: 485 vs 8154). The tbpB allele is 110.

The interesting thing is that 485 and 8154 are reverse complements of each other. Is this just a problem with the NG-MAST database needing curation to remove redundant alleles?

@andersgs
Contributor

andersgs commented May 5, 2020

@slvrshot sorry for the delayed reply. Nice find, by the way. We may need to add some logic to our DB download to remove such sequences. I would say that is indeed an issue with NG-MAST!
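A minimal sketch of the kind of download-time check described above: flag any allele whose sequence is the exact reverse complement of an allele listed earlier in the FASTA. This is not ngmaster code; `parse_fasta`, `find_revcomp_pairs` and the toy allele IDs and sequences are hypothetical, and it assumes the allele file has already been read into a string.

```python
from __future__ import annotations

# Hypothetical sketch, not ngmaster code: flag alleles in a downloaded allele
# FASTA whose sequence is the exact reverse complement of an earlier allele.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {allele_id: uppercase sequence}."""
    alleles: dict[str, str] = {}
    name = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                alleles[name] = "".join(chunks).upper()
            # Allele ID is the first whitespace-separated token of the header.
            name = (line[1:].split() or [""])[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        alleles[name] = "".join(chunks).upper()
    return alleles


def find_revcomp_pairs(alleles: dict[str, str]) -> list[tuple[str, str]]:
    """Return (first_seen, redundant) pairs where the redundant allele is the
    reverse complement of an allele listed earlier in the file."""
    first_seen: dict[str, str] = {}  # sequence -> first allele ID carrying it
    pairs = []
    for name, seq in alleles.items():
        rc = reverse_complement(seq)
        # rc == seq is a reverse-palindromic allele, not a redundant copy.
        if rc != seq and rc in first_seen:
            pairs.append((first_seen[rc], name))
        first_seen.setdefault(seq, name)
    return pairs


# Toy, made-up sequences for illustration only.
toy = """>porB_a
ATGAAACCG
>porB_b
CGGTTTCAT
>porB_c
ATGCCCGGA
"""
print(find_revcomp_pairs(parse_fasta(toy)))  # [('porB_a', 'porB_b')]
```

Keeping the first-seen allele and reporting the later one is an arbitrary choice here; which allele a curated scheme should retain is a curation decision, not something this check can settle.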

@AdmiralenOla

You will find that there are unfortunately plenty of such errors in the scheme: alleles that are reverse complements of each other, alleles that are nested within each other (they are no longer required to be the same length), different start sites, etc.
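Nested alleles can also be flagged mechanically. Below is a brute-force sketch that reports any allele contained in a strictly longer allele, in either orientation. `find_nested` and the toy sequences are hypothetical, not part of ngmaster or PubMLST, and the loop is quadratic in the number of alleles, so a large allele set would probably call for something smarter, such as a k-mer index.

```python
from __future__ import annotations

# Hypothetical sketch, not part of ngmaster or PubMLST: brute-force search for
# alleles that sit entirely inside a longer allele on either strand.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_nested(alleles: dict[str, str]) -> list[tuple[str, str]]:
    """Return (inner, outer) pairs where allele `inner` is a strictly shorter
    substring of allele `outer`, in forward or reverse-complement orientation."""
    nested = []
    for inner, s in alleles.items():
        rc = reverse_complement(s)
        for outer, t in alleles.items():
            if len(s) >= len(t):
                continue  # only a strictly longer allele can contain `inner`
            if s in t or rc in t:
                nested.append((inner, outer))
    return nested


# Toy, made-up sequences: "b" is inside "a" forwards, "c" only as a reverse complement.
toy = {"a": "ATGAAACCGTT", "b": "AAACCG", "c": "CGGTTT", "d": "GGGG"}
print(find_nested(toy))  # [('b', 'a'), ('c', 'a')]
```

Alleles with different start sites usually overlap only partially rather than one containing the other, so this substring test would not catch them.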

I predict that NG-MAST will go into retirement soon and be replaced by cgMLST except for legacy purposes.

@andersgs
Contributor

andersgs commented May 19, 2020

Hi @AdmiralenOla. I suspect that most old subtyping schemes will eventually be retired in favour of a cgMLST scheme, though I might be wrong. Is there a specific cgMLST scheme and nomenclature that the community is using, or agreeing should be the canonical scheme for gono?

@AdmiralenOla

Not sure I'm competent to answer that, but there are several schemes in development at PubMLST that our institute is using at least:

As for nomenclature, I'm not sure. Maybe @kjolley can provide some information?

@andersgs
Contributor

@AdmiralenOla are you finding that cgMLST gives you similar resolution to NG-MAST? Any push back from epis?

@kjolley

kjolley commented May 19, 2020

> Hi @AdmiralenOla. I suspect that is true of pretty much all the old subtyping schemes. Is there a specific cgMLST scheme and nomenclature the community is using or agreeing that it should be the canonical scheme for gono?

I'm pretty sure that this is not an issue with most old schemes. Most scheme curators make efforts to remove any problematic alleles, and certainly would not assign reverse-complemented alleles or nested alleles. This seems to be a particular problem with the legacy NG-MAST scheme and there have been attempts to clean it up on PubMLST, probably resulting in the different results seen (I haven't been directly involved in these efforts so can't provide any more insight).

The only recent scheme for Ng that I'm aware of is the cgMLST scheme that @AdmiralenOla mentioned which has been published (https://pubmed.ncbi.nlm.nih.gov/32163580/).

@andersgs
Contributor

@kjolley so sorry, that was a blanket statement. I was not referring to the quality of the DBs, but more that they may eventually be retired in favour of something like cgMLST.

Thank you for adding to the discussion.

@andersgs
Contributor

@kjolley I have edited my response to be clearer.

@kjolley

kjolley commented May 19, 2020

No problem @andersgs . Some schemes clearly are better curated than others.

@Jolein-Laumen

I notice the same problem for the porB allele of the NG-STAR scheme. I have 26 isolates for which ngmaster identifies a novel full-length allele similar to one in the database (e.g. ~8), whereas PubMLST (with the contigs uploaded) and the pyngoST tool identify an exact match to the same or a different allele (e.g. 31). I inspected all reference mappings manually and confirmed the alleles reported by PubMLST and pyngoST. I updated the ngmaster database and the reported alleles are present in it, but they are still not being called, so the cause seems to be the algorithm.

Happy to supply some contig sequences if this would help!
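One way to confirm an exact match independently of any BLAST-based logic is a plain string search of each allele, and its reverse complement, against the contigs. This is a hypothetical cross-check, not the method used by ngmaster, PubMLST or pyngoST; `exact_allele_hits` and the toy data are made up, and it assumes the contigs and alleles are already loaded into dicts.

```python
from __future__ import annotations

# Hypothetical cross-check, not how ngmaster or pyngoST call alleles: report
# every allele whose full sequence occurs verbatim in a contig, on either strand.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def exact_allele_hits(contigs: dict[str, str], alleles: dict[str, str]) -> list[str]:
    """Return IDs of alleles with an exact full-length match, longest first.

    Contig and allele sequences are compared case-insensitively.
    """
    upper_contigs = [c.upper() for c in contigs.values()]
    hits = []
    for allele_id, seq in alleles.items():
        fwd = seq.upper()
        rev = reverse_complement(fwd)
        if any(fwd in c or rev in c for c in upper_contigs):
            hits.append(allele_id)
    # If nested alleles both match, listing the longest first puts the most
    # complete call at the top; sorted() is stable, so ties keep input order.
    return sorted(hits, key=lambda a: len(alleles[a]), reverse=True)


# Toy, made-up data: "y" matches on the reverse strand, "z" is nested inside it.
contigs = {"ctg1": "TTTTATGAAACCGTTGGGG"}
alleles = {"x": "ATGAAACCGTA", "y": "AACGGTTTCAT", "z": "AAACCG"}
print(exact_allele_hits(contigs, alleles))  # ['y', 'z']
```

If a check like this finds an exact full-length hit where ngmaster reports a novel "~" allele, that points at the calling logic rather than the database contents.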

7 participants