Finding discrepancies between ngmaster and pubmlst allele calls #27
Comments
Version 0.4 |
Could be discrepancies between the DBs (assuming the DB you are using for ngmaster is a bit out of date compared to pubmlst) or our BLAST logic might be different. Could you provide the exact alleles from ngmaster and pubmlst? The contig sequences for us to test would also be terrific. |
Thanks @andersgs. Here is a link: https://1drv.ms/u/s!Ah_Q0DzXKdX-gaRGGilJERAc2FVGvA I moved over to the latest version of ngmaster and updated the database. The attached contigs were reported as ST 14121 in the original paper, but the call from ngmaster is 14122 (a different porB call: 485 vs 8154). The tbpB allele is 110. The interesting thing is that 485 and 8154 are reverse complements of each other. Is this just a problem with ngmast needing to curate the database and remove redundant alleles? |
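To illustrate why two tools can both be "right" in a case like this, here is a minimal sketch of an exact-match search on both strands of a contig. It is not the algorithm used by ngmaster or PubMLST; the function names, the toy sequences, and the allele IDs `a1`, `a2`, `a9` are all made up for illustration.

```python
# Hypothetical helper names and toy data; not ngmaster's or PubMLST's code.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def exact_allele_hits(contigs: dict[str, str], alleles: dict[str, str]):
    """List (allele_id, contig_id, strand) for every exact allele match."""
    hits = []
    for contig_id, contig in contigs.items():
        fwd = contig.upper()
        rev = reverse_complement(fwd)
        for allele_id, allele in alleles.items():
            a = allele.upper()
            if a in fwd:
                hits.append((allele_id, contig_id, "+"))
            elif a in rev:
                hits.append((allele_id, contig_id, "-"))
    return hits


# a2 is the reverse complement of a1, so both match the same locus.
toy_contigs = {"c1": "GGATGCCGTTCC"}
toy_alleles = {"a1": "ATGCCGTT", "a2": "AACGGCAT", "a9": "TTTTTTTT"}
hits = exact_allele_hits(toy_contigs, toy_alleles)
print(hits)
```

When a database contains both orientations of the same sequence, each one is an exact match for the same contig, and which allele number gets reported depends only on how the tool breaks the tie.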
@slvrshot sorry for the delayed reply. Nice find, by the way. We may need to add some logic to our DB download to remove such sequences. I would say that is indeed an issue with ngmast! |
You will find that there are plenty of such errors in the scheme, unfortunately: alleles that are reverse complements of each other, alleles that are nested within each other (they are no longer required to be the same length), different start sites, etc. I predict that NG-MAST will go into retirement soon and be replaced by cgMLST, except for legacy purposes. |
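A rough sketch of how such redundant entries could be flagged after a database download is shown below. It assumes the alleles have already been loaded into a dict of ID to sequence; the function name, the reason labels, and the toy sequences are hypothetical, and the pairwise comparison is quadratic, so a real allele set with thousands of entries would need something smarter (for example hashing a canonical orientation).

```python
from itertools import combinations

# Hypothetical helper names and toy data, for illustration only.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_redundant_alleles(alleles: dict[str, str]):
    """Return (id_a, id_b, reason) for each suspicious pair of alleles."""
    seqs = {k: v.upper() for k, v in alleles.items()}
    rcs = {k: reverse_complement(v) for k, v in seqs.items()}
    findings = []
    for id_a, id_b in combinations(seqs, 2):
        a, b, rc_b = seqs[id_a], seqs[id_b], rcs[id_b]
        if a == b:
            reason = "identical"
        elif a == rc_b:
            reason = "reverse_complement"
        elif a in b or b in a:
            reason = "nested"
        elif a in rc_b or rc_b in a:
            reason = "nested_reverse_complement"
        else:
            continue  # nothing suspicious about this pair
        findings.append((id_a, id_b, reason))
    return findings


toy_alleles = {"a1": "ATGCCGTT", "a2": "AACGGCAT", "a3": "GCCG"}
findings = find_redundant_alleles(toy_alleles)
for finding in findings:
    print(finding)
```

Flagging rather than deleting seems the safer default, since deciding which allele number of a redundant pair to keep is a curation decision for the scheme maintainers.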
Hi @AdmiralenOla. I suspect that eventually most old sub-typing schemes will be retired in favour of a cgMLST scheme, though I might be wrong. Is there a specific cgMLST scheme and nomenclature that the community is using, or agrees should be the canonical scheme for gono? |
Not sure I'm competent to answer that, but there are several schemes in development at PubMLST that our institute is using at least:
As for nomenclature, I'm not sure. Maybe @kjolley can provide some information? |
@AdmiralenOla are you finding that cgMLST gives you similar resolution to NG-MAST? Any push back from epis? |
I'm pretty sure that this is not an issue with most old schemes. Most scheme curators make efforts to remove any problematic alleles, and certainly would not assign reverse-complemented alleles or nested alleles. This seems to be a particular problem with the legacy NG-MAST scheme and there have been attempts to clean it up on PubMLST, probably resulting in the different results seen (I haven't been directly involved in these efforts so can't provide any more insight). The only recent scheme for Ng that I'm aware of is the cgMLST scheme that @AdmiralenOla mentioned which has been published (https://pubmed.ncbi.nlm.nih.gov/32163580/). |
@kjolley so sorry, that was a blanket statement. I was not referring to the quality of the DBs, but more that they may eventually be retired in favour of something like cgMLST. Thank you for adding to the discussion. |
@kjolley I have edited my response to be clearer. |
No problem @andersgs . Some schemes clearly are better curated than others. |
I notice the same problem for the porB allele of the NG-STAR scheme. I have 26 isolates for which ngmaster reports a novel full-length allele similar to one in the database (e.g. ~8), whereas uploading the contigs to PubMLST and running the pyngoST tool both identify an exact match, either to that same allele or to a different one (e.g. 31). I inspected all reference mappings manually and confirmed the alleles reported by PubMLST and pyngoST. I updated the ngmaster database and the reported alleles are present in it, but they are still not being called, so the cause seems to be the algorithm. Happy to supply some contig sequences if this would help! |
I was just wondering what might explain this. I have some SPAdes-assembled WGS sequences and I used the contigs as input for ngmaster. When I upload the same contigs to PubMLST, I get completely different porB and tbpB alleles. Could you maybe shed some light on why this is?