We noticed an issue where the order of sequences in a FASTA file affected plasmid assignment. For example, moving a contig from the start of the FASTA to the end changed its assignment from plasmid to chromosome.
The cause was that contig link counts were sorted without a tie-breaker, so the resulting order was not stable across input orderings.
When assigning clusters based on low-linkage contigs, the contig link counts are sorted in ascending order by number of links. This means that contigs with fewer links come first.
This makes sense. However, because many contigs share the same link count, the sort order among tied contigs depended on the order of the sequences in the input FASTA.
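The order dependence can be reproduced with a minimal sketch. The contig names, link counts, and the `sort_by_link_count` helper below are hypothetical and are not taken from the actual code. Because Python's `sorted` is stable, contigs with equal link counts keep their input order:

```python
# Hypothetical data: contig names and link counts are illustrative only.
def sort_by_link_count(link_counts):
    """Sort contig IDs ascending by link count, with no tie-breaker."""
    return sorted(link_counts, key=lambda contig: link_counts[contig])

# Same contigs and counts, but inserted in a different order,
# mimicking two FASTA files with reordered sequences.
order_a = {"contig_1": 2, "contig_2": 2, "contig_3": 5}
order_b = {"contig_2": 2, "contig_3": 5, "contig_1": 2}

result_a = sort_by_link_count(order_a)
result_b = sort_by_link_count(order_b)

# The tied contigs come out in input order, so the two results differ.
print(result_a)
print(result_b)
```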
`contig_clust_assoc` was already defined in the sorting/ordering function, so I assume it was meant to be used as a tie-breaker for sorting. However, it didn't appear to be used for anything.

By incorporating the cluster scores, we can sort the contigs first by link count and then by score (highest scores first). This gives the same sort order regardless of the order of the FASTA file.
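A minimal sketch of the two-level sort described above, assuming each contig has a single numeric score. The `sort_contigs` helper, the `scores` mapping, and the values are hypothetical and do not reflect the real structure of `contig_clust_assoc`:

```python
# Hypothetical data: names, counts, and scores are illustrative only.
def sort_contigs(link_counts, scores):
    """Sort contig IDs by link count ascending, then by score descending."""
    return sorted(
        link_counts,
        # Negating the score puts higher scores first among equal link counts.
        key=lambda contig: (link_counts[contig], -scores[contig]),
    )

scores = {"contig_1": 0.4, "contig_2": 0.9, "contig_3": 0.1}

# Same contigs in two different input orders.
order_a = {"contig_1": 2, "contig_2": 2, "contig_3": 5}
order_b = {"contig_2": 2, "contig_3": 5, "contig_1": 2}

sorted_a = sort_contigs(order_a, scores)
sorted_b = sort_contigs(order_b, scores)

# Both input orders now give the same result.
print(sorted_a)
print(sorted_b)
```

Contigs that tie on both link count and score would still fall back to input order in this sketch. Appending the contig ID as a final key element would be one way to make the order fully deterministic.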