fine-tune negation handling
- negating internal monosaccharides in subgraph_isomorphism now possible
- make sure that presence of negated motif somewhere else in the glycan doesn't bother subgraph_isomorphism for matching
- better formatting of get_match matches
- access subgraph_isomorphism without decorator, if needed
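The last bullet relies on `functools.wraps`. Besides copying `__name__` and `__doc__`, it stores the undecorated function on the wrapper as `__wrapped__`. Below is a minimal sketch of the dispatch pattern that `handle_negation` uses, built on toy stand-ins. `route_negation`, `contains_motif`, and `contains_motif_with_negation` are hypothetical names. A substring test replaces real subgraph matching, and "!X" is simplified to mean "X does not occur anywhere".

```python
from functools import wraps
from typing import Callable


def route_negation(original_func: Callable) -> Callable:
    """Hypothetical decorator modeled on handle_negation: motifs containing '!'
    go to a dedicated handler, everything else to the original function."""
    @wraps(original_func)  # copies __name__/__doc__ and sets wrapper.__wrapped__
    def wrapper(glycan, motif, *args, **kwargs):
        if isinstance(motif, str) and '!' in motif:
            return contains_motif_with_negation(glycan, motif, *args, **kwargs)
        return original_func(glycan, motif, *args, **kwargs)
    return wrapper


@route_negation
def contains_motif(glycan: str, motif: str) -> bool:
    """Toy matcher (plain substring test), standing in for subgraph_isomorphism."""
    return motif in glycan


def contains_motif_with_negation(glycan: str, motif: str) -> bool:
    """Toy negation rule: '!X' means 'X must not occur in the glycan'."""
    positive = motif.lstrip('!')
    # .__wrapped__ is the undecorated function, so this call skips the dispatch layer
    return not contains_motif.__wrapped__(glycan, positive)


print(contains_motif("Neu5Ac(a2-3)Gal(b1-4)GlcNAc", "!Neu5Ac"))  # -> False (routed to the handler)
print(contains_motif.__wrapped__("Gal(b1-4)GlcNAc", "!Neu5Ac"))  # -> False ('!Neu5Ac' taken literally)
```

Calling `contains_motif.__wrapped__` bypasses the '!' check entirely, so the negation handler can call the plain matcher without being routed back into itself. In this commit, `subgraph_isomorphism_with_negation` likewise calls `subgraph_isomorphism.__wrapped__`.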
Bribak committed Jan 26, 2025
1 parent f394bda commit 7558d9b
Showing 4 changed files with 47 additions and 29 deletions.
18 changes: 11 additions & 7 deletions CHANGELOG.md
@@ -24,13 +24,13 @@
 - Improved the description of blood group motifs in `motif_list` (including type 3 blood group antigens, ExtB, and parent motifs) (b94744e)

 ##### Fixed 🐛
-- Fixed the "Oglycan_core6" motif definition in `motif_list` to no longer overlap with core 2 structures
+- Fixed the "Oglycan_core6" motif definition in `motif_list` to no longer overlap with core 2 structures (f394bda)

 #### loader
 ##### Added ✨
 - Added `count_nested_brackets` helper function to monitor level of nesting in glycans (41bb1a1, d57b836)
 - Added dictionaries with lists of strings as values as a new supported data type for `DataFrameSerializer` (034b6ad)
-- Added `share_neighbor` helper function to check whether two nodes in a glycan graph share a neighbor
+- Added `share_neighbor` helper function to check whether two nodes in a glycan graph share a neighbor (f394bda)

 ##### Changed 🔄
 - Changed `resources.open_text` to `resources.files` to prevent `DeprecationWarning` from `importlib` (0c94995)
@@ -117,9 +117,12 @@
 - Ensured that `compare_glycans` is 100% order-specific, never matching something like ("Gal(b1-4)GlcNAc", "GlcNAc(b1-4)Gal") (5a99d6b)
 - `glycan_to_nxGraph` will now return an empty graph if the input is an empty string (4f1ccfa)
 - `get_possible_topologies` will now also produce a warning (and return the input) if an already defined topology is provided as a pre-calculated graph (3f22f14)
+- Negation in `subgraph_isomorphism` can now also be added for internal monosaccharides (e.g., "Neu5Ac(a2-3)!Gal(b1-4)GlcNAc")
+- Functions with the `handle_negation` decorator can now be accessed without the decorator via `.__wrapped__`

 ##### Fixed 🐛
-- Fixed an edge case in which `subgraph_isomorphism` could erroneously return False if any of the matchings were in the wrong order, if "count = False"
+- Fixed an edge case in which `subgraph_isomorphism` could erroneously return False if any of the matchings were in the wrong order, if "count = False" (f394bda)
+- Fixed an edge case in which negated motifs in `subgraph_isomorphism` sometimes wrongly returned False because the negated motif was present somewhere else in the glycan (but the intended motif was still there)

 #### draw
 ##### Added ✨
@@ -148,7 +151,7 @@
 - `get_glycanova` will now raise a ValueError if fewer than three groups are provided in the input data (f76535e)
 - Improved console drawing quality controlled by `display_svg_with_matplotlib` and image quality in Excel cells using `plot_glycans_excel` (a64f694)
 - The "periods" argument in `get_jtk` is now a keyword argument and has a default value of [12, 24] (87ea2fc)
-- `specify_linkages` can now also handle super-narrow linkage wildcards like Galb3/4
+- `specify_linkages` can now also handle super-narrow linkage wildcards like Galb3/4 (f394bda)

 ##### Fixed 🐛
 - Fixed a FutureWarning in `get_lectin_array` by avoiding DataFrame.groupby with axis=1 (f76535e)
@@ -164,11 +167,12 @@

 #### regex
 ##### Changed 🔄
-- Improved tracing in `try_matching` for complicated branching cases
+- Improved tracing in `try_matching` for complicated branching cases (f394bda)
+- Ensured that `format_retrieved_matches` outputs the identified motifs in the canonical IUPAC representation

 ##### Deprecated ⚠️
-- Deprecated `process_pattern`; will be done in-line instead
-- Deprecated `expand_pattern`; will be handled by `specify_linkages` and improvements in `subgraph_isomorphism` instead
+- Deprecated `process_pattern`; will be done in-line instead (f394bda)
+- Deprecated `expand_pattern`; will be handled by `specify_linkages` and improvements in `subgraph_isomorphism` instead (f394bda)

 ##### Fixed 🐛
 - Fixed an issue in `get_match_batch`, in which precompiled patterns caused issues in `get_match` (194f31c)
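The new CHANGELOG entry on internal negation (e.g., "Neu5Ac(a2-3)!Gal(b1-4)GlcNAc") depends on how the motif string is split. The sketch below reuses the regular expression and the `Monosaccharide(?1-?)` placeholder from the new string branch of `subgraph_isomorphism_with_negation` in glycowork/motif/graph.py, but it returns plain strings. The helper name `split_negation` is hypothetical, and the committed code turns the negated part into a graph with `glycan_to_nxGraph` instead of returning it as text.

```python
import re
from typing import Tuple


def split_negation(motif: str) -> Tuple[str, str]:
    """Hypothetical helper: split a motif string into (stub, negated part without '!'),
    following the string branch of subgraph_isomorphism_with_negation in this commit."""
    match = re.search(r'(![A-Za-z0-9?(-]+)', motif)
    if match is None:  # added guard; the committed code assumes a '!' is present
        raise ValueError("motif contains no negated monosaccharide")
    # The character class excludes ')', so the closing parenthesis is appended again
    negated_part = match.group(1) + ')'
    # Leading or bracketed negations are dropped; internal ones become a wildcard residue
    to_replace = '' if motif.startswith('!') or '[!' in motif else 'Monosaccharide(?1-?)'
    motif_stub = motif.replace(negated_part, to_replace).replace('[]', '')
    return motif_stub, negated_part.replace('!', '')


print(split_negation("Neu5Ac(a2-3)!Gal(b1-4)GlcNAc"))  # -> ('Neu5Ac(a2-3)Monosaccharide(?1-?)GlcNAc', 'Gal(b1-4)')
print(split_negation("!Neu5Ac(a2-?)Gal(?1-?)GlcNAc"))  # -> ('Gal(?1-?)GlcNAc', 'Neu5Ac(a2-?)')
print(split_negation("Gal(b1-4)[!Fuc(a1-3)]GlcNAc"))   # -> ('Gal(b1-4)GlcNAc', 'Fuc(a1-3)')
```

A terminal negation (leading `!`) or a negated branch (`[!`) is simply removed. An internal one is replaced by a wildcard residue, presumably so that the remaining stub stays a connected chain.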
51 changes: 30 additions & 21 deletions glycowork/motif/graph.py
@@ -7,7 +7,7 @@
 import pandas as pd
 import networkx as nx
 from scipy.sparse.linalg import eigsh
-from functools import lru_cache
+from functools import lru_cache, wraps


 @lru_cache(maxsize = 1024)
@@ -234,6 +234,7 @@ def expand_termini_list(motif: Union[str, nx.Graph], # Glycan motif sequence or
 def handle_negation(original_func: Callable # Function to wrap
 ) -> Callable: # Wrapped function handling negation
 "Decorator for handling negation patterns in glycan matching functions"
+@wraps(original_func)
 def wrapper(glycan, motif, *args, **kwargs):
 if isinstance(motif, str) and '!' in motif:
 return subgraph_isomorphism_with_negation(glycan, motif, *args, **kwargs)
@@ -311,29 +312,37 @@ def subgraph_isomorphism_with_negation(glycan: Union[str, nx.Graph], # Glycan se
 ) -> Union[bool, int, Tuple[int, List[List[int]]]]: # Boolean presence, count, or (count, matches)
 "Check if motif exists as subgraph in glycan, handling negation patterns"
 if isinstance(motif, str):
-temp = motif[motif.index('!'):]
-motif_stub = (motif[:motif.index('!')] + temp[temp.index(')')+1:]).replace('[]', '')
+negated_part = re.search(r'(![A-Za-z0-9?(-]+)', motif).group(1) + ')'
+to_replace = '' if motif.startswith('!') or '[!' in motif else 'Monosaccharide(?1-?)'
+motif_stub = motif.replace(negated_part, to_replace).replace('[]', '')
+negated_part_clean = glycan_to_nxGraph(negated_part.replace('!', ''))
 else:
 motif_stub = motif.copy()
-nodes_to_remove = {node for node, data in motif_stub.nodes(data = True) if '!' in data.get('string_labels', '')}
-nodes_to_remove.update({node + 1 for node in nodes_to_remove if node + 1 in motif_stub})
-motif_stub.remove_nodes_from(nodes_to_remove)
-res = subgraph_isomorphism(glycan, motif_stub, termini_list = termini_list, count = count, return_matches = return_matches)
-if not res or (isinstance(res, tuple) and not res[0]):
-return res
+negated_nodes = {n for n, data in motif.nodes(data = True) if '!' in data.get('string_labels', '')}
+negated_nodes.update({node + 1 for node in negated_nodes if node + 1 in motif_stub})
+motif_stub.remove_nodes_from(negated_nodes)
+negated_part_clean = nx.subgraph(motif, negated_nodes)
+res = subgraph_isomorphism.__wrapped__(glycan, motif_stub, termini_list = termini_list, count = count, return_matches = True)
+if not res[0]:
+return (0, []) if return_matches else 0 if count else False
+valid_matches = []
+negated_len = len(negated_part_clean)
+for match_nodes in res[1]:
+ggraph = glycan_to_nxGraph(glycan) if isinstance(glycan, str) else glycan.copy()
+context_nodes = set(match_nodes)
+for step in range(negated_len):
+for node in list(context_nodes):
+context_nodes.update(list(ggraph.neighbors(node)))
+context_subgraph = ggraph.subgraph(context_nodes)
+if not subgraph_isomorphism.__wrapped__(context_subgraph, negated_part_clean):
+valid_matches.append(match_nodes)
+if count:
+total = len(valid_matches)
+return (total, valid_matches) if return_matches else total
+elif return_matches:
+return (1 if valid_matches else 0, valid_matches)
 else:
-if isinstance(motif, str):
-motif_too_large = motif.replace('!', '')
-else:
-motif_too_large = motif.copy()
-for node, data in motif_too_large.nodes(data = True):
-if '!' in data.get('string_labels', ''):
-motif_too_large.nodes[node]['string_labels'] = data['string_labels'].replace('!', '')
-res2 = subgraph_isomorphism(glycan, motif_too_large, termini_list = termini_list, count = count, return_matches = return_matches)
-if res2:
-return (0, []) if return_matches else 0 if count else False
-else:
-return res
+return bool(valid_matches)


 def generate_graph_features(glycan: Union[str, nx.Graph], # Glycan sequence or network graph
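The new per-match filtering in `subgraph_isomorphism_with_negation` works in three steps. It matches the stub, grows each match by as many neighbor rounds as the negated part has nodes, and keeps the match only if the negated part is absent from that neighborhood. The sketch below reproduces this without glycowork or networkx. It assumes that the glycan is a plain adjacency dict with monosaccharide nodes only (glycowork graphs also carry linkage nodes) and that the negated part is a single label. A label lookup stands in for the subgraph-isomorphism check on the context subgraph. `expand_context` and `filter_negated` are hypothetical names.

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]  # node -> neighboring nodes


def expand_context(adj: Graph, nodes: List[int], hops: int) -> Set[int]:
    """Grow a node set by `hops` rounds of neighbor expansion."""
    context = set(nodes)
    for _ in range(hops):
        for node in list(context):  # iterate a snapshot, so each round adds one ring of neighbors
            context.update(adj[node])
    return context


def filter_negated(adj: Graph, labels: Dict[int, str], matches: List[List[int]],
                   negated_label: str, hops: int) -> List[List[int]]:
    """Keep only matches whose neighborhood does not contain the negated residue."""
    valid = []
    for match in matches:
        context = expand_context(adj, match, hops)
        # Label lookup standing in for the subgraph-isomorphism test on the context subgraph
        if not any(labels[n] == negated_label for n in context):
            valid.append(match)
    return valid


# Toy glycan with two antennae (nodes 0-3 and 4-6) joined at node 7; linkage nodes omitted
adj: Graph = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 7},
              4: {5}, 5: {4, 6}, 6: {5, 7}, 7: {3, 6}}
labels = {0: "Neu5Ac", 1: "Gal", 2: "GlcNAc", 3: "Man",
          4: "Gal", 5: "GlcNAc", 6: "Man", 7: "Man"}

# Stub "Gal-GlcNAc" matched on both antennae; the negated part is one residue, so hops = 1
print(filter_negated(adj, labels, [[1, 2], [4, 5]], "Neu5Ac", hops=1))  # -> [[4, 5]]
```

The pre-commit logic in the diff above would have rejected this glycan outright, because the full un-negated motif (Neu5Ac-Gal-GlcNAc) occurs on the first antenna. Filtering per match keeps the second antenna instead. In this sketch, a negated residue anywhere inside the window rejects a match, even if it is not attached at the negated position.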
2 changes: 1 addition & 1 deletion glycowork/motif/regex.py
@@ -417,7 +417,7 @@ def format_retrieved_matches(lists: List[List[int]], # List of traces
 ggraph: nx.Graph # Glycan graph
 ) -> List[str]: # Matching glycan strings
 "Convert traces into glycan strings"
-return sorted([graph_to_string(ggraph.subgraph(trace)) for trace in lists if nx.is_connected(ggraph.subgraph(trace))], key = len, reverse = True)
+return sorted([canonicalize_iupac(graph_to_string(ggraph.subgraph(trace))) for trace in lists if nx.is_connected(ggraph.subgraph(trace))], key = len, reverse = True)


 def filter_dealbreakers(lists: List[List[int]], # List of traces
5 changes: 5 additions & 0 deletions tests/test_core_functions.py
@@ -1885,6 +1885,8 @@ def test_subgraph_isomorphism_with_negation():
 result, matches = subgraph_isomorphism_with_negation(glycan, motif, count=True, return_matches=True)
 assert isinstance(result, int)
 assert isinstance(matches, list)
+assert subgraph_isomorphism_with_negation("Neu5Ac(a2-3)Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-3)[Fuc(a1-3)[Gal(b1-4)]GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc", '!Neu5Ac(a2-?)Gal(?1-?)GlcNAc', return_matches=True) == (1, [[12, 13, 14]])
+assert subgraph_isomorphism_with_negation("Neu5Ac(a2-3)Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-3)[Neu5Ac(a2-3)Gal(b1-4)GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc", 'Neu5Ac(a2-?)!Gal(b1-4)GlcNAc', return_matches=True) == (1, [[0, 1, 2, 3, 6]])


 def test_categorical_node_match_wildcard():
@@ -2293,6 +2295,9 @@ def test_get_match():
 assert get_match("Hex-HexNAc-([Hex|Fuc])*-HexNAc", "GalNAc(b1-4)GlcNAc(b1-2)Man(a1-3)[Neu5Gc(a2-6)GalNAc(b1-4)[Fuc(a1-3)]GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc") == ["Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc"]
 assert get_match(".-.-([Hex|Fuc])+-.", "GalNAc(b1-4)GlcNAc(b1-2)Man(a1-3)[Neu5Gc(a2-6)GalNAc(b1-4)[Fuc(a1-3)]GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc") == ['Neu5Gc(a2-6)GalNAc(b1-4)[Fuc(a1-3)]GlcNAc', 'Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc']
 assert get_match("Fuc-Galb3/4-([Hex|Fuc])*-HexNAc", "Fuc(a1-2)Gal(b1-?)[Fuc(a1-?)]GlcNAc(b1-6)[Gal(b1-3)]GalNAc") == ["Fuc(a1-2)Gal(b1-?)[Fuc(a1-?)]GlcNAc"]
+assert get_match("Fuc-([^Gal])+-GlcNAc", "Neu5Ac(a2-3)Gal(b1-4)[Fuc(a1-3)]GlcNAc(b1-2)Man(a1-3)[Fuc(a1-3)[Gal(b1-4)]GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc") == ['Fuc(a1-3)[Gal(b1-4)]GlcNAc']
+assert get_match(".-HexNAc$", "Neu5Ac(a2-3)Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-3)[Fuc(a1-3)[Gal(b1-4)]GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc") == ['GlcNAc(b1-4)GlcNAc']
+assert get_match("!Neu5Ac-Gal-GlcNAc", "Neu5Ac(a2-3)Gal(b1-3)[Fuc(a1-4)]GlcNAc(b1-2)Man(a1-3)[Fuc(a1-3)[Gal(b1-4)]GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc") == ["Gal(b1-4)GlcNAc"]


 def test_motif_to_regex():
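The new test expects `(1, [[12, 13, 14]])` when only `return_matches=True` is set. That result follows from the return branch at the end of the new `subgraph_isomorphism_with_negation`. The sketch below restates that branch as a standalone function, so the output shapes can be checked in isolation. `package_result` is a hypothetical name. Apart from the `[[12, 13, 14]]` case taken from the test, the match lists are made-up node indices.

```python
from typing import List, Tuple, Union

Match = List[int]
Result = Union[bool, int, Tuple[int, List[Match]]]


def package_result(valid_matches: List[Match], count: bool = False,
                   return_matches: bool = False) -> Result:
    """Restates the final return logic of subgraph_isomorphism_with_negation."""
    if count:
        total = len(valid_matches)
        return (total, valid_matches) if return_matches else total
    if return_matches:
        # Without count=True, the first element is a 0/1 presence flag, not a count
        return (1 if valid_matches else 0, valid_matches)
    return bool(valid_matches)


print(package_result([[12, 13, 14]], return_matches=True))     # -> (1, [[12, 13, 14]])
print(package_result([[1, 2], [4, 5]], return_matches=True))  # -> (1, [[1, 2], [4, 5]])
print(package_result([[1, 2], [4, 5]], count=True))           # -> 2
print(package_result([], count=True, return_matches=True))    # -> (0, [])
```

The zero-match early exit in the diff, `(0, []) if return_matches else 0 if count else False`, gives the same shape as calling this function with an empty list, for every combination of the two flags.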