Commit

Arborform .itor_next & .itor_children now are monads (i.e., class Furcation)
rlayers committed Feb 3, 2023
1 parent a786b34 commit b0c873f
Showing 10 changed files with 125 additions and 103 deletions.
6 changes: 3 additions & 3 deletions README.md
@@ -115,7 +115,7 @@ graph TD;
 You can then search your tree using plumule: a powerful structured query language:

 ```python
-'**[d:section]{**[d:word] & [lcs:power,right]}' # Find sections that contain words 'power' or 'right'
+'**[d:section]{**[d:word] & [lcs:power,right]}' # Plumule query to find sections that containing words 'power' or 'right'
 ```

 Try out [this demo](docs/demos/us_constitution) yourself, which shows how easy it is to parse, visualize, and query the US Constitution using Pawpaw.
@@ -205,9 +205,9 @@ With this single line of code, Pawpaw generates a fully hierarchical, tree of ph
 And you can search the tree using Pawpaw's *plumule*, a powerful XPATH-like structured query language:

 ```python
->>> print(*doc.find_all('**[d:dig]'), sep=', ') # all digits
+>>> print(*doc.find_all('**[d:digit]'), sep=', ') # all digits
 9, 1, 0, 1, 1, 1, 2, 1, 3
->>> print(*doc.find_all('**[d:num]{</*[s:i]}'), sep=', ') # all numbers with 'i' in their name
+>>> print(*doc.find_all('**[d:number]{</*[s:i]}'), sep=', ') # all numbers with 'i' in their name
 9, 13
 ```

4 changes: 2 additions & 2 deletions docs/4. Arborform.md
@@ -142,8 +142,8 @@ A good strategy for avoiding these scenarios it to utilize a divide-and-conquer
 ```mermaid
 classDiagram
 class Itorator{
-+itor_next Itorator | Types.F_ITO_2_ITOR | None
-+itor_children Itorator | Types.F_ITO_2_ITOR | None
++itor_next Furcation[Ito, Itorator]
++itor_children Furcation[Ito, Itorator]
 +postorator Postorator | Types.F_ITOS_2_BITOS | None
 +_iter(ito) C_IT_ITOS
 +traverse(ito) C_IT_ITOS

22 changes: 11 additions & 11 deletions docs/5. Traversal & Query.md
@@ -193,18 +193,18 @@ filter := '[' [NOT] key ':' value ']'
 The filter key conists of a string that indicates what filtering action to perform on the current axis nodes. The filter value
 provides additional data to the filtering action. Allowable values for keys and values are as follows:
-| Key | Alt Keys(s) | Meaning | Example(s) |
-| :----: | :----: | :--- | :--- |
-| ``'desc'`` | ``'d'`` | One or more ``str`` values used to match againss ``.desc`` of axis; values must be separated with commas, literal commas must be escaped | ``[d:number]``<br />``[d:word,char]`` |
-| ``'str'`` | ``'s'`` | A ``str`` used to match against ``str()`` of axis | ``[s:foo]``<br />``[s:foo,goo]``|
-| ``'str-casefold'`` | ``'scf'``,<br />``'lcs'`` | Checks if casefolded ``str()`` of axis matches casefolded value | ``[scf:FoO,GoO]`` |
+| Key | Alt Keys(s) | Meaning | Example(s) |
+| :----: |:-----------------------------:| :--- | :--- |
+| ``'desc'`` | ``'d'`` | One or more ``str`` values used to match againss ``.desc`` of axis; values must be separated with commas, literal commas must be escaped | ``[d:number]``<br />``[d:word,char]`` |
+| ``'str'`` | ``'s'`` | A ``str`` used to match against ``str()`` of axis | ``[s:foo]``<br />``[s:foo,goo]``|
+| ``'str-casefold'`` | ``'scf'``,<br />``'lcs'`` | Checks if casefolded ``str()`` of axis matches casefolded value | ``[scf:FoO,GoO]`` |
 | ``'str-casefold-ew'`` | ``'scfew'``,<br />``'lcsew'`` | Checks if casefolded ``str()`` of axis ends with with casefolded value | ``[scfew:a,1]`` |
-| ``'str-casefold-sw'`` | ``'scfsw'``,<br />``'lcsew'`` | Checks if casefolded ``str()`` of axis starts with with casefolded value | ``[scfsw:a,1]`` |
-| ``'str-ew'`` | ``'sew'`` | Checks if ``str()`` of axis starts with value | ``[scfew:a,1]`` |
-| ``'str-sw'`` | ``'ssw'`` | Checks if ``str()`` of axis ends with with value | ``[scfsw:a,1]`` |
-| ``'index'`` | ``'v'`` | One or more tuples consisting of a *start* and optional *stop* ``int`` values used to match against the enumeration index(ices) of the axis; *start* and *stop* must be separated with hyphens, tuples must be separated with commas | ``[i:1]``<br />``[i:2,3,4]``<br />``[i:2-3]``<br />``[i:2,5-7]`` |
-| ``'predicate'`` | ``'p'`` | Key for filter function used to match against axis A ``str`` used as a key to entry in dictionary of type: typing.Dict[str, typing.Callable[[int, Ito], bool] The value retrieved from the ``dict`` use used as a filter against the axis | ``[p:key1]``<br />``[p:key1,key2]`` |
-| ``'value'`` | ``'v'`` | A ``str`` used as a key to entry in dictionary of type:: typing.Dict[str, typing.Any] The value retrieved from the ``dict`` is used to match against the ``.value()`` of the axis | ``[p:key]``<br />``[p:key1, key2]`` |
+| ``'str-casefold-sw'`` | ``'scfsw'``,<br />``'lcssw'`` | Checks if casefolded ``str()`` of axis starts with with casefolded value | ``[scfsw:a,1]`` |
+| ``'str-ew'`` | ``'sew'`` | Checks if ``str()`` of axis starts with value | ``[scfew:a,1]`` |
+| ``'str-sw'`` | ``'ssw'`` | Checks if ``str()`` of axis ends with with value | ``[scfsw:a,1]`` |
+| ``'index'`` | ``'v'`` | One or more tuples consisting of a *start* and optional *stop* ``int`` values used to match against the enumeration index(ices) of the axis; *start* and *stop* must be separated with hyphens, tuples must be separated with commas | ``[i:1]``<br />``[i:2,3,4]``<br />``[i:2-3]``<br />``[i:2,5-7]`` |
+| ``'predicate'`` | ``'p'`` | Key for filter function used to match against axis A ``str`` used as a key to entry in dictionary of type: typing.Dict[str, typing.Callable[[int, Ito], bool] The value retrieved from the ``dict`` use used as a filter against the axis | ``[p:key1]``<br />``[p:key1,key2]`` |
+| ``'value'`` | ``'v'`` | A ``str`` used as a key to entry in dictionary of type:: typing.Dict[str, typing.Any] The value retrieved from the ``dict`` is used to match against the ``.value()`` of the axis | ``[p:key]``<br />``[p:key1, key2]`` |
 Parentheses are allowed to perform logical grouping:

8 changes: 5 additions & 3 deletions docs/demos/us_constitution/us_constitution.py
@@ -23,20 +23,22 @@ def get_parser() -> pawpaw.arborform.Itorator:
     a_splitter.itor_next = a_desc

     a_extractor = pawpaw.arborform.Extract(regex.compile(r'Article\. (?<key>[A-Z]+)\.\n(?<value>.+)', regex.DOTALL))
-    a_desc.itor_children = lambda ito: a_extractor if ito.desc == 'article' else None
+    a_desc.itor_children = (lambda ito: ito.desc == 'article', a_extractor)

     # Section (only some articles have sections)
     s_splitter = pawpaw.arborform.Split(
         regex.compile(r'(?<=\n+)(?=Section\.)', regex.DOTALL),
         boundary_retention=pawpaw.arborform.Split.BoundaryRetention.LEADING,
         desc='section')
     nlp = pawpaw.nlp.SimpleNlp().itor
-    a_extractor.itor_children = lambda ito: (s_splitter if ito.str_startswith('Section.') else nlp) if ito.desc == 'value' else None
+    # a_extractor.itor_children = lambda ito: (s_splitter if ito.str_startswith('Section.') else nlp) if ito.desc == 'value' else None
+    a_extractor.itor_children.append((lambda ito: ito.desc == 'value' and ito.str_startswith('Section.'), s_splitter))
+    a_extractor.itor_children.append((lambda ito: ito.desc == 'value', nlp))

     s_extractor = pawpaw.arborform.Extract(regex.compile(r'Section\. (?<key>\d+)\.\n(?<value>.+)', regex.DOTALL))
     s_splitter.itor_children = s_extractor

-    s_extractor.itor_children = lambda ito: nlp if ito.desc == 'value' else None
+    s_extractor.itor_children = (lambda ito: ito.desc == 'value', nlp)

     return a_splitter

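The demo change above replaces one nested conditional lambda with two ordered predicate/target pairs. The sketch below checks that the two styles route the same way, and that the more specific predicate has to come first. It does not use Pawpaw: `FakeIto`, `old_style`, `new_style`, and the string targets are hypothetical stand-ins for illustration only.

```python
from dataclasses import dataclass


@dataclass
class FakeIto:
    """Hypothetical stand-in for pawpaw.Ito with only the members used here."""
    desc: str
    text: str

    def str_startswith(self, prefix: str) -> bool:
        return self.text.startswith(prefix)


def old_style(ito):
    # Nested conditional, mirroring the lambda that the commit removed
    return ('s_splitter' if ito.str_startswith('Section.') else 'nlp') if ito.desc == 'value' else None


# Ordered (predicate, target) pairs; the first matching predicate wins,
# so the narrower 'Section.' rule must precede the general 'value' rule
rules = [
    (lambda ito: ito.desc == 'value' and ito.str_startswith('Section.'), 's_splitter'),
    (lambda ito: ito.desc == 'value', 'nlp'),
]


def new_style(ito):
    return next((target for predicate, target in rules if predicate(ito)), None)


samples = [
    FakeIto('value', 'Section. 1.\n...'),
    FakeIto('value', 'The executive Power ...'),
    FakeIto('key', 'II'),
]
results = [(old_style(s), new_style(s)) for s in samples]
```

Swapping the two entries in `rules` would send every `'value'` item to `'nlp'`, which is why the order of the `append` calls in the demo matters.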
5 changes: 3 additions & 2 deletions pawpaw/_furcation.py
@@ -8,6 +8,7 @@
 I = typing.TypeVar('I') # Input to predicate
 R = typing.TypeVar('R') # Return value type; should be "anything but None", but Python lacks this ability

+
 class Furcation(list[PredicatedValue], typing.Generic[I, R]):
     C_ITEM = PredicatedValue | tuple[typing.Callable[[I], bool], R | None] | typing.Callable[[I], bool] | R

@@ -18,12 +19,12 @@ def tautology(cls, item: I) -> bool:
     def evaluate(self, item: I) -> R | None:
         i_typ, r_typ = self.generic_types()

-        if not type_magic.isinstance_ex(i, i_typ):
+        if not type_magic.isinstance_ex(item, i_typ):
             raise Errors.parameter_invalid_type('item', item, i_typ)

         for pv in self:
             if pv.predicate(item):
-                return pv.val
+                return pv.value

         return None

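The `evaluate` method in this hunk is a first-match lookup over an ordered list of predicated values. A minimal sketch of that idea follows, without Pawpaw's type checks. `MiniFurcation` is a hypothetical class, not the library's `Furcation`, and pairing a bare value with an always-true predicate is an assumption suggested by the `tautology` classmethod and the `C_ITEM` alias.

```python
from typing import Generic, Optional, TypeVar

I = TypeVar('I')  # input to predicate
R = TypeVar('R')  # return value type


class MiniFurcation(list, Generic[I, R]):
    """Hypothetical stand-in: an ordered list of (predicate, value) pairs."""

    def append(self, item) -> None:
        # Assumption: a bare value is stored with an always-true predicate
        if not isinstance(item, tuple):
            item = (lambda _: True, item)
        super().append(item)

    def evaluate(self, item: I) -> Optional[R]:
        # Return the value of the first pair whose predicate accepts the item
        for predicate, value in self:
            if predicate(item):
                return value
        return None


f = MiniFurcation()
f.append((lambda s: s.startswith('Section.'), 'splitter'))
f.append('nlp')  # fallback: matches anything not caught above
```

An empty `MiniFurcation` evaluates to `None` for every input, which matches how the rewritten `_do_next` treats "no next itorator".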
17 changes: 9 additions & 8 deletions pawpaw/_predicated_value.py
@@ -5,17 +5,18 @@

 F_PREDICATE = typing.Callable[[typing.Any], bool]

+
 class PredicatedValue:
-    def __init__(self, predicate: F_PREDICATE, val: typing.Any):
+    def __init__(self, predicate: F_PREDICATE, value: typing.Any):
         if not type_magic.functoid_isinstance(predicate, F_PREDICATE):
             raise Errors.parameter_invalid_type('predicate', predicate, F_PREDICATE)
         self._predicate: F_PREDICATE = predicate
-        self._val: typing.Any = val
+        self._value: typing.Any = value

-@property
-def predicate(self) -> F_PREDICATE:
-return self._predicate
+    @property
+    def predicate(self) -> F_PREDICATE:
+        return self._predicate

-@property
-def val(self) -> typing.Any:
-return self._val
+    @property
+    def value(self) -> typing.Any:
+        return self._value
68 changes: 30 additions & 38 deletions pawpaw/arborform/itorator/itorator.py
@@ -4,57 +4,51 @@
 import types
 import typing

-from pawpaw import Types, Errors, Ito, type_magic
+from pawpaw import Types, Errors, Ito, type_magic, PredicatedValue, Furcation
 from pawpaw.arborform.postorator.postorator import Postorator


-from dataclasses import dataclass
-
-@dataclass(frozen=True)
-class ItoDist:
-    predicate: typing.Callable[[Ito], bool]
-    itorator: Itorator | None
-
-
 class Itorator(ABC):
+    # define Python user-defined exceptions
+    class SelfChainingError(ValueError):
+        """Raised when attempt is made to add self to the pipeline"""
+        def __init__(self, type: str):
+            self.message = f'can\t add self to {type} chain'
+
     def __init__(self, tag: str | None = None):
         if tag is not None and not isinstance(tag, str):
             raise Errors.parameter_invalid_type('desc', tag, str)
         self.tag = tag
-        self._itor_next: Itorator | Types.F_ITO_2_ITOR | None = None
-        self._itor_children: Itorator | Types.F_ITO_2_ITOR | None = None
+        self._itor_next = Furcation[Ito, Itorator]()
+        self._itor_children = Furcation[Ito, Itorator]()
         self._postorator: Postorator | Types.F_ITOS_2_BITOS | None = None
         self._post_func: Types.F_ITOS_2_BITOS | None = None

     @property
-    def itor_next(self) -> Types.F_ITO_2_ITOR:
+    def itor_next(self) -> Furcation[Ito, Itorator]():
         return self._itor_next

     @itor_next.setter
-    def itor_next(self, val: Itorator | Types.F_ITO_2_ITOR | None):
-        if val is self:
-            raise ValueError('can\'t set .itor_next to self')
-        elif isinstance(val, Itorator):
-            self._itor_next = lambda ito: val
-        elif val is None or type_magic.functoid_isinstance(val, Types.F_ITO_2_ITOR):
-            self._itor_next = val
-        else:
-            raise Errors.parameter_invalid_type('val', val, Itorator, Types.F_ITO_2_ITOR, types.NoneType)
+    def itor_next(self, val: Itorator | PredicatedValue | tuple[typing.Callable[[Ito], bool], Itorator | None] | None) -> None:
+        if (val is self) or (isinstance(val, PredicatedValue) and val.value is self) or (isinstance(val, tuple) and val[1] is self):
+            raise Itorator.SelfChainingError('itor_next')
+
+        self._itor_next.clear()
+        if val is not None:
+            self._itor_next.append(val)

     @property
-    def itor_children(self) -> Types.F_ITO_2_ITOR:
+    def itor_children(self) -> Furcation[Ito, Itorator]():
         return self._itor_children

     @itor_children.setter
-    def itor_children(self, val: Itorator | Types.F_ITO_2_ITOR | None):
-        if val is self:
-            raise ValueError('can\'t set .itor_children to self')
-        elif isinstance(val, Itorator):
-            self._itor_children = lambda ito: val
-        elif val is None or type_magic.functoid_isinstance(val, Types.F_ITO_2_ITOR):
-            self._itor_children = val
-        else:
-            raise Errors.parameter_invalid_type('val', val, Itorator, Types.F_ITO_2_ITOR, types.NoneType)
+    def itor_children(self, val: Itorator | PredicatedValue | tuple[typing.Callable[[Ito], bool], Itorator | None] | None) -> None:
+        if (val is self) or (isinstance(val, PredicatedValue) and val.value is self) or (isinstance(val, tuple) and val[1] is self):
+            raise Itorator.SelfChainingError('itor_children')
+
+        self._itor_children.clear()
+        if val is not None:
+            self._itor_children.append(val)

     @property
     def postorator(self) -> Postorator | Types.F_ITOS_2_BITOS | None:
@@ -75,16 +69,14 @@ def _iter(self, ito: Ito) -> Types.C_SQ_ITOS:
         pass

     def _do_children(self, ito: Ito) -> None:
-        if self._itor_children is not None:
-            itor_c = self._itor_children(ito)
-            if itor_c is not None:
-                for c in itor_c._traverse(ito, True):
-                    pass  # force iter walk
+        if (itor_c := self._itor_children.evaluate(ito)) is not None:
+            for c in itor_c._traverse(ito, True):
+                pass  # force iter walk

     def _do_next(self, ito: Ito) -> Types.C_IT_ITOS:
-        if self._itor_next is None:
+        if (itor_n := self._itor_next.evaluate(ito)) is None:
             yield ito
-        elif (itor_n := self._itor_next(ito)) is not None:
+        else:
             yield from itor_n._traverse(ito)

     def _do_post(self, parent: Ito, itos: Types.C_IT_ITOS) -> Types.C_IT_ITOS:

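The rewritten setters follow a "clear, then append" pattern: assignment replaces the whole chain, `None` empties it, and any attempt to chain an object to itself is rejected. The sketch below reproduces only that pattern. `Node`, its `next` property, and the plain `list` used in place of `Furcation` are hypothetical simplifications, and the `PredicatedValue` branch of the self-check is omitted.

```python
class SelfChainingError(ValueError):
    """Raised when an object is added to its own chain."""


class Node:
    """Hypothetical stand-in for Itorator; a plain list replaces Furcation."""

    def __init__(self, name: str):
        self.name = name
        self._next = []

    @property
    def next(self) -> list:
        return self._next

    @next.setter
    def next(self, val) -> None:
        # Accept a bare Node, a (predicate, Node) tuple, or None
        target = val[1] if isinstance(val, tuple) else val
        if target is self:
            raise SelfChainingError(f"can't add self to next chain of {self.name}")

        # Assignment replaces any existing entries rather than extending them
        self._next.clear()
        if val is not None:
            self._next.append(val)


a, b = Node('a'), Node('b')
lengths = [len(a.next)]
a.next = b
lengths.append(len(a.next))
a.next = None
lengths.append(len(a.next))

try:
    a.next = (lambda item: True, a)
    caught = False
except ValueError:  # SelfChainingError subclasses ValueError
    caught = True
```

Because the exception derives from `ValueError`, tests written as `assertRaises(ValueError)` keep passing after the switch to the more specific error type.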
1 change: 0 additions & 1 deletion pawpaw/ito.py
@@ -1424,7 +1424,6 @@ class Types:
     F_ITO_2_B = typing.Callable[[Ito], bool]
     F_ITO_2_VAL = typing.Callable[[Ito], typing.Any]
     F_ITO_2_DESC = typing.Callable[[Ito], str]
-    F_ITO_2_ITOR = typing.Callable[[Ito], 'Itorator']
     F_ITO_2_SQ_ITOS = typing.Callable[[Ito], C_SQ_ITOS]
     F_ITO_2_IT_ITOS = typing.Callable[[Ito], C_IT_ITOS]

31 changes: 26 additions & 5 deletions tests/test_itorator.py
@@ -1,23 +1,42 @@
 import typing

 import regex
-from pawpaw import Ito, Types
+from pawpaw import Ito, Types, PredicatedValue
 from pawpaw.arborform import Itorator, Reflect
 from tests.util import _TestIto


 class TestItorator(_TestIto):
     """Uses Reflect and Wrap classes, which have trivial implementation, to test base class functionality"""

-    def test_add_self(self):
+    def test_add_self_itor_children(self):
         i = Reflect()

         with self.assertRaises(ValueError):
             i.itor_children = i

+    def test_add_self_itor_next(self):
+        i = Reflect()
+
         with self.assertRaises(ValueError):
             i.itor_next = i

+        with self.assertRaises(ValueError):
+            i.itor_next = (lambda ito: True, i)
+
+        with self.assertRaises(ValueError):
+            i.itor_next = PredicatedValue(lambda ito: True, i)
+
+    def test_set_itor_next_none(self):
+        i_root = Reflect()
+        self.assertEqual(0, len(i_root.itor_next))
+
+        i_root.itor_next = Itorator.wrap(lambda ito: ito.str_split())
+        self.assertEqual(1, len(i_root.itor_next))
+
+        i_root.itor_next = None
+        self.assertEqual(0, len(i_root.itor_next))
+
     def test_wrap_lambda(self):
         s = 'abc'
         root = Ito(s)
@@ -38,7 +57,8 @@ def test_wrap_itorator(self):
         itor_split_words = Itorator.wrap(lambda ito: ito.str_split())
         itor_strip_first = Itorator.wrap(lambda ito: [ito[1:]])
         itor_strip_last = Itorator.wrap(lambda ito: [ito[:-1]])
-        itor_split_words.itor_next = lambda ito: itor_strip_first if str(ito) == 'two' else itor_strip_last
+        itor_split_words.itor_next.append((lambda ito: str(ito) == 'two', itor_strip_first))
+        itor_split_words.itor_next.append(itor_strip_last)

         # Wrap the multi-endpoint itorator
         itor_wrap = Itorator.wrap(itor_split_words)
@@ -54,7 +74,7 @@ def test_wrap_itorator(self):
     def test_traverse(self):
         s = 'abc'
         root = Ito(s)
-        self.add_chars_as_children(root, 'Child')
+        root.children.add(*root)

         reflect = Reflect()
         rv = [*reflect.traverse(root)]
@@ -200,7 +220,8 @@ def test_traverse_complex(self):
         func = lambda ito: [*ito.split(regex.compile(r'(?<=[A-Z])(?=[a-z])'), desc='upper or lower')]
         splt_case = Itorator.wrap(func)

-        namer.itor_children = lambda ito: splt_digits if ito.desc == 'numeric' else splt_case
+        namer.itor_children = (lambda ito: ito.desc == 'numeric', splt_digits)
+        namer.itor_children.append(splt_case)

         func = lambda ito: [ito.clone(i, i + 1, desc='char') for i in range(ito.start, ito.stop)]
         splt_chars = Itorator.wrap(func)