Welcome to treg’s documentation!

treg uses trie-structured regex patterns to search texts for a potentially large number of words and phrases.
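To illustrate what a trie-structured pattern looks like, the following is a minimal sketch using only the standard `re` module. It is not treg's actual implementation; the helper names `build_trie` and `trie_to_regex` are hypothetical and exist only for this illustration. The idea is that words sharing a prefix share that part of the pattern, which keeps the regex compact when many phrases are added.

```python
import re


def build_trie(words):
    """Insert each word character by character into a nested-dict trie."""
    trie = {}
    for word in words:
        node = trie
        for char in word:
            node = node.setdefault(char, {})
        node[""] = {}  # the empty key marks the end of a complete word
    return trie


def trie_to_regex(node):
    """Turn a trie node into a regex fragment that shares common prefixes."""
    branches = []
    for char in sorted(node):
        if char == "":
            continue  # end-of-word marker, handled below
        branches.append(re.escape(char) + trie_to_regex(node[char]))
    if not branches:
        return ""
    if len(branches) == 1:
        body = branches[0]
    else:
        body = "(?:" + "|".join(branches) + ")"
    if "" in node:
        # A word ends at this node, so whatever follows is optional.
        body = "(?:" + body + ")?"
    return body


words = ["tea", "team", "ten"]
fragment = trie_to_regex(build_trie(words))
pattern = re.compile(r"\b" + fragment + r"\b")
found = pattern.findall("a team had tea at ten")
print(fragment)
print(found)
```

The three words collapse into a single pattern with the shared prefix `te` written once, instead of a flat alternation of every word.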

Basic Example:

from treg import Phrase, Treg

# Initialize a new pattern
treg = Treg()
# Add some phrases
treg.add_phrases([
    Phrase(phrase='afternoon tea', meta={'fun': 1}),
    Phrase(phrase='tea party', meta={'fun': 3}),
    # ...
])
# Compile the pattern
treg.compile()
# Happy searching!
for match in treg.find_iter(
        "A long collection of afternoon tea party recipes ...",
        overlapped=True):
    print(match)

# Output
Match(phrases=[Phrase(phrase='afternoon tea', meta={'fun': 1})], start=21, end=34)
Match(phrases=[Phrase(phrase='tea party', meta={'fun': 3})], start=31, end=40)

class treg.Match(phrases: List[treg.Phrase], start: int, end: int)

Data class for found search phrases. Each match refers to a specific snippet of the searched text and all phrases that match that snippet.

Parameters:
  • phrases (List[Phrase]) – matching phrases
  • start (int) – start character offset
  • end (int) – end character offset
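As a small illustration of how the offsets relate to the searched text, the sketch below computes `start` and `end` with plain `str.find` rather than with treg, so it runs without the library. It assumes that `end` is exclusive, as the basic example output suggests (there, `end - start` equals the length of the matched phrase).

```python
text = "A long collection of afternoon tea party recipes ..."
phrase = "afternoon tea"

# Stand-in for the offsets a match would carry; computed here with str.find.
start = text.find(phrase)
end = start + len(phrase)  # assumption: the end offset is exclusive

# With an exclusive end offset, slicing recovers the matched snippet.
snippet = text[start:end]
print(start, end, repr(snippet))
```

Under that assumption, `text[match.start:match.end]` would give the snippet a match refers to.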

class treg.Phrase(phrase: str, meta: Optional[dict] = None)

Data class for search phrases.

Parameters:
  • phrase (str) – the phrase to be searched for
  • meta (dict) – additional metadata to be returned together with the phrase if found

class treg.Treg(token_pattern: str = '\w+', optional_ws: bool = False)

Treg base class

Parameters:
  • token_pattern (str) – regex pattern used to differentiate between tokens and whitespace
  • optional_ws (bool) – whether or not to treat whitespace inside search phrases as optional
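The two parameters can be pictured with the standard `re` module. This is a rough sketch under stated assumptions, not treg's internal logic: the helper `phrase_to_regex` is hypothetical, and it assumes that "optional" whitespace means zero or more whitespace characters between tokens.

```python
import re

token_pattern = r"\w+"  # the documented default

# Everything the token pattern matches is a token; the rest separates tokens.
tokens = re.findall(token_pattern, "afternoon  tea-party!")


def phrase_to_regex(phrase, optional_ws=False):
    """Hypothetical helper: join a phrase's tokens with a whitespace pattern."""
    ws = r"\s*" if optional_ws else r"\s+"
    return ws.join(re.escape(tok) for tok in re.findall(token_pattern, phrase))


strict = re.compile(phrase_to_regex("tea party"))
relaxed = re.compile(phrase_to_regex("tea party", optional_ws=True))

print(tokens)
print(bool(strict.search("teaparty")), bool(relaxed.search("teaparty")))
```

In this sketch, only the relaxed pattern finds `tea party` in the run-together text `teaparty`; both find it when the words are separated by one or more spaces.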

add_phrase(phrase: treg.Phrase)

Add a phrase to be searched for. Note that phrases can only be added as long as the pattern is not compiled.

Parameters: phrase – phrase to be searched for

add_phrases(phrases: Iterable[treg.Phrase])

Add multiple phrases at once. See add_phrase().

Parameters: phrases – Iterable of phrases to be searched for

compile()

Compile the pattern. Once compiled, find_iter() can be used to search texts. Note that a compiled pattern can no longer be modified. Depending on the number of search phrases, compiling can take a while.
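The add, compile, search lifecycle described here can be sketched with a toy stand-in class. `ToyPattern` is hypothetical and is not treg's implementation: it uses a flat alternation instead of a trie, works on plain strings instead of Phrase objects, and the `RuntimeError` raised on misuse is an assumption, since the documentation does not say how the real library reacts.

```python
import re


class ToyPattern:
    """Stand-in for illustration only; not how treg is implemented."""

    def __init__(self):
        self._phrases = []
        self._regex = None

    def is_compiled(self):
        return self._regex is not None

    def add_phrase(self, phrase):
        if self.is_compiled():
            # Assumption: the real library may signal this differently.
            raise RuntimeError("pattern is already compiled")
        self._phrases.append(phrase)

    def compile(self):
        # Longest phrases first, so the alternation prefers the longest one.
        ordered = sorted(self._phrases, key=len, reverse=True)
        self._regex = re.compile("|".join(re.escape(p) for p in ordered))

    def find_iter(self, text):
        if not self.is_compiled():
            raise RuntimeError("call compile() first")
        for m in self._regex.finditer(text):
            yield m.group(0), m.start(), m.end()


pattern = ToyPattern()
pattern.add_phrase("afternoon tea")
pattern.add_phrase("tea party")
pattern.compile()
matches = list(pattern.find_iter("afternoon tea and a tea party"))
print(matches)
```

Freezing the phrase set at compile time is what allows the expensive pattern construction to happen once and be reused for every search.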

find_iter(text: str, overlapped: bool = False) → Iterator[treg.Match]

Find search phrases within text. Note that the pattern needs to be compiled first.

Parameters:
  • text – text to be searched
  • overlapped – whether to return overlapping matches or only the leftmost match; the latter is the default
Returns: Iterator of found phrases (matches)
Return type: Iterator[Match]
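The difference between leftmost and overlapping matching can be shown with the standard `re` module. This is only an analogy under the assumption that treg's two modes behave like the two scans below; how treg implements overlapped search internally is not stated in this documentation.

```python
import re

text = "A long collection of afternoon tea party recipes ..."
alternation = r"afternoon\s+tea|tea\s+party"

# Ordinary scanning: matches do not overlap, so once "afternoon tea" is
# consumed, the "tea" that starts "tea party" is no longer available.
leftmost = [(m.start(), m.end()) for m in re.finditer(alternation, text)]

# Overlapping scan: a zero-width lookahead consumes no text, so a match
# can begin at every position; the captured group holds the actual span.
overlapping = [
    (m.start(1), m.end(1))
    for m in re.finditer(r"(?=(" + alternation + r"))", text)
]

print(leftmost)
print(overlapping)
```

The first scan reports only the span of `afternoon tea`, while the second also reports `tea party`, which shares the word `tea` with it.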

is_compiled()

Return whether the pattern is already compiled.

classmethod load(path: str) → treg.Treg

Load a previously saved / pickled Treg object.

Parameters: path (str) – file path
Returns: Treg object
Return type: Treg

save(path: str)

Save the Treg object to a file using pickle. Note that in order to pickle a Treg object, all phrases, and in particular their metadata, need to be pickleable.

Parameters: path (str) – file path
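Since saving relies on pickle, it can help to check metadata before adding phrases. The helper below, `unpicklable_meta_keys`, is hypothetical and not part of treg; it simply tries to pickle each metadata value with the standard `pickle` module and reports the keys that fail.

```python
import pickle
import threading


def unpicklable_meta_keys(meta):
    """Hypothetical helper: return the keys whose values cannot be pickled."""
    bad = []
    for key, value in meta.items():
        try:
            pickle.dumps(value)
        except Exception:
            bad.append(key)
    return bad


ok_meta = {"fun": 1, "tags": ["tea", "party"]}
bad_meta = {"fun": 3, "lock": threading.Lock()}  # locks cannot be pickled

print(unpicklable_meta_keys(ok_meta))
print(unpicklable_meta_keys(bad_meta))
```

Plain numbers, strings, lists, and dicts pickle without trouble; objects such as locks, open files, or sockets generally do not and are best kept out of phrase metadata.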