Welcome to treg’s documentation!
treg uses trie-structured regex patterns to search texts for a potentially large number of words and phrases.
Basic Example:
from treg import Phrase, Treg
# Initialize a new pattern
treg = Treg()
# Add some phrases
treg.add_phrases([
    Phrase(phrase='afternoon tea', meta={'fun': 1}),
    Phrase(phrase='tea party', meta={'fun': 3}),
    # ...
])
# Compile the pattern
treg.compile()
# Happy searching!
for match in treg.find_iter(
        "A long collection of afternoon tea party recipes ...",
        overlapped=True):
    print(match)
# Output
Match(phrases=[Phrase(phrase='afternoon tea', meta={'fun': 1})], start=16, end=29)
Match(phrases=[Phrase(phrase='tea party', meta={'fun': 3})], start=26, end=35)
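
To illustrate what a trie-structured regex pattern can look like, here is a minimal sketch built only on the standard `re` module. It is not treg's actual implementation: the helper names `build_trie` and `trie_to_regex` are hypothetical, and the sketch works on characters rather than tokens. The idea shown is that phrases sharing a prefix are merged so the regex engine tests that prefix only once.

```python
import re


def build_trie(phrases):
    """Insert each phrase character by character into a nested-dict trie."""
    trie = {}
    for phrase in phrases:
        node = trie
        for char in phrase:
            node = node.setdefault(char, {})
        node[""] = {}  # the empty key marks the end of a complete phrase
    return trie


def trie_to_regex(node):
    """Turn a trie node into a regex that factors out shared prefixes."""
    end_here = "" in node
    alternatives = [
        re.escape(char) + trie_to_regex(child)
        for char, child in sorted(node.items())
        if char != ""
    ]
    if not alternatives:
        return ""
    if len(alternatives) == 1 and not end_here:
        return alternatives[0]
    group = "(?:" + "|".join(alternatives) + ")"
    # A phrase may end at this node, so everything after it is optional.
    return group + "?" if end_here else group


phrases = ["tea", "tea party", "team", "afternoon tea"]
pattern = re.compile(trie_to_regex(build_trie(phrases)))
found = pattern.findall("afternoon tea with the team")
print(found)
```

Because "tea", "tea party" and "team" share the prefix "tea", the generated pattern contains that prefix once, followed by an optional group for the differing suffixes.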
class treg.Match(phrases: List[treg.Phrase], start: int, end: int)
Data class for found search phrases. Each match refers to a specific snippet of the searched text and all phrases that match that snippet.
Parameters:
- phrases (List[Phrase]) – matching phrases
- start (int) – start character offset
- end (int) – end character offset
class treg.Phrase(phrase: str, meta: Optional[dict] = None)
Data class for search phrases.
Parameters:
- phrase (str) – the phrase to be searched for
- meta (dict) – additional metadata returned together with the phrase when it is found
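
The relationship between the two data classes and the searched text can be sketched with stand-in dataclasses that mirror the documented signatures. These stand-ins are not treg's source code, and the helper `snippet` is hypothetical. The sketch also assumes that `end` is an exclusive offset, which is consistent with the example output above, where `end - start` equals the phrase length.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Phrase:  # stand-in mirroring the documented signature
    phrase: str
    meta: Optional[dict] = None


@dataclass
class Match:  # stand-in mirroring the documented signature
    phrases: List[Phrase]
    start: int
    end: int


def snippet(text: str, match: Match) -> str:
    """Return the part of the text a match refers to (assumes exclusive end)."""
    return text[match.start:match.end]


text = "afternoon tea party"
match = Match(phrases=[Phrase("tea party", meta={"fun": 3})], start=10, end=19)
print(snippet(text, match))
```

Under that assumption, slicing the searched text with `start` and `end` recovers the matched snippet, and the metadata is available through `match.phrases`.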
class treg.Treg(token_pattern: str = '\w+', optional_ws: bool = False)
Treg base class
Parameters:
- token_pattern (str) – regex pattern used to differentiate between tokens and whitespace
- optional_ws (bool) – whether to treat whitespace inside search phrases as optional
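
One plausible reading of the two constructor parameters can be sketched with the standard `re` module. This is an assumption about the behaviour, not treg's implementation: the hypothetical helper `phrase_to_regex` splits a phrase into tokens with `token_pattern` and joins them with either mandatory or optional whitespace.

```python
import re


def phrase_to_regex(phrase, token_pattern=r"\w+", optional_ws=False):
    """Build a regex for one phrase (hypothetical helper, not part of treg)."""
    tokens = re.findall(token_pattern, phrase)
    # Assumption: optional whitespace means zero or more whitespace characters.
    separator = r"\s*" if optional_ws else r"\s+"
    return separator.join(re.escape(token) for token in tokens)


strict = re.compile(phrase_to_regex("tea party"))
relaxed = re.compile(phrase_to_regex("tea party", optional_ws=True))

print(bool(strict.search("a teaparty")))   # whitespace required
print(bool(relaxed.search("a teaparty")))  # whitespace optional
```

In this sketch, characters that match neither `token_pattern` nor whitespace are simply dropped from the phrase; the library may handle them differently.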
add_phrase(phrase: treg.Phrase)
Add a phrase to be searched for. Note that phrases can only be added as long as the pattern is not compiled.
Parameters: phrase – phrase to be searched for
add_phrases(phrases: Iterable[treg.Phrase])
Add multiple phrases at once. See add_phrase().
Parameters: phrases – iterable of phrases to be searched for
compile()
Compile the pattern. Once compiled, find_iter() can be used to search texts. Note that a compiled pattern can no longer be modified. Depending on the number of search phrases, compiling can take a while.
find_iter(text: str, overlapped: bool = False) → Iterator[treg.Match]
Find search phrases within text. Note that the pattern needs to be compiled first.
Parameters:
- text – text to be searched
- overlapped – whether to return overlapping matches or only the leftmost match. The latter is the default.
Returns: Iterator of found phrases (matches)
Return type: Iterator[Match]
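
The difference between leftmost-only and overlapping matching can be illustrated with the standard `re` module alone. This sketch does not use treg and makes no claim about how `find_iter` is implemented; it emulates overlapping search with a capturing lookahead, which reports at most one match per start position.

```python
import re

text = "afternoon tea party"
pattern = r"afternoon tea|tea party"

# Default regex behaviour: matches never overlap, scanning resumes after each hit.
leftmost = [(m.start(), m.end(), m.group()) for m in re.finditer(pattern, text)]

# A zero-width lookahead lets every start position be tried independently.
overlapping = [
    (m.start(1), m.end(1), m.group(1))
    for m in re.finditer(rf"(?=({pattern}))", text)
]

print(leftmost)
print(overlapping)
```

Here "tea party" is only reported in the overlapping variant, because its first word is already consumed by "afternoon tea" in the non-overlapping scan.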
is_compiled()
Return whether the pattern is already compiled.
classmethod load(path: str) → treg.Treg
Load a previously saved (pickled) Treg object.
Parameters: path (str) – file path
Returns: Treg object
Return type: Treg
save(path: str)
Save the Treg object to a file using pickle. Note that in order to pickle a Treg object, all phrases, and in particular their metadata, need to be picklable.
Parameters: path (str) – file path
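
Whether a given metadata dictionary satisfies the pickle requirement can be checked up front with the standard `pickle` module. The helper `is_picklable` below is hypothetical and not part of treg; it only shows which kinds of values commonly cause trouble.

```python
import pickle


def is_picklable(obj) -> bool:
    """Return True if obj can be serialized with pickle (hypothetical helper)."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


plain_meta = {"fun": 1, "tags": ["drink", "snack"]}
lambda_meta = {"score": lambda text: len(text)}  # lambdas cannot be pickled

print(is_picklable(plain_meta))
print(is_picklable(lambda_meta))
```

Metadata made of plain built-in types (numbers, strings, lists, dicts) is safe, whereas lambdas, open file handles and similar objects would likely make saving fail.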