utils

Some utility functions

source

get_webapp_dir


def get_webapp_dir():

Get the webapp directory path


source

get_models_dir


def get_models_dir():

Get the models directory path


source

get_datadir


def get_datadir(subfolder:NoneType=None):

Get the data directory path, optionally with a subfolder. Creates the main data dir and any requested subfolder if they don’t exist.

Args: subfolder: Optional subfolder name to append to the data directory path

Returns: Path to the data directory or subfolder
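
A minimal usage sketch (the exact return type of these directory helpers, str or Path, is not specified above, and the subfolder name here is hypothetical):

    data_dir = get_datadir()                 # create (if needed) and return the main data directory
    vectordb_dir = get_datadir('vectordb')   # 'vectordb' is a hypothetical subfolder; created if missing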


source

download


def download(url, filename, verify:bool=False):
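
No docstring is shown for download; a hedged sketch based on the signature alone, assuming url is fetched and written to filename and that verify controls SSL certificate verification:

    # hypothetical URL and destination path
    download('https://example.com/model.bin', '/tmp/model.bin', verify=True)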

source

df_to_md


def df_to_md(df, caption:NoneType=None):

Converts a pd.DataFrame to Markdown


source

html_to_df


def html_to_df(html_str:str)->Any:

Convert HTML to dataframe.


source

md_to_df


def md_to_df(md_str:str)->Any:

Convert Markdown to dataframe.
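
A round-trip sketch for these converters (a toy example; exact column handling, index behavior, and whether the caption survives the round trip are not specified above):

    import pandas as pd

    df = pd.DataFrame({'name': ['a', 'b'], 'score': [1, 2]})   # toy frame
    md = df_to_md(df, caption='Scores')   # DataFrame -> Markdown table (caption is optional)
    df2 = md_to_df(md)                    # Markdown table -> DataFrame
    # html_to_df(html_str) works analogously for HTML table markup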


source

extract_noun_phrases


def extract_noun_phrases(text:str):

Extracts noun phrases from text, including coordinated phrases like "generative AI and live fire testing", and removes subphrases like "AI" if "generative AI" is also found.

Example:

    text = ("Natural language processing (NLP) is a field of computer science, artificial intelligence, "
            "and computational linguistics concerned with the interactions between computers and human "
            "(natural) languages.")
    extract_noun_phrases(text)
    ['Natural language processing', 'NLP', 'field', 'computer science', 'artificial intelligence',
     'computational linguistics', 'interactions', 'computers', 'languages', 'human']


source

contains_sentence


def contains_sentence(sentence, text):

Returns True if sentence is contained in text, ignoring whether tokens are delimited by spaces, newlines, or tabs.
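
For example (illustrative strings; per the description, whitespace differences between the two texts should not matter):

    text = "machine\nlearning   is\tfun"
    contains_sentence("machine learning is fun", text)   # expected: True
    contains_sentence("deep learning", text)             # expected: False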


source

remove_sentence


def remove_sentence(sentence, text, remove_follow:bool=False, flags:RegexFlag=re.IGNORECASE):

Removes a sentence or phrase from text, ignoring whether tokens are delimited by spaces, newlines, or tabs.

If remove_follow=True, then subsequent text until the first newline is also removed.
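
A hedged sketch, assuming the function returns the modified text rather than mutating it in place:

    text = "DISCLAIMER: for internal   use only. Some trailing note.\nActual content starts here."
    cleaned = remove_sentence("DISCLAIMER: for internal use only.", text, remove_follow=True)
    # with remove_follow=True, the text after the match up to the first newline is presumably also
    # dropped, leaving roughly: "Actual content starts here."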


source

segment


def segment(text:str, unit:str='paragraph', maxchars:int=2048):

Segments text into a list of paragraphs or sentences, depending on the value of unit (one of {'paragraph', 'sentence'}). The maxchars parameter is the maximum size of any unit of text.
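
For example (illustrative text):

    text = "First paragraph.\n\nSecond paragraph with two sentences. Here is the second one."
    paragraphs = segment(text, unit='paragraph')               # list of paragraph strings
    sentences = segment(text, unit='sentence', maxchars=512)   # list of sentence strings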


source

filtered_generator


def filtered_generator(generator, criteria:list=[]):

Filters a generator based on a list of predicate functions.

Args: generator: The generator to filter. criteria: List of functions that take an element from the generator and return True if the element should be included, False otherwise.

Yields: Elements from the original generator that satisfy the criteria.
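
A small sketch; it is assumed here that an element must satisfy every function in criteria to be yielded:

    gen = (n for n in range(10))
    evens_above_four = filtered_generator(gen, criteria=[lambda n: n % 2 == 0,
                                                         lambda n: n > 4])
    list(evens_above_four)   # expected: [6, 8] under the all-criteria assumption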


source

batch_generator


def batch_generator(iterable, batch_size):

Yields batched results from a generator


source

batch_list


def batch_list(input_list, batch_size):

Split a list into chunks of at most batch_size items
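
Both batching helpers can be sketched together (batch_generator presumably yields batches lazily, while batch_list returns a list of chunks; the exact type of each batch is an assumption):

    items = list(range(7))
    list(batch_generator(iter(items), 3))   # expected: [[0, 1, 2], [3, 4, 5], [6]]
    batch_list(items, 3)                    # expected: [[0, 1, 2], [3, 4, 5], [6]]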


source

get_template_vars


def get_template_vars(template_str:str)->List:

Get template variables from a template string.


source

format_string


def format_string(string_to_format:str, kwargs:str)->str:

Format a string with kwargs
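
The two template helpers are naturally used together; a sketch, assuming {curly-brace} placeholders and that kwargs is a mapping of placeholder names to values despite the str annotation in the signature:

    template = "Answer the question using only {context}. Question: {question}"
    get_template_vars(template)   # expected: ['context', 'question'] (order may vary)

    format_string(template, {'context': 'the provided documents',
                             'question': 'What is the capital of France?'})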


source

SafeFormatter


def SafeFormatter(format_dict:Optional=None):

Safe string formatter that does not raise a KeyError if a key is missing. Adapted from llama_index.
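
A hedged sketch of SafeFormatter usage; the format method name and the exact handling of missing keys are assumptions based on the description above, not confirmed by the signature:

    formatter = SafeFormatter(format_dict={'name': 'Alice'})
    # a plain str.format call would raise KeyError for {id}; SafeFormatter should not
    # (what it substitutes for the missing key is assumed, not documented here)
    formatter.format('Hello {name}, your id is {id}')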