utils
get_datadir
get_datadir ()
download
download (url, filename, verify=False)
df_to_md
df_to_md (df, caption=None)
Converts a pd.DataFrame to Markdown.
html_to_df
html_to_df (html_str:str)
Convert HTML to dataframe.
md_to_df
md_to_df (md_str:str)
Convert Markdown to dataframe.
contains_sentence
contains_sentence (sentence, text)
Returns True if sentence is contained in text, ignoring whether tokens are delimited by spaces, newlines, or tabs.
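The whitespace-insensitive matching described above can be sketched with a regular expression. This is only an illustration of the idea, not the library's implementation; the name contains_sentence_sketch is hypothetical, and the sketch assumes tokens are separated by at least one whitespace character.

```python
import re

def contains_sentence_sketch(sentence: str, text: str) -> bool:
    """Hypothetical sketch: match tokens regardless of the whitespace between them."""
    # Split the sentence into tokens on any run of whitespace.
    tokens = sentence.split()
    if not tokens:
        return False
    # Escape each token and allow spaces, newlines, or tabs between tokens.
    pattern = r"\s+".join(re.escape(tok) for tok in tokens)
    return re.search(pattern, text) is not None

# The sentence is found even though the text separates tokens with a newline and a tab.
print(contains_sentence_sketch("hello big world", "Well, hello\nbig\tworld!"))
```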
remove_sentence
remove_sentence (sentence, text, remove_follow=False, flags=re.IGNORECASE)
Removes a sentence or phrase from text, ignoring whether tokens are delimited by spaces, newlines, or tabs.
If remove_follow=True, then subsequent text until the first newline is also removed.
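A minimal sketch of this behavior, assuming the same regex approach of joining tokens with whitespace patterns. The name remove_sentence_sketch is hypothetical, and whether the actual function also strips the trailing newline is not stated in the documentation; this sketch leaves the newline in place.

```python
import re

def remove_sentence_sketch(sentence, text, remove_follow=False, flags=re.IGNORECASE):
    """Hypothetical sketch: delete a phrase regardless of whitespace between tokens."""
    tokens = sentence.split()
    if not tokens:
        return text
    # Allow any run of whitespace between consecutive tokens.
    pattern = r"\s+".join(re.escape(tok) for tok in tokens)
    if remove_follow:
        # Also consume everything up to, but not including, the next newline.
        pattern += r"[^\n]*"
    return re.sub(pattern, "", text, flags=flags)

text = "Intro Page\nFooter 12\nBody"
print(repr(remove_sentence_sketch("page footer", text)))
print(repr(remove_sentence_sketch("page footer", text, remove_follow=True)))
```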
segment
segment (text:str, unit:str='paragraph', maxchars:int=2048)
Segments text into a list of paragraphs or sentences, depending on the value of unit (one of {'paragraph', 'sentence'}). The maxchars parameter is the maximum size, in characters, of any unit of text.
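To illustrate the shape of the result, here is a deliberately naive sketch. It assumes paragraphs are separated by blank lines, sentences end in '.', '!' or '?', and oversized units are simply cut at maxchars. The actual function may use a proper sentence tokenizer and a different overflow strategy; segment_sketch is a hypothetical name.

```python
import re
from typing import List

def segment_sketch(text: str, unit: str = "paragraph", maxchars: int = 2048) -> List[str]:
    """Hypothetical, naive sketch of paragraph/sentence segmentation."""
    if unit not in {"paragraph", "sentence"}:
        raise ValueError("unit must be 'paragraph' or 'sentence'")
    if unit == "paragraph":
        # Assumption: a blank line separates paragraphs.
        parts = re.split(r"\n\s*\n", text)
    else:
        # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
        parts = re.split(r"(?<=[.!?])\s+", text)
    result = []
    for part in parts:
        part = part.strip()
        if not part:
            continue
        # Hard-split any unit that exceeds maxchars.
        for i in range(0, len(part), maxchars):
            result.append(part[i:i + maxchars])
    return result

print(segment_sketch("One. Two.\n\nThree."))
print(segment_sketch("One. Two.\n\nThree.", unit="sentence"))
```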
filtered_generator
filtered_generator (generator, criteria=[])
Filters a generator based on a list of predicate functions.
Args:
generator: The generator to filter.
criteria: List of functions that take an element from the generator and return True if the element should be included, False otherwise.
Yields: Elements from the original generator that satisfy the criteria.
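One way to read this contract is that an element is yielded only when every function in criteria returns True. That "all must pass" rule is an assumption, as the documentation does not say how multiple criteria are combined. The sketch below uses the hypothetical name filtered_generator_sketch and a None default instead of a mutable list.

```python
def filtered_generator_sketch(generator, criteria=None):
    """Hypothetical sketch: yield items that satisfy every predicate in criteria."""
    # Use None as the default to avoid sharing one mutable list across calls.
    criteria = criteria or []
    for item in generator:
        # Assumption: all predicates must return True for the item to pass.
        if all(check(item) for check in criteria):
            yield item

is_even = lambda x: x % 2 == 0
above_four = lambda x: x > 4
print(list(filtered_generator_sketch(range(10), [is_even, above_four])))
```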
batch_generator
batch_generator (iterable, batch_size)
Yields results from an iterable in batches of batch_size.
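Batching an arbitrary iterable is commonly done with itertools.islice. The sketch below shows that pattern under the assumption that each batch is a list and that the final batch may be shorter; batch_generator_sketch is a hypothetical name.

```python
from itertools import islice

def batch_generator_sketch(iterable, batch_size):
    """Hypothetical sketch: yield lists of up to batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(iterable)
    while True:
        # Pull the next batch_size items without materializing the whole iterable.
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

for batch in batch_generator_sketch(range(5), 2):
    print(batch)
```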
batch_list
batch_list (input_list, batch_size)
Splits a list into chunks of batch_size.
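For a list that is already in memory, slicing is enough. This sketch returns a list of chunks; whether the actual function returns a list or a generator is not stated, so treat the return type as an assumption. The name batch_list_sketch is hypothetical.

```python
def batch_list_sketch(input_list, batch_size):
    """Hypothetical sketch: split a list into consecutive chunks of batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Step through the list in strides of batch_size; the last chunk may be shorter.
    return [input_list[i:i + batch_size] for i in range(0, len(input_list), batch_size)]

print(batch_list_sketch(["a", "b", "c", "d", "e"], 2))
```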
get_template_vars
get_template_vars (template_str:str)
Get template variables from a template string.
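If the template uses str.format-style placeholders such as {name} (an assumption, since the documentation does not specify the template syntax), the variable names can be extracted with the standard library's string.Formatter. The name get_template_vars_sketch is hypothetical.

```python
from string import Formatter

def get_template_vars_sketch(template_str: str):
    """Hypothetical sketch: list placeholder names in a str.format-style template."""
    names = []
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion).
    for _, field_name, _, _ in Formatter().parse(template_str):
        # field_name is None for trailing literal text and "" for bare "{}".
        if field_name and field_name not in names:
            names.append(field_name)
    return names

print(get_template_vars_sketch("Hi {name}, see {topic} and {name}"))
```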
format_string
format_string (string_to_format:str, **kwargs:str)
Formats a string using the given keyword arguments.
SafeFormatter
SafeFormatter (format_dict:Optional[Dict[str,str]]=None)
Safe string formatter that does not raise KeyError if key is missing. Adapted from llama_index.
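The idea of a formatter that tolerates missing keys can be sketched as a regex substitution that leaves unknown placeholders untouched. This is an illustration only, not the class's actual code; SafeFormatterSketch and its format method are hypothetical names, and the placeholder syntax {key} is an assumption.

```python
import re
from typing import Dict, Optional

class SafeFormatterSketch:
    """Hypothetical sketch: fill known placeholders, leave unknown ones as-is."""

    def __init__(self, format_dict: Optional[Dict[str, str]] = None):
        self.format_dict = format_dict or {}

    def format(self, format_string: str) -> str:
        # Replace each {key} occurrence using the lookup below.
        return re.sub(r"\{([^{}]+)\}", self._replace, format_string)

    def _replace(self, match: "re.Match") -> str:
        key = match.group(1)
        # A missing key falls back to the original placeholder text, so no KeyError.
        return str(self.format_dict.get(key, match.group(0)))

formatter = SafeFormatterSketch({"name": "Ada"})
print(formatter.format("Hi {name}, your id is {user_id}"))
```

By contrast, calling str.format with a missing key raises KeyError, which is the failure this kind of formatter is meant to avoid.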