ingest.helpers

Helper utilities for ingesting documents.

extract_files

 extract_files (source_dir:str, extensions:Union[dict,list])

Extract the files in source_dir whose extension matches any of the supplied extensions.
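As a rough illustration of this kind of extension-based file collection, here is a minimal sketch. It is not the library's implementation: the name `extract_files_sketch` is hypothetical, and the recursive walk, case-insensitive matching, and the assumption that a dict argument is keyed by extension (with the leading dot) are choices made only for this example.

```python
import os
from typing import Dict, Iterator, List, Union


def extract_files_sketch(
    source_dir: str, extensions: Union[Dict[str, object], List[str]]
) -> Iterator[str]:
    """Yield paths under source_dir whose extension is in extensions (hypothetical stand-in)."""
    # Iterating a dict yields its keys, so a dict keyed by extension
    # and a plain list of extensions are handled the same way.
    wanted = {ext.lower() for ext in extensions}
    for root, _dirs, files in os.walk(source_dir):
        for name in sorted(files):
            # splitext keeps the leading dot, e.g. ".pdf"
            ext = os.path.splitext(name)[1].lower()
            if ext in wanted:
                yield os.path.join(root, name)
```

Called as `list(extract_files_sketch("docs", [".pdf", ".txt"]))`, the sketch would return every matching path below `docs`. Whether the real function recurses or returns a generator is not stated here, so check the source before relying on either behavior.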


extract_extension

 extract_extension (file_path:str)

Extracts the file extension (including the leading dot) from a file path.
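The behavior described above matches what the standard library's `os.path.splitext` provides, so a minimal sketch might look like the following. The name `extract_extension_sketch` is hypothetical, and the actual function may handle edge cases (missing extensions, letter case) differently.

```python
import os


def extract_extension_sketch(file_path: str) -> str:
    """Return the extension of file_path, including the leading dot (hypothetical stand-in)."""
    # os.path.splitext returns (root, ext); ext keeps the dot,
    # or is an empty string when the path has no extension.
    return os.path.splitext(file_path)[1]


print(extract_extension_sketch("reports/q3.summary.pdf"))  # .pdf
print(repr(extract_extension_sketch("README")))            # ''
```

Only the final suffix is returned, so a name with several dots such as `q3.summary.pdf` yields `.pdf`.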


extract_tables

 extract_tables (filepath:Optional[str]=None,
                 docs:Optional[List[langchain_core.documents.base.Document]]=[])

Extract tables from a PDF and append them to the end of the supplied Document list. Accepts either a filepath or a list of LangChain Document objects that all come from a single file. If filepath is empty, the file path is taken from docs.

Returns the updated list of Document objects with the extracted tables appended.
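The fallback from filepath to docs can be sketched as below. This is an illustration under stated assumptions, not the library's code: `Doc` is a stand-in for a LangChain Document, `resolve_filepath` is a hypothetical helper, and reading the path from `metadata["source"]` is an assumption based on the key that LangChain document loaders commonly set.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document: text plus metadata."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def resolve_filepath(filepath: Optional[str], docs: List[Doc]) -> str:
    """Pick the PDF path: the explicit filepath wins, otherwise read it from docs."""
    if filepath:
        return filepath
    # Assumption: each document records its origin under metadata["source"].
    sources = {d.metadata.get("source") for d in docs}
    sources.discard(None)
    if len(sources) != 1:
        # The description requires all documents to come from a single file.
        raise ValueError("docs must all come from exactly one source file")
    return sources.pop()
```

With the path resolved, a real call would presumably look like `docs = extract_tables(docs=docs)` or `docs = extract_tables(filepath="report.pdf", docs=docs)`, where `report.pdf` is a placeholder name.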


includes_caption

 includes_caption (d:langchain_core.documents.base.Document)

Returns True if the content of the supplied Document includes a table caption.
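How a caption is recognized is not specified here. One plausible heuristic, shown purely as a sketch, is to look for a line that starts with "Table", a number, and a colon or period. The `Doc` class, the function name `includes_caption_sketch`, and the regular expression are all assumptions for illustration; the real check may use a different rule.

```python
import re
from dataclasses import dataclass


@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document (only page_content is used)."""
    page_content: str


# Assumed caption shape: a line beginning with "Table <number>" followed by ":" or ".".
CAPTION_RE = re.compile(r"^\s*table\s+\d+\s*[:.]", re.IGNORECASE | re.MULTILINE)


def includes_caption_sketch(d: Doc) -> bool:
    """Return True if any line of the document text looks like a table caption."""
    return bool(CAPTION_RE.search(d.page_content))
```

Under this heuristic, text such as "Table 1: Results by region" counts as a caption, while a passing mention like "see the table below" does not.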