ingest.helpers
Helper utilities for ingesting documents.
extract_files
extract_files (source_dir:str, extensions:Union[dict,list])
Extracts all files in source_dir whose extension matches any of the supplied extensions.
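A minimal sketch of how this kind of extension filtering can work. It assumes the search descends into subdirectories with os.walk and that, when extensions is a dict, its keys are the extensions. Both are assumptions, and extract_files_sketch is a hypothetical name, not the library's code.

```python
import os
from typing import Iterator, Union


def extract_files_sketch(source_dir: str, extensions: Union[dict, list]) -> Iterator[str]:
    """Yield paths under source_dir whose extension is in `extensions`.

    Hypothetical sketch: recursion via os.walk and treating dict keys as
    the extensions are assumptions, not documented behavior.
    """
    raw = extensions.keys() if isinstance(extensions, dict) else extensions
    exts = {e.lower() for e in raw}  # normalize once for case-insensitive matching
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            # os.path.splitext returns the extension including the leading dot
            if os.path.splitext(name)[1].lower() in exts:
                yield os.path.join(root, name)
```

Lower-casing both sides makes the match case-insensitive, so report.PDF is picked up by ".pdf". Whether extract_files behaves the same way is not stated in this reference.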
extract_extension
extract_extension (file_path:str)
Extracts the file extension (including the leading dot) from a file path.
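One plausible way to get this behavior with the standard library is os.path.splitext, which already returns the extension with its dot. This is a sketch under that assumption; extract_extension_sketch is a hypothetical name and may not match how the library implements extract_extension.

```python
import os


def extract_extension_sketch(file_path: str) -> str:
    """Return the extension of file_path, including the leading dot.

    Hypothetical sketch built on os.path.splitext.
    """
    # splitext splits on the last dot of the final path component
    return os.path.splitext(file_path)[1]


print(extract_extension_sketch("/data/report.final.pdf"))  # .pdf
print(repr(extract_extension_sketch("/data/README")))      # ''
```

Note that os.path.splitext treats a leading dot in the final component as part of the name, so a file such as ".env" yields an empty extension, and "archive.tar.gz" yields only ".gz".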
extract_tables
extract_tables (filepath:Optional[str]=None, docs:Optional[List[langchain_core.documents.base.Document]]=[])
Extracts tables from a PDF and appends them to the end of the supplied Document list. Accepts either a filepath or a list of LangChain Document objects that all come from a single file. If filepath is empty, the file path of interest is extracted from docs.
Returns the updated list of Document objects with the extracted tables appended.
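The filepath-or-docs logic described above can be sketched as follows. This is not the library's implementation: Document is a minimal stand-in for LangChain's class, find_tables is a hypothetical callback standing in for whatever PDF table extractor is actually used, and reading the path from metadata["source"] (a key LangChain loaders commonly set) is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Document:
    """Minimal stand-in for langchain_core.documents.base.Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def append_tables_sketch(
    find_tables: Callable[[str], List[str]],
    filepath: Optional[str] = None,
    docs: Optional[List[Document]] = None,
) -> List[Document]:
    """Append one Document per extracted table to the end of docs.

    find_tables is a hypothetical callback returning table text for a PDF;
    the metadata keys used here are illustrative assumptions.
    """
    docs = list(docs or [])  # copy so the caller's list is not mutated
    if not filepath:
        if not docs:
            raise ValueError("Supply either filepath or a non-empty docs list.")
        # All docs come from one file, so the first document's source suffices
        filepath = docs[0].metadata["source"]
    for i, table_text in enumerate(find_tables(filepath)):
        docs.append(Document(page_content=table_text,
                              metadata={"source": filepath, "table": True, "table_index": i}))
    return docs
```

The sketch defaults docs to None and copies it, which sidesteps Python's shared-mutable-default pitfall that a literal [] default can cause. The documented signature uses [] and describes appending to the supplied list, so the real function may modify its input in place.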
includes_caption
includes_caption (d:langchain_core.documents.base.Document)
Returns True if the content of the supplied Document includes a table caption.
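The reference does not say what counts as a table caption, so here is one hypothetical heuristic: a line that starts with "Table", then an Arabic or Roman numeral, then a separator such as ":" or ".". The regex, CAPTION_RE, and includes_caption_sketch are illustrative assumptions, and Document is a minimal stand-in for LangChain's class.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Document:
    """Minimal stand-in for langchain_core.documents.base.Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


# Hypothetical caption pattern: "Table 3:", "TABLE IV.", "Table 12 -" at a line start.
CAPTION_RE = re.compile(r"^\s*table\s+(\d+|[ivxlc]+)\s*[:.\-]",
                        re.IGNORECASE | re.MULTILINE)


def includes_caption_sketch(d: Document) -> bool:
    """Return True if any line of d.page_content looks like a table caption."""
    return CAPTION_RE.search(d.page_content) is not None


print(includes_caption_sketch(Document("Intro\nTable 2: Accuracy by model")))  # True
print(includes_caption_sketch(Document("See the table below.")))              # False
```

Anchoring the pattern to the start of a line (via re.MULTILINE) avoids matching passing mentions of "table" inside ordinary sentences.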