ingest.stores.dual
DualStore
DualStore (dense_persist_directory:Optional[str]=None, sparse_persist_directory:Optional[str]=None, **kwargs)
Helper class that provides a standard way to create an ABC using inheritance.
DualStore.get_db
DualStore.get_db ()
Returns the dense store’s database instance, for consistency with the VectorStore interface.
DualStore.exists
DualStore.exists ()
Returns True if either store exists.
DualStore.add_documents
DualStore.add_documents (documents:Sequence[langchain_core.documents.base.Document], batch_size:int=1000, **kwargs)
Add documents to both dense and sparse stores.
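As a rough illustration of what a `batch_size` argument typically implies, the sketch below splits a sequence into consecutive batches and writes each batch to two stores. The `batched` helper, the `ToyStore` class, and `add_to_both` are hypothetical stand-ins and not part of `ingest.stores.dual`; the actual implementation may batch differently.

```python
from typing import Iterator, List, Sequence


def batched(items: Sequence, batch_size: int = 1000) -> Iterator[Sequence]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


class ToyStore:
    """Hypothetical in-memory store used only for this illustration."""

    def __init__(self) -> None:
        self.docs: List[str] = []
        self.calls = 0  # number of add_documents calls received

    def add_documents(self, documents: Sequence[str]) -> None:
        self.docs.extend(documents)
        self.calls += 1


def add_to_both(dense: ToyStore, sparse: ToyStore,
                documents: Sequence[str], batch_size: int = 1000) -> None:
    """Write every batch to both stores so their contents stay in step."""
    for batch in batched(documents, batch_size):
        dense.add_documents(batch)
        sparse.add_documents(batch)
```

With 2,500 documents and the default `batch_size` of 1000, each toy store receives three calls (1000, 1000, and 500 documents).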
DualStore.remove_document
DualStore.remove_document (id_to_delete)
Remove a document from both stores.
DualStore.remove_source
DualStore.remove_source (source:str)
Remove a document by source from both stores.
DualStore.update_documents
DualStore.update_documents (doc_dicts:dict, **kwargs)
Update documents in both stores.
DualStore.get_all_docs
DualStore.get_all_docs ()
Get all documents from the dense store. For simplicity, we only return documents from one store since they should be the same.
DualStore.get_doc
DualStore.get_doc (id)
Get a document by ID from the dense store.
DualStore.get_size
DualStore.get_size ()
Get the size of the dense store.
DualStore.erase
DualStore.erase (confirm=True)
Erase both stores.
DualStore.query
DualStore.query (q:str, **kwargs)
Query using the sparse store.
DualStore.semantic_search
DualStore.semantic_search (query:str, **kwargs)
Perform semantic search using the dense store.
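The routing described above (writes and deletes go to both stores, lookups and semantic search go to the dense store, `query` goes to the sparse store) can be pictured with a small delegation sketch. `ToyBackend` and `ToyDualStore` are hypothetical classes written for this example, and the substring search is only a placeholder for real dense or sparse retrieval.

```python
from typing import Dict, List, Optional, Tuple


class ToyBackend:
    """Hypothetical backend: a dict of id -> text plus a label for tracing."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.docs: Dict[str, str] = {}

    def exists(self) -> bool:
        return bool(self.docs)

    def search(self, q: str) -> List[Tuple[str, str]]:
        # Naive substring match; a real store would rank results.
        return [(self.name, doc_id) for doc_id, text in self.docs.items()
                if q.lower() in text.lower()]


class ToyDualStore:
    """Illustrative wrapper that mirrors the delegation rules listed above."""

    def __init__(self) -> None:
        self.dense = ToyBackend("dense")
        self.sparse = ToyBackend("sparse")

    def exists(self) -> bool:
        # True if either store exists.
        return self.dense.exists() or self.sparse.exists()

    def add_document(self, doc_id: str, text: str) -> None:
        # Writes go to both stores.
        self.dense.docs[doc_id] = text
        self.sparse.docs[doc_id] = text

    def remove_document(self, doc_id: str) -> None:
        # Deletes go to both stores.
        self.dense.docs.pop(doc_id, None)
        self.sparse.docs.pop(doc_id, None)

    def get_doc(self, doc_id: str) -> Optional[str]:
        # Lookups read from the dense store only.
        return self.dense.docs.get(doc_id)

    def get_size(self) -> int:
        return len(self.dense.docs)

    def query(self, q: str) -> List[Tuple[str, str]]:
        return self.sparse.search(q)

    def semantic_search(self, query: str) -> List[Tuple[str, str]]:
        return self.dense.search(query)
```

Reading from a single store is only safe as long as every write and delete reaches both, which is why the wrapper never exposes a way to modify one store alone.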
VectorStore.check
VectorStore.check ()
Raises an exception if VectorStore.exists() returns False.
VectorStore.ingest
VectorStore.ingest (source_directory:str, chunk_size:int=500, chunk_overlap:int=50, ignore_fn:Optional[Callable]=None, batch_size:int=41000, **kwargs)
Ingests all documents in source_directory (previously-ingested documents are ignored). When retrieved, the Document objects will each have a metadata dict with the absolute path to the file in metadata["source"]. Extra kwargs are passed to ingest.load_single_document.
Parameter | Type | Default | Details
---|---|---|---
source_directory | str | | path to folder containing document store
chunk_size | int | 500 | text is split to this many characters by langchain.text_splitter.RecursiveCharacterTextSplitter
chunk_overlap | int | 50 | character overlap between chunks in langchain.text_splitter.RecursiveCharacterTextSplitter
ignore_fn | Optional[Callable] | None | Optional function that accepts the file path (including file name) as input and returns True if file path should not be ingested.
batch_size | int | 41000 | batch size used when processing documents
kwargs | VAR_KEYWORD | |
Returns | None | |
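To make the `ignore_fn` contract concrete, here is a minimal sketch of a function that takes a file path and returns True when the file should be skipped. The function name and the specific skip rules (hidden files, Office lock files, `.tmp` files) are assumptions chosen for illustration, not defaults of the library.

```python
import os


def ignore_hidden_and_temp(path: str) -> bool:
    """Return True if the file at `path` should NOT be ingested."""
    name = os.path.basename(path)
    return (
        name.startswith(".")              # hidden files such as .DS_Store
        or name.startswith("~$")          # lock files left by office suites
        or name.lower().endswith(".tmp")  # temporary files
    )
```

Given the signature above, such a function would be supplied as `ignore_fn=ignore_hidden_and_temp` in the call to `VectorStore.ingest`.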