ingest.stores.dual

Dual vector store implementation for ingesting documents into both sparse and dense stores

DualStore

 DualStore (dense_persist_directory:Optional[str]=None,
            sparse_persist_directory:Optional[str]=None, **kwargs)

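Vector store that ingests documents into both a dense store and a sparse store. Writes and deletions go to both stores; get_all_docs, get_doc and get_size read from the dense store, and query uses the sparse store.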


DualStore.get_db

 DualStore.get_db ()

Returns the dense store’s database instance, for consistency with the VectorStore interface.


DualStore.exists

 DualStore.exists ()

Returns True if either store exists.
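To make the "either store" rule concrete, here is a minimal sketch using hypothetical in-memory stand-ins (ToyStore, ToyDualStore). It assumes a store "exists" once it holds at least one document; the real stores presumably check their persisted data instead, which is not documented here.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """Hypothetical in-memory stand-in for a dense or sparse store."""
    docs: dict = field(default_factory=dict)

    def exists(self) -> bool:
        # Assumption: a store "exists" once it holds at least one document.
        return len(self.docs) > 0


@dataclass
class ToyDualStore:
    """Hypothetical wrapper mirroring the documented DualStore.exists() rule."""
    dense: ToyStore = field(default_factory=ToyStore)
    sparse: ToyStore = field(default_factory=ToyStore)

    def exists(self) -> bool:
        # True if either underlying store exists.
        return self.dense.exists() or self.sparse.exists()


store = ToyDualStore()
print(store.exists())  # False: both stores are empty
store.sparse.docs["a"] = "hello"
print(store.exists())  # True: the sparse store now exists
```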


DualStore.add_documents

 DualStore.add_documents (documents:Sequence[langchain_core.documents.base.Document],
                          batch_size:int=1000, **kwargs)

Add documents to both dense and sparse stores.
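A minimal sketch of batched writes to two stores, assuming documents are sliced into consecutive batches of batch_size and each batch is written to the dense store and then to the sparse store. The helper names (batched, ListStore, dual_add_documents) are hypothetical; whether DualStore batches this way itself or passes batch_size through to each store is not stated here.

```python
from typing import Iterator, List, Sequence


def batched(items: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


class ListStore:
    """Hypothetical store that just records the batches it receives."""

    def __init__(self) -> None:
        self.batches: List[list] = []

    def add_documents(self, docs: Sequence) -> None:
        self.batches.append(list(docs))


def dual_add_documents(dense: ListStore, sparse: ListStore,
                       documents: Sequence, batch_size: int = 1000) -> None:
    """Write every batch to the dense store, then the same batch to the sparse store."""
    for batch in batched(documents, batch_size):
        dense.add_documents(batch)
        sparse.add_documents(batch)


dense, sparse = ListStore(), ListStore()
dual_add_documents(dense, sparse, [f"doc{i}" for i in range(5)], batch_size=2)
print([len(b) for b in dense.batches])  # [2, 2, 1]
print(dense.batches == sparse.batches)  # True
```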


DualStore.remove_document

 DualStore.remove_document (id_to_delete)

Remove a document from both stores.


DualStore.remove_source

 DualStore.remove_source (source:str)

Remove documents from both stores by their source (the file path stored in metadata["source"]).
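Both removal methods fan out to the two stores. The sketch below shows that pattern with a hypothetical DictStore keyed by document ID. It assumes remove_source matches metadata["source"] exactly; the real matching rule is not documented here.

```python
from typing import Dict


class DictStore:
    """Hypothetical store mapping document IDs to metadata dicts."""

    def __init__(self, docs: Dict[str, dict]) -> None:
        self.docs = dict(docs)  # shallow copy so each store owns its own mapping

    def remove_document(self, doc_id: str) -> None:
        self.docs.pop(doc_id, None)  # removing an unknown ID is a no-op here

    def remove_source(self, source: str) -> None:
        # Assumption: exact match on metadata["source"].
        self.docs = {k: v for k, v in self.docs.items() if v.get("source") != source}


def dual_remove_document(dense: DictStore, sparse: DictStore, doc_id: str) -> None:
    # Apply the same deletion to both stores so they stay in sync.
    for store in (dense, sparse):
        store.remove_document(doc_id)


def dual_remove_source(dense: DictStore, sparse: DictStore, source: str) -> None:
    for store in (dense, sparse):
        store.remove_source(source)


seed = {
    "1": {"source": "/data/a.pdf"},
    "2": {"source": "/data/a.pdf"},
    "3": {"source": "/data/b.txt"},
}
dense, sparse = DictStore(seed), DictStore(seed)
dual_remove_source(dense, sparse, "/data/a.pdf")
print(sorted(dense.docs), sorted(sparse.docs))  # ['3'] ['3']
dual_remove_document(dense, sparse, "3")
print(dense.docs, sparse.docs)  # {} {}
```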


DualStore.update_documents

 DualStore.update_documents (doc_dicts:dict, **kwargs)

Update documents in both stores.


DualStore.get_all_docs

 DualStore.get_all_docs ()

Get all documents from the dense store. Only one store is read because both stores should contain the same documents.


DualStore.get_doc

 DualStore.get_doc (id)

Get a document by ID from the dense store.


DualStore.get_size

 DualStore.get_size ()

Get the size of the dense store.
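get_all_docs, get_doc and get_size all read from the dense store only. A minimal sketch of that delegation with hypothetical classes (ReadStore, ReadOnlyDual) follows; the usage deliberately puts the two stores out of sync to show the consequence.

```python
from typing import Dict, List, Optional


class ReadStore:
    """Hypothetical store mapping document IDs to text."""

    def __init__(self, docs: Dict[str, str]) -> None:
        self.docs = docs

    def get_all_docs(self) -> List[str]:
        return list(self.docs.values())

    def get_doc(self, doc_id: str) -> Optional[str]:
        return self.docs.get(doc_id)

    def get_size(self) -> int:
        return len(self.docs)


class ReadOnlyDual:
    """Hypothetical wrapper: every read is served by the dense store."""

    def __init__(self, dense: ReadStore, sparse: ReadStore) -> None:
        self.dense, self.sparse = dense, sparse

    def get_all_docs(self) -> List[str]:
        # Both stores should hold the same documents, so one is enough.
        return self.dense.get_all_docs()

    def get_doc(self, doc_id: str) -> Optional[str]:
        return self.dense.get_doc(doc_id)

    def get_size(self) -> int:
        return self.dense.get_size()


# Out-of-sync on purpose: the sparse store is missing document "2".
dual = ReadOnlyDual(ReadStore({"1": "alpha", "2": "beta"}), ReadStore({"1": "alpha"}))
print(dual.get_size())      # 2
print(dual.get_doc("2"))    # beta
print(dual.get_all_docs())  # ['alpha', 'beta']
```

This keeps reads simple, at the cost of trusting that the stores stay in sync: a document present only in the sparse store would be invisible to these three methods.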


DualStore.erase

 DualStore.erase (confirm=True)

Erase both stores.


DualStore.query

 DualStore.query (q:str, **kwargs)

Query using the sparse store.
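Queries go to the sparse store, which presumably matches on terms rather than embeddings. As a rough illustration only, the hypothetical sparse_query below ranks documents by raw query-term counts; this is a toy scoring rule, not the ranking function the real sparse store uses, and its limit parameter is an assumption.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    # Lowercase word tokens; a real sparse index may also stem, drop stopwords, etc.
    return re.findall(r"\w+", text.lower())


def sparse_query(docs: Dict[str, str], q: str, limit: int = 3) -> List[Tuple[str, int]]:
    """Rank documents by how many query-term occurrences they contain."""
    terms = set(tokenize(q))
    scored = []
    for doc_id, text in docs.items():
        counts = Counter(tokenize(text))
        score = sum(counts[t] for t in terms)
        if score > 0:
            scored.append((doc_id, score))
    # Highest score first; ties broken by document ID for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:limit]


docs = {
    "a": "Vector stores hold dense embeddings.",
    "b": "Sparse stores index keywords; keywords are matched exactly.",
    "c": "Nothing relevant here.",
}
print(sparse_query(docs, "sparse keywords"))  # [('b', 3)]
```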


VectorStore.check

 VectorStore.check ()

Raise an exception if VectorStore.exists() returns False.
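check() acts as a guard to call before operations that need an existing store. A minimal sketch follows; CheckedStore and StoreMissingError are hypothetical, and the exception type the real method raises is not specified here.

```python
class StoreMissingError(Exception):
    """Hypothetical exception type used only for this sketch."""


class CheckedStore:
    """Hypothetical store whose existence is fixed at construction time."""

    def __init__(self, has_data: bool) -> None:
        self._has_data = has_data

    def exists(self) -> bool:
        return self._has_data

    def check(self) -> None:
        # Fail fast instead of operating on a store that has not been created.
        if not self.exists():
            raise StoreMissingError("vector store does not exist; ingest documents first")


CheckedStore(True).check()  # passes silently
try:
    CheckedStore(False).check()
except StoreMissingError as err:
    print(f"check failed: {err}")
```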


VectorStore.ingest

 VectorStore.ingest (source_directory:str, chunk_size:int=500,
                     chunk_overlap:int=50,
                     ignore_fn:Optional[Callable]=None,
                     batch_size:int=41000, **kwargs)

Ingests all documents in source_directory, skipping documents that were previously ingested. Each retrieved Document object has a metadata dict in which metadata["source"] holds the absolute path to its file. Extra kwargs are passed to ingest.load_single_document.

| Parameter | Type | Default | Details |
|---|---|---|---|
| source_directory | str | | path to folder containing the documents to ingest |
| chunk_size | int | 500 | text is split into chunks of at most this many characters by langchain.text_splitter.RecursiveCharacterTextSplitter |
| chunk_overlap | int | 50 | number of characters shared by consecutive chunks in langchain.text_splitter.RecursiveCharacterTextSplitter |
| ignore_fn | Optional[Callable] | None | optional function that takes the file path (including file name) and returns True if that file should not be ingested |
| batch_size | int | 41000 | batch size used when processing documents |
| kwargs | VAR_KEYWORD | | extra keyword arguments passed to ingest.load_single_document |
| Returns | None | | |
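The file-selection and chunking rules in the table above can be sketched as follows. plan_ingest, chunk_text and skip_drafts_and_tmp are hypothetical names. chunk_text uses plain fixed-width character windows, a simplification of RecursiveCharacterTextSplitter, which also tries to break on separators such as paragraphs and newlines. How the real method detects previously-ingested files is not documented here, so the sketch assumes a set of already-ingested paths.

```python
from typing import Callable, Iterable, List, Optional, Set


def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> List[str]:
    """Fixed-width character windows; consecutive chunks share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def plan_ingest(paths: Iterable[str], already_ingested: Set[str],
                ignore_fn: Optional[Callable[[str], bool]] = None) -> List[str]:
    """Return the paths that would be ingested: new files not rejected by ignore_fn."""
    selected = []
    for path in paths:
        if path in already_ingested:
            continue  # previously-ingested documents are skipped
        if ignore_fn is not None and ignore_fn(path):
            continue  # ignore_fn returned True, so skip this file
        selected.append(path)
    return selected


def skip_drafts_and_tmp(path: str) -> bool:
    # Example ignore_fn: skip anything under a "drafts" folder or ending in .tmp.
    return "/drafts/" in path or path.endswith(".tmp")


paths = ["/docs/a.pdf", "/docs/drafts/b.pdf", "/docs/c.txt", "/docs/d.tmp"]
print(plan_ingest(paths, already_ingested={"/docs/c.txt"}, ignore_fn=skip_drafts_and_tmp))
# ['/docs/a.pdf']
print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```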