ingest.stores.dual

Dual vector store implementation for ingesting documents into both sparse and dense stores


ElasticsearchStore


def ElasticsearchStore(
    dense_vector_field:str='dense_vector', **kwargs
):

A unified Elasticsearch-based dual store that supports both dense vector searches and sparse text searches in a single index. Uses composition to manage both stores.
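For example (a minimal sketch: only `dense_vector_field` is documented above, and any Elasticsearch connection settings would be supplied through `**kwargs`, whose names are not listed on this page):

store = ElasticsearchStore(
    dense_vector_field='dense_vector',
    # connection settings (host, index name, credentials, ...) would be passed
    # here via **kwargs; their exact names are not documented on this page
)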



DualStore


def DualStore(
    dense_kind:str='chroma', dense_persist_location:Optional[str]=None,
    sparse_kind:str='whoosh', sparse_persist_location:Optional[str]=None,
    **kwargs
):

A dual store that manages a dense vector store (default: Chroma) and a sparse text index (default: Whoosh) together, ingesting each document into both so the same collection supports dense vector search and sparse text search.
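For example, a dual store persisting Chroma and Whoosh indexes to local folders might be created as follows (a sketch; the paths are placeholders and the package prefix in the import is assumed from this page's title):

from onprem.ingest.stores.dual import DualStore  # assumed package prefix; adjust to your install

store = DualStore(
    dense_kind='chroma',                            # dense vector store backend (default)
    dense_persist_location='/tmp/my_dense_index',   # placeholder path
    sparse_kind='whoosh',                           # sparse text index backend (default)
    sparse_persist_location='/tmp/my_sparse_index', # placeholder path
)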



DualStore.exists


def exists():

Returns True if either store exists.



DualStore.add_documents


def add_documents(
    documents:Sequence, batch_size:int=1000, **kwargs
):

Add documents to both the dense and sparse stores. If both stores use the same persist_location, documents are only added once.
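A hedged usage sketch, assuming the documents are LangChain `Document` objects (the `metadata["source"]` convention mirrors the `ingest` description further down):

from langchain_core.documents import Document  # assumed Document type

docs = [
    Document(page_content="First chunk of text.",
             metadata={"source": "/data/docs/report.pdf"}),
    Document(page_content="Second chunk of text.",
             metadata={"source": "/data/docs/notes.txt"}),
]
store.add_documents(docs, batch_size=1000)  # written to both the dense and sparse stores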



DualStore.remove_document


def remove_document(
    id_to_delete
):

Remove a document from both stores.



DualStore.remove_source


def remove_source(
    source:str
):

Remove a document by source from both stores.

The source can either be the full path to a document or a parent folder. Returns the number of records deleted.
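For example (the paths are placeholders):

# remove every record that was ingested from a single file ...
n_deleted = store.remove_source('/data/docs/report.pdf')

# ... or from an entire folder of previously ingested documents
n_deleted = store.remove_source('/data/docs')
print(f'{n_deleted} records deleted')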



DualStore.update_documents


def update_documents(
    doc_dicts:dict, **kwargs
):

Update documents in both stores.



DualStore.get_all_docs


def get_all_docs():

Get all documents from the dense store. For simplicity, we only return documents from one store since they should be the same.



DualStore.get_doc


def get_doc(
    id
):

Get a document by ID from the dense store.



DualStore.get_size


def get_size():

Get the size of the dense store.
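Taken together, the accessors above support quick inspection of an existing store (a sketch; the exact structure of the returned records is not specified on this page):

print(store.get_size())                  # number of records in the dense store
doc = store.get_doc('some-document-id')  # placeholder ID
all_docs = store.get_all_docs()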



DualStore.erase


def erase(
    confirm:bool=True
):

Erase both stores.
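For example (the behavior of the `confirm` flag is an assumption; it presumably prompts before deleting when left at its default of `True`):

store.erase(confirm=False)  # assumption: confirm=False skips an interactive prompt
assert not store.exists()   # both stores are expected to be gone afterwards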



VectorStore.query


def query(
    query:str, **kwargs
):

Generic query method that invokes the store’s search method. This provides a consistent interface across all store types.
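A sketch of this generic interface (the shape of the returned results depends on the underlying store and is not specified on this page):

results = store.query('What is the refund policy?')  # placeholder query string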



VectorStore.check


def check():

Raises an exception if `VectorStore.exists()` returns `False`.
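For example:

store.check()  # raises if VectorStore.exists() returns False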



VectorStore.ingest


def ingest(
    source_directory:str, # path to folder containing document store
    chunk_size:int=1000, # text is split to this many characters by [langchain.text_splitter.RecursiveCharacterTextSplitter](https://api.python.langchain.com/en/latest/character/langchain_text_splitters.character.RecursiveCharacterTextSplitter.html)
    chunk_overlap:int=100, # character overlap between chunks in `langchain.text_splitter.RecursiveCharacterTextSplitter`
    ignore_fn:Optional[Callable]=None, # Optional function that accepts the file path (including file name) as input and returns `True` if the file path should not be ingested
    batch_size:int=41000, # batch size used when processing documents
    **kwargs
)->None:

Ingests all documents in `source_directory` (previously ingested documents are ignored). When retrieved, the `Document` objects will each have a `metadata` dict with the absolute path to the file in `metadata["source"]`. Extra kwargs are passed to `ingest.load_single_document`.
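A usage sketch, assuming a local folder of documents (the path and the filtering rule below are illustrative placeholders):

import os

def ignore_hidden(path: str) -> bool:
    """Illustrative ignore_fn: skip hidden files."""
    return os.path.basename(path).startswith('.')

store.ingest(
    source_directory='/data/docs',  # placeholder folder of documents to ingest
    chunk_size=1000,
    chunk_overlap=100,
    ignore_fn=ignore_hidden,
)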