ingest.stores.dense
DenseStore
DenseStore (**kwargs)
A factory for built-in DenseStore instances.
DenseStore.create
DenseStore.create (persist_location=None, kind=None, **kwargs)
Factory method to construct a DenseStore instance. Extra kwargs are passed to object instantiation.

Args:
- persist_location: where the vector database is stored
- kind: one of {chroma, elasticsearch}

Returns: a DenseStore instance
ElasticsearchDenseStore
ElasticsearchDenseStore (dense_vector_field:str='dense_vector', **kwargs)
Elasticsearch store with dense vector search capabilities. Extends DenseStore to provide Elasticsearch-based dense vector storage.
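Elasticsearch stores embeddings in fields of type `dense_vector`. A sketch of an index mapping that uses the default field name `dense_vector` is shown below as a Python dict. The `dims` value of 384 and the `page_content`/`source` fields are assumptions (the dimension must match whatever embedding model is used), and this is not necessarily the mapping `ElasticsearchDenseStore` actually creates.

```python
import json


def build_mapping(dense_vector_field: str = "dense_vector", dims: int = 384) -> dict:
    """Return a hypothetical Elasticsearch index mapping containing a dense_vector field."""
    return {
        "mappings": {
            "properties": {
                # text of each chunk (field name assumed)
                "page_content": {"type": "text"},
                # path of the originating file (field name assumed)
                "source": {"type": "keyword"},
                # embedding vector; dims must equal the embedding model's output size
                dense_vector_field: {"type": "dense_vector", "dims": dims},
            }
        }
    }


print(json.dumps(build_mapping(), indent=2))
```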
ChromaStore
ChromaStore (persist_location:Optional[str]=None, **kwargs)
A dense vector store based on Chroma.
ChromaStore.exists
ChromaStore.exists ()
Returns True if the vector store has been initialized and contains documents.
ChromaStore.add_documents
ChromaStore.add_documents (documents, batch_size:int=41000)
Stores instances of langchain_core.documents.base.Document in the vector database.
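The `batch_size` parameter suggests documents are written in fixed-size batches. A minimal sketch of that pattern follows; the hypothetical `write_batch` callback stands in for the actual vector-database insert and is not part of the documented API.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def add_in_batches(
    documents: Sequence[T],
    write_batch: Callable[[List[T]], None],
    batch_size: int = 41000,
) -> int:
    """Call write_batch on consecutive slices of at most batch_size items; return the number of batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    n_batches = 0
    for start in range(0, len(documents), batch_size):
        write_batch(list(documents[start:start + batch_size]))
        n_batches += 1
    return n_batches


written = []
n = add_in_batches(list(range(10)), written.append, batch_size=4)
print(n, [len(b) for b in written])  # 3 [4, 4, 2]
```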
ChromaStore.remove_document
ChromaStore.remove_document (id_to_delete)
Removes the single document whose ID is id_to_delete.
ChromaStore.remove_source
ChromaStore.remove_source (source:str)
Deletes all documents in a Chroma collection whose source metadata field starts with the given prefix. The source argument can be either a full path to a document or a prefix (e.g., a parent folder).

Args:
- source: the source value or prefix

Returns: the number of documents deleted
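The prefix semantics of `remove_source` can be illustrated with an in-memory dictionary standing in for the Chroma collection (a hypothetical structure mapping document IDs to metadata dicts). If matching is a plain string prefix, then `/data/reports` also matches `/data/reports_old/...`, so including a trailing separator restricts deletion to a single folder.

```python
from typing import Dict


def remove_source(collection: Dict[str, dict], source: str) -> int:
    """Delete entries whose metadata['source'] starts with `source`; return how many were deleted."""
    to_delete = [
        doc_id
        for doc_id, metadata in collection.items()
        if str(metadata.get("source", "")).startswith(source)
    ]
    for doc_id in to_delete:
        del collection[doc_id]
    return len(to_delete)


collection = {
    "a": {"source": "/data/reports/q1.pdf"},
    "b": {"source": "/data/reports/q2.pdf"},
    "c": {"source": "/data/reports_old/q1.pdf"},
    "d": {"source": "/data/notes.txt"},
}
print(remove_source(collection, "/data/reports/"))  # 2
print(sorted(collection))  # ['c', 'd']
```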
ChromaStore.update_documents
ChromaStore.update_documents (doc_dicts:dict, **kwargs)
Updates a set of documents (a document already in the index with the same ID is overwritten).

| | Type | Details |
|---|---|---|
| doc_dicts | dict | dictionary with keys ‘page_content’, ‘source’, ‘id’, etc. |
| kwargs | VAR_KEYWORD | |
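Overwriting by ID is upsert semantics. A sketch over a hypothetical in-memory store follows, assuming each dictionary carries an `id` key as listed in the table above and that an iterable of such dictionaries is passed in. It illustrates the behavior only, not how `ChromaStore` implements it.

```python
from typing import Dict, Iterable


def upsert(store: Dict[str, dict], doc_dicts: Iterable[dict]) -> None:
    """Insert each doc dict keyed by its 'id', overwriting any existing entry with the same ID."""
    for doc in doc_dicts:
        # shallow copy so later changes to the caller's dict don't alter the stored entry
        store[doc["id"]] = dict(doc)


store = {"1": {"id": "1", "page_content": "old text", "source": "a.txt"}}
upsert(store, [
    {"id": "1", "page_content": "new text", "source": "a.txt"},
    {"id": "2", "page_content": "another doc", "source": "b.txt"},
])
print(store["1"]["page_content"], len(store))  # new text 2
```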
ChromaStore.get_all_docs
ChromaStore.get_all_docs ()
Returns all documents in the store.
ChromaStore.get_doc
ChromaStore.get_doc (id)
Retrieve a record by ID
ChromaStore.get_size
ChromaStore.get_size ()
Get total number of records
ChromaStore.erase
ChromaStore.erase (confirm=True)
Resets the collection and removes any stored documents.
VectorStore.query
VectorStore.query (query:str, **kwargs)
Generic query method that invokes the store’s search method. This provides a consistent interface across all store types.
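A uniform `query` entry point that delegates to a store-specific search is the template-method pattern. The sketch below uses hypothetical names (`ToyVectorStore`, `search`, `ToySubstringStore`) and a trivial substring match purely to show the dispatch; the method each real store delegates to may differ (for `ChromaStore` it is presumably `semantic_search`).

```python
from abc import ABC, abstractmethod
from typing import List


class ToyVectorStore(ABC):
    """Hypothetical base class: query() is the uniform entry point."""

    def query(self, query: str, **kwargs) -> List[str]:
        # Delegate to whatever search the concrete store implements
        return self.search(query, **kwargs)

    @abstractmethod
    def search(self, query: str, **kwargs) -> List[str]:
        ...


class ToySubstringStore(ToyVectorStore):
    """Hypothetical store whose search is a case-insensitive substring match."""

    def __init__(self, texts: List[str]):
        self.texts = texts

    def search(self, query: str, limit: int = 4, **kwargs) -> List[str]:
        hits = [t for t in self.texts if query.lower() in t.lower()]
        return hits[:limit]


store = ToySubstringStore(["Dense retrieval", "Sparse retrieval", "Chunking text"])
print(store.query("retrieval", limit=1))  # ['Dense retrieval']
```

Callers only ever use `query`, so swapping one store type for another does not change calling code.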
ChromaStore.semantic_search
ChromaStore.semantic_search (*args, **kwargs)
Perform a semantic search of the vector DB. Returns results as LangChain Document objects.
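Semantic search over a dense store ranks stored embeddings by their similarity to the query embedding. The toy sketch below uses hand-written 3-dimensional vectors and cosine similarity in pure Python. A real store such as Chroma obtains embeddings from a model, uses an approximate nearest-neighbor index, and may use a different distance metric, so this only illustrates the ranking idea.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_search(index: Dict[str, List[float]], query_vec: List[float], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k (doc_id, score) pairs most similar to query_vec, best first."""
    scored = [(doc_id, cosine(vec, query_vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hand-made embeddings; a real system would get these from an embedding model
index = {
    "cats": [0.9, 0.1, 0.0],
    "dogs": [0.8, 0.3, 0.0],
    "stocks": [0.0, 0.1, 0.9],
}
print([doc_id for doc_id, _ in semantic_search(index, [1.0, 0.2, 0.0])])  # ['cats', 'dogs']
```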
VectorStore.ingest
VectorStore.ingest (source_directory:str, chunk_size:int=500, chunk_overlap:int=50, ignore_fn:Optional[Callable]=None, batch_size:int=41000, **kwargs)
Ingests all documents in source_directory (previously ingested documents are ignored). When retrieved, each Document object will have a metadata dict with the absolute path to the file in metadata["source"]. Extra kwargs are passed to ingest.load_single_document.
| | Type | Default | Details |
|---|---|---|---|
| source_directory | str | | path to folder containing the documents to ingest |
| chunk_size | int | 500 | text is split into chunks of at most this many characters by langchain.text_splitter.RecursiveCharacterTextSplitter |
| chunk_overlap | int | 50 | character overlap between chunks in langchain.text_splitter.RecursiveCharacterTextSplitter |
| ignore_fn | Optional[Callable] | None | optional function that accepts the file path (including file name) as input and returns True if the file path should not be ingested |
| batch_size | int | 41000 | batch size used when processing documents |
| kwargs | VAR_KEYWORD | | |
| **Returns** | **None** | | |
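`chunk_size` and `chunk_overlap` together control how text is split before embedding. The actual splitter is LangChain's `RecursiveCharacterTextSplitter`, which prefers to break on separators such as blank lines and newlines; the fixed-width sliding window below is a simpler, swapped-in technique meant only to show how the two numbers interact.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> List[str]:
    """Split text into chunks of at most chunk_size characters; consecutive chunks share chunk_overlap characters."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


chunks = sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
```

With the defaults, each chunk starts 450 characters after the previous one, so a 1,234-character text yields three chunks.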
```python
import tempfile

temp_dir = tempfile.TemporaryDirectory()
tempfolder = temp_dir.name
store = DenseStore.create(tempfolder)
store.ingest("tests/sample_data/ktrain_paper/")
```

```
Creating new vectorstore at /tmp/tmpmftvr854
Loading documents from tests/sample_data/ktrain_paper/
Loading new documents: 100%|██████████| 1/1 [00:00<00:00,  7.85it/s]
Processing and chunking 6 new documents: 100%|██████████| 1/1 [00:00<00:00, 985.74it/s]
Split into 41 chunks of text (max. 500 chars each for text; max. 2000 chars for tables)
Creating embeddings. May take some minutes...
100%|██████████| 1/1 [00:00<00:00,  3.01it/s]
Ingestion complete! You can now query your documents using the LLM.ask or LLM.chat methods
```

```python
type(store)
```

```
__main__.ChromaStore
```

```python
store.get_size()
```

```
41
```

```python
a_document = store.get_all_docs()[0]
store.remove_document(a_document['id'])
store.get_size()
```

```
40
```
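`ignore_fn` receives each file path and returns `True` for files that should be skipped. A hypothetical predicate that skips hidden files, Office lock files, and a few temporary-file extensions might look like this; the rules are illustrative only. It could then be passed as `store.ingest(source_directory, ignore_fn=skip_temp_and_hidden)`.

```python
import os


def skip_temp_and_hidden(file_path: str) -> bool:
    """Return True if file_path should NOT be ingested (hypothetical rules)."""
    name = os.path.basename(file_path)
    if name.startswith("."):
        return True  # hidden files such as .DS_Store
    if name.startswith("~$"):
        return True  # Office lock files
    # case-insensitive check against a few temporary/backup extensions
    return os.path.splitext(name)[1].lower() in {".tmp", ".bak", ".swp"}


for p in ["docs/report.pdf", "docs/.DS_Store", "docs/~$draft.docx", "docs/old.BAK"]:
    print(p, skip_temp_and_hidden(p))
```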