ingest.stores.sparse

full-text search engine

SparseStore

 SparseStore (persist_directory:Optional[str]=None,
              index_name:str='myindex', **kwargs)

Helper class that provides a standard way to create an ABC using inheritance.


default_schema

 default_schema ()

SparseStore.get_db

 SparseStore.get_db ()

Get raw index


SparseStore.exists

 SparseStore.exists ()

Returns True if documents have been added to search index


SparseStore.add_documents

 SparseStore.add_documents
                            (docs:Sequence[langchain_core.documents.base.Document],
                            limitmb:int=1024, verbose:bool=True, **kwargs)

Indexes documents. Extra kwargs are passed to TextStore.ix.writer.

Parameter  Type         Default  Details
docs       Sequence              list of LangChain Documents
limitmb    int          1024     maximum memory in megabytes to use
verbose    bool         True     Set to False to disable progress bar
kwargs     VAR_KEYWORD

SparseStore.remove_document

 SparseStore.remove_document (value:str, field:str='id')

Removes the document whose field matches value. The default field is id.


SparseStore.update_documents

 SparseStore.update_documents (doc_dicts:dict, **kwargs)

Updates a set of documents (a document in the index with the same ID is overwritten).

Parameter  Type         Details
doc_dicts  dict         dictionary with keys 'page_content', 'source', 'id', etc.
kwargs     VAR_KEYWORD
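
As a rough illustration of the record shape described above, the sketch below builds one dictionary with the keys 'page_content', 'source', and 'id'. The helper make_doc_dict and the hash-based ID are assumptions made for this example only; the library may assign IDs differently and may expect additional keys.

```python
import hashlib


def make_doc_dict(page_content: str, source: str) -> dict:
    """Build a record with the keys named in the table (hypothetical helper)."""
    # Assumption: derive a stable ID from the source path and the content.
    # The actual ID scheme used by the index is not documented here.
    raw = f"{source}\n{page_content}".encode("utf-8")
    doc_id = hashlib.sha1(raw).hexdigest()
    return {"page_content": page_content, "source": source, "id": doc_id}


record = make_doc_dict("Updated text of the report.", "/data/report.txt")
```

Because the ID is derived deterministically here, rebuilding a record from the same source and content yields the same ID, which is what lets an update overwrite the matching document.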

SparseStore.get_all_docs

 SparseStore.get_all_docs ()

Returns a generator to iterate through all indexed documents


SparseStore.get_doc

 SparseStore.get_doc (id:str)

Get an indexed record by ID


SparseStore.get_size

 SparseStore.get_size ()

Gets size of index


SparseStore.erase

 SparseStore.erase (confirm=True)

Clears index


SparseStore.query

 SparseStore.query (q:str, fields:Sequence=['page_content'],
                    highlight:bool=True, limit:int=10, page:int=1,
                    return_dict:bool=False,
                    filters:Optional[Dict[str,str]]=None,
                    where_document:Optional[str]=None)

Queries the index.

Args

  • q: the query string
  • fields: a list of fields to search
  • highlight: If True, highlight hits
  • limit: results per page
  • page: page of hits to return
  • return_dict: If True, return list of dictionaries instead of LangChain Document objects
  • filters: filter results by field values (e.g., {'extension': 'pdf'})
  • where_document: optional query to further filter results
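
To make the relationship between limit and page concrete, here is a minimal sketch of conventional 1-based pagination over a ranked hit list. The function page_slice is hypothetical and stands in for whatever the index does internally; it only shows which hits a given page would cover under that convention.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def page_slice(hits: Sequence[T], limit: int = 10, page: int = 1) -> List[T]:
    """Return the hits that fall on a 1-based page of size `limit`."""
    if limit < 1 or page < 1:
        raise ValueError("limit and page must both be >= 1")
    start = (page - 1) * limit  # page 1 starts at offset 0
    return list(hits[start:start + limit])


# 25 placeholder hits, already in ranked order
hits = [f"doc{i}" for i in range(25)]
first_page = page_slice(hits, limit=10, page=1)
last_page = page_slice(hits, limit=10, page=3)
```

Under this convention, a page past the end of the results is simply empty rather than an error.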

VectorStore.check

 VectorStore.check ()

Raise exception if VectorStore.exists() returns False


VectorStore.ingest

 VectorStore.ingest (source_directory:str, chunk_size:int=500,
                     chunk_overlap:int=50,
                     ignore_fn:Optional[Callable]=None,
                     batch_size:int=41000, **kwargs)

Ingests all documents in source_directory (previously-ingested documents are ignored). When retrieved, the Document objects will each have a metadata dict with the absolute path to the file in metadata["source"]. Extra kwargs fed to ingest.load_single_document.

Parameter         Type         Default  Details
source_directory  str                   path to folder containing document store
chunk_size        int          500      text is split to this many characters by langchain.text_splitter.RecursiveCharacterTextSplitter
chunk_overlap     int          50       character overlap between chunks in langchain.text_splitter.RecursiveCharacterTextSplitter
ignore_fn         Optional     None     Optional function that accepts the file path (including file name) as input and returns True if file path should not be ingested.
batch_size        int          41000    batch size used when processing documents
kwargs            VAR_KEYWORD
Returns           None
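
The ignore_fn contract (take a file path, return True to skip the file) can be sketched as follows. The function name and the specific skip rules are illustrative assumptions, not defaults of the library.

```python
import os


def ignore_hidden_and_temp(path: str) -> bool:
    """Return True if the file at `path` should NOT be ingested."""
    name = os.path.basename(path)
    # Assumed rules for this example: skip dotfiles, Office lock files,
    # and files with a .tmp extension (case-insensitive).
    return (
        name.startswith(".")
        or name.startswith("~$")
        or name.lower().endswith(".tmp")
    )
```

A function like this would be passed as ignore_fn=ignore_hidden_and_temp when calling ingest.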