Using Different Vector Stores

This example demonstrates how to use the VectorStoreFactory in OnPrem.LLM to easily create and experiment with different types of vector stores for your RAG (Retrieval-Augmented Generation) and semantic search applications.

The VectorStoreFactory provides a unified interface for creating three different types of vector stores, each optimized for different use cases:

  1. Dense stores (e.g., ChromaStore), which embed documents as dense vectors for semantic similarity search
  2. Sparse stores (e.g., WhooshStore), which index documents for fast keyword search
  3. Dual stores, which combine a dense and a sparse store for hybrid search

This makes it easy to experiment with different search strategies and find the best approach for your specific data and use case.
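
As a quick sketch of what this looks like in code: the 'whoosh' kind is used later in this example, while the 'chroma' and 'dual' kind names below are assumptions for illustration and may differ in your installation.

from onprem.ingest.stores import VectorStoreFactory

# Sparse (keyword) store backed by Whoosh
sparse_store = VectorStoreFactory.create('whoosh', persist_location='/tmp/whoosh_index')

# Dense (embedding) store backed by Chroma -- 'chroma' kind name assumed for illustration
dense_store = VectorStoreFactory.create('chroma', persist_location='/tmp/chroma_index')

# Hybrid store combining both -- 'dual' kind name assumed for illustration
dual_store = VectorStoreFactory.create('dual', persist_location='/tmp/dual_index')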

Setup

First, let’s create some sample documents that we’ll use throughout our examples:

import tempfile
import os
from langchain_core.documents import Document
from onprem.ingest.stores import VectorStoreFactory

# Create some sample documents for our examples
sample_docs = [
    Document(
        page_content="Machine learning is a subset of artificial intelligence that enables computers to learn without explicit programming.",
        metadata={"source": "ml_intro.txt", "topic": "AI", "difficulty": "beginner"}
    ),
    Document(
        page_content="Deep learning uses neural networks with multiple layers to model and understand complex patterns in data.",
        metadata={"source": "dl_guide.txt", "topic": "AI", "difficulty": "intermediate"}
    ),
    Document(
        page_content="Natural language processing (NLP) enables computers to understand and process human language.",
        metadata={"source": "nlp_basics.txt", "topic": "AI", "difficulty": "beginner"}
    ),
    Document(
        page_content="Vector databases store high-dimensional vectors and enable similarity search for AI applications.",
        metadata={"source": "vector_db.txt", "topic": "databases", "difficulty": "intermediate"}
    ),
    Document(
        page_content="Retrieval-augmented generation (RAG) combines information retrieval with language generation for better AI responses.",
        metadata={"source": "rag_overview.txt", "topic": "AI", "difficulty": "advanced"}
    ),
    Document(
    page_content="Cats have five toes on their front paws, four on their back paws, and zero interest in your personal space..",
    metadata={"source": "cat_facts.txt", "topic": "cats", "difficulty": "advanced"}
    )
]

print(f"Created {len(sample_docs)} sample documents for testing")
Created 6 sample documents for testing

Integration with LLM

The VectorStoreFactory works seamlessly with OnPrem.LLM for complete RAG (Retrieval-Augmented Generation) workflows.

By default, supplying store_type="dense" to LLM will use ChromaStore and supplying store_type="sparse" will use WhooshStore. If you supply store_type="dual", a hybrid vector store that uses both ChromaStore and WhooshStore is used.
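
For instance (reusing the model name from the example below; other LLM arguments are omitted for brevity):

from onprem import LLM

# Use a sparse (WhooshStore) vector store
llm = LLM('openai/gpt-4o-mini', store_type='sparse')

# Or a hybrid of ChromaStore and WhooshStore
# llm = LLM('openai/gpt-4o-mini', store_type='dual')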

The ElasticsearchStore is also a hybrid vector store in that it stores documents as both dense vectors and sparse vectors.

To use an ElasticsearchStore like the one we used above, you can supply it to load_vectorstore as a custom vector store:

llm = LLM(...)
llm.load_vectorstore(custom_vectorstore=elasticsearch_store)

You can also implement and use your own custom VectorStore instances (by subclassing DenseStore, SparseStore, or DualStore) using whatever vector database backend you like.
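
A very rough skeleton of such a subclass is sketched below. This is purely illustrative: the import path and the set of methods required by SparseStore are assumptions here, and the real abstract interface may differ. The method names simply mirror the ones used on stores elsewhere in this example (search, semantic_search, erase).

from onprem.ingest.stores import SparseStore  # assumed import path for the base class

class MyCustomStore(SparseStore):
    """Illustrative skeleton only; adapt to the actual SparseStore interface."""

    def search(self, query, **kwargs):
        # translate the query to your backend's keyword-search API and return documents
        ...

    def semantic_search(self, query, **kwargs):
        # embed the query and run a similarity search against your backend
        ...

    def erase(self, confirm=True):
        # delete the underlying index
        ...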

For illustration purposes, in the example below, we explicitly tell LLM to use WhooshStore as a custom vector store. (This is equivalent to supplying store_type="sparse" to LLM, but it shows how you would use LLM with Elasticsearch or your own custom vector store.)

# Example: Using VectorStoreFactory with LLM for RAG
print("🤖 Integration with OnPrem.LLM:")

# Create a simple document corpus
documents_dir = tempfile.mkdtemp()
doc_files = {
    "ai_overview.txt": "Artificial intelligence is transforming how we work and live. Machine learning enables computers to learn from data without explicit programming.",
    "ml_types.txt": "There are three main types of machine learning: supervised learning uses labeled data, unsupervised learning finds patterns in unlabeled data, and reinforcement learning learns through trial and error.",
    "applications.txt": "AI applications include natural language processing for text analysis, computer vision for image recognition, and recommendation systems for personalized content."
}

# Write documents to files
for filename, content in doc_files.items():
    with open(os.path.join(documents_dir, filename), 'w') as f:
        f.write(content)

print(f"✓ Created {len(doc_files)} documents in {documents_dir}")

# Show how to use custom vector store with LLM
from onprem import LLM
from onprem.ingest.stores import VectorStoreFactory

# Create custom vector store
store = VectorStoreFactory.create('whoosh', persist_location='/tmp/my_search_index')

# Create LLM and use custom vector store
llm = LLM('openai/gpt-4o-mini', vectordb_path=tempfile.mkdtemp())
llm.load_vectorstore(custom_vectorstore=store)

# Ingest documents
llm.ingest(documents_dir)

print('\n\n----RAG EXAMPLE----')
# Ask questions
question = 'What are the types of machine learning?'
print(f'QUESTION: {question}')
print()
result = llm.ask(question)

print('\n\nSOURCES:')
for i, d in enumerate(result['source_documents']):
    print(f"source #{i+1}: {d.metadata['source']}")
store.erase(confirm=False)
🤖 Integration with OnPrem.LLM:
✓ Created 3 documents in /tmp/tmpjekc6pkt
Creating new vectorstore at /tmp/my_search_index
Loading documents from /tmp/tmpjekc6pkt
Loading new documents: 100%|█████████████████████| 3/3 [00:00<00:00, 175.48it/s]
Processing and chunking 3 new documents: 100%|███████████████████████████████████████████| 1/1 [00:00<00:00, 248.67it/s]
Split into 3 chunks of text (max. 500 chars each for text; max. 2000 chars for tables)
100%|████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 983.81it/s]
Ingestion complete! You can now query your documents using the LLM.ask or LLM.chat methods


----RAG EXAMPLE----
QUESTION: What are the types of machine learning?

The types of machine learning are:

1. Supervised learning - uses labeled data.
2. Unsupervised learning - finds patterns in unlabeled data.
3. Reinforcement learning - learns through trial and error.

SOURCES:
source #1: /tmp/tmpjekc6pkt/ml_types.txt
source #2: /tmp/tmpjekc6pkt/ai_overview.txt
True

Applying LLMs to Documents in Pre-Existing Search Engines

Many applications have documents already stored in a conventional Elasticsearch index with no vector embeddings. Surprisingly, you can still apply RAG and semantic search to such documents, even though they have not been preprocessed for generative AI.

RAG With an Existing Elasticsearch Index

The ElasticsearchSparseStore module in OnPrem.LLM allows you to point OnPrem.LLM to any Elasticsearch instance for RAG and semantic similarity applications.

You can do so by instantiating ElasticsearchSparseStore as follows:

from onprem.ingest.stores import VectorStoreFactory
store = VectorStoreFactory.create(
    kind='elasticsearch_sparse', 
    persist_location='https://localhost:9200',
    index_name='NAME_OF_YOUR_INDEX',
    # Map OnPrem.LLM field names to your existing field names
    content_field='content',      # Your content field name
    id_field='doc_id',            # Your ID field name
    source_field='filepath',      # Your source field name (optional)
    content_analyzer='english',   # Your analyzer (defaults to standard)
    # Optional: Authentication if needed
    basic_auth=('elastic', 'CHANGEME'),
    verify_certs=False, # change to True if you provide path to ES certs as we did above
    # Optional: Enable semantic search with dynamic chunking
    chunk_for_semantic_search=True,
    chunk_size=500,
    chunk_overlap=50,
    n_candidates=25,       # number of documents to inspect for answer (default: limit*10)
)

# traditional keyword search
results = store.search('"machine learning"', filters={'extension': 'pdf'}) # assuming here you have an extension field in your index

# semantic searches (no vectors need to be indexed in your Elasticsearch instance!)
results = store.semantic_search('"machine learning"', return_chunks=False) # set return_chunks=True for RAG applications
# best matching chunk from document
best_chunk_id = results[0].metadata['best_chunk_idx']
print(results[0].metadata['chunks'][best_chunk_id])

# OUTPUT: 'of the machine learning (ML) workflow such as data-preprocessing and human-in-the-loop
#          model tuning and inspection. Following inspiration from a blog post by Rachel Thomas of
#          fast.ai (Howard and Gugger, 2020), we refer to this as Augmented Machine Learning.'

# RAG
from onprem import LLM
llm = LLM(n_gpu_layers=-1)
llm.load_vectorstore(custom_vectorstore=store)
result = llm.ask('What is machine learning?')

Two things make the example above noteworthy:

  1. Embeddings do not have to be stored in the Elasticsearch index and are computed dynamically.
  2. Documents do not even need to be pre-chunked in your index.

RAG With SharePoint Documents

You can also point OnPrem.LLM to SharePoint documents.

# connect to SharePoint
from onprem.ingest import VectorStoreFactory
connection_params = {'persist_location': "https://sharepoint.YOUR_ORGANIZATION.org",  # URL of your SharePoint site
                     'username': os.getenv('USERNAME'),  # e.g., CORP\username
                     'password': os.getenv('PASSWORD'),
                     'n_candidates': 10}  # maximum number of SharePoint documents to inspect for answer (default: limit*10)
store = VectorStoreFactory.create('sharepoint', **connection_params)

# traditional keyword search (results are entire documents)
results = store.search('"generative AI" AND "material science"', where_document="NSF", limit=10)

# semantic search (results are text chunks from entire documents)
results = store.semantic_search('Can generative AI be applied to material science?', where_document='NSF AND "material science"', limit=4)

# RAG
from onprem import LLM
llm = LLM(n_gpu_layers=-1, verbose=0)
llm.load_vectorstore(custom_vectorstore=store)
result = llm.ask('Can generative AI be applied to material science?', limit=4, where_document='NSF AND "material science"')

For RAG with SharePoint, we offer the following recommendations:

  1. Many SharePoint sites are configured not to return the indexed text content as part of the query results. In these situations, OnPrem.LLM will attempt to download the documents from SharePoint and perform real-time text extraction and chunking. For this reason, a lower n_candidates value is recommended (see above).
  2. SharePoint Search uses the Keyword Query Language (KQL), a proprietary query language designed by Microsoft for SharePoint and other Microsoft search products (like Exchange and Microsoft Search). KQL is missing some features that are useful in yielding relevant results, so we recommend helping the LLM target the right documents by providing a supplemental query to filter documents via the where_document argument, as we did above.
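
Putting both recommendations together might look like the following; the values are illustrative and build on the connection_params, store, and llm objects from the example above.

# Recommendation 1: keep n_candidates small to limit real-time downloads and extraction
connection_params['n_candidates'] = 5
store = VectorStoreFactory.create('sharepoint', **connection_params)

# Recommendation 2: narrow the candidate set with a supplemental KQL filter via where_document
llm.load_vectorstore(custom_vectorstore=store)
result = llm.ask('Can generative AI be applied to material science?',
                 limit=4, where_document='NSF AND "material science"')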

# Clean up temporary directories
import shutil

temp_dirs = [chroma_path, whoosh_path, documents_dir]
for temp_dir in temp_dirs:
    try:
        shutil.rmtree(temp_dir)
    except:
        pass
        
print("🧹 Cleaned up temporary directories")
🧹 Cleaned up temporary directories