Using Different Vector Stores

This example demonstrates how to use the VectorStoreFactory in OnPrem.LLM to easily create and experiment with different types of vector stores for your RAG (Retrieval-Augmented Generation) and semantic search applications.

The VectorStoreFactory provides a unified interface for creating three different types of vector stores, each optimized for different use cases:

  1. Dense stores (e.g., ChromaStore), which embed documents as dense vectors for semantic similarity search
  2. Sparse stores (e.g., WhooshStore), which index documents for fast keyword search
  3. Dual stores, which combine a dense and a sparse store for hybrid search

This makes it easy to experiment with different search strategies and find the best approach for your specific data and use case.
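
As a quick sketch of what this looks like in code: the 'whoosh' kind is used later in this example, while the 'chroma' and 'dual' kind names below are assumptions for illustration and may differ in your installation.

from onprem.ingest.stores import VectorStoreFactory

# Sparse (keyword) store backed by Whoosh
sparse_store = VectorStoreFactory.create('whoosh', persist_location='/tmp/whoosh_index')

# Dense (embedding) store backed by Chroma -- 'chroma' kind name assumed for illustration
dense_store = VectorStoreFactory.create('chroma', persist_location='/tmp/chroma_index')

# Hybrid store combining both -- 'dual' kind name assumed for illustration
dual_store = VectorStoreFactory.create('dual', persist_location='/tmp/dual_index')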

Setup

First, let’s create some sample documents that we’ll use throughout our examples:

import tempfile
import os
from langchain_core.documents import Document
from onprem.ingest.stores import VectorStoreFactory

# Create some sample documents for our examples
sample_docs = [
    Document(
        page_content="Machine learning is a subset of artificial intelligence that enables computers to learn without explicit programming.",
        metadata={"source": "ml_intro.txt", "topic": "AI", "difficulty": "beginner"}
    ),
    Document(
        page_content="Deep learning uses neural networks with multiple layers to model and understand complex patterns in data.",
        metadata={"source": "dl_guide.txt", "topic": "AI", "difficulty": "intermediate"}
    ),
    Document(
        page_content="Natural language processing (NLP) enables computers to understand and process human language.",
        metadata={"source": "nlp_basics.txt", "topic": "AI", "difficulty": "beginner"}
    ),
    Document(
        page_content="Vector databases store high-dimensional vectors and enable similarity search for AI applications.",
        metadata={"source": "vector_db.txt", "topic": "databases", "difficulty": "intermediate"}
    ),
    Document(
        page_content="Retrieval-augmented generation (RAG) combines information retrieval with language generation for better AI responses.",
        metadata={"source": "rag_overview.txt", "topic": "AI", "difficulty": "advanced"}
    ),
    Document(
    page_content="Cats have five toes on their front paws, four on their back paws, and zero interest in your personal space..",
    metadata={"source": "cat_facts.txt", "topic": "cats", "difficulty": "advanced"}
    )
]

print(f"Created {len(sample_docs)} sample documents for testing")
Created 6 sample documents for testing

Integration with LLM

The VectorStoreFactory works seamlessly with OnPrem.LLM for complete RAG (Retrieval-Augmented Generation) workflows.

By default, supplying store_type="dense" to LLM will use ChromaStore and supplying store_type="sparse" will use WhooshStore. If you supply store_type="dual", a hybrid vector store that uses both ChromaStore and WhooshStore is used.
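
For instance (reusing the model name from the example below; other LLM arguments are omitted for brevity):

from onprem import LLM

# Use a sparse (WhooshStore) vector store
llm = LLM('openai/gpt-4o-mini', store_type='sparse')

# Or a hybrid of ChromaStore and WhooshStore
# llm = LLM('openai/gpt-4o-mini', store_type='dual')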

The ElasticsearchStore is also a hybrid vector store in that it stores documents as both dense vectors and sparse vectors.

To use an ElasticsearchStore like the one we used above, you can supply it to load_vectorstore as a custom vector store:

llm = LLM(...)
llm.load_vectorstore(custom_vectorstore=elasticsearch_store)

You can also implement and use your own custom VectorStore instances (by subclassing DenseStore, SparseStore, or DualStore) using whatever vector database backend you like.
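
A very rough skeleton of such a subclass is sketched below. This is purely illustrative: the import path and the set of methods required by SparseStore are assumptions here, and the real abstract interface may differ. The method names simply mirror the ones used on stores elsewhere in this example (search, semantic_search, erase).

from onprem.ingest.stores import SparseStore  # assumed import path for the base class

class MyCustomStore(SparseStore):
    """Illustrative skeleton only; adapt to the actual SparseStore interface."""

    def search(self, query, **kwargs):
        # translate the query to your backend's keyword-search API and return documents
        ...

    def semantic_search(self, query, **kwargs):
        # embed the query and run a similarity search against your backend
        ...

    def erase(self, confirm=True):
        # delete the underlying index
        ...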

For illustration purposes, in the example below, we explicitly tell LLM to use WhooshStore as a custom vector store. (This is equivalent to supplying store_type="sparse" to LLM, but it shows how you would use LLM with Elasticsearch or your own custom vector store.)

# Example: Using VectorStoreFactory with LLM for RAG
print("🤖 Integration with OnPrem.LLM:")

# Create a simple document corpus
documents_dir = tempfile.mkdtemp()
doc_files = {
    "ai_overview.txt": "Artificial intelligence is transforming how we work and live. Machine learning enables computers to learn from data without explicit programming.",
    "ml_types.txt": "There are three main types of machine learning: supervised learning uses labeled data, unsupervised learning finds patterns in unlabeled data, and reinforcement learning learns through trial and error.",
    "applications.txt": "AI applications include natural language processing for text analysis, computer vision for image recognition, and recommendation systems for personalized content."
}

# Write documents to files
for filename, content in doc_files.items():
    with open(os.path.join(documents_dir, filename), 'w') as f:
        f.write(content)

print(f"✓ Created {len(doc_files)} documents in {documents_dir}")

# Show how to use custom vector store with LLM
from onprem import LLM
from onprem.ingest.stores import VectorStoreFactory

# Create custom vector store
store = VectorStoreFactory.create('whoosh', persist_location='/tmp/my_search_index')

# Create LLM and use custom vector store
llm = LLM('openai/gpt-4o-mini', vectordb_path=tempfile.mkdtemp())
llm.load_vectorstore(custom_vectorstore=store)

# Ingest documents
llm.ingest(documents_dir)

print('\n\n----RAG EXAMPLE----')
# Ask questions
question = 'What are the types of machine learning?'
print(f'QUESTION: {question}')
print()
result = llm.ask(question)

print('\n\nSOURCES:')
for i, d in enumerate(result['source_documents']):
    print(f"source #{i+1}: {d.metadata['source']}")
store.erase(confirm=False)
🤖 Integration with OnPrem.LLM:
✓ Created 3 documents in /tmp/tmpjekc6pkt
Creating new vectorstore at /tmp/my_search_index
Loading documents from /tmp/tmpjekc6pkt
Loading new documents: 100%|█████████████████████| 3/3 [00:00<00:00, 175.48it/s]
Processing and chunking 3 new documents: 100%|███████████████████████████████████████████| 1/1 [00:00<00:00, 248.67it/s]
Split into 3 chunks of text (max. 500 chars each for text; max. 2000 chars for tables)
100%|████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 983.81it/s]
Ingestion complete! You can now query your documents using the LLM.ask or LLM.chat methods


----RAG EXAMPLE----
QUESTION: What are the types of machine learning?

The types of machine learning are:

1. Supervised learning - uses labeled data.
2. Unsupervised learning - finds patterns in unlabeled data.
3. Reinforcement learning - learns through trial and error.

SOURCES:
source #1: /tmp/tmpjekc6pkt/ml_types.txt
source #2: /tmp/tmpjekc6pkt/ai_overview.txt
True

Applying LLMs to Documents in Pre-Existing Search Engines

Many applications have documents already stored in a conventional Elasticsearch index with no vector embeddings. Surprisingly, you can still apply RAG and semantic search to such documents, even though they have not been preprocessed for generative AI.

RAG With an Existing Elasticsearch Index

The ElasticsearchSparseStore module in OnPrem.LLM allows you to point OnPrem.LLM to any Elasticsearch instance for RAG and semantic similarity applications.

You can do so by instantiating ElasticsearchSparseStore as follows:

from onprem.ingest.stores import VectorStoreFactory
store = VectorStoreFactory.create(
    kind='elasticsearch_sparse', 
    persist_location='https://localhost:9200',
    index_name='NAME_OF_YOUR_INDEX',
    # Map OnPrem.LLM field names to your existing field names
    content_field='content',      # Your content field name
    id_field='doc_id',            # Your ID field name
    source_field='filepath',      # Your source field name (optional)
    content_analyzer='english',   # Your analyzer (defaults to standard)
    # Optional: Authentication if needed
    basic_auth=('elastic', 'CHANGEME'),
    verify_certs=False, # change to True if you provide path to ES certs as we did above
    # Optional: Enable semantic search with dynamic chunking
    chunk_for_semantic_search=True,
    chunk_size=500,
    chunk_overlap=50,
    n_candidates=25,       # number of documents to inspect for answer (default: limit*10)
)

# traditional keyword search
results = store.search('"machine learning"', filters={'extension': 'pdf'}) # assuming here you have an extension field in your index

# semantic searches (no vectors need to be indexed in your Elasticsearch instance!)
results = store.semantic_search('"machine learning"', return_chunks=False) # set return_chunks=True for RAG applications
# best matching chunk from document
best_chunk_id = results[0].metadata['best_chunk_idx']
print(results[0].metadata['chunks'][best_chunk_id])

# OUTPUT: 'of the machine learning (ML) workflow such as data-preprocessing and human-in-the-loop
#          model tuning and inspection. Following inspiration from a blog post by Rachel Thomas of
#          fast.ai (Howard and Gugger, 2020), we refer to this as Augmented Machine Learning.'

# RAG
from onprem import LLM
llm = LLM(n_gpu_layers=-1)
llm.load_vectorstore(custom_vectorstore=store)
result = llm.ask('What is machine learning?')

Two things make the example above noteworthy:

  1. Embeddings do not have to be stored in the Elasticsearch index and are computed dynamically.
  2. Documents do not even need to be pre-chunked in your index.

RAG With SharePoint Documents

You can also point OnPrem.LLM to SharePoint documents.

# connect to SharePoint
from onprem.ingest import VectorStoreFactory
connection_params = {'persist_location': "https://sharepoint.YOUR_ORGANIZATION.org",  # URL of your SharePoint site
                     'username': os.getenv('USERNAME'),  # e.g., CORP\username
                     'password': os.getenv('PASSWORD'),
                     'n_candidates': 10}  # maximum number of SharePoint documents to inspect for answer (default: limit*10)
store = VectorStoreFactory.create('sharepoint', **connection_params)

# traditional keyword search (results are entire documents)
results = store.search('"generative AI" AND "material science"', where_document="NSF", limit=10)

# semantic search (results are text chunks from entire documents)
results = store.semantic_search('Can generative AI be applied to material science?', where_document='NSF AND "material science"', limit=4)

# RAG
from onprem import LLM
llm = LLM(n_gpu_layers=-1, verbose=0)
llm.load_vectorstore(custom_vectorstore=store)
result = llm.ask('Can generative AI be applied to material science?', limit=4, where_document='NSF AND "material science"')

For RAG with SharePoint, we offer the following recommendations:

  1. Many SharePoint sites are configured not to return the indexed text content as part of the query results. In these situations, OnPrem.LLM will attempt to download the documents from SharePoint and perform real-time text extraction and chunking. For this reason, a lower n_candidates value is recommended (see above).
  2. SharePoint Search uses the Keyword Query Language (KQL), a proprietary query language designed by Microsoft for SharePoint and other Microsoft search products (like Exchange and Microsoft Search). KQL is missing some features that are useful in yielding relevant results, so we recommend helping the LLM target the right documents by providing a supplemental query to filter documents via the where_document argument, as we did above.
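
Putting both recommendations together might look like the following; the values are illustrative and build on the connection_params, store, and llm objects from the example above.

# Recommendation 1: keep n_candidates small to limit real-time downloads and extraction
connection_params['n_candidates'] = 5
store = VectorStoreFactory.create('sharepoint', **connection_params)

# Recommendation 2: narrow the candidate set with a supplemental KQL filter via where_document
llm.load_vectorstore(custom_vectorstore=store)
result = llm.ask('Can generative AI be applied to material science?',
                 limit=4, where_document='NSF AND "material science"')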

# Clean up temporary directories
import shutil

temp_dirs = [chroma_path, whoosh_path, documents_dir]
for temp_dir in temp_dirs:
    try:
        shutil.rmtree(temp_dir)
    except:
        pass
        
print("🧹 Cleaned up temporary directories")
🧹 Cleaned up temporary directories