This example demonstrates how to use the VectorStoreFactory in OnPrem.LLM to easily create and experiment with different types of vector stores for your RAG (Retrieval-Augmented Generation) and semantic search applications.
The VectorStoreFactory provides a unified interface for creating three different types of vector stores, each optimized for different use cases:
ChromaStore (default): Dense vector search using embeddings for semantic search
WhooshStore: Sparse keyword search using full-text indexing with on-the-fly dense vector encoding for semantic search.
ElasticsearchStore: Unified store combining dense and sparse search, including hybrid search that fuses both result sets using RRF (Reciprocal Rank Fusion).
This makes it easy to experiment with different search strategies and find the best approach for your specific data and use case.
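To make the "unified interface" idea concrete, here is a minimal sketch of the factory pattern. It is not OnPrem.LLM's implementation: ToyDenseStore, ToySparseStore, and ToyStoreFactory are hypothetical names invented for illustration. The only thing borrowed from the text above is the idea of selecting a store class through a `kind` argument.

```python
from typing import Callable, Dict


class ToyDenseStore:
    """Hypothetical stand-in for a dense (embedding-based) store."""

    def __init__(self, persist_location: str = "") -> None:
        self.persist_location = persist_location


class ToySparseStore:
    """Hypothetical stand-in for a sparse (keyword-based) store."""

    def __init__(self, persist_location: str = "") -> None:
        self.persist_location = persist_location


class ToyStoreFactory:
    """Maps a `kind` string to a store class (illustrative only)."""

    _registry: Dict[str, Callable[..., object]] = {
        "dense": ToyDenseStore,
        "sparse": ToySparseStore,
    }

    @classmethod
    def create(cls, kind: str = "dense", **kwargs):
        # Look up the requested class and forward all other arguments to it.
        try:
            store_cls = cls._registry[kind]
        except KeyError:
            raise ValueError(f"Unknown store kind: {kind!r}") from None
        return store_cls(**kwargs)


store = ToyStoreFactory.create(kind="sparse", persist_location="/tmp/index")
print(type(store).__name__)
```

Because callers only ever name a `kind`, swapping one backend for another is a one-word change, which is what makes side-by-side experiments cheap.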
Setup
First, let’s create some sample documents that we’ll use throughout our examples:
import tempfile
import os
from langchain_core.documents import Document
from onprem.ingest.stores import VectorStoreFactory

# Create some sample documents for our examples
sample_docs = [
    Document(
        page_content="Machine learning is a subset of artificial intelligence that enables computers to learn without explicit programming.",
        metadata={"source": "ml_intro.txt", "topic": "AI", "difficulty": "beginner"}
    ),
    Document(
        page_content="Deep learning uses neural networks with multiple layers to model and understand complex patterns in data.",
        metadata={"source": "dl_guide.txt", "topic": "AI", "difficulty": "intermediate"}
    ),
    Document(
        page_content="Natural language processing (NLP) enables computers to understand and process human language.",
        metadata={"source": "nlp_basics.txt", "topic": "AI", "difficulty": "beginner"}
    ),
    Document(
        page_content="Vector databases store high-dimensional vectors and enable similarity search for AI applications.",
        metadata={"source": "vector_db.txt", "topic": "databases", "difficulty": "intermediate"}
    ),
    Document(
        page_content="Retrieval-augmented generation (RAG) combines information retrieval with language generation for better AI responses.",
        metadata={"source": "rag_overview.txt", "topic": "AI", "difficulty": "advanced"}
    ),
    Document(
        page_content="Cats have five toes on their front paws, four on their back paws, and zero interest in your personal space..",
        metadata={"source": "cat_facts.txt", "topic": "cats", "difficulty": "advanced"}
    )
]

print(f"Created {len(sample_docs)} sample documents for testing")
Created 6 sample documents for testing
Example 1: ChromaStore (Dense Vector Search)
ChromaStore is the default option and excels at semantic similarity search. It’s perfect when you want to find documents that are conceptually similar to your query, even if they don’t share exact keywords.
# Create ChromaStore using the factory (default)
chroma_path = tempfile.mkdtemp()
chroma_store = VectorStoreFactory.create(
    kind='chroma',  # or just use default: VectorStoreFactory.create()
    persist_location=chroma_path
)
print(f"Created ChromaStore at: {chroma_path}")
print(f"Store type: {type(chroma_store).__name__}")

# Add documents
chroma_store.add_documents(sample_docs)
print(f"Added {len(sample_docs)} documents to ChromaStore")

# Test semantic search - look for documents about AI/ML
results = chroma_store.semantic_search("artificial intelligence and machine learning", limit=3)
print(f"\nSemantic search results for 'artificial intelligence and machine learning':")
for i, doc in enumerate(results, 1):
    print(f"{i}. {doc.page_content[:60]}... (from {doc.metadata['source']})")
    print(f"   Similarity score: {doc.metadata.get('score', 'N/A'):.3f}")

# Test semantic search - look for documents about felines
results = chroma_store.semantic_search("feline feet", limit=3)
print(f"\nSemantic search results for 'feline feet':")
for i, doc in enumerate(results, 1):
    print(f"{i}. {doc.page_content[:60]}... (from {doc.metadata['source']})")
    print(f"   Similarity score: {doc.metadata.get('score', 'N/A'):.3f}")

# Show that semantic search finds conceptually related content
print(f"\nSemantic search for 'computer intelligence' (no exact keyword matches):")
results = chroma_store.semantic_search("computer intelligence", limit=2)
for doc in results:
    print(f"- {doc.page_content[:60]}... (score: {doc.metadata.get('score', 'N/A'):.3f}, category: {doc.metadata.get('topic', 'N/A')})")
Created ChromaStore at: /tmp/tmp9k2it641
Store type: ChromaStore
Creating embeddings. May take some minutes...
Added 6 documents to ChromaStore
Semantic search results for 'artificial intelligence and machine learning':
1. Machine learning is a subset of artificial intelligence that... (from ml_intro.txt)
Similarity score: 0.621
2. Deep learning uses neural networks with multiple layers to m... (from dl_guide.txt)
Similarity score: 0.439
3. Vector databases store high-dimensional vectors and enable s... (from vector_db.txt)
Similarity score: 0.357
Semantic search results for 'feline feet':
1. Cats have five toes on their front paws, four on their back ... (from cat_facts.txt)
Similarity score: 0.538
2. Vector databases store high-dimensional vectors and enable s... (from vector_db.txt)
Similarity score: 0.059
3. Natural language processing (NLP) enables computers to under... (from nlp_basics.txt)
Similarity score: 0.030
Semantic search for 'computer intelligence' (no exact keyword matches):
- Machine learning is a subset of artificial intelligence that... (score: 0.524, category: AI)
- Natural language processing (NLP) enables computers to under... (score: 0.406, category: AI)
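The query "feline feet" shares no words with the cat document, yet that document ranks first by a wide margin. The sketch below illustrates the mechanism with cosine similarity over made-up three-dimensional vectors. The vectors are invented for illustration and are not real embeddings, and the sketch does not claim to reproduce the exact scoring ChromaStore uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional "embeddings" (real models use hundreds of dimensions).
toy_vectors = {
    "cat_facts.txt": [0.9, 0.1, 0.0],
    "ml_intro.txt": [0.1, 0.9, 0.2],
    "vector_db.txt": [0.0, 0.3, 0.9],
}

# Pretend this is the embedding of the query "feline feet".
query_vector = [0.8, 0.2, 0.1]

# Rank documents by how closely their vector points in the query's direction.
ranked = sorted(
    toy_vectors,
    key=lambda source: cosine_similarity(query_vector, toy_vectors[source]),
    reverse=True,
)
for source in ranked:
    score = cosine_similarity(query_vector, toy_vectors[source])
    print(f"{source}: {score:.3f}")
```

Ranking depends only on vector direction, so a document can score highly without sharing a single token with the query.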
Example 2: WhooshStore (Sparse Keyword Search)
WhooshStore uses full-text search and is excellent for exact keyword matching and boolean queries. It works well when you know the specific terms you’re looking for. Unlike ChromaStore, WhooshStore does not compute vectors at index time, so ingestion is very fast; for semantic searches, it converts text to dense vectors on the fly.
# Create WhooshStore using the factory
whoosh_path = tempfile.mkdtemp()
whoosh_store = VectorStoreFactory.create(
    kind='whoosh',
    persist_location=whoosh_path
)
print(f"Created WhooshStore at: {whoosh_path}")
print(f"Store type: {type(whoosh_store).__name__}")

# Add documents
whoosh_store.add_documents(sample_docs)
print(f"Added {len(sample_docs)} documents to WhooshStore")

# Test keyword search - exact term matching
results = whoosh_store.query("neural networks", limit=3)
print(f"\nKeyword search results for 'neural networks':")
print(f"Total hits: {results['total_hits']}")
for i, hit in enumerate(results['hits'], 1):
    print(f"{i}. {hit['page_content'][:60]}... (from {hit['source']})")

# Show boolean search capabilities
results = whoosh_store.query("machine AND learning", limit=3)
print(f"\nBoolean search for 'machine AND learning':")
print(f"Total hits: {results['total_hits']}")
for hit in results['hits']:
    print(f"- {hit['page_content'][:60]}...")

# Test semantic search (uses embeddings on top of keyword results)
semantic_results = whoosh_store.semantic_search("feline feet", limit=2, filters={'topic': 'cats'})
print(f"\nSemantic search results for 'feline feet':")
for doc in semantic_results:
    print(f"- {doc.page_content[:60]}... (score: {doc.metadata.get('score', 'N/A'):.3f}, category: {doc.metadata.get('topic', 'N/A')})")

whoosh_store.erase(confirm=False)
Created WhooshStore at: /tmp/tmprihpjo79
Store type: WhooshStore
Added 6 documents to WhooshStore
Keyword search results for 'neural networks':
Total hits: 1
1. Deep learning uses neural networks with multiple layers to m... (from dl_guide.txt)
Boolean search for 'machine AND learning':
Total hits: 1
- Machine learning is a subset of artificial intelligence that...
Semantic search results for 'feline feet':
- Cats have five toes on their front paws, four on their back ... (score: 0.538, category: cats)
True
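For intuition about what a sparse store does with a query like "machine AND learning", here is a minimal inverted-index sketch in plain Python. It is a toy, not Whoosh: it omits ranking, stemming, phrase queries, and query parsing, and the `tokenize` and `search_all_terms` helpers are names invented for this illustration.

```python
import re
from collections import defaultdict

docs = {
    "ml_intro.txt": "Machine learning is a subset of artificial intelligence.",
    "dl_guide.txt": "Deep learning uses neural networks with multiple layers.",
    "nlp_basics.txt": "Natural language processing enables computers to understand language.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


# Inverted index: term -> set of documents containing that term.
inverted_index = defaultdict(set)
for source, text in docs.items():
    for term in tokenize(text):
        inverted_index[term].add(source)


def search_all_terms(query):
    """Return the documents that contain every term in the query (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    postings = [inverted_index.get(term, set()) for term in terms]
    return set.intersection(*postings)


print(sorted(search_all_terms("machine learning")))
print(sorted(search_all_terms("learning")))
print(sorted(search_all_terms("feline feet")))
```

Indexing here is just tokenizing and set insertion, with no model inference, which is consistent with the fast ingestion described above. It also shows why pure keyword search finds nothing for "feline feet": no document contains either term.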
Example 3: ElasticsearchStore (Hybrid Search)
ElasticsearchStore combines both dense and sparse search capabilities in a single unified store. It can perform keyword search, semantic search, and hybrid search that combines both approaches.
Note: This example requires Elasticsearch to be running. These examples use Elasticsearch 8.15.5, but Elasticsearch 9.x is also supported.
You can download Elasticsearch and start it from the command line:
./elasticsearch-8.15.5/bin/elasticsearch
When starting Elasticsearch for the first time, make note of the password and set the following dictionary accordingly:
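The dictionary itself is not reproduced in this text. As a minimal sketch, assuming ElasticsearchStore accepts the same connection arguments that ElasticsearchSparseStore takes later in this example (persist_location, index_name, basic_auth, verify_certs), it might look like the following. The index name and password are placeholders, and whether these are the exact keys the original used is an assumption.

```python
# Hypothetical connection settings: every value below is a placeholder.
# The key names are borrowed from the ElasticsearchSparseStore example later
# in this document; that ElasticsearchStore accepts them is an assumption.
elastic_params = {
    "persist_location": "https://localhost:9200",  # URL of the Elasticsearch instance
    "index_name": "vectorstore_factory_demo",      # placeholder index name
    "basic_auth": ("elastic", "CHANGEME"),         # user and the password noted at first start
    "verify_certs": False,                         # set to True once a CA certificate is configured
}
```

With the security-disabled Docker setup described next, the URL would presumably use http:// and the credentials would be unnecessary.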
If you don’t have Elasticsearch installed, you can skip this section or set it up using Docker:
# Elasticsearch 8.x with security disabled:
docker run -d --name elasticsearch -p 9200:9200 \
  -e "discovery.type=single-node" \
  -e "xpack.security.enabled=false" \
  -e "xpack.security.http.ssl.enabled=false" \
  elasticsearch:8.15.5
# Create ElasticsearchStore using the factory
# Note: This requires Elasticsearch to be running on localhost:9200
try:
    elasticsearch_store = VectorStoreFactory.create(
        kind='elasticsearch',
        **elastic_params,
    )
    print(f"Created ElasticsearchStore")
    print(f"Store type: {type(elasticsearch_store).__name__}")

    # Add documents
    elasticsearch_store.add_documents(sample_docs)
    print(f"Added {len(sample_docs)} documents to ElasticsearchStore")

    # Test keyword search (sparse)
    search_results = elasticsearch_store.search("neural networks", limit=3)
    print(f"\nKeyword search results for 'neural networks':")
    print(f"Total hits: {search_results['total_hits']}")
    for hit in search_results['hits']:
        print(f"- {hit['page_content'][:60]}... (from {hit['source']})")

    # Test semantic search (dense)
    #semantic_results = elasticsearch_store.semantic_search("AI algorithms", limit=3)
    semantic_results = elasticsearch_store.semantic_search("artificial intelligence and machine learning", limit=3)
    print(f"\nSemantic search results for 'artificial intelligence and machine learning':")
    print(f"Total hits: {semantic_results['total_hits']}")
    for hit in semantic_results['hits']:
        # Show more precision in scores to see if they're actually different
        score = hit.get('score', 'N/A')
        score_str = f"{score:.6f}" if isinstance(score, (int, float)) else str(score)
        print(f"- {hit['page_content'][:60]}... (score: {score_str}, category: {hit.get('topic', 'N/A')})")

    # Test semantic search (dense)
    semantic_results = elasticsearch_store.semantic_search("feline feet", limit=3)
    print(f"\nSemantic search results for 'feline feet':")
    print(f"Total hits: {semantic_results['total_hits']}")
    for hit in semantic_results['hits']:
        # Show more precision in scores to see if they're actually different
        score = hit.get('score', 'N/A')
        score_str = f"{score:.6f}" if isinstance(score, (int, float)) else str(score)
        print(f"- {hit['page_content'][:60]}... (score: {score_str}, category: {hit.get('topic', 'N/A')})")

    # Test hybrid search (combines both dense and sparse)
    # Note: the query actually run is "AI algorithms"; the label printed
    # below says 'machine learning algorithms'.
    hybrid_results = elasticsearch_store.hybrid_search(
        "AI algorithms",
        limit=3,
        weights=[0.7, 0.3]  # 70% semantic, 30% keyword
    )
    print(f"\nHybrid search results for 'machine learning algorithms':")
    print(f"Total hits: {hybrid_results['total_hits']}")
    for hit in hybrid_results['hits']:
        score = hit.get('score', 'N/A')
        score_str = f"{score:.6f}" if isinstance(score, (int, float)) else str(score)
        print(f"- {hit['page_content'][:60]}... (combined score: {score_str})")

    # Clean up
    elasticsearch_store.erase(confirm=False)
    print(f"\nCleaned up ElasticsearchStore")

except Exception as e:
    print(f"ElasticsearchStore example skipped: {e}")
    print("Make sure Elasticsearch is running on localhost:9200")
Created ElasticsearchStore
Store type: ElasticsearchStore
Added 6 documents to ElasticsearchStore
Keyword search results for 'neural networks':
Total hits: 1
- Deep learning uses neural networks with multiple layers to m... (from dl_guide.txt)
Semantic search results for 'artificial intelligence and machine learning':
Total hits: 6
- Machine learning is a subset of artificial intelligence that... (score: 0.621063, category: AI)
- Deep learning uses neural networks with multiple layers to m... (score: 0.439149, category: AI)
- Vector databases store high-dimensional vectors and enable s... (score: 0.357402, category: databases)
Semantic search results for 'feline feet':
Total hits: 6
- Cats have five toes on their front paws, four on their back ... (score: 0.537507, category: cats)
- Vector databases store high-dimensional vectors and enable s... (score: 0.059024, category: databases)
- Natural language processing (NLP) enables computers to under... (score: 0.029732, category: AI)
Hybrid search results for 'machine learning algorithms':
Total hits: 3
- Vector databases store high-dimensional vectors and enable s... (combined score: 0.598861)
- Retrieval-augmented generation (RAG) combines information re... (combined score: 0.355312)
- Machine learning is a subset of artificial intelligence that... (combined score: 0.309971)
Cleaned up ElasticsearchStore
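Two common ways to combine dense and sparse results are a weighted sum of scores (in the spirit of the weights=[0.7, 0.3] argument above) and Reciprocal Rank Fusion (RRF). The sketch below shows both on toy data. It is not a description of how ElasticsearchStore fuses results internally: the function names, the document IDs, the assumption that both score sets are already scaled to [0, 1], and the conventional RRF constant k=60 are all choices made for this illustration.

```python
def weighted_fusion(dense_scores, sparse_scores, weights=(0.7, 0.3)):
    """Combine two score dicts as w_dense * dense + w_sparse * sparse."""
    w_dense, w_sparse = weights
    doc_ids = set(dense_scores) | set(sparse_scores)
    combined = {
        doc_id: w_dense * dense_scores.get(doc_id, 0.0)
        + w_sparse * sparse_scores.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


def reciprocal_rank_fusion(rankings, k=60):
    """Score each document as the sum of 1 / (k + rank) over all rankings."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy scores, assumed to be normalized to [0, 1] already.
dense_scores = {"a": 0.9, "b": 0.5, "c": 0.1}
sparse_scores = {"b": 1.0, "c": 0.2}

weighted = weighted_fusion(dense_scores, sparse_scores)
rrf = reciprocal_rank_fusion([["a", "b", "c"], ["b", "c"]])

print([doc_id for doc_id, _ in weighted])
print([doc_id for doc_id, _ in rrf])
```

The weighted sum depends on score magnitudes, so the two score scales must be comparable. RRF uses only rank positions, which sidesteps normalization but discards how large the gaps between scores were.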
Advanced Use Cases with Elasticsearch
Many applications have documents already stored in a conventional Elasticsearch index with no vector embeddings. The ElasticsearchSparseStore class in OnPrem.LLM lets you point OnPrem.LLM at any existing Elasticsearch instance for RAG and semantic similarity applications.
from onprem.ingest.stores.sparse import ElasticsearchSparseStore

store = ElasticsearchSparseStore(
    persist_location='https://localhost:9200',
    index_name='NAME_OF_YOUR_INDEX',

    # Map OnPrem.LLM field names to your existing field names
    content_field='content',      # Your content field name
    id_field='doc_id',            # Your ID field name
    source_field='filepath',      # Your source field name (optional)
    content_analyzer='english',   # Your analyzer (defaults to standard)

    # Optional: Authentication if needed
    basic_auth=('elastic', 'CHANGEME'),
    verify_certs=False,  # change to True if you provide path to ES certs as we did above

    # Optional: Enable semantic search with dynamic chunking
    chunk_for_semantic_search=True,
    chunk_size=500,
    chunk_overlap=50
)

# traditional keyword search
results = store.search('"machine learning"', filters={'extension': 'pdf'})  # assuming here you have an extension field in your index

# semantic searches (no vectors need to be indexed in your Elasticsearch instance!)
results = store.semantic_search('"machine learning"', return_chunks=False)  # set return_chunks=True for RAG applications

# best matching chunk from document
best_chunk_id = results[0].metadata['best_chunk_idx']
print(results[0].metadata['chunks'][best_chunk_id])

# OUTPUT: 'of the machine learning (ML) workflow such as data-preprocessing and human-in-the-loop
# model tuning and inspection. Following inspiration from a blog post by Rachel Thomas of
# fast.ai (Howard and Gugger, 2020), we refer to this as Augmented Machine Learning.'
Two things are notable about the example above:
Embeddings do not have to be stored in the Elasticsearch index and are computed dynamically.
Documents do not even need to be pre-chunked in your index.
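The chunk_size=500 and chunk_overlap=50 settings above suggest a sliding-window split at query time. The sketch below shows one way such dynamic chunking and best-chunk selection could work. It is an illustration under stated assumptions, not the library's code: `chunk_text` and `toy_score` are invented helpers, chunks are measured in characters, and a simple term-overlap count stands in for the embedding similarity a real semantic search would use.

```python
def chunk_text(text, chunk_size=500, chunk_overlap=50):
    """Split text into character windows that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


def toy_score(query, chunk):
    """Stand-in for embedding similarity: number of query terms in the chunk."""
    chunk_lower = chunk.lower()
    return sum(term in chunk_lower for term in query.lower().split())


# An unchunked "document", as it might sit in an existing index.
text = (
    "Filler sentence about nothing in particular. " * 20
    + "Augmented machine learning keeps a human in the loop. "
    + "More filler text follows here. " * 20
)

chunks = chunk_text(text, chunk_size=200, chunk_overlap=20)
best_chunk_idx = max(
    range(len(chunks)),
    key=lambda i: toy_score("machine learning", chunks[i]),
)
print(chunks[best_chunk_idx])
```

The overlap exists so that a phrase straddling a window boundary still appears intact in at least one chunk, as long as the phrase is no longer than the overlap.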
Integration with LLM
The VectorStoreFactory works seamlessly with OnPrem.LLM for complete RAG (Retrieval-Augmented Generation) workflows.
By default, supplying store_type="dense" to LLM will use ChromaStore and supplying store_type="sparse" will use WhooshStore. To use ElasticsearchStore, you can supply it to load_vectorstore as a custom vector store.
You can also implement and use your own custom VectorStore instances (by subclassing DenseStore, SparseStore, or DualStore) using whatever vector database backend you like.
For illustration purposes, in the example below, we explicitly tell LLM to use WhooshStore as a custom vector store. (This is equivalent to supplying store_type="sparse" to LLM, but it shows how you would use LLM with Elasticsearch or your own custom vector store.)
# Example: Using VectorStoreFactory with LLM for RAG
print("🤖 Integration with OnPrem.LLM:")

# Create a simple document corpus
documents_dir = tempfile.mkdtemp()
doc_files = {
    "ai_overview.txt": "Artificial intelligence is transforming how we work and live. Machine learning enables computers to learn from data without explicit programming.",
    "ml_types.txt": "There are three main types of machine learning: supervised learning uses labeled data, unsupervised learning finds patterns in unlabeled data, and reinforcement learning learns through trial and error.",
    "applications.txt": "AI applications include natural language processing for text analysis, computer vision for image recognition, and recommendation systems for personalized content."
}

# Write documents to files
for filename, content in doc_files.items():
    with open(os.path.join(documents_dir, filename), 'w') as f:
        f.write(content)
print(f"✓ Created {len(doc_files)} documents in {documents_dir}")

# Show how to use custom vector store with LLM
from onprem import LLM
from onprem.ingest.stores import VectorStoreFactory

# Create custom vector store
store = VectorStoreFactory.create('whoosh', persist_location='/tmp/my_search_index')

# Create LLM and use custom vector store
llm = LLM('openai/gpt-4o-mini', vectordb_path=tempfile.mkdtemp())
llm.load_vectorstore(custom_vectorstore=store)

# Ingest documents
llm.ingest(documents_dir)

print('\n\n----RAG EXAMPLE----')
# Ask questions
question = 'What are the types of machine learning?'
print(f'QUESTION: {question}')
print()
result = llm.ask(question)
print('\n\nSOURCES:')
for i, d in enumerate(result['source_documents']):
    print(f"source #{i+1}: {d.metadata['source']}")

store.erase(confirm=False)
🤖 Integration with OnPrem.LLM:
✓ Created 3 documents in /tmp/tmpjekc6pkt
Creating new vectorstore at /tmp/my_search_index
Loading documents from /tmp/tmpjekc6pkt
Loading new documents: 100%|█████████████████████| 3/3 [00:00<00:00, 175.48it/s]
Processing and chunking 3 new documents: 100%|███████████████████████████████████████████| 1/1 [00:00<00:00, 248.67it/s]
Split into 3 chunks of text (max. 500 chars each for text; max. 2000 chars for tables)
Ingestion complete! You can now query your documents using the LLM.ask or LLM.chat methods
----RAG EXAMPLE----
QUESTION: What are the types of machine learning?
The types of machine learning are:
1. Supervised learning - uses labeled data.
2. Unsupervised learning - finds patterns in unlabeled data.
3. Reinforcement learning - learns through trial and error.
SOURCES:
source #1: /tmp/tmpjekc6pkt/ml_types.txt
source #2: /tmp/tmpjekc6pkt/ai_overview.txt
True
# Clean up temporary directories
import shutil

temp_dirs = [chroma_path, whoosh_path, documents_dir]
for temp_dir in temp_dirs:
    # ignore_errors=True skips directories that were already removed
    shutil.rmtree(temp_dir, ignore_errors=True)
print("🧹 Cleaned up temporary directories")