from onprem import LLM
from onprem.pipelines import KVRouter
import tempfile

pipelines.rag
RAGPipeline
RAGPipeline (llm, qa_template:str="Use the following pieces of context delimited by three backticks to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n\n```{context}```\n\nQuestion: {question}\nHelpful Answer:")
Retrieval-Augmented Generation pipeline for answering questions based on source documents.
RAGPipeline.ask
RAGPipeline.ask (question:str, contexts:Optional[list]=None, qa_template:Optional[str]=None, filters:Optional[Dict[str,str]]=None, where_document=None, folders:Optional[list]=None, limit:Optional[int]=None, score_threshold:Optional[float]=None, table_k:int=1, table_score_threshold:float=0.35, selfask:bool=False, router=None, **kwargs)
Answer a question using RAG approach.
Args:
- question: question to answer
- contexts: optional list of contexts. If None, retrieve from vectordb
- qa_template: optional custom QA prompt template
- filters: filter sources by metadata values
- where_document: filter sources by document content
- folders: folders to search
- limit: number of sources to consider
- score_threshold: minimum similarity score
- table_k: maximum number of tables to consider
- table_score_threshold: minimum similarity score for tables
- selfask: use agentic Self-Ask prompting strategy
- router: optional KVRouter instance for automatic filtering
- **kwargs: additional arguments passed to LLM.prompt
Returns: Dictionary with keys: answer, source_documents, question
| | Type | Default | Details |
|---|---|---|---|
| question | str | | question as string |
| contexts | Optional | None | optional list of contexts to answer question. If None, retrieve from vectordb. |
| qa_template | Optional | None | question-answering prompt template to use |
| filters | Optional | None | filter sources by metadata values using Chroma metadata syntax (e.g., {'table': True}) |
| where_document | NoneType | None | filter sources by document content (syntax varies by store type) |
| folders | Optional | None | folders to search (needed because LangChain does not forward "where" parameter) |
| limit | Optional | None | number of sources to consider. If None, use LLM.rag_num_source_docs. |
| score_threshold | Optional | None | minimum similarity score of source. If None, use LLM.rag_score_threshold. |
| table_k | int | 1 | maximum number of tables to consider when generating answer |
| table_score_threshold | float | 0.35 | minimum similarity score for table to be considered in answer |
| selfask | bool | False | If True, use an agentic Self-Ask prompting strategy. |
| router | NoneType | None | optional KVRouter instance for automatic filtering |
| kwargs | VAR_KEYWORD | | |
| Returns | Dict | | |
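For reference, here is a minimal sketch of calling ask directly with a metadata filter. It assumes documents were ingested with a folder metadata field (as in the ingestion example further down this page); the model name, question, and filter value are illustrative.

```python
# Minimal sketch: answer a question using only documents whose metadata matches
# a filter. Assumes documents were ingested with a 'folder' field beforehand;
# the model name, question, and filter value below are illustrative.
from onprem import LLM

llm = LLM('openai/gpt-4o-mini')
rag_pipeline = llm.load_rag_pipeline()

result = rag_pipeline.ask(
    "What did the speaker say about inflation?",
    filters={'folder': 'sotu'},   # Chroma-style metadata filter
    limit=4,                      # number of source chunks to consider
)
print(result['answer'])
for doc in result['source_documents']:
    print(doc.metadata)           # metadata of each supporting chunk
```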
RAGPipeline.semantic_search
RAGPipeline.semantic_search (query:str, limit:int=4, score_threshold:float=0.0, filters:Optional[Dict[str,str]]=None, where_document=None, folders:Optional[list]=None, **kwargs)
Perform a semantic search of the vector DB.
The where_document parameter varies depending on the value of LLM.store_type. If LLM.store_type is 'dense', then where_document should be a dictionary in Chroma syntax (e.g., {"$contains": "Canada"}) that filters results by document content. If LLM.store_type is 'sparse', then where_document should be a boolean search string in Lucene syntax that filters the results. (A brief sketch of both styles follows the parameter table below.)
| | Type | Default | Details |
|---|---|---|---|
| query | str | | search query as string |
| limit | int | 4 | number of sources to retrieve |
| score_threshold | float | 0.0 | minimum threshold for score |
| filters | Optional | None | metadata filters |
| where_document | NoneType | None | filter search results based on the syntax of the underlying store |
| folders | Optional | None | list of folders to consider |
| kwargs | VAR_KEYWORD | | |
| Returns | List | | |
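Below is a rough sketch of both where_document styles. The queries and filter values are illustrative, and only the branch matching your LLM.store_type applies.

```python
# Rough sketch: filter retrieved sources by document content.
# Assumes an LLM has been created and documents ingested already; the queries
# and filter values below are illustrative.
rag_pipeline = llm.load_rag_pipeline()

# Dense store: where_document uses Chroma syntax (a dictionary).
docs = rag_pipeline.semantic_search(
    "trade agreements",
    where_document={"$contains": "Canada"},  # only chunks containing "Canada"
    limit=4,
)

# Sparse store: where_document is a boolean search string in Lucene syntax.
docs = rag_pipeline.semantic_search(
    "trade agreements",
    where_document='"Canada" AND "tariffs"',
    limit=4,
)

for doc in docs:
    print(doc.metadata, doc.page_content[:80])
```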
RAGPipeline.needs_followup
RAGPipeline.needs_followup (question:str, parse=True, **kwargs)
Decide if follow-up questions are needed
RAGPipeline.decompose_question
RAGPipeline.decompose_question (question:str, parse=True, **kwargs)
Decompose a question into subquestions
KVRouter
KVRouter (field_name:str, field_descriptions:Dict[str,str], llm, router_prompt:str="Given the following query/question, select the most appropriate category that would contain the relevant information.\n\nQuery: {question}\n\nAvailable categories:\n{categories}\n\nSelect the best category from the list above, or 'none' if no category is appropriate.")
Key-Value Router for intelligent filtering based on query content.
Uses an LLM to select the most appropriate field value for filtering based on the query/question content.
CategorySelection
CategorySelection (category:str)
Pydantic model for category selection response.
KVRouter.route
KVRouter.route (question:str)
Select the best field value for the given question.
Args:
- question: the user's question/query

Returns: Dictionary for the filters parameter, or None if no appropriate category. Example: {'folder': 'sotu'} or None
KVRouter.route_and_search
KVRouter.route_and_search (query:str, rag_pipeline, **search_kwargs)
Convenience method that routes and performs semantic search.
Args:
- query: the search query
- rag_pipeline: RAGPipeline instance to search with
- **search_kwargs: additional arguments passed to semantic_search

Returns: List of Document objects
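A short sketch of route_and_search is shown below. It assumes a router and RAG pipeline built as in the example that follows; the query and forwarded keyword arguments are illustrative.

```python
# Short sketch: route a query to the best folder and search it in one call.
# Assumes `router` and `rag_pipeline` were created as in the example below;
# the query and the forwarded semantic_search arguments are illustrative.
docs = router.route_and_search(
    "How do I train a text classifier with ktrain?",
    rag_pipeline,
    limit=4,              # forwarded to semantic_search
    score_threshold=0.0,  # forwarded to semantic_search
)
for doc in docs:
    print(doc.metadata.get('folder'), doc.page_content[:80])
```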
Example: Using Query Routing with RAG
In this example, we use the KVRouter to route RAG queries to the correct set of ingested documents.
First, when we ingest documents, we assign a folder field to each document chunk. (You can also use the text_callables parameter to assign a field value based on text content.)
```python
# Setup LLM and ingest with custom metadata
llm = LLM('openai/gpt-4o-mini', vectordb_path=tempfile.mkdtemp())

def set_folder(filepath):
    if 'sotu' in filepath:
        return 'sotu'
    elif 'ktrain_paper' in filepath:
        return 'ktrain'
    else:
        return 'na'

llm.ingest('tests/sample_data/sotu', file_callables={'folder': set_folder})
llm.ingest('tests/sample_data/ktrain_paper', file_callables={'folder': set_folder})
```

Creating new vectorstore at /tmp/tmpzazbew9_/dense
Loading documents from tests/sample_data/sotu
Loading new documents: 100%|█████████████████████| 1/1 [00:00<00:00, 215.95it/s]
Processing and chunking 1 new documents: 100%|███████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 994.15it/s]
Split into 43 chunks of text (max. 1000 chars each for text; max. 2000 chars for tables)
Creating embeddings. May take some minutes...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 2.18it/s]
Ingestion complete! You can now query your documents using the LLM.ask or LLM.chat methods
Appending to existing vectorstore at /tmp/tmpzazbew9_/dense
Loading documents from tests/sample_data/ktrain_paper
Loading new documents: 100%|██████████████████████| 1/1 [00:00<00:00, 7.19it/s]
Processing and chunking 6 new documents: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1353.87it/s]
Split into 22 chunks of text (max. 1000 chars each for text; max. 2000 chars for tables)
Creating embeddings. May take some minutes...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 9.80it/s]
Ingestion complete! You can now query your documents using the LLM.ask or LLM.chat methods
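As noted above, metadata can also be derived from chunk text via the text_callables parameter. The sketch below assumes each text_callables entry maps a field name to a function that receives the chunk text and returns the value to store; the field name and keyword check are purely illustrative.

```python
# Hedged sketch: assign a metadata field based on chunk text rather than file path.
# Assumption: each text_callables entry maps a field name to a function that is
# passed the chunk text and returns the value to store; 'mentions_gpu' is a
# purely illustrative field name.
def mentions_gpu(text):
    return 'yes' if 'GPU' in text else 'no'

llm.ingest(
    'tests/sample_data/ktrain_paper',
    text_callables={'mentions_gpu': mentions_gpu},
)
```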
Next, we set up a KVRouter that returns the best key-value pair (in this case, a specific folder value) based on the question or query. The key-value pair is then used to filter the documents appropriately when retrieving source documents for answer generation. The router can be supplied directly to the ask method so that only documents in the appropriate folder are considered when generating answers.
```python
# Create router
router = KVRouter(
    field_name='folder',
    field_descriptions={
        'sotu': "Biden's State of the Union Address",
        'ktrain': "Research papers about ktrain library, a toolkit for machine learning, text classification, and computer vision."
    },
    llm=llm
)
```

```python
# Example of router
filter_dict = router.route('Tell me about image classification')
print()
print(filter_dict)
```

```json
{"category":"ktrain"}
```

{'folder': 'ktrain'}
```python
# Use router with ask() - Method 1: Direct parameter
result = llm.ask(
    "What did Biden say about the economy?",
    router=router
)
```

```json
{"category":"sotu"}
```

Biden discussed a new economic vision focused on investing in America, educating Americans, and growing the workforce. He criticized the trickle-down economic theory, stating it led to weaker economic growth, lower wages, and a widening wealth gap. He emphasized the importance of infrastructure investment, asserting that it would help the U.S. compete globally, particularly against China. Biden highlighted job creation through significant investments from companies like Ford and GM in electric vehicles. He acknowledged the struggles families face due to inflation and stated that his top priority is to get prices under control.
```python
# Use router with RAG pipeline - Method 2: Direct on pipeline
rag_pipeline = llm.load_rag_pipeline()
result = rag_pipeline.ask(
    "How do I use ktrain for text classification?",
    router=router
)
```

```json
{"category":"ktrain"}
```

To use ktrain for text classification, you can follow these simplified steps:
1. **Load and Preprocess Data**: Use ktrain's preprocessing functions to load your text data and preprocess it. This typically involves tokenization and converting texts into a format that the model can understand.
2. **Create Model**: Define your model using ktrain's built-in functions. You can customize it according to your needs, such as choosing the architecture or adjusting hyperparameters.
3. **Train the Model**: Use ktrain's training functions to fit the model on your preprocessed data. You'll specify the number of epochs and other training parameters.
4. **Evaluate the Model**: After training, you can evaluate your model's performance using ktrain's evaluation tools, which can include generating classification reports.
5. **Make Predictions**: Finally, use the trained model to make predictions on new, unseen text data, leveraging the preprocessor instance created earlier.
This process can typically be done in just a few lines of code, making ktrain a low-code solution for text classification tasks. For detailed code examples, refer to the ktrain GitHub repository.
Example: Deciding On Follow-Up Questions
```python
rag_pipeline.needs_followup('What is ktrain?')
```
No
False
```python
rag_pipeline.needs_followup('What is the capital of France?')
```
No
False
```python
rag_pipeline.needs_followup("How was Paul Graham's life different before, during, and after YC?")
```
yes
True
```python
rag_pipeline.needs_followup("Compare and contrast the customer segments and geographies of Lyft and Uber that grew the fastest.")
```
yes
True
```python
rag_pipeline.needs_followup("Compare and contrast Uber and Lyft.")
```
yes
True
Example: Generating Follow-Up Questions
```python
question = "Compare and contrast the customer segments and geographies of Lyft and Uber that grew the fastest."
subquestions = rag_pipeline.decompose_question(question, parse=False)
print()
print(subquestions)
```

```json
{
  "items": [
    {
      "sub_question": "What are the customer segments of Lyft that grew the fastest"
    },
    {
      "sub_question": "What are the customer segments of Uber that grew the fastest"
    },
    {
      "sub_question": "Which geographies showed the fastest growth for Lyft"
    },
    {
      "sub_question": "Which geographies showed the fastest growth for Uber"
    }
  ]
}
```
['What are the customer segments of Lyft that grew the fastest', 'What are the customer segments of Uber that grew the fastest', 'Which geographies showed the fastest growth for Lyft', 'Which geographies showed the fastest growth for Uber']
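The subquestions can then be answered individually and the partial answers combined into a final response, which is similar in spirit to what the selfask option of ask automates. A rough sketch, assuming subquestions is the list of strings printed above and using a plain LLM.prompt call (the combining prompt is illustrative):

```python
# Rough sketch: answer each subquestion separately, then combine the answers.
# Assumes `subquestions` is the list of strings printed above; the combining
# prompt is illustrative rather than a built-in feature.
partial_answers = []
for sq in subquestions:
    res = rag_pipeline.ask(sq)
    partial_answers.append(f"Q: {sq}\nA: {res['answer']}")

combined = llm.prompt(
    "Answer the question below using the Q/A pairs that follow.\n\n"
    f"Question: {question}\n\n" + "\n\n".join(partial_answers)
)
print(combined)
```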