pipelines.rag

A pipeline module for Retrieval Augmented Generation (RAG)

source

RAGPipeline

 RAGPipeline (llm, qa_template:str="Use the following pieces of context
              delimited by three backticks to answer the question at the
              end. If you don't know the answer, just say that you don't
              know, don't try to make up an
              answer.\n\n```{context}```\n\nQuestion: {question}\nHelpful
              Answer:")

Retrieval-Augmented Generation pipeline for answering questions based on source documents.
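
A RAGPipeline is normally obtained from an existing LLM via load_rag_pipeline (as in the examples further below) rather than constructed directly. A minimal sketch, assuming a vector store that has already been populated through LLM.ingest (the vectordb path is a placeholder):

from onprem import LLM

llm = LLM('openai/gpt-4o-mini', vectordb_path='/path/to/vectordb')  # placeholder path
rag_pipeline = llm.load_rag_pipeline()        # RAGPipeline wrapping this LLM
result = rag_pipeline.ask('What is ktrain?')  # returns a dict (see RAGPipeline.ask below)
print(result['answer'])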


source

RAGPipeline.ask

 RAGPipeline.ask (question:str, contexts:Optional[list]=None,
                  qa_template:Optional[str]=None,
                  filters:Optional[Dict[str,str]]=None,
                  where_document=None, folders:Optional[list]=None,
                  limit:Optional[int]=None,
                  score_threshold:Optional[float]=None, table_k:int=1,
                  table_score_threshold:float=0.35, selfask:bool=False,
                  router=None, **kwargs)

Answer a question using RAG approach.

Args:
  question: Question to answer
  contexts: Optional list of contexts. If None, retrieve from vectordb
  qa_template: Optional custom QA prompt template
  filters: Filter sources by metadata values
  where_document: Filter sources by document content
  folders: Folders to search
  limit: Number of sources to consider
  score_threshold: Minimum similarity score
  table_k: Maximum number of tables to consider
  table_score_threshold: Minimum similarity score for tables
  selfask: Use agentic Self-Ask prompting strategy
  router: Optional KVRouter instance for automatic filtering
  **kwargs: Additional arguments passed to LLM.prompt

Returns: Dictionary with keys: answer, source_documents, question

Parameter Type Default Details
question str question as string
contexts Optional None optional list of contexts to answer question. If None, retrieve from vectordb.
qa_template Optional None question-answering prompt template to use
filters Optional None filter sources by metadata values using Chroma metadata syntax (e.g., {'table':True})
where_document NoneType None filter sources by document content (syntax varies by store type)
folders Optional None folders to search (needed because LangChain does not forward "where" parameter)
limit Optional None Number of sources to consider. If None, use LLM.rag_num_source_docs.
score_threshold Optional None minimum similarity score of source. If None, use LLM.rag_score_threshold.
table_k int 1 maximum number of tables to consider when generating answer
table_score_threshold float 0.35 minimum similarity score for table to be considered in answer
selfask bool False If True, use an agentic Self-Ask prompting strategy.
router NoneType None Optional KVRouter instance for automatic filtering
kwargs VAR_KEYWORD
Returns Dict
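
As a sketch of how these parameters combine, continuing from the pipeline created above (the 'folder' metadata key is an assumption carried over from the ingestion example further below, and the objects in source_documents are assumed to expose a LangChain-style metadata dict):

result = rag_pipeline.ask(
    "How do I use ktrain for text classification?",
    filters={'folder': 'ktrain'},   # Chroma-style metadata filter
    limit=4,                        # consider at most 4 source chunks
    score_threshold=0.2,            # drop weakly matching chunks
)
print(result['answer'])
for doc in result['source_documents']:
    print(doc.metadata.get('source'))  # assumes LangChain-style Document metadata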

source

RAGPipeline.needs_followup

 RAGPipeline.needs_followup (question:str, parse=True, **kwargs)

Decide if follow-up questions are needed


source

RAGPipeline.decompose_question

 RAGPipeline.decompose_question (question:str, parse=True, **kwargs)

Decompose a question into subquestions


source

KVRouter

 KVRouter (field_name:str, field_descriptions:Dict[str,str], llm,
           router_prompt:str="Given the following query/question, select
           the most appropriate category that would contain the relevant
           information.\n\nQuery: {question}\n\nAvailable
           categories:\n{categories}\n\nSelect the best category from the
           list above, or 'none' if no category is appropriate.")

Key-Value Router for intelligent filtering based on query content.

Uses an LLM to select the most appropriate field value for filtering based on the query/question content.


source

CategorySelection

 CategorySelection (category:str)

Pydantic model for category selection response.
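
Based on the signature above, the model amounts to a single required string field; a sketch of the equivalent declaration:

from pydantic import BaseModel

class CategorySelection(BaseModel):
    category: str  # selected category name, or 'none' if no category is appropriate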


source

KVRouter.route

 KVRouter.route (question:str)

Select the best field value for the given question.

Args: question: The user's question/query

Returns: Dictionary for filters parameter, or None if no appropriate category. Example: {'folder': 'sotu'} or None
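
Because route returns a dictionary shaped for the filters parameter, the routing step can also be made explicit instead of passing router= to ask. A sketch, assuming a router and rag_pipeline set up as in the example below:

filters = router.route("What did Biden say about inflation?")
if filters:  # e.g., {'folder': 'sotu'}; None means no category matched
    result = rag_pipeline.ask("What did Biden say about inflation?", filters=filters)
else:
    result = rag_pipeline.ask("What did Biden say about inflation?")
print(result['answer'])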


source

Example: Using Query Routing with RAG

In this example, we use the KVRouter to route RAG queries to the correct set of ingested documents.

First, when we ingest documents, we assign a folder field to each document chunk. (You can also use the text_callables parameter to assign a field value based on text content.)

from onprem import LLM
from onprem.pipelines import KVRouter
import tempfile
# Setup LLM and ingest with custom metadata
llm = LLM('openai/gpt-4o-mini', vectordb_path=tempfile.mkdtemp())
def set_folder(filepath):
    if 'sotu' in filepath:
        return 'sotu'
    elif 'ktrain_paper' in filepath:
        return 'ktrain'
    else:
        return 'na'
        
llm.ingest('tests/sample_data/sotu', file_callables={'folder': set_folder})
llm.ingest('tests/sample_data/ktrain_paper', file_callables={'folder': set_folder})
Creating new vectorstore at /tmp/tmpzazbew9_/dense
Loading documents from tests/sample_data/sotu
Loading new documents: 100%|█████████████████████| 1/1 [00:00<00:00, 215.95it/s]
Processing and chunking 1 new documents: 100%|███████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 994.15it/s]
Split into 43 chunks of text (max. 1000 chars each for text; max. 2000 chars for tables)
Creating embeddings. May take some minutes...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  2.18it/s]
Ingestion complete! You can now query your documents using the LLM.ask or LLM.chat methods
Appending to existing vectorstore at /tmp/tmpzazbew9_/dense
Loading documents from tests/sample_data/ktrain_paper

Loading new documents: 100%|██████████████████████| 1/1 [00:00<00:00,  7.19it/s]
Processing and chunking 6 new documents: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1353.87it/s]
Split into 22 chunks of text (max. 1000 chars each for text; max. 2000 chars for tables)
Creating embeddings. May take some minutes...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  9.80it/s]
Ingestion complete! You can now query your documents using the LLM.ask or LLM.chat methods

Next, we set up a KVRouter that returns the best key-value pair (in this case, a specific folder value) based on the question or query. The key-value pair is then used to filter the documents appropriately when retrieving source documents for answer generation. The router can be supplied directly to the ask method so that only documents in the appropriate folder are considered when generating answers.

# Create router
router = KVRouter(
  field_name='folder',
  field_descriptions={
      'sotu': "Biden's State of the Union Address",
      'ktrain': "Research papers about ktrain library, a toolkit for machine learning, text classification, and computer vision."
  },
  llm=llm
)

# Example of router
filter_dict = router.route('Tell me about image classification')
print()
print(filter_dict)
```json
{"category":"ktrain"}
```
{'folder': 'ktrain'}
# Use router with ask() - Method 1: Direct parameter
result = llm.ask(
  "What did Biden say about the economy?",
  router=router
)
```json
{"category":"sotu"}
```
Biden discussed a new economic vision focused on investing in America, educating Americans, and growing the workforce. He criticized the trickle-down economic theory, stating it led to weaker economic growth, lower wages, and a widening wealth gap. He emphasized the importance of infrastructure investment, asserting that it would help the U.S. compete globally, particularly against China. Biden highlighted job creation through significant investments from companies like Ford and GM in electric vehicles. He acknowledged the struggles families face due to inflation and stated that his top priority is to get prices under control.
# Use router with RAG pipeline - Method 2: Direct on pipeline
rag_pipeline = llm.load_rag_pipeline()
result = rag_pipeline.ask(
  "How do I use ktrain for text classification?",
  router=router
)
```json
{"category":"ktrain"}
```
To use ktrain for text classification, you can follow these simplified steps:

1. **Load and Preprocess Data**: Use ktrain's preprocessing functions to load your text data and preprocess it. This typically involves tokenization and converting texts into a format that the model can understand.

2. **Create Model**: Define your model using ktrain's built-in functions. You can customize it according to your needs, such as choosing the architecture or adjusting hyperparameters.

3. **Train the Model**: Use ktrain's training functions to fit the model on your preprocessed data. You'll specify the number of epochs and other training parameters.

4. **Evaluate the Model**: After training, you can evaluate your model's performance using ktrain's evaluation tools, which can include generating classification reports.

5. **Make Predictions**: Finally, use the trained model to make predictions on new, unseen text data, leveraging the preprocessor instance created earlier.

This process can typically be done in just a few lines of code, making ktrain a low-code solution for text classification tasks. For detailed code examples, refer to the ktrain GitHub repository.

Example: Deciding On Follow-Up Questions

rag_pipeline.needs_followup('What is ktrain?')
No
False
rag_pipeline.needs_followup('What is the capital of France?')
No
False
rag_pipeline.needs_followup("How was Paul Grahams life different before, during, and after YC?")
yes
True
rag_pipeline.needs_followup("Compare and contrast the customer segments and geographies of Lyft and Uber that grew the fastest.")
yes
True
rag_pipeline.needs_followup("Compare and contrast Uber and Lyft.")
yes
True
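
One way to use these decisions is to gate the selfask option on them, so the agentic Self-Ask strategy is only triggered for questions that need decomposition. A sketch (the gating heuristic itself is an assumption, not a built-in behavior):

question = "Compare and contrast Uber and Lyft."
use_selfask = rag_pipeline.needs_followup(question)   # True for multi-part questions
result = rag_pipeline.ask(question, selfask=use_selfask)
print(result['answer'])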

Example: Generating Follow-Up Questions

question = "Compare and contrast the customer segments and geographies of Lyft and Uber that grew the fastest."
subquestions = rag_pipeline.decompose_question(question, parse=False)
print()
print(subquestions)
```json
{
    "items": [
        {
            "sub_question": "What are the customer segments of Lyft that grew the fastest",
        },
        {
            "sub_question": "What are the customer segments of Uber that grew the fastest",
        },
        {
            "sub_question": "Which geographies showed the fastest growth for Lyft",
        },
        {
            "sub_question": "Which geographies showed the fastest growth for Uber",
        }
    ]
}
```
['What are the customer segments of Lyft that grew the fastest', 'What are the customer segments of Uber that grew the fastest', 'Which geographies showed the fastest growth for Lyft', 'Which geographies showed the fastest growth for Uber']
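
With the default parse=True, decompose_question returns the parsed list directly, so each subquestion can be answered on its own and the partial answers combined. A sketch; the aggregation prompt here is illustrative and not part of the library:

subquestions = rag_pipeline.decompose_question(question)  # parsed list of subquestions
partial_answers = []
for sq in subquestions:
    res = rag_pipeline.ask(sq)
    partial_answers.append(f"Q: {sq}\nA: {res['answer']}")

# Combine partial answers into a final answer (illustrative aggregation step)
final_answer = llm.prompt(
    "Use the question/answer pairs below to answer the original question.\n\n"
    f"Original question: {question}\n\n" + "\n\n".join(partial_answers)
)
print(final_answer)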