from onprem import LLM
from onprem.pipelines import KVRouter
import tempfile

pipelines.rag
RAGPipeline
RAGPipeline (llm, qa_template:str="Use the following pieces of context delimited by three backticks to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.\n\n```{context}```\n\nQuestion: {question}\nHelpful Answer:")
Retrieval-Augmented Generation pipeline for answering questions based on source documents.
RAGPipeline.ask
RAGPipeline.ask (question:str, contexts:Optional[list]=None, qa_template:Optional[str]=None, filters:Optional[Dict[str,str]]=None, where_document=None, folders:Optional[list]=None, limit:Optional[int]=None, score_threshold:Optional[float]=None, table_k:int=1, table_score_threshold:float=0.35, selfask:bool=False, router=None, **kwargs)
Answer a question using RAG approach.
Args:
- question: question to answer
- contexts: optional list of contexts. If None, retrieve from vectordb
- qa_template: optional custom QA prompt template
- filters: filter sources by metadata values
- where_document: filter sources by document content
- folders: folders to search
- limit: number of sources to consider
- score_threshold: minimum similarity score
- table_k: maximum number of tables to consider
- table_score_threshold: minimum similarity score for tables
- selfask: use agentic Self-Ask prompting strategy
- router: optional KVRouter instance for automatic filtering
- **kwargs: additional arguments passed to LLM.prompt
Returns: Dictionary with keys: answer, source_documents, question
| | Type | Default | Details |
|---|---|---|---|
| question | str | | question as string |
| contexts | Optional | None | optional list of contexts to answer question. If None, retrieve from vectordb. |
| qa_template | Optional | None | question-answering prompt template to use |
| filters | Optional | None | filter sources by metadata values using Chroma metadata syntax (e.g., {'table': True}) |
| where_document | NoneType | None | filter sources by document content (syntax varies by store type) |
| folders | Optional | None | folders to search (needed because LangChain does not forward "where" parameter) |
| limit | Optional | None | number of sources to consider. If None, use LLM.rag_num_source_docs. |
| score_threshold | Optional | None | minimum similarity score of source. If None, use LLM.rag_score_threshold. |
| table_k | int | 1 | maximum number of tables to consider when generating answer |
| table_score_threshold | float | 0.35 | minimum similarity score for table to be considered in answer |
| selfask | bool | False | If True, use an agentic Self-Ask prompting strategy. |
| router | NoneType | None | optional KVRouter instance for automatic filtering |
| kwargs | VAR_KEYWORD | | |
| Returns | Dict | | |
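For reference, here is a minimal sketch of calling ask directly with a metadata filter. It assumes documents were ingested with a folder metadata field (as in the ingestion example further down this page); the model name, question, and filter value are illustrative.

```python
# Minimal sketch: answer a question using only documents whose metadata matches
# a filter. Assumes documents were ingested with a 'folder' field beforehand;
# the model name, question, and filter value below are illustrative.
from onprem import LLM

llm = LLM('openai/gpt-4o-mini')
rag_pipeline = llm.load_rag_pipeline()

result = rag_pipeline.ask(
    "What did the speaker say about inflation?",
    filters={'folder': 'sotu'},   # Chroma-style metadata filter
    limit=4,                      # number of source chunks to consider
)
print(result['answer'])
for doc in result['source_documents']:
    print(doc.metadata)           # metadata of each supporting chunk
```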
RAGPipeline.semantic_search
RAGPipeline.semantic_search (query:str, limit:int=4, score_threshold:float=0.0, filters:Optional[Dict[str,str]]=None, where_document=None, folders:Optional[list]=None, **kwargs)
Perform a semantic search of the vector DB.
The where_document parameter varies depending on the value of LLM.store_type. If LLM.store_type is 'dense', then where_document should be a dictionary in Chroma syntax (e.g., {"$contains": "Canada"}) that filters results by document content. If LLM.store_type is 'sparse', then where_document should be a boolean search string in Lucene syntax that filters the results. (A brief sketch of both styles follows the parameter table below.)
| | Type | Default | Details |
|---|---|---|---|
| query | str | | search query as string |
| limit | int | 4 | number of sources to retrieve |
| score_threshold | float | 0.0 | minimum threshold for score |
| filters | Optional | None | metadata filters |
| where_document | NoneType | None | filter search results based on the syntax of the underlying store |
| folders | Optional | None | list of folders to consider |
| kwargs | VAR_KEYWORD | | |
| Returns | List | | |
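Below is a rough sketch of both where_document styles. The queries and filter values are illustrative, and only the branch matching your LLM.store_type applies.

```python
# Rough sketch: filter retrieved sources by document content.
# Assumes an LLM has been created and documents ingested already; the queries
# and filter values below are illustrative.
rag_pipeline = llm.load_rag_pipeline()

# Dense store: where_document uses Chroma syntax (a dictionary).
docs = rag_pipeline.semantic_search(
    "trade agreements",
    where_document={"$contains": "Canada"},  # only chunks containing "Canada"
    limit=4,
)

# Sparse store: where_document is a boolean search string in Lucene syntax.
docs = rag_pipeline.semantic_search(
    "trade agreements",
    where_document='"Canada" AND "tariffs"',
    limit=4,
)

for doc in docs:
    print(doc.metadata, doc.page_content[:80])
```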
RAGPipeline.needs_followup
RAGPipeline.needs_followup (question:str, parse=True, **kwargs)
Decide if follow-up questions are needed
RAGPipeline.decompose_question
RAGPipeline.decompose_question (question:str, parse=True, **kwargs)
Decompose a question into subquestions
KVRouter
KVRouter (field_name:str, field_descriptions:Dict[str,str], llm, router_prompt:str="Given the following query/question, select the most appropriate category that would contain the relevant information.\n\nQuery: {question}\n\nAvailable categories:\n{categories}\n\nSelect the best category from the list above, or 'none' if no category is appropriate.")
Key-Value Router for intelligent filtering based on query content.
Uses an LLM to select the most appropriate field value for filtering based on the query/question content.
CategorySelection
CategorySelection (category:str)
Pydantic model for category selection response.
KVRouter.route
KVRouter.route (question:str)
Select the best field value for the given question.
Args:
- question: the user's question/query

Returns: Dictionary for the filters parameter, or None if no appropriate category. Example: {'folder': 'sotu'} or None
KVRouter.route_and_search
KVRouter.route_and_search (query:str, rag_pipeline, **search_kwargs)
Convenience method that routes and performs semantic search.
Args:
- query: the search query
- rag_pipeline: RAGPipeline instance to search with
- **search_kwargs: additional arguments passed to semantic_search

Returns: List of Document objects
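A short sketch of route_and_search is shown below. It assumes a router and RAG pipeline built as in the example that follows; the query and forwarded keyword arguments are illustrative.

```python
# Short sketch: route a query to the best folder and search it in one call.
# Assumes `router` and `rag_pipeline` were created as in the example below;
# the query and the forwarded semantic_search arguments are illustrative.
docs = router.route_and_search(
    "How do I train a text classifier with ktrain?",
    rag_pipeline,
    limit=4,              # forwarded to semantic_search
    score_threshold=0.0,  # forwarded to semantic_search
)
for doc in docs:
    print(doc.metadata.get('folder'), doc.page_content[:80])
```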
Example: Using Query Routing with RAG
In this example, we use the KVRouter to route RAG queries to the correct set of ingested documents.
First, when we ingest documents, we assign a folder field to each document chunk. (You can also use the text_callables parameter to assign a field value based on text content.)
```python
# Setup LLM and ingest with custom metadata
llm = LLM('openai/gpt-4o-mini', vectordb_path=tempfile.mkdtemp())

def set_folder(filepath):
    if 'sotu' in filepath:
        return 'sotu'
    elif 'ktrain_paper' in filepath:
        return 'ktrain'
    else:
        return 'na'

llm.ingest('tests/sample_data/sotu', file_callables={'folder': set_folder})
llm.ingest('tests/sample_data/ktrain_paper', file_callables={'folder': set_folder})
```

Creating new vectorstore at /tmp/tmpzazbew9_/dense
Loading documents from tests/sample_data/sotu
Loading new documents: 100%|█████████████████████| 1/1 [00:00<00:00, 215.95it/s]
Processing and chunking 1 new documents: 100%|███████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 994.15it/s]
Split into 43 chunks of text (max. 1000 chars each for text; max. 2000 chars for tables)
Creating embeddings. May take some minutes...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 2.18it/s]
Ingestion complete! You can now query your documents using the LLM.ask or LLM.chat methods
Appending to existing vectorstore at /tmp/tmpzazbew9_/dense
Loading documents from tests/sample_data/ktrain_paper
Loading new documents: 100%|██████████████████████| 1/1 [00:00<00:00, 7.19it/s]
Processing and chunking 6 new documents: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1353.87it/s]
Split into 22 chunks of text (max. 1000 chars each for text; max. 2000 chars for tables)
Creating embeddings. May take some minutes...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 9.80it/s]
Ingestion complete! You can now query your documents using the LLM.ask or LLM.chat methods
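As noted above, metadata can also be derived from chunk text via the text_callables parameter. The sketch below assumes each text_callables entry maps a field name to a function that receives the chunk text and returns the value to store; the field name and keyword check are purely illustrative.

```python
# Hedged sketch: assign a metadata field based on chunk text rather than file path.
# Assumption: each text_callables entry maps a field name to a function that is
# passed the chunk text and returns the value to store; 'mentions_gpu' is a
# purely illustrative field name.
def mentions_gpu(text):
    return 'yes' if 'GPU' in text else 'no'

llm.ingest(
    'tests/sample_data/ktrain_paper',
    text_callables={'mentions_gpu': mentions_gpu},
)
```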
Next, we set up a KVRouter that returns the best key-value pair (in this case, a specific folder value) based on the question or query. The key-value pair is then used to filter the documents appropriately when retrieving source documents for answer generation. The router can be supplied directly to the ask method so that only documents in the appropriate folder are considered when generating answers.
```python
# Create router
router = KVRouter(
    field_name='folder',
    field_descriptions={
        'sotu': "Biden's State of the Union Address",
        'ktrain': "Research papers about ktrain library, a toolkit for machine learning, text classification, and computer vision."
    },
    llm=llm
)
```

```python
# Example of router
filter_dict = router.route('Tell me about image classification')
print()
print(filter_dict)
```

```json
{"category":"ktrain"}
```

{'folder': 'ktrain'}
```python
# Use router with ask() - Method 1: Direct parameter
result = llm.ask(
    "What did Biden say about the economy?",
    router=router
)
```

```json
{"category":"sotu"}
```

Biden discussed a new economic vision focused on investing in America, educating Americans, and growing the workforce. He criticized the trickle-down economic theory, stating it led to weaker economic growth, lower wages, and a widening wealth gap. He emphasized the importance of infrastructure investment, asserting that it would help the U.S. compete globally, particularly against China. Biden highlighted job creation through significant investments from companies like Ford and GM in electric vehicles. He acknowledged the struggles families face due to inflation and stated that his top priority is to get prices under control.
```python
# Use router with RAG pipeline - Method 2: Direct on pipeline
rag_pipeline = llm.load_rag_pipeline()
result = rag_pipeline.ask(
    "How do I use ktrain for text classification?",
    router=router
)
```

```json
{"category":"ktrain"}
```

To use ktrain for text classification, you can follow these simplified steps:
1. **Load and Preprocess Data**: Use ktrain's preprocessing functions to load your text data and preprocess it. This typically involves tokenization and converting texts into a format that the model can understand.
2. **Create Model**: Define your model using ktrain's built-in functions. You can customize it according to your needs, such as choosing the architecture or adjusting hyperparameters.
3. **Train the Model**: Use ktrain's training functions to fit the model on your preprocessed data. You'll specify the number of epochs and other training parameters.
4. **Evaluate the Model**: After training, you can evaluate your model's performance using ktrain's evaluation tools, which can include generating classification reports.
5. **Make Predictions**: Finally, use the trained model to make predictions on new, unseen text data, leveraging the preprocessor instance created earlier.
This process can typically be done in just a few lines of code, making ktrain a low-code solution for text classification tasks. For detailed code examples, refer to the ktrain GitHub repository.
Example: Deciding On Follow-Up Questions
```python
rag_pipeline.needs_followup('What is ktrain?')
```
No
False
```python
rag_pipeline.needs_followup('What is the capital of France?')
```
No
False
```python
rag_pipeline.needs_followup("How was Paul Graham's life different before, during, and after YC?")
```
yes
True
```python
rag_pipeline.needs_followup("Compare and contrast the customer segments and geographies of Lyft and Uber that grew the fastest.")
```
yes
True
```python
rag_pipeline.needs_followup("Compare and contrast Uber and Lyft.")
```
yes
True
Example: Generating Follow-Up Questions
```python
question = "Compare and contrast the customer segments and geographies of Lyft and Uber that grew the fastest."
subquestions = rag_pipeline.decompose_question(question, parse=False)
print()
print(subquestions)
```

```json
{
  "items": [
    {
      "sub_question": "What are the customer segments of Lyft that grew the fastest"
    },
    {
      "sub_question": "What are the customer segments of Uber that grew the fastest"
    },
    {
      "sub_question": "Which geographies showed the fastest growth for Lyft"
    },
    {
      "sub_question": "Which geographies showed the fastest growth for Uber"
    }
  ]
}
```
['What are the customer segments of Lyft that grew the fastest', 'What are the customer segments of Uber that grew the fastest', 'Which geographies showed the fastest growth for Lyft', 'Which geographies showed the fastest growth for Uber']
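The subquestions can then be answered individually and the partial answers combined into a final response, which is similar in spirit to what the selfask option of ask automates. A rough sketch, assuming subquestions is the list of strings printed above and using a plain LLM.prompt call (the combining prompt is illustrative):

```python
# Rough sketch: answer each subquestion separately, then combine the answers.
# Assumes `subquestions` is the list of strings printed above; the combining
# prompt is illustrative rather than a built-in feature.
partial_answers = []
for sq in subquestions:
    res = rag_pipeline.ask(sq)
    partial_answers.append(f"Q: {sq}\nA: {res['answer']}")

combined = llm.prompt(
    "Answer the question below using the Q/A pairs that follow.\n\n"
    f"Question: {question}\n\n" + "\n\n".join(partial_answers)
)
print(combined)
```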