Talk to Your Documents

This example demonstrates OnPrem.LLM's support for retrieval-augmented generation (RAG), which lets you ask questions that are answered using the content of your own documents.

Set Up the LLM Instance

In this notebook, we will use Zephyr-7B-beta, a model that performs well on RAG tasks. When selecting a model, it is important to inspect the model's home page and identify the correct prompt format. We supply Zephyr's prompt format (documented on its Hugging Face model page) directly to the LLM constructor, along with the URL of the specific model file we want (i.e., zephyr-7b-beta.Q4_K_M.gguf). We offload layers to our GPU(s) with the n_gpu_layers parameter to speed up inference; setting it to -1 offloads all layers. For the purposes of this notebook, we also supply temperature=0 so that there is no variability in outputs; you can increase this value for more creative outputs. Finally, we choose a non-default location for our vector database.
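To make the prompt format concrete: the {prompt} placeholder in the template is replaced with your text before the string is sent to the model. Here is a plain-Python sketch of that substitution (for illustration only, not OnPrem's internal code):

# illustrate how the {prompt} placeholder is filled in (a sketch of the substitution, not OnPrem internals)
template = "<|system|>\n</s>\n<|user|>\n{prompt}</s>\n<|assistant|>"
print(template.format(prompt="What is ktrain?"))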

from onprem import LLM
import tempfile

# use a non-default, temporary location for the vector database
vectordb_path = tempfile.mkdtemp()

llm = LLM(model_url='https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/resolve/main/zephyr-7b-beta.Q4_K_M.gguf', 
          prompt_template="<|system|>\n</s>\n<|user|>\n{prompt}</s>\n<|assistant|>",  # Zephyr's prompt format
          n_gpu_layers=-1,  # offload all layers to the GPU
          temperature=0,    # deterministic outputs
          vectordb_path=vectordb_path)
llm.ingest("./sample_data/")
Creating new vectorstore at /tmp/tmpjo200ika
Loading documents from ./sample_data/
Loading new documents: 100%|██████████████████████| 3/3 [00:00<00:00,  7.80it/s]
Loaded 12 new documents from ./sample_data/
Split into 153 chunks of text (max. 500 chars each)
Creating embeddings. May take some minutes...
100%|██████████| 1/1 [00:02<00:00,  2.61s/it]
Ingestion complete! You can now query your documents using the LLM.ask or LLM.chat methods
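You can call LLM.ingest again at any time to add more documents to the same vector store; as the "Loading new documents" message above suggests, only documents not already ingested are loaded and embedded. A hypothetical follow-up call (the folder name is illustrative):

llm.ingest("./more_reports/")  # hypothetical folder; its chunks are appended to the existing vector store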

Asking Questions to Your Documents

result = llm.ask("What is ktrain?")

Ktrain is a low-code library for augmented machine learning that aims to democratize machine learning by facilitating the full machine learning workflow from curating and preprocessing inputs to training, tuning, troubleshooting, and applying models. It places less emphasis on automating feature engineering compared to other automated machine learning tools like Auto-WEKA and H2O Driverless AI, but instead focuses on partially or fully automating other aspects of the machine learning workflow. Ktrain allows users to make choices that best fit their unique application requirements while also automating certain tasks algorithmically or through setting well-performing defaults. Its goal is to augment and complement human engineers rather than attempting to entirely replace them, thereby better exploiting the strengths of both humans and machines.

The answer is stored in result['answer']. The documents retrieved from the vector store and used to generate the answer are stored in result['source_documents'].

print(result["source_documents"][0])
page_content='lection (He et al., 2019). By contrast, ktrain places less emphasis on this aspect of au-\ntomation and instead focuses on either partially or fully automating other aspects of the\nmachine learning (ML) workflow. For these reasons, ktrain is less of a traditional Au-\n2' metadata={'author': '', 'creationDate': "D:20220406214054-04'00'", 'creator': 'LaTeX with hyperref', 'file_path': '/home/amaiya/projects/ghub/onprem/nbs/sample_data/1/ktrain_paper.pdf', 'format': 'PDF 1.4', 'keywords': '', 'modDate': "D:20220406214054-04'00'", 'page': 1, 'producer': 'dvips + GPL Ghostscript GIT PRERELEASE 9.22', 'source': '/home/amaiya/projects/ghub/onprem/nbs/sample_data/1/ktrain_paper.pdf', 'subject': '', 'title': '', 'total_pages': 9, 'trapped': ''}
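Each retrieved chunk is a Document whose metadata records its origin. A minimal sketch for listing the source file and page of every retrieved chunk, based on the metadata fields shown above (the page numbers appear to be zero-based):

# list where each retrieved chunk came from
for doc in result["source_documents"]:
    print(f"{doc.metadata['source']} (page {doc.metadata['page']})")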

Chatting with Your Documents

Unlike LLM.ask, the LLM.chat method retains conversational memory, at the expense of a larger context and an extra call to the LLM (used to condense the conversation history and the follow-up question into a single standalone question).

result = llm.chat("What is ktrain?")
 Ktrain is a low-code library for augmented machine learning that facilitates the full machine learning workflow from curating and preprocessing inputs to training, tuning, troubleshooting, and applying models. It automates or semi-automates certain aspects of the machine learning process, making it well-suited for domain experts who may have less experience with machine learning and software coding.
result = llm.chat("Does it support image classification?")
 Can ktrain be used for image classification tasks in augmented machine learning? Yes, ktrain can be used for image classification tasks in augmented machine learning as it supports various types of data including images. The library provides a standard template for building supervised learning models that includes loading and preprocessing data, training and tuning models, evaluating and applying models, and visualizing results. Ktrain is designed to reduce cognitive load by providing a unified interface to different machine learning tasks and facilitating the full machine learning workflow from curating and preprocessing inputs to applying models. This makes it well-suited for domain experts who may have less experience with machine learning and software coding.
The streamed output above begins with the condensed standalone question generated by that extra LLM call (resolving "it" to ktrain), followed by the answer. Only the answer itself is stored in result['answer']:

print(result["answer"])
 Yes, ktrain can be used for image classification tasks in augmented machine learning as it supports various types of data including images. The library provides a standard template for building supervised learning models that includes loading and preprocessing data, training and tuning models, evaluating and applying models, and visualizing results. Ktrain is designed to reduce cognitive load by providing a unified interface to different machine learning tasks and facilitating the full machine learning workflow from curating and preprocessing inputs to applying models. This makes it well-suited for domain experts who may have less experience with machine learning and software coding.
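Since the conversation is retained, you can continue with additional follow-ups that depend on earlier turns. A hypothetical next turn (output omitted):

result = llm.chat("What other types of data does it support?")
print(result["answer"])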