Talk to Your Documents
This example of OnPrem.LLM demonstrates retrieval augmented generation or RAG.
In these examples, we will accelerate inference using a GPU. We use an NVIDIA Titan V GPU with a modest 12GB of VRAM. For GPU acceleration, make sure you installed llama-cpp-python with CUBLAS support, as described here.
After that, you just need to supply the n_gpu_layers argument to LLM for GPU-accelerated responses. We will also supply use_larger=True to LLM to use the slightly larger default model.
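If you do not have a GPU, the same example also runs on the CPU: simply omit the n_gpu_layers argument when constructing the LLM in the next step. A minimal sketch of that CPU-only variant, with the other arguments unchanged (expect slower responses):

from onprem import LLM
import tempfile

# CPU-only variant: without n_gpu_layers, no layers are offloaded to the GPU
vectordb_path = tempfile.mkdtemp()
llm = LLM(use_larger=True, vectordb_path=vectordb_path)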
Setup the LLM instance

from onprem import LLM
import tempfile

vectordb_path = tempfile.mkdtemp()

llm = LLM(use_larger=True, n_gpu_layers=35, vectordb_path=vectordb_path)
"./sample_data/") llm.ingest(
Creating new vectorstore at /tmp/tmpsmcnzlzp
Loading documents from ./sample_data/
Loaded 12 new documents from ./sample_data/
Split into 153 chunks of text (max. 500 chars each)
Creating embeddings. May take some minutes...
Ingestion complete! You can now query your documents using the LLM.ask method
Loading new documents: 100%|██████████████████████| 3/3 [00:00<00:00, 23.79it/s]
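Note that files in subfolders of the supplied folder are picked up as well (the metadata shown later in this example references ./sample_data/1/ktrain_paper.pdf). To index your own files, point llm.ingest at your own folder (the path below is just a placeholder):

llm.ingest("/path/to/your/documents")  # hypothetical path; replace with your own folder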
Asking Questions to Your Documents
result = llm.ask("What is ktrain?")
ggml_init_cublas: found 2 CUDA devices:
Device 0: NVIDIA TITAN V, compute capability 7.0
Device 1: NVIDIA TITAN V, compute capability 7.0
llama.cpp: loading model from /home/amaiya/onprem_data/wizardlm-13b-v1.2.ggmlv3.q4_0.bin
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 5120
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 40
llama_model_load_internal: n_layer = 40
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 13824
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 0.09 MB
llama_model_load_internal: using CUDA for GPU acceleration
ggml_cuda_set_main_device: using device 0 (NVIDIA TITAN V) as main device
llama_model_load_internal: mem required = 3074.87 MB (+ 1608.00 MB per state)
llama_model_load_internal: allocating batch_size x (640 kB + n_ctx x 160 B) = 480 MB VRAM for the scratch buffer
llama_model_load_internal: offloading 35 repeating layers to GPU
llama_model_load_internal: offloaded 35/43 layers to GPU
llama_model_load_internal: total VRAM used: 6437 MB
llama_new_context_with_model: kv self size = 1600.00 MB
Ktrain is a low-code library for augmented machine learning that facilitates the full machine learning workflow from data curating to model application, but allows users to make choices that best fit their unique application requirements. It is intended to democratize machine learning by enabling beginners and domain experts with minimal programming or data science experience to use ML platforms more effectively."
The answer is stored in result['answer']. The documents retrieved from the vector store and used to generate the answer above are stored in result['source_documents'].
print(result["source_documents"][0])
page_content='lection (He et al., 2019). By contrast, ktrain places less emphasis on this aspect of au-\ntomation and instead focuses on either partially or fully automating other aspects of the\nmachine learning (ML) workflow. For these reasons, ktrain is less of a traditional Au-\n2' metadata={'author': '', 'creationDate': "D:20220406214054-04'00'", 'creator': 'LaTeX with hyperref', 'file_path': './sample_data/1/ktrain_paper.pdf', 'format': 'PDF 1.4', 'keywords': '', 'modDate': "D:20220406214054-04'00'", 'page': 1, 'producer': 'dvips + GPL Ghostscript GIT PRERELEASE 9.22', 'source': './sample_data/1/ktrain_paper.pdf', 'subject': '', 'title': '', 'total_pages': 9, 'trapped': ''}
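Each retrieved document also carries metadata such as the path of the source file. For instance, you could list the files that contributed to the answer with a loop like this (a small sketch relying on the page_content/metadata structure shown above):

for doc in result["source_documents"]:
    # 'source' holds the file path recorded at ingestion time (see the metadata above)
    print(doc.metadata["source"])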
Chatting with Your Documents
Unlike LLM.ask, the LLM.chat method retains conversational memory at the expense of a larger context and an extra call to the LLM.
result = llm.chat("What is ktrain?")
ktrain is a low-code library designed to facilitate the full machine learning workflow from curating and preprocessing inputs (i.e., ground-truth-labeled training data) to training, tuning, troubleshooting, and applying models. It's intended to democratize machine learning by enabling beginners and domain experts with minimal programming or data science experience to leverage the power of ML in their work. ktrain uses automation to augment and complement human engineers rather than replacing them, thereby exploiting the strengths of both humans and machines for better results. It is inspired by low-code (and no-code) open-source ML libraries such as fastai and ludwig, with custom models and data formats being supported as well.
result = llm.chat("Does it support image classification?")
Does ktrain support image classification?
Yes, ktrain supports image classification. It can be used with any machine learning model implemented in TensorFlow Keras (tf.keras) for this purpose.
print(result["answer"])
Yes, ktrain supports image classification. It can be used with any machine learning model implemented in TensorFlow Keras (tf.keras) for this purpose.