A Built-In Web App

OnPrem.LLM includes a built-in web app to easily access and use LLMs. After installing OnPrem.LLM, you can follow these steps to prepare the web app and start it:

Step 1: Ingest some documents using the Python API:

# run at Python prompt
from onprem import LLM
llm = LLM()
llm.ingest("/your/folder/of/documents")  # replace with the path to your own documents

Step 2: Start the Web app:

# run at command-line
onprem --port 8000

Then, enter localhost:8000 (or <domain_name>:8000 if running on a remote server) in your Web browser to access the application:
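
If the page does not load, it can help to confirm that something is actually listening on the port before debugging the browser side. The following is a minimal sketch using only the Python standard library; the helper name is_port_open is hypothetical and not part of OnPrem.LLM, and the host and port defaults simply mirror the example above.

```python
import socket


def is_port_open(host: str = "localhost", port: int = 8000, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection raises an OSError subclass if nothing is listening
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling is_port_open("localhost", 8000) after running onprem --port 8000 should return True once the app has finished starting. This only checks that the port accepts connections, not that the app behind it is healthy.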


The Web app is implemented with streamlit, which can be installed with pip install streamlit. If it is not already installed, the onprem command will ask you to install it. Here is more information on the onprem command:

$:~/projects/github/onprem$ onprem --help
usage: onprem [-h] [-p PORT] [-a ADDRESS] [-v]

Start the OnPrem.LLM web app
Example: onprem --port 8000

optional arguments:
  -h, --help            show this help message and exit
  -p PORT, --port PORT  Port to use; default is 8501
  -a ADDRESS, --address ADDRESS
                        Address to bind; default is
  -v, --version         Print a version

The app requires that a file called webapp.yml exist in the onprem_data folder in the user’s home directory. This file stores information used by the Web app, such as the model to use. If one does not exist, a default one will be created for you. The default is shown below:

# Default YAML configuration
llm:
  # model url (or model file name if previously downloaded)
  model_url: https://huggingface.co/TheBloke/WizardLM-13B-V1.2-GGUF/resolve/main/wizardlm-13b-v1.2.Q4_K_M.gguf
  # number of layers offloaded to GPU
  n_gpu_layers: 32
  # path to vector db folder
  vectordb_path: {datadir}/vectordb
  # path to model download folder
  model_download_path: {datadir}
  # number of source documents used by LLM.ask and LLM.chat
  rag_num_source_docs: 6
  # minimum similarity score for source to be considered by LLM.ask/LLM.chat
  rag_score_threshold: 0.0
  # verbosity of Llama.cpp
  verbose: TRUE
prompt:
  # prompt_template used with LLM.prompt (e.g., for models that accept a system prompt)
  prompt_template:
ui:
  # title of application
  title: OnPrem.LLM
  # subtitle in "Talk to Your Documents" screen
  rag_title:
  # path to markdown file with contents that will be inserted below rag_title
  rag_text_path:
  # path to folder containing raw documents (i.e., absolute path of folder you supplied to LLM.ingest)
  rag_source_path:
  # base url (leave blank unless you're running your own separate web server to serve source documents)
  rag_base_url:
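
To find the file for editing, the location described above can be computed rather than typed by hand. This is a small sketch using only the standard library; the function name webapp_config_path is hypothetical, and the onprem_data/webapp.yml layout is taken from the description above rather than from the OnPrem.LLM source.

```python
from pathlib import Path
from typing import Optional


def webapp_config_path(home: Optional[Path] = None) -> Path:
    """Return the expected location of webapp.yml under the given home directory."""
    base = Path(home) if home is not None else Path.home()
    return base / "onprem_data" / "webapp.yml"


path = webapp_config_path()
# Report whether the file is present; per the text, a default is created if it is missing.
print(path, "exists" if path.exists() else "does not exist yet")
```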

You can edit the file based on your requirements. Variables in the llm section are automatically passed to the onprem.LLM constructor, which, in turn, passes extra **kwargs to llama-cpp-python. For instance, you can add a temperature variable in the llm section to adjust the temperature of the model in the web app (e.g., lower values closer to 0.0 for more deterministic output and higher values for more creativity).
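
The pass-through described above can be pictured as plain keyword-argument unpacking. The sketch below is an illustration under assumptions, not the web app's actual code: build_llm_kwargs is a hypothetical helper, and the config dictionary stands in for what a YAML parser would return for webapp.yml.

```python
def build_llm_kwargs(config: dict, **overrides) -> dict:
    """Collect the llm section of a parsed webapp.yml as keyword arguments."""
    # An empty or missing llm section yields no keyword arguments.
    kwargs = dict(config.get("llm") or {})
    # Extra settings (e.g., temperature) are merged in alongside the configured ones.
    kwargs.update(overrides)
    return kwargs


# Stand-in for the parsed YAML file (assumed shape: a top-level "llm" mapping).
config = {"llm": {"n_gpu_layers": 32, "rag_num_source_docs": 6}}

kwargs = build_llm_kwargs(config, temperature=0.1)
print(kwargs)

# With OnPrem.LLM installed, the equivalent constructor call would be roughly:
# from onprem import LLM
# llm = LLM(**kwargs)
```

Adding temperature: 0.1 directly to the llm section of webapp.yml has the same effect as the override shown here, since every key in that section ends up as a constructor argument.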

The default model in the auto-created YAML file is a 13B parameter model. If this is too large or too slow for your system, you can edit model_url above to use a 7B or 3B parameter model, which runs faster at the expense of some output quality. You can also edit model_url to use a larger model. Any model in GGUF format can be used.
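
Swapping the model is a one-line change to the file, which can also be scripted. The following sketch rewrites the model_url line as plain text so that comments and other settings are preserved; set_model_url is a hypothetical helper, and the file name my-7b-model.gguf is a placeholder rather than a real model.

```python
import re


def set_model_url(yaml_text: str, new_url: str) -> str:
    """Replace the value of the first model_url entry, keeping its indentation."""
    # Match only spaces/tabs before the key so commented-out lines are ignored.
    pattern = re.compile(r"^([ \t]*)model_url:.*$", flags=re.MULTILINE)
    updated, count = pattern.subn(
        lambda m: f"{m.group(1)}model_url: {new_url}", yaml_text, count=1
    )
    if count == 0:
        raise ValueError("no model_url entry found")
    return updated


example = "llm:\n  # model url\n  model_url: old-13b-model.gguf\n  n_gpu_layers: 32\n"
print(set_model_url(example, "my-7b-model.gguf"))  # placeholder file name
```

To apply this to the real file, read webapp.yml into a string, pass it through set_model_url, and write the result back. Restart the web app afterwards so the new setting is picked up.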

Talk To Your Documents

The Web app has two screens. The first screen (shown above) is a UI for retrieval-augmented generation, or RAG (i.e., chatting with documents). Sources considered by the LLM when generating answers are displayed and ranked by answer-to-source similarity. Hovering over the question marks in the sources will display the snippets of text from a document considered by the LLM when generating answers.

Hover Example:


Source Hyperlinks: On Linux and Mac systems where Python is installed in your home directory (e.g., ~/mambaforge, ~/anaconda3), displayed sources for the answer should automatically appear as hyperlinks to the original documents (e.g., PDFs, TXTs, etc.) if you populate the rag_source_path variable in webapp.yml with the absolute path of the folder supplied to LLM.ingest. You should leave rag_base_url blank in this case.
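
Because rag_source_path must be an absolute path, a relative path or one starting with ~ should be expanded before it is pasted into webapp.yml. This is a minimal sketch using the standard library; absolute_source_path is a hypothetical helper and ~/my_documents is a placeholder folder.

```python
from pathlib import Path


def absolute_source_path(folder: str) -> str:
    """Expand ~ and return the absolute form of the folder passed to LLM.ingest."""
    return str(Path(folder).expanduser().resolve())


# Print a line that can be copied into webapp.yml (placeholder folder name).
print("rag_source_path:", absolute_source_path("~/my_documents"))
```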

Use Prompts to Solve Problems

The second screen is a UI for general prompting and allows you to supply prompts to the LLM to solve problems.

Information Extraction Example:


Have fun!