A Built-In Web App

OnPrem.LLM includes a built-in web app to easily access and use LLMs. After installing OnPrem.LLM, you can follow these steps to prepare the web app and start it:

Step 1: Ingest some documents using the Python API:

# run at Python prompt
from onprem import LLM
llm = LLM()
llm.ingest('/your/folder/of/documents')  # replace with the folder containing your documents

Step 2: Start the Web app:

# run at command-line
onprem --port 8000

Then, enter localhost:8000 (or <domain_name>:8000 if running on a remote server) in your Web browser to access the application:


The Web app is implemented with Streamlit (pip install streamlit). If Streamlit is not already installed, the onprem command will prompt you to install it. Here is more information on the onprem command:

$:~/projects/github/onprem$ onprem --help
usage: onprem [-h] [-p PORT] [-a ADDRESS] [-v]

Start the OnPrem.LLM web app
Example: onprem --port 8000

optional arguments:
  -h, --help            show this help message and exit
  -p PORT, --port PORT  Port to use; default is 8501
  -a ADDRESS, --address ADDRESS
                        Address to bind; default is
  -v, --version         Print a version

The app requires that a file called webapp.yml exist in the onprem_data folder in the user's home directory. This file stores information used by the Web app, such as the model to use. If one does not exist, a default one will be created for you; it is also shown below:

# Default YAML configuration
llm:
  # model url (or model file name if previously downloaded)
  # if changing, be sure to update the prompt_template variable below
  model_url: https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/resolve/main/zephyr-7b-beta.Q4_K_M.gguf
  # number of layers offloaded to GPU
  n_gpu_layers: 32
  # path to vector db folder
  vectordb_path: {datadir}/vectordb
  # path to model download folder
  model_download_path: {datadir}
  # number of source documents used by LLM.ask and LLM.chat
  rag_num_source_docs: 6
  # minimum similarity score for source to be considered by LLM.ask/LLM.chat
  rag_score_threshold: 0.0
  # verbosity of Llama.cpp
  verbose: TRUE
  # additional parameters added in the "llm" YAML section will be fed directly to LlamaCpp (e.g., temperature)
  #temperature: 0.0
  # The default prompt_template is specifically for Zephyr-7B.
  # It will need to be changed if you change the model_url above.
  prompt_template: <|system|>\n</s>\n<|user|>\n{prompt}</s>\n<|assistant|>
ui:
  # title of application
  title: OnPrem.LLM
  # subtitle in "Talk to Your Documents" screen
  rag_title:
  # path to markdown file with contents that will be inserted below rag_title
  rag_text_path:
  # path to folder containing raw documents (i.e., absolute path of folder you supplied to LLM.ingest)
  rag_source_path:
  # base url (leave blank unless you're running your own separate web server to serve source documents)
  rag_base_url:
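Since the configuration file lives in the onprem_data folder under your home directory, you can locate it with a short standard-library snippet (a sketch; the variable name config_path is ours, not part of OnPrem.LLM):

```python
from pathlib import Path

# webapp.yml is stored in the onprem_data folder in the user's home directory
config_path = Path.home() / "onprem_data" / "webapp.yml"
print(config_path)
```

Opening this file in any text editor lets you inspect or change the settings described below.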

You can edit the file based on your requirements. Variables in the llm section are automatically passed to the onprem.LLM constructor, which, in turn, passes extra **kwargs to llama-cpp-python. For instance, you can add a temperature variable in the llm section to adjust the temperature of the model in the web app (e.g., values closer to 0.0 for more deterministic output and higher values for more creativity).
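For example, a pass-through parameter like temperature can be added to the llm section like this (a sketch showing only the relevant lines of webapp.yml; other settings are left unchanged):

```yaml
llm:
  # ... existing settings such as model_url, n_gpu_layers, etc. ...
  # passed through to llama-cpp-python: values near 0.0 are more deterministic
  temperature: 0.0
```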

The default model is a 7B-parameter model called Zephyr-7B. If this model is too large or slow for your system, you can choose a different model. Any model in GGUF format can be used.

Note that some models have particular prompt formats. For instance, if using the default Zephyr-7B model above, as described on the model’s home page, the prompt_template in the YAML file must be set to:

  prompt_template: <|system|>\n</s>\n<|user|>\n{prompt}</s>\n<|assistant|>

If changing models, don't forget to update the prompt_template variable with the prompt format appropriate for your chosen model.
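To make the role of prompt_template concrete, here is a minimal sketch of how a template like the one above wraps a user's prompt (the helper build_prompt is ours for illustration and is not part of OnPrem.LLM):

```python
# Zephyr-7B prompt template from webapp.yml
template = "<|system|>\n</s>\n<|user|>\n{prompt}</s>\n<|assistant|>"

def build_prompt(user_prompt: str, template: str) -> str:
    # The {prompt} placeholder is replaced with the user's text before
    # the full string is sent to the model
    return template.format(prompt=user_prompt)

print(build_prompt("What is OnPrem.LLM?", template))
```

A model with a different prompt format would need a different template string here, which is why prompt_template must be kept in sync with model_url.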

Talk To Your Documents

The Web app has two screens. The first screen (shown above) is a UI for retrieval-augmented generation, or RAG (i.e., chatting with your documents). Sources considered by the LLM when generating answers are displayed and ranked by answer-to-source similarity. Hovering over the question marks next to the sources will display the snippets of text from a document considered by the LLM when generating answers.

Hover Example:


Source Hyperlinks: On Linux and Mac systems where Python is installed in your home directory (e.g., ~/mambaforge, ~/anaconda3), displayed sources for the answer should automatically appear as hyperlinks to the original documents (e.g., PDFs, TXTs, etc.) if you populate the rag_source_path variable in webapp.yml with the absolute path of the folder supplied to LLM.ingest. You should leave rag_base_url blank in this case.

Use Prompts to Solve Problems

The second screen is a UI for general prompting and allows you to supply prompts to the LLM to solve problems.

Information Extraction Example:


Have fun!