llm

Core functionality for LLMs

LLM

 LLM (model_url:Optional[str]=None, model_id:Optional[str]=None,
      default_model:str='zephyr', default_engine:str='llama.cpp',
      prompt_template:Optional[str]=None,
      model_download_path:Optional[str]=None,
      vectordb_path:Optional[str]=None, store_type:str='dense',
      vectorstore=None, max_tokens:int=512,
      n_gpu_layers:Optional[int]=None, n_ctx:int=3900, n_batch:int=1024,
      stop:list=[], mute_stream:bool=False, callbacks=[],
      embedding_model_name:str='sentence-transformers/all-MiniLM-L6-v2',
      embedding_model_kwargs:Optional[dict]=None,
      embedding_encode_kwargs:dict={'normalize_embeddings': False},
      rag_num_source_docs:int=4, rag_score_threshold:float=0.0,
      check_model_download:bool=True, confirm:bool=True,
      verbose:bool=True, **kwargs)

*LLM constructor. Extra kwargs (e.g., temperature) are fed directly to the LangChain LLM or the Hugging Face Transformers pipeline.

Args:

model_url: URL to a .GGUF model (or the filename if it has already been downloaded to model_download_path). To use an OpenAI-compatible REST API (e.g., vLLM, OpenLLM, Ollama), supply the URL (e.g., http://localhost:8080/v1). To use a cloud-based OpenAI model, replace the URL with openai://<name_of_model> (e.g., openai://gpt-3.5-turbo). To use Azure OpenAI, replace the URL with azure://<deployment_name>. If None, use the model indicated by default_model.
model_id: Name of or path to a Hugging Face model (e.g., in SafeTensor format). Hugging Face Transformers is used for LLM generation instead of llama-cpp-python. Mutually exclusive with model_url and default_model. The n_gpu_layers and model_download_path parameters are ignored if model_id is supplied.
default_model: One of {‘mistral’, ‘zephyr’, ‘llama’}, where mistral is Mistral-Instruct-7B-v0.2, zephyr is Zephyr-7B-beta, and llama is Llama-3.1-8B.
default_engine: The engine used to run the default_model. One of {‘llama.cpp’, ‘transformers’}.
prompt_template: Optional prompt template (must have a variable named “prompt”). Prompt templates are not typically needed when using the model_id parameter, as transformers sets the template automatically.
model_download_path: Path to download model. Default is onprem_data in user’s home directory.
vectordb_path: Path to vector database (created if it doesn’t exist). Default is onprem_data/vectordb in user’s home directory.
store_type: One of ‘dense’ for the default dense vector database (i.e., Chroma) or ‘sparse’ for the sparse vector store (i.e., a keyword search engine). Documents stored in sparse vector databases are converted to dense vectors at inference time when used with LLM.ask.
vectorstore: an onprem.ingest.stores.base.VectorStore instance.
max_tokens: The maximum number of tokens to generate.
n_gpu_layers: Number of layers to be loaded into GPU memory. Default is None. Only used for the llama-cpp backend.
n_ctx: Token context window. Only used for llama-cpp backend. For Ollama backend, explicitly supply num_ctx instead which is passed to LiteLLM. Hugging Face Transformers backend (i.e., when using the model_id parameter) sets context window automatically.
n_batch: Number of tokens to process in parallel. Only used for llama-cpp backend.
stop: a list of strings to stop generation when encountered (applied to all calls to LLM.prompt)
mute_stream: Mute ChatGPT-like token stream output during generation
callbacks: Callbacks to supply to the model
embedding_model_name: name of sentence-transformers model. Used for LLM.ingest and LLM.ask.
embedding_model_kwargs: arguments to the embedding model (e.g., {'device':'cpu'}). If None, uses GPU if available.
embedding_encode_kwargs: arguments to encode method of embedding model (e.g., {'normalize_embeddings': False}).
rag_num_source_docs: The maximum number of documents retrieved and fed to LLM.ask and LLM.chat to generate answers.
rag_score_threshold: Minimum similarity score for source to be considered by LLM.ask and LLM.chat.
confirm: whether or not to confirm with user before downloading a model
verbose: Verbosity*
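The model_url and model_id conventions above decide which backend runs the model. As a minimal sketch, the documented rules can be restated as a small lookup. describe_backend is a hypothetical helper written for illustration, not part of onprem, and the real dispatch inside LLM may differ (for example, in how bare filenames are resolved).

```python
from typing import Optional


def describe_backend(model_url: Optional[str] = None,
                     model_id: Optional[str] = None,
                     default_model: str = "zephyr",
                     default_engine: str = "llama.cpp") -> str:
    """Hypothetical helper (not part of onprem): map the documented
    model_url/model_id conventions to the backend they select."""
    if model_id is not None:
        if model_url is not None:
            raise ValueError("model_id and model_url are mutually exclusive")
        return f"Hugging Face Transformers: {model_id}"
    if model_url is None:
        return f"default model '{default_model}' via {default_engine}"
    if model_url.startswith("openai://"):
        return "OpenAI cloud model: " + model_url[len("openai://"):]
    if model_url.startswith("azure://"):
        return "Azure OpenAI deployment: " + model_url[len("azure://"):]
    # Check for a GGUF file before generic URLs: GGUF downloads are also http(s) URLs.
    if model_url.lower().endswith(".gguf"):
        return "llama.cpp (GGUF model file)"
    if model_url.startswith(("http://", "https://")):
        return "OpenAI-compatible REST API"
    raise ValueError(f"unrecognized model_url: {model_url}")


examples = [
    "https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/resolve/main/zephyr-7b-beta.Q4_K_M.gguf",
    "http://localhost:8080/v1",
    "openai://gpt-3.5-turbo",
    "azure://my-deployment",  # hypothetical deployment name
]
for url in examples:
    print(f"{url} -> {describe_backend(model_url=url)}")
print(describe_backend())
```

The GGUF check comes before the generic URL check because a Hugging Face download link is itself an https URL.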


LLM.download_model

 LLM.download_model (model_url:Optional[str]=None,
                     default_model:str='zephyr',
                     model_download_path:Optional[str]=None,
                     confirm:bool=True, ssl_verify:bool=True)

*Download an LLM in GGUF format supported by llama.cpp.

Args:

model_url: URL of model. If None, then use default_model.
default_model: One of {‘mistral’, ‘zephyr’, ‘llama’}, where mistral is Mistral-Instruct-7B-v0.2, zephyr is Zephyr-7B-beta, and llama is Llama-3.1-8B.
model_download_path: Path to download model. Default is onprem_data in user’s home directory.
confirm: whether or not to confirm with user before downloading
ssl_verify: If True, SSL certificates are verified. Set to False if a corporate firewall causes SSL certificate errors.*
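By default, a downloaded model is saved under onprem_data in the user's home directory, and the Example Usage section on this page checks for the file under its URL's file name. The sketch below computes that expected location so you can tell whether a download is still needed. expected_model_path is a hypothetical helper, not an onprem function, and it assumes the file keeps its URL basename.

```python
import os
from typing import Optional


def expected_model_path(model_url: str, model_download_path: Optional[str] = None) -> str:
    """Hypothetical helper (not an onprem function): where a downloaded model
    is expected to live, assuming it is saved under the URL's file name."""
    # Documented default: onprem_data in the user's home directory.
    folder = model_download_path or os.path.join(os.path.expanduser("~"), "onprem_data")
    return os.path.join(folder, os.path.basename(model_url))


url = "https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/resolve/main/zephyr-7b-beta.Q4_K_M.gguf"
path = expected_model_path(url)
print(path)
print("already downloaded" if os.path.isfile(path) else "needs download")
```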


LLM.load_llm

 LLM.load_llm ()

Loads the LLM from the model path.


LLM.load_vectorstore

 LLM.load_vectorstore (custom_vectorstore=None, reset=False)

Get VectorStore instance. Use the vectorstore’s methods directly instead of accessing the underlying database. Supply custom_vectorstore to use your own VectorStore instance (i.e., subclass DenseStore or SparseStore). Supply reset=True to reload the default vectorstore.


LLM.load_chatbot

 LLM.load_chatbot ()

Prepares and loads a langchain.chains.ConversationChain instance


LLM.query

 LLM.query (*args, **kwargs)

Perform a semantic search of vectorstore.
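Conceptually, a semantic search over a dense store ranks stored chunks by the similarity of their embeddings to the query embedding. The sketch below illustrates the idea with cosine similarity on toy 3-dimensional vectors. It is not onprem's implementation: the default dense store (Chroma) uses its own index, the sparse store is a keyword search engine, and the vectors here are made up rather than produced by the embedding model. When embeddings are normalized to unit length (normalize_embeddings=True), the plain dot product equals the cosine similarity.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def semantic_search(query_vec: List[float],
                    doc_vecs: Dict[str, List[float]],
                    k: int = 4) -> List[Tuple[str, float]]:
    """Rank documents by similarity to the query and return the top k."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; real ones come from the embedding model.
docs = {
    "ktrain_paper": [0.9, 0.1, 0.0],
    "cat_names": [0.0, 0.2, 0.9],
    "ml_overview": [0.7, 0.3, 0.1],
}
query = [1.0, 0.0, 0.0]
for doc_id, score in semantic_search(query, docs, k=2):
    print(f"{doc_id}: {score:.3f}")
```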


LLM.prompt

 LLM.prompt (prompt:Union[str,List[Dict]],
             output_parser:Optional[Any]=None,
             image_path_or_url:Optional[str]=None,
             prompt_template:Optional[str]=None, stop:list=[],
             truncate_prompt:bool=False, truncate_strategy:str='start',
             **kwargs)

*Send prompt to LLM to generate a response. Extra keyword arguments are sent directly to the model invocation.

Args:

prompt: The prompt to supply to the model. Either a string or OpenAI-style list of dictionaries representing messages (e.g., “human”, “system”).
image_path_or_url: Path or URL to an image file
prompt_template: Optional prompt template (must have a variable named “prompt”). This value will override any prompt_template value supplied to LLM constructor.
stop: a list of strings to stop generation when encountered. This value will override the stop parameter supplied to LLM constructor.
truncate_prompt: Truncate long string prompts. Only applies to llama-cpp-python and transformers LLMs.
truncate_strategy: Either ‘first’ (keep latest) or ‘last’ (keep earliest). Ignored if truncate_prompt=False.*
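A prompt template is plain text containing a {prompt} variable that the user's prompt replaces. The sketch below fills the Zephyr-style template used in the Example Usage section on this page and also shows the OpenAI-style message list that LLM.prompt accepts as an alternative to a string. render_prompt is a hypothetical helper, not onprem's internal substitution code, and the message roles follow the standard OpenAI chat format.

```python
from typing import Dict, List

# Zephyr-style template used in the Example Usage section of this page.
ZEPHYR_TEMPLATE = "<|system|>\n</s>\n<|user|>\n{prompt}</s>\n<|assistant|>"


def render_prompt(prompt: str, prompt_template: str) -> str:
    """Hypothetical helper: fill the {prompt} variable of a prompt template."""
    if "{prompt}" not in prompt_template:
        raise ValueError('prompt_template must have a variable named "prompt"')
    # str.replace (rather than str.format) leaves any other braces untouched.
    return prompt_template.replace("{prompt}", prompt)


print(render_prompt("List three cute names for a cat.", ZEPHYR_TEMPLATE))

# An OpenAI-style list of messages, the other prompt form LLM.prompt accepts.
messages: List[Dict[str, str]] = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "List three cute names for a cat."},
]
# With a loaded model (not run here): llm.prompt(messages)
```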


LLM.pydantic_prompt

 LLM.pydantic_prompt (prompt:str, pydantic_model=None,
                      attempt_fix:bool=False, fix_llm=None, stop:list=[],
                      **kwargs)

*Accepts a prompt string and a Pydantic model describing the desired output. Output will be a Pydantic object in the requested format.

Args:

prompt: The prompt to supply to the model. Either a string or OpenAI-style list of dictionaries representing messages (e.g., “human”, “system”).
pydantic_model: A Pydantic model (subclass of pydantic.BaseModel) that describes the desired output format. Output will be an instance of this model. If pydantic_model=None, then output is a string.
attempt_fix: Use an LLM call to attempt to correct malformed or incomplete outputs
fix_llm: LLM to use for fixing (e.g., langchain_openai.ChatOpenAI()). If None, then existing LLM.llm used.
stop: a list of strings to stop generation when encountered. This value will override the stop parameter supplied to LLM constructor.*
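A minimal sketch of a pydantic_model argument, assuming pydantic is installed. CatNames is a hypothetical schema; the llm.pydantic_prompt call is left as a comment because it needs a loaded model. The runnable part only shows that a well-formed JSON reply validates into a CatNames object; how onprem instructs the model and parses its reply is not reproduced here.

```python
import json
from typing import List

from pydantic import BaseModel, Field


class CatNames(BaseModel):
    """Hypothetical output schema: the structure we want the LLM to return."""
    names: List[str] = Field(description="Three cute names for a cat")
    reason: str = Field(description="One sentence explaining the choices")


# With a loaded model (not run here), the call would be:
# result = llm.pydantic_prompt("List three cute names for a cat.", pydantic_model=CatNames)

# Offline stand-in: validate JSON of the kind the model is asked to produce.
raw = '{"names": ["Luna", "Willow", "Marshmallow"], "reason": "Soft, gentle names."}'
parsed = CatNames(**json.loads(raw))
print(parsed.names)
```

Keyword construction from a dict works in both Pydantic v1 and v2, so the sketch avoids version-specific parsing helpers.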


LLM.ingest

 LLM.ingest (source_directory:str, chunk_size:int=500,
             chunk_overlap:int=50, ignore_fn:Optional[Callable]=None,
             batch_size:int=1000, **kwargs)

Ingests all documents in source_directory into the vector database. Previously ingested documents are ignored. Extra kwargs are fed to load_single_document, load_documents, and/or chunk_documents (https://amaiya.github.io/onprem/ingest.base.html#chunk_documents).

Parameter	Type	Default	Details
source_directory	str		path to folder containing documents
chunk_size	int	500	text is split to this many characters by `langchain.text_splitter.RecursiveCharacterTextSplitter`
chunk_overlap	int	50	character overlap between chunks in `langchain.text_splitter.RecursiveCharacterTextSplitter`
ignore_fn	Optional	None	callable that accepts the file path and returns True for ignored files
batch_size	int	1000	batch size used when processing documents (e.g., creating embeddings)
kwargs	VAR_KEYWORD
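To see what chunk_size and chunk_overlap mean in characters, the sketch below uses a simplified fixed-width splitter: each chunk is at most chunk_size characters and repeats the last chunk_overlap characters of the previous one. This is a deliberate simplification, not LangChain's RecursiveCharacterTextSplitter, which also tries to break on separators such as paragraphs and words. The ignore_fn shown is one hypothetical example of the documented callable, and ./my_docs/ is a placeholder path.

```python
import os
from typing import List


def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> List[str]:
    """Simplified fixed-width splitter illustrating chunk_size/chunk_overlap.
    Unlike RecursiveCharacterTextSplitter, it ignores paragraph and word boundaries."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


def ignore_fn(path: str) -> bool:
    """Example ignore_fn: return True for hidden files and *.tmp files."""
    name = os.path.basename(path)
    return name.startswith(".") or name.endswith(".tmp")


text = "".join(str(i % 10) for i in range(1200))
chunks = chunk_text(text, chunk_size=500, chunk_overlap=50)
print([len(c) for c in chunks])           # length of each chunk
print(chunks[0][-50:] == chunks[1][:50])  # consecutive chunks share 50 characters

# With a loaded model (not run here):
# llm.ingest("./my_docs/", chunk_size=500, chunk_overlap=50, ignore_fn=ignore_fn)
```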


LLM.ask

 LLM.ask (question:str, selfask:bool=False, qa_template='"Use the
          following pieces of context delimited by three backticks to
          answer the question at the end. If you don\'t know the answer,
          just say that you don\'t know, don\'t try to make up an
          answer.\n\n```{context}```\n\nQuestion: {question}\nHelpful
          Answer:', filters:Optional[Dict[str,str]]=None,
          where_document=None, folders:Optional[list]=None,
          limit:Optional[int]=None, score_threshold:Optional[float]=None,
          table_k:int=1, table_score_threshold:float=0.35, **kwargs)

Answer a question based on source documents fed to the LLM.ingest method. Extra keyword arguments are sent directly to LLM.prompt. Returns a dictionary with keys: answer, source_documents, question

Parameter	Type	Default	Details
question	str		question as a string
selfask	bool	False	If True, use an agentic Self-Ask prompting strategy.
qa_template	str	“Use the following pieces of context delimited by three backticks to answer the question at the end. If you don’t know the answer, just say that you don’t know, don’t try to make up an answer. `{context}` Question: {question} Helpful Answer:	question-answering prompt template to use
filters	Optional	None	filter sources by metadata values using Chroma metadata syntax (e.g., {‘table’:True})
where_document	NoneType	None	filter sources by document content (syntax varies by store type)
folders	Optional	None	folders to search (needed because LangChain does not forward the “where” parameter)
limit	Optional	None	Number of sources to consider. If None, use `LLM.rag_num_source_docs`.
score_threshold	Optional	None	minimum similarity score of source. If None, use `LLM.rag_score_threshold`.
table_k	int	1	maximum number of tables to consider when generating answer
table_score_threshold	float	0.35	minimum similarity score for table to be considered in answer
kwargs	VAR_KEYWORD
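LLM.ask combines retrieval with generation: up to limit sources (default LLM.rag_num_source_docs) whose similarity meets score_threshold are placed into qa_template as {context}, alongside the {question}. The sketch below mimics that flow on hypothetical (text, score) pairs. select_sources and build_qa_prompt are illustrative helpers, not onprem functions; whether the threshold comparison is inclusive, how sources are joined, and how tables (table_k) are handled are assumptions here.

```python
from typing import List, Tuple

FENCE = "`" * 3  # three backticks, built with * 3 so the literal does not clash with markdown fences

# Adapted from the default qa_template in the signature above.
QA_TEMPLATE = (
    "Use the following pieces of context delimited by three backticks to answer "
    "the question at the end. If you don't know the answer, just say that you "
    "don't know, don't try to make up an answer.\n\n"
    + FENCE + "{context}" + FENCE + "\n\nQuestion: {question}\nHelpful Answer:"
)


def select_sources(scored_docs: List[Tuple[str, float]],
                   limit: int = 4,
                   score_threshold: float = 0.0) -> List[Tuple[str, float]]:
    """Keep sources scoring at least score_threshold, best first, at most limit."""
    kept = [(text, score) for text, score in scored_docs if score >= score_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:limit]


def build_qa_prompt(question: str, scored_docs: List[Tuple[str, float]],
                    limit: int = 4, score_threshold: float = 0.0) -> str:
    """Fill the template with the selected sources as context."""
    sources = select_sources(scored_docs, limit, score_threshold)
    context = "\n\n".join(text for text, _ in sources)
    return QA_TEMPLATE.format(context=context, question=question)


# Hypothetical retrieval results: (chunk text, similarity score).
retrieved = [
    ("ktrain is a Python library for machine learning.", 0.82),
    ("Willow suits a gentle cat.", 0.05),
    ("ktrain is licensed under Apache.", 0.61),
]
print(build_qa_prompt("What is ktrain?", retrieved, limit=4, score_threshold=0.3))
```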


LLM.chat

 LLM.chat (prompt:str, prompt_template=None, **kwargs)

*Chat with LLM.

Args:

prompt: a question you want to ask*

Example Usage

We’ll use a quantized 7B-parameter model (Zephyr-7B-beta) here for testing purposes. The vector database is stored under ~/onprem_data by default; in this example, we store it in a temporary folder instead.

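import os
import tempfile

from onprem import LLM
from onprem import utils as U

vectordb_path = tempfile.mkdtemp()

url = 'https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/resolve/main/zephyr-7b-beta.Q4_K_M.gguf'
llm = LLM(model_url=url,
          prompt_template="<|system|>\n</s>\n<|user|>\n{prompt}</s>\n<|assistant|>",
          vectordb_path=vectordb_path,  # store the vector database in the temporary folder
          verbose=False, confirm=False)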

llama_new_context_with_model: n_ctx_per_seq (3904) < n_ctx_train (32768) -- the full capacity of the model will not be utilized

assert os.path.isfile(
    os.path.join(U.get_datadir(), os.path.basename(url))
), "missing model"

prompt = """List three cute names for a cat."""

saved_output = llm.prompt(prompt)


1. Luna - this name means "moon" in Latin and is perfect for a cat with soft, moon-like fur or bright green eyes that seem to glow like the full moon.

2. Willow - named after the delicate branches of a willow tree, this name would suit a sweet, gentle kitty who loves to snuggle and purr contentedly in your lap.

3. Marshmallow - if you have a fluffy cat with a round tummy and a plump body, why not call her Marshmallow? This adorable name is sure to melt your heart as soon as you see her cute little face.

llm.ingest("./tests/sample_data/ktrain_paper/", chunk_size=500, chunk_overlap=50)

Appending to existing vectorstore at /home/amaiya/onprem_data/vectordb
Loading documents from ./sample_data/1/

Loading new documents: 100%|██████████████████████| 1/1 [00:00<00:00,  3.52it/s]

Loaded 6 new documents from ./sample_data/1/
Split into 41 chunks of text (max. 500 chars each for text; max. 2000 chars for tables)
Creating embeddings. May take some minutes...

100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.18it/s]

Ingestion complete! You can now query your documents using the LLM.ask or LLM.chat methods

question = """What is ktrain?"""
result = llm.ask(question)
print("\n\nReferences:\n\n")
for i, document in enumerate(result["source_documents"]):
    print(f"\n{i+1}.> " + document.metadata["source"] + ":")
    print(document.page_content)


Ktrain is a Python library for machine learning that aims to provide a simple and unified interface for easily executing the three main steps of the machine learning process - preparing data, training models, and evaluating results - regardless of the type of data being used (such as text, images, or graphs). It is designed to help beginners and domain experts with limited programming or data science experience to build sophisticated machine learning models with minimal coding, while also serving as a useful toolbox for more experienced users. Ktrain follows a standard template for supervised learning tasks and supports custom models and data formats. It is licensed under the Apache license and can be found on GitHub at https://github.com/amaiya/ktrain. The text material mentions that ktrain was inspired by other low-code (and no-code) open-source ML libraries such as fastai and ludwig, and aims to further democratize machine learning by making it more accessible to a wider range of users.

References:



1.> /home/amaiya/projects/ghub/onprem/nbs/sample_data/1/ktrain_paper.pdf:
transferred to, and executed on new data in a production environment.
ktrain is a Python library for machine learning with the goal of presenting a simple,
uniﬁed interface to easily perform the above steps regardless of the type of data (e.g., text
vs. images vs. graphs). Moreover, each of the three steps above can be accomplished in
©2022 Arun S. Maiya.
License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are

2.> /home/amaiya/projects/ghub/onprem/nbs/sample_data/1/ktrain_paper.pdf:
custom models and data formats, as well. Inspired by other low-code (and no-code) open-
source ML libraries such as fastai (Howard and Gugger, 2020) and ludwig (Molino et al.,
2019), ktrain is intended to help further democratize machine learning by enabling begin-
ners and domain experts with minimal programming or data science experience to build
sophisticated machine learning models with minimal coding. It is also a useful toolbox for

3.> /home/amaiya/projects/ghub/onprem/nbs/sample_data/1/ktrain_paper.pdf:
ktrain.Learner instance, which is an abstraction to facilitate training.
1. https://www.fast.ai/2018/07/16/auto-ml2/
2

4.> /home/amaiya/projects/ghub/onprem/nbs/sample_data/1/ktrain_paper.pdf:
Apache license, and available on GitHub at: https://github.com/amaiya/ktrain.
2. Building Models
Supervised learning tasks in ktrain follow a standard, easy-to-use template.
STEP 1: Load and Preprocess Data. This step involves loading data from diﬀerent
sources and preprocessing it in a way that is expected by the model. In the case of text,
this may involve language-speciﬁc preprocessing (e.g., tokenization). In the case of images,