import tempfile
llm
LLM
LLM (model_url:Optional[str]=None, model_id:Optional[str]=None, default_model:str='mistral', default_engine:str='llama.cpp', n_gpu_layers:Optional[int]=None, prompt_template:Optional[str]=None, model_download_path:Optional[str]=None, vectordb_path:Optional[str]=None, max_tokens:int=512, n_ctx:int=3900, n_batch:int=1024, stop:list=[], mute_stream:bool=False, callbacks=[], embedding_model_name:str='sentence-transformers/all-MiniLM-L6-v2', embedding_model_kwargs:dict={'device': 'cpu'}, embedding_encode_kwargs:dict={'normalize_embeddings': False}, rag_num_source_docs:int=4, rag_score_threshold:float=0.0, check_model_download:bool=True, confirm:bool=True, verbose:bool=True, **kwargs)
LLM Constructor. Extra kwargs (e.g., temperature) are fed directly to langchain.llms.LlamaCpp (if model_url is supplied) or transformers.pipeline (if model_id is supplied).
Args:
- model_url: URL to .GGUF model (or the filename if it has already been downloaded to model_download_path). To use an OpenAI-compatible REST API (e.g., vLLM, OpenLLM, Ollama), supply the URL (e.g., http://localhost:8080/v1). To use a cloud-based OpenAI model, replace the URL with openai://<name_of_model> (e.g., openai://gpt-3.5-turbo). To use Azure OpenAI, replace the URL with azure://<deployment_name>. If None, use the model indicated by default_model.
- model_id: Name of or path to Hugging Face model (e.g., in SafeTensor format). Hugging Face Transformers is used for LLM generation instead of llama-cpp-python. Mutually exclusive with model_url and default_model. The n_gpu_layers and model_download_path parameters are ignored if model_id is supplied.
- default_model: One of {'mistral', 'zephyr', 'llama'}, where mistral is Mistral-Instruct-7B-v0.2, zephyr is Zephyr-7B-beta, and llama is Llama-3.1-8B.
- default_engine: The engine used to run the default_model. One of {'llama.cpp', 'transformers'}.
- n_gpu_layers: Number of layers to be loaded into GPU memory. Default is None.
- prompt_template: Optional prompt template (must have a variable named "prompt"). Prompt templates are not typically needed when using the model_id parameter, as transformers sets it automatically.
- model_download_path: Path to download model. Default is onprem_data in user's home directory.
- vectordb_path: Path to vector database (created if it doesn't exist). Default is onprem_data/vectordb in user's home directory.
- max_tokens: The maximum number of tokens to generate.
- n_ctx: Token context window. (Llama 2 models have a maximum of 4096.)
- n_batch: Number of tokens to process in parallel.
- stop: A list of strings to stop generation when encountered (applied to all calls to LLM.prompt).
- mute_stream: Mute ChatGPT-like token stream output during generation.
- callbacks: Callbacks to supply to the model.
- embedding_model_name: Name of sentence-transformers model. Used for LLM.ingest and LLM.ask.
- embedding_model_kwargs: Arguments to embedding model (e.g., {'device': 'cpu'}).
- embedding_encode_kwargs: Arguments to encode method of embedding model (e.g., {'normalize_embeddings': False}).
- rag_num_source_docs: The maximum number of documents retrieved and fed to LLM.ask and LLM.chat to generate answers.
- rag_score_threshold: Minimum similarity score for a source to be considered by LLM.ask and LLM.chat.
- confirm: Whether or not to confirm with the user before downloading a model.
- verbose: Verbosity
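The model_url conventions described above can be summarized in a short sketch. The helper classify_model_url is hypothetical and written only for illustration; it is not part of the library, and the real constructor may resolve these cases differently.

```python
from typing import Optional


def classify_model_url(model_url: Optional[str]) -> str:
    """Label the kind of backend a model_url value points to.

    Hypothetical helper for illustration; not the library's own logic.
    """
    if model_url is None:
        return "default_model"          # fall back to default_model
    if model_url.startswith("openai://"):
        return "openai"                 # cloud-based OpenAI model
    if model_url.startswith("azure://"):
        return "azure"                  # Azure OpenAI deployment
    if model_url.lower().endswith(".gguf"):
        return "gguf"                   # GGUF URL or local filename
    if model_url.startswith(("http://", "https://")):
        return "openai_compatible_api"  # e.g., vLLM, OpenLLM, Ollama
    return "unknown"


for value in (None, "openai://gpt-3.5-turbo", "http://localhost:8080/v1"):
    print(value, "->", classify_model_url(value))
```

The GGUF check runs before the generic HTTP check because a GGUF download link is itself an HTTP(S) URL.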
AnswerConversationBufferMemory
AnswerConversationBufferMemory (*args:Any, chat_memory:langchain_core.chat_history.BaseChatMessageHistory=None, output_key:Optional[str]=None, input_key:Optional[str]=None, return_messages:bool=False, human_prefix:str='Human', ai_prefix:str='AI', memory_key:str='history')
*.. deprecated:: 0.3.1 Please see the migration guide at: https://python.langchain.com/docs/versions/migrating_memory/ It will not be removed until langchain==1.0.0.
A basic memory implementation that simply stores the conversation history.
This stores the entire conversation history in memory without any additional processing.
Note that additional processing may be required in some situations when the conversation history is too large to fit in the context window of the model.*
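To make the idea of a buffer memory concrete, here is a toy stand-in written from scratch. SimpleBufferMemory and its max_chars option are assumptions for illustration and do not reproduce the LangChain class; the sketch only shows that every turn is kept verbatim and that trimming is a separate, optional step.

```python
from typing import List, Tuple


class SimpleBufferMemory:
    """Toy buffer memory: stores every turn without any processing."""

    def __init__(self, human_prefix: str = "Human", ai_prefix: str = "AI") -> None:
        self.human_prefix = human_prefix
        self.ai_prefix = ai_prefix
        self.turns: List[Tuple[str, str]] = []

    def save(self, question: str, answer: str) -> None:
        # Append the full exchange; nothing is summarized or dropped.
        self.turns.append((question, answer))

    def history(self, max_chars: int = 0) -> str:
        lines = []
        for question, answer in self.turns:
            lines.append(f"{self.human_prefix}: {question}")
            lines.append(f"{self.ai_prefix}: {answer}")
        text = "\n".join(lines)
        # Optional extra processing: keep only the most recent characters
        # when the history would not fit in the context window.
        if max_chars > 0 and len(text) > max_chars:
            text = text[-max_chars:]
        return text


memory = SimpleBufferMemory()
memory.save("What is ktrain?", "A Python library for machine learning.")
print(memory.history())
```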
LLM.download_model
LLM.download_model (model_url:Optional[str]=None, default_model:str='mistral', model_download_path:Optional[str]=None, confirm:bool=True, ssl_verify:bool=True)
Download an LLM in GGML format supported by llama.cpp.
Args:
- model_url: URL of model. If None, then use default_model.
- default_model: One of {'mistral', 'zephyr', 'llama'}, where mistral is Mistral-Instruct-7B-v0.2, zephyr is Zephyr-7B-beta, and llama is Llama-3.1-8B.
- model_download_path: Path to download model. Default is onprem_data in user's home directory.
- confirm: Whether or not to confirm with the user before downloading.
- ssl_verify: If True, SSL certificates are verified. You can set this to False if a corporate firewall gives you problems.
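As a rough sketch of where a downloaded model might end up, the hypothetical helper below joins the download folder with the last path component of the URL. It assumes the file keeps the URL's basename and that the default folder is onprem_data under the home directory; the library's actual behavior may differ, and URLs with query strings are not handled.

```python
import os
from typing import Optional


def expected_model_path(model_url: str, model_download_path: Optional[str] = None) -> str:
    """Guess the local path of a downloaded model (illustrative only)."""
    if model_download_path is None:
        # Assumed default: onprem_data in the user's home directory.
        model_download_path = os.path.join(os.path.expanduser("~"), "onprem_data")
    # Assumption: the saved file is named after the last URL component.
    return os.path.join(model_download_path, os.path.basename(model_url))


url = "https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/resolve/main/zephyr-7b-beta.Q4_K_M.gguf"
print(expected_model_path(url))
```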
LLM.load_llm
LLM.load_llm ()
Loads the LLM from the model path.
LLM.load_ingester
LLM.load_ingester ()
Get Ingester instance. You can access the langchain_chroma.Chroma instance with load_ingester().get_db().
LLM.load_chatqa
LLM.load_chatqa ()
Prepares and loads a langchain.chains.ConversationalRetrievalChain instance.
LLM.query
LLM.query (query:str, k:int=4, score_threshold:float=0.0, filters:Optional[Dict[str,str]]=None, where_document:Optional[Dict[str,str]]=None, **kwargs)
Perform a semantic search of the vector DB
Parameter | Type | Default | Details
---|---|---|---
query | str | | query string
k | int | 4 | max number of results to return
score_threshold | float | 0.0 | minimum score for document to be considered as answer source
filters | Optional[Dict[str,str]] | None | metadata filters (e.g., page=3)
where_document | Optional[Dict[str,str]] | None | selections on document content in Chroma syntax (e.g., {"$contains": "Canada"})
kwargs | | |
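The interplay of k and score_threshold can be illustrated without a vector database. The function below is a hypothetical sketch over hand-made (document, score) pairs and assumes that a higher score means a closer match; it is not how the library queries Chroma.

```python
from typing import List, Tuple


def top_k_above_threshold(
    scored_docs: List[Tuple[str, float]],
    k: int = 4,
    score_threshold: float = 0.0,
) -> List[Tuple[str, float]]:
    """Keep results scoring at least score_threshold, best first, at most k."""
    kept = [(doc, score) for doc, score in scored_docs if score >= score_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:k]


# Made-up scores for illustration only.
results = [("chunk A", 0.9), ("chunk B", 0.2), ("chunk C", 0.6), ("chunk D", 0.75)]
print(top_k_above_threshold(results, k=2, score_threshold=0.5))
```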
LLM.prompt
LLM.prompt (prompt:Union[str,List[Dict]], output_parser:Optional[Any]=None, image_path_or_url:Optional[str]=None, prompt_template:Optional[str]=None, stop:list=[], **kwargs)
Send prompt to LLM to generate a response. Extra keyword arguments are sent directly to the model invocation.
Args:
- prompt: The prompt to supply to the model. Either a string or OpenAI-style list of dictionaries representing messages (e.g., "human", "system").
- image_path_or_url: Path or URL to an image file.
- prompt_template: Optional prompt template (must have a variable named "prompt"). This value will override any prompt_template value supplied to the LLM constructor.
- stop: A list of strings to stop generation when encountered. This value will override the stop parameter supplied to the LLM constructor.
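The following sketch shows what a prompt template with a "prompt" variable and a list of stop strings amount to. Both helpers are hypothetical: the use of str.format and the post-hoc truncation are assumptions made for illustration, whereas a real engine would typically halt generation when a stop string appears.

```python
from typing import List, Optional


def render_prompt(prompt: str, prompt_template: Optional[str] = None) -> str:
    """Insert the prompt into a template that has a {prompt} variable."""
    if prompt_template is None:
        return prompt
    return prompt_template.format(prompt=prompt)


def truncate_at_stop(text: str, stop: List[str]) -> str:
    """Cut text at the earliest occurrence of any non-empty stop string."""
    cut = len(text)
    for marker in stop:
        if not marker:
            continue  # an empty string would match at position 0
        index = text.find(marker)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


template = "<|system|>\n</s>\n<|user|>\n{prompt}</s>\n<|assistant|>"
print(render_prompt("List three cute names for a cat.", template))
print(truncate_at_stop("1. Luna\n2. Willow\n3. Marshmallow", ["\n3."]))
```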
LLM.ingest
LLM.ingest (source_directory:str, chunk_size:int=500, chunk_overlap:int=50, ignore_fn:Optional[Callable]=None, **kwargs)
Ingests all documents in source_directory into the vector database. Previously-ingested documents are ignored. Extra kwargs are fed to load_single_document.
Parameter | Type | Default | Details
---|---|---|---
source_directory | str | | path to folder containing documents
chunk_size | int | 500 | text is split to this many characters by langchain.text_splitter.RecursiveCharacterTextSplitter
chunk_overlap | int | 50 | character overlap between chunks in langchain.text_splitter.RecursiveCharacterTextSplitter
ignore_fn | Optional[Callable] | None | callable that accepts the file path and returns True for ignored files
kwargs | | |
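To give a feel for chunk_size and chunk_overlap, here is a simplified fixed-window splitter. It is a stand-in for illustration and does not replicate RecursiveCharacterTextSplitter, which prefers to split on separators such as paragraph and line breaks. An example ignore_fn is included; both function names are hypothetical.

```python
import os
from typing import List


def sliding_window_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> List[str]:
    """Split text into windows of chunk_size characters that overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def ignore_hidden_files(file_path: str) -> bool:
    """Example ignore_fn: return True for files whose name starts with a dot."""
    return os.path.basename(file_path).startswith(".")


print(sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
print(ignore_hidden_files("/data/.DS_Store"))
```

Each chunk repeats the last chunk_overlap characters of the previous one, so text that straddles a boundary still appears whole in at least one chunk when it is shorter than the overlap.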
LLM.ask
LLM.ask (question:str, qa_template='"Use the following pieces of context delimited by three backticks to answer the question at the end. If you don\'t know the answer, just say that you don\'t know, don\'t try to make up an answer.\n\n```{context}```\n\nQuestion: {question}\nHelpful Answer:', filters:Optional[Dict[str,str]]=None, where_document:Optional[Dict[str,str]]=None, **kwargs)
Answer a question based on source documents fed to the LLM.ingest method. Extra keyword arguments are sent directly to LLM.prompt. Returns a dictionary with keys: answer, source_documents, question.
Parameter | Type | Default | Details
---|---|---|---
question | str | | question as string
qa_template | str | "Use the following pieces of context delimited by three backticks to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. {context} Question: {question} Helpful Answer: | question-answering prompt template to use
filters | Optional[Dict[str,str]] | None | filter sources by metadata values (Chroma syntax)
where_document | Optional[Dict[str,str]] | None | filter sources by document content in Chroma syntax (e.g., {"$contains": "Canada"})
kwargs | | |
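The sketch below shows how a question-answering template with {context} and {question} variables could be filled in, and what the returned dictionary looks like. The template is modeled on the default in the signature above; the helper names and the choice to join chunks with blank lines are assumptions for illustration, not the library's implementation.

```python
from typing import Any, Dict, List

FENCE = "`" * 3  # three backticks, built programmatically

QA_TEMPLATE = (
    "Use the following pieces of context delimited by three backticks to answer "
    "the question at the end. If you don't know the answer, just say that you "
    "don't know, don't try to make up an answer.\n\n"
    + FENCE + "{context}" + FENCE
    + "\n\nQuestion: {question}\nHelpful Answer:"
)


def build_qa_prompt(question: str, source_chunks: List[str], qa_template: str = QA_TEMPLATE) -> str:
    """Fill the template with retrieved chunks and the user's question."""
    context = "\n\n".join(source_chunks)  # assumed way of combining chunks
    return qa_template.format(context=context, question=question)


def package_result(question: str, answer: str, source_documents: List[Any]) -> Dict[str, Any]:
    """Bundle the output under the documented keys."""
    return {"question": question, "answer": answer, "source_documents": source_documents}


chunks = ["ktrain is a Python library for machine learning."]
print(build_qa_prompt("What is ktrain?", chunks))
```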
LLM.chat
LLM.chat (question:str, **kwargs)
Chat with documents fed to the ingest method. Unlike LLM.ask, LLM.chat includes conversational memory. Extra keyword arguments are sent directly to the model invocation.
Args:
- question: a question you want to ask
Returns:
- A dictionary with keys: answer, source_documents, question, chat_history
Example Usage
We'll use the 7B-parameter Zephyr-7B-beta model here for testing purposes. The vector database is stored under ~/onprem_data by default. In this example, we will store the vector store in a temporary folder.
vectordb_path = tempfile.mkdtemp()
url = 'https://huggingface.co/TheBloke/zephyr-7B-beta-GGUF/resolve/main/zephyr-7b-beta.Q4_K_M.gguf'
llm = LLM(model_url=url,
          prompt_template="<|system|>\n</s>\n<|user|>\n{prompt}</s>\n<|assistant|>", verbose=False, confirm=False)
llama_new_context_with_model: n_ctx_per_seq (3904) < n_ctx_train (32768) -- the full capacity of the model will not be utilized
import os

# The downloaded model should exist in the data directory under the URL's file name.
assert os.path.isfile(
    os.path.join(U.get_datadir(), os.path.basename(url))
), "missing model"
prompt = """List three cute names for a cat."""
saved_output = llm.prompt(prompt)
1. Luna - this name means "moon" in Latin and is perfect for a cat with soft, moon-like fur or bright green eyes that seem to glow like the full moon.
2. Willow - named after the delicate branches of a willow tree, this name would suit a sweet, gentle kitty who loves to snuggle and purr contentedly in your lap.
3. Marshmallow - if you have a fluffy cat with a round tummy and a plump body, why not call her Marshmallow? This adorable name is sure to melt your heart as soon as you see her cute little face.
llm.ingest("./tests/sample_data/ktrain_paper/", chunk_size=500, chunk_overlap=50)
Appending to existing vectorstore at /home/amaiya/onprem_data/vectordb
Loading documents from ./sample_data/1/
Loading new documents: 100%|██████████████████████| 1/1 [00:00<00:00, 3.52it/s]
Loaded 6 new documents from ./sample_data/1/
Split into 41 chunks of text (max. 500 chars each for text; max. 2000 chars for tables)
Creating embeddings. May take some minutes...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1.18it/s]
Ingestion complete! You can now query your documents using the LLM.ask or LLM.chat methods
question = """What is ktrain?"""
result = llm.ask(question)
print("\n\nReferences:\n\n")
for i, document in enumerate(result["source_documents"]):
    print(f"\n{i+1}.> " + document.metadata["source"] + ":")
    print(document.page_content)
Ktrain is a Python library for machine learning that aims to provide a simple and unified interface for easily executing the three main steps of the machine learning process - preparing data, training models, and evaluating results - regardless of the type of data being used (such as text, images, or graphs). It is designed to help beginners and domain experts with limited programming or data science experience to build sophisticated machine learning models with minimal coding, while also serving as a useful toolbox for more experienced users. Ktrain follows a standard template for supervised learning tasks and supports custom models and data formats. It is licensed under the Apache license and can be found on GitHub at https://github.com/amaiya/ktrain. The text material mentions that ktrain was inspired by other low-code (and no-code) open-source ML libraries such as fastai and ludwig, and aims to further democratize machine learning by making it more accessible to a wider range of users.
References:
1.> /home/amaiya/projects/ghub/onprem/nbs/sample_data/1/ktrain_paper.pdf:
transferred to, and executed on new data in a production environment.
ktrain is a Python library for machine learning with the goal of presenting a simple,
unified interface to easily perform the above steps regardless of the type of data (e.g., text
vs. images vs. graphs). Moreover, each of the three steps above can be accomplished in
©2022 Arun S. Maiya.
License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are
2.> /home/amaiya/projects/ghub/onprem/nbs/sample_data/1/ktrain_paper.pdf:
custom models and data formats, as well. Inspired by other low-code (and no-code) open-
source ML libraries such as fastai (Howard and Gugger, 2020) and ludwig (Molino et al.,
2019), ktrain is intended to help further democratize machine learning by enabling begin-
ners and domain experts with minimal programming or data science experience to build
sophisticated machine learning models with minimal coding. It is also a useful toolbox for
3.> /home/amaiya/projects/ghub/onprem/nbs/sample_data/1/ktrain_paper.pdf:
ktrain.Learner instance, which is an abstraction to facilitate training.
1. https://www.fast.ai/2018/07/16/auto-ml2/
2
4.> /home/amaiya/projects/ghub/onprem/nbs/sample_data/1/ktrain_paper.pdf:
Apache license, and available on GitHub at: https://github.com/amaiya/ktrain.
2. Building Models
Supervised learning tasks in ktrain follow a standard, easy-to-use template.
STEP 1: Load and Preprocess Data. This step involves loading data from different
sources and preprocessing it in a way that is expected by the model. In the case of text,
this may involve language-specific preprocessing (e.g., tokenization). In the case of images,