from onprem import LLM
llm helpers
parse_code_markdown
parse_code_markdown (text:str, only_last:bool)
Parse embedded code out of a markdown string
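To illustrate what pulling embedded code out of a markdown string involves, here is a minimal standalone sketch that uses only the standard library. The helper name `extract_code_blocks` and its regular expression are assumptions made for this illustration. They are not the implementation of `parse_code_markdown`, and the treatment of `only_last` is only a guess at its intent.

```python
import re

# Built programmatically so this example nests cleanly inside markdown.
FENCE = "`" * 3

# Hypothetical pattern: opening fence, optional language tag, body, closing fence.
CODE_BLOCK_RE = re.compile(FENCE + r"[^\n]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code_blocks(text: str, only_last: bool = False):
    """Return the bodies of fenced code blocks found in `text`.

    With only_last=True, return just the final block as a string
    (or an empty string when no block is present).
    """
    blocks = [body.strip() for body in CODE_BLOCK_RE.findall(text)]
    if only_last:
        return blocks[-1] if blocks else ""
    return blocks


# A made-up LLM reply containing two fenced blocks.
reply = (
    "First attempt:\n"
    + FENCE + "python\nprint('draft')\n" + FENCE + "\n"
    + "Final version:\n"
    + FENCE + "python\nprint('done')\n" + FENCE + "\n"
)
print(extract_code_blocks(reply, only_last=True))
```

Returning only the last block is useful when a model revises its answer within a single reply, since the final block is usually the one intended for use.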
parse_json_markdown
parse_json_markdown (text:str)
Parse JSON embedded in markdown into a dictionary
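The idea of parsing JSON embedded in markdown can be sketched with the standard library alone. The function name `json_from_markdown`, the regular expression, and the fallback to parsing the whole string are assumptions for this illustration, not the behavior of `parse_json_markdown` itself.

```python
import json
import re

# Built programmatically so this example nests cleanly inside markdown.
FENCE = "`" * 3

# Hypothetical pattern: a fenced block, optionally tagged "json".
JSON_BLOCK_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def json_from_markdown(text: str) -> dict:
    """Parse the first fenced JSON block in `text`.

    If no fence is found, assume the whole string is JSON.
    """
    match = JSON_BLOCK_RE.search(text)
    payload = match.group(1) if match else text
    return json.loads(payload.strip())


# A made-up LLM reply wrapping JSON in a fenced block.
reply = "Here you go:\n" + FENCE + "json\n" + '{"title": "ktrain"}' + "\n" + FENCE
print(json_from_markdown(reply))
```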
decompose_question
decompose_question (question:str, llm, parse=True, **kwargs)
Decompose a question into subquestions
needs_followup
needs_followup (question:str, llm, parse=True, **kwargs)
Decide if follow-up questions are needed
extract_title
extract_title (docs_or_text:Union[List[langchain_core.documents.base.Document],str], llm, max_words=1024, retries=1, **kwargs)
Extract or infer the title for the given text.
Args:
- docs_or_text: Either a list of LangChain Document objects or a single text string
- llm: An onprem.LLM instance
- max_words: Maximum number of words to consider
- retries: Number of tries to correctly extract the title
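The roles of `max_words` and `retries` can be pictured with a small standalone sketch. Everything here is hypothetical: `truncate_words`, `title_with_retries`, and the `ask` callable stand in for whatever `extract_title` does internally, and the sketch assumes that `retries` counts total attempts and that an empty answer counts as a failed attempt.

```python
from typing import Callable, Optional


def truncate_words(text: str, max_words: int = 1024) -> str:
    """Keep only the first `max_words` whitespace-separated words."""
    return " ".join(text.split()[:max_words])


def title_with_retries(
    text: str,
    ask: Callable[[str], str],
    max_words: int = 1024,
    retries: int = 1,
) -> Optional[str]:
    """Ask for a title up to `retries` times; return None if every answer is empty."""
    excerpt = truncate_words(text, max_words)
    for _ in range(max(1, retries)):
        answer = ask(excerpt).strip()
        if answer:
            return answer
    return None


# A fake model that fails once and then answers (illustration only).
answers = iter(["", "ktrain: A Low-Code Library"])


def fake_llm(prompt: str) -> str:
    return next(answers)


print(title_with_retries("word " * 5000, fake_llm, max_words=10, retries=2))
```

Truncating before prompting keeps the input within the model's context window, and titles normally appear near the start of a document anyway.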
Title
Title (title:str)
Usage docs: https://docs.pydantic.dev/2.10/concepts/models/
A base class for creating Pydantic models.
Attributes:
__class_vars__: The names of the class variables defined on the model.
__private_attributes__: Metadata about the private attributes of the model.
__signature__: The synthesized `__init__` [`Signature`][inspect.Signature] of the model.
__pydantic_complete__: Whether model building is completed, or if there are still undefined fields.
__pydantic_core_schema__: The core schema of the model.
__pydantic_custom_init__: Whether the model has a custom `__init__` function.
__pydantic_decorators__: Metadata containing the decorators defined on the model. This replaces `Model.__validators__` and `Model.__root_validators__` from Pydantic V1.
__pydantic_generic_metadata__: Metadata for generic models; contains data used for a similar purpose to __args__, __origin__, __parameters__ in typing-module generics. May eventually be replaced by these.
__pydantic_parent_namespace__: Parent namespace of the model, used for automatic rebuilding of models.
__pydantic_post_init__: The name of the post-init method for the model, if defined.
__pydantic_root_model__: Whether the model is a [`RootModel`][pydantic.root_model.RootModel].
__pydantic_serializer__: The `pydantic-core` `SchemaSerializer` used to dump instances of the model.
__pydantic_validator__: The `pydantic-core` `SchemaValidator` used to validate instances of the model.
__pydantic_fields__: A dictionary of field names and their corresponding [`FieldInfo`][pydantic.fields.FieldInfo] objects.
__pydantic_computed_fields__: A dictionary of computed field names and their corresponding [`ComputedFieldInfo`][pydantic.fields.ComputedFieldInfo] objects.
__pydantic_extra__: A dictionary containing extra values, if [`extra`][pydantic.config.ConfigDict.extra] is set to `'allow'`.
__pydantic_fields_set__: The names of fields explicitly set during instantiation.
__pydantic_private__: Values of private attributes set on the model instance.
caption_tables
caption_tables (docs:List[langchain_core.documents.base.Document], llm, max_chars=4096, max_tables=3, retries=1, attempt_exact=False, only_caption_missing=False, **kwargs)
Given a list of Documents, auto-caption or summarize any tables within the list.
Args:
- docs: A list of LangChain Document objects
- llm: An onprem.LLM instance
- max_chars: Maximum number of characters to consider
- retries: Number of tries to correctly auto-caption a table
- attempt_exact: Try to extract the existing caption if one exists
- only_caption_missing: Only caption tables that do not already have a caption
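As a rough picture of the captioning flow, the sketch below selects documents that carry a `table` key in their metadata, sends at most `max_chars` characters to a summarizer, and prepends the result to the table text. The `Doc` dataclass, the `prepend_captions` helper, the `summarize` callable, and the blank-line separator are all assumptions for illustration. The real function works on LangChain Document objects with an onprem.LLM instance, and its internals may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a LangChain Document (hypothetical, illustration only)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def prepend_captions(docs, summarize, max_chars=4096):
    """Prepend a generated caption to every document flagged as a table."""
    for doc in docs:
        if "table" not in doc.metadata:
            continue  # leave non-table documents untouched
        # Only the first max_chars characters are shown to the summarizer.
        caption = summarize(doc.page_content[:max_chars])
        doc.page_content = caption + "\n\n" + doc.page_content
    return docs


docs = [
    Doc("plain text"),
    Doc("|a|b|\n|---|---|\n|1|2|", {"table": True}),
]
# The lambda is a fake summarizer standing in for an LLM call.
prepend_captions(docs, lambda text: "A two-column table")
print(docs[1].page_content)
```

Modifying the documents in place mirrors the example further down, where the table document is printed after the call rather than read from a return value.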
caption_table_text
caption_table_text (table_text:str, llm, max_chars=4096, retries=1, attempt_exact=False, **kwargs)
Caption table text
TableSummary
TableSummary (summary:str)
Usage docs: https://docs.pydantic.dev/2.10/concepts/models/
A Pydantic model with a single field, summary (str). It inherits the same base-class attributes listed under Title above.
Examples
llm = LLM(default_model='llama', n_gpu_layers=-1, verbose=False, mute_stream=True)
llama_new_context_with_model: n_ctx_per_seq (3904) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
Deciding on Follow-Up Questions
needs_followup('What is ktrain?', llm=llm)
False
needs_followup('What is the capital of France?', llm=llm)
False
needs_followup("How was Paul Grahams life different before, during, and after YC?", llm)
True
needs_followup("Compare and contrast the customer segments and geographies of Lyft and Uber that grew the fastest.", llm)
True
needs_followup("Compare and contrast Uber and Lyft.", llm)
True
Generating Follow-Up Questions
question = "Compare and contrast the customer segments and geographies of Lyft and Uber that grew the fastest."
subquestions = decompose_question(question, llm=llm, parse=False)
print()
print(subquestions)
['What are the customer segments that drove growth for Uber', 'What are the geographies where Uber grew the fastest', 'What are the customer segments that drove growth for Lyft', 'What are the geographies where Lyft grew the fastest']
Extract Titles
from onprem.ingest import load_single_document
docs = load_single_document('tests/sample_data/ktrain_paper/ktrain_paper.pdf')
title = extract_title(docs, llm=llm)
print(title)
ktrain: A Low-Code Library for Augmented Machine Learning
Auto-Caption Tables
docs = load_single_document('tests/sample_data/ktrain_paper/ktrain_paper.pdf', infer_table_structure=True)
table_doc = [d for d in docs if 'table' in d.metadata][0]
caption_tables([table_doc], llm, only_caption_missing=False)
print(table_doc.page_content)
Comparison of AutoML Libraries for Different Data Types
The following table in markdown format has the caption: Table 1: A comparison of ML tasks supported out-of-the-box in popular low-code and AutoML libraries for tabular, image, audio, text and graph data..
|Task|ktrain|fastai|Ludwig|AutoKeras|AutoGluon|
|---|---|---|---|---|---|
|Tabular: Classification/Regression|✓|✓|✓|✓|✓|
|Tabular: Causal Machine Learning|✓|None|None|None|None|
|Tabular: Time Series Forecasting|None|None|✓|✓|None|
|Tabular: Collaborative Filtering|None|✓|None|None|None|
|Image: Classification/Regression|✓|✓|✓|✓|✓|
|Image: Object Detection|prefitted*|✓|None|None|✓|
|Image: Image Captioning|prefitted*|None|✓|None|None|
|Image: Segmentation|None|✓|None|None|None|
|Image: GANs|None|✓|None|None|None|
|Image: Keypoint/Pose Estimation|None|✓|None|None|None|
|Audio: Classification/Regression|None|None|✓|None|None|
|Audio: Speech Transcription|prefitted*|None|✓|None|None|
|Text: Classification/Regression|✓|✓|✓|✓|✓|
|Text: Sequence-Tagging|✓|None|✓|None|None|
|Text: Unsupervised Topic Modeling|✓|None|None|None|None|
|Text: Semantic Search|✓|None|None|None|None|
|Text: End-to-End Question-Answering|✓*|None|None|None|None|
|Text: Zero-Shot Learning|✓|None|None|None|None|
|Text: Language Translation|prefitted*|None|✓|None|None|
|Text: Summarization|prefitted*|None|✓|None|None|
|Text: Text Extraction|✓|None|None|None|None|
|Text: QA-Based Information Extraction|✓*|None|None|None|None|
|Text: Keyphrase Extraction|✓|None|None|None|None|
|Graph: Node Classification|✓|None|None|None|None|
|Graph: Link Prediction|✓|None|None|None|None|
In this example, the `caption_tables` function prepended an alternative caption to the table text. You can skip tables that already have captions by supplying `only_caption_missing=True`.