Summarization

The pipelines module in OnPrem.LLM includes a Summarizer that summarizes one or more documents with an LLM. This notebook shows a couple of examples.

The Summarizer runs multiple intermediate prompts and inferences, so we will set verbose=False and mute_stream=True. We will also set temperature=0 for more consistent outputs and use the default 7B model (i.e., Mistral-7B-Instruct-v0.2). You can experiment with different, newer models to improve results, as sketched below.

from onprem import LLM
from onprem.pipelines import Summarizer
llm = LLM(n_gpu_layers=-1, verbose=False, mute_stream=True, temperature=0) # set based on your system
summarizer = Summarizer(llm)
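
To try a different, newer model, you can point the LLM constructor at any GGUF model via its model_url argument. A minimal sketch (the URL below is a placeholder for whatever model you choose):

llm = LLM(model_url='https://huggingface.co/<org>/<repo>/resolve/main/<model>.gguf', # placeholder URL
          n_gpu_layers=-1, verbose=False, mute_stream=True, temperature=0)
summarizer = Summarizer(llm)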

Next, let’s download the ktrain paper and summarize it.

!wget --user-agent="Mozilla" https://arxiv.org/pdf/2004.10703.pdf -O /tmp/ktrain.pdf -q
text = summarizer.summarize('/tmp/ktrain.pdf', max_chunks_to_use=5)
print(text['output_text'])
 ktrain is an open-source, low-code machine learning library for Python. It simplifies various machine learning tasks, including supervised and non-supervised tasks. The library also offers explainable AI capabilities using libraries like shap, eli5 with lime. Additionally, it provides a simple prediction API and supports saving and reloading predictor instances for deployment to production environments.

For faster summarization, we set max_chunks_to_use=5, so that only the first five chunks of 1,000 characters are considered (chunk_size=1000 is the default). You can set max_chunks_to_use to None (or omit the parameter) to consider the entire document when generating the summary, as shown in the blog post example below.
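
The chunking itself can also be tuned. Here is a minimal sketch, assuming summarize accepts a chunk_size parameter (as the default of 1000 mentioned above implies); larger chunks mean fewer intermediate inference calls:

# sketch: larger chunks reduce the number of intermediate summarization calls
# (assumes a chunk_size parameter, per the default of 1000 noted above)
text = summarizer.summarize('/tmp/ktrain.pdf', chunk_size=2000, max_chunks_to_use=5)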

Next, let’s download an example blog post about LLMs and summarize it.

from langchain.document_loaders import WebBaseLoader

loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
docs = loader.load()
with open('/tmp/blog.txt', 'w') as f:
    f.write(docs[0].page_content)

text = summarizer.summarize('/tmp/blog.txt') # this takes longer, as it considers every piece of text in the blog post
print(text['output_text'])
 This document discusses techniques and approaches for autonomous agents to plan, reason, act, and reflect. The document covers two main approaches: ReAct and Reflexion.

ReAct is a technique used in reinforcement learning tasks to improve performance by using past learning history as input.

Reflexion is a framework that equips agents with dynamic memory and self-reflection capabilities to improve reasoning skills.

The document also discusses various techniques for task decomposition, such as using LLMs with simple prompting, using task-specific instructions, or using human inputs. The document also mentions the use of two-shot examples to show failed trajectories and ideal reflection for guiding future changes in the plan. These reflections are then added into the agent’s working memory to be used as context for querying LLM.

The provided document discusses instructions for creating an architecture with specific core classes, functions, and methods. The architecture will be implemented as code following best practices for the requested languages. The code will be organized into different files, each containing appropriate imports, types, and dependencies. The files will be compatible with each other, and all parts of the architecture will be present in the files. For Python, the toolbelt preferences include pytest, dataclasses.

If needed, you can experiment with different parameters, as described in our documentation.
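
For example, you could trade coverage for speed on the longer blog post by limiting the number of chunks considered, just as we did with the paper:

# sketch: limit chunks for a faster (but less complete) summary of the blog post
fast = summarizer.summarize('/tmp/blog.txt', max_chunks_to_use=10)
print(fast['output_text'])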