Summarization

The pipelines module includes a Summarizer to summarize one or more documents with an LLM. This notebook shows a couple of examples.

The Summarizer runs multiple intermediate prompts and inferences, so we will set verbose=False and mute_stream=True. We will also set temperature=0 for more consistent outputs and use the default 7B model (i.e., Wizard-Vicuna-7B-Uncensored). You can experiment with different, newer models to improve results.

from onprem import LLM
from onprem.pipelines import Summarizer
llm = LLM(n_gpu_layers=33, verbose=False, mute_stream=True, temperature=0) # set based on your system
summarizer = Summarizer(llm)

Next, let’s download the ktrain paper and summarize it.

!wget --user-agent="Mozilla" https://arxiv.org/pdf/2004.10703.pdf -O /tmp/ktrain.pdf -q
text = summarizer.summarize('/tmp/ktrain.pdf', max_chunks_to_use=5)
print(text['output_text'])
 K-TRAIN is a low-code Python library that provides an unified interface to build, train, inspect, and apply sophisticated state-of-the-art models for text data, vision data, graph data, and tabular data. It reduces cognitive load and allows users to focus on more important tasks that may require domain expertise or are less amenable to automation. K-TRAIN's evaluate() method computes detailed validation metrics and shows a report of the learner, while view_top_losses() allows for inspecting examples with the highest validation loss. Making predictions on new data is simple with ktrain's Predictor class, which encapsulates both the model and preprocessing steps required to transform raw data into the format expected by the model. Explanations for decisions made by the model are available through explain() method. Non-supervised ML tasks such as training unsupervised topic models can also be done with ktrain's low-code approach.

For faster summarization, we set max_chunks_to_use=5, so that only the first five 1000-character chunks are considered (chunk_size=1000 is the default). You can set max_chunks_to_use to None (or omit the parameter) to consider the entire document when generating the summary, an approach shown in the next example.
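You can also adjust the chunking itself. As a minimal sketch (this assumes summarize accepts a chunk_size argument, as the default noted above suggests; verify against the documentation), larger chunks mean fewer intermediate prompts, with each prompt seeing more text:

text = summarizer.summarize('/tmp/ktrain.pdf',
                            chunk_size=2000,      # assumed parameter: double the default chunk size
                            max_chunks_to_use=5)  # still limit to the first five chunks
print(text['output_text'])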

Next, let’s download an example blog post about LLMs and summarize it.

from langchain.document_loaders import WebBaseLoader

loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
docs = loader.load()
with open('/tmp/blog.txt', 'w') as f:
    f.write(docs[0].page_content)

text = summarizer.summarize('/tmp/blog.txt') # this takes longer, as it looks at every piece of text in the blog post
print(text['output_text'])
Created a chunk of size 1003, which is longer than the specified 1000
 The development of large language models (LLMs) has enabled the creation of autonomous agents that can reason and act in complex environments. These LLM-powered agents are capable of generating text, solving problems, and integrating external knowledge sources to perform a wide range of tasks. However, they still face limitations such as finite context length, task decomposition, natural language interface reliability, and planning robustness. To overcome these challenges, researchers have developed various approaches to prompting LLMs for specific tasks, integrating external knowledge sources into LLMs, and improving the performance of LLM-powered agents through self-reflection and incremental improvement. These documents demonstrate the potential of LLMs in enabling autonomous agents that can perform complex tasks with human-like intelligence.

If needed, you can experiment with different parameters, as described in our documentation.
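For instance, a minimal sketch using only the parameters shown above: varying max_chunks_to_use trades speed for document coverage.

# vary max_chunks_to_use to trade speed for coverage;
# None considers the entire document (slowest, most complete)
for n in (5, 10, None):
    result = summarizer.summarize('/tmp/ktrain.pdf', max_chunks_to_use=n)
    print(f'--- max_chunks_to_use={n} ---')
    print(result['output_text'])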