pipelines.summarizer

Pipelines for specific tasks like summarization

Summarizer

 Summarizer (llm, prompt_template:Optional[str]=None,
             map_prompt:Optional[str]=None,
             reduce_prompt:Optional[str]=None,
             refine_prompt:Optional[str]=None, **kwargs)

*Summarizer summarizes one or more documents

Args:

llm: An onprem.LLM object
prompt_template: A model specific prompt_template with a single placeholder named “{prompt}”. All prompts (e.g., Map-Reduce prompts) are wrapped within this prompt. If supplied, overrides the prompt_template supplied to the LLM constructor.
map_prompt: Map prompt for Map-Reduce summarization. If None, default is used.
reduce_prompt: Reduce prompt for Map-Reduce summarization. If None, default is used.
refine_prompt: Refine prompt for Refine-based summarization. If None, default is used.*

source

Summarizer.summarize

 Summarizer.summarize (fpath:str, strategy:str='map_reduce',
                       chunk_size:int=1000, chunk_overlap:int=0,
                       token_max:int=2000,
                       max_chunks_to_use:Optional[int]=None)

Summarize one or more documents (e.g., PDFs, MS Word, MS Powerpoint, plain text) using either Langchain’s Map-Reduce strategy or Refine strategy. The max_chunks parameter may be useful for documents that have abstracts or informative introductions. If max_chunks=None, all chunks are considered for summarizer.

	Type	Default	Details
fpath	str		path to either a folder of documents or a single file
strategy	str	map_reduce	One of {‘map_reduce’, ‘refine’}
chunk_size	int	1000	Number of characters of each chunk to summarize
chunk_overlap	int	0	Number of characters that overlap between chunks
token_max	int	2000	Maximum number of tokens to group documents into
max_chunks_to_use	Optional	None	Maximum number of chunks (starting from beginning) to use

source

Summarizer.summarize_by_concept

 Summarizer.summarize_by_concept (fpath:str, concept_description:str,
                                  similarity_threshold:float=0.0,
                                  max_chunks:int=4,
                                  similarity_method:str='tfidf',
                                  summary_prompt:str='What does the
                                  following context say with respect
                                  "{concept_description}"?
                                  \n\nCONTEXT:\n{text}')

Summarize document with respect to concept described by concept_description. Returns a tuple of the form (summary, sources).

	Type	Default	Details
fpath	str		path to file
concept_description	str		Summaries are generated with respect to the described concept.
similarity_threshold	float	0.0	Minimum similarity for consideration. Tip: Increase this when using similarity_method=“senttransform” to mitigate hallucination. A value of 0.0 is sufficient for TF-IDF or should be kept near-zero.
max_chunks	int	4	Only this many snippets above similarity_threshold are considered.
similarity_method	str	tfidf	One of “senttransform” (sentence-transformer embeddings) or “tfidf” (TF-IDF)
summary_prompt	str	What does the following context say with respect “{concept_description}”? CONTEXT: {text}	The prompt used for summarization. Should have exactly two variables, {concept_description} and {text}.