pipelines.summarizer
Pipelines for specific tasks like summarization
Summarizer
Summarizer (llm, prompt_template:Optional[str]=None, map_prompt:Optional[str]=None, reduce_prompt:Optional[str]=None, refine_prompt:Optional[str]=None, **kwargs)
*Summarizer
summarizes one or more documents
Args:
- llm: An
onprem.LLM
object - prompt_template: A model specific prompt_template with a single placeholder named “{prompt}”. All prompts (e.g., Map-Reduce prompts) are wrapped within this prompt. If supplied, overrides the
prompt_template
supplied to theLLM
constructor. - map_prompt: Map prompt for Map-Reduce summarization. If None, default is used.
- reduce_prompt: Reduce prompt for Map-Reduce summarization. If None, default is used.
- refine_prompt: Refine prompt for Refine-based summarization. If None, default is used.*
Summarizer.summarize
Summarizer.summarize (fpath:str, strategy:str='map_reduce', chunk_size:int=1000, chunk_overlap:int=0, token_max:int=2000, max_chunks_to_use:Optional[int]=None)
Summarize one or more documents (e.g., PDFs, MS Word, MS Powerpoint, plain text) using either Langchain’s Map-Reduce strategy or Refine strategy. The max_chunks
parameter may be useful for documents that have abstracts or informative introductions. If max_chunks=None
, all chunks are considered for summarizer.
Type | Default | Details | |
---|---|---|---|
fpath | str | path to either a folder of documents or a single file | |
strategy | str | map_reduce | One of {‘map_reduce’, ‘refine’} |
chunk_size | int | 1000 | Number of characters of each chunk to summarize |
chunk_overlap | int | 0 | Number of characters that overlap between chunks |
token_max | int | 2000 | Maximum number of tokens to group documents into |
max_chunks_to_use | Optional | None | Maximum number of chunks (starting from beginning) to use |
Summarizer.summarize_by_concept
Summarizer.summarize_by_concept (fpath:str, concept_description:str, similarity_threshold:float=0.0, max_chunks:int=4, similarity_method:str='tfidf', summary_prompt:str='What does the following context say with respect "{concept_description}"? \n\nCONTEXT:\n{text}')
Summarize document with respect to concept described by concept_description
. Returns a tuple of the form (summary, sources).
Type | Default | Details | |
---|---|---|---|
fpath | str | path to file | |
concept_description | str | Summaries are generated with respect to the described concept. | |
similarity_threshold | float | 0.0 | Minimum similarity for consideration. Tip: Increase this when using similarity_method=“senttransform” to mitigate hallucination. A value of 0.0 is sufficient for TF-IDF or should be kept near-zero. |
max_chunks | int | 4 | Only this many snippets above similarity_threshold are considered. |
similarity_method | str | tfidf | One of “senttransform” (sentence-transformer embeddings) or “tfidf” (TF-IDF) |
summary_prompt | str | What does the following context say with respect “{concept_description}”? CONTEXT: {text} |
The prompt used for summarization. Should have exactly two variables, {concept_description} and {text}. |