pipelines
Piplines for specific tasks like summarization
Summarizer
Summarizer (llm, prompt_template:Optional[str]=None, map_prompt:Optional[str]=None, reduce_prompt:Optional[str]=None, refine_prompt:Optional[str]=None, **kwargs)
Summarizer
summarizes one or more documents
Args:
- llm: An
onprem.LLM
object - prompt_template: A model specific prompt_template with a single placeholder named “{prompt}”. All prompts (e.g., Map-Reduce prompts) are wrapped within this prompt. If supplied, overrides the
prompt_template
supplied to theLLM
constructor. - map_prompt: Map prompt for Map-Reduce summarization. If None, default is used.
- reduce_prompt: Reduce prompt for Map-Reduce summarization. If None, default is used.
- refine_prompt: Refine prompt for Refine-based summarization. If None, default is used.
Summarizer.summarize
Summarizer.summarize (fpath:str, strategy:str='map_reduce', chunk_size:int=1000, chunk_overlap:int=0, token_max:int=2000, max_chunks_to_use:Optional[int]=None)
Summarize one or more documents (e.g., PDFs, MS Word, MS Powerpoint, plain text) using either Langchain’s Map-Reduce strategy or Refine strategy.
Args:
- fpath: A path to either a folder of documents or a single file.
- strategy: One of {‘map_reduce’, ‘refine’}.
- chunk_size: Number of characters of each chunk to summarize
- chunk_overlap: Number of characters that overlap between chunks
- token_max: Maximum number of tokens to group documents into
- max_chunks_to_use: Maximum number of chunks (starting from beginning) to use. Useful for documents that have abstracts or informative introductions. If None, all chunks are considered for summarizer.
Returns:
- str: a summary of your documents