pipelines

Pipelines for specific tasks like summarization

Summarizer

 Summarizer (llm, prompt_template:Optional[str]=None,
             map_prompt:Optional[str]=None,
             reduce_prompt:Optional[str]=None,
             refine_prompt:Optional[str]=None, **kwargs)

Summarizer summarizes one or more documents.

Args:

  • llm: An onprem.LLM object
  • prompt_template: A model-specific prompt template with a single placeholder named "{prompt}". All prompts (e.g., Map-Reduce prompts) are wrapped within this prompt template. If supplied, this overrides the prompt_template supplied to the LLM constructor.
  • map_prompt: Map prompt for Map-Reduce summarization. If None, default is used.
  • reduce_prompt: Reduce prompt for Map-Reduce summarization. If None, default is used.
  • refine_prompt: Refine prompt for Refine-based summarization. If None, default is used.
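
For example, a Summarizer can be constructed from an existing LLM instance. The sketch below is illustrative: it assumes onprem is installed, uses the default Map-Reduce and Refine prompts, and the n_gpu_layers setting is an assumption you can adjust or omit.

    from onprem import LLM
    from onprem.pipelines import Summarizer

    # Build the underlying LLM (settings here are illustrative assumptions;
    # use whatever configuration you normally pass to the LLM constructor).
    llm = LLM(n_gpu_layers=35)

    # Default Map-Reduce and Refine prompts are used when none are supplied.
    summarizer = Summarizer(llm)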

Summarizer.summarize

 Summarizer.summarize (fpath:str, strategy:str='map_reduce',
                       chunk_size:int=1000, chunk_overlap:int=0,
                       token_max:int=2000,
                       max_chunks_to_use:Optional[int]=None)

Summarize one or more documents (e.g., PDFs, MS Word, MS PowerPoint, plain text) using either LangChain's Map-Reduce strategy or Refine strategy.

Args:

  • fpath: A path to either a folder of documents or a single file.
  • strategy: One of {‘map_reduce’, ‘refine’}.
  • chunk_size: Number of characters in each chunk to summarize.
  • chunk_overlap: Number of characters that overlap between chunks.
  • token_max: Maximum number of tokens to group documents into.
  • max_chunks_to_use: Maximum number of chunks (starting from the beginning) to use. Useful for documents that have abstracts or informative introductions. If None, all chunks are considered by the summarizer.

Returns:

  • str: A summary of your documents.
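
As a sketch (the file path is a placeholder and summarizer is the instance constructed above), summarizing a single document with the Map-Reduce strategy might look like this:

    # Summarize a single PDF; the path below is a placeholder.
    summary = summarizer.summarize(
        "/path/to/report.pdf",
        strategy="map_reduce",   # or "refine"
        chunk_size=1000,
        chunk_overlap=0,
        max_chunks_to_use=5,     # only consider the first five chunks
    )
    print(summary)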