
Summarizer

The Summarizer creates a short overview of a long Document. You can use it to get a quick look at the Documents your Retriever returns.

You can use any summarization model from Hugging Face Transformers by providing the model name. By default, the Google Pegasus model is loaded.

Position in a Pipeline: After preprocessing in an indexing Pipeline, or after the Retriever in a querying Pipeline
Input: Documents
Output: Documents
Classes: TransformersSummarizer

Usage

To initialize and run a stand-alone Summarizer:

from haystack.nodes import TransformersSummarizer
from haystack import Document

# The trailing backslash continues the string on the next line without adding
# whitespace, so each continued line ends with an explicit space.
docs = [Document("PG&E stated it scheduled the blackouts in response to forecasts for high winds amid dry conditions. \
The aim is to reduce the risk of wildfires. Nearly 800 thousand customers were scheduled to be affected by \
the shutoffs which were expected to last through at least midday tomorrow.")]

# Load the summarization model from Hugging Face by name.
summarizer = TransformersSummarizer(model_name_or_path="google/pegasus-xsum")
summary = summarizer.predict(documents=docs, generate_single_summary=True)

The returned summary contains both the generated summary (under "text") and the original document text (under "context" in "meta"):

[
    {
        "text": "California's largest electricity provider has turned off power to hundreds of thousands of customers.",
        "meta": {
            "context": "PGE stated it scheduled the blackouts in response to forecasts for high winds amid dry conditions."
        },
        ...
    }
]
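
To compare each summary with its source at a glance, the entries can be unpacked with plain Python. The following is a minimal sketch, assuming each entry is a dictionary shaped like the output above, with the summary under "text" and the source text under "context" in "meta". The helper name summary_pairs and the sample data are hypothetical and not part of Haystack.

```python
from typing import Dict, List, Tuple


def summary_pairs(results: List[Dict]) -> List[Tuple[str, str]]:
    """Return (summary, original_text) pairs from Summarizer-style output.

    Assumes each entry looks like {"text": ..., "meta": {"context": ...}},
    as in the example output above. Missing fields become empty strings.
    """
    pairs = []
    for entry in results:
        summary = entry.get("text", "")
        context = entry.get("meta", {}).get("context", "")
        pairs.append((summary, context))
    return pairs


# Hypothetical sample data copied from the example output above.
sample = [
    {
        "text": "California's largest electricity provider has turned off power to hundreds of thousands of customers.",
        "meta": {
            "context": "PGE stated it scheduled the blackouts in response to forecasts for high winds amid dry conditions."
        },
    }
]

for summary, context in summary_pairs(sample):
    print("Summary: ", summary)
    print("Original:", context)
    print("Length:   {} -> {} characters".format(len(context), len(summary)))
```

Using .get with defaults keeps the helper from raising a KeyError if an entry lacks a "meta" or "context" field, which may matter if the output shape differs between Haystack versions.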

To use a Summarizer in a pipeline:

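from haystack import Pipeline

# Assumes `retriever` and `summarizer` have already been initialized.
p = Pipeline()
p.add_node(component=retriever, name="ESRetriever1", inputs=["Query"])
# The Summarizer consumes the Documents returned by the Retriever.
p.add_node(component=summarizer, name="Summarizer", inputs=["ESRetriever1"])
res = p.run(query="What did Einstein work on?")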