Domain Adaptation and Summary Distillation for Unsupervised Query-Focused Summarization

Abstract

Text summarization is the task of reducing a document’s length while preserving its essential information. In the age of information explosion, obtaining the content that users need from a large volume of information becomes particularly important. Under such circumstances, query-focused abstractive summarization (QFS) grows in importance, since it focuses on user needs while delivering fluent, concise, paraphrased summaries. However, unlike generic summarization, which has achieved remarkable progress driven by substantial amounts of parallel data, QFS struggles with a shortage of parallel corpora. In this paper, we therefore leverage a large generic summarization dataset to meet the pressing demand for unsupervised QFS. The large-scale query-free benchmark is automatically transformed into a query-focused dataset (Query-CNNDM) while preserving its informative summaries. We propose a simple yet effective unsupervised method called Domain Adaptation and Summary Distillation (DASD). To achieve domain adaptation for unsupervised QFS, we design a query-aware gap sentence generation (q-GSG) strategy that equips the model with the ability to learn target textual knowledge and obtain a good initialization in the target domain. As instance-specific regularization, we train a teacher model on Query-CNNDM to generate pseudo-labels for summary distillation. Experimental results indicate that our DASD model achieves state-of-the-art performance on two benchmark datasets, Debatepedia and Wikiref, in a zero-shot setting and generalizes well to abstractive few-shot QFS.
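The abstract describes q-GSG only at a high level. The following is a minimal sketch of the idea, assuming gap sentences are selected by lexical overlap with the query (a crude stand-in for a ROUGE-style relevance score) and replaced by a sentinel token, so the masked sentences become the pseudo-summary target. The scoring heuristic, mask token, and query-plus-source input format are illustrative assumptions, not the paper’s exact procedure.

    from typing import List, Tuple

    MASK = "<mask>"  # assumed sentinel token; the actual model's mask token may differ

    def query_relevance(sentence: str, query: str) -> float:
        """Unigram-overlap proxy for a ROUGE-style query-relevance score (assumption)."""
        s, q = set(sentence.lower().split()), set(query.lower().split())
        return len(s & q) / len(q) if q else 0.0

    def q_gsg(sentences: List[str], query: str, gap_ratio: float = 0.3) -> Tuple[str, str]:
        """Mask the sentences most relevant to the query: the masked text becomes
        the target pseudo-summary, the remaining text the source."""
        n_gaps = max(1, int(len(sentences) * gap_ratio))
        ranked = sorted(range(len(sentences)),
                        key=lambda i: query_relevance(sentences[i], query),
                        reverse=True)
        gaps = set(ranked[:n_gaps])
        source = " ".join(MASK if i in gaps else s for i, s in enumerate(sentences))
        target = " ".join(sentences[i] for i in sorted(gaps))
        # Prepend the query to the source, one common input format for QFS models.
        return query + " </s> " + source, target

    # Example pre-training instance built from a toy document.
    doc = ["Cats are popular pets worldwide.",
           "They are independent and easy to care for.",
           "The weather was sunny that day."]
    src, tgt = q_gsg(doc, query="why are cats popular pets")
    print(src)
    print(tgt)

In the same spirit, the summary-distillation step can be read as standard sequence-level knowledge distillation: a teacher fine-tuned on Query-CNNDM decodes a pseudo-summary for each (query, document) pair, and the student is trained on those pseudo-labels as if they were reference summaries.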

Publication
IEEE Transactions on Knowledge and Data Engineering