Leveraging Information Bottleneck for Scientific Document Summarization

File version

Version of Record (VoR)

Author(s)
Ju, J
Liu, M
Koh, HY
Jin, Y
Du, L
Pan, S
Date
2021
Location

Dominican Republic

Abstract

This paper presents an unsupervised extractive approach to summarizing long scientific documents based on the Information Bottleneck principle. Inspired by previous work that uses the Information Bottleneck principle for sentence compression, we extend it to document-level summarization in two separate steps. In the first step, we use one or more signals as queries to retrieve the key content from the source document. In the second step, a pre-trained language model performs further sentence search and editing to return the final extracted summary. Moreover, the framework can be flexibly extended to a multi-view setting by using different signals. Automatic evaluation on three scientific document datasets verifies the effectiveness of the proposed framework, and further human evaluation suggests that the extracted summaries cover more content aspects than those of previous systems.
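
The Information Bottleneck principle, in its general form, seeks a compressed representation S of a source X that minimizes I(X; S) − β·I(S; Y): keep as little of X as possible while preserving what is informative about a relevance variable Y, with β setting the trade-off. The toy sketch below illustrates the two-step shape described in the abstract (retrieve with a signal as query, then select under a compression pressure). It is NOT the authors' implementation: it assumes a bag-of-words overlap score in place of the pre-trained language model, replaces the mutual-information terms with a signal-coverage gain and a length penalty, and omits sentence editing entirely. The functions `retrieve` and `select`, their parameters `k`, `beta`, and `budget`, and the example document are all hypothetical.

```python
import re


def tokenize(text):
    """Lowercase alphanumeric tokens; a crude stand-in for a real tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def relevance(sentence, signal):
    """Toy relevance proxy (assumption): fraction of signal tokens the sentence covers."""
    sig = set(tokenize(signal))
    if not sig:
        return 0.0
    return len(sig & set(tokenize(sentence))) / len(sig)


def retrieve(sentences, signal, k):
    """Step 1 (sketch): treat the signal as a query and keep the k most relevant sentences."""
    ranked = sorted(range(len(sentences)),
                    key=lambda i: relevance(sentences[i], signal),
                    reverse=True)
    return sorted(ranked[:k])  # restore document order


def select(sentences, candidates, signal, beta=1.0, budget=50):
    """Step 2 (sketch): greedily add candidates while beta * (new signal coverage)
    outweighs a length cost; beta loosely plays the role of the IB trade-off weight."""
    sig = set(tokenize(signal))
    chosen, covered, length = [], set(), 0
    while True:
        best, best_gain = None, 0.0
        for i in candidates:
            if i in chosen:
                continue
            toks = tokenize(sentences[i])
            if length + len(toks) > budget:
                continue  # adding this sentence would exceed the word budget
            new_cov = len((set(toks) & sig) - covered) / max(len(sig), 1)
            gain = beta * new_cov - len(toks) / budget  # relevance gain minus compression cost
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:
            break  # no remaining candidate is worth its length
        toks = tokenize(sentences[best])
        chosen.append(best)
        covered |= set(toks) & sig
        length += len(toks)
    return [sentences[i] for i in sorted(chosen)]


if __name__ == "__main__":
    # Hypothetical four-sentence "document" and signal, for illustration only.
    doc = [
        "We propose a graph neural network for molecule property prediction.",
        "Related work has studied many other tasks.",
        "Our model improves property prediction accuracy on three benchmarks.",
        "The weather was pleasant during the experiments.",
    ]
    signal = "graph neural network property prediction benchmarks"
    cands = retrieve(doc, signal, k=2)
    print(select(doc, cands, signal, beta=1.0))
    print(select(doc, cands, signal, beta=10.0))
```

On this toy input, retrieval keeps the first and third sentences. With `beta=1.0` only the first sentence is selected, because the single new signal word ("benchmarks") in the third sentence does not pay for its length; with `beta=10.0` relevance is weighted more heavily and both sentences are kept. A larger `beta` therefore yields a less compressed, more signal-covering summary.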

Conference Title

Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021

Rights Statement

© 2021 Association for Computational Linguistics. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Citation

Ju, J; Liu, M; Koh, HY; Jin, Y; Du, L; Pan, S, Leveraging Information Bottleneck for Scientific Document Summarization, Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021, 2021, pp. 4091-4098