An Empirical Survey on Long Document Summarization: Datasets, Models and Metrics
File version
Accepted Manuscript (AM)
Author(s)
Ju, Jiaxin
Liu, Ming
Pan, Shirui
Griffith University Author(s)
Primary Supervisor
Other Supervisors
Editor(s)
Date
Size
File type(s)
Location
License
Abstract
Long documents such as academic articles and business reports have been the standard format to detail out important issues and complicated subjects that require extra attention. An automatic summarization system that can effectively condense long documents into short and concise texts to encapsulate the most important information would thus be significant in aiding the reader’s comprehension. Recently, with the advent of neural architectures, significant research efforts have been made to advance automatic text summarization systems, and numerous studies on the challenges of extending these systems to the long document domain have emerged. In this survey, we provide a comprehensive overview of the research on long document summarization and a systematic evaluation across the three principal components of its research setting: benchmark datasets, summarization models, and evaluation metrics. For each component, we organize the literature within the context of long document summarization and conduct an empirical analysis to broaden the perspective on current research progress. The empirical analysis includes a study on the intrinsic characteristics of benchmark datasets, a multi-dimensional analysis of summarization models, and a review of the summarization evaluation metrics. Based on the overall findings, we conclude by proposing possible directions for future exploration in this rapidly growing field.
Journal Title
ACM Computing Surveys
Conference Title
Book Title
Edition
Volume
Issue
Thesis Type
Degree Program
School
Publisher link
DOI
Patent number
Funder(s)
Grant identifier(s)
Rights Statement
Rights Statement
© 2022, YEAR. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in 2022, https://doi.org/10.1145/3545176I
Item Access Status
Note
This publication has been entered in Griffith Research Online as an advanced online version.
Access the data
Related item(s)
Subject
Information and computing sciences
Persistent link to this record
Citation
Koh, HY; Ju, J; Liu, M; Pan, S, An Empirical Survey on Long Document Summarization: Datasets, Models and Metrics, ACM Computing Surveys