Application of ASR to a sociolinguistic corpus of Australian English

File version

Version of Record (VoR)

Author(s)
Weiss, Maya
Gnevsheva, Ksenia
Travis, Catherine
Docherty, Gerard
Date
2024
Location

Melbourne, Australia

Abstract

This study applies Automatic Speech Recognition (ASR) to a sociolinguistic corpus of Australian English. We compare a human transcription of excerpts from 20 urban and regional speakers with a transcription generated by Microsoft’s Azure AI Speech. The Word Error Rate is comparable to that reported in previous studies, and is not affected by the sociolinguistic variables of speaker region and gender, nor by the phonetic variable of vowel formants. Despite the overall low rate of transcription errors, our findings suggest that the quality of certain vowel categories that are particularly characteristic of Australian English can affect the accuracy of the ASR-generated transcription.
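The comparison above relies on Word Error Rate (WER). As a minimal sketch (not the authors' implementation), WER is conventionally computed as the word-level Levenshtein distance between the reference (human) and hypothesis (ASR) transcriptions, divided by the number of reference words:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + deletions + insertions) / reference length,
    computed as word-level Levenshtein edit distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all remaining reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all remaining hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution or match
            )
    return dp[len(ref)][len(hyp)] / len(ref)
```

For example, a hypothesis that substitutes one word in a six-word reference yields a WER of 1/6 ≈ 0.167. In practice, libraries such as jiwer implement this with additional text normalisation.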

Conference Title

Proceedings of the Nineteenth Australasian International Conference on Speech Science and Technology

Rights Statement

This work is covered by copyright. You must assume that re-use is limited to personal use and that permission from the copyright owner must be obtained for all other uses. If the document is available under a specified licence, refer to the licence for details of permitted re-use. If you believe that this work infringes copyright please make a copyright takedown request using the form at https://www.griffith.edu.au/copyright-matters.

Subject

Phonetics and speech science

Sociolinguistics

Citation

Weiss, M; Gnevsheva, K; Travis, C; Docherty, G, Application of ASR to a sociolinguistic corpus of Australian English, Proceedings of the Nineteenth Australasian International Conference on Speech Science and Technology, 2024, pp. 27-31