Identifying and embedding transferability in data-driven representations of chemical space

File version

Version of Record (VoR)

Author(s)
Gould, Tim
Chan, Bun
Dale, Stephen G
Vuckovic, Stefan
Date
2024
Abstract

Transferability, especially in the context of model generalization, is a paradigm of all scientific disciplines. However, the rapid advancement of machine-learned model development threatens this paradigm, as it can be difficult to understand how transferability is embedded (or missed) in complex models developed using large training data sets. Two related open problems are how to identify, without relying on human intuition, what makes training data transferable, and how to embed transferability into training data. To solve both problems for ab initio chemical modelling, an indispensable tool in everyday chemistry research, we introduce a transferability assessment tool (TAT) and demonstrate it on a controllable data-driven model for developing density functional approximations (DFAs). We reveal that human intuition in the curation of training data introduces chemical biases that can hamper the transferability of data-driven DFAs. We use our TAT to motivate three transferability principles, one of which introduces the key concept of transferable diversity. Finally, we propose data curation strategies for general-purpose machine learning models in chemistry that identify and embed the transferability principles.

Journal Title

Chemical Science

Volume

15

Issue

28

Funder(s)

ARC

Grant identifier(s)

DP200100033

FT210100663

Rights Statement

© 2024 The Author(s). Published by the Royal Society of Chemistry. This article is licensed under a Creative Commons Attribution 3.0 Unported Licence.

Subject

Chemical sciences

Citation

Gould, T; Chan, B; Dale, SG; Vuckovic, S, Identifying and embedding transferability in data-driven representations of chemical space, Chemical Science, 2024, 15 (28), pp. 11122-11133
