Identifying Structure-Property Relationships through SMILES Syntax Analysis with Self-Attention Mechanism
Author(s)
Zheng, Shuangjia
Yan, Xin
Yang, Yuedong
Xu, Jun
Year published
2019
Abstract
Recognizing substructures and their relations embedded in a molecular structure representation is a key process for structure–activity or structure–property relationship (SAR/SPR) studies. A molecular structure can be explicitly represented as either a connection table (CT) or a linear notation, such as SMILES, which is a language describing the connectivity of atoms in the molecular structure. Conventional SAR/SPR approaches rely on partitioning the CT into a set of predefined substructures as structural descriptors. In this work, we propose a new method for identifying SAR/SPR through linear notation (for example, SMILES) syntax analysis with a self-attention mechanism, an interpretable deep learning architecture. The method has been evaluated by predicting chemical properties, toxicology, and bioactivity from experimental data sets. Our results demonstrate that the method yields superior performance compared with state-of-the-art models. Moreover, the method can produce chemically interpretable results, which chemists can use to design and synthesize compounds with improved activity or properties.
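As a rough illustration of the idea in the abstract, the sketch below splits a SMILES string into tokens and computes scaled dot-product self-attention weights between them. It is not the authors' architecture: the tokenizer pattern, the random placeholder embeddings, and the names `tokenize` and `self_attention_weights` are all assumptions made for this example, and a trained model would learn its embeddings from data.

```python
import math
import random
import re

# Hypothetical tokenizer pattern (an assumption, not taken from the paper):
# bracket atoms, two-letter halogens, two-digit ring closures, then single characters.
TOKEN_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles):
    """Split a SMILES string into atom, bond, branch, and ring-closure tokens."""
    return TOKEN_PATTERN.findall(smiles)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_weights(tokens, dim=8, seed=0):
    """Return an n x n matrix of scaled dot-product attention weights.

    Embeddings are random placeholders (assumption); row i holds the
    attention that token i pays to every token in the sequence.
    """
    rng = random.Random(seed)
    vocab = {tok: [rng.gauss(0.0, 1.0) for _ in range(dim)]
             for tok in sorted(set(tokens))}
    vectors = [vocab[tok] for tok in tokens]
    weights = []
    for query in vectors:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in vectors]
        weights.append(softmax(scores))
    return weights


tokens = tokenize("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
attention = self_attention_weights(tokens)

# Average attention each token receives: a crude per-token importance score.
received = [sum(row[j] for row in attention) / len(tokens)
            for j in range(len(tokens))]
```

With random embeddings the scores in `received` carry no chemical meaning. In a trained model, scores of this kind are what would be mapped back onto atoms and substructures to give the interpretable output the abstract describes.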
Journal Title
Journal of Chemical Information and Modeling
Volume
59
Issue
2
Subject
Medicinal and biomolecular chemistry
Theoretical and computational chemistry