Self-Supervision Improves Diffusion Models for Tabular Data Imputation

File version

Version of Record (VoR)

Author(s)
Liu, Y
Ajanthan, T
Husain, H
Nguyen, V
Date
2024
Location

Boise, United States

Abstract

The ubiquity of missing data has drawn considerable attention to tabular data imputation methods. Diffusion models, recognized as a cutting-edge technique for data generation, show significant potential for tabular data imputation. However, in pursuit of diversity, vanilla diffusion models are often sensitive to the initial noise, which prevents them from generating stable and accurate imputations. Additionally, the sparsity inherent in tabular data makes it difficult for diffusion models to model the data manifold accurately, which reduces their robustness for data imputation. To tackle these challenges, this paper introduces the Self-supervised imputation Diffusion Model (SimpDM for brevity), a diffusion model specifically tailored for tabular data imputation. To mitigate sensitivity to noise, we introduce a self-supervised alignment mechanism that regularizes the model, ensuring consistent and stable imputation predictions. Furthermore, we devise a state-dependent data augmentation strategy within SimpDM that enhances the robustness of the diffusion model when data are limited. Extensive experiments demonstrate that SimpDM matches or outperforms state-of-the-art imputation methods across various scenarios.

Conference Title

CIKM '24: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management

Rights Statement

© 2024 Owner/Author. This work is licensed under a Creative Commons Attribution 4.0 International License.

Citation

Liu, Y; Ajanthan, T; Husain, H; Nguyen, V, Self-Supervision Improves Diffusion Models for Tabular Data Imputation, CIKM '24: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 2024, pp. 1513-1522