GPipe: Using Adaptive Directed Acyclic Graphs to Run Data and Feature Pipelines with on-the-fly Transformations

Author(s)
de Brum Müller, JH
Rabhi, F
Milosevic, Z
Editor(s)

Hussain, Walayat

Gao, Honghao

Rabhi, Fethi

Martinez, Luis

Date
2024
Abstract

Businesses can gain significant value from data for decision making by constructing complex data analytic pipelines that have a dual purpose of creating reports and serving as machine learning (ML) models. These pipelines implement a series of transformations via scripts that read raw data from different sources, then aggregate, clean, and transform it before saving it back into tables. The main challenge addressed in this chapter is how to efficiently transform raw data on the fly into features to be used by ML models. At the same time, the effort required to maintain the scripts in the face of changes must be minimized. Building on existing solutions, this chapter proposes a hybrid approach that makes a trade-off between supporting dependency change management and allowing partial processing, while ensuring platform independence. It uses a directed acyclic graph (DAG) to represent data and feature transformations in a way that minimizes the overall processing required and eases the maintenance of the data processing scripts. A prototype has been developed to evaluate the proposed architecture, and preliminary performance results are discussed.
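
The general idea of representing transformations as a DAG so that only the required steps are processed can be illustrated with a minimal sketch. This is not the GPipe prototype or its API: the `Pipeline` class, its methods, and the node names (`raw`, `clean`, `total`, `largest`) are hypothetical and introduced only for illustration. The sketch assumes in-memory caching and Python 3.9+ for the standard-library `graphlib` module.

```python
from graphlib import TopologicalSorter


class Pipeline:
    """Hypothetical DAG of named transformations (illustrative only)."""

    def __init__(self):
        self.funcs = {}  # node name -> function that computes the node
        self.deps = {}   # node name -> tuple of upstream node names
        self.cache = {}  # node name -> last computed value
        self.runs = []   # log of executed nodes, kept for illustration

    def add(self, name, func, deps=()):
        # Register or redefine a node, then discard results that depend on it.
        self.funcs[name] = func
        self.deps[name] = tuple(deps)
        self.invalidate(name)

    def invalidate(self, name):
        # Collect the node and everything downstream of it, then drop
        # their cached values so they are recomputed on the next request.
        stale, changed = {name}, True
        while changed:
            changed = False
            for node, deps in self.deps.items():
                if node not in stale and stale.intersection(deps):
                    stale.add(node)
                    changed = True
        for node in stale:
            self.cache.pop(node, None)

    def _ancestors(self, name):
        # The requested node plus everything upstream of it.
        seen, stack = set(), [name]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(self.deps[node])
        return seen

    def get(self, name):
        # Partial processing: visit only the ancestors of the requested
        # node, in dependency order, and skip nodes that are still cached.
        graph = {n: self.deps[n] for n in self._ancestors(name)}
        for node in TopologicalSorter(graph).static_order():
            if node not in self.cache:
                args = [self.cache[d] for d in self.deps[node]]
                self.cache[node] = self.funcs[node](*args)
                self.runs.append(node)
        return self.cache[name]


p = Pipeline()
p.add("raw", lambda: [3, 1, 2])
p.add("clean", lambda raw: sorted(raw), deps=["raw"])
p.add("total", lambda clean: sum(clean), deps=["clean"])
p.add("largest", lambda clean: clean[-1], deps=["clean"])

first = p.get("total")   # runs raw, clean, total; "largest" is never run
p.add("clean", lambda raw: sorted(x * 10 for x in raw), deps=["raw"])
second = p.get("total")  # reruns clean and total; raw is served from cache
```

In this sketch, requesting one feature touches only its ancestors in the graph, and redefining a transformation invalidates only the nodes downstream of it. How the chapter's hybrid approach balances these two concerns is described in the chapter itself.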

Book Title

Advances in Complex Decision Making: Using Machine Learning and Tools for Service-Oriented Computing

Edition

1st

Subject

Information and computing sciences

Machine learning

Artificial intelligence

Citation

de Brum Müller, JH; Rabhi, F; Milosevic, Z, GPipe: Using Adaptive Directed Acyclic Graphs to Run Data and Feature Pipelines with on-the-fly Transformations, Advances in Complex Decision Making: Using Machine Learning and Tools for Service-Oriented Computing, 2024, 1st, pp. 21-37
