Towards Stream-Relation Join Processing in Data Streaming Engines

Author(s)
Primary Supervisor
Stantic, Bela
Sattar, Abdul
Year published
2016
Abstract
We are living in a time in which a massive 2.5 quintillion bytes of data is generated every day. To realise the value of this data, stream data processing offers a new processing paradigm that aggregates and analyses large volumes of data quickly. While several commercial Stream Processing Engines (SPEs) are available, it remains difficult to develop stream-based applications. Over the last decade our research has identified and addressed two dominant reasons for this difficulty: Heterogeneity and the Stored-Streaming Divide. Heterogeneity highlights the lack of standards in SPEs as well as the wide and changing variety of application requirements. The Stored-Streaming Divide is the focus of this thesis. It emerges from the fact that commercial SPEs treat streaming data and relational data as separate entities even though applications increasingly demand integrated access to both. This integration manifests itself as the join between a stream of fast-arriving data and relational data sources, which we call the Stream-Relation Join (SRJ) problem. Two solutions are provided to address the SRJ problem in this thesis. Some commercial SPEs and research projects take a radical approach to addressing the SRJ problem by building an SPE on top of a database from scratch; we call this the stream-relational approach. This approach is cumbersome because it requires extensive alteration to the database kernel to process the streaming queries. Alternatively, our approach provides a lean layer that sits between the application and a commercial SPE, which we call the federation layer. This layer extends the database only to the point that it provides just enough functionality to interact with the application and the SPE. In doing so, the federation approach not only addresses the SRJ problem efficiently but also ensures the application is portable across a range of commercial SPEs. How to build a federation layer is detailed in this thesis.
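As a rough illustration of what a stream-relation join involves, the sketch below enriches each arriving stream tuple with a matching row from a stored table. It is a toy example only and is not the federation layer described in the thesis: the `customers` table, the event layout, and the function names are all hypothetical, and an in-memory SQLite database stands in for the relational source.

```python
import sqlite3
from typing import Dict, Iterable, Iterator, Tuple


def make_relation() -> sqlite3.Connection:
    """Build a small in-memory table to play the role of the stored relation."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "Ada"), (2, "Grace")],
    )
    conn.commit()
    return conn


def stream_relation_join(
    stream: Iterable[Tuple[int, float]], conn: sqlite3.Connection
) -> Iterator[Dict[str, object]]:
    """Join each (customer_id, amount) event with its customers row.

    Events with no matching row are dropped (inner-join semantics).
    """
    for customer_id, amount in stream:
        row = conn.execute(
            "SELECT name FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        if row is not None:
            yield {"customer_id": customer_id, "name": row[0], "amount": amount}


# Hypothetical events arriving one at a time; customer 3 is not in the table.
events = [(1, 9.5), (3, 1.0), (2, 20.0)]
results = list(stream_relation_join(iter(events), make_relation()))
```

Issuing one lookup per event keeps the sketch short; a real engine would typically need caching or batching to keep up with a fast stream, which is part of what makes the SRJ problem hard.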
Thesis Type
Thesis (PhD Doctorate)
Degree Program
Doctor of Philosophy (PhD)
School
School of Information and Communication Technology
Copyright Statement
The author owns the copyright in this thesis, unless stated otherwise.
Item Access Status
Public
Subject
Stream-relation join processing
Stream processing engines (SPEs)
Stored-streaming divide
Data streaming