Bicluster Analysis of Biomedical Data based on Multi-objective Evolutionary Optimization

Primary Supervisor

Liew, Wee-Chung

Other Supervisors

Blumenstein, Michael

Date
2018-01
Abstract

Knowledge discovery is the process of extracting hidden knowledge from large volumes of data, and data mining is a core part of it. Data mining unveils interesting relationships in data, and the results can support predictions or recommendations in various applications. Recently, biclustering has become a common method in data mining and pattern recognition. Biclustering is an unsupervised machine learning method that can extract useful information from high-dimensional sparse data. It has found applications in visualization and exploratory analysis in fields such as knowledge discovery, data mining, pattern classification, information retrieval, and collaborative filtering, and especially in gene expression data analysis tasks such as functional annotation, tissue classification, and motif identification. Previous studies have shown that finding biclusters in data is inherently intractable and computationally complex. The main challenges of biclustering are the high dimensionality of the data, noise, the different types of bicluster patterns, and the fact that biclusters can overlap. Our review of the methods proposed in the literature found that these challenges have not been properly addressed. Most existing methods can detect only a limited set of bicluster patterns under restrictive assumptions about the data. Moreover, many methods detect biclusters sequentially: the method replaces each detected bicluster with the background and then searches for the next one, which prevents the detection of overlapping biclusters. There is therefore a need for new methods that extract valuable information from the data and support a deeper understanding of the outcomes.
Therefore, in this study, we first propose a method (PBD-SPEA) that uses a new dynamic encoding scheme to detect multiple overlapping biclusters concurrently. However, its implementation is complex, because it involves several heuristic search procedures at different steps, and it cannot detect all types of bicluster patterns. We therefore propose a second method (LBDP) based on geometrical biclustering, which searches for hyperplanes in the data using an evolutionary algorithm. This approach allows all types of bicluster patterns to be detected concurrently. We defined several scenarios on both synthetic and real data to test the performance of the proposed methods. Although our work initially targeted biomedical data (gene expression data), we also tested the generality of the algorithms on non-medical data, such as image data and social networking data. In all scenarios, our methods achieved reliable results compared with several state-of-the-art methods.
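
As a rough illustration of the geometric view mentioned above, the sketch below treats a bicluster pattern between two columns as a line (a hyperplane in two dimensions) and collects the rows that lie on it. This is not the LBDP method: the function name `rows_on_line`, the tolerance `tol`, and the toy matrix are all assumptions made for illustration, and the line parameters are given by hand rather than found by an evolutionary search.

```python
import numpy as np

def rows_on_line(data, col_i, col_j, slope, intercept, tol=1e-6):
    """Return indices of rows whose (col_i, col_j) values satisfy
    x_j = slope * x_i + intercept within an absolute tolerance.
    Hypothetical helper, not taken from the thesis."""
    residual = data[:, col_j] - (slope * data[:, col_i] + intercept)
    return np.flatnonzero(np.abs(residual) <= tol)

# Toy matrix (assumed data). Between columns 0 and 1:
#   rows 0-2 follow an additive pattern:       col1 = col0 + 2
#   rows 0, 3-5 follow a multiplicative one:   col1 = 3 * col0
# Row 0 lies on both lines, so the two row sets overlap.
data = np.array([
    [1.0,  3.0, 9.0],
    [2.0,  4.0, 7.0],
    [5.0,  7.0, 1.0],
    [4.0, 12.0, 0.0],
    [6.0, 18.0, 2.0],
    [7.0, 21.0, 5.0],
])

additive_rows = rows_on_line(data, 0, 1, slope=1.0, intercept=2.0)
multiplicative_rows = rows_on_line(data, 0, 1, slope=3.0, intercept=0.0)

print("additive pattern rows:", additive_rows.tolist())
print("multiplicative pattern rows:", multiplicative_rows.tolist())
```

Because each line is tested against the unmodified matrix, a row can belong to more than one pattern, which a sequential detect-and-mask procedure would not allow.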

Thesis Type

Thesis (PhD Doctorate)

Degree Program

Doctor of Philosophy (PhD)

School

School of Info & Comm Tech

Rights Statement

The author owns the copyright in this thesis, unless stated otherwise.

Subject

Bicluster analysis

Biomedical data

Multi-objective evolutionary optimization

Gene expression data

Image data

Social networking
