Bicluster Analysis for Coherent Pattern Discovery
Author(s)
Liew, Alan Wee-Chung
Gan, Xiangchao
Law, Ngai-Fong
Yan, Hong
Griffith University Author(s)
Year published
2015
Abstract
In unsupervised data mining, one is usually interested in discovering groups of data that exhibit a certain kind of coherency. A classical technique for unsupervised data partitioning is cluster analysis, where objects are sorted into groups in such a way that the degree of association between two objects is maximal if they belong to the same group and minimal otherwise. Cluster analysis has been applied to many classification problems. In (Wu, Liew, & Yan, 2004), clustering is applied to find natural groupings in the data. In (Borland, Hirschberg, & Lye, 2001), clustering is used for data reduction, where a group of similar objects is summarized by a representative sample in the group. Recently, clustering has been applied extensively in gene expression data analysis. In gene expression data, the objects along the row dimension correspond to genes or other DNA sequences, and the attributes in the column dimension correspond to cDNA microarray experiments or time point samples. Clustering in the row direction, or gene-wise clustering, has been done, for example, on yeast and human cell gene expression data (Spellman, Sherlock, Zhang, et al., 1998; Eisen, Spellman, Brown, & Botstein, 1998), whereas clustering in the column direction, or sample-wise clustering, has been done, for example, for cancer type classification (Golub, Slonim, Tamayo, et al., 1999; Klein, Tu, Stolovitzky, et al., 2001). However, in many real-world data sets, not all attributes of an object are relevant for grouping the objects into meaningful classes. In many cases, some attributes are relevant to only some of the clusters, and different clusters may have different relevant subsets of attributes. By relaxing the constraint that related objects must behave similarly across the entire set of attributes, biclustering considers only a relevant subset of attributes when looking for similarity between objects.
In this article, we give an overview of the biclustering problem, discuss some common biclustering algorithms, and highlight some interesting applications of biclustering.
Book Title
Encyclopaedia of Information Science and Technology
Volume
8
Subject
Pattern Recognition and Data Mining