Gene Expression Data Clustering and Visualization based on a Binary Hierarchical Clustering Framework

Author(s)
Szeto, Lap Keung
Liew, Alan Wee-Chung
Yan, Hong
Tang, Sy-sen
Year published
2003
Abstract
We describe the use of a binary hierarchical clustering (BHC) framework for clustering gene expression data. The BHC algorithm involves two major steps. First, the K-means algorithm is used to split the data into two classes. Second, the Fisher criterion is applied to the two classes to assess whether the split is acceptable. The algorithm is applied recursively to the sub-classes and ends when no cluster can be split any further. BHC does not require the number of clusters to be known in advance, and it makes no assumption about the number of samples in each cluster or the class distribution. The hierarchical framework naturally leads to a tree structure representation. We show that by arranging the BHC-clustered gene expression data in a tree structure, we can easily visualize the clustering results. In addition, the tree structure display allows user judgement, informed by prior biological knowledge, in finalizing the clustering result.
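The two steps in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the exact form of the Fisher criterion, the acceptance `threshold`, the `min_size` stopping rule, the `ridge` regularizer, and all function names are assumptions made for illustration. Here the Fisher score is taken as the two-class separability along the Fisher discriminant direction, and the threshold value is arbitrary.

```python
import numpy as np


def two_means(X, n_iter=50, seed=0):
    """Plain 2-means (Lloyd's algorithm). Returns a boolean mask for class 1."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=2, replace=False)]
    labels = np.zeros(len(X), dtype=bool)
    for it in range(n_iter):
        # Squared distance from every point to both centers.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d[:, 1] < d[:, 0]
        if not new_labels.any() or new_labels.all():
            return new_labels  # degenerate split; the caller rejects it
        centers = np.stack([X[~new_labels].mean(axis=0),
                            X[new_labels].mean(axis=0)])
        if it > 0 and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
    return labels


def fisher_score(A, B, ridge=1e-6):
    """Separability of two classes: diff^T Sw^{-1} diff (assumed form)."""
    diff = A.mean(axis=0) - B.mean(axis=0)
    Ac, Bc = A - A.mean(axis=0), B - B.mean(axis=0)
    # Pooled within-class scatter; the ridge term keeps it invertible.
    Sw = (Ac.T @ Ac + Bc.T @ Bc) / (len(A) + len(B))
    Sw = Sw + ridge * np.eye(A.shape[1])
    return float(diff @ np.linalg.solve(Sw, diff))


def bhc(X, indices=None, threshold=20.0, min_size=4):
    """Recursively split X; returns a tree of {'indices', 'children'} nodes."""
    if indices is None:
        indices = np.arange(len(X))
    node = {"indices": indices, "children": []}
    if len(indices) < 2 * min_size:
        return node
    mask = two_means(X[indices])
    left, right = indices[~mask], indices[mask]
    if len(left) < min_size or len(right) < min_size:
        return node
    # Step 2: keep the split only if the classes are well separated.
    if fisher_score(X[left], X[right]) < threshold:
        return node
    node["children"] = [bhc(X, left, threshold, min_size),
                        bhc(X, right, threshold, min_size)]
    return node


def leaves(node):
    """Collect the index arrays of the final clusters (the tree's leaves)."""
    if not node["children"]:
        return [node["indices"]]
    return [leaf for child in node["children"] for leaf in leaves(child)]
```

Calling `bhc(X)` on a samples-by-features array returns a nested tree whose leaves are the final clusters. The number of clusters is never passed in, which mirrors the abstract's point that BHC does not require it. The tree itself can be inspected or pruned by hand, in the spirit of the user judgement the abstract mentions.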
Conference Title
Proceedings of the First Asia-Pacific Bioinformatics Conference
Copyright Statement
© 2003 Australian Computer Society Inc. The attached file is posted here in accordance with the copyright policy of the publisher, for your personal use only. No further distribution permitted. Use hypertext link for access to the conference website.