Optimal and Robust Rule Set Generation

    Author(s)
    Li, Jiuyong
    Primary Supervisor
    Shen, Hong
    Topor, Rodney
    Other Supervisors
    Webb, Geoffrey
    Year published
    2002
    Abstract
    The rapidly growing volume and complexity of modern databases make technologies for describing and summarising the information they contain increasingly important. Data mining is the process of extracting implicit, previously unknown and potentially useful patterns and relationships from data, and it is widely used in industry and business applications. Rules characterise relationships among patterns in databases, and rule mining is one of the central tasks in data mining. There are fundamentally two categories of rules, namely association rules and classification rules. Traditionally, association rules are connected with transaction databases for market basket problems, and classification rules are associated with relational databases for predictions. In this thesis, we mainly focus on the use of association rules for predictions. An optimal rule set is a rule set that satisfies given optimality criteria. We study two types of optimal rule sets, the informative association rule set and the optimal class association rule set, where the informative association rule set is used for market basket predictions and the optimal class association rule set is used for classification. A robust classification rule set is a rule set that is capable of providing more correct predictions than a traditional classification rule set on incomplete test data. Mining transaction databases for association rules usually generates a large number of rules, most of which are unnecessary when used for subsequent prediction. We define a rule set for a given transaction database that is significantly smaller than the association rule set but makes the same predictions as the complete association rule set. We call this rule set the informative rule set. The informative rule set is not constrained to particular target items, and it is smaller than the non-redundant association rule set.
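As a rough illustration of the idea that many association rules add nothing to prediction, the sketch below computes support and confidence on a toy transaction database and then drops any rule for which a more general rule with the same consequent is at least as confident. The data, the function names, and the pruning criterion are all assumptions made for illustration; the abstract does not state the thesis's formal definition of the informative rule set, so this should not be read as that definition.

```python
# Toy transaction database (hypothetical data, for illustration only).
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent plus consequent) divided by support(antecedent)."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule never fires
    return support(set(antecedent) | set(consequent), transactions) / base


def prune_more_specific(rules):
    """Keep a rule only if no more general rule predicts at least as confidently.

    `rules` maps (frozenset antecedent, consequent item) -> confidence.
    ASSUMPTION: this criterion is an illustrative stand-in, not the
    thesis's definition of the informative rule set.
    """
    kept = {}
    for (antecedent, item), conf in rules.items():
        dominated = any(
            other_item == item
            and other_antecedent < antecedent  # proper subset: more general
            and other_conf >= conf
            for (other_antecedent, other_item), other_conf in rules.items()
        )
        if not dominated:
            kept[(antecedent, item)] = conf
    return kept


# Two candidate rules that both predict "milk".
RULES = {
    (frozenset({"bread"}), "milk"): confidence({"bread"}, {"milk"}, TRANSACTIONS),
    (frozenset({"bread", "butter"}), "milk"): confidence(
        {"bread", "butter"}, {"milk"}, TRANSACTIONS
    ),
}
PRUNED = prune_more_specific(RULES)
```

In this toy database the rule from {bread, butter} to milk is less confident than the more general rule from {bread} to milk, so under the assumed criterion only the general rule survives.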
We characterise the relationships between the informative rule set and the non-redundant association rule set. We present an algorithm that generates the informative rule set directly, without generating all frequent itemsets first, and that accesses the database less often than other direct methods. We show experimentally that the informative rule set is much smaller than both the association rule set and the non-redundant association rule set for a given database, and that it can be generated more efficiently. In addition, we discuss a new unsupervised discretization method for dealing with numerical attributes in general association rule mining without target specification. Based on an analysis of the strengths and weaknesses of two commonly used unsupervised numerical attribute discretization methods, we present an adaptive numerical attribute merging algorithm that is shown to be better than both methods in general association rule mining. Relational databases are usually denser than transaction databases, so mining them for class association rules, that is, association rules whose consequents are classes, may be difficult due to combinatorial explosion. Based on an analysis of the prediction mechanism, we define an optimal class association rule set to be a subset of the complete class association rule set containing all potentially predictive rules. Using this rule set instead of the complete class association rule set, we can avoid redundant computation that would otherwise be required for mining predictive association rules, and hence improve the efficiency of the mining process significantly. We present an efficient algorithm for mining optimal class association rule sets that uses upward closure properties to prune weak rules before they are actually generated.
We show theoretically that the proposed algorithm is more efficient than Apriori on dense databases, and confirm experimentally that it generates an optimal class association rule set, which is very much smaller than a complete class association rule set, in significantly less time than Apriori takes to generate the complete class association rule set. Traditional classification rule sets perform badly on test data that are not as complete as the training data. We study the problem of discovering more robust rule sets, where we say a rule set is more robust than another rule set if it is able to make more accurate predictions on test data with missing attribute values. We reveal a hierarchy of k-optimal rule sets in which a k-optimal rule set with a larger k is more robust, and all of which are more robust than a traditional classification rule set. We introduce two methods to find k-optimal rule sets, namely an optimal association rule mining approach and a heuristic approximate approach. We show experimentally that a k-optimal rule set generated by the optimal association rule mining approach performs better than one generated by the heuristic approximate approach, and that both rule sets perform significantly better than a typical classification rule set (C4.5Rules) on incomplete test data. Finally, we summarise the work discussed in this thesis and suggest some future research directions.
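The notion of robustness used above can be made concrete with a small sketch: a rule can fire only when every attribute it tests is present in the test record, so a rule set that holds alternative rules over different attributes can still predict when a value is missing. Everything here (the attribute names, the rules, their confidences, and the "highest-confidence matching rule wins" scheme) is a hypothetical illustration, not the thesis's k-optimal construction.

```python
def predict(rules, record, default=None):
    """Return the class of the highest-confidence rule that matches `record`.

    Each rule is (antecedent, label, confidence), where `antecedent` maps
    attribute -> required value. A missing attribute is simply absent from
    `record` (or None), so any rule testing it cannot fire.
    """
    best = None
    for antecedent, label, conf in rules:
        if all(record.get(attr) == value for attr, value in antecedent.items()):
            if best is None or conf > best[1]:
                best = (label, conf)
    return best[0] if best else default


def accuracy(rules, records, labels, default=None):
    """Fraction of records whose predicted class equals the true label."""
    hits = sum(predict(rules, r, default) == y for r, y in zip(records, labels))
    return hits / len(records)


# Hypothetical rule sets: the larger one adds a backup rule on another attribute.
SMALL_RULES = [({"outlook": "sunny"}, "no", 0.9)]
LARGER_RULES = SMALL_RULES + [({"humidity": "high"}, "no", 0.8)]

# Test records; the second one has lost its "outlook" value.
RECORDS = [
    {"outlook": "sunny", "humidity": "high"},
    {"humidity": "high"},
]
LABELS = ["no", "no"]

SMALL_ACC = accuracy(SMALL_RULES, RECORDS, LABELS)
LARGER_ACC = accuracy(LARGER_RULES, RECORDS, LABELS)
```

On the incomplete record the small rule set has no matching rule and falls back to the default, while the larger rule set still predicts through its backup rule, which is the kind of behaviour the robustness comparison measures.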
    Thesis Type
    Thesis (PhD Doctorate)
    Degree Program
    Doctor of Philosophy (PhD)
    School
    School of Computing and Information Technology
    DOI
    https://doi.org/10.25904/1912/2820
    Copyright Statement
    The author owns the copyright in this thesis, unless stated otherwise.
    Subject
    Data mining
    Association rule
    Classification rule
    Database management
    Publication URI
    http://hdl.handle.net/10072/366394
    Collection
    • Theses - Higher Degree by Research
