    Protocols and Data Structures for Knowledge Discovery on Distributed Private Databases

    View/Open
    Amirbekyan_2007_02Thesis.pdf (810.8Kb)
    Author(s)
    Amirbekyan, Artak
    Primary Supervisor
    Estivill-Castro, Vladimir
    Other Supervisors
    Topor, Rodney
    Year published
    2007
    Abstract
    Data mining has developed many techniques for automatic analysis of today’s rapidly collected data. Yahoo collects 12 TB of query logs daily, which is a quarter of what Google collects. For many important problems, the data is collected in distributed form by different institutions and organisations, and it can relate to businesses and individuals. The accuracy of the knowledge that data mining brings to decision making depends on considering the collective datasets that describe a phenomenon. But privacy, confidentiality and trust emerge as major issues in the analysis of datasets partitioned among competitors, governments and other data holders that have conflicts of interest. Managing privacy is of the utmost importance in the emergent applications of data mining. For example, data mining has been identified as one of the most useful tools for the global collective fight on terror and crime [80]. Parties holding partitions of the database are very interested in the results, but may not trust the others with their data, or may be reluctant to release their data freely without some assurances regarding privacy. Data mining technology that reveals patterns in large databases could compromise information that an individual or an organisation regards as private. The aim is to find the right balance between maximising analysis results (that are useful for each party) and keeping the inferences that disclose private information about organisations or individuals to a minimum. We address two core data analysis tasks, namely clustering and regression. For these to be solvable in the privacy context, we focus on the efficiency and practicality of the protocols. Because associative queries are central to clustering (and to many other data mining tasks), we provide protocols for privacy-preserving k-nearest neighbour (k-NN) queries.

    Our methods improve on previous methods for k-NN queries in privacy-preserving data mining (which are based on Fagin’s A0 algorithm) because we leak at least an order of magnitude fewer candidates and we achieve logarithmic performance on average. Our methods for k-NN queries rest on two pillars: data structures and metrics. This thesis provides protocols for the privacy-preserving computation of various common metrics and for the construction of the necessary data structures. We present new algorithms for secure multiparty computation of some basic operations (such as a new solution for Yao’s comparison problem and new protocols to perform linear algebra, in particular the scalar product). These algorithms are used to construct protocols for different metrics (we provide protocols for all Minkowski metrics, the cosine metric and the chessboard metric) and to perform associative queries in the privacy context. To be efficient, our protocols for associative queries are supported by specific data structures. Thus, we present the construction of privacy-preserving data structures such as R-Trees [42, 7], KD-Trees [8, 53, 33] and the SASH [8, 60]. We demonstrate the use of all these tools, and we provide a new version of the well-known clustering algorithm DBSCAN [42, 7] that is suitable for applications that demand privacy. Similarly, we apply our machinery to provide new multi-linear regression protocols that are suitable for privacy applications. Our algorithms are more efficient than earlier methods and protocols. In particular, ensuring privacy adds only a linear-cost overhead for most of the protocols presented here. That is, our methods are essentially as costly as concentrating all the data at one site, performing the data-mining task, and disregarding privacy. However, in some cases we make use of a trusted third party. This is not a problem when more than two parties are involved, since there is always one party that can act as the third.
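
As a rough illustration of why the Minkowski and chessboard metrics named in the abstract suit partitioned data, the sketch below assumes a vertical partitioning (each party holds different attributes of the same two records); the abstract does not state which partitioning the thesis uses. Each party reduces its own attributes to a single number, and the distance follows from those numbers alone. This is not the thesis's protocol and it is not privacy-preserving: the partial results are combined in the clear, whereas a real protocol would combine them with secure multiparty computation. All function and variable names are hypothetical.

```python
import math
from typing import Sequence


def local_power_sum(x: Sequence[float], y: Sequence[float], p: float) -> float:
    """One party's contribution to a Minkowski distance: sum of |x_i - y_i|^p
    over the attributes that this party holds."""
    return sum(abs(a - b) ** p for a, b in zip(x, y))


def local_max_gap(x: Sequence[float], y: Sequence[float]) -> float:
    """One party's contribution to the chessboard distance: the largest
    absolute difference over the attributes that this party holds."""
    return max(abs(a - b) for a, b in zip(x, y))


def minkowski_from_parts(parts: Sequence[float], p: float) -> float:
    """Combine per-party power sums into the Minkowski distance of order p."""
    return sum(parts) ** (1.0 / p)


def chessboard_from_parts(parts: Sequence[float]) -> float:
    """Combine per-party maxima into the chessboard (Chebyshev) distance."""
    return max(parts)


# Assumed toy data: two records with four attributes each.
# Party A holds attributes 0-1, party B holds attributes 2-3.
x_a, y_a = [1.0, 2.0], [4.0, 6.0]
x_b, y_b = [3.0, 4.0], [3.0, 1.0]

# Each party computes its partial result locally.
euclid_parts = [local_power_sum(x_a, y_a, 2), local_power_sum(x_b, y_b, 2)]
manhattan_parts = [local_power_sum(x_a, y_a, 1), local_power_sum(x_b, y_b, 1)]
max_parts = [local_max_gap(x_a, y_a), local_max_gap(x_b, y_b)]

# Only the partial results are needed to obtain the full distances.
euclidean = minkowski_from_parts(euclid_parts, 2)
manhattan = minkowski_from_parts(manhattan_parts, 1)
chessboard = chessboard_from_parts(max_parts)
```

The point of the sketch is only the decomposition: the order-p power sums add across parties and the chessboard distance is a maximum of per-party maxima, so the cross-party step reduces to a secure sum or a secure comparison.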
    Thesis Type
    Thesis (PhD Doctorate)
    Degree Program
    Doctor of Philosophy (PhD)
    School
    School of Information and Communication Technology
    DOI
    https://doi.org/10.25904/1912/3194
    Copyright Statement
    The author owns the copyright in this thesis, unless stated otherwise.
    Item Access Status
    Public
    Subject
    Data mining
    Data analysis tasks
    Secure-multiparty-computation
    Computer protocols
    Knowledge discovery
    Publication URI
    http://hdl.handle.net/10072/367447
    Collection
    • Theses - Higher Degree by Research