Scalability in Formal Concept Analysis

Cole, Richard
Eklund, Peter
Griffith University Author(s)

Formal Concept Analysis is a symbolic learning technique derived from algebra and order theory. The technique has been applied to a broad range of knowledge representation and exploration tasks in a number of domains. Most recorded applications of Formal Concept Analysis deal with a small number of objects and attributes, in which case the complexity of the algorithms used for indexing and retrieving data is not a significant issue. However, when Formal Concept Analysis is applied to the exploration of large numbers of objects and attributes, the size of the data makes complexity and scalability crucial.

This paper presents the results of experiments carried out on a set of 4,000 medical discharge summaries, in which 1,962 attributes from the Unified Medical Language System (UMLS) were recognized. In this domain, the objects are the medical documents and the attributes are the UMLS terms extracted from them. When Formal Concept Analysis is used to iteratively analyze and visualize these data, complexity and scalability become critically important.
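To make the object and attribute roles concrete, the following is a minimal sketch of a formal context and its two derivation operators. The document identifiers and terms are hypothetical toy data, not drawn from the discharge summaries, and the function names are illustrative only.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical toy context: document id -> set of UMLS-style terms.
context: Dict[str, Set[str]] = {
    "doc1": {"asthma", "cough"},
    "doc2": {"asthma", "cough", "fever"},
    "doc3": {"fever"},
}


def common_attributes(objects: Set[str]) -> FrozenSet[str]:
    """Attributes shared by every object in the given set (an intent)."""
    result = set().union(*context.values())  # start from all attributes
    for obj in objects:
        result &= context[obj]
    return frozenset(result)


def common_objects(attributes: Set[str]) -> FrozenSet[str]:
    """Objects that carry every attribute in the given set (an extent)."""
    return frozenset(o for o, attrs in context.items() if attributes <= attrs)


# A formal concept is a pair (extent, intent) closed under both operators.
extent = common_objects({"asthma"})
intent = common_attributes(set(extent))
```

In this toy context, every document mentioning "asthma" also mentions "cough", so the concept generated by "asthma" has both terms in its intent.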

Although the amount of data used in this experiment is small compared with the size of primary memory in modern computers, the results remain relevant because the probability distributions that determine the efficiencies are likely to remain stable as the size of the data increases.

Our work presents two outcomes. First, we present a methodology for exploring knowledge in text documents using Formal Concept Analysis by employing conceptual scales created through direct manipulation of a line diagram. The conceptual scales lead to small derived purified contexts that are represented using nested line diagrams. Second, we present an algorithm for the fast determination of purified contexts from a compressed representation of the large formal context. Our work draws on existing encoding and compression techniques to show how rudimentary data analysis can lead to substantial efficiency improvements in knowledge visualization.
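The idea of deriving a small context from a large one can be sketched as follows. This is not the algorithm from the paper: a plain inverted index stands in for the compressed representation, and "purified" is assumed here to mean that documents with identical rows over the scale attributes are merged into one group. All names and data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Set


def build_inverted_index(docs: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each term to the set of documents that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc, terms in docs.items():
        for term in terms:
            index[term].add(doc)
    return dict(index)


def derived_context(
    index: Dict[str, Set[str]],
    scale_attrs: Iterable[str],
    all_docs: Iterable[str],
) -> Dict[FrozenSet[str], List[str]]:
    """Group documents by the subset of scale attributes they carry.

    Only the posting lists of the scale attributes are read, so the cost
    depends on the scale rather than on the full attribute set.
    """
    rows: Dict[str, Set[str]] = {doc: set() for doc in all_docs}
    for attr in scale_attrs:
        for doc in index.get(attr, ()):
            rows[doc].add(attr)
    groups: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for doc in sorted(rows):
        groups[frozenset(rows[doc])].append(doc)
    return dict(groups)


# Hypothetical toy data.
docs = {
    "doc1": {"asthma", "cough"},
    "doc2": {"asthma", "cough", "fever"},
    "doc3": {"fever"},
    "doc4": {"asthma", "cough"},
}
index = build_inverted_index(docs)
small = derived_context(index, ["asthma", "fever"], docs.keys())
```

Each key of `small` is one distinct row of the derived context, and its value lists the documents that share that row, so the number of rows is bounded by the scale rather than by the number of documents.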

Journal Title
Computational Intelligence
Artificial Intelligence and Image Processing
Computation Theory and Mathematics
Information Systems