A Semantics-Aware Classification Approach for Data Leakage Prevention
Author(s)
Sithirasenan, Elankayer
Muthukkumarasamy, Vallipuram
Editor(s)
Susilo, W
Mu, Y
Abstract
Data leakage prevention (DLP) is an emerging subject in the field of information security. It deals with tools that operate under a central policy and analyze networked environments to detect sensitive data, prevent unauthorized access to it, and block channels associated with data leakage. This requires special data classification capabilities to distinguish between sensitive and normal data. The task requires not only prior knowledge of the sensitive data but also knowledge of potentially evolved and unknown data. Most current DLP systems use content-based analysis to detect sensitive data, mainly through regular expressions and data fingerprinting. Although these content analysis techniques are robust in detecting known, unmodified data, they usually become ineffective if the sensitive data was not previously known or has been heavily modified. In this paper we study the effectiveness of N-gram based statistical analysis, supported by the use of stem words, in classifying documents according to their topics. The results are promising, with an overall classification accuracy of 92%. We also discuss how classification deteriorates when the text is exposed to multiple spins that simulate data modification.
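The abstract does not give implementation details, so the following is only a minimal sketch of how topic classification with stemmed N-grams could look. The suffix list, the cosine-similarity scoring, the two-topic training texts, and all function names are assumptions made for illustration; they are not the authors' method, and a real system would likely use an established stemmer and a much larger corpus.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately crude suffix rules; a stand-in for a real
# stemmer such as Porter's algorithm.
SUFFIXES = ("ing", "ed", "es", "s")


def stem(word):
    """Strip at most one common suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def ngram_profile(text, n_max=2):
    """Count stemmed word N-grams (N = 1..n_max) found in the text."""
    stems = [stem(w) for w in re.findall(r"[a-z]+", text.lower())]
    profile = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(stems) - n + 1):
            profile[" ".join(stems[i:i + n])] += 1
    return profile


def cosine(a, b):
    """Cosine similarity between two N-gram count profiles."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(text, topic_profiles):
    """Return the topic whose profile is most similar to the text."""
    doc = ngram_profile(text)
    return max(topic_profiles, key=lambda t: cosine(doc, topic_profiles[t]))


# Toy training data (assumed, for illustration only).
TRAINING = {
    "finance": "The bank reported quarterly earnings and the investors traded shares",
    "medical": "The patient received treatment and the doctors reviewed the diagnosis",
}
TOPIC_PROFILES = {topic: ngram_profile(text) for topic, text in TRAINING.items()}

print(classify("Investors are trading bank shares", TOPIC_PROFILES))
print(classify("Doctors treated the patient", TOPIC_PROFILES))
```

Because "traded" and "trading" reduce to the same stem in this sketch, a reworded sentence can still share N-grams with the training text, which is the kind of tolerance to modification that exact fingerprints and fixed regular expressions lack.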
Journal Title
Lecture Notes in Computer Science
Volume
8544
Subject
Other information and computing sciences not elsewhere classified