A Semantics-Aware Classification Approach for Data Leakage Prevention

Author(s)
Alneyadi, Sultan
Sithirasenan, Elankayer
Muthukkumarasamy, Vallipuram
Editor(s)

Susilo, W

Mu, Y

Date
2014
Abstract

Data leakage prevention (DLP) is an emerging subject in the field of information security. It deals with tools that operate under a central policy and analyze networked environments to detect sensitive data, prevent unauthorized access to it, and block channels associated with data leaks. This requires special data classification capabilities to distinguish between sensitive and normal data. The task requires not only prior knowledge of the sensitive data but also knowledge of potentially evolved and unknown data. Most current DLP systems use content-based analysis to detect sensitive data, which mainly involves regular expressions and data fingerprinting. Although these content analysis techniques are robust in detecting known, unmodified data, they usually become ineffective if the sensitive data is not known in advance or has been heavily modified. In this paper, we study the effectiveness of N-gram based statistical analysis, supported by the use of stem words, in classifying documents according to their topics. The results are promising, with an overall classification accuracy of 92%. We also discuss how classification deteriorates when the text is exposed to multiple spins that simulate data modification.
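
The idea of classifying documents by topic through N-gram statistics over stemmed words can be sketched roughly as follows. This is only an illustrative sketch, not the authors' method: the suffix-stripping `stem` function, the choice of character trigrams, the cosine similarity measure, and the toy topic texts are all assumptions introduced here, since the abstract does not specify them.

```python
from collections import Counter
from typing import Dict
import math
import re

# Assumption: a crude suffix list standing in for a real stemming algorithm.
SUFFIXES = ("ing", "ed", "es", "s")


def stem(word: str) -> str:
    """Strip one common English suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def ngram_profile(text: str, n: int = 3) -> Counter:
    """Count character n-grams over the stemmed, lower-cased words of text."""
    words = [stem(w) for w in re.findall(r"[a-z]+", text.lower())]
    joined = " ".join(words)
    return Counter(joined[i : i + n] for i in range(len(joined) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(document: str, topic_profiles: Dict[str, Counter]) -> str:
    """Return the topic whose profile is most similar to the document."""
    doc_profile = ngram_profile(document)
    return max(
        topic_profiles,
        key=lambda topic: cosine(doc_profile, topic_profiles[topic]),
    )


# Toy topic profiles (hypothetical data, for illustration only).
topics = {
    "finance": ngram_profile(
        "bank loans interest rates banking credit payments accounts"
    ),
    "medicine": ngram_profile(
        "patients treated doctors hospital treatment diagnosis nurses"
    ),
}

print(classify("the bank approved the loan and credited the account", topics))
```

Because the profiles are built from stems rather than surface forms, inflected variants such as "credited" and "credit" contribute the same n-grams, which is one plausible reason stemming could help when the text has been lightly modified.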

Journal Title

Lecture Notes in Computer Science

Volume

8544

Subject

Other information and computing sciences not elsewhere classified
