dc.contributor.author | Alneyadi, Sultan | |
dc.contributor.author | Sithirasenan, Elankayer | |
dc.contributor.author | Muthukkumarasamy, Vallipuram | |
dc.contributor.editor | Susilo, W | |
dc.contributor.editor | Mu, Y | |
dc.date.accessioned | 2019-02-15T12:32:05Z | |
dc.date.available | 2019-02-15T12:32:05Z | |
dc.date.issued | 2014 | |
dc.identifier.issn | 0302-9743 | |
dc.identifier.doi | 10.1007/978-3-319-08344-5_27 | |
dc.identifier.uri | http://hdl.handle.net/10072/112649 | |
dc.description.abstract | Data leakage prevention (DLP) is an emerging subject in the field of information security. It deals with tools working under a central policy, which analyze networked environments to detect sensitive data, prevent unauthorized access to it and block channels associated with data leakage. This requires special data classification capabilities to distinguish between sensitive and normal data. Not only does this task need prior knowledge of the sensitive data, but it also requires knowledge of potentially evolved and unknown data. Most current DLP systems use content-based analysis to detect sensitive data, mainly through regular expressions and data fingerprinting. Although these content analysis techniques are robust in detecting known, unmodified data, they usually become ineffective if the sensitive data is not known in advance or has been heavily modified. In this paper we study the effectiveness of N-gram based statistical analysis, supported by the use of stem words, in classifying documents according to their topics. The results are promising, with an overall classification accuracy of 92%. We also discuss how classification deteriorates when the text is exposed to multiple spins that simulate data modification. | |
dc.description.peerreviewed | Yes | |
dc.language | English | |
dc.language.iso | eng | |
dc.publisher | Springer | |
dc.publisher.place | Switzerland | |
dc.relation.ispartofpagefrom | 413 | |
dc.relation.ispartofpageto | 421 | |
dc.relation.ispartofjournal | Lecture Notes in Computer Science | |
dc.relation.ispartofvolume | 8544 | |
dc.subject.fieldofresearch | Other information and computing sciences not elsewhere classified | |
dc.subject.fieldofresearchcode | 469999 | |
dc.title | A Semantics-Aware Classification Approach for Data Leakage Prevention | |
dc.type | Journal article | |
dc.type.description | C1 - Articles | |
dc.type.code | C - Journal Articles | |
gro.faculty | Griffith Sciences, School of Information and Communication Technology | |
gro.hasfulltext | No Full Text | |
gro.griffith.author | Muthukkumarasamy, Vallipuram | |
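The record gives no implementation details for the N-gram approach named in the abstract, so the following is only a minimal Python sketch of one common form of N-gram statistical text classification: rank-ordered character N-gram profiles compared with Cavnar and Trenkle's "out-of-place" distance. It assumes character N-grams taken over stemmed words; the paper may use a different N-gram unit or similarity measure. The function names (`stem`, `ngram_profile`, `out_of_place`, `classify`), the crude suffix list standing in for a real stemmer, the 300-gram profile size and the toy training sentences are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical crude suffix stripper standing in for a real stemmer
# (e.g. Porter's); it removes only a few common English endings.
SUFFIXES = ("ing", "ed", "es", "s")


def stem(word: str) -> str:
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def ngram_profile(text: str, n_max: int = 3, top: int = 300) -> dict:
    """Map the `top` most frequent character 1..n_max-grams of the
    stemmed words to their frequency rank (0 = most frequent)."""
    counts = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        padded = f"_{stem(word)}_"  # underscores mark word boundaries
        for n in range(1, n_max + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i : i + n]] += 1
    return {gram: rank for rank, (gram, _) in enumerate(counts.most_common(top))}


def out_of_place(doc_profile: dict, cat_profile: dict, max_penalty: int = 300) -> int:
    """Sum of rank displacements between the two profiles, with a fixed
    penalty for document N-grams missing from the category profile."""
    return sum(
        abs(rank - cat_profile[gram]) if gram in cat_profile else max_penalty
        for gram, rank in doc_profile.items()
    )


def classify(text: str, category_profiles: dict) -> str:
    """Return the category whose profile is closest to the text's profile."""
    doc = ngram_profile(text)
    return min(category_profiles, key=lambda c: out_of_place(doc, category_profiles[c]))


# Toy "training" texts, one per topic (hypothetical examples).
finance = ("The bank reported quarterly earnings and interest payments on loans. "
           "Investors traded shares and bonds.")
medical = ("The patient was diagnosed with a chronic infection. "
           "Doctors prescribed antibiotics in the hospital clinic.")
profiles = {"finance": ngram_profile(finance), "medical": ngram_profile(medical)}

print(classify(finance, profiles))  # → finance (its own profile is at distance 0)
```

One reason to compare rank profiles rather than exact fingerprints is that changing a few words shifts only some N-gram ranks, so the distance tends to grow gradually as text is modified; a meaningful evaluation would need far larger training corpora than these one-sentence examples.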