Show simple item record

dc.contributor.authorZhang, Libiao
dc.contributor.authorLi, Yuefeng
dc.contributor.authorXu, Yue
dc.contributor.authorTjondronegoro, Dian
dc.contributor.authorSun, Chao
dc.date.accessioned2020-01-14T23:15:08Z
dc.date.available2020-01-14T23:15:08Z
dc.date.issued2014
dc.identifier.isbn9781479969913
dc.identifier.doi10.1109/DSAA.2014.7058104
dc.identifier.urihttp://hdl.handle.net/10072/390270
dc.description.abstractTraditional text classification technology based on machine learning and data mining techniques has made considerable progress. However, it remains difficult to draw an exact decision boundary between relevant and irrelevant objects in binary classification, because the traditional algorithms produce a great deal of uncertainty. The proposed model, CTTC (Centroid Training for Text Classification), aims to build an uncertainty boundary that absorbs as many indeterminate objects as possible, so as to raise the certainty of the relevant and irrelevant groups through a centroid clustering and training process. The clustering starts from the two training subsets, labelled relevant and irrelevant respectively, and creates two principal centroid vectors by which all the training samples are further separated into three groups: POS, NEG and BND, with all the indeterminate objects absorbed into the uncertain decision boundary BND. Two pairs of centroid vectors are then trained and optimized through a subsequent iterative multi-learning process, and all of them collaboratively help predict the polarities of incoming objects. For the assessment of the proposed model, F1 and Accuracy have been chosen as the key evaluation measures. We stress the F1 measure because it displays the overall performance improvement of the final classifier better than Accuracy does. A large number of experiments have been completed using the proposed model on the Reuters Corpus Volume 1 (RCV1), an important standard dataset in the field. The experimental results show that the proposed model significantly improves binary text classification performance in both F1 and Accuracy compared with three other influential baseline models.
dc.languageEnglish
dc.publisherIEEE
dc.relation.ispartofconferencename2014 International Conference on Data Science and Advanced Analytics (DSAA)
dc.relation.ispartofconferencetitle2014 International Conference on Data Science and Advanced Analytics (DSAA)
dc.relation.ispartofdatefrom2014-10-30
dc.relation.ispartofdateto2014-11-01
dc.relation.ispartoflocationShanghai, China
dc.subject.fieldofresearchArtificial Intelligence and Image Processing
dc.subject.fieldofresearchcode0801
dc.subject.keywordsScience & Technology
dc.subject.keywordsEngineering, Electrical & Electronic
dc.subject.keywordsText classification
dc.titleCentroid Training to Achieve Effective Text Classification
dc.typeConference output
dc.type.descriptionE2 - Conferences (Non Refereed)
dcterms.bibliographicCitationZhang, L; Li, Y; Xu, Y; Tjondronegoro, D; Sun, C, Centroid Training to Achieve Effective Text Classification, 2014 International Conference on Data Science and Advanced Analytics (DSAA), 2014, pp. 406-412
dc.date.updated2020-01-14T23:11:50Z
dc.description.versionAccepted Manuscript (AM)
gro.rights.copyright© 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
gro.hasfulltextFull Text
gro.griffith.authorTjondronegoro, Dian W.
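
Illustrative note on the abstract: the record does not give the rule CTTC uses to assign samples to POS, NEG and BND, nor its iterative training step. The following is only a minimal sketch of the general idea of a centroid-based three-way split, assuming cosine similarity on non-zero document vectors and a hypothetical `margin` threshold; the function names and the threshold are illustrative and are not taken from the paper. The F1 and Accuracy helper uses the standard definitions of those two measures.

```python
import math


def normalize(v):
    """Scale a non-zero vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def centroid(vectors):
    """Unit-length mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return normalize(mean)


def split_three_way(docs, labels, margin=0.2):
    """Assign each document to "POS", "NEG" or "BND".

    labels: 1 for relevant, 0 for irrelevant.
    margin: hypothetical threshold (an assumption, not from the paper).
    A document whose similarity to the two centroids differs by no more
    than `margin` is treated as indeterminate and placed in BND.
    """
    docs = [normalize(d) for d in docs]
    c_pos = centroid([d for d, y in zip(docs, labels) if y == 1])
    c_neg = centroid([d for d, y in zip(docs, labels) if y == 0])
    groups = []
    for d in docs:
        diff = dot(d, c_pos) - dot(d, c_neg)
        if diff > margin:
            groups.append("POS")
        elif diff < -margin:
            groups.append("NEG")
        else:
            groups.append("BND")
    return groups


def f1_and_accuracy(y_true, y_pred):
    """Standard binary F1 (positive class = 1) and accuracy."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return f1, correct / len(y_true)
```

For example, `split_three_way([[1, 0], [0, 1], [1, 1]], [1, 0, 1])` would place the first two toy vectors near their own centroids and leave the third, which lies between the two centroids, as a candidate for BND depending on the chosen margin.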


This item appears in the following Collection(s)

  • Conference outputs
    Contains papers delivered by Griffith authors at national and international conferences.
