dc.contributor.advisor: Estivill-Castro, Vladimir
dc.contributor.author: Lombardi, Matteo
dc.date.accessioned: 2019-06-06T22:29:20Z
dc.date.available: 2019-06-06T22:29:20Z
dc.date.issued: 2018-10
dc.identifier.doi: 10.25904/1912/1498
dc.identifier.uri: http://hdl.handle.net/10072/385189
dc.description.abstract: The increasing trend of sharing educational resources on the World Wide Web has attracted several contributions from the research community. Since most Technology Enhanced Learning users retrieve resources from the Web for teaching or learning, it is clear that the Web is a source of educational material. Therefore, it should be possible to use the Web as a repository for teaching resources. Regarding the retrieval of online resources, a major issue is that the Web is a huge and mostly unorganised space. Hence, there is no guarantee that items retrieved by current search engines are appropriate for educational uses. Automatically identifying Web-content that is suitable and usable for education is one of the most challenging objectives because it requires extraordinary attention. Indeed, an inappropriate recommendation in such a field may result in reduced learning outcomes by students in assignments and exams or, even worse, in teachers building their courses on incorrect or incomplete foundations. Studies in Information Retrieval and Technology Enhanced Learning have proposed several solutions to support the teaching and learning needs of instructors and pupils within an enclosed platform. Other studies offer different techniques for collecting Web resources that have specific characteristics. However, to the best of our knowledge, none of the current state-of-the-art proposals has paid attention to gathering Web resources that can be used for learning or teaching, without any restriction on topic or terminology. Personalisation has also improved Web-search by identifying what topics users prefer, and some progress has been achieved in deducing the purpose of the search (e.g., the user is about to book a trip) for tailored advertising; however, this is a very different use of recommendation. Instead, we focus here on identifying documents with a purpose in the sense of being of value for a learning objective.
This contribution is built on the rationale that the classification of textual materials and natural language processing are strictly related. Thus, we propose to involve natural language processing methods to analyse the content of Web-pages suitable for inclusion in teaching and learning environments. In the field of the Semantic Web, it is common to apply Information Retrieval from classified online pages. The rapid expansion of the Web creates an ever-increasing demand for faster and yet reliable filtering of Web-pages, according to the information needs of users and aiming to avoid displaying irrelevant and harmful content. The accuracy of the classification is not the only difficulty when applying Information Retrieval techniques to the sheer volume of documents hosted on the World Wide Web. Accessing the most valuable data as quickly as possible raises further research questions about the trade-off between accuracy and the computational time required by a Web-page classifier. Another characteristic of Web-pages is the multitude of traits (features to be used as independent variables) that may be used for their description. The number of attributes has a significant impact on the speed of the classifier. Therefore, managing a broad set of features is not desirable, because it brings up the issues associated with the curse of dimensionality. Well-cited studies from researchers in Information Retrieval and Knowledge Management focus on handling the typically large number of features of items and examine the balance between reliability and speed. There are a variety of methods that can be applied to most of the existing classification problems for reducing the feature space, namely feature-selection and feature-reduction algorithms. However, an improper feature selection may further degrade performance in real-time classification, now an essential aspect in many Web-based applications.
For crawling Web-pages tailored to pedagogical purposes, we firmly believe it is fundamental to identify which online resources could be potentially useful for teaching and learning. Our primary motivation is to improve the support offered by Technology Enhanced Learning systems to learners and educators during their educational tasks, providing straightforward access to a huge dataset of potential educational resources extracted from the Web. We propose a technique for deducing educational semantic information about potential educational resources on the Web by analysing their content and structure, e.g., page title, body, links, and highlights. Then, the Dandelion API, a tool for extracting semantic entities from a text, is used for analysing the textual content of each section. We propose to use a framework introduced in a previous contribution for performing Feature Selection, where several state-of-the-art algorithms are grouped in an ensemble. Such an ensemble of algorithms has the purpose of combining the many different aspects analysed by each of the methods. The outcomes of the algorithms are combined into a score that represents the importance of every single feature. Such a scoring process allows producing a feature ranking. As a result, the framework enables the reduction of the features set to only a few comprehensive attributes. We incorporate semantic technologies when processing natural language to elicit more than 100 features computed directly from the text of Web-resources. After that, we analyse our features to discover which of these become attributes that permit a clear distinction between resources suitable for education and those not suitable. The resulting features set is evaluated by performing a binary classification of items in our dataset of more than 2,300 Web-pages obtained from the SeminarsOnly website (http://www.seminarsonly.com), and other sources identified as relevant for teaching by surveying human instructors.
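The abstract does not state how the ensemble combines the outcomes of its feature-selection algorithms, so the following is only a minimal sketch of one plausible scheme: each selector's raw scores are min-max normalised and then averaged into a single importance score per feature. The feature names, the raw scores, and the helper functions `min_max` and `ensemble_ranking` are all hypothetical and are not taken from the thesis.

```python
from statistics import mean


def min_max(scores):
    """Rescale one selector's raw scores to [0, 1] so selectors are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # A selector that scores every feature equally carries no ranking signal.
        return {name: 0.0 for name in scores}
    return {name: (s - lo) / (hi - lo) for name, s in scores.items()}


def ensemble_ranking(selector_outputs, top_k):
    """Average the normalised scores across selectors and keep the top_k features.

    Averaging is an assumption made for illustration; the thesis may combine
    the outcomes differently.
    """
    normalised = [min_max(scores) for scores in selector_outputs]
    features = normalised[0].keys()
    combined = {f: mean(n[f] for n in normalised) for f in features}
    ranked = sorted(combined, key=combined.get, reverse=True)
    return ranked[:top_k], combined


# Hypothetical raw scores from three selectors over four made-up features.
selector_outputs = [
    {"title_entities": 0.9, "body_entities": 0.6, "link_count": 0.1, "highlight_ratio": 0.3},
    {"title_entities": 12.0, "body_entities": 15.0, "link_count": 2.0, "highlight_ratio": 5.0},
    {"title_entities": 0.4, "body_entities": 0.5, "link_count": 0.05, "highlight_ratio": 0.2},
]

top_features, scores = ensemble_ranking(selector_outputs, top_k=2)
print(top_features)
```

Normalising before averaging keeps a selector with a large numeric range from dominating the combined score, which is one reason a rank- or scale-free combination is often preferred in ensembles of this kind.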
We built such a dataset labelling the aforementioned educational Web-pages as "relevant for education". Then, we labelled as "non-relevant for education" pages crawled from the former DMOZ Web directory, currently known as Curlie (https://curlie.org), for a total of more than 5,600 labelled Web-pages. Our evaluation covers learning with several representatives of the state-of-the-art of classification algorithms. We then apply Student's t-test to strengthen the validity of the features set deduced in this study. The t-test confirms that all the features are essential for achieving the best accuracy in our filtering task when using any of the classifiers. Then, the framework is evaluated in a filtering task performed on the same dataset, comparing our proposal on both accuracy and speed against popular algorithms for feature selection and feature reduction. In both aspects, our framework outperforms current feature reduction algorithms, achieving more accurate and faster classification of Web-pages in several scenarios. So, we can declare our framework suitable to be used in a purpose-driven crawling task. Smart systems in Technology Enhanced Learning can use our proposal for retrieving an enormous amount of resources and information ready to be used for educational purposes. For example, recommender systems in Technology Enhanced Learning would benefit from the result of this study for suggesting educational resources for both building and improving courses, significantly enhancing the support provided to teachers and students.
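The abstract does not describe how the t-test is set up, so the sketch below shows only one common arrangement, assumed here for illustration: a paired Student's t-test over per-fold accuracies, comparing a classifier trained on the full features set against the same classifier with one feature removed. The accuracy values are invented, and `paired_t_statistic` is a hypothetical helper, not part of the thesis.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(a, b):
    """Paired Student's t statistic for two equal-length samples.

    Raises ZeroDivisionError if all paired differences are identical.
    """
    diffs = [x - y for x, y in zip(a, b)]
    standard_error = stdev(diffs) / sqrt(len(diffs))
    return mean(diffs) / standard_error


# Invented per-fold accuracies over five cross-validation folds.
full_set = [0.95, 0.94, 0.96, 0.95, 0.93]        # all selected features
one_removed = [0.91, 0.92, 0.93, 0.90, 0.90]     # same folds, one feature dropped

t = paired_t_statistic(full_set, one_removed)

# Two-sided critical value of Student's t at the 0.05 level with 4 degrees of freedom.
CRITICAL_T_DF4 = 2.776
feature_matters = abs(t) > CRITICAL_T_DF4
print(round(t, 2), feature_matters)
```

Repeating such a comparison for each feature in turn would be one way to check that dropping any single feature significantly reduces accuracy, in the spirit of the claim that all the retained features are essential.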
dc.language: English
dc.language.iso: en
dc.publisher: Griffith University
dc.publisher.place: Brisbane
dc.subject.keywords: Educational resources
dc.subject.keywords: World wide web
dc.subject.keywords: Information retrieval
dc.subject.keywords: Algorithms
dc.title: Discovering Educational Resources on the Web for Technology Enhanced Learning Applications
dc.type: Griffith thesis
gro.faculty: Science, Environment, Engineering and Technology
gro.rights.copyright: The author owns the copyright in this thesis, unless stated otherwise.
gro.hasfulltext: Full Text
dc.contributor.otheradvisor: Venema, Sven
dc.contributor.otheradvisor: Torrisi, Rosaria
dc.contributor.otheradvisor: Limongelli, Carla
gro.thesis.degreelevel: Thesis (PhD Doctorate)
gro.thesis.degreeprogram: Doctor of Philosophy (PhD)
gro.department: School of Info & Comm Tech
gro.griffith.author: Lombardi, Matteo