Correcting Response Failure Errors in Multi-Objective Optimisation in Unreliable Distributed Computing Environments
Abstract
Population-based, multi-objective optimisation algorithms are increasingly making use of distributed, parallel computing environments. In these cases, it is a common-sense precaution to consider the possibility of a variety of failures. In particular, errors caused by response failures are more likely to arise in such environments than in homogeneous parallel computers. While masking errors using redundant computation is simple and reasonably reliable, it is expensive in terms of the computing resources required. An alternative approach is presented that uses a Byzantine agreement methodology, utilising only results already computed. In computational experiments, it demonstrated an ability to correct errors and salvage usable results from unreliable, distributed computing environments. With increasing reliance on computing resources provided and operated by external agencies, error detection and correction can be expected to become more important to a range of applications.
Conference Title
First Russia and Pacific Conference on Computer Technology and Applications (RPC2010) Proceedings
Copyright Statement
© 2010 Academic Alliance International. The attached file is reproduced here in accordance with the copyright policy of the publisher. Please refer to the conference's website for access to the definitive, published version.
Subject
Optimisation
Computer Software not elsewhere classified