Learning in Combinatorial Constraint Optimisation
Author(s)
Primary Supervisor
Sattar, Abdul
Other Supervisors
Newton, Muhammad A
Sanderson, Conrad
Year published
2022-10-18
Metadata
Abstract
Many real-world problems can be modelled as constraint optimisation problems (COPs). Each COP comprises a set of variables with domains of values, constraints on the assignments to those variables, and an objective function to be minimised or maximised. In this thesis, we consider only combinatorial COPs, in which the variable domains are discrete. A component is a subproblem of a COP defined by specific variables, assignable values or constraints. Most practical COPs, including waste collection, mail delivery, supply chain management and the travelling thief problem (TTP), have more than one component. Existing methods for solving COPs, especially multi-component COPs, repeatedly solve the same problem or subproblem but do not take advantage of learning during the search. This research aimed to apply history memorising and online or adaptive machine learning (ML) models: the memory buffers and ML models are built, deployed and updated during the search to improve the efficacy and efficiency of solving COPs, especially multi-component ones. We have developed a history memorising method to enhance diversity and effectiveness in solving COPs. We have also developed three online ML-based methods for TTP: one coordination learning method to improve efficacy and two surrogate models to improve efficiency. Our proposed solver, CoCo, is currently the state-of-the-art TTP solver.
History memorising is an online, low-level learning method that keeps previously visited solutions, or their objective values, to avoid or escape local optima during the search. Late Acceptance Hill Climbing (LAHC) is a history memorising metaheuristic with promising performance on some COP domains. It aims to overcome the main drawback of traditional Hill Climbing (HC) search, which is often quickly trapped in a local optimum because it accepts only non-worsening moves at each iteration.
In contrast, LAHC also accepts worsening moves by keeping the objective values of previously visited solutions in a limited-size circular memory. It compares the fitness value of each candidate solution against the least recent element in the circular memory to decide whether to accept or reject it. However, we observed that whenever all values in the memory become identical, LAHC behaves like HC and gets stuck in local optima. We propose an improved form of LAHC, called Diversified Late Acceptance Search (DLAS), for solving COPs in general; compared with LAHC, DLAS usually uses a much smaller memory, converges much faster and escapes local optima much more effectively. DLAS outperforms LAHC on benchmark sets of Travelling Salesman Problem (TSP) and Quadratic Assignment Problem (QAP) instances.
TTP, which is composed of the TSP and the Knapsack Problem (KP), is an academic proxy for real-world optimisation problems such as waste collection and mail delivery. In TTP, a thief makes a cyclic tour through a set of cities while collecting profitable items scattered over the cities into a rented knapsack of limited capacity. As the weight of the knapsack increases, the thief’s speed decreases, so the tour takes longer and the renting cost increases. Solving TTP therefore means maximising profit while simultaneously minimising the renting cost, that is, maximising the difference between the total profit and the renting cost. Existing TTP solvers typically interleave the components, solving one at a time while keeping the solution of the other unchanged. This form of interleaving amounts to poor coordination between the components. In this thesis, we first show that a simple local-search-based coordination approach does not work for TTP. Then, to adequately address the interdependence between the TSP and KP components, we propose a human-designed coordination heuristic that adjusts collection plans while exploring cyclic tours.
We further propose another human-designed coordination heuristic that explicitly exploits the cyclic tours when selecting items during collection-plan exploration. Lastly, we propose an online ML-based coordination heuristic that captures the characteristics of the two human-designed heuristics while solving any TTP instance. These coordination-based approaches enable our TTP solver, cooperative coordination (CoCo), to significantly outperform existing state-of-the-art TTP solvers on a set of benchmark TTP instances.
CoCo modifies the underlying TSP and KP solutions of a TTP instance in an iterative, interleaved fashion. As in other cooperative solvers, the TSP solution (a cyclic tour) is typically changed deterministically using steepest-ascent HC search. In contrast, changes to the KP solution typically involve a random HC search, effectively resulting in a quasi-meandering exploration of the TTP solution space. Once CoCo reaches a plateau, it restarts the iterative search with a new initial cyclic tour. We observed that the final objective value remains almost the same if the same or a similar initial cyclic tour is tried several times by CoCo or other cooperative TTP solvers. Given this semi-deterministic behaviour of state-of-the-art cooperative TTP solvers, we propose two adaptive, online surrogate models that filter out non-promising initial cyclic tours to improve search efficiency. These surrogate models are automatically built, updated and deployed while solving any TTP instance. The first is a Support Vector Regression (SVR)-based black-box model, and the second is a K-Nearest Neighbour (KNN)-based white-box simulation model. Both models filter out non-promising initial cyclic tours while discarding only a small number of the tours that lead to the best overall solutions.
However, the KNN-based white-box simulation model is more accurate and more efficient than the SVR-based model.
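The LAHC acceptance rule summarised above can be illustrated with the following Python sketch of textbook LAHC for minimisation. Each candidate is compared against the least recent entry of a circular cost memory. The sketch is not the thesis's DLAS, whose modifications this abstract does not detail. The names `lahc` and `lahc_accepts` and the toy problem in the usage note are hypothetical.

```python
import random
from typing import Callable, List, Tuple, TypeVar

S = TypeVar("S")


def lahc_accepts(candidate_cost: float, current_cost: float, oldest_cost: float) -> bool:
    """LAHC acceptance rule (minimisation).

    Accept if the candidate is no worse than the current solution (the plain
    HC rule) or no worse than the least recent cost held in memory. If every
    memory entry equals the current cost, this collapses to the HC rule.
    """
    return candidate_cost <= current_cost or candidate_cost <= oldest_cost


def lahc(
    initial: S,
    cost: Callable[[S], float],
    neighbour: Callable[[S, random.Random], S],
    memory_size: int,
    iterations: int,
    seed: int = 0,
) -> Tuple[S, float]:
    """Run textbook LAHC and return the best solution found and its cost."""
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    # Circular memory of past costs, initialised with the initial cost.
    memory: List[float] = [current_cost] * memory_size
    for i in range(iterations):
        candidate = neighbour(current, rng)
        candidate_cost = cost(candidate)
        v = i % memory_size  # slot written memory_size iterations ago (least recent)
        if lahc_accepts(candidate_cost, current_cost, memory[v]):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        # Overwrite the least recent entry with the cost of the current solution.
        memory[v] = current_cost
    return best, best_cost
```

For example, `lahc(0, lambda x: (x - 7) ** 2, lambda x, rng: x + rng.choice((-1, 1)), memory_size=5, iterations=2000)` minimises a one-dimensional toy cost over the integers. With `memory_size=1`, the stored value always equals the current cost, so the rule reduces to plain HC.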
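The TTP objective described above (profit minus renting cost, with speed falling as the knapsack fills) can be made concrete with a short evaluator. The sketch assumes the linear speed model commonly used in TTP benchmark instances: speed falls linearly from `v_max` with an empty knapsack to `v_min` with a full one, and rent is charged at `rent_rate` per unit of travel time. The function and parameter names are hypothetical. The function only evaluates a given tour and collection plan and does not solve anything.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def ttp_objective(
    coords: Sequence[Point],
    tour: Sequence[int],
    picked: Dict[int, List[Tuple[float, float]]],
    capacity: float,
    v_min: float,
    v_max: float,
    rent_rate: float,
) -> float:
    """Return total profit minus renting cost for one TTP solution.

    picked[city] lists the (profit, weight) pairs of the items collected at
    that city. Speed with carried weight W is v_max - nu * W, where
    nu = (v_max - v_min) / capacity (an assumed linear model).
    """
    nu = (v_max - v_min) / capacity
    weight = profit = time = 0.0
    n = len(tour)
    for k in range(n):
        city = tour[k]
        # Collect this city's items before leaving it.
        for item_profit, item_weight in picked.get(city, []):
            profit += item_profit
            weight += item_weight
        if weight > capacity:
            raise ValueError("collection plan exceeds knapsack capacity")
        nxt = tour[(k + 1) % n]  # cyclic tour: the last city returns to the first
        # A heavier knapsack means lower speed, hence more time and more rent.
        time += math.dist(coords[city], coords[nxt]) / (v_max - nu * weight)
    return profit - rent_rate * time
```

For a 3-4-5 triangle with no items, `v_max=1.0` and `rent_rate=1.0`, the objective is simply minus the tour length, -12.0. Picking up a heavy item early in the tour slows every remaining leg, which is exactly the TSP/KP interdependence that motivates coordinated rather than purely interleaved solving.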
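The idea of an online surrogate that filters out non-promising initial cyclic tours can be illustrated with a generic k-nearest-neighbour regressor. This is a simplified, assumed stand-in, not the thesis's white-box simulation model: the abstract does not specify that model's features, its `k`, or its rejection threshold. The tour feature vectors, the parameters `k`, `margin` and `warmup`, and the class name `KnnTourFilter` are therefore all hypothetical. The filter predicts a tour's final objective (maximised, as in TTP) from previously completed searches. It rejects tours predicted to fall short of the best result so far and is updated after every completed search.

```python
import math
from typing import List, Sequence, Tuple


class KnnTourFilter:
    """Hypothetical online KNN surrogate for screening initial tours (maximisation)."""

    def __init__(self, k: int = 3, margin: float = 0.0, warmup: int = 5) -> None:
        self.k = k              # number of neighbours averaged per prediction
        self.margin = margin    # tolerated shortfall below the best objective seen
        self.warmup = warmup    # completed searches required before filtering starts
        self.samples: List[Tuple[Tuple[float, ...], float]] = []

    def predict(self, features: Sequence[float]) -> float:
        """Mean final objective of the k stored tours nearest in feature space."""
        if not self.samples:
            raise ValueError("no completed searches recorded yet")
        nearest = sorted(self.samples, key=lambda s: math.dist(s[0], features))[: self.k]
        return sum(obj for _, obj in nearest) / len(nearest)

    def is_promising(self, features: Sequence[float]) -> bool:
        """Return True if the full (expensive) search should be run from this tour."""
        if len(self.samples) < max(self.warmup, 1):
            return True  # too little data: let every tour through
        best = max(obj for _, obj in self.samples)
        return self.predict(features) >= best - self.margin

    def update(self, features: Sequence[float], objective: float) -> None:
        """Record the final objective reached from a tour with these features."""
        self.samples.append((tuple(features), objective))
```

In a restart loop, one would call `is_promising` on each new initial tour, run the full search only when it returns `True`, and then pass the result to `update`, so the model is built and refined while the instance is being solved. A larger `margin` filters less aggressively and so discards fewer tours that would have led to the best solutions, at the cost of smaller time savings.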
Thesis Type
Thesis (PhD Doctorate)
Degree Program
Doctor of Philosophy (PhD)
School
School of Info & Comm Tech
Copyright Statement
The author owns the copyright in this thesis, unless stated otherwise.
Subject
constraint optimisation problems (COPs)
travelling thief problem (TTP)
Late Acceptance Hill Climbing (LAHC)
Hill Climbing (HC)