Flexible and Fault-Tolerant Communication for Safety Critical Real-Time Systems
Primary Supervisor
Chen, David
Hexel, Rene
Abstract
In distributed real-time systems, control is distributed over a number of nodes connected in a network. Some nodes rely on information provided by other nodes as input to perform their operations. Distributed real-time systems therefore depend heavily on their communication subsystems, which are responsible for the timely and error-free delivery of information among distributed nodes. A real-time system must provide its services within a defined time frame, which means it must meet certain deadlines. Safety-critical real-time systems, such as fly-by-wire and drive-by-wire systems, must not only be timely but also fault-tolerant, as a missed deadline or a fault may have catastrophic consequences. At the same time, engineering real-time communication is a complex task, particularly in the safety-critical domain. To keep this complexity manageable, dependable safety-critical systems are traditionally designed to be simple, rigid, and inflexible. This rigidity is increasingly problematic, as hardware and software become ever more capable and are used in increasingly autonomous systems, ranging from advanced driver assistance systems to self-driving cars, and it poses a significant challenge for the communication protocols that exist in this domain.
The Time-Triggered Architecture (TTA) guarantees timeliness in all circumstances by requiring communication to be scheduled ahead of time; hence, it is used to build more dependable real-time systems. TTP/C and FlexRay are two widely used TTA-based bus protocols for safety-critical real-time systems. In this thesis, I have investigated the impact of the limited flexibility of existing Time-Triggered (TT) systems through a number of case studies, such as a brake-by-wire system and an autonomous vehicle. I have demonstrated that this lack of flexibility results in poor bandwidth and channel utilisation. Following that, I have analysed the shortcomings of existing protocols systematically. Based on this analysis, I propose a protocol that is suitable for modern, autonomous safety-critical real-time systems and provides the flexibility needed for their complex payload requirements. My proposed approach uses flexible communication schedules based on the transmission payload requirements of participating nodes, not only within a single Time Division Multiple Access (TDMA) round but also across the multiple TDMA rounds of a cluster cycle. Multiple operational modes are also supported by allowing each mode to have its own communication schedule according to its transmission requirements. As with all safety-critical communication, it is vital to prevent the communication channel from being monopolised by a faulty node. The existing fault-tolerance model for tackling such faults cannot work with the proposed approach because of its flexible communication schedules. I therefore investigate approaches for bus guardians that prevent such monopolisation. I propose a system of guardian nodes that are fully aware of the assigned communication schedules and, by listening to the channel traffic, of the current situation. My analysis shows that these guardians will block any faulty node that tries to transmit outside its assigned timing window.
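The guardian idea above can be illustrated with a minimal sketch, assuming a hypothetical schedule format (operational mode, then TDMA rounds of one cluster cycle, then payload-sized slots) and assuming the guardian already knows the active mode and is synchronised to the start of the cluster cycle. The names `Slot`, `windows` and `guardian_allows` are illustrative only and are not taken from the thesis; the sketch simply derives each node's timing windows from the schedule and rejects any transmission that does not fit entirely inside one of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    node: str       # node that owns this sending slot
    length_us: int  # slot length, sized to the node's payload in this round


# Hypothetical schedule format: operational mode -> TDMA rounds of one
# cluster cycle; each round is an ordered list of payload-sized slots.
Schedule = dict[str, list[list[Slot]]]


def windows(schedule: Schedule, mode: str):
    """Yield (node, start_us, end_us) for every slot in one cluster cycle."""
    t = 0
    for tdma_round in schedule[mode]:
        for slot in tdma_round:
            yield slot.node, t, t + slot.length_us
            t += slot.length_us


def guardian_allows(schedule: Schedule, mode: str, node: str,
                    start_us: int, end_us: int) -> bool:
    """True only if [start_us, end_us) lies inside one of the node's windows.

    Times are offsets from the start of the current cluster cycle; the guardian
    is assumed to know the active mode and to be synchronised to the cycle.
    """
    return any(owner == node and s <= start_us and end_us <= e
               for owner, s, e in windows(schedule, mode))


# Hypothetical two-mode schedule; slot lengths differ per node and per round.
schedule: Schedule = {
    "normal":   [[Slot("A", 40), Slot("B", 100)],
                 [Slot("A", 80), Slot("B", 20)]],
    "degraded": [[Slot("A", 60), Slot("B", 60)]],
}

print(guardian_allows(schedule, "normal", "B", 50, 120))    # True: inside B's first slot
print(guardian_allows(schedule, "normal", "A", 50, 60))     # False: that window belongs to B
print(guardian_allows(schedule, "normal", "B", 130, 150))   # False: overruns into A's slot
print(guardian_allows(schedule, "degraded", "A", 50, 60))   # True: A owns 0-60 in this mode
```

Because the windows are recomputed from whichever mode is active, the same check works for every mode-specific schedule; a real guardian would additionally have to infer the mode and cycle position from observed traffic, which this sketch does not model.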
This thesis presents a formal verification model for verifying the timing parameters of participating nodes, such as transmitter, receiver, and guardian nodes. To tackle design and implementation issues, I have used a model-driven engineering approach based on high-level designs that are both verifiable and directly runnable. A subsumption architecture with clear execution semantics is used to implement the more complex system behaviours at a high level, mitigating the complexity of state replication. This subsumption architecture made it possible to refine the implementation incrementally without interfering with unaffected components of the system.
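The general principle of a subsumption architecture can be sketched as a stack of prioritised behaviours in which the highest-priority active layer overrides everything below it. This is a generic illustration of the technique, not the thesis's implementation; the layer names `fail_silent`, `transmit` and `listen`, and the state keys they read, are hypothetical.

```python
from typing import Callable, Optional

# A behaviour inspects the node's current state and returns a command,
# or None when it is inactive and defers to lower layers.
Behaviour = Callable[[dict], Optional[str]]


def subsume(layers: list[Behaviour], state: dict) -> Optional[str]:
    """Run layers from highest to lowest priority; the first active one wins."""
    for layer in layers:
        command = layer(state)
        if command is not None:
            return command
    return None


# Hypothetical layers for a transmitting node, highest priority first.
def fail_silent(state: dict) -> Optional[str]:
    # Go silent whenever a fault has been detected.
    return "silent" if state.get("fault") else None


def transmit(state: dict) -> Optional[str]:
    # Send only while inside the node's own slot.
    return "send" if state.get("in_own_slot") else None


def listen(state: dict) -> Optional[str]:
    # Baseline behaviour that is always active.
    return "listen"


layers = [fail_silent, transmit, listen]
print(subsume(layers, {"fault": False, "in_own_slot": True}))   # send
print(subsume(layers, {"fault": True, "in_own_slot": True}))    # silent
print(subsume(layers, {"fault": False, "in_own_slot": False}))  # listen
```

Adding the `fail_silent` layer on top required no change to `transmit` or `listen`, which is the kind of incremental refinement without interference with unaffected components that the paragraph above describes.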
The proposed approach improves channel utilisation by allowing the slot length of each node within a TDMA round to be configured according to that node's actual payload requirements. While this approach is based on the traditional TDMA scheme used in the TTA, it significantly improves bandwidth utilisation over traditional schemes. The analyses performed in this thesis show that gross overhead time is reduced by almost 90%, almost doubling overall bandwidth utilisation efficiency in a typical automotive brake-by-wire scenario. Channel utilisation is increased further by allowing the slot length of each node to be configured according to its actual transmission payload requirements in each TDMA round of a cluster cycle. This eliminates slot idle time for all nodes and hence reduces transmission overhead. This flexibility makes it possible to reduce gross overhead time by almost 99%, improving overall bandwidth utilisation efficiency almost ninefold compared with existing TTA-based communication protocols in an autonomous vehicle case study. Despite the added flexibility, the same level of predictability is maintained. My approach not only increases flexibility and channel utilisation for safety-critical payloads but also retains the ability to handle faults in a fail-silent way, at the same level as other TTA-based protocols.
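Why payload-sized slots remove idle time can be shown with a small calculation, assuming three hypothetical slot-sizing schemes: every slot sized for the largest payload anywhere in the cluster cycle (`uniform`), each node's slot fixed across rounds to its own largest payload (`per_node`), and each slot sized to the node's actual payload in that round (`per_round`). The payload values are made up for illustration and inter-frame gaps and protocol overheads are ignored, so the resulting figures are not the thesis's measured results.

```python
def slot_lengths(payloads: list[list[int]], scheme: str) -> list[list[int]]:
    """Return slot lengths per TDMA round and node for a slot-sizing scheme.

    payloads[r][n] is the transmission time (us) node n needs in round r.
    """
    rounds, nodes = len(payloads), len(payloads[0])
    if scheme == "uniform":    # every slot sized for the largest payload
        longest = max(max(row) for row in payloads)
        return [[longest] * nodes for _ in range(rounds)]
    if scheme == "per_node":   # each node's slot fixed to its own largest payload
        per_node = [max(row[n] for row in payloads) for n in range(nodes)]
        return [list(per_node) for _ in range(rounds)]
    if scheme == "per_round":  # each slot sized to the actual payload in that round
        return [list(row) for row in payloads]
    raise ValueError(f"unknown scheme: {scheme}")


def idle_and_utilisation(payloads: list[list[int]], scheme: str) -> tuple[int, float]:
    """Return (idle slot time in us, fraction of cycle carrying payload)."""
    total = sum(map(sum, slot_lengths(payloads, scheme)))
    used = sum(map(sum, payloads))
    return total - used, used / total


# Hypothetical cluster cycle: 2 TDMA rounds x 3 nodes.
payloads = [[10, 50, 20],
            [10, 10, 40]]

for scheme in ("uniform", "per_node", "per_round"):
    idle, util = idle_and_utilisation(payloads, scheme)
    print(f"{scheme:9s} idle={idle:3d} us utilisation={util:.2f}")
```

With these made-up payloads, uniform slots leave 160 us idle (utilisation 0.47), per-node slots 60 us (0.70), and per-round slots 0 us (1.00): sizing slots to the actual payload of each round is what drives slot idle time to zero.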
Thesis Type
Thesis (PhD Doctorate)
Degree Program
Doctor of Philosophy (PhD)
School
School of Info & Comm Tech
Rights Statement
The author owns the copyright in this thesis, unless stated otherwise.
Subject
Time-Triggered Architecture
autonomous vehicles
safety-critical communication