Fault tolerant VirtuosoNext RTOS for ARM Cortex-M microcontrollers


By eric.verhulst - Posted on 08 August 2017

Altreonic has now ported the latest VirtuosoNext Designer to ARM Cortex M-series microcontrollers. The latest version fully exploits the Memory Protection support to provide fine-grain partitioning and allows fine-grain recovery from processor exceptions in a few microseconds at the Task level. This effectively provides fault tolerance for the applications without the system experiencing any significant delay and without having to apply costly hardware redundancy schemes. At the system level, the resilience level is greatly increased at virtually no cost. The economic advantages are significant.

A fault tolerant RTOS, what does it mean?

An increasing number of embedded applications must be able to provide uninterrupted services. Applications range from highly connected devices (e.g. IoT) to autonomous systems. As the world is not perfect, hardware or software faults do occur. Even when they occur with a very low probability, they can result in so-called hazardous situations with safety and security risks that ultimately can result in lives being lost or in serious economic costs. Embedded systems apply two main mechanisms to mitigate such risks. The first one is redundancy in time and space. This reduces the risk that a single fault brings down the whole system. The second mechanism is to prevent the resulting errors from propagating. This requires isolating the faults in time and space and using guarding mechanisms, often in software.

A fault tolerant RTOS allows an application to continue even after a fault has occurred. This is in contrast to the standard approach whereby very often the whole system has to be rebooted, a time-consuming and not necessarily practical solution. While a complete reboot might be needed if the fault is unrecoverable at the hardware level, in many cases the fault is transient or software or data related, and no complete reboot is needed.

Fine-grain recovery with the fault-tolerant VirtuosoNext RTOS

The VirtuosoNext RTOS kernel comes in different flavors, selected by the developer when building the application. The standard one combines everything in a single static memory image, which is best for performance, but only allows recovery from specific programming errors like numerical exceptions. Nevertheless, the event driven programming model isolates the Tasks logically from each other so that even programming errors will not necessarily propagate to other Tasks.

However, when the program execution is affected by a hardware or software fault, data memory can become corrupted or the program could jump to a random address in the program memory, essentially resulting in a system that can no longer be trusted. This can be handled by the fine-grain partitioning option in VirtuosoNext, which isolates every small Task from the rest of the memory used by other Tasks. This fine-grain partitioning is a pre-condition for the fine-grain recovery mechanism supported by VirtuosoNext.
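
As a purely illustrative model (not VirtuosoNext code), the isolation idea can be sketched in Python: each Task owns one address range, and any access outside it is flagged, much as memory protection hardware would raise an exception. The names `Region`, `MemoryViolation` and `check_access`, as well as the addresses, are hypothetical and chosen only for this sketch.

```python
from dataclasses import dataclass


class MemoryViolation(Exception):
    """Raised when a Task touches memory outside its own region (hypothetical)."""


@dataclass(frozen=True)
class Region:
    base: int   # first address owned by the Task
    size: int   # number of bytes in the region

    def contains(self, address: int, length: int = 1) -> bool:
        # The whole access [address, address + length) must lie inside the region.
        return self.base <= address and address + length <= self.base + self.size


def check_access(task: str, regions: dict, address: int, length: int = 1) -> None:
    """Model of the protection check: raise if the access leaves the Task's region."""
    if not regions[task].contains(address, length):
        raise MemoryViolation(f"{task} accessed 0x{address:08X} outside its region")


# Hypothetical layout: two Tasks with adjacent, non-overlapping regions.
REGIONS = {
    "TaskA": Region(base=0x2000_0000, size=0x400),
    "TaskB": Region(base=0x2000_0400, size=0x400),
}
```

In this model, a 4-byte access by `TaskA` at `0x200003FC` is accepted, while the same access at `0x200003FE` would spill into the region of `TaskB` and raises `MemoryViolation`.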

In both flavors of the kernel, the kernel will call an abort handling function that allows the Task's state to be cleaned up before the Task is re-initialised, possibly with the data that was valid before the fault occurred.

This is in contrast with the prevailing practice whereby a hypervisor-like layer time-slices different applications (effectively jeopardising hard real-time capabilities) and isolates complete applications, which then have to be rebooted and even re-initialised as a whole upon a failure. This practice is also often not practical on systems with embedded microcontrollers.

Fine-grain fault tolerance with VirtuosoNext on ARM microcontrollers

While this capability was first developed on a high-end Freescale PowerPC multi-core processor, it is a challenge to provide the capability on typical microcontrollers. While used in much larger numbers, microcontrollers often have less memory, less hardware support for memory protection and run at lower frequencies. On the other hand, the peripherals are often much more closely integrated and being less complex, they execute relatively fast with much lower power consumption. Often, the whole system will be integrated in a single chip SoC.

The ARM Cortex-M series are widely used microcontrollers. Altreonic has ported VirtuosoNext to the ARM Cortex-M3, -M4, -M4F and -M7F as well as to the -R4 series. The ARM series are mostly binary compatible, using 16-bit instructions in a 32-bit memory addressing scheme.

Code size:

The total program memory needed was measured for an “empty” application to which all services were added one by one. Hence this includes the compiler runtime overhead and measures the real memory usage. The measured code size on an ARM Cortex-M3 starts at a minimum of 9376 bytes, with a maximum of 13692 bytes with all services included. Note that the Virtuoso Designer code generators will always remove unused functionality. The code size is slightly less for the non-partitioning version.

Interrupt latency

Interrupt latency is measured by setting up a periodic timer and recording the time it takes to read the timer value after the interrupt, first in the Interrupt Service Routine and then in a high priority waiting Task. The tests were done on a 50 MHz ARM Cortex-M3. A semaphore loop in the background provides a worst case load scenario (as it continuously switches context and activates the kernel task).

Non-partitioning VirtuosoNext:

  • IRQ to ISR:  620 to 2460 nanoseconds (50% median 700 nanoseconds) 
  • IRQ to Task:  16 to 35 microseconds (50% median 22 microseconds)

VirtuosoNext with partitioning enabled:

  • IRQ to ISR:  620 to 5180 nanoseconds (50% median 700 nanoseconds) 
  • IRQ to Task:  23 to 49 microseconds (50% median 30 microseconds)
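
To relate these figures to the 50 MHz clock, the times can be converted to processor cycles (one cycle lasts 20 nanoseconds at 50 MHz). The helper name `ns_to_cycles` is ours, and the inputs are simply the IRQ to ISR figures listed above.

```python
CLOCK_HZ = 50_000_000  # 50 MHz ARM Cortex-M3 used for the tests


def ns_to_cycles(nanoseconds: int, clock_hz: int = CLOCK_HZ) -> int:
    """Convert a duration in nanoseconds to clock cycles, rounded to nearest."""
    return (nanoseconds * clock_hz + 500_000_000) // 1_000_000_000


# IRQ to ISR latencies in nanoseconds: (minimum, median, maximum)
irq_to_isr_ns = {
    "non-partitioning": (620, 700, 2460),
    "partitioning": (620, 700, 5180),
}

for flavor, times in irq_to_isr_ns.items():
    cycles = [ns_to_cycles(t) for t in times]
    print(f"{flavor}: min/median/max = {cycles} cycles")
```

The minimum of 620 nanoseconds corresponds to 31 cycles and the median to 35 cycles in both flavors. Only the worst case differs: 123 cycles without partitioning versus 259 cycles with partitioning enabled.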

Fine-grain fault recovery

A small application was set up that deliberately generates a memory access outside the protected memory region assigned to the Task. This results in a memory violation exception. Before the exception, the Task saves its status data to a predefined so-called BlackBoard managed by the kernel, and it restores the data after recovery. On the recorded trace, full recovery of the Task requires 44 microseconds on a 50 MHz ARM Cortex-M3. As this test is done with tracing enabled, the recovery would have been about 25 microseconds faster without the tracing. This was confirmed using a test set-up with a scope, whereby we measured 20.2 microseconds. On an ARM Cortex-M4F @ 120 MHz (with an additional floating point context) this was measured at 13.2 microseconds. For most applications, such a delay will go unnoticed.
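
The save-and-restore pattern used in this test can be modelled in a few lines of Python. This is a toy simulation under our own assumptions, not the VirtuosoNext API: the `BlackBoard` class, the `MemoryViolation` exception and the `run_task` function are hypothetical names, and the fault is injected by hand rather than raised by hardware.

```python
class MemoryViolation(Exception):
    """Stands in for the memory violation exception (hypothetical name)."""


class BlackBoard:
    """Toy kernel-managed store that survives a Task restart (hypothetical)."""

    def __init__(self):
        self._saved = {}

    def save(self, task_name, state):
        self._saved[task_name] = dict(state)  # copy, so later corruption is not saved

    def restore(self, task_name):
        return dict(self._saved.get(task_name, {}))


def run_task(name, blackboard, steps, fault_at):
    """Run a counting Task; inject one fault and recover from the last checkpoint."""
    state = blackboard.restore(name) or {"count": 0}
    recoveries = 0
    injected = False
    while state["count"] < steps:
        try:
            blackboard.save(name, state)      # checkpoint before the risky work
            if state["count"] == fault_at and not injected:
                injected = True
                state["count"] = -1           # simulate corrupted local data
                raise MemoryViolation("access outside the Task's region")
            state["count"] += 1
        except MemoryViolation:
            # Abort handling: discard the corrupted state and re-initialise the
            # Task from the data that was valid before the fault.
            state = blackboard.restore(name)
            recoveries += 1
    return state["count"], recoveries
```

Calling `run_task("T1", BlackBoard(), steps=5, fault_at=3)` returns `(5, 1)`: the Task still reaches its final count, with exactly one recovery and without restarting from zero.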

 

If a hard fault exception (normally not recoverable) happens, the processor is made to reboot. The reboot time largely depends on the system, as it entails not just restarting the processor but also reinitialising the peripherals and maybe running application specific diagnostics. To measure the reboot delay, a relatively simple application was taken using an ARM Cortex-M4F board (120 MHz). The measured time delay between the hard fault exception and the time to arrive at the same instruction is about 6.9 milliseconds. On the 50 MHz ARM Cortex-M3 board this was measured at 24.8 milliseconds. For some applications this will be fast enough, but the reboot is likely to be noticeable. For applications that have a faster loop time, this can be unacceptable, demonstrating the benefit of the fine-grain recovery mechanism.
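
The gap between the two recovery paths can be put in numbers with a short calculation. The measurements are the ones quoted in the text; the 1 kHz control loop period is an assumption made only for illustration, and the function names are ours.

```python
# Figures quoted in the text, in microseconds.
MEASURED_US = {
    "Cortex-M3 @ 50 MHz": {"recovery": 20.2, "reboot": 24_800.0},
    "Cortex-M4F @ 120 MHz": {"recovery": 13.2, "reboot": 6_900.0},
}

LOOP_PERIOD_US = 1_000.0  # assumed 1 kHz control loop, purely illustrative


def speedup(times: dict) -> float:
    """How many times shorter Task recovery is than a full reboot."""
    return times["reboot"] / times["recovery"]


def missed_periods(outage_us: float, loop_period_us: float) -> int:
    """Number of whole control-loop periods that elapse during an outage."""
    return int(outage_us // loop_period_us)


for board, times in MEASURED_US.items():
    print(
        f"{board}: recovery is ~{speedup(times):.0f}x faster; "
        f"reboot spans {missed_periods(times['reboot'], LOOP_PERIOD_US)} periods, "
        f"recovery spans {missed_periods(times['recovery'], LOOP_PERIOD_US)}"
    )
```

With these figures, Task-level recovery is roughly 1200 times faster than a reboot on the Cortex-M3 board and roughly 500 times faster on the Cortex-M4F board. Under the assumed 1 kHz loop, a reboot spans 24 and 6 whole periods respectively, while the fine-grain recovery completes within a single period.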

How does VirtuosoNext Designer work?

VirtuosoNext Designer is composed of two major components. The first one, Visual Designer, allows the developer to specify the application at a high level by defining Tasks and how they interact. Taking into account the target platform, Visual Designer then uses metamodels to generate the bulk of the source code and builds the application. The platform can be a single processor or even a heterogeneous distributed system. The source code itself becomes the master, as the higher-level (graphical) models are regenerated from the source code, enforcing a one-to-one relationship between models and source code.

The second component of VirtuosoNext Designer is the RTOS kernel and its tightly integrated Board Support Package. The VirtuosoNext kernel was formally developed and uses a dedicated and fast packet switching architecture. Scheduling is preemptive and priority based, with support for distributed priority inheritance. The formal development resulted in very clean and predictable behavior, but also in a very small code size, which benefits performance.

This approach has many benefits:

  • Productivity: the developer can fully focus on the application specific code without having to worry about the lower-level details.
  • Portability and scalability: the same source can often be executed on different platforms, mostly by recompiling and rebuilding for another target.
  • Performance: the generated code is very small and highly optimised.
  • Safety and security: the RTOS kernel itself was formally developed and its architecture is inherently safe and secure by design.
  • Lower cost: the fine-grain fault recovery reduces the need for redundancy at the hardware level.
  • Resilience: an embedded application or system is less likely to fail catastrophically.

Conclusion

While hardware and software faults can never be fully eliminated, trustworthy systems engineering will employ mechanisms that reduce the probability of catastrophic system failure to a minimum. Current practice dictates the use of formal techniques to reduce the residual software errors to a minimum, and hardware redundancy with coarse-grain partitioning. These techniques are costly and complex and often not cost-efficient when using microcontrollers. The fine-grain partitioning of VirtuosoNext allows fine-grain fault recovery in real-time on high-end processors as well as on modern microcontrollers like the ARM Cortex M-series. This capability can drastically increase the operational availability and resilience of embedded systems while drastically reducing the cost. Given the rapid proliferation of networked embedded systems and advanced autonomous systems that can’t tolerate expensive recovery times, this has an important economic impact. Altreonic is using VirtuosoNext Designer in its own range of KURT e-vehicles.

Contact:

Eric Verhulst

eric.verhulst@altreonic.com

www.altreonic.com

 

 
