FPGA for Functional Safety Applications

Digital safety-critical applications are becoming more important as traditional mechanical safety mechanisms are replaced by intelligent systems with built-in diagnostic capabilities. A major benefit of switching to electronic solutions is the ability to embed diagnostic functionality that would not be possible in purely mechanical solutions. When choosing how to develop a safety system, the architect must determine whether the system will be based on software, on pure hardware, or on something in between. FPGAs can be a good trade-off: they offer the safe and deterministic behavior of hardware while still supplying enough flexibility and computational power to perform advanced operations and built-in self-tests.

Safety Function

During the early stages of a safety project, the safety function must be defined. In its simplest form, the safety function is responsible for stopping the system if a dangerous condition occurs. For example, a circuit breaker cuts the power to the electrical grid in a house if an overcurrent is detected. Activating this safety function brings the house to a de-energized, safe state. In some cases the safety function is more complex, and the safe state may not be the de-energized state. For example, if an autonomous vehicle detects that a tire is running low on pressure, the safest response is not always to de-energize itself, as this may result in an even more dangerous situation. Safe states that are not de-energized can be complex to implement. This makes FPGAs well suited as the main executor: they can handle complex logic, they have a quick response time and, if designed correctly, they behave deterministically.
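The circuit breaker example above can be sketched as a small latching state machine. This is only an illustration of the logic, written in Python for readability; in an FPGA it would typically be a clocked process in an HDL. The threshold value, the state names and the choice to latch the safe state until an external reset are all assumptions made for the sketch.

```python
from enum import Enum


class State(Enum):
    RUN = "run"
    SAFE_DEENERGIZED = "safe_deenergized"


# Hypothetical trip threshold in amperes (not taken from any standard).
TRIP_CURRENT_A = 16.0


def next_state(state: State, current_a: float) -> State:
    """Return the next state of a latching overcurrent safety function."""
    # Once tripped, stay in the safe state (assumed: reset is handled elsewhere).
    if state is State.SAFE_DEENERGIZED:
        return State.SAFE_DEENERGIZED
    # Dangerous condition detected: enter the de-energized safe state.
    if current_a > TRIP_CURRENT_A:
        return State.SAFE_DEENERGIZED
    return State.RUN


# Usage: feed a sequence of current samples through the state machine.
state = State.RUN
for sample in [4.0, 9.5, 21.0, 3.0]:
    state = next_state(state, sample)
```

After the 21.0 A sample the state machine stays de-energized even though the following sample is back below the threshold. A safe state that is not de-energized would add more states and transitions, which is where the complexity mentioned above comes from.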

Built-in Failure Mitigation

Physical, random failures can occur in any application, including safety-rated, FPGA-based systems. The trick is to detect the failure before the safety function is needed. The ability to detect dangerous failures is in fact one of the driving factors that determine the maximum achievable SIL (Safety Integrity Level) rating of the system. Four categories are used to classify failures: safe undetected, safe detected, dangerous detected and dangerous undetected. In short, the designer should create a system that moves potentially dangerous undetected failures into the safe or detected categories. There are two typical methods for achieving this goal:

  1. Adding redundancy makes failures safe, and possibly also detectable. Since FPGAs offer true parallelism, it is possible to add multiple instances of critical modules inside the same FPGA device. To increase reliability, it is even possible to isolate regions of the FPGA to achieve high independence between instances. For even more independence, one instance of the logic can be implemented as RTL and the other in, for example, a hard or soft CPU. With the Zynq-7000, Xilinx managed to achieve SIL 3 on a single device, and with the newer UltraScale+ devices certification is simpler still, because these devices integrate true physical separation between their components, with separate power, clocks and interfaces. In addition, the Zynq UltraScale+ devices provide lockstep Arm Cortex-R5 CPUs, ECC protection of all relevant memories, and dedicated hardened processors that perform monitoring functions, which together support a claim of true on-chip redundancy.
  2. Another method of mitigating failures is to add diagnostic functions inside the FPGA, which moves dangerous undetected failures into the dangerous detected category. This can be achieved with CRC checks on memory, read/write tests, test-pattern diagnostics and similar techniques. Making a dangerous failure detectable may not be as satisfying as making the failure safe, but it is a lot better than having an undetected dangerous failure in the system.
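The two methods above can be illustrated with a minimal sketch: a two-out-of-three (2oo3) majority voter for redundant channels, and a CRC check over a block of memory. Both functions are hypothetical helpers written in Python purely to show the logic; a real design would implement them in RTL or on a hardened processor, and the voting scheme and the use of CRC-32 are assumptions, not something prescribed by a particular vendor.

```python
import zlib


def vote_2oo3(a, b, c):
    """Majority-vote three redundant channel outputs.

    Returns (majority, fault_detected). majority is None when no two
    channels agree, in which case the caller should demand the safe state.
    """
    if a == b or a == c:
        majority = a
    elif b == c:
        majority = b
    else:
        majority = None
    # Any disagreement is reported, even when the vote masks it.
    fault_detected = not (a == b == c)
    return majority, fault_detected


def memory_is_intact(data: bytes, expected_crc: int) -> bool:
    """Diagnostic: compare the CRC-32 of a memory block with a stored reference."""
    return zlib.crc32(data) == expected_crc


# Usage: one channel fails, the voter masks the failure and flags it.
value, fault = vote_2oo3(1, 0, 1)

# Usage: a single flipped bit in a memory block is caught by the CRC check.
config = bytes([0x12, 0x34, 0x56, 0x78])
golden_crc = zlib.crc32(config)
corrupted = bytes([0x12, 0x34, 0x56, 0x79])
ok_before = memory_is_intact(config, golden_crc)
ok_after = memory_is_intact(corrupted, golden_crc)
```

The voter turns a single channel failure into a safe, detected failure, since the output stays correct and the fault flag is raised. The CRC check does not repair anything, but it moves a memory corruption from the dangerous undetected category to the dangerous detected category.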

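To see why moving failures between the four categories matters, it can help to put numbers on it. A metric commonly used in IEC 61508 is the safe failure fraction (SFF): the share of all failures that are either safe or dangerous but detected. The sketch below computes it together with the diagnostic coverage. The failure rates are invented for illustration only and do not describe any real device.

```python
def safe_failure_fraction(lam_s: float, lam_dd: float, lam_du: float) -> float:
    """SFF = (safe + dangerous detected) / (all failures).

    lam_s  : rate of safe failures (detected and undetected combined)
    lam_dd : rate of dangerous detected failures
    lam_du : rate of dangerous undetected failures
    """
    return (lam_s + lam_dd) / (lam_s + lam_dd + lam_du)


def diagnostic_coverage(lam_dd: float, lam_du: float) -> float:
    """Fraction of dangerous failures that the diagnostics detect."""
    return lam_dd / (lam_dd + lam_du)


# Hypothetical failure rates in FIT (failures per 10^9 hours).
# Without diagnostics: every dangerous failure is undetected.
sff_without = safe_failure_fraction(lam_s=50, lam_dd=0, lam_du=50)

# With diagnostics assumed to catch 45 of the 50 dangerous FIT.
sff_with = safe_failure_fraction(lam_s=50, lam_dd=45, lam_du=5)
dc_with = diagnostic_coverage(lam_dd=45, lam_du=5)
```

With these made-up numbers the SFF rises from 0.5 to 0.95 once the diagnostics are added, with a diagnostic coverage of 0.9. The total failure rate is unchanged; only the classification of the failures has moved.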

Systematic Development

Even though mitigation of physical, random failures is the concept that stands out in functional safety development, in my experience most of the failures I stumble over are systematic. Systematic failures in this context are failures caused by flawed design processes, i.e. bugs. In a perfect world, all bugs would be caught during the design phase: by the designer, in reviews, or during verification and validation. A benefit of FPGA development is that it is generally code based, so we can draw on lessons learned in the vast software community and use its tools for code review, revision control and linting. Additionally, all major FPGA vendors supply functional safety toolboxes, which are a great help when developing safety applications.



FPGAs provide a great trade-off between deterministic behavior, quick response and powerful computational capabilities. They are excellent for handling advanced safe states, and with good support from the vendors, digital safety designs are easier to achieve than ever before.