
Taking the guesswork out of timing in real-time software systems

Tuesday, June 9th, 2009


Guaranteeing correct operation of real-time software running on an embedded processor is a significant challenge. Data-dependent execution flows, where the execution times of many functions depend on their data inputs, make instruction sequences very difficult to time accurately. Until now, designers have relied on the test bench to adequately exercise the timing corner cases of each individual function. In complex systems this often means developing much of the rest of the application before suitable stimulus can be created, which adds significant effort and delay to certifying software system timing.

New processor architectures, capable of exhibiting deterministic instruction timing, open up interesting possibilities. This article describes a new technique, static timing analysis, based on a deterministic processor architecture, and shows that it can take the guesswork out of timing in real-time software systems.

In short, a static timing tool analyses object code and determines worst-case timing paths. Paths between two points can be analysed interactively, or checked in batch mode against timing assertions to produce a pass or fail result. This information allows the designer to optimise the timing-critical sections of their code until timing closure is achieved.

The article will show how this approach can be successfully used to develop a software implementation of a 100Mbps Ethernet MII interface.

Closing timing in real-time systems

As processor architectures improve in speed and responsiveness, it becomes increasingly attractive to move into software functions traditionally implemented in hardware. A simple example such as an IIC master has always been a good candidate for this approach because the master defines the timing. An IIC slave, however, must always be ready to respond within a certain time, which imposes timing constraints on the software. Developing interface functions in software thus becomes a real-time programming challenge.

Verifying software functionality with software test benches is a well-practised task. Closing timing, however, may require a significant amount of effort on top of authoring the code, and the standard approaches are limited. Typical approaches include testing in circuit whilst observing pin activity, simulating the software using a cycle-accurate simulator, or counting the instructions on the path of interest to determine the expected execution time.

All these approaches share a common requirement: suitable stimulus must fully exercise the software under test so that corner cases are adequately covered. Extending the test bench to cover timing therefore increases the verification effort. In many cases, suitable stimulus may not be available until much later in the project, when the rest of the system exists to generate it. Static timing analysis removes this dependency by formally exercising all paths within the code, allowing timing closure of each software function individually.

The Ethernet MII interface is a good example of where it is advantageous to implement interface functions in software. Replacing the MAC layer with software allows early adoption of new hardware standards and allows custom protocols to be implemented. The timing diagram of the MII interface between the MAC and PHY layers is shown below in figure 1.

Figure 1: Ethernet MII timing diagram.

The source code to manage the MAC / PHY interface is shown below. It is written in XC, which supports direct control of physical pins through port inputs (:>) and outputs (<:). XC can also manage multiple ports simultaneously using the select statement, which behaves like a switch statement whose case is chosen by the hardware when a port becomes ready. In this case the pins have been mapped to the port_rxd and port_rx_dv ports, and the data port (port_rxd) has been configured to convert the stream of data nibbles coming from the PHY into a series of words. The source code also includes the timing pragmas that instruct the static timing analysis tool.

Before the receiver starts, it ensures that the RX_DV signal is low. It then looks for the start of a packet, identified by RX_DV going high and the SFD (Start Frame Delimiter) being received. The inner loop simultaneously waits for data words and for RX_DV going low. When RX_DV goes low, the code processes any remaining data nibbles and checks for errors before starting to receive the next packet.
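The receive flow described above can be sketched in Python as a simple model over sampled MII signals. This is a minimal sketch, not the original XC code. The function name receive_frames, the (rx_dv, nibble) sample representation and the choice to place the first nibble of each word in the least significant bits are all assumptions, and CRC and other error checks are omitted. The sketch relies on the SFD byte 0xD5 arriving low nibble first, so the SFD shows up as nibble 0xD after a run of 0x5 preamble nibbles.

```python
SFD_NIBBLE = 0xD  # SFD byte 0xD5 arrives low nibble (0x5) first, then 0xD


def receive_frames(samples):
    """Return (words, tail_word, tail_nibbles) for each frame in samples.

    samples: list of (rx_dv, nibble) pairs, one per MII receive clock.
    Assumption: the first nibble of each word goes in the low 4 bits.
    """
    frames = []
    i, n = 0, len(samples)
    while i < n:
        # Ensure RX_DV is low before hunting for the next frame.
        while i < n and samples[i][0]:
            i += 1
        # Wait for RX_DV to go high.
        while i < n and not samples[i][0]:
            i += 1
        # Skip preamble nibbles until the SFD nibble is seen.
        while i < n and samples[i][0] and samples[i][1] != SFD_NIBBLE:
            i += 1
        if i >= n or not samples[i][0]:
            continue  # RX_DV dropped (or data ended) before an SFD
        i += 1  # consume the SFD nibble
        words, word, count = [], 0, 0
        # Inner loop: pack nibbles into 32-bit words until RX_DV goes low.
        while i < n and samples[i][0]:
            word |= samples[i][1] << (4 * count)
            count += 1
            if count == 8:
                words.append(word)
                word, count = 0, 0
            i += 1
        # RX_DV low: keep the partial word for tail processing
        # (CRC and other error checks are omitted in this sketch).
        frames.append((words, word, count))
    return frames
```

For example, a frame carrying the bytes 0x12 0x34 0x56 0x78 0x9A yields one complete word, 0x78563412, plus a two-nibble tail of 0x9A. The tail code has to handle that tail after RX_DV goes low.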

The software managing this interface must be not only functionally correct but also timing-safe, since failing to meet timing is equally critical. The timing constraints are in the loop receiving data words (T1) and in the inter-frame gap (T2, from data valid going low to detecting the next SFD). For 100Mbps Ethernet each bit takes 10ns, so a 32-bit word of data is received every 320ns. The inter-frame gap is a minimum of 96 bit times and the preamble is 56 bit times, so the inter-frame code must complete within (96 + 56) × 10ns = 1520ns.
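The budget arithmetic can be written out as a short Python sketch. The conversion to an instruction budget assumes a hypothetical 10ns per instruction for the receiving thread. It only shows how a time budget translates into a limit on path length; the real figure depends on the processor and its thread count.

```python
LINK_RATE_BPS = 100_000_000                   # 100Mbps Ethernet
BIT_TIME_NS = 1_000_000_000 // LINK_RATE_BPS  # 10ns per bit

WORD_BITS = 32      # the port packs 8 nibbles into one 32-bit word
IFG_BITS = 96       # minimum inter-frame gap
PREAMBLE_BITS = 56  # 7 preamble bytes before the SFD

T1_NS = WORD_BITS * BIT_TIME_NS                   # per-word receive budget
T2_NS = (IFG_BITS + PREAMBLE_BITS) * BIT_TIME_NS  # inter-frame budget


def instruction_budget(budget_ns, ns_per_instruction):
    """Maximum number of whole instructions that fit in a time budget."""
    return budget_ns // ns_per_instruction
```

At the assumed 10ns per instruction, T1 allows 32 instructions per word and T2 allows 152 instructions for the inter-frame code. Every extra instruction on these paths therefore matters.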

The challenge is in guaranteeing the code is timing-safe. There is only one valid path through the function for receiving data, but the inter-frame code has a number of paths to handle different packet sizes and error cases. Each of these must both be functionally correct and meet timing.

Manually ensuring that all paths meet the constraints is time-consuming, especially since it must be repeated every time the code is modified or recompiled.

The aim of the approach is to automate the task of verifying that software meets its timing constraints, reducing the risk involved in checking them. Given a deterministic architecture on which to run the software, it becomes possible to ensure that the worst-case execution time is short enough to meet the constraints.

The user starts by identifying the end points of the paths to be timed. The MII source code shows how pragmas have been used to specify these end points. A combination of compiler, simulation and search techniques is used to ensure that all valid paths through a program are evaluated. Only paths explicitly excluded by the user, known as false paths, are ignored.

The interactive GUI or console illustrated in figure 2 can be used to visualise the execution paths through the code and identify which are false paths. In the MII code the first timing constraint runs from the word_receive label round the loop back to itself.

Figure 2: Static timing analysis tool GUI image

Two potential paths are found: the tight inner loop, and a path which leaves the inner loop and goes through the code waiting for the next SFD. The second timing constraint, the inter-frame gap, runs from rx_dv_low to wait_for_sfd. The tool finds all possible paths of execution between these two points, including false paths which pass through the word receive loop. With this information the user can create a script, run after every compilation, that checks the code still meets the timing constraints.
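A toy model can illustrate the path search and false-path exclusion. This is a hypothetical sketch, not the tool's algorithm or script syntax. The control-flow graph, every block name other than word_receive, rx_dv_low and wait_for_sfd, and all block times are invented for illustration. A real tool works on machine instructions rather than hand-written block costs. For each constraint the sketch enumerates every path between the end points, drops paths through excluded blocks, and reports the worst case and its slack.

```python
# Hypothetical block times in ns. Only word_receive, rx_dv_low and
# wait_for_sfd come from the article; the rest is invented.
BLOCK_NS = {
    "word_receive": 30,
    "store_word": 60,
    "rx_dv_low": 40,
    "tail_aligned": 100,
    "tail_nibbles": 250,
    "crc_error": 150,
    "wait_for_sfd": 80,
}
EDGES = {
    "word_receive": ["store_word", "rx_dv_low"],
    "store_word": ["word_receive"],
    "rx_dv_low": ["tail_aligned", "tail_nibbles"],
    "tail_aligned": ["crc_error", "wait_for_sfd"],
    "tail_nibbles": ["crc_error", "wait_for_sfd"],
    "crc_error": ["wait_for_sfd"],
    "wait_for_sfd": ["word_receive"],
}


def enumerate_paths(edges, start, end):
    """Yield every path from start to end that visits no block twice.

    A path lists blocks from start (inclusive) to end (exclusive), so
    start == end describes one trip round a loop.
    """
    stack = [[start]]
    while stack:
        path = stack.pop()
        for succ in edges.get(path[-1], []):
            if succ == end:
                yield path
            elif succ not in path:
                stack.append(path + [succ])


def check_constraint(blocks, edges, start, end, required_ns, false_via=()):
    """Return (passed, slack_ns, worst_path) over all non-false paths."""
    timed = [
        (sum(blocks[b] for b in path), path)
        for path in enumerate_paths(edges, start, end)
        if not any(b in false_via for b in path)
    ]
    if not timed:
        raise ValueError("no paths between the end points")
    worst_ns, worst_path = max(timed)
    slack_ns = required_ns - worst_ns
    return slack_ns >= 0, slack_ns, worst_path


if __name__ == "__main__":
    # T1: word loop, with the excursion through rx_dv_low marked false.
    print(check_constraint(BLOCK_NS, EDGES, "word_receive", "word_receive",
                           320, false_via={"rx_dv_low"}))
    # T2: inter-frame gap.
    print(check_constraint(BLOCK_NS, EDGES, "rx_dv_low", "wait_for_sfd", 1520))
```

With the excursion through rx_dv_low excluded as a false path, the word loop's worst case in this model is 90ns, which leaves 230ns of slack against 320ns. Without the exclusion the same constraint fails by 230ns, because the inter-frame path is wrongly counted against the per-word budget. The inter-frame constraint passes with a worst case of 440ns against 1520ns.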

In the case of the MII code the script would look like:

The assertions check the slack or violation for all possible paths of execution which have not been excluded and report the worst case. For the MII case the output is:

The tool can be used for more than batch pass/fail testing. Code can be analysed using both the GUI and console. Structural code views highlight timing hot-spots. Instruction-level views and traces highlight hardware resource contention. This information is especially useful in allowing the user to concentrate their optimisation efforts where they will have the greatest impact.

In some cases the tool is unable to time code without additional information from the user. Execution flow which is data-dependent, like an unknown loop count, needs the user to specify worst-case values in order to perform timing analysis. These unknowns are highlighted to the user and can be specified through the GUI or console.
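A small Python sketch shows how a user-supplied loop bound feeds into the worst-case time of a data-dependent loop. The function and the block times are hypothetical. Without a bound there is no finite worst case, so the analysis has to stop and ask.

```python
def loop_wcet_ns(setup_ns, body_ns, exit_ns, max_iterations=None):
    """Worst-case time of a loop whose trip count depends on data.

    max_iterations is the user-supplied bound. Without it the analysis
    cannot produce a finite worst case, so it reports the unknown instead.
    """
    if max_iterations is None:
        raise ValueError("unknown loop count: a worst-case bound is required")
    return setup_ns + max_iterations * body_ns + exit_ns
```

Suppose, hypothetically, that the MII tail code handled leftover nibbles one at a time. A bound of 7 would then be natural, since an eighth nibble would have completed a word. With assumed costs of 10ns setup, 20ns per iteration and 10ns exit, the worst case is 160ns.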

The tool works with binary executable files. There are two reasons for this. Firstly, it ensures the accuracy of the timing analysis. Secondly, it makes the tool language-independent, supporting code generated from any compiled source, be it assembly, C, C++ or XC. The user is able to work at the level of source code wherever debug information is available and can always work at the level of machine instructions.

The case study has shown how a practical real-time software implementation has been proved to be timing-safe through the use of a static timing analysis tool. Whilst this is the main goal of such a tool, the technology opens up further possibilities.

XC includes direct support for timed input/output operations. This means it is possible for the tool to identify these instructions and automatically generate the appropriate timing constraints. For example, the code between two timed inputs can automatically be verified as timing-safe by the tool.
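The following is a minimal sketch of how constraints might be derived from timed I/O. It assumes each timed operation is known by a label and by the port clock tick at which it occurs. The TimedIO type and the labels are hypothetical. The only fixed fact used is that a 100Mbps MII clock runs at 25MHz, so one tick is 40ns. Inputs scheduled 8 ticks apart therefore leave 320ns between them, which matches the word budget T1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedIO:
    label: str      # hypothetical end-point label in the code
    port_time: int  # port clock tick at which the I/O is scheduled


def constraints_from_timed_io(ops, tick_ns):
    """Derive (from_label, to_label, required_ns) for consecutive timed I/O.

    The code between two timed operations must reach the second one before
    it is due, so the required time is the gap between their ticks.
    """
    ops = sorted(ops, key=lambda op: op.port_time)
    return [
        (a.label, b.label, (b.port_time - a.port_time) * tick_ns)
        for a, b in zip(ops, ops[1:])
    ]


MII_TICK_NS = 40  # 25MHz MII receive clock at 100Mbps
```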

The ability to generate a pass/fail result on timing-safe code enables the toolchain to close the loop. From a suitable report file, the compiler is able to determine where optimisation is needed and apply appropriate heuristics in an iterative manner to help close timing, without any effort from the designer.

The static timing tool identifies all paths through the code, including the worst case. By analysing the data that causes the worst-case timing, it is possible to reverse-engineer the stimulus that triggers it. The worst-case test stimulus can then be added to the test bench used at a later stage, such as when testing in hardware. This gives the designer confidence that the test stimulus really does cover the corner cases.

One final interesting possibility is power optimisation. Power consumption is closely tied to instruction execution, particularly for event-driven processors that can enter a low-power state instead of busy-waiting. Analysis and optimisation of timing paths between pausing instructions allows the energy consumed per loop iteration to be determined and then reduced through optimisation. Saving power is always desirable and has commercial benefits which ripple through the rest of the system.
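The energy calculation can be sketched as follows. The sketch assumes a loop with a fixed period, in which the thread is active for the analysed worst-case path time and paused for the rest. The power figures are hypothetical placeholders, not measurements of any real device.

```python
def energy_per_iteration_nj(period_ns, active_ns, active_mw, paused_mw):
    """Energy in nJ for one iteration of a loop with a fixed period.

    The thread is active for active_ns (the analysed worst-case path between
    pausing instructions) and in a low-power paused state for the remainder.
    """
    if not 0 <= active_ns <= period_ns:
        raise ValueError("active time must lie within the loop period")
    paused_ns = period_ns - active_ns
    picojoules = active_ns * active_mw + paused_ns * paused_mw  # mW x ns = pJ
    return picojoules / 1000
```

Take a 320ns period, 100mW while active and 10mW while paused. In this model, cutting the active path from 90ns to 60ns reduces the energy per iteration from 11.3nJ to 8.6nJ.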

Designing real-time software capable of performing hardware functions is an attractive prospect; however, the flow has so far lacked a comprehensive tool to certify timing. With the introduction of static timing analysis, it is now possible to guarantee the execution time of real-time software running on a deterministic processor.

Static timing analysis also brings the advantage that timing closure of individual functions can be achieved well in advance of a full test bench or the rest of the system being available. By using formal analysis of the code, the static timing analyser exhaustively analyses all paths, ensuring that no corner cases are missed. Adding timing assertions to the source code using #pragma statements means that the source code not only describes the functionality but also defines the required timing. This makes the code not only portable across suitable architectures but also timing-safe.

Static timing analysis takes the guesswork out of timing real-time software systems.

Accurate instruction timing predictions are a fundamental requirement for a static timing analysis tool. There are many reasons why a sequence of instructions may not always take the same time to execute.

Interrupts by nature alter the execution flow dramatically, forcing a change of CPU context for a period of time. Further, all RTOSs have critical sections of code where interrupts are disabled. This means that only one task can truly be real-time in a single-threaded system, and in systems using an RTOS this is likely to be the scheduler.

Architectures that include a memory hierarchy, and cache memory in particular, are known to exhibit less predictable execution times. The previous contents of the cache strongly influence timing unless cache lines can be locked.

Resource conflicts can also increase execution time: being denied a hardware resource naturally blocks execution and adversely affects timing.

By avoiding these architectural issues and employing simple round-robin scheduling, event-driven, hardware multi-threaded processors provide highly predictable worst-case execution timing. Further, the ability to pre-load an output port with data and the time at which it should be driven allows execution to continue whilst the port autonomously handles the timed output.
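A simplified model shows why round-robin scheduling makes timing predictable. It assumes, hypothetically, a single-issue pipeline of depth 4 in which active threads issue in strict rotation. A thread cannot issue again until its previous instruction has left the pipeline, and blocking on ports and events is ignored. Under those assumptions each thread issues one instruction every max(4, N) cycles for N active threads. A path's worst-case time therefore follows directly from its instruction count.

```python
def thread_wcet_ns(instructions, active_threads, clock_ns, pipeline_depth=4):
    """Worst-case time for one thread to run a path of `instructions`.

    Simplified model (an assumption, not a specific processor): threads issue
    in strict round-robin order, and a thread cannot issue again until its
    previous instruction has left the pipeline.
    """
    if active_threads < 1:
        raise ValueError("at least one thread must be active")
    # Each thread gets one issue slot every max(depth, threads) cycles.
    cycles_per_instruction = max(pipeline_depth, active_threads)
    return instructions * cycles_per_instruction * clock_ns
```

At an assumed 2.5ns clock, a 32-instruction path takes 320ns with up to four active threads and 640ns with eight. In this model, running fewer than four threads does not make a single thread any faster.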

Only data-dependent execution flow may impede accurate timing prediction for event-driven, hardware multi-threaded processors. A static timing tool is able to determine all possible paths through the software, allowing for the variability due to data dependencies so that timing guarantees can be made.