Static analysis with Astrée

The following examples demonstrate typical difficulties in statically analyzing control/command programs. For illustrative purposes, the code snippets have been deliberately constructed to be simple. One of the main challenges in implementing a sound analysis is, of course, to ensure that it scales up to large real-world applications. Astrée takes care of that for you.

Boolean variables

Control/command programs, in particular synchronous ones, manipulate thousands of boolean variables. Analyzing which program runtime properties hold when each boolean variable is either true or false rapidly leads to a combinatorial explosion of the number of cases to be considered.

Sophisticated algorithms enable Astrée to handle thousands of boolean variables with just enough precision to avoid both false alarms and combinatorial explosion.

Here’s a simple example:

/* boolean.c */
typedef enum {FALSE = 0, TRUE = 1} BOOLEAN;
BOOLEAN B;
void main () {
  unsigned int X, Y;
  while (1) {
    /* ... */
    B = (X == 0);
    /* ... */
    if (!B) {
      Y = 1 / X;
    };
    /* ... */
  };
}

The analysis of the above code by Astrée yields no warnings, thanks to the automatically de­termined relationship between B and X. Integer divide-by-zero can never happen when executing this program.

Floating-point computations

Command programs controlling complex physical systems are derived from mathematical mod­els designed with real numbers whereas computer programs perform floating-point computations. The two computation models are completely different, which can yield very surprising results.

Consider the following two programs:

/* float-error.c */
int main () {
    float x, y, z, r;

    x = 1.000000019e+38;
    y = x + 1.0e21;
    z = x - 1.0e21;
    r = y - z;
    printf("%f\n", r);
}
% gcc float-error.c
% ./a.out
0.000000
/* double-error.c */
int main () {
  double x; float y, z, r;
  /* x = ldexp(1.,50)+ldexp(1.,26); */
  x = 1125899973951488.0;
  y = x + 1;
  z = x - 1;
  r = y - z;
  printf("%f\n", r);
}
% gcc double-error.c
% ./a.out
134217728.000000

One would expect these programs to print 2.0e21 and 2.0, respectively, based on the reasoning that (x+a)-(x-a) = 2a. However, this reasoning turns out to be wrong due to rounding errors. Astrée recognizes that automatically.

Astrée is sound for floating-point computations and handles them precisely and safely. It takes all possible rounding errors into account. It will even consider possible cumulative effects in programs running for hours.

Here’s another example:

/* float.c */
void main () {
    float x,y,z;
    /* ... code to initialize x,y,z ... */
    if ( (((*((unsigned*)&x) & 0x7f800000) >> 23) != 255 ) /* not NaN */
       && (x >= -1.0e38) && (x <= 1.0e38) ) {
        while (1) {
            y = x+1.0e21;
            z = x-1.0e21;
            x = y-z;
        }}
    else
        return;
}

Astrée proves that the above program is free of runtime errors when running on a machine with floats on 32 bits.

Likewise, Astrée handles possible positive and negative infinity as well NaN values for variables, and can track their effects through arithmetic calculations and comparisons. This is used to precisely analyze source code with special handling of these values.

Digital filters

Control/command programs perform lots of digital filtering, as shown by the following example:

/* filter.c */
typedef enum {FALSE = 0, TRUE = 1} BOOLEAN;
BOOLEAN INIT;
float P, X;

void filter () {
  static float E[2], S[2];
  if (INIT) {
    S[0] = X;
    P = X;
    E[0] = X;
  } else {
    P = (((((0.5*X)-(E[0]*0.7))+(E[1]*0.4))+(S[0]*1.5))-(S[1]*0.7));
  }
  E[1] = E[0];
  E[0] = X;
  S[1] = S[0];
  S[0] = P;
}

void main () {
  X = 0.2 * X + 5;
  INIT = TRUE;
  while (1) {
    X = 0.9 * X + 35;
    filter ();
    INIT = FALSE;
  }
}

The absence of overflow (and more precisely that P is in [-1327.05, 1327.05] as found by Astrée) is not obvious, in particular because of 32-bit floating-point computations. The situa­tion becomes even more inextricable in the presence of boolean control or cascades of filters. Astrée knows enough about control theory to analyze filters precisely.

Signal-flow analysis

The idea of signal flow analysis is to analyze the flow from input signals to output signals through complex software. By identifying input signals as taint sources and output signals as taint sinks, taint analysis computes an over-approximation of the possible propagation or flow of an input signal. Hence, taint analysis is a good fit for signal flow analysis. The following depicts a small signal-flow example:

__ASTREE_taint_add((input; "INPUT"));
__ASTREE_taint_sink((output; all));
auto a = input;   // (1)
f(&b, a);         // f: *b <- a (2)
if (b) ...        // (3)
output = b + ...; // (4)

The input variable is tainted by hue INPUT, and the output variable is declared as sink, specifying that all possible signal influences are relevant for the output. Alternatively, a list of a relevant subset of input signals can be specified.

The taint directives can be included as externally-stored analysis configuration, so no actual code modifications are required.

The taint analysis now tracks signal propagation in simple direct cases like (1), indirect flow via pointers as in (2), as well as through complex computation exemplified by (4).

Whenever the output taint sink is reached by a taint, an alarm is reported by Astrée. Optionally, the influence of signals on control flow decisions can be reported such as in (3).

The taint analysis builds upon a full-fledged semantic analysis of the software, thus taking into account the influence of application configuration parameters on the signal flow. The analysis can be performed for a specific application configuration, avoiding false signal flows not present in that configuration. Alternatively, the influence of application parameters on the output signal can be determined for a generic configuration.

If Astrée does not report an influence of a tainted input signal on an output signal declared as taint sink, independence has been proven.