This simple sketch illustrates a typical challenge faced by developers of safety-critical real-time software: determining the worst-case execution time (WCET) of a given task.
Increasing the number of tests and measurements increases your workload. It may improve your chances of hitting the actual WCET, but it is even more likely to lull you into a false sense of security while the true worst case remains undiscovered.
“Testing, in general, cannot show the absence of errors.” — DO-178B/C
“Testing by itself is not sufficient.” — FDA
Even if you do manage to identify and replicate all the conditions that lead to the maximum execution time, you will have to re-evaluate your test criteria whenever anyone on your team changes the code.
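To make the limitation concrete, here is a minimal sketch of such a measurement-based test harness in C, assuming a POSIX system; task_under_test() is a hypothetical stand-in for the real task. No matter how many runs you add, the reported figure is only the worst case observed, never a guaranteed bound.

```c
/* Minimal sketch of measurement-based timing, assuming a POSIX system.
 * task_under_test() is a hypothetical stand-in for the real task. */
#include <stdio.h>
#include <time.h>

static volatile unsigned sink;

/* Hypothetical stand-in for the task being timed. */
static void task_under_test(void)
{
    for (unsigned i = 0; i < 1000u; i++)
        sink += i;
}

static long long elapsed_ns(struct timespec a, struct timespec b)
{
    return (long long)(b.tv_sec - a.tv_sec) * 1000000000LL
         + (long long)(b.tv_nsec - a.tv_nsec);
}

int main(void)
{
    long long max_ns = 0;

    for (int run = 0; run < 100000; run++) {
        struct timespec start, end;
        clock_gettime(CLOCK_MONOTONIC, &start);
        task_under_test();
        clock_gettime(CLOCK_MONOTONIC, &end);

        long long t = elapsed_ns(start, end);
        if (t > max_ns)
            max_ns = t;   /* high-water mark over all observed runs */
    }

    /* This is the worst case *observed*, not the WCET: any input or
     * hardware state the test runs never triggered stays invisible. */
    printf("maximum observed execution time: %lld ns\n", max_ns);
    return 0;
}
```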
To avoid these problems, many companies devise in-house heuristics intended to compute safe upper bounds on the WCET without running any tests at all.
However, in the presence of modern processor features such as caches and pipelines, such heuristics cannot deliver safe results unless they deliberately overestimate the WCET, often by orders of magnitude.
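A common shape for such a heuristic is to scale the worst time seen in testing by a fixed safety factor. The sketch below, with purely illustrative names and numbers, shows the dilemma on cached, pipelined hardware: a small factor can still undershoot the true WCET, while a factor large enough to be safe inflates the budget enormously.

```c
/* Sketch of a typical in-house margin heuristic. Names and numbers are
 * purely illustrative, not a recommended method. */
#include <stdio.h>

#define SAFETY_FACTOR 2.0   /* arbitrary in-house margin */

/* Scale the worst time seen in testing by a fixed factor. */
static double wcet_budget_ns(double max_observed_ns)
{
    return max_observed_ns * SAFETY_FACTOR;
}

int main(void)
{
    double observed = 42000.0;   /* worst time seen in tests (ns) */
    double budget   = wcet_budget_ns(observed);

    printf("heuristic WCET budget: %.0f ns\n", budget);

    /* If an untested worst-case path (e.g. every cache access misses
     * and the pipeline stalls) costs 10x the observed time, this budget
     * is exceeded despite the margin; a factor large enough to cover
     * such paths overestimates typical behavior by orders of magnitude. */
    return 0;
}
```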