Module Bap_taint

Taint Analysis Framework.

Taint analysis is a dynamic data flow analysis, that tracks values on demand, i.e., the engine doesn't track all values, but only those that are marked as tainted.

In general, we say that a computation is tainted by an object, if its value is dependent (or somehow influenced) on the value of that object. However, the precise definition depends on the taint policy that is pluggabe and customizable. Some taint policies might be more relaxed, than others.

Traditionally, we use the word "taint" to represent an abstract entity whose influence we consider as undesirable or unsafe. Thus a computation is tainted, if it depends on a program runtime representation of that entity. An actual application of the framework may, of course, bend the notion of taint, and instead of attaching a negative connotation, treat it in a positive way thus checking for the liveness property instead of the safety property.

In the framework a taint represents an object of some kind, that we would like to track, i.e., we would like to know which values are affected by the value of that object, which program terms compute those values, etc. The engine can track several such objects at the same time and can distinguish objects of different kinds as well objects of the same kind. In other words a particular computation could be tainted by several taints, that correspondingly could have different kinds. To make it easier to understand let's take a classical taint analysis example with SQL sanitization. We would like to track all objects that were produced by sources that we do not trust. These objects are abstract to the framework and are introduced arbitrary by the analysis. For example, suppose there exists several calls to recv, and our analysis introduces a taint on each call to recv, that it designates with the unescaped-query kind. Every call to recv will create a new object that will belong to the same kind. Now, let's assume that our analysis designates the sql_escape function as a function that negates the untrustworthy effect of the input, and essentially clears the taint, or sanitizes it in our parlance. A value passed to the sql_escape function may have several taints attached to it. Some of those taints may be irrelevant to the unescaped-query kind, i.e., attached by other analysis. Since the query could be built from multiple calls to recv, there could be more than one taints of the unescaped-query kind attached to the value. So our analysis will clear all taints of the specified kind. Finally, when the privileged location is reached, e.g., the sql_exec function is called, we check that values passed to that function doesn't have any taints of the specified kind.

Other than classical approaches to taint analysis, described above, the framework could be used for checking liveness properties, lifetime analysis, and other analysis that requires tracking the flow of information.

module Std : sig ... end