Module Theory.Parser

Generates parsers of untyped ASTs into type Core Theory terms.

The Parser.Make functor produces a recursive-descent parser for a family of untyped BIL-style languages.

In general, the problem of parsing an untyped representation into richly-typed term, such as the Core Theory term is convoluted if not tedious. This module along with the signatures in the Grammar module facilitates this task, at least for typical intermediate languages such as BIL and its variants.

The recursive-descent parser has six top-level rules, for each supported term sort:

The target language is not required to represent all those terms or to distinguish types in the same way as this parser do.

To implement a parser for a target language all five rules must be provided, but they could be incomplete or empty.

Each rule is a function, that takes a parser that recognizes a sub-language of a term that is parsed by the function, and the expression or statement of the target language. The function has to map each non-terminal of the target language to a rule in the provided subgrammar, i.e.,

        let bitv : type t r. (t,exp,r) bitv_parser =
          fun (module S) -> function
          | symbol .. symbol -> S.rule1 symbol .. symbol
          | ..
          | symbol .. symbol -> S.ruleN symbol .. symbol

Let's take the BIL parser as a concrete example,

let bitv : type t r. (t,exp,r) bitv_parser =
  fun (module S) -> function
    | Cast (HIGH,n,x) -> S.high n x
    | .. -> ..
    | Cast (SIGNED,n,x) -> S.signed n x
    | BinOp(PLUS,x,y) -> S.add x y
    | .. -> ..
    | BinOp(XOR,x,y) -> S.logxor x y
    | Let (v,y,z) when is_bit v -> S.let_bit (Var.name v) y z
    | .. -> ..
    | Let (v,y,z) when is_mem v -> S.let_mem (Var.name v) y z
    | .. -> ..
      (* ill-formed expressions *)
    | Let _
    | Store (_, _, _, _, _)
    | Unknown (_, (Mem _|Unk)) as exp -> fail exp `Bitv; S.error

We omitted most of the rules for brevity, see the full definition in the BIL plugin. The mapping is pretty straightforward. As this example highlights, the parser is also playing the role of a type checker. Expressions that are not valid in the given context has to be mapped to the error term. The term itself doesn't have any payload and works as an abort symbol for the parser. Since parser are expected to be implemented in a plugin code, they can utilize to full extend the interface with the user to provide fancy error messages.

The context of the expression is automatically inferred by the parser generator with the help of the top-level syntactic rules provided by the user. In the example above, the context of the let-bound expression is determined from the type of the variable. For some languages, it might be necessary to write a type-inference algorithm.

Note, that the parser must return a value of type S.t which is abstract and couldn't be unified with the type of the Core Theory term. Therefore, the rule implementer is limited to the language that is defined by the transitive closure of the sub-grammar that is passed to the function that implements the rule. That basically makes this parser infrastructure non-extensible, which is alleviated by the fact that provided sub-grammars are very rich. The task of writing an extensible parser is still a research question.

Notes on future extensions

We're reserving the right to extend the grammar signatures in the Parser.Grammar module without changing the major version of the library, since those modules are designed to be used co-inductively (as types of modules that are provided by the library, not required), therefore adding more definitions to the grammars which could be used by the library user makes the library strictly more powerful (stronger). To prevent any issues, it is recommended not to open the provided module (i.e., always use the grammar rules via the dot notation, e.g., S.add), since adding a new rule may hide the existing functions in the scope if the module is opened. For alternative implementations of this parser generator it is recommended not to reuse these module types literally but to copy them (or define your own rules).

module Grammar : sig ... end

An untyped grammar for a subset of Core Theory languages.

type ('a, 'e, 'r) bitv_parser = (module Grammar.Bitv with type exp = 'e and type rmode = 'r and type t = 'a) -> 'e -> 'a

bitv grammar exp parses exp using grammar.

type ('a, 'e, 'r) bool_parser = (module Grammar.Bool with type exp = 'e and type t = 'a) -> 'e -> 'a

bool grammar exp parses exp using grammar.

type ('a, 'e) mem_parser = (module Grammar.Mem with type exp = 'e and type t = 'a) -> 'e -> 'a

mem grammar exp parses exp using grammar.

type ('a, 'e, 'r, 's) stmt_parser = (module Grammar.Stmt with type exp = 'e and type rmode = 'r and type stmt = 's and type t = 'a) -> 's -> 'a

stmt grammar stmt parses stmt using grammar.

type ('a, 'e, 'r) float_parser = (module Grammar.Float with type exp = 'e and type rmode = 'r and type t = 'a) -> 'e -> 'a

float grammar exp parses exp using grammar.

type ('a, 'e) rmode_parser = (module Grammar.Rmode with type exp = 'e and type t = 'a) -> 'e -> 'a

rmode grammar exp parses exp using grammar.

type ('e, 'r, 's) t = {
  1. bitv : 'a. ('a, 'e, 'r) bitv_parser;
  2. bool : 'a. ('a, 'e, 'r) bool_parser;
  3. mem : 'a. ('a, 'e) mem_parser;
  4. stmt : 'a. ('a, 'e, 'r, 's) stmt_parser;
  5. float : 'a. ('a, 'e, 'r) float_parser;
  6. rmode : 'a. ('a, 'r) rmode_parser;
}
type ('e, 'r, 's) parser = ('e, 'r, 's) t

parser is a tuple of top-level parsing routines.

Parser defines a Core Theory denotation of an untyped AST with expressions of type 'e, statements of type 's and rounding modes represented with type 'r.

module Make (S : Core) : sig ... end

Make(Theory) parses AST to the specified Theory terms.