Module Bap_primus.Std.Primus.Lisp

module Lisp: sig .. end

Lisp machine.

The Lisp Machine is an extensible Lisp Machine embedded into the Primus Machine. The Lisp machine is used to provide function stubs (summaries), as well as to control the Primus Machine using a dialect of Lisp.

Primus Lisp Language

Overview

Primus Lisp is a dialect of Lisp, that can be used to interact with a native program. Primus Lisp is close to Common Lisp and to the Emacs Lisp dialect.

Primus Lips is a low-level language that doesn't provide many abstractions, as it tries to be as close to the machine language as possible. In that sense Primus Lisp can be seen as an assembler, except that it can't really assemble binaries, as it operates over already existing and assembled program. Primus Lisp, however is still quite powerfull, as the absence of suitable abstractions is compensated with powerful and versatile meta-programming system.

Primus Lisp is primarily used for the following tasks:

A Primus Lisp program is a file, that can contain the following entities:

The entities may be specified in any order, however the above order constitutes a good programming practice.

Each file provides (implements) a feature, that has the same name as the name of the file without an extension and directories. Thus the namespace of features is flat. A feature is usually a function, macro definition, or any other definition, or a collection of definition, gathered under the same theme. For example, the getopt feature implements C getopt function, and accompanying definitions. The features maybe very specific, i.e., providing an implementation for only one small function, or they can be a collection of other features. For example, the posix feature provides an implementation of all functions specified in the POSIX standard (not all at the time of writing).

A collection of files is called a library. To use features provided by another file, the file should be requested with the (require <ident>) form, where <ident> is the name of the feature. A file with the requested name is searched in the library, and loaded, making all its definitions available in the lexical scope, that follows the require form. If a feature is already provided, then nothing happens. Dependencies should not contain cycles.

Top-level declarations specify attributes that are shared by all definitions in a file. For example, a declaration

(declare (context (arch armv7))

makes all definitions visible only in the context of the ARMv7 architecture.

Constants and substitutions are primitive abstractions that give names to code fragments. Macros are program transformations. Functions add parameters to a code, and are basic building blocks. Scope of all definitions can be limited with the context declarations. Finally, a function can be advised with another function using the advice-add function.

Type system

A type defines all possible values of an expression. In Primus Lisp, expression values can be only scalar, i.e., machine words of different widths. The width is always specified in the number of bits. A maximum width of a word is equal to the width of the architecture machine word, thus a family of types is dependent on the context of evaluation. (Note, current implementation limits maximum width of the machine word to 64 bits). We denote a type of expression with a decimal number, e.g., (exp : 16) means that an expression ranges over all 16 bit words.

An expression can have a polymorphic type t that denotes a powerset of all types for the given architecture. For example, for ARMv7, t = 32 \/ 31 \/ .. \/ 1 . Thus a value of any type, is a also a value of type t.

Side note -- the type system doesn't include the unit type, i.e., the 0 type. An expression () evaluates to the 0:1 value.

Functions and expressions

Functions are named abstractions of code, where a code is a sequence of expressions. Since a value of an expression is a machine word, functions are not first-class values in Primus Lisp. However, functions and types can be manipulated on the meta-programming level.

A function is defined with the defun form, that has the following syntax:

          (defun <name> (<arg> ...) <exp> ...)
        

A list of arguments (that can be empty) defines function arity. Functions in Primus Lisp has fixed arity, unlike macros.

A function definition may optionally contain a documentation strings and a declaration section. For example,

         (defun strlen (p)
           "returns a length of the null-terminated string pointed by P"
           (declare (external "strlen"))
           (msg "strlen was called with $p")
           (let ((len 0))
             (while (not (points-to-null p))
               (incr len p))
             len))
       

A function can be called (applied) using the function application form:

(<name> <exp> ...)

The first element of the function application form is not an expression and must be an identified. The rest arguments are expressions, that are evaluated from left to right. All arguments are passed by value.

The body of a function is a sequence of expressions, that is evaluated in the lexical order (i.e., from left to right). A value of the last expression is the result of the function evaluation. An expression is either a function application or or a special form. Primus Lisp defines only 5 special forms, the rest of the syntax is defined using the macro system.

Conditionals

The (if <test-expr> <then-expr> <else-expr> ...) form is a basic control flow structure. If <test-expr> evaluates to a non-zero word then the result of the <if> form is the result of evaluation of the <then-expr>, otherwise a sequence of <else-expr> ... is evaluated and the result of the form evaluation would be a result of the last expression in a form

For example,

        (if (< 4 3)
            (msg "shouldn't happen")
          (msg "that's right")
          (- 4 3))
       

Note that the the <else-expr> sequence maybe empty.

Several derived forms are defined as macros, e.g.,

          (when <cond> <expr> ...)
          (or <expr> ...)
          (and <expr> ...)
       

Loops

Iterations can be implemented either using recursion or with the while special form. Since the interpreter doesn't provide the tail-call optimization it is better to use the latter (although the interpreter itself is using a constant stack size, as it uses the host language heap memory to represent the Primus Lisp call stack).

The (while <cond> <expr> ...) form, will evaluate the <cond> expression first, and if it is a non-zero value, then the sequence of expressions <expr> ... is evaluated, and the value of the last expression becomes the value of the while form. If the value of the <cond> expression is a false value, then this value becomes the value of the while form.

Variables

The let form binds values to names in the lexical scope.

         (let (<binding> ...)  <body-expr> ...)
         binding ::= (<var> <expr>)
       

Evaluates each <binding> in order binding the <var> identifier to a result of <expr>. The newly created binding is available in consequent bindings and in the <body-expr>, but is not visible outside of the scope of the let-form.

Example,

        (let ((x 4)
              (y (+ x 2)))
          (+ x 3))
       

The value of the let form is the value of the last expression <sN> .

Sequencing

The (prog <expr> ...) form combines a sequence of expressions into one expression, and is useful in the contexts where an expression is required. The expressions are evaluated from left to right, and the value of the prog form is the value of the last expression.

Messaging

The (msg <fmt> <expr> ...) form constructs logging/debugging messages using an embedded formatting language. The formed message will be sent to the logging facility, that was set up during the Primus Lisp library initialization.

The format language interprets all symbols literally, unless they start with the dollar sign ($).

A pair of characters of the form $<n> , where <n> is a decimal digit, will be substituted with the value of the n'th expression (counting from zero).

Example,

(msg "hello, $0 $0 world, (+ 7 8) = $1" "cruel" (+ 7 8))

will be rendered to a message:

"hello, cruel cruel world, (+ 7 8) = 15"

Metaprogramming

Ordinary Primus Lisp expressions are evaluated at the runtime in the Primus emulator, and are quite limited as they need to be evaluated directly on the CPU model. To mitigate this limitation, Primus Lisp provides a powerful metaprogramming system. The metaprogram is evaluated when the Primus Lisp program is read. A metaprogram generates a program, that will be evaluated by the CPU. The metaprogram itself is Turing complete, thus any transformation can be applied to a program. The Primus Lisp metaprogramming system use term-rewriting as a computational model, with Lisp code fragments as terms. Primus Lisp provides three facilities for metaprogramming:

Constants

The syntactic constants is the simplest syntactic substitution, it just substitutes atoms for atoms. Constants are introduced with the defconstant form, that has the following syntax:

         (defconstant <name> <atom>)
         (defconstant <name> <docstring> <atom>)
         (defconstant <name> <declarations> <atom>)
         (defconstant <name> <docstring> <declarations> <atom>)
       

For example,

(defconstant main-address 0xDEAD)

During the program parsing, each occurrence of the <name> term will be rewritten with the <value> term, that should be an atom.

Substitutions

The syntactic substitution is a generalization of syntactic constant, and has quite a similar syntax:

          | (defsubst <name> <value> ...)
          | (defsubst <name> <declarations> <value> ...)
          | (defsubst <name> :<syntax> <value> ...)
          | (defsubst <name> <declarations> :<syntax> <value> ...)
       

During parsing, every occurrence of the term <name> (that should be an atom), will be rewritten with a sequence of values

 {<value>} 

.

Example,

(defsubst ten-digits 0 1 2 3 4 5 6 7 8 9)

A process of applying of the substitutions is called "expansion". Since the expansion transforms an atom to a list of atoms, it can be applied only inside the macro or function application. For example,

(+ ten-digits)

will be expanded to

(+ 0 1 2 3 4 5 6 7 8 9 )
Special syntax

Expansions also provide a support for extensible value specification syntax, that enables domain-specific data specification languages. Currently, we support only two syntaxes: :ascii and :hex.

In the :ascii syntax the values should be atoms, possibly delimited with double quotes. Each character of each atom will be expanded to its corresponding ASCII code. Strings can contain special characters prefixed with a backslash. The special character can be one of the well-known ASCII special character, e.g., \n, \r, etc, or it can be a decimal or a hexadecimal code of a character.

Example, given the following substitution:

(defsubst hello-cruel-world :ascii "hello, cruel world\n\000")

the following application:

(write-block SP hello-cruel-world

will be expanded with

          (write-block SP
             0x68 0x65 0x6c 0x6c 0x6f 0x2c 0x20 0x63
             0x72 0x75 0x65 0x6c 0x20 0x77 0x6f 0x72
             0x6c 0x64 0x0a 0x00)
        

In the :hex syntax the sequence of atoms is split into two-characters subsequences each treated as a hex value. This syntax is useful for encoding memory dumps in a format that is close to the hexdump (without offsets). E.g., given the following substitution rule

          (defsubt example :hex 68656c 6c 6f2c2063)
        

an application

          (write-block SP example)
        

will be expanded into

          (write-block SP 0x68 0x65 0x6c 0x6c 0x6f 0x2c 0x20 0x63)
        

Macro

The macros provide the most versatile and powerful way to specify arbitrary code transformations. The macro definitions introduce abstractions on the meta-programming level. I.e., it allows a programmer to write a function that operates on code terms, making the code a first class value.

The macro definition has the following syntax:

          (defmacro <name> (<param> ...) <value>)
          (defmacro <name> (<param> ...) <docstring> <value>)
          (defmacro <name> (<param> ...) <declarations> <value>)
          (defmacro <name> (<param> ...) <docstring> <declarations> <value>)
       

A macro definition adds a term rewriting rule, that rewrites each occurrence of (<name> <arg> ...) , where the number of arguments N is greater or equal then the number of parameters M, with the <value> in which occurrences of the ith parameter is substituted with the term ith argument. If N is bigger than M, then the last parameter is bound with the sequence of arguments <argM>...<argN> .

The macro subsystem doesn't provide any specific looping or control-flow facilities, however, the macro-overloading mechanism along with the recursion make it possible to encode arbitrary meta-transformations.

Other than a standard context-based ad-hoc overloading mechanism, the macro application uses the arity-based resolution. As it was described above, if a number of arguments is greater than the number of parameters, then the last parameter is bound to the rest of the arguments. When several macro definitions matches, then a definition that has fewer unmatched arguments is chosen. For example, suppose we have the following definitions:

          (defmacro list-length (x) 1)
          (defmacro list-length (x xs) (+ 1 (list-length xs)))
       

The the following term

(list-length 1 2 3)

will be normalized (after a series of transformations) with the following:

(+ 1 (+ 1 1))
          1: (list-length 1 2 3) => (+ 1 (list-length 2 3))
          2: (+ 1 (list-length 2 3)) => (+ 1 (+ 1 (list-length 3)))
          3: (+ 1 (+ 1 (list-length 3))) => (+ 1 (+ 1 1))
       

In the first step, both definition match. In the first definition x is bound to 1 2 3, while in the second x is bound to 1 and xs is bound to 2 3. Since the last parameter is bound to fewer arguments, the second definition is chosen as the most certain. In the second step the second definition is still more concrete. Finally at the last step, the second definition doesn't match at all, as it has more parameters than arguments.

A slightly more complex example, is a fold iterator, that applies a function to a sequence of arguments of arbitrary length, e.g.,

          (defmacro fold (f a x) (f a x))
          (defmacro fold (f a x xs) (fold f (f a x) xs))
       

Using this definition we can define a sum function (although it is not needed as the + function defined in the Primus Lisp standard library, already accepts arbitrary number of arguments), as:

(defmacro sum (xs) (fold + 0 xs))

The (sum 1 2 3) will be rewritten as follows:

          1: (sum 1 2 3) => (fold + 0 1 2 3)
          2: (fold + 0 1 2 3) => (fold + (+ 0 1) 2 3)
          3: (fold + (+ 0 1) 2 3) => (fold + (+ (+ 0 1) 2) 3)
          4: (fold + (+ (+ 0 1) 2) 3) => (+ (+ (+ 0 1) 2) 3)
       

A more real example is the write-block macro, that takes a sequence of bytes, and writes them starting from the given address:

          (defmacro write-block (addr bytes)
             (fold memory-write addr bytes))
       

The definition uses the memory-write primitive, that writes a byte at the given address and returns an address of the next byte.

Polymorphism

Primus Lisp provides both kinds of polymorphism: parametric and ad hoc.

Expressions in Primus Lisp have types, that are denoted with natural numbers starting with one. Each type defines a set of values that can be represented with the given number of bits. Values with different widths are different, even if they represent the same number. Expressions can be polymorphic, e.g., function

(defun square (x) ( * x x))

has type `forall n. n -> n -> n`. Thus it can be applied to values of different types, e.g., (square 4:4), that will be evaluated to the 0:4 value, or (square 4:8), that will be evaluated to 16:8, etc. The parametric polymorphism doesn't require any special annotations or type specifications so we will not stop on it anymore.

The ad hoc polymorphism provides a facilities for overloading definitions. That is, the same entity may have multiple definitions, and depending on a context, only one definition should be chosen. Not only functions can have multiple definitions, but also macros, constants, and substitutions. Since the latter three entities operate on the syntactic level, the syntax of Primus Lisp itself is context-dependent.

Context

A context (from the perspective of the type system) is a set of type class instances. The context is fixed when a program is parsed. A Primus Lisp program may not change the context; neither in runtime, nor it the parse time, thus a program is parsed and evaluated at the specific context. However, a definition may declare that it makes sense only in some context. If more than one definition make sense under the given context, then the most specific one is chosen. If no definition is more specific than another, then an error occurs.

A type class defines a type as a set of features. The subset relation induces a subtyping relation over types - a type t' is a subtype of a type t if t' <= t (i.e., if t' is a subset of t). Each feature is a textual tag, called a feature constructor.

The context declaration limits an associated definition to the specified type class(es), and has the following syntax:

          (declare (context (<type-class> <feature> ...) ...))
       

Let's use the following two definitions for a concrete example,

          (defmacro get-arg-0 ()
             (declare (context (arch arm gnueabi)))
             R0)

          (defmacro get-arg-0 ()
             (declare (context (arch x86 cdecl)))
             (read-word word-width (+ SP (sizeof word-width))))
       

We have two definitions of the same macro get-arg-0, that are applicable to different contexts. The first definition, is only applicable in the context of the ARM architecture and the gnueabi ABI. The second is applicable in the context of the x86 architecture and the cdecl ABI. More formally, a definition is considered only if its context is a subtype of the current type context.

Advice mechanism

Primus Lisp also provides a mechanism for non-intrusive extending existing function definitions. An existing definition maybe advised with another definition. A piece of advice maybe added to a function that will be called either before or after the evaluation of an advised function, e.g.,

          (defun memory-written (a x)
            (declare (advice :before memory-write))
            (msg "write $x to $a"))
       

This definition not only defines a new function called memory-written, but also proclaims it as advice function to the memory-write function that should before it is called.

If an advisor is attached before the advised function, then it the advisor will be called with the same arguments as the advised function. The return value of the advisor is ignored. The advisor function will be called as a normal Lisp function, with all expected overloading and name resolving. So it is possible to provide context specific advice. If there are several advice to the same function, then they will be called in the unspecified order.

An advisor that is attached after the advised function will be called with one extra argument - the result of evaluation of the advised function. The value returned by the advisor will override the result of the advised function. If there are several advisors attached after the same function, then they will be called in the unspecified order.

Signaling Mechanims

The Primus Observation system is reflected onto the Primus Lisp Machine Signals. Every time a reflected observation occurs the Lisp Machine receives a signal that is dispatched to handlers. A handler can be declared defined with the defmethod form, e.g.,

        (defmethod call (name arg)
          (when (= name 'malloc)
            (msg "malloc($0) was called" arg)))
       

The defmethod form follows the general definiton template, i.e., it can contain a docstring and declaration section, and selection and resolution rules are applicable to methods. Methods of the same signal are invoked in an unspecified order.

Formal syntax

Each entity is an s-expression with the grammar, specified below.We use BNF-like syntax with the following conventions. Metavariables are denoted like <this>. The <this> ... stands of any number of <this> (possibly zero). Ordinary parentheses do not bear any notation, and should be read literally. Note, since the grammar is not context free, and is extensible, the following is an approximation of the language grammar. Grammar extension points are defined with the '?extensible?'comment in a production definition.

module ::= <entity> ...

entity ::=
  | <feature-request>
  | <declarations>
  | <constant-definition>
  | <substitution-definition>
  | <parameter-definition>
  | <macro-definition>
  | <function-definition>
  | <method-definition>

feature-request ::= (require <ident>)

declarations ::= (declare <attribute> ...)

constant-definition ::=
  | (defconstant <ident> <atom>)
  | (defconstant <ident> <atom> <docstring>)
  | (defconstant <ident> <atom> <declarations>)
  | (defconstant <ident> <atom> <declarations> <docstring>)

parameter-definition ::=
  | (defparameter <ident> <atom>)
  | (defparameter <ident> <atom> <docstring>)
  | (defparameter <ident> <atom> <declarations> <docstring>)

substitution-definition ::=
  | (defsubst <ident> <atom> ...)
  | (defsubst <ident> <declarations> <atom> ...)
  | (defsubst <ident> :<syntax> <atom> ...)
  | (defsubst <ident> <declarations> :<syntax> <atom> ...)

macro-definition ::=
  | (defmacro <ident> (<ident> ...) <exp>)
  | (defmacro <ident> (<ident> ...) <docstring> <exp>)
  | (defmacro <ident> (<ident> ...) <declarations> <exp>)
  | (defmacro <ident> (<ident> ...) <docstring> <declarations> <exp>)

function-definition ::=
  | (defun <ident> (<var> ...) <exp> ...)
  | (defun <ident> (<var> ...) <docstring> <exp> ...)
  | (defun <ident> (<var> ...) <declarations> <exp> ...)
  | (defun <ident> (<var> ...) <docstring> <declarations> <exp> ...)

method-definition ::=
  | (defmethod <ident> (<var> ...) <exp> ...)
  | (defmethod <ident> (<var> ...) <docstring> <exp> ...)
  | (defmethod <ident> (<var> ...) <declarations> <exp> ...)
  | (defmethod <ident> (<var> ...) <docstring> <declarations> <exp> ...)


exp ::=
  | ()
  | <var>
  | <word>
  | <sym>
  | (if <exp> <exp> <exp> ...)
  | (let (<binding> ...) <exp> ...)
  | (set <var> <exp>)
  | (while <exp> <exp> <exp> ...)
  | (prog <exp> ...)
  | (msg <format> <exp> ...)
  | (<ident> <exp> ...)

binding ::= (<var> <exp>)

var ::= <ident> | <ident>:<size>

attribute ::=
  | (external <ident> ...)
  | (context (<ident> <ident> ...) ...)
  | (<ident> ?ident-specific-format?)

docstring ::= <text>

syntax ::= :hex | :ascii | ...

atom  ::= <word> | <text>

word  ::= ?ascii-char? | <int> | <int>:<size>

sym   ::= '<atom>

int   ::= ?decimal-octal-hex-or-bin format?

size  ::= ?decimal?

ident ::= ?any atom that is not recognized as a <word>?
       

type program 

an abstract type representing a lisp program

type message 

an abstract type that represents messages send with the msg form.

module Load: sig .. end

Primus Lisp program loader

module Doc: sig .. end
module Type: sig .. end

Lisp Type System.

module Message: sig .. end

Lisp Machine Message interface.

module type Closure = (Machine : Std.Primus.Machine.Ssig .. end

Machine independent closure.

type closure = (module Bap_primus.Std.Primus.Lisp.Closure) 

a closure packed as an OCaml value

module Primitive: sig .. end
module type Primitives = (Machine : Std.Primus.Machine.Ssig .. end
type primitives = (module Bap_primus.Std.Primus.Lisp.Primitives) 

a list of primitives.

type Std.Primus.exn += 
| Runtime_error of string
val message : message Std.Primus.observation

message observation occurs every time a message is sent from the Primus Machine.

module Make (Machine : Std.Primus.Machine.S: sig .. end

Make(Machine) creates a Lisp machine embedded into the Primus Machine.