Module Std.Project

Disassembled program.

A project contains the data that was reconstructed during disassembly, semantic analysis, and any number of other analyses.

A project also allows arbitrary data to be associated with memory regions and program terms, or attached globally to the project itself. It can therefore be seen as a knowledge base of deeply interconnected facts.

Besides delivering information from bap to passes, it can also be used as a communication medium between different passes (see Working with project).

type t = project
type state
val bin_shape_state : Core_kernel.Bin_prot.Shape.t
val bin_size_state : state Core_kernel.Bin_prot.Size.sizer
val bin_write_state : state Core_kernel.Bin_prot.Write.writer
val bin_writer_state : state Core_kernel.Bin_prot.Type_class.writer
val bin_read_state : state Core_kernel.Bin_prot.Read.reader
val __bin_read_state__ : (int -> state) Core_kernel.Bin_prot.Read.reader
val bin_reader_state : state Core_kernel.Bin_prot.Type_class.reader
val bin_state : state Core_kernel.Bin_prot.Type_class.t
type input
type library

IO interface to a project data structure.

include Regular.Std.Data.S with type t := t
type info = string * [ `Ver of string ] * string option

name, `Ver v, desc: information attached to a particular reader or writer.

val version : string

Data representation version. The version should be increased after any change in the data representation.

Serializers that are derived from a data representation must have the same version as the data structure from which they are derived. Such serializers can only read and write data of the same version.

Other serializers can read and write data independently of its representation version. A serializer that can't store data of the current version simply shouldn't be added to the set of serializers.

It is assumed that if a reader and a writer have the same name and version, then whatever was written by the writer is readable by the reader. Round-trip equality is not required, so it is acceptable if some information is lost.

It is also possible that a reader and a writer that have the same name but different versions are compatible. In that case it is recommended to use semantic versioning.

val size_in_bytes : ?ver:string -> ?fmt:string -> t -> int

size_in_bytes ?ver ?fmt datum returns the number of bytes needed to represent datum in the given format and version.

val of_bytes : ?ver:string -> ?fmt:string -> Regular.Std.bytes -> t

of_bytes ?ver ?fmt bytes deserializes a value from bytes.

val to_bytes : ?ver:string -> ?fmt:string -> t -> Regular.Std.bytes

to_bytes ?ver ?fmt datum serializes a datum to a sequence of bytes.

val blit_to_bytes : ?ver:string -> ?fmt:string -> Regular.Std.bytes -> t -> int -> unit

blit_to_bytes ?ver ?fmt buffer datum offset copies a serialized representation of datum into a buffer, starting from the offset.
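
As a rough illustration of how size_in_bytes and blit_to_bytes fit together, the sketch below packs several data into one buffer. It does not depend on BAP: the Serializable signature only mirrors the two signatures listed above, under the assumption that Regular.Std.bytes is the standard bytes type, and the Pack, Toy, and Toy_pack modules are hypothetical names introduced for this example.

```ocaml
(* Mirrors the two documented signatures; [bytes] standing in for
   Regular.Std.bytes is an assumption of this sketch. *)
module type Serializable = sig
  type t
  val size_in_bytes : ?ver:string -> ?fmt:string -> t -> int
  val blit_to_bytes : ?ver:string -> ?fmt:string -> bytes -> t -> int -> unit
end

module Pack (S : Serializable) = struct
  (* Serialize all data back to back into one freshly allocated buffer. *)
  let concat ?ver ?fmt (data : S.t list) : bytes =
    let total =
      List.fold_left (fun n d -> n + S.size_in_bytes ?ver ?fmt d) 0 data in
    let buf = Bytes.create total in
    let (_ : int) =
      List.fold_left (fun off d ->
          S.blit_to_bytes ?ver ?fmt buf d off;
          off + S.size_in_bytes ?ver ?fmt d)
        0 data in
    buf
end

(* Hypothetical stand-in datum so the sketch runs on its own:
   a string that is serialized as its own bytes. *)
module Toy = struct
  type t = string
  let size_in_bytes ?ver:_ ?fmt:_ s = String.length s
  let blit_to_bytes ?ver:_ ?fmt:_ buf s off =
    Bytes.blit_string s 0 buf off (String.length s)
end

module Toy_pack = Pack (Toy)
```

With the real module, the same pattern would presumably be applied to a project value in place of Toy; whether that is preferable to calling to_bytes directly depends on who owns the buffer.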

val of_bigstring : ?ver:string -> ?fmt:string -> Core_kernel.bigstring -> t

of_bigstring ?ver ?fmt buf deserializes a datum from a bigstring.

val to_bigstring : ?ver:string -> ?fmt:string -> t -> Core_kernel.bigstring

to_bigstring ?ver ?fmt datum serializes a datum to a sequence of bytes represented as a bigstring.

val blit_to_bigstring : ?ver:string -> ?fmt:string -> Core_kernel.bigstring -> t -> int -> unit

blit_to_bigstring ?ver ?fmt buffer datum offset copies a serialized representation of datum into a buffer, starting from offset.

module Io : sig ... end

Input/Output functions for the given datum.

module Cache : sig ... end

Data cache.

val add_reader : ?desc:string -> ver:string -> string -> t Regular.Std.reader -> unit

add_reader ?desc ~ver name reader registers a new reader with a provided name, version ver and optional description desc

val add_writer : ?desc:string -> ver:string -> string -> t Regular.Std.writer -> unit

add_writer ?desc ~ver name writer registers a new writer with a provided name, version ver and optional description desc

val available_readers : unit -> info list

available_readers () lists the available readers for the data type.

val default_reader : unit -> info

default_reader returns information about default reader

val set_default_reader : ?ver:string -> string -> unit

set_default_reader ?ver name sets a new default reader. If the version is not specified, then the latest available version is used. Raises an exception if a reader with the given name doesn't exist.

val with_reader : ?ver:string -> string -> (unit -> 'a) -> 'a

with_reader ?ver name operation temporarily sets the default reader to a reader with the specified name and version. The default reader is restored after operation is finished.
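
The set-then-restore behaviour described here can be modelled with a mutable cell and Fun.protect from the standard library. This is only a toy model, not the library's implementation: the names default and with_default and the format names "fmt-a" and "fmt-b" are hypothetical, and restoring the default when operation raises is a choice made in this sketch that the text above does not confirm.

```ocaml
(* Toy model: the "default reader" is just a name held in a mutable cell.
   "fmt-a" is a hypothetical format name. *)
let default = ref "fmt-a"

(* Run [operation] with [name] as the default, then put the old default
   back. Fun.protect runs the [finally] callback on normal return and
   on exception (the latter is an assumption of this sketch). *)
let with_default (name : string) (operation : unit -> 'a) : 'a =
  let saved = !default in
  default := name;
  Fun.protect ~finally:(fun () -> default := saved) operation
```

The same shape would apply to the writer and printer variants of this function.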

val available_writers : unit -> info list

available_writers () lists the available writers for the data type.

val default_writer : unit -> info

default_writer returns information about the default writer

val set_default_writer : ?ver:string -> string -> unit

set_default_writer ?ver name sets a new default writer. If the version is not specified, then the latest available version is used. Raises an exception if a writer with the given name doesn't exist.

val with_writer : ?ver:string -> string -> (unit -> 'a) -> 'a

with_writer ?ver name operation temporarily sets the default writer to a writer with the specified name and version. The default writer is restored after operation is finished.

val default_printer : unit -> info option

default_printer optionally returns information about the default printer.

val set_default_printer : ?ver:string -> string -> unit

set_default_printer ?ver name sets a new default printer. If the version is not specified, then the latest available version is used. Raises an exception if a printer with the given name doesn't exist.

val with_printer : ?ver:string -> string -> (unit -> 'a) -> 'a

with_printer ?ver name operation temporarily sets the default printer to a printer with the specified name and version. The default printer is restored after operation is finished.

Low level access to serializers

val find_reader : ?ver:string -> string -> t Regular.Std.reader option

find_reader ?ver name looks up a reader with the given name. If the version is not specified, then the reader with the maximum version is returned.

val find_writer : ?ver:string -> string -> t Regular.Std.writer option

find_writer ?ver name looks up a writer with the given name. If the version is not specified, then the writer with the maximum version is returned.
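
The "maximum version when none is specified" rule can be sketched with a small standard-library registry. This is a toy model rather than the library's code: toy_add_reader, toy_find_reader, and parse_version are hypothetical names, a reader is reduced to a plain function, and comparing versions component by component as integers is an assumption, since the text above does not say how versions are ordered.

```ocaml
(* Parse "1.10.0" into [1; 10; 0]; non-numeric components count as 0.
   Numeric comparison of components is an assumption of this sketch. *)
let parse_version (v : string) : int list =
  String.split_on_char '.' v
  |> List.map (fun s -> Option.value (int_of_string_opt s) ~default:0)

(* Toy registry keyed by (name, version); a "reader" is any function. *)
let readers : (string * string, string -> int) Hashtbl.t = Hashtbl.create 16

let toy_add_reader ~ver name reader =
  Hashtbl.replace readers (name, ver) reader

(* With [?ver], look up that exact version; without it, pick the entry
   with the greatest version among those registered under [name]. *)
let toy_find_reader ?ver name =
  match ver with
  | Some ver -> Hashtbl.find_opt readers (name, ver)
  | None ->
    Hashtbl.fold
      (fun (n, v) r best ->
         if n <> name then best
         else match best with
           | Some (bv, _)
             when compare (parse_version bv) (parse_version v) >= 0 -> best
           | _ -> Some (v, r))
      readers None
    |> Option.map snd
```

Under this ordering "1.10.0" is newer than "1.9.0", which a plain string comparison would get wrong.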

val create : ?package:string -> ?state:state -> ?disassembler:string -> ?brancher:brancher source -> ?symbolizer:symbolizer source -> ?rooter:rooter source -> ?reconstructor:reconstructor source -> input -> t Core_kernel.Or_error.t

create input creates a project from the provided input source.

The input code regions are speculatively disassembled and the set of basic blocks is determined, using the algorithm described in Disasm.Driver. After that the concrete whole program control-flow graph (CFG) is built, which can be accessed with the Project.disasm function. The whole program CFG is then partitioned into a set of subroutines using dominator analysis, see Disasm.Subroutines for details. Based on this partition a symbol table, which is a set of subroutine control-flow graphs, is built. The symbol table, which can be accessed with Project.symbols, also contains information about the interprocedural control flow. Finally, the symbol table is translated into the intermediate representation, which can be accessed using the Project.program function. The whole process is pictured below.

                         ---------------------
                        (    Disassembling    )
                         ---------------------
                                  |
                        +---------------------+
                        |                     |
                        |  All instructions   |
                        |  and basic blocks   |
                        |                     |
                        +---------------------+
                                  |
                         ---------------------
                        ( CFG  reconstruction )
                         ---------------------
                                  |
                        +---------------------+
                        |                     |
                        |  The whole program  |
                        |  control-flow graph |
                        |                     |
                        +---------------------+
                                  |
                         ---------------------
                        (    Partitioning     )
                         ---------------------
                                  |
                        +---------------------+
                        |                     |
                        | The quotient set of |
                        |    basic  blocks    |
                        |                     |
                        +---------------------+
                                  |
                         ---------------------
                        ( Constructing Symtab )
                         ---------------------
                                  |
                        +---------------------+
                        |                     |
                        |   The symbol table  |
                        |  and the  callgraph |
                        |                     |
                        +---------------------+
                                  |
                         ---------------------
                        (  IR Reconstruction  )
                         ---------------------
                                  |
                        +---------------------+
                        |                     |
                        |    The IR of the    |
                        |   binary  program   |
                        |                     |
                        +---------------------+

The disassembling process is fully integrated with the knowledge base. If the input source provides information about symbols and their location, then this information will be automatically reflected to the knowledge base.

The brancher, symbolizer, and rooter parameters are ignored since 2.0.0. Their information can instead be reflected to the knowledge base using, respectively, the Brancher.provide, Symbolizer.provide, and Rooter.provide functions.

  • parameter state

    if specified then the provided state will be used as the initial state

  • parameter package

    if specified, then all symbols during the disassembly will be created (interned) in the specified package.

  • since 2.0.0 the state parameter is added
  • since 2.0.0 the parameter [disassembler] is unused
  • since 2.0.0 the parameter [brancher] is unused
  • since 2.0.0 the parameter [symbolizer] is unused
  • since 2.0.0 the parameter [rooter] is unused
  • since 2.0.0 the parameter [reconstructor] is unused
  • since 2.2.0 the parameter [package] is added
  • since 2.6.0 if [input] consists of library files in addition to the main binary, then the accessors to the state of the project reflect that of the main binary, except for program, which contains the code of both the main program and the library programs linked together.

empty target creates an empty project for the given target.

val arch : t -> arch

arch project reveals the architecture of a loaded file

  • deprecated

    use target project instead.

target project returns the target system of the project.

  • since 2.2.0
val specification : t -> Ogre.doc

specification p returns the specification of the binary.

  • since 2.2.0
val state : t -> state

state project returns the core state of the project.

  • since 2.0.0

val disasm : t -> disasm

disasm project returns results of disassembling

val program : t -> program term

program project returns a program lifted into IR

val with_program : t -> program term -> t

with_program project program updates a project program

val map_program : t -> f:(program term -> program term) -> t

map_program t ~f maps the IR representation of the program with function f.

  • since 2.6.0 the program is no longer lazily computed.
val symbols : t -> symtab

symbols t returns reconstructed symbol table

val with_symbols : t -> symtab -> t

with_symbols project symbols updates project symbols

val storage : t -> dict

returns an attribute storage of the project

val with_storage : t -> dict -> t

updates the attribute storage

val memory : t -> value memmap

memory t returns the memory as an interval tree marked with arbitrary values.

the memory of the unit in the knowledge base.

  • since 2.2.0
val tag_memory : t -> mem -> 'a tag -> 'a -> t

tag_memory project region tag value tags the given region of memory in project with the given tag and value. Example: Project.tag_memory project tainted color red

val substitute : t -> mem -> string tag -> string -> t

substitute p region tag value is like tag_memory, but it will also apply substitutions in the provided string value, as per OCaml standard library's Buffer.add_substitute function.

Example:

Project.substitute project region comment "$symbol starts at $symbol_addr"

The following substitutions are supported:

  • $section{_name,_addr,_min_addr,_max_addr} - name of the file region to which this memory belongs. For example, in ELF this name corresponds to the section name
  • $symbol{_name,_addr,_min_addr,_max_addr} - name or address of the symbol to which this memory belongs
  • $asm - assembler listing of the memory region
  • $bil - BIL code of the tagged memory region
  • $block{_name,_addr,_min_addr,_max_addr} - name or address of a basic block to which this region belongs
  • $min_addr, $addr - starting address of a memory region
  • $max_addr - address of the last byte of a memory region.
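
The substitution syntax itself comes from the standard library's Buffer.add_substitute, so it can be tried without BAP. In the sketch below the expand helper, the environment, and the sample values "main" and "0x400500" are hypothetical; in the real function the values are derived from the tagged memory region.

```ocaml
(* Expand $name variables in [pattern] using an association list.
   Unknown variables are left in place as "$name" (a choice made here). *)
let expand (env : (string * string) list) (pattern : string) : string =
  let buf = Buffer.create (String.length pattern) in
  let lookup name =
    match List.assoc_opt name env with
    | Some value -> value
    | None -> "$" ^ name
  in
  Buffer.add_substitute buf lookup pattern;
  Buffer.contents buf

let comment =
  expand
    [ "symbol", "main"; "symbol_addr", "0x400500" ]
    "$symbol starts at $symbol_addr"
(* comment = "main starts at 0x400500" *)
```

Because a variable name is the longest run of alphanumeric or underscore characters after the $, $symbol_addr is read as one variable and not as $symbol followed by "_addr".
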
val with_memory : t -> value memmap -> t

with_memory project memory updates the project memory. It is recommended to use tag_memory and substitute instead of this function, if possible.

Extensible record

Project can also be viewed as an extensible record, where one can store arbitrary values. Example,

let p = Project.set project color `green

This will set field color to a value `green.

val set : t -> 'a tag -> 'a -> t

set project field value sets a field to the given value. If the field was already set, then the new value overrides the old one. Otherwise the field is added.

val get : t -> 'a tag -> 'a option

get project field returns the value of the field if it exists

val has : t -> 'a tag -> bool

has project field checks whether the field exists. It is useful for fields of type unit, which are effectively isomorphic to bool fields, e.g., if Project.has project mark

val del : t -> 'a tag -> t

del project attr removes an attribute from a project

val libraries : t -> library list

libraries project returns the shared libraries that were loaded with project.

module Library : sig ... end

A library that was loaded alongside the main program.

module Info : sig ... end

Information obtained during project reconstruction.

module State : sig ... end

The core state of the project.

module Input : sig ... end

Input information.

Registering passes

To add a new pass, call one of the following register_* functions.

type pass
val register_pass : ?autorun:bool -> ?runonce:bool -> ?deps:string list -> ?name:string -> (t -> t) -> unit

register_pass ?autorun ?runonce ?deps ?name pass registers a pass over a project.

If autorun is true, then the host program will run this pass automatically. If runonce is true, then for a given project the pass will be run only once. Repeated attempts to run the pass will be ignored. The runonce parameter defaults to false when autorun is false, and to true otherwise.

The deps parameter is a list of dependencies. Each dependency is the name of a pass that should be run before this pass. The dependencies are run in the specified order every time the pass is run.

To get access to command line arguments use Plugin.argv
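
The interaction of deps, runonce, and autorun described above can be modelled with a small registry. This is a toy model built only on the standard library, not the way BAP schedules passes: the project record, register, run, and note are hypothetical names, skipping a runonce dependency on a repeated run is one reading of the text, and cyclic dependencies are not detected.

```ocaml
(* Toy project: a log of executed pass names plus the passes already run. *)
type project = { log : string list; ran : string list }

type pass = {
  name : string;
  deps : string list;
  runonce : bool;
  body : project -> project;
}

let registry : (string, pass) Hashtbl.t = Hashtbl.create 16

let register ?(autorun = false) ?runonce ?(deps = []) name body =
  (* runonce defaults to false when autorun is false, and to true otherwise *)
  let runonce = Option.value runonce ~default:autorun in
  Hashtbl.replace registry name { name; deps; runonce; body }

(* Run the dependencies in the given order, then the pass itself.
   A runonce pass that already ran on this project is skipped. *)
let rec run name proj =
  match Hashtbl.find_opt registry name with
  | None -> invalid_arg ("unknown pass: " ^ name)
  | Some p when p.runonce && List.mem p.name proj.ran -> proj
  | Some p ->
    let proj = List.fold_left (fun proj dep -> run dep proj) proj p.deps in
    let proj = p.body proj in
    { proj with ran = p.name :: proj.ran }

(* Helper for the example: a pass body that appends its name to the log. *)
let note name proj = { proj with log = proj.log @ [name] }
```

For instance, after registering "a" with ~runonce:true and "b" with ~deps:["a"], running "b" twice on a fresh toy project leaves the log as ["a"; "b"; "b"].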

val register_pass' : ?autorun:bool -> ?runonce:bool -> ?deps:string list -> ?name:string -> (t -> unit) -> unit

register_pass' pass registers a pass that doesn't modify the project and is run only for its side effects (see register_pass).

val passes : unit -> pass list

passes () returns all currently registered passes.

val find_pass : string -> pass option

find_pass name returns a pass with the given name.

type second = float

time duration in seconds

module Pass : sig ... end

A program analysis pass.

module Collator : sig ... end

A pass that collates projects.

module Analysis : sig ... end

Knowledge base analyses.