Module Std.Disasm

The interface to the disassembler level.

The following definitions are used in the documentation of modules and functions in this interface.

An instruction is a sequence of consecutive bytes that has a known decoding in the given instruction set architecture (ISA). The definitions below rely on several semantic properties of an instruction, as provided by the ISA specification.

An instruction address is the address of the first byte of the instruction.

A jump instruction destination is an address, defined by the ISA specification, to which control flow transfers if the jump is taken. The destination of a jump instruction may happen to be the instruction that follows it; otherwise, the instruction that follows a jump is not one of its destinations. Only the destinations of the taken jump are considered to be in the set of destinations of an instruction.

An instruction is a conditional jump if it is a jump instruction that is not always taken, as defined by the ISA specification.

An instruction is a barrier if it is a jump that is not a call and is not conditional.

An execution order is an order in which the CPU executes instructions.
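The classification above can be sketched with a toy model. This is not the library's API: the record type and its fields are hypothetical stand-ins for properties that an ISA specification would supply.

```ocaml
(* Toy model, not the library API: the fields are hypothetical
   stand-ins for properties supplied by an ISA specification. *)
type insn = {
  name : string;
  is_jump : bool;         (* may transfer control to a destination *)
  is_conditional : bool;  (* a jump that is not always taken *)
  is_call : bool;         (* a jump that is a call *)
}

(* A barrier is a jump that is neither a call nor conditional. *)
let is_barrier i = i.is_jump && not i.is_call && not i.is_conditional

let jmp  = { name = "jmp";  is_jump = true;  is_conditional = false; is_call = false }
let je   = { name = "je";   is_jump = true;  is_conditional = true;  is_call = false }
let call = { name = "call"; is_jump = true;  is_conditional = false; is_call = true }
let add  = { name = "add";  is_jump = false; is_conditional = false; is_call = false }

let () =
  List.iter
    (fun i -> Printf.printf "%-4s barrier=%b\n" i.name (is_barrier i))
    [jmp; je; call; add]
```

In this model only the unconditional, non-call jump is a barrier; control can still reach the instruction after a conditional jump or a call.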

The linear order of a sequence of instructions is the ascending order of their addresses.

An instruction is delayed by m > 0 instructions if it takes effect not immediately but after m other instructions are executed.

An instruction i(k) follows the instruction i(j) if i(j) is not a barrier and either the address of i(k) is the successor of the address of the last byte of i(j), or either i(k+m) or i(k) is an instruction that is delayed by m > 0 instructions.

A chain of instructions is a sequence of instructions {i(0); ...; i(k); i(k+1); ...; i(n)} such that i(k+1) is either a resolved destination of i(k) or follows it. An instruction can belong to more than one chain.

A valid chain of instructions is a chain where the last instruction is a jump instruction that either is indirect or has destinations that belong to some previous jump in the same chain.
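As an illustration of the linear order and of the "follows" relation, here is a minimal sketch that deliberately ignores delayed instructions. The record type is hypothetical and only models an address, a size in bytes, and a barrier flag.

```ocaml
(* Toy model, not the library API. Delay slots are ignored. *)
type insn = { addr : int; size : int; barrier : bool }

(* Address of the last byte of an instruction. *)
let last_byte i = i.addr + i.size - 1

(* [follows k j] holds if [j] is not a barrier and [k] starts at the
   successor of the address of the last byte of [j]. *)
let follows k j = (not j.barrier) && k.addr = last_byte j + 1

(* Linear order: ascending order of instruction addresses. *)
let linear_order insns = List.sort (fun a b -> compare a.addr b.addr) insns

let i1  = { addr = 0x1000; size = 2; barrier = false }
let i2  = { addr = 0x1002; size = 3; barrier = false }
let jmp = { addr = 0x1005; size = 2; barrier = true }
let i4  = { addr = 0x1007; size = 1; barrier = false }

let () =
  Printf.printf "i2 follows i1:   %b\n" (follows i2 i1);
  Printf.printf "i4 follows jmp:  %b\n" (follows i4 jmp);
  List.iter (fun i -> Printf.printf "0x%x\n" i.addr)
    (linear_order [jmp; i1; i4; i2])
```

Note that i4 is adjacent to jmp in the linear order but does not follow it, because jmp is a barrier.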

An instruction is valid if it belongs to a valid chain of instructions.

A byte is data if one of the following is true: 1) its address is an address of an instruction that is not valid; 2) it was classified in the knowledge base as data; 3) it is not an instruction.

A basic block is a non-empty instruction chain {i(1); ...; i(n)} such that for each 1 < i <= n,

A subroutine is a non-empty finite set of basic blocks {b(1); ..; b(n)} such that b(1) dominates each block in {b(2); ..; b(n)} (which also implies that they are reachable) and b(1) is called the entry block (or point).
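The dominance condition can be checked on a small graph. The sketch below is an assumption-laden toy, not the library's algorithm: blocks are integers, the graph is an array of successor lists, and dominance is taken relative to a hypothetical program start node `root`. A block d dominates v if v is reachable from the root and becomes unreachable once d is removed.

```ocaml
(* Toy model, not the library API. [succs.(v)] lists the successors of
   block [v]. Returns the blocks reachable from [from] when the block
   [without] is removed from the graph (use -1 to remove nothing). *)
let reachable succs ~from ~without =
  let seen = Array.make (Array.length succs) false in
  let rec visit v =
    if v <> without && not seen.(v) then begin
      seen.(v) <- true;
      List.iter visit succs.(v)
    end
  in
  visit from;
  seen

(* [dominates succs ~root d v]: [v] is reachable from [root] and every
   path from [root] to [v] passes through [d]. *)
let dominates succs ~root d v =
  let all = reachable succs ~from:root ~without:(-1) in
  let cut = reachable succs ~from:root ~without:d in
  all.(v) && (v = d || not cut.(v))

(* A set of blocks forms a subroutine with the given entry if the
   entry dominates every block in the set. *)
let is_subroutine succs ~root ~entry blocks =
  blocks <> [] && List.for_all (dominates succs ~root entry) blocks

(* 0 is the program start; 0 -> 1, 0 -> 4, 1 -> 2, 1 -> 3, 2 -> 3, 4 -> 5. *)
let g = [| [1; 4]; [2; 3]; [3]; []; [5]; [] |]

let () =
  Printf.printf "%b\n" (is_subroutine g ~root:0 ~entry:1 [1; 2; 3]);
  Printf.printf "%b\n" (is_subroutine g ~root:0 ~entry:1 [1; 2; 3; 5])
```

The second set is rejected because block 5 can be reached through block 4 without passing through the candidate entry block 1.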

module Driver : sig ... end

Disassembler Driver.

module Subroutines : sig ... end

A set of subroutines.

type t = disasm
val create : cfg -> t

create cfg

val of_mem : ?backend:string -> ?brancher:brancher -> ?rooter:rooter -> arch -> mem -> t Core_kernel.Or_error.t

of_mem arch mem disassembles the provided memory region mem using the best available algorithm and backend for the specified arch. Roots, if provided (via rooter), should point to memory regions that are believed to contain code. Ideally, this should be a list of function starts. If no roots are provided, then the starting address of the provided memory mem will be used as a root.

The returned value will contain all memory reachable from the given set of roots, to the best of our knowledge.

val of_image : ?backend:string -> ?brancher:brancher -> ?rooter:rooter -> image -> t Core_kernel.Or_error.t

of_image image disassembles the given image. It takes the executable segments of the image and disassembles them by applying of_mem. If no roots are specified, then the symbol table will be used as a source of roots. If the file doesn't contain one, then the entry point will be used.

val of_file : ?backend:string -> ?brancher:brancher -> ?rooter:rooter -> ?loader:string -> string -> t Core_kernel.Or_error.t

of_file path takes a path to a binary and disassembles it.

module With_exn : sig ... end

With_exn.f is the same as f except that it raises an exception instead of returning Error.
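The relationship between a result-returning function and its With_exn counterpart can be sketched as follows. This is a generic illustration using the standard result type and a hypothetical function parse_addr; it does not reproduce the actual contents of the With_exn module or its error type.

```ocaml
(* Hypothetical function that reports failure through a result. *)
let parse_addr (s : string) : (int, string) result =
  match int_of_string_opt s with
  | Some n when n >= 0 -> Ok n
  | _ -> Error ("bad address: " ^ s)

module With_exn = struct
  (* Same function, but raises Failure instead of returning Error.
     The inner [parse_addr] refers to the outer definition because
     this [let] is not recursive. *)
  let parse_addr s =
    match parse_addr s with
    | Ok n -> n
    | Error msg -> failwith msg
end

let () =
  (match parse_addr "zz" with
   | Ok n -> Printf.printf "ok: %d\n" n
   | Error msg -> Printf.printf "error: %s\n" msg);
  Printf.printf "0x%x\n" (With_exn.parse_addr "0x400000")
```

The result-returning form suits callers that want to handle failure explicitly, while the raising form is convenient in scripts where any failure should abort.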

val merge : t -> t -> t

merge d1 d2 is the union of the control flow graphs and errors of the two disassemblers.

val insns : t -> (mem * insn) seq

Returns all instructions that were successfully decoded, in ascending order of their addresses. Each instruction is accompanied by its block of memory.

val cfg : t -> cfg

A whole program CFG.


val insn : insn tag

machine instruction.