Module Disasm_expert.Basic

Basic disassembler.

This is a target-agnostic, basic, low-level machine code disassembler.

type pred = [
  | `Valid  (* stop on first valid insn *)
  | Kind.t
]

predicate to drive the disassembler

val sexp_of_pred : pred -> Ppx_sexp_conv_lib.Sexp.t
val pred_of_sexp : Ppx_sexp_conv_lib.Sexp.t -> pred
val __pred_of_sexp__ : Ppx_sexp_conv_lib.Sexp.t -> pred

Basic types

type (+'a, +'k) insn

insn basic instruction.

See Insn module for a more detailed description.

@typevar 'a = asm | empty, denotes whether assembly representation is available for the given instruction.

@typevar 'k = kinds | empty, denotes whether semantic kinds are available for the given instruction.

type (+'a, +'k) insns = (mem * ('a, 'k) insn option) list

insns is a list of pairs, where each pair consists of a memory region occupied by an instruction, and the instruction itself.

type empty

witnesses the absence of the information

type asm

witnesses the presence of the assembly string

type kinds

witnesses the presence of the semantic kinds

type full_insn = (asm, kinds) insn

abbreviate an instruction with full information.

val compare_full_insn : full_insn -> full_insn -> int
val sexp_of_full_insn : full_insn -> Ppx_sexp_conv_lib.Sexp.t

type ('a, 'k) t

Disassembler.

The 'a and 'k type variables specify the disassembler's modes of operation. During disassembly it can store extra information that might be useful. Since storing this information takes extra time and space, it is disabled by default.

The first type variable specifies whether storing assembly strings is enabled. It can be switched using the store_asm and drop_asm functions. When it is enabled, this type variable is set to asm, which gives access to the functions that return this information. Otherwise, it is set to empty, which prevents you from accessing assembler information.

The second type variable stands for kinds, i.e., whether or not to store extra information about the instruction kind.

Note: at some points you can have access to this information even if you don't enable it explicitly.
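
The mode tracking described above can be illustrated with phantom types. The following is a minimal, self-contained sketch in plain OCaml, not the library's implementation: the Toy module, its asm_string accessor and the single type parameter are hypothetical and only model how a witness type can gate access to a function.

```ocaml
(* Toy model of mode tracking with phantom types; not BAP's implementation. *)
module Toy : sig
  type empty                       (* witnesses the absence of information *)
  type asm                         (* witnesses the presence of assembly   *)
  type 'a t                        (* 'a is a phantom mode parameter       *)
  val create : string -> empty t
  val store_asm : _ t -> asm t
  val asm_string : asm t -> string (* only callable in the asm mode *)
end = struct
  type empty
  type asm
  type 'a t = { text : string }
  let create text = { text }
  let store_asm d = { text = d.text }
  let asm_string d = d.text
end

let d = Toy.store_asm (Toy.create "nop")
let s = Toy.asm_string d
(* Toy.asm_string (Toy.create "nop") is rejected by the type checker,
   because empty Toy.t is not asm Toy.t. *)
```

The mode is checked entirely at compile time, so the toy accessor needs no runtime test for whether the information was stored.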

type (+'a, +'k, 's, 'r) state

Disassembler state.

A word of caution: this state is valid only inside the handler functions of the run function. It shouldn't be stored anywhere. The first two type variables are bound to the corresponding two variables of the disassembler ('a,'k) t type. The last pair of type variables is bound to the input and output types of user functions. They are made different so that a function can be run in an arbitrary monad. For simple cases, they can be made the same.

val register : Bap_core_theory.Theory.language -> (Bap_core_theory.Theory.target -> (empty, empty) t Core_kernel.Or_error.t) -> unit

register encoding constructor registers a disassembler constructor for the given encoding.

The constructor receives the target value that further specifies the details of the target system, e.g., a CPU model, limitations on the instruction set, etc.

The constructor commonly uses create and passes the backend and target specific options to it. It can also use the custom function to create its own backend. Alternatively, the lookup function could be used to delegate the decoding to another encoder.

val lookup : Bap_core_theory.Theory.target -> Bap_core_theory.Theory.language -> (empty, empty) t Core_kernel.Or_error.t

lookup target encoding returns the disassembler for the specified target and encoding, creates one if necessary.

Returns an error if there is no constructor for the given encoding registered (via the register function) or if the constructor itself fails to create a disassembler.
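
The interplay between register and lookup can be pictured as a registry of constructors with on-demand creation. The sketch below is a toy model in plain OCaml under stated assumptions: targets and encodings are plain strings, the disasm record is hypothetical, and caching created instances is only one reading of "creates one if necessary". It is not how the library is implemented.

```ocaml
(* Toy registry; all names and types here are hypothetical. *)
type disasm = { target : string; encoding : string }

(* encoding name -> constructor that may fail *)
let constructors : (string, string -> (disasm, string) result) Hashtbl.t =
  Hashtbl.create 8

(* (target, encoding) -> already created disassembler *)
let instances : (string * string, disasm) Hashtbl.t = Hashtbl.create 8

let register encoding constructor =
  Hashtbl.replace constructors encoding constructor

let lookup target encoding =
  match Hashtbl.find_opt instances (target, encoding) with
  | Some d -> Ok d                                   (* reuse *)
  | None ->
    match Hashtbl.find_opt constructors encoding with
    | None -> Error ("no constructor registered for " ^ encoding)
    | Some make ->
      match make target with
      | Error _ as e -> e                            (* constructor failed *)
      | Ok d -> Hashtbl.add instances (target, encoding) d; Ok d

let () =
  register "toy-x86" (fun target ->
      if target = "" then Error "empty target"
      else Ok { target; encoding = "toy-x86" })

let ok = lookup "amd64" "toy-x86"
let missing = lookup "amd64" "toy-arm"
let failed = lookup "" "toy-x86"
```

The two error results correspond to the two failure cases named above: no registered constructor, and a constructor that itself fails.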

val create : ?debug_level:int -> ?cpu:string -> ?attrs:string -> ?backend:string -> string -> (empty, empty) t Core_kernel.Or_error.t

create ?debug_level ?cpu ?attrs ?backend target creates the disassembler from one of the C-level backends.

The parameters are backend-specific and are commonly set by the target support plugins via the register function, therefore the create function should only be used to register a new target. Use lookup to get an appropriate disassembler for your target/encoding.

  • since 2.2.0 has the attrs parameter

val custom : Bap_core_theory.Theory.target -> Bap_core_theory.Theory.language -> (module Backend.S with type t = 'a) -> 'a -> (empty, empty) t

custom target encoding backend disassembler creates a custom backend for the given target and encoding.

This function is commonly called by the constructor function registered with the register function.

val with_disasm : ?debug_level:int -> ?cpu:string -> ?backend:string -> string -> f:((empty, empty) t -> 'a Core_kernel.Or_error.t) -> 'a Core_kernel.Or_error.t

with_disasm ?debug_level ?cpu ?backend ~f target creates a disassembler, passing all options to the create function, and applies the function f to it. Once f is evaluated, the disassembler is closed with the close function.
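
The create, apply, close sequence is the usual resource bracket. Here is a minimal sketch of that pattern in plain OCaml. The handle type and the with_handle function are hypothetical stand-ins, and closing on exceptions via Fun.protect is an assumption of this sketch, not a documented property of with_disasm.

```ocaml
(* Toy bracket: open a resource, apply f, then close. Names are hypothetical. *)
type handle = { name : string; mutable closed : bool }

let create name =
  if name = "" then Error "unknown target"
  else Ok { name; closed = false }

let close h = h.closed <- true

let with_handle name ~f =
  match create name with
  | Error msg -> Error msg      (* f is never called if creation fails *)
  | Ok h ->
    (* close runs whether f returns normally or raises *)
    Fun.protect ~finally:(fun () -> close h) (fun () -> f h)

let seen = ref None

let result =
  with_handle "toy" ~f:(fun h ->
      seen := Some h;
      Ok (String.length h.name))
```

After the call, the handle captured in seen is closed, which is why a value of this kind should not escape the function passed as f.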

val close : (_, _) t -> unit

close d closes a disassembler d.

val store_asm : (_, 'k) t -> (asm, 'k) t

enables storing assembler information

val store_kinds : ('a, _) t -> ('a, kinds) t

enables storing instruction kinds information

val run : ?backlog:int -> ?stop_on:pred list -> ?invalid:(('a, 'k, 's, 'r) state -> mem -> 's -> 'r) -> ?stopped:(('a, 'k, 's, 'r) state -> 's -> 'r) -> ?hit:(('a, 'k, 's, 'r) state -> mem -> (asm, kinds) insn -> 's -> 'r) -> ('a, 'k) t -> return:('s -> 'r) -> init:'s -> mem -> 'r

run ?backlog ?stop_on ?invalid ?stopped ?hit dis ~return ~init mem performs the recursive disassembly of the specified chunk of memory mem. The process of disassembly can be driven using the stop, step, back and jump functions, described later.

  • parameter backlog

    defines the size of the history of states that can be used for backtracking. Defaults to some positive natural number.

  • parameter stop_on

    defines a set of predicates that is checked on each step to decide whether the disassembler should stop and call the user-provided hit function, or continue. The decision is made according to the rule: if exists stop_on then stop, i.e., if there exists a predicate in the set that evaluates to true, then stop the disassembly and pass control to the user function hit. A few notes: only valid instructions can match predicates, and if the set is empty, then it always evaluates to false.

  • parameter init

    initial value of user data, that can be passed through handlers (cf., fold)

  • parameter return

    a function that lifts the user data type 's to the type 'r. It is useful when you need to perform disassembly in some monad, like Or_error or Lwt. Otherwise, just use the Fn.id function and assume that 's == 'r.

    The disassembler will invoke user-provided callbacks. At least two parameters are passed to each callback: state and user_data. user_data is arbitrary data of type 's with which the folding over the memory is actually performed. state encapsulates the current state of the disassembler and provides continuation functions, namely stop, step and back, that drive the process of disassembly. These functions are used to pass control back to the disassembler.

    stopped state user_data is called when there is no more data to disassemble. This handler is optional and defaults to stop.

    invalid state mem user_data is an optional handler that is called on each invalid instruction (i.e., a portion of data that is not a valid instruction). It defaults to step, i.e., to skipping.

    hit state mem insn data is called when one of the predicates specified by the user is hit. insn is the instruction that satisfies the predicate. mem is the memory region spanned by the instruction. data is the user data. insn can be queried for the assembly string and kinds even if the corresponding modes are disabled.
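
The handler-driven fold described above can be sketched with a toy model in plain OCaml. Everything here is an assumption made for illustration: the "memory" is a list of integers, each integer is one "instruction", negative numbers count as invalid, and predicates are plain functions. Only the control flow (the stop_on rule, the hit handler, and the stop and step continuations) mirrors the description. It is not the library's implementation.

```ocaml
(* Toy model of a handler-driven fold; not BAP's implementation. *)
type ('s, 'r) state = {
  rest : int list;                        (* input not yet decoded *)
  stop_on : (int -> bool) list;           (* predicates over valid items *)
  return : 's -> 'r;                      (* lifts user data to the result *)
  hit : ('s, 'r) state -> int -> 's -> 'r;
}

let valid x = x >= 0                      (* hypothetical validity rule *)

(* stop: end the fold and lift the user data *)
let stop st data = st.return data

(* step: continue; call hit when some predicate matches a valid item *)
let rec step st data =
  match st.rest with
  | [] -> stop st data                    (* no more data: behave like stop *)
  | x :: rest ->
    let st = { st with rest } in
    if valid x && List.exists (fun p -> p x) st.stop_on
    then st.hit st x data                 (* pass control to the user *)
    else step st data                     (* no match or invalid: skip *)

let run ~stop_on ~hit ~return ~init input =
  step { rest = input; stop_on; return; hit } init

(* Collect every even item; 's and 'r are both int list. *)
let evens =
  run ~stop_on:[ (fun x -> x mod 2 = 0) ] ~init:[] ~return:List.rev
    ~hit:(fun st x acc -> step st (x :: acc))
    [ 3; 4; -1; 7; 10; 5 ]

(* Stop at the first item greater than 8. *)
let first_big =
  run ~stop_on:[ (fun x -> x > 8) ] ~init:None ~return:(fun r -> r)
    ~hit:(fun st x _ -> stop st (Some x))
    [ 3; 4; -1; 7; 10; 5; 12 ]

(* An empty predicate set never matches, so hit is never called. *)
let never =
  run ~stop_on:[] ~init:0 ~return:(fun n -> n)
    ~hit:(fun st _ n -> step st (n + 1))
    [ 1; 2; 3 ]
```

In this toy, a handler always finishes by calling a continuation, which is what lets the same fold either accumulate over the whole input or exit early.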

val insn_of_mem : (_, _) t -> mem -> (mem * (asm, kinds) insn option * [ `left of mem | `finished ]) Core_kernel.Or_error.t

insn_of_mem dis mem disassembles one instruction from the given memory region mem. Returns a tuple (imem, insn, rest), where imem is the piece of memory consumed in the process of disassembly, and insn is Some ins if disassembly was successful and None otherwise. rest is either `left over, where over complements imem to the original mem, or `finished.
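
The shape of the returned triple can be shown with a toy decoder in plain OCaml. The sketch assumes a made-up encoding over strings (byte 0x90 is a one-byte "nop", byte 0xb8 followed by four bytes is a "mov", anything else is invalid and consumes one byte) and assumes that empty input is an error. None of these rules come from the library.

```ocaml
(* Toy single-instruction decoder over strings; the encoding is made up. *)
type rest = [ `left of string | `finished ]

let insn_of_string s =
  let n = String.length s in
  if n = 0 then Error "empty memory"       (* assumption of this sketch *)
  else
    let len, insn =
      match s.[0] with
      | '\x90' -> 1, Some "nop"
      | '\xb8' when n >= 5 -> 5, Some "mov"
      | _ -> 1, None                       (* invalid: consume one byte *)
    in
    let rest : rest =
      if len = n then `finished
      else `left (String.sub s len (n - len))
    in
    (* consumed part, decoded instruction if any, and what is left over *)
    Ok (String.sub s 0 len, insn, rest)

let a = insn_of_string "\x90\xb8\x01\x00\x00\x00"
let b = insn_of_string "\xff"
```

Concatenating the consumed part with the `left part gives back the original input, which is the sense in which the remainder complements the consumed memory.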

val addr : (_, _, _, _) state -> addr

current position of the disassembler

val preds : (_, _, _, _) state -> pred list

current set of predicates

val with_preds : ('a, 'k, 's, 'r) state -> pred list -> ('a, 'k, 's, 'r) state

updates the set of predicates that governs the stop condition.

val insns : ('a, 'k, _, _) state -> ('a, 'k) insns

a queue of instructions disassembled in this step

val last : ('a, 'k, 's, 'r) state -> int -> ('a, 'k) insns

last s n returns the last n instructions disassembled in this step. If there are fewer than n instructions, it returns a smaller list.
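
The "at most n" behaviour can be sketched on an ordinary list. This is a toy in plain OCaml. It assumes the instructions are kept in disassembly order and that the result preserves that order, which the description above does not state.

```ocaml
(* Toy version of "the last n elements, or fewer if the list is shorter". *)
let last xs n =
  (* drop k elements from the front; a non-positive k drops nothing *)
  let rec drop k = function
    | _ :: tl when k > 0 -> drop (k - 1) tl
    | l -> l
  in
  drop (List.length xs - n) xs

let two = last [ 1; 2; 3; 4 ] 2
let short = last [ 1 ] 3
```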

val memory : (_, _, _, _) state -> mem

the memory region we're currently working on

val stop : (_, _, 's, 'r) state -> 's -> 'r

stop the disassembly and return the provided value.

val step : (_, _, 's, 'r) state -> 's -> 'r

continue disassembling from the current point. You can change the set of predicates before stepping. If you want to continue from a different address, use jump.

val jump : (_, _, 's, 'r) state -> mem -> 's -> 'r

jump to the specified memory and continue disassembly in it.

For example, if you want to jump to a specified address, and you're working in an Or_error monad, then you can write:

Memory.view ~from:addr (memory state) >>= fun mem -> jump state mem data

val back : (_, _, 's, 'r) state -> 's -> 'r

restarts the last step.

module Insn : sig ... end

Basic instruction aka machine-specific instruction.

module Trie : sig ... end

Trie maps over instructions

val available_backends : unit -> string list

enumerates names of available disassembler backends.