During disassembly, BAP lifts native binary instructions to a language-agnostic, intermediate representation: the BAP intermediate Language (BIL). In this post we look specifically at traversing and transforming BIL using BAP’s API. Lifted BIL code is represented as an AST data structure that can be traversed and transformed for the purposes of analysis.
BAP provides a plethora of method hooks for traversing BIL ASTs according to the visitor design pattern. OCaml’s object-oriented features allow us to realize these visitor patterns in an elegant and powerful way; however, for the unfamiliar, usage tends to be harder to grasp. This post serves to provide self-contained, explanatory examples that eases the introduction to the BIL visitor and mapper capabilities.
A full template is provided for each example at the end of this post–it may be
used with the same example binary from previous posts.
Visitors
A simple visitor
How do I use a BIL visitor to print BIL statements?
The following snippet accepts a list of bil_stmts and simply visits each
statement in the list, printing it.
let visit_each_stmt bil_stmts =
  (object inherit [unit] Bil.visitor
    method! enter_stmt stmt state =
      Format.printf "Visiting %s\n" (Stmt.to_string stmt)
    end)#run bil_stmts ()
Output:
Visiting t_113 := RBP
Visiting RSP := RSP - 0x8:64
Visiting mem64 := mem64 with [RSP, el]:u64 <- t_113
...
Notes:
- We inherit the Bil.visitorclass, which defines and provides our visitor callback hooks.
- We make use of the enter_stmtcallback. There are many such callbacks, for each language construct for BIL.
- [unit]indicates the type of the state that we are passing along in our visitor; here, every time we enter a statement. This corresponds with the variable- statefor- enter_stmtwhich we override.
- The #runinvocation operates over astmt listby default.
- We pass unit ()as the initial state.
- The return type of enter_stmt is that of our state: unit.
A visitor with state
How do I collect all the the jump (direct) targets in a list of BIL statements?
let collect_jumps bil_stmts =
  (object inherit [Word.t list] Bil.visitor
    method! enter_int x state = if in_jmp then x :: state else state
  end)#run bil_stmts []
Output (if we print the result):
Jmp: 0x40056E:64
Notes:
- This time, the visitor uses a Word.t listas user-supplied state which stores jump targets.
- Our callback triggers every time we enter an int; essentially, a constant.
- We determine that this constant is a jump target with the in_jmppredicate: this state is implicitly included with each visit. See the class state in the API for other information passed along visits.
- Instead of in_jmp, we could of course have used a different hook: the provided enter_jmp callback.
Mappers
A simple mapper
How do I use a BIL mapper to transform BIL code?
Our previous visitor inherited the BIL class 'a visitor, where 'a was our
inherited user-supplied state. But there’s also class mapper. class mapper
doesn’t carry any user-supplied state with it. With mapper, you can transform
the BIL statements and expressions in the AST.
Let’s transform binary operations with some constant offset to an offset of 0x41. For instance:
RSP := RSP - 0x8:64
becomes
RSP := RSP - 0x41:64
Specifically, if we encounter the binary operator + or -, and its second
operand is a constant, we rewrite the constant to be 0x41.
let offset_41_mapper bil_stmts =
  (object inherit Bil.mapper
    method! map_binop operator operand1 operand2 =
      match operator,operand2 with
      | Bil.PLUS,Bil.Int offset
      | Bil.MINUS,Bil.Int offset ->
        let new_operand2 = Bil.int (Word.of_int ~width:64 0x41) in
        Bil.binop operator operand1 new_operand2
      | _ -> Bil.binop operator operand1 operand2
  end)#run bil_stmts in
Output:
t_113 := RBP
RSP := RSP - 0x41:64
mem64 := mem64 with [RSP, el]:u64 <- t_113
Notes:
- We inherit Bil.mapper.
- We use map_binopinstead ofenter_binop.
- We pattern-match against the BIL operators PLUSandMINUS, and pattern match the second operand againstBil.Int: if it matches, we rewrite the second operand and construct a newBil.binopexpression.
- If we reach the _case for pattern matching, nothing changes: we reconstruct the original expression using the same operator and operands.
- No user-state is passed a long. The return type for each expression is exp.
Customization
A custom visitor
What is a custom visitor and how can I make one?
We can create our own subclassing visitor, i.e., we don’t have to use class ‘a visitor or class mapper. For instance, we can pass our own implicit state a long with a custom visitor (and still allow anyone else to define a user-supplied state variable). Here’s some quick syntax for defining your own visitor:
class ['a] custom_visitor = object
   inherit ['a * int list] Bil.visitor
end
Let’s define a custom_visit function:
let custom_visit bil_stmts =
  (object inherit [string] custom_visitor
    method! enter_stmt stmt state =
      Format.printf "Visiting %s\n" (Stmt.to_string stmt);
      ("still-user-defined",[3;2;1])
  end)#run bil_stmts ("user-defined",[1;2;3])
Output:
Visiting t_113 := RBP
Visiting RSP := RSP - 0x8:64
Visiting mem64 := mem64 with [RSP, el]:u64 <- t_113
Notes:
- Our visitor inherits only the type of our user-defined state: a string.
- However, the inherited statevariable inenter_stmthas typestring * int list: the type defined in ourcustom_visitorclass.
- The int listis passed along any visitor we create usingcustom_vistor. This is useful if theint liststate is changed by another function as we fold over BIL (for instance, for tracking depth in the AST, we might create adepth_visitorthat maintains a depth of the current traversal, without the user having to define their own variable for doing so).
Wrap-up
This post highlighted some basics of BIL visitor and mapper functionality, but there is a lot more to discover. For example, here are further extensions that are possible within the visitor framework:
- Our examples used only a single callback method; of course, we can have multiple visit methods inside our visitor object.
- Our examples have all invoked the traversal with #run. However, we can in fact visit any particular part of the BIL AST by replacing#runin previous examples with #enter_stmt, #enter_exp, #enter_binop, and so on: the only condition is that we supply these visits with the correct type. So, where#runaccepts astmt list,enter_expexpects anexp.
- Our examples made use of enter_...visitors. However, every language construct also has additionalvisit_...andleave_...directives, depending on the need.
- There are a host of BIL
iterators
that can be used in all sorts of imaginative ways: We can iterate, map, fold
(and many more!) over BIL statements. For example, we can supply Bil.foldwith a visitor object which is run over the AST with our owninitstate.
- Many interesting BIL transformers exist, for example, a constant folder and expression substituter.
- Have a look at the finder if you want a BIL visitor that searches for specific patterns in the AST.
Examples template
Visitor and mapper examples template
#use "topfind";;
#require "bap.top";;
open Core_kernel.Std
open Bap.Std
open Or_error
let main () =
  Project.from_file Sys.argv.(1) >>= fun project ->
  let normalize = String.filter ~f:(function
      | '\t' | '{' | '}' -> false
      | _ -> true) in
  let syms = Project.symbols project in
  let main_fn = match Symtab.find_by_name syms "main" with
    | Some fn -> fn
    | None -> failwith "Could not find function main in symbol table"
  in
  let entry_block = Symtab.entry_of_fn main_fn in
  let block_insns = Block.insns entry_block in
  (* visit_each_stmt *)
  let visit_each_stmt bil_stmts =
    (object inherit [unit] Bil.visitor
      method! enter_stmt stmt state =
        Format.printf "Visiting %s\n" (Stmt.to_string stmt)
    end)#run bil_stmts ()
  in
  List.iter block_insns ~f:(fun (_,insn) ->
      let bil = Insn.bil insn in
      visit_each_stmt bil);
  (* collect_jumps *)
  let collect_jumps bil_stmts =
    (object inherit [Word.t list] Bil.visitor
      method! enter_int x state = if in_jmp then x :: state else state
    end)#run bil_stmts []
  in
  List.iter block_insns ~f:(fun (_,insn) ->
      let bil = Insn.bil insn in
      collect_jumps bil |> List.iter
        ~f:(fun word -> Format.printf "Jmp: %a\n" Word.pp word));
  (* offset_41_mapper *)
  let offset_41_mapper bil_stmts =
    (object inherit Bil.mapper
      method! map_binop operator operand1 operand2 =
        match operator,operand2 with
        | Bil.PLUS,Bil.Int offset
        | Bil.MINUS,Bil.Int offset ->
          let new_operand2 = Bil.int (Word.of_int ~width:64 0x41) in
          Bil.binop operator operand1 new_operand2
        | _ -> Bil.binop operator operand1 operand2
    end)#run bil_stmts in
  List.iter block_insns ~f:(fun (_,insn) ->
      let bil = Insn.bil insn in
      let new_bil =
        offset_41_mapper bil in
      Format.printf "41-Bil: %s\n" (Bil.to_string new_bil |> normalize));
  return ()
  let () =
    try main ()
        |> function
        | Ok o -> ()
        | Error e -> Format.printf "BAP error: %s\n" @@ Error.to_string_hum e
    with
    | Invalid_argument _ ->
      Format.printf "Please specify a file on the command line\n"
Custom visitor template
#use "topfind";;
#require "bap.top";;
open Core_kernel.Std
open Bap.Std
open Or_error
(* custom_visitor *)
class ['a] custom_visitor = object
   inherit ['a * int list] Bil.visitor
end
let main () =
  Project.from_file Sys.argv.(1) >>= fun project ->
  (* custom_visit *)
  let custom_visit bil_stmts =
    (object inherit [string] custom_visitor
      method! enter_stmt stmt state =
        Format.printf "Visiting %s\n" (Stmt.to_string stmt);
        ("still-user-defined",[3;2;1])
    end)#run bil_stmts ("user-defined",[1;2;3])
  in
  let syms = Project.symbols project in
  let main_fn = match Symtab.find_by_name syms "main" with
    | Some fn -> fn
    | None -> failwith "Could not find function main in symbol table"
  in
  let entry_block = Symtab.entry_of_fn main_fn in
  let block_insns = Block.insns entry_block in
  List.iter block_insns ~f:(fun (mem,insn) ->
      let bil = Insn.bil insn in
      custom_visit bil |> Pervasives.ignore);
  return ()
  let () =
    try main ()
        |> function
        | Ok o -> ()
        | Error e -> Format.printf "BAP error: %s\n" @@ Error.to_string_hum e
    with
    | Invalid_argument _ ->
      Format.printf "Please specify a file on the command line\n"