During disassembly, BAP lifts native binary instructions to a language-agnostic, intermediate representation: the BAP intermediate Language (BIL). In this post we look specifically at traversing and transforming BIL using BAP’s API. Lifted BIL code is represented as an AST data structure that can be traversed and transformed for the purposes of analysis.
BAP provides a plethora of method hooks for traversing BIL ASTs according to the visitor design pattern. OCaml’s object-oriented features allow us to realize these visitor patterns in an elegant and powerful way; however, for the unfamiliar, usage tends to be harder to grasp. This post serves to provide self-contained, explanatory examples that eases the introduction to the BIL visitor and mapper capabilities.
A full template is provided for each example at the end of this post–it may be
used with the same example
binary from previous posts.
Visitors
A simple visitor
How do I use a BIL visitor to print BIL statements?
The following snippet accepts a list of bil_stmts
and simply visits each
statement in the list, printing it.
let visit_each_stmt bil_stmts =
(object inherit [unit] Bil.visitor
method! enter_stmt stmt state =
Format.printf "Visiting %s\n" (Stmt.to_string stmt)
end)#run bil_stmts ()
Output:
Visiting t_113 := RBP
Visiting RSP := RSP - 0x8:64
Visiting mem64 := mem64 with [RSP, el]:u64 <- t_113
...
Notes:
- We inherit the
Bil.visitor
class, which defines and provides our visitor callback hooks. - We make use of the
enter_stmt
callback. There are many such callbacks, for each language construct for BIL. [unit]
indicates the type of the state that we are passing along in our visitor; here, every time we enter a statement. This corresponds with the variablestate
forenter_stmt
which we override.- The
#run
invocation operates over astmt list
by default. - We pass unit
()
as the initial state. - The return type of enter_stmt is that of our state:
unit
.
A visitor with state
How do I collect all the the jump (direct) targets in a list of BIL statements?
let collect_jumps bil_stmts =
(object inherit [Word.t list] Bil.visitor
method! enter_int x state = if in_jmp then x :: state else state
end)#run bil_stmts []
Output (if we print the result):
Jmp: 0x40056E:64
Notes:
- This time, the visitor uses a
Word.t list
as user-supplied state which stores jump targets. - Our callback triggers every time we enter an int; essentially, a constant.
- We determine that this constant is a jump target with the
in_jmp
predicate: this state is implicitly included with each visit. See the class state in the API for other information passed along visits. - Instead of
in_jmp
, we could of course have used a different hook: the provided enter_jmp callback.
Mappers
A simple mapper
How do I use a BIL mapper to transform BIL code?
Our previous visitor inherited the BIL class 'a visitor
, where 'a
was our
inherited user-supplied state. But there’s also class mapper
. class mapper
doesn’t carry any user-supplied state with it. With mapper, you can transform
the BIL statements and expressions in the AST.
Let’s transform binary operations with some constant offset to an offset of 0x41
. For instance:
RSP := RSP - 0x8:64
becomes
RSP := RSP - 0x41:64
Specifically, if we encounter the binary operator +
or -
, and its second
operand is a constant, we rewrite the constant to be 0x41
.
let offset_41_mapper bil_stmts =
(object inherit Bil.mapper
method! map_binop operator operand1 operand2 =
match operator,operand2 with
| Bil.PLUS,Bil.Int offset
| Bil.MINUS,Bil.Int offset ->
let new_operand2 = Bil.int (Word.of_int ~width:64 0x41) in
Bil.binop operator operand1 new_operand2
| _ -> Bil.binop operator operand1 operand2
end)#run bil_stmts in
Output:
t_113 := RBP
RSP := RSP - 0x41:64
mem64 := mem64 with [RSP, el]:u64 <- t_113
Notes:
- We inherit
Bil.mapper
. - We use
map_binop
instead ofenter_binop
. - We pattern-match against the BIL operators
PLUS
andMINUS
, and pattern match the second operand againstBil.Int
: if it matches, we rewrite the second operand and construct a newBil.binop
expression. - If we reach the
_
case for pattern matching, nothing changes: we reconstruct the original expression using the same operator and operands. - No user-state is passed a long. The return type for each expression is
exp
.
Customization
A custom visitor
What is a custom visitor and how can I make one?
We can create our own subclassing visitor, i.e., we don’t have to use class ‘a visitor or class mapper. For instance, we can pass our own implicit state a long with a custom visitor (and still allow anyone else to define a user-supplied state variable). Here’s some quick syntax for defining your own visitor:
class ['a] custom_visitor = object
inherit ['a * int list] Bil.visitor
end
Let’s define a custom_visit
function:
let custom_visit bil_stmts =
(object inherit [string] custom_visitor
method! enter_stmt stmt state =
Format.printf "Visiting %s\n" (Stmt.to_string stmt);
("still-user-defined",[3;2;1])
end)#run bil_stmts ("user-defined",[1;2;3])
Output:
Visiting t_113 := RBP
Visiting RSP := RSP - 0x8:64
Visiting mem64 := mem64 with [RSP, el]:u64 <- t_113
Notes:
- Our visitor inherits only the type of our user-defined state: a string.
- However, the inherited
state
variable inenter_stmt
has typestring * int list
: the type defined in ourcustom_visitor
class. - The
int list
is passed along any visitor we create usingcustom_vistor
. This is useful if theint list
state is changed by another function as we fold over BIL (for instance, for tracking depth in the AST, we might create adepth_visitor
that maintains a depth of the current traversal, without the user having to define their own variable for doing so).
Wrap-up
This post highlighted some basics of BIL visitor and mapper functionality, but there is a lot more to discover. For example, here are further extensions that are possible within the visitor framework:
- Our examples used only a single callback method; of course, we can have multiple visit methods inside our visitor object.
- Our examples have all invoked the traversal with
#run
. However, we can in fact visit any particular part of the BIL AST by replacing#run
in previous examples with #enter_stmt, #enter_exp, #enter_binop, and so on: the only condition is that we supply these visits with the correct type. So, where#run
accepts astmt list
,enter_exp
expects anexp
. - Our examples made use of
enter_...
visitors. However, every language construct also has additionalvisit_...
andleave_...
directives, depending on the need. - There are a host of BIL
iterators
that can be used in all sorts of imaginative ways: We can iterate, map, fold
(and many more!) over BIL statements. For example, we can supply
Bil.fold
with a visitor object which is run over the AST with our owninit
state. - Many interesting BIL transformers exist, for example, a constant folder and expression substituter.
- Have a look at the finder if you want a BIL visitor that searches for specific patterns in the AST.
Examples template
Visitor and mapper examples template
#use "topfind";;
#require "bap.top";;
open Core_kernel.Std
open Bap.Std
open Or_error
let main () =
Project.from_file Sys.argv.(1) >>= fun project ->
let normalize = String.filter ~f:(function
| '\t' | '{' | '}' -> false
| _ -> true) in
let syms = Project.symbols project in
let main_fn = match Symtab.find_by_name syms "main" with
| Some fn -> fn
| None -> failwith "Could not find function main in symbol table"
in
let entry_block = Symtab.entry_of_fn main_fn in
let block_insns = Block.insns entry_block in
(* visit_each_stmt *)
let visit_each_stmt bil_stmts =
(object inherit [unit] Bil.visitor
method! enter_stmt stmt state =
Format.printf "Visiting %s\n" (Stmt.to_string stmt)
end)#run bil_stmts ()
in
List.iter block_insns ~f:(fun (_,insn) ->
let bil = Insn.bil insn in
visit_each_stmt bil);
(* collect_jumps *)
let collect_jumps bil_stmts =
(object inherit [Word.t list] Bil.visitor
method! enter_int x state = if in_jmp then x :: state else state
end)#run bil_stmts []
in
List.iter block_insns ~f:(fun (_,insn) ->
let bil = Insn.bil insn in
collect_jumps bil |> List.iter
~f:(fun word -> Format.printf "Jmp: %a\n" Word.pp word));
(* offset_41_mapper *)
let offset_41_mapper bil_stmts =
(object inherit Bil.mapper
method! map_binop operator operand1 operand2 =
match operator,operand2 with
| Bil.PLUS,Bil.Int offset
| Bil.MINUS,Bil.Int offset ->
let new_operand2 = Bil.int (Word.of_int ~width:64 0x41) in
Bil.binop operator operand1 new_operand2
| _ -> Bil.binop operator operand1 operand2
end)#run bil_stmts in
List.iter block_insns ~f:(fun (_,insn) ->
let bil = Insn.bil insn in
let new_bil =
offset_41_mapper bil in
Format.printf "41-Bil: %s\n" (Bil.to_string new_bil |> normalize));
return ()
let () =
try main ()
|> function
| Ok o -> ()
| Error e -> Format.printf "BAP error: %s\n" @@ Error.to_string_hum e
with
| Invalid_argument _ ->
Format.printf "Please specify a file on the command line\n"
Custom visitor template
#use "topfind";;
#require "bap.top";;
open Core_kernel.Std
open Bap.Std
open Or_error
(* custom_visitor *)
class ['a] custom_visitor = object
inherit ['a * int list] Bil.visitor
end
let main () =
Project.from_file Sys.argv.(1) >>= fun project ->
(* custom_visit *)
let custom_visit bil_stmts =
(object inherit [string] custom_visitor
method! enter_stmt stmt state =
Format.printf "Visiting %s\n" (Stmt.to_string stmt);
("still-user-defined",[3;2;1])
end)#run bil_stmts ("user-defined",[1;2;3])
in
let syms = Project.symbols project in
let main_fn = match Symtab.find_by_name syms "main" with
| Some fn -> fn
| None -> failwith "Could not find function main in symbol table"
in
let entry_block = Symtab.entry_of_fn main_fn in
let block_insns = Block.insns entry_block in
List.iter block_insns ~f:(fun (mem,insn) ->
let bil = Insn.bil insn in
custom_visit bil |> Pervasives.ignore);
return ()
let () =
try main ()
|> function
| Ok o -> ()
| Error e -> Format.printf "BAP error: %s\n" @@ Error.to_string_hum e
with
| Invalid_argument _ ->
Format.printf "Please specify a file on the command line\n"