Bap_byteweight.Bytes
Default implementation that uses a memory chunk as the domain.
include V2.S
  with type key = Bap.Std.mem
   and type corpus = Bap.Std.mem
   and type token := Bap.Std.word
include V1.S with type key = Bap.Std.mem with type corpus = Bap.Std.mem
include Bin_prot.Binable.S with type t := t
val bin_size_t : t Bin_prot.Size.sizer
val bin_write_t : t Bin_prot.Write.writer
val bin_read_t : t Bin_prot.Read.reader
val __bin_read_t__ : (int -> t) Bin_prot.Read.reader
val bin_writer_t : t Bin_prot.Type_class.writer
val bin_reader_t : t Bin_prot.Type_class.reader
val bin_t : t Bin_prot.Type_class.t
type key = Bap.Std.mem
type corpus = Bap.Std.mem
val create : unit -> t
create () creates an empty instance of the byteweight decider.
train decider ~max_length test corpus trains the decider on the specified corpus. The test function classifies extracted substrings. The max_length parameter bounds the maximum length of substrings.
val length : t -> int
length decider returns the total number of distinct substrings known to the decider.
next t ~length ~threshold data begin finds the next positive chunk.
Returns the offset, greater than begin, of the next longest substring of up to the given length for which h1 / (h0 + h1) > threshold.
This is a specialization of the next_if function from the extended V1.V2.S interface.
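To illustrate the relation between next and next_if, here is a minimal sketch of the threshold test written as a predicate with the argument order f s n stats. The record type counts and its fields are hypothetical stand-ins: the concrete representation of the library's stats type is not shown on this page, and the guard against empty statistics is an assumption of the sketch.

```ocaml
(* Hypothetical stand-in for the decider's statistics: h1 counts
   observations that support hypothesis h1, h0 counts the rest.
   The library's real [stats] type may be represented differently. *)
type counts = { h0 : int; h1 : int }

(* Predicate with the argument order [f s n stats]. It ignores the
   substring and its length, and accepts only when the sample ratio
   h1 / (h0 + h1) is strictly greater than [threshold]. Empty
   statistics are rejected (an assumption of this sketch). *)
let above_threshold ~threshold _s _n { h0; h1 } =
  let total = h0 + h1 in
  total > 0 && float_of_int h1 /. float_of_int total > threshold

let () =
  let stats = { h0 = 1; h1 = 9 } in
  (* 9 / 10 = 0.9 > 0.5, so this prints "true". *)
  Printf.printf "%b\n" (above_threshold ~threshold:0.5 "chunk" 5 stats)
```

Passing such a predicate as ~f would, under these assumptions, reproduce the behaviour described for next.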
val pp : Stdlib.Format.formatter -> t -> unit
pp ppf decider prints all chunks known to the decider.
next_if t ~length ~f data begin finds the next chunk accepted by f.
Finds the next offset greater than begin of a string of the given length for which a substring s of length n with statistics stats was observed, such that f s n stats is true.
val fold : t -> init:'b -> f:('b -> Bap.Std.word list -> stats -> 'b) -> 'b
fold t ~init ~f applies f to all chunks known to the decider.
val t : t Bap_byteweight_signatures.data
val find : t -> length:int -> threshold:float -> corpus -> Bap.Std.addr list
find mem ~length ~threshold corpus extracts the addresses of all memory chunks of the specified length that were classified positively under the given threshold.
val find_if : 
  t ->
  length:int ->
  f:(key -> int -> stats -> bool) ->
  corpus ->
  Bap.Std.addr list
find_if mem ~length ~f corpus finds all positively classified chunks.
This is a generalization of the find function with an arbitrary thresholding function.
It scans the input corpus using the next_if function and collects all positive results.
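The scanning loop described above can be sketched in a few lines. This is not the library's implementation: next is a hypothetical stand-in for a partially applied next_if, assumed to return Some off with off strictly greater than its argument, or None when nothing more is found. The toy corpus is an array of booleans, and the sketch collects plain offsets where find_if returns addresses.

```ocaml
(* [scan ~next ~start] calls [next] repeatedly and collects every offset
   it yields, in increasing order. [next pos] is assumed to return
   [Some off] with [off > pos], or [None] when there is no further match. *)
let scan ~(next : int -> int option) ~(start : int) : int list =
  let rec loop acc pos =
    match next pos with
    | None -> List.rev acc
    | Some off -> loop (off :: acc) off
  in
  loop [] start

(* Toy corpus: [true] marks a positively classified position. *)
let corpus = [| false; true; false; false; true; true; false |]

(* Toy [next]: the first positive position strictly after [pos]. *)
let next_positive pos =
  let rec go i =
    if i >= Array.length corpus then None
    else if corpus.(i) then Some i
    else go (i + 1)
  in
  go (pos + 1)

let () =
  (* Starting from -1 lets position 0 be considered as well. *)
  scan ~next:next_positive ~start:(-1)
  |> List.iter (Printf.printf "%d ");
  print_newline ()
```

Because each call resumes from the last offset found, the loop terminates as long as next only ever moves forward over a finite input.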
val find_using_bayes_factor : 
  t ->
  min_length:int ->
  max_length:int ->
  float ->
  corpus ->
  Bap.Std.addr list
find_using_bayes_factor sigs mem classifies function starts using the Bayes factor procedure.
Returns a list of addresses in mem that have a signature in sigs with length min_length <= n <= max_length and the Bayes factor greater than threshold.
The Bayes factor is the ratio between the posterior probabilities of two hypotheses: the h1 hypothesis that the given sequence of bytes occurs at the function start, and the dual h0 hypothesis,
k = P(h1|s)/P(h0|s) = (P(s|h1)/P(s|h0)) * (P(h1)/P(h0)),
where
P(hN|s) is the probability of the hypothesis hN given the sequence of bytes s as the evidence,
P(s|hN) is the probability of the sequence of bytes s, given the hypothesis hN,
P(hN) is the prior probability of the hypothesis hN.
Given that m is the total number of occurrences of a sequence of bytes s at the beginning of a function, and n is the total number of occurrences of s in the middle of a function, we compute P(s|h1) and P(s|h0) as
P(s|h1) = m / (m+n),
P(s|h0) = 1 - P(s|h1) = n / (m+n).
Given that q is the total number of substrings in sigs of length min_length <= l <= max_length and p is the total number of substrings of the length l that start functions, we compute the prior probabilities as
P(h1) = p / q,
P(h0) = 1 - P(h1).
The resulting factor is a value 0 < k < infinity that quantifies the strength of the evidence that a given substring gives in support of the hypothesis h1. Levels below 1 support hypothesis h0, levels above 1 give some support of h1, with the following interpretations (Kass and Raftery (1995)),
        Bayes Factor          Strength
        1 to 3.2              Weak
        3.2 to 10             Substantial
        10 to 100             Strong
        100 and greater       Decisive
val find_using_threshold : 
  t ->
  min_length:int ->
  max_length:int ->
  float ->
  corpus ->
  Bap.Std.addr list
find_using_threshold sigs mem classifies function starts using a simple thresholding procedure.
Returns a list of addresses in mem that have a signature s in sigs with length min_length <= n <= max_length and the sample probability P1(s) of starting a function greater than threshold,
P1(s) = m / (m+n), where
m is the total number of occurrences of s at the beginning of a function in sigs;
n is the total number of occurrences of s not at the beginning of a function in sigs.
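The sample probability P1(s) and the factor k defined for find_using_bayes_factor can be compared on concrete counts. The following is a minimal sketch of the arithmetic only, using the symbols m, n, p and q from the definitions above. The function names are hypothetical, the counts are made up for illustration, and the handling of values that fall exactly on a boundary of the interpretation table is an assumption.

```ocaml
(* Sample probability P1(s) = m / (m + n). *)
let sample_probability ~m ~n =
  float_of_int m /. float_of_int (m + n)

(* k = (P(s|h1) / P(s|h0)) * (P(h1) / P(h0)), with
   P(s|h1) = m / (m + n), P(s|h0) = n / (m + n),
   P(h1) = p / q and P(h0) = 1 - P(h1).
   The first ratio simplifies to m / n and the second to p / (q - p).
   Degenerate counts are not treated specially: float division gives
   infinity for n = 0 or p = q, and nan for 0 / 0. *)
let bayes_factor ~m ~n ~p ~q =
  let likelihood_ratio = float_of_int m /. float_of_int n in
  let prior_ratio = float_of_int p /. float_of_int (q - p) in
  likelihood_ratio *. prior_ratio

(* Strength labels following the interpretation table; the assignment
   of exact boundary values to a row is an assumption of this sketch. *)
let strength k =
  if k <= 1.0 then "No support for h1"
  else if k < 3.2 then "Weak"
  else if k < 10.0 then "Substantial"
  else if k < 100.0 then "Strong"
  else "Decisive"

let () =
  (* Made-up counts: s seen 30 times at a function start and 10 times
     elsewhere; 100 of 1000 substrings of this length start functions. *)
  let m, n, p, q = 30, 10, 100, 1000 in
  let k = bayes_factor ~m ~n ~p ~q in
  Printf.printf "P1 = %.2f, k = %.3f (%s)\n"
    (sample_probability ~m ~n) k (strength k)
```

With these counts P1(s) is 0.75, which would pass a threshold of 0.5, while k stays below 1 because function starts are rare among substrings of this length. The two procedures can therefore disagree on the same signature.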