`Theory.Fbasic`

The Basic Theory of Floating-Points.

Floating-point numbers represent a finite subset of the set of real numbers. Some formats also extend this set with special values that represent infinities or error conditions. These values are, in general, beyond the scope of the floating-point theory; however, the theory includes predicates whose domains may include these special values, e.g., `is_nan`. For floating-point formats that do not support special values, such predicates become constant functions.

All operations in the Floating-Point theory are defined in terms of operations on real numbers. Since floating-point numbers represent only a subset of the reals, each denotation selects a number from the set of numbers of the floating-point sort using concrete rules expressed in terms of rounding modes. The rounding mode is a parameter of many operations and is denoted with a term of sort `rmode`.

`float s x` interprets `x` as a floating-point number.

`fbits x` is a bitvector representation of the floating-point number `x`.
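
The relation between a floating-point value and its bitvector representation can be sketched with OCaml's native `float`, assuming it is an IEEE 754 binary64 value. The helpers `bits_of` and `float_of` are hypothetical names used only for illustration; they are not part of the theory's signature.

```ocaml
(* Illustration only: OCaml's native [float] stands in for a
   floating-point sort; this is not the theory's API. *)

(* Analogue of [fbits]: the 64-bit pattern that encodes a float. *)
let bits_of (x : float) : int64 = Int64.bits_of_float x

(* Analogue of [float s x]: reinterpret a 64-bit pattern as a float. *)
let float_of (b : int64) : float = Int64.float_of_bits b

let () =
  (* 1.0 has sign 0, biased exponent 0x3FF and a zero fraction. *)
  Printf.printf "%Lx\n" (bits_of 1.0)  (* prints 3ff0000000000000 *)
```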

`is_finite x` holds if `x` represents a finite number.

A floating-point number is finite if it represents a number from the set of real numbers `R`.

The predicate always holds for formats in which only finite floating-point numbers are representable.

`is_nan x` holds if `x` represents a not-a-number.

A floating-point value is a not-a-number if it is neither a finite nor an infinite number.

The predicate never holds for formats that represent only numbers.

`is_inf x` holds if `x` represents an infinite number.

The predicate never holds for formats in which infinite numbers are not representable.
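
The three predicates above partition the values of a format: every value is exactly one of finite, infinite, or not-a-number. A minimal sketch of that partition, assuming OCaml's native binary64 `float` as the format (the type `cls` and the function `classify` are hypothetical names introduced for this example):

```ocaml
(* Illustration only: classification over OCaml's native floats. *)
type cls = Finite | Infinite | Nan

let classify (x : float) : cls =
  if Float.is_nan x then Nan
  else if Float.is_finite x then Finite
  else Infinite

(* Each predicate is one cell of the partition. *)
let is_finite x = classify x = Finite
let is_inf x = classify x = Infinite
let is_nan x = classify x = Nan
```

For a format without special values, `classify` would always return `Finite`, which is why `is_nan` and `is_inf` degenerate to constant functions there.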

`is_fpos x` holds if `x` represents a positive number.

The denotation is not defined if `x` represents zero.

`is_fneg x` holds if `x` represents a negative number.

The denotation is not defined if `x` represents zero.

Many operations in the Theory of Floating-Point numbers are defined using the rounding mode parameter.

The rounding mode gives a precise meaning to the phrase "the closest floating-point number to `x`", where `x` is a real number. When `x` is not representable by the given format, some other number `x'` is selected based on the rules of the rounding mode.

`val rne : rmode`

rounding to nearest, ties to even.

The denotation is the floating-point number nearest to the denoted real number. If the two nearest numbers are equally close, then the one with an even least significant digit shall be selected. The denotation is not defined if both numbers have an even least significant digit.

`val rna : rmode`

rounding to nearest, ties away.

The denotation is the floating-point number nearest to the denoted real number. If the two nearest numbers are equally close, then the one with larger magnitude shall be selected.

`val rtp : rmode`

rounding towards positive.

The denotation is the floating-point number that is nearest but no less than the denoted real number.

`val rtn : rmode`

rounding towards negative.

The denotation is the floating-point number that is nearest but not greater than the denoted real number.

`val rtz : rmode`

rounding towards zero.

The denotation is the floating-point number that is nearest but not greater in magnitude than the denoted real number.
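
The five modes differ only in which neighbour they pick. The sketch below applies the same rules to a toy format whose representable numbers are the integers, so the neighbours of `x` are `floor x` and `ceil x`. The type `rmode` and the function `round_to_int` are defined locally for illustration and are assumptions, not the theory's own definitions.

```ocaml
(* Illustration only: the representable numbers are the integers. *)
type rmode = Rne | Rna | Rtp | Rtn | Rtz

let round_to_int (m : rmode) (x : float) : float =
  let lo = Float.floor x and hi = Float.ceil x in
  match m with
  | Rtp -> hi                    (* nearest, but not less than x *)
  | Rtn -> lo                    (* nearest, but not greater than x *)
  | Rtz -> Float.trunc x         (* nearest, but not greater in magnitude *)
  | Rna | Rne ->
    let dlo = x -. lo and dhi = hi -. x in
    if dlo < dhi then lo
    else if dhi < dlo then hi
    else if lo = hi then lo      (* x is already representable *)
    else if m = Rna then (if x > 0. then hi else lo)  (* larger magnitude *)
    else (if Float.rem lo 2. = 0. then lo else hi)    (* even neighbour *)
```

On the tie `2.5`, `Rne` selects `2.` and `Rna` selects `3.`; on `-2.5` they select `-2.` and `-3.` respectively.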

`val cast_float : 'f Float.t Value.sort -> rmode -> 'a bitv -> 'f float`

`cast_float s m x` is the floating-point number of sort `s` closest to `x`.

The bitvector `x` is interpreted as an unsigned integer in two's complement form.

`val cast_sfloat : 'f Float.t Value.sort -> rmode -> 'a bitv -> 'f float`

`cast_sfloat s rm x` is the floating-point number of sort `s` closest to `x`.

The bitvector `x` is interpreted as a signed integer in two's complement form.
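
The only difference between the two casts is how the bit pattern is read as an integer before it is rounded. A minimal sketch, assuming a `w`-bit pattern stored in the low bits of an OCaml `int` (with `w` well below the native word size) and a binary64 result; all function names are hypothetical. The rounding mode is omitted because the small integers used here are exactly representable.

```ocaml
(* Illustration only: read a [w]-bit pattern as unsigned or signed. *)
let unsigned_value (w : int) (bits : int) : int =
  bits land ((1 lsl w) - 1)

let signed_value (w : int) (bits : int) : int =
  let u = unsigned_value w bits in
  (* If the top bit is set, the pattern denotes u - 2^w. *)
  if u land (1 lsl (w - 1)) <> 0 then u - (1 lsl w) else u

(* Analogues of [cast_float] and [cast_sfloat]. *)
let cast_float_sketch w bits = float_of_int (unsigned_value w bits)
let cast_sfloat_sketch w bits = float_of_int (signed_value w bits)
```

With `w = 8`, the pattern `0xFF` becomes `255.` under the unsigned reading and `-1.` under the signed one.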

`val cast_int : 'a Bitv.t Value.sort -> rmode -> 'f float -> 'a bitv`

`cast_int s rm x` returns the integer closest to `x`.

The resulting bitvector should be interpreted as an unsigned two's complement integer.

`val cast_sint : 'a Bitv.t Value.sort -> rmode -> 'f float -> 'a bitv`

`cast_sint s rm x` returns the integer closest to `x`.

The resulting bitvector should be interpreted as a signed two's complement integer.
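
In the opposite direction, the float is first rounded to an integer and the integer is then encoded in `w` bits. The sketch below fixes the rounding to nearest with ties away from zero (an `rna`-like choice made only for this example) and wraps out-of-range results, which is an assumption; the text above does not say what happens when the integer does not fit. The name `cast_int_sketch` is hypothetical.

```ocaml
(* Illustration only: round, then keep the low [w] bits. Keeping the
   low bits of a negative integer yields its two's complement encoding. *)
let cast_int_sketch (w : int) (x : float) : int =
  int_of_float (Float.round x) land ((1 lsl w) - 1)
```

For example, `cast_int_sketch 8 (-1.4)` produces the pattern `0xFF`: read as a signed integer it is `-1`, while an unsigned reading of the same bits gives `255`.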

`fadd m x y` is the floating-point number closest to `x+y`.

`fsub m x y` is the floating-point number closest to `x-y`.

`fmul m x y` is the floating-point number closest to `x*y`.

`fdiv m x y` is the floating-point number closest to `x/y`.
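
"Closest to `x+y`" means the real sum is computed first and only then rounded. A small illustration with OCaml's native binary64 addition, which is assumed here to round to nearest with ties to even (the usual hardware default): doubles near `1e16` are spaced 2 apart, so `1e16 + 3` is a tie between `1e16 + 2` and `1e16 + 4`.

```ocaml
(* Illustration only: native addition as a stand-in for [fadd rne]. *)
let sum = 1e16 +. 3.

let () =
  (* The tie is resolved towards the neighbour with an even significand. *)
  Printf.printf "%.0f\n" sum  (* prints 10000000000000004 *)
```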

`fsqrt m x` returns the floating-point number closest to `r`, where `r` is a number such that `r*r` is equal to `x`.

If `x` is a negative finite non-zero number, or is `nan`, or is the negative infinity, then `fsqrt m x` is `nan`. If `x` is the positive infinity then `fsqrt m x` is the positive infinity.
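
OCaml's native `sqrt` on binary64 values follows the same special-case rules, so it can serve as a quick check of the cases listed above. The helper `sqrt_is_nan` is a hypothetical name used only in this sketch.

```ocaml
(* Illustration only: native [sqrt] as a stand-in for [fsqrt]. *)
let sqrt_is_nan (x : float) : bool = Float.is_nan (sqrt x)

let () =
  assert (sqrt_is_nan (-1.));          (* negative finite non-zero *)
  assert (sqrt_is_nan nan);            (* not-a-number *)
  assert (sqrt_is_nan neg_infinity);   (* negative infinity *)
  assert (sqrt infinity = infinity)    (* positive infinity *)
```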

`fmodulo m x y` is the floating-point number closest to the remainder of `x/y`.

`fmad m x y z` is the floating-point number closest to `x * y + z`.

`fround m x` is the floating-point number closest to `x` rounded to an integral, using the rounding mode `m`.
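
Because `fmad` is defined as the number closest to the real value `x * y + z`, it rounds once, unlike a multiplication followed by an addition, which rounds twice. The sketch below shows the difference using `Float.fma` from the OCaml standard library on binary64 values; the specific constants are chosen for this example.

```ocaml
(* Illustration only: one rounding versus two. *)
let a = 1. +. ldexp 1. (-30)   (* 1 + 2^-30, exactly representable *)
let c = 1. +. ldexp 1. (-29)   (* 1 + 2^-29, exactly representable *)

(* a*a = 1 + 2^-29 + 2^-60 as a real number; the product rounds to c,
   so the low-order term is lost before the subtraction. *)
let two_roundings = a *. a -. c

(* The fused form keeps the exact product and rounds only the result. *)
let one_rounding = Float.fma a a (-. c)
```

Here `two_roundings` is `0.`, while `one_rounding` is `2^-60`, the exact value of `a*a - c`.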

`val fconvert : 'f Float.t Value.sort -> rmode -> _ float -> 'f float`

`fconvert f r x` is the floating-point number in format `f` closest to `x`.
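
A familiar instance of such a conversion is narrowing binary64 to binary32. In OCaml this can be sketched with `Int32.bits_of_float`, which converts to the single-precision layout, followed by `Int32.float_of_bits`; the rounding is assumed to be to nearest, ties to even. The name `to_single` is hypothetical.

```ocaml
(* Illustration only: round a binary64 value to the nearest binary32
   value, returned again as an OCaml float. *)
let to_single (x : float) : float =
  Int32.float_of_bits (Int32.bits_of_float x)
```

Values that binary32 can represent, such as `0.5`, are unchanged, while `16777217.` (2^24 + 1) lies halfway between two binary32 numbers and becomes `16777216.`.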

`fsucc m x` is the least floating-point number representable in (sort x) that is greater than `x`.

`fpred m x` is the greatest floating-point number representable in (sort x) that is less than `x`.
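
For OCaml's native binary64 floats, the standard library offers the analogous `Float.succ` and `Float.pred`. The sketch below uses them to show the two neighbours of `1.0`; the names `next` and `prev` are introduced only for this example.

```ocaml
(* Illustration only: neighbours of 1.0 among binary64 values. *)
let next = Float.succ 1.0   (* least double greater than 1.0 *)
let prev = Float.pred 1.0   (* greatest double less than 1.0 *)

let () =
  (* The gap above 1.0 is 2^-52; the gap below is half of that. *)
  assert (next -. 1.0 = epsilon_float);
  assert (1.0 -. prev = epsilon_float /. 2.)
```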