Assignment 2

In the usual nomenclature, 32-bit and 64-bit (IEEE Standard 754) floating-point numbers are referred to as single precision and double precision, respectively. Suppose that we invent an 8-bit quarter-precision standard consisting of one sign bit, three exponent bits, and four mantissa bits:

s eee mmmm

The sign bit signals whether the number is positive (\(\mathsf{0}\)) or negative (\(\mathsf{1}\)), and the exponent uses an offset \(\mathsf{(011)_2 = 0+2+1 = 3}\). This means that the numbers \(\mathsf{1}\) and \(\mathsf{-2}\) are represented as

\(\mathsf{+1 = +1 \times 2^0 = +(1.0000)_2 \times 2^{3-3} =}\) 0 011 0000

\(\mathsf{-2 = -1 \times 2^1 = -(1.0000)_2 \times 2^{4-3} =}\) 1 100 0000
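The encoding rule above can be checked with a small decoder. This is only a minimal sketch: the function name `decode` is hypothetical, and the treatment of the exponent fields 000 (zero and denormalized numbers) and 111 (infinity and NaN) is an assumption borrowed from IEEE 754 conventions, since the scheme described here does not spell those cases out.

```python
def decode(bits: str) -> float:
    """Decode a quarter-precision pattern such as '0 011 0000' into a float."""
    b = bits.replace(" ", "")
    if len(b) != 8 or set(b) - {"0", "1"}:
        raise ValueError("expected 8 binary digits")

    sign = -1.0 if b[0] == "1" else 1.0
    e = int(b[1:4], 2)   # three exponent bits, stored with offset 3
    m = int(b[4:], 2)    # four mantissa bits, read as an integer 0..15

    if e == 0b111:
        # Assumption: all-ones exponent is reserved, as in IEEE 754.
        return sign * float("inf") if m == 0 else float("nan")
    if e == 0:
        # Assumption: zero and denormalized numbers have no implicit
        # leading 1 and share the smallest normalized exponent, 1 - 3.
        return sign * (m / 16) * 2.0 ** (1 - 3)
    # Normalized numbers: implicit leading 1, exponent e - 3.
    return sign * (1 + m / 16) * 2.0 ** (e - 3)


print(decode("0 011 0000"))  # the pattern given above for +1
print(decode("1 100 0000"))  # the pattern given above for -2
```

Accepting the pattern as a string with optional spaces keeps the sign, exponent, and mantissa fields visually separate, which matches how the patterns are written in this assignment.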

  1. What is the largest positive finite number that can be represented in this scheme? Report the decimal value and the underlying bit pattern.

  2. What is the largest (in magnitude) negative finite number that can be represented in this scheme? Report the decimal value and the underlying bit pattern.

  3. What is the smallest positive normalized (non-denormalized) number that can be represented? Report the decimal value and the underlying bit pattern.

  4. Here is the bit pattern for zero:

    0 000 0000

    Report the decimal value and bit pattern of the positive denormalized number that is closest to zero; report the same for the largest positive denormalized number.

  5. Consider the following numbers:

    \[\begin{aligned} \mathsf{u = 2 = \bigl(1+0\bigr)\times 2^1} &= \mathsf{(1.0000)_2 \times 2^{4-3}}\\ \mathsf{w = 2.25 = \bigl(1+\tfrac{1}{8}\bigr)\times 2^1} &= \mathsf{(1.0010)_2 \times 2^{4-3}}\\ \mathsf{x = 4.25 = \bigl(1+\tfrac{1}{16}\bigr)\times 2^2} &= \mathsf{(1.0001)_2 \times 2^{5-3}}\\ \mathsf{y = 4.5 = \bigl(1+\tfrac{1}{8}\bigr) \times 2^2} &= \mathsf{(1.0010)_2 \times 2^{5-3}} \end{aligned}\]

    Express each of \(\mathsf{u}\), \(\mathsf{w}\), \(\mathsf{x}\), \(\mathsf{y}\) in the 8-bit floating-point scheme. Then determine the difference of squares for each pair in two ways: as \(\mathsf{x^2 - y^2}\) and as \(\mathsf{(x+y)(x-y)}\); and as \(\mathsf{u^2 - w^2}\) and as \(\mathsf{(u+w)(u-w)}\).
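One way to explore the last question numerically is to round the result of every arithmetic operation back to quarter precision. The sketch below is not a prescribed method, and it rests on assumptions that the assignment does not state: round-to-nearest with ties to even, an exponent field of 111 reserved for infinity and NaN (so larger results overflow to infinity), and denormalized numbers that share the smallest normalized exponent. The names `quantize`, `add`, `sub`, `mul`, and `diff_of_squares` are hypothetical.

```python
import math

EXP_BITS, FRAC_BITS, BIAS = 3, 4, 3
E_MIN = 1 - BIAS                    # smallest normalized exponent
E_MAX = (2**EXP_BITS - 2) - BIAS    # largest exponent, assuming 111 is reserved
MAX_FINITE = (2 - 2.0**-FRAC_BITS) * 2.0**E_MAX


def quantize(x: float) -> float:
    """Round x to the nearest quarter-precision value (ties to even)."""
    if x == 0.0 or math.isnan(x) or math.isinf(x):
        return x
    _, e = math.frexp(abs(x))        # abs(x) = f * 2**e with 0.5 <= f < 1
    e = max(e - 1, E_MIN)            # exponent for 1 <= significand < 2,
                                     # clamped for the denormalized range
    step = 2.0 ** (e - FRAC_BITS)    # gap between neighbouring values
    q = round(abs(x) / step) * step  # Python's round() sends ties to even
    if q > MAX_FINITE:
        q = math.inf                 # assumed overflow behaviour
    return math.copysign(q, x)


def add(a: float, b: float) -> float:
    return quantize(a + b)


def sub(a: float, b: float) -> float:
    return quantize(a - b)


def mul(a: float, b: float) -> float:
    return quantize(a * b)


def diff_of_squares(a: float, b: float) -> tuple[float, float]:
    """Return (a*a - b*b, (a+b)*(a-b)), rounding after every operation."""
    direct = sub(mul(a, a), mul(b, b))
    factored = mul(add(a, b), sub(a, b))
    return direct, factored


print(diff_of_squares(4.25, 4.5))   # the pair x, y
print(diff_of_squares(2.0, 2.25))   # the pair u, w
```

Comparing the two entries of each returned pair against the exact values of the differences shows how the order of operations interacts with rounding. The outcome depends on the assumptions listed above, so a different rounding or overflow rule may change the numbers.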