Assignment 2
============

In the usual nomenclature, 32-bit and 64-bit (IEEE Standard 754) floating-point numbers are referred to as single precision and double precision. Suppose that we invent an 8-bit quarter-precision standard consisting of one sign bit, three exponent bits, and four mantissa bits:

.. list-table::

   * - s
     - eee
     - mmmm

The sign bit signals whether the number is positive (:math:`\mathsf{0}`) or negative (:math:`\mathsf{1}`), and the exponent uses an offset (bias) of :math:`\mathsf{(011)_2 = 0+2+1 = 3}`. For example, the numbers :math:`\mathsf{1}` and :math:`\mathsf{-2}` are represented as

:math:`\mathsf{+1 = +1 \times 2^0 = +(1.0000)_2 \times 2^{3-3} =}`

.. list-table::

   * - 0
     - 011
     - 0000

:math:`\mathsf{-2 = -1 \times 2^1 = -(1.0000)_2 \times 2^{4-3} =}`

.. list-table::

   * - 1
     - 100
     - 0000

#. What is the largest positive, finite number that can be represented in this scheme? Report the decimal value and the underlying bit pattern.

#. What is the largest (in magnitude) negative, finite number that can be represented in this scheme? Report the decimal value and the underlying bit pattern.

#. What is the smallest positive, nonzero, *nondenormalized* number that can be represented? Report the decimal value and the underlying bit pattern.

#. Here is the bit pattern for zero:

   .. list-table::

      * - 0
        - 000
        - 0000

   Report the decimal value and bit pattern of the positive *denormalized* number that is closest to zero. Also report the decimal value and bit pattern of the largest positive *denormalized* number.

#. Consider the following numbers:

   .. math::

      \begin{aligned}
      \mathsf{u = 2 = \bigl(1+0\bigr)\times 2^1} &= \mathsf{(1.0000)_2 \times 2^{4-3}}\\
      \mathsf{w = 2.25 = \bigl(1+\tfrac{1}{8}\bigr)\times 2^1} &= \mathsf{(1.0010)_2 \times 2^{4-3}}\\
      \mathsf{x = 4.25 = \bigl(1+\tfrac{1}{16}\bigr)\times 2^2} &= \mathsf{(1.0001)_2 \times 2^{5-3}}\\
      \mathsf{y = 4.5 = \bigl(1+\tfrac{1}{8}\bigr) \times 2^2} &= \mathsf{(1.0010)_2 \times 2^{5-3}}
      \end{aligned}

   Express each of :math:`\mathsf{u}`, :math:`\mathsf{w}`, :math:`\mathsf{x}`, and :math:`\mathsf{y}` in the 8-bit floating-point scheme. Then compute each difference of squares in two ways: for :math:`\mathsf{x}` and :math:`\mathsf{y}`, as :math:`\mathsf{x^2 - y^2}` and as :math:`\mathsf{(x+y)(x-y)}`; for :math:`\mathsf{u}` and :math:`\mathsf{w}`, as :math:`\mathsf{u^2 - w^2}` and as :math:`\mathsf{(u+w)(u-w)}`.
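To check hand-worked bit patterns, here is a minimal Python sketch that decodes an 8-bit pattern into its value. The function name ``decode_quarter`` is hypothetical. The handling of the exponent patterns 000 and 111 is an assumption borrowed from IEEE 754: exponent 000 encodes zero and denormalized numbers, and exponent 111 is reserved for infinities and NaN. The assignment mentions denormalized and finite numbers but does not spell these rules out.

.. code-block:: python

   import math


   def decode_quarter(bits: int) -> float:
       """Decode an 8-bit pattern laid out as s eee mmmm (hypothetical helper).

       Assumes IEEE 754-style conventions: bias 3, exponent 000 for zero and
       denormalized numbers, exponent 111 reserved for infinities and NaN.
       """
       if not 0 <= bits <= 0xFF:
           raise ValueError("expected an 8-bit pattern")
       sign = -1.0 if bits & 0b1000_0000 else 1.0
       exp = (bits >> 4) & 0b111   # three exponent bits
       mant = bits & 0b1111        # four mantissa bits
       if exp == 0b111:            # assumption: reserved for inf / NaN
           return sign * math.inf if mant == 0 else math.nan
       if exp == 0:                # assumption: denormal, no hidden 1, exponent 1 - 3
           return sign * math.ldexp(mant / 16, 1 - 3)
       # Normalized: hidden leading 1 and an exponent offset of 3.
       return sign * math.ldexp(1 + mant / 16, exp - 3)


   # The two patterns worked out in the text:
   print(decode_quarter(0b0_011_0000))  # 1.0
   print(decode_quarter(0b1_100_0000))  # -2.0

There are only 256 patterns. Calling the function for every value in ``range(256)`` therefore tabulates the entire number system under these assumptions.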