Assignment 2
============

In the usual nomenclature, 32-bit and 64-bit (IEEE Standard 754) 
floating point numbers are referred to as single precision and 
double precision. Suppose that we invent an 8-bit quarter-precision
standard consisting of one sign bit, three exponent bits, and four 
mantissa bits:

.. list-table:: 

   * - s
     - eee
     - mmmm

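The sign bit signals whether the number is positive (:math:`\mathsf{0}`) 
or negative (:math:`\mathsf{1}`). The exponent field is stored with an 
offset (bias) of :math:`\mathsf{(011)_2 = 0+2+1 = 3}`, so a stored field 
:math:`\mathsf{e}` represents the power :math:`\mathsf{2^{e-3}}`. The 
leading 1 of a normalized mantissa is implicit; only the four fractional 
bits are stored. This means that the numbers :math:`\mathsf{1}` and 
:math:`\mathsf{-2}` are represented as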

:math:`\mathsf{+1 = +1 \times 2^0 = +(1.0000)_2 \times 2^{3-3} =}`

.. list-table:: 

   * - 0
     - 011
     - 0000

:math:`\mathsf{-2 = -1 \times 2^1 = -(1.0000)_2 \times 2^{4-3} =}`

.. list-table:: 

   * - 1
     - 100
     - 0000
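
The two encodings above can be checked with a small decoder. The following 
is a minimal sketch, not part of the assignment: the function name 
``decode_quarter`` is hypothetical, and the handling of the all-zeros and 
all-ones exponent fields is an assumption borrowed from IEEE 754 (zero and 
denormalized numbers for ``000``, infinities and NaN for ``111``).

.. code-block:: python

   BIAS = 3  # exponent offset (011)_2 given in the text


   def decode_quarter(bits: str) -> float:
       """Decode an 8-character bit string 'seeemmmm' into a Python float."""
       if len(bits) != 8 or set(bits) - {"0", "1"}:
           raise ValueError("expected exactly 8 binary digits")

       sign = -1.0 if bits[0] == "1" else 1.0
       exponent = int(bits[1:4], 2)      # stored (biased) exponent field
       fraction = int(bits[4:], 2) / 16  # four mantissa bits as a fraction

       if exponent == 0b111:
           # Assumption (IEEE 754 convention): reserved for inf and NaN.
           return sign * float("inf") if fraction == 0 else float("nan")
       if exponent == 0b000:
           # Assumption (IEEE 754 convention): zero and denormalized
           # numbers have no implicit leading 1 and use exponent 1 - BIAS.
           return sign * fraction * 2.0 ** (1 - BIAS)
       # Normalized number: implicit leading 1, exponent offset by BIAS.
       return sign * (1.0 + fraction) * 2.0 ** (exponent - BIAS)


   print(decode_quarter("00110000"))  # the pattern shown for +1
   print(decode_quarter("11000000"))  # the pattern shown for -2

If the scheme you are asked to analyze treats the ``111`` exponent 
differently, only the first branch needs to change.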

#. What is the largest positive, finite number that can be 
   represented in this scheme? Report the decimal value and the 
   underlying bit pattern.

#. What is the largest (in magnitude) negative, finite number that 
   can be represented in this scheme? Report the decimal value and 
   the underlying bit pattern.

#. What is the smallest positive, nonzero, *nondenormalized* 
   number that can be represented? Report the decimal value and 
   the underlying bit pattern.

#. Here is the bit pattern for zero:
   
   .. list-table:: 

      * - 0
        - 000
        - 0000

   Report the decimal value and bit pattern of the positive, 
   *denormalized* number that is closest to zero; also report the 
   largest, positive *denormalized* number.

#. Consider the following numbers:

   .. math::

      \begin{aligned}
      \mathsf{u = 2 = \bigl(1+0\bigr)\times 2^1} &= 
      \mathsf{(1.0000)_2 \times 2^{4-3}}\\
      \mathsf{w = 2.25 = \bigl(1+\tfrac{1}{8}\bigr)\times 2^1} &= 
      \mathsf{(1.0010)_2 \times 2^{4-3}}\\
      \mathsf{x = 4.25 = \bigl(1+\tfrac{1}{16}\bigr)\times 2^2} &= 
      \mathsf{(1.0001)_2 \times 2^{5-3}}\\
      \mathsf{y = 4.5 = \bigl(1+\tfrac{1}{8}\bigr) \times 2^2} &= 
      \mathsf{(1.0010)_2 \times 2^{5-3}}
      \end{aligned}
        
   Express each of :math:`\mathsf{u}`, :math:`\mathsf{w}`, 
   :math:`\mathsf{x}`, :math:`\mathsf{y}` in the 8-bit floating-point 
   scheme. Then determine the difference of squares for each pair in 
   two ways, carrying out every intermediate operation in the 8-bit 
   scheme: :math:`\mathsf{x^2 - y^2}` versus 
   :math:`\mathsf{(x+y)(x-y)}`, and :math:`\mathsf{u^2 - w^2}` versus 
   :math:`\mathsf{(u+w)(u-w)}`.
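
To experiment with the last problem, it helps to round the result of every 
arithmetic operation back into the 8-bit format. The sketch below is one 
way to do that, under assumptions the assignment does not state: rounding 
is to nearest with ties to even, the ``111`` exponent is reserved so that 
results beyond the largest finite value become infinite, and the names 
``round_quarter`` and ``MAX_FINITE`` are hypothetical.

.. code-block:: python

   import math

   BIAS = 3       # exponent offset given in the text
   MANT_BITS = 4  # stored mantissa bits
   # Assumption: exponent field 111 is reserved, so 110 is the largest
   # finite exponent and 1111 the largest mantissa.
   MAX_FINITE = (2 - 2.0 ** -MANT_BITS) * 2.0 ** (0b110 - BIAS)


   def round_quarter(value: float) -> float:
       """Round a float to the nearest value representable in 8 bits."""
       if value == 0.0 or math.isnan(value) or math.isinf(value):
           return value
       magnitude = abs(value)
       _, e = math.frexp(magnitude)    # magnitude = m * 2**e, 0.5 <= m < 1
       exponent = max(e - 1, 1 - BIAS)  # clamp into the denormalized range
       step = 2.0 ** (exponent - MANT_BITS)  # spacing of nearby values
       # Python's round() on a float rounds halves to the even integer.
       rounded = round(magnitude / step) * step
       if rounded > MAX_FINITE:
           rounded = math.inf  # assumed overflow behaviour
       return math.copysign(rounded, value)


   def mul(a: float, b: float) -> float:
       """Multiply, then round the product into the 8-bit format."""
       return round_quarter(a * b)


   def sub(a: float, b: float) -> float:
       """Subtract, then round the difference into the 8-bit format."""
       return round_quarter(a - b)


   # Example with values unrelated to the assignment:
   a, b = 1.5, 1.25
   print(sub(mul(a, a), mul(b, b)))

Comparing a result computed this way with the same expression evaluated in 
ordinary double precision shows how much each intermediate rounding 
contributes to the final error.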