Assignment 3
============

Linear congruential generator
-----------------------------

In class, we discussed the linear congruential generator scheme. 
Starting from a single initial seed :math:`\mathsf{x_0}` (a 
non-negative integer), the sequence unfolds according to 
:math:`\mathsf{x_{n+1} = {(ax_n + c) \mod m}}`. The most convenient 
choice of modulus is :math:`\mathsf{m = 2^{64}}`, such that the 
modulus operation is effected automatically as overflow on a 64-bit
fixed-width binary register.

In that case, if we take ``x`` to be of type ``UInt64``, each
next number in the sequence is found with a single multiplication 
and addition: ``x = a*x + c``. The trick is to choose good values 
of the multiplier and increment. Two popular choices are shown in 
the table.

.. list-table:: 
   :widths: 12 18 18
   :header-rows: 1

   * - scheme
     - multiplier (:math:`\mathsf{a}`)
     - increment (:math:`\mathsf{c}`)
     
   * - LLNL/SPRNG 
     - 2862933555777941757
     - 3037000493
     
   * - Knuth (MMIX)
     - 6364136223846793005
     - 1442695040888963407

These numbers are designed to satisfy the Hull–Dobell Theorem 
and thus ensure a repeat time of :math:`\mathsf{2^{64}}`, the 
maximum period. A known deficiency is that the least significant 
bits are somewhat less random than the most significant bits.

A convenient way to write this in Julia is with a *closure*, in 
which a function returns a function that captures a set of 
parameters provided by the user as arguments.

.. code-block:: Julia

   function create_lcg(a0, c0, x0)
      a = a0
      c = c0
      x = x0
      return function()
         x = a*x + c
         return x
      end
   end
   
   my_seed = UInt64(1975)
   lcg = create_lcg(UInt64(2862933555777941757), UInt64(3037000493), my_seed)

   # generate pseudorandom 64-bit sequences
   lcg()
   => 0x8516d481892a6308
   lcg()
   => 0x0ac522dfb57e5215
   # interpret as signed, two's complement integers
   reinterpret(Int64,lcg())
   => 712743970554937838
   reinterpret(Int64,lcg())
   => -6940842903748281501
   # rescale to produce numbers in the closed interval [0,1]
   lcg()/(UInt64(0)-1)
   => 0.660163672390821
   lcg()/(UInt64(0)-1)
   => 0.037941720613110005

1. Explain why the period of the linear congruential generator 
   described above can never exceed :math:`\mathsf{2^{64}}`.

2. It's sad/amusing to read about
   `IBM's RANDU generator <https://en.wikipedia.org/wiki/RANDU>`_,
   a notorious linear congruential generator developed by the 
   company in the 1960s for use on its mainframes. It is widely
   considered one of the most flawed random number generators 
   ever created. Many scientific results from that era that were 
   based on Monte Carlo and stochastic methods are suspect.

   Make an instance of RANDU using 
   ``randu = create_lcg(UInt32(65539),UInt32(0), UInt32(seed))}``. 
   Generate 3000 numbers and output them to a file ``randu.dat`` 
   in three-column format as follows:

   .. list-table:: 
      :widths: 10 10 10
      :header-rows: 0

      * - :math:`\mathsf{x_0}`
        - :math:`\mathsf{x_1}`
        - :math:`\mathsf{x_2}`

      * - :math:`\mathsf{x_3}`
        - :math:`\mathsf{x_4}`
        - :math:`\mathsf{x_5}`
   
      * - :math:`\mathsf{x_6}`
        - :math:`\mathsf{x_7}`
        - :math:`\mathsf{x_8}`

      * - 
        - :math:`\vdots`
        - 
   
      * - :math:`\mathsf{x_{2997}}`
        - :math:`\mathsf{x_{2998}}`
        - :math:`\mathsf{x_{2999}}`
   
   
   In ``gnuplot``, type 
   ``splot 'randu.dat' with points pointtype 7 pointsize 0.5``, 
   which can be abbreviated to ``splot 'randu.dat' w p pt 7 ps 0.5``.
   Rotate the three-dimensional image until you can identify the 
   flaw in the randomness. Include a screenshot or pdf of the revealing plot.

Manipulating floating-point representations at the bit Level
------------------------------------------------------------

In Julia the ``reinterpret`` function can be used to read 
an integer's bit pattern as a floating-point type. This is 
distinct from standard *type casting*, which tries to preserve value.

.. code-block:: Julia

   # Reinterpret the integer 4607182418800017408 as a Float64
   UInt64(4607182418800017408)
   => 0x3ff0000000000000
   Int64(0x3ff0000000000000)
   => 4607182418800017408
   reinterpret(Float64, 0x3ff0000000000000) 
   => 1.0
   # On the other hand, casting gives
   Float64(0x3ff0000000000000)
   # = > 4.6071824188000174e18

Here, ``UInt64`` and ``Float64`` are the types we'd call
``unsigned long int`` (or ``uint64_t``) and ``double`` in C++.

Recall that the 64-bit IEEE Standard 754 uses a bit
pattern with 1 sign bit, 11 exponent bits, and 52 mantissa bits:

.. list-table:: 
   :header-rows: 0
   
   * - :math:`\mathsf{s}`
     - :math:`\mathsf{e_{10}}`
     - :math:`\mathsf{e_{9}}`
     - :math:`\cdots`
     - :math:`\mathsf{e_{0}}`
     - :math:`\mathsf{m_{51}}`
     - :math:`\mathsf{m_{50}}`
     - :math:`\cdots`
     - :math:`\mathsf{m_{0}}`
      
3. Here, you will generate pseudorandom numbers in the 
   range :math:`\mathsf{[0,1)}`. Furthermore, you will implement 
   a shuffling scheme that renders the numbers more randomized.
   Start by populating an array ``num_bank`` with 1024 entries, 
   each a 64-bit integer produced by a high-quality linear 
   congruential generator. Once this initialization step is 
   complete, carry out the following steps to produce the next number.
   Generate a 64-bit unsigned integer
   :math:`\mathsf{x = b_{63} b_{62} \cdots b_2 b_1 b_0}`.
   Ignore the two least significant bits 
   (:math:`\mathsf{b_0}` and :math:`\mathsf{b_1}`). Harvest the
   next 10 bits to create a number 
   :math:`\mathsf{k = 1 + b_{11}b_{10}\cdots b_4b_3b_2}`
   (with :math:`\mathsf{1 \le k \le 1024}`), which we will treat 
   as a (one-based) array index. Swap the values of 
   :math:`\mathsf{x}` and the :math:`\mathsf{k}\!` th element of
   `num_bank`. From the new :math:`\mathsf{x}`, create a floating 
   point number :math:`\mathsf{1 \le f < 2}` by transcribing the 
   52 most significant bits as follows:
   
   .. list-table:: 
      :header-rows: 0
   
      * - :math:`\mathsf{0}`
        - :math:`\mathsf{0}`
        - :math:`\mathsf{1}`
        - :math:`\mathsf{1}`
        - :math:`\mathsf{1}`
        - :math:`\cdots`
        - :math:`\mathsf{1}`
        - :math:`\mathsf{1}`                                                  
        - :math:`\mathsf{b_{63}}`
        - :math:`\mathsf{b_{62}}`
        - :math:`\cdots`
        - :math:`\mathsf{b_{14}}`
        - :math:`\mathsf{b_{13}}`
        - :math:`\mathsf{b_{12}}`
        
   Then return :math:`f-1`.

   Examine the program ``shuffled-lcg.jl``. Complete the 
   missing return value so that each call to ``next_rnd()`` 
   produces a floating-point number in the interval 
   :math:`\mathsf{[0,1)}` from the highest 52 bits in 
   ``x`` following the shuffling scheme described above.

   .. code-block:: Julia
      
      num_bank = Vector{UInt64}(undef, 1024) 
      for i in 1:1024
         num_bank[i] = lcg()
      end

      function next_rnd()
         x = lcg()
         k = x & ((1 << 12) - 1)
         k = (k >> 2) + 1
         # swap values
         num_bank[k], x = x, num_bank[k]
         return ????
      end

   All you need to report for this question is a valid 
   Julia expression for the correct return value. *Hint:* 
   my solution used the ``reinterpret`` function, a bit 
   shift, a bitwise OR with a mask, and a floating-point 
   subtraction.

4. There are various 
   `run tests <https://en.wikipedia.org/wiki/Wald\%E2\%80\%93Wolfowitz_runs_test>`_
   that are used evaluate the apparent randomness of a 
   pseudorandom sequence. Here is one scheme. For each number 
   :math:`\mathsf{x}`, let :math:`\mathsf{x < 1/2}` map to 
   symbol ":math:`\mathsf{-}`" and 
   :math:`\mathsf{x \ge 1/2}` map to ":math:`\mathsf{+}`". 
   Then we can break up sequences
   :math:`\mathsf{(x_0, x_1, x_2, \ldots)}` into contiguous runs of 
   common symbols; e.g.
   
   .. math::
   
      \mathsf{\underbrace{+ + +}_3 \overbrace{- - - - -}^5 
      \underbrace{+ + +}_3 - \underbrace{+ + + + + +}_6 
      - \underbrace{+ + +}_3}
      
   shows 7 runs of size 3, 5, 3, 1, 6, 1, and 3. Prove that
   :math:`\mathsf{\textsf{Prob}(1\textsf{-run}) = 1/2}`,
   :math:`\mathsf{\textsf{Prob}(2\textsf{-run}) = 1/4}`,
   ..., and 
   :math:`\mathsf{\textsf{Prob}(n\textsf{-run}) = 2^{-n}}`.

5. Modify ``shuffled-lcg.jl`` so that it outputs a histogram of
   the frequency of :math:`\mathsf{n}`-runs for 
   :math:`\mathsf{n = 1, 2, \ldots, 20}`. 
   You should be able to reproduce the terminal session below. For
   the two seeds shown, report the histogram entries for 
   :math:`\mathsf{n = 5, 6, 7}`.

   .. code-block:: Julia
   
      $ julia shuffled-lcg.jl
      Usage: julia shuffled-lcg.jl <pseudorandom generator seed>
      $ julia shuffled-lcg.jl 1975
      1 	 0.5004000000
      2 	 0.2484000000
      3 	 0.1244000000
      4 	 0.0657000000
      .
      .
      20 	 0.0000000000
      $ julia shuffled-lcg.jl 87263
      1 	 0.5046000000
      2 	 0.2529000000
      3 	 0.1194000000
      4 	 0.0607000000
      .
      .
      20 	 0.0000000000

6. Generate :math:`\mathsf{N=20}` such histograms, based 
   on 20 different seeds. Collate the data sets to report 
   the aggregate histogram in the format 
   :math:`\mathsf{n}`,
   :math:`\mathsf{\textsf{AVE}[\textsf{Prob}(n\textsf{-runs})]}`,
   :math:`\mathsf{\textsf{ERR}[\textsf{Prob}(n\textsf{-runs})]}`, 
   where :math:`\mathsf{\textsf{AVE}}` denotes 
   the mean over the :math:`\mathsf{N}` instantiations and 
   :math:`\mathsf{\textsf{ERR} = \sqrt{\textsf{VAR}/N}}` is the 
   square root of the variance from the mean divided 
   by :math:`\mathsf{N}`.

   By typing ``plot 'hist-all.dat' u 1:2:3 w e, 2**(-x)``
   and then ``set logscale y; replot`` in gnuplot, 
   you should be able to produce a plot similar to this:

.. image:: figures/runs.pdf
   :width: 560
   :alt: Alternative text for accessibility
   :align: center