Integer representations
Integers
The fact that computers are finite has important design implications. It means that computers can never faithfully represent the sets of integers or real numbers, both of which are infinite. Since these are generally what we work with in physics and mathematics, it’s important to understand how we approximately represent integers and reals on a computer. (What we’ll discover is that finite rings of integers modulo a power of two are substituted for the integers, and finite sets of rationals for the reals.)
But first, let’s consider how we might represent numbers at all. What strategies could we employ? Historically, the most common are these two:
simple enumeration (/ // /// //// …)
grouping and labelling (e.g., Roman numerals I II III IV V X L C D M; 1998 = MCMXCVIII)
Neither, however, is suitable for computation. The first is grossly inefficient, requiring \(\mathsf{N}\) bits to store numbers as large as \(\mathsf{N}\). The second requires an ever-growing family of new symbols to represent large values. Moreover, we need a systematic and extensible representation in which basic arithmetic operations are mechanistic. The usual solution to this dilemma is the following.
positional number systems: (\(\mathsf{a_3a_2a_1a_0.a_{-1}a_{-2}}\)) base \(\mathsf{b}\) with \(\mathsf{0 \le a_k < b}\)
By convention, the leading (high) digit is most significant and the trailing (low) digit the least. The part to the right of the radix point is understood to be fractional. Our conventional decimal number system corresponds to \(\mathsf{b = 10}\) with digits \(\mathsf{a_k \in \{ 0,1,2,\ldots,9 \}}\). Bases \(\mathsf{b = 2, 8, 16}\) (all powers of two) are the most commonly used in computer science, since computers use the binary representation in hardware.
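To make the notation concrete, the value denoted by such a digit string is a weighted sum of powers of the base. Written out for the six digits shown above (the index range from \(\mathsf{-2}\) to \(\mathsf{3}\) is just that of the example):

\[
\mathsf{(a_3a_2a_1a_0.a_{-1}a_{-2})_b = a_3b^3 + a_2b^2 + a_1b + a_0 + a_{-1}b^{-1} + a_{-2}b^{-2} = \sum_{k=-2}^{3} a_k b^k}
\]

In base ten, for instance, \(\mathsf{234 = 2\cdot10^2 + 3\cdot10 + 4}\).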
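| base | common name | math name | example | decimal conversion |
|---|---|---|---|---|
| 2 | binary | | \(\mathsf{10010111_{2}}\) | 151 |
| 8 | octal | octonal | \(\mathsf{1735_{8}}\) | 989 |
| 10 | decimal | | 234 | 234 |
| 16 | hexadecimal | sexadecimal | \(\mathsf{3F7A_{16}}\) | 16250 |
| 60 | sexagesimal | | 23° 44′ 12″ | 23.736666… |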
The conversion to decimal can be carried out by summing powers.
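As a minimal sketch of that summation, the following C++ function converts a digit string in a given base to an integer. The names `digit_value` and `to_decimal` are illustrative choices, not part of any library, and the code assumes the common convention that letters stand for digit values from 10 upward (so it handles bases 2 to 36 only). The sum of powers is evaluated with Horner's rule, which avoids computing each power separately.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <string>

// Value of a single digit character: '0'-'9' map to 0-9, and letters
// 'A'-'Z' (either case) map to 10-35. Returns -1 for anything else.
int digit_value(char c) {
    unsigned char u = static_cast<unsigned char>(c);
    if (std::isdigit(u)) return c - '0';
    if (std::isalpha(u)) return std::toupper(u) - 'A' + 10;
    return -1;
}

// Convert a string of digits in the given base (2 to 36) to an integer
// by summing powers, evaluated with Horner's rule:
//   a3*b^3 + a2*b^2 + a1*b + a0 = ((a3*b + a2)*b + a1)*b + a0
std::uint64_t to_decimal(const std::string& digits, int base) {
    assert(base >= 2 && base <= 36);
    std::uint64_t value = 0;
    for (char c : digits) {
        int d = digit_value(c);
        assert(d >= 0 && d < base);  // every digit must satisfy 0 <= a_k < b
        value = value * static_cast<std::uint64_t>(base)
              + static_cast<std::uint64_t>(d);
    }
    return value;
}
```

With these definitions, `to_decimal("1735", 8)` evaluates to 989 and `to_decimal("3F7A", 16)` to 16250, in agreement with the octal and hexadecimal rows of the table.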
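Note that for bases greater than 10, we run out of Arabic numerals. The convention is to fill out the missing digits using Roman letters, so that in hexadecimal \(\mathsf{A = 10}\), \(\mathsf{B = 11}\), and so on up to \(\mathsf{F = 15}\).
Hence, \(\mathsf{3F7A_{16} = 3\cdot16^3 + 15\cdot16^2 + 7\cdot16 + 10 = 16250}\).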
Of course, for base \(\mathsf{b > 36}\) we run out of symbols, and the notation becomes idiosyncratic. The convention for fractions of degrees is to use an increasing number of primes to mark places: \(\mathsf{23^\circ\,44'\,12'' = 23 + 44/60 + 12/60^2 \approx 23.7367}\).
Binary is not as foreign as it first seems. Humans have invented many binary number systems; good examples are the western system of musical notation (1 whole note = 2 half notes = 4 quarter notes = 8 eighth notes, etc.) and the British system of weights and measures (1 gallon = 2 pottles = 4 quarts = 8 pints, etc.).
Signed and unsigned binary numbers
A fixed-width binary number is a sequence of \(\mathsf{N}\) bits. The smallest possible number is \(\mathsf{0}\) and the largest is \(\mathsf{2^N - 1}\). For example, with eight bits the numbers \(\mathsf{0}\) to \(\mathsf{255}\) are represented by the \(\mathsf{2^8}\) unique patterns of \(\mathsf{0}\) and \(\mathsf{1}\).
| unsigned value | bit pattern |
|---|---|
| 0 | 0 0 0 0 0 0 0 0 |
| 1 | 0 0 0 0 0 0 0 1 |
| 2 | 0 0 0 0 0 0 1 0 |
| 3 | 0 0 0 0 0 0 1 1 |
| ⋮ | ⋮ |
| 255 | 1 1 1 1 1 1 1 1 |
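The correspondence between unsigned values and bit patterns can be checked mechanically. The sketch below uses the standard `std::bitset` class; the helper names `bit_pattern` and `from_pattern` are hypothetical and chosen only for this illustration.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>
#include <string>

// Return the eight-bit pattern of an unsigned value as a string of
// '0' and '1' characters, most significant bit first.
std::string bit_pattern(std::uint8_t value) {
    return std::bitset<8>(value).to_string();
}

// Inverse: read an eight-character pattern back as an unsigned value.
std::uint8_t from_pattern(const std::string& bits) {
    assert(bits.size() == 8);
    return static_cast<std::uint8_t>(std::bitset<8>(bits).to_ulong());
}
```

For example, `bit_pattern(3)` gives `"00000011"` and `from_pattern("11111111")` gives 255, the largest value that fits in eight bits.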
Negative numbers can be represented using what’s called the two’s complement scheme. Here, the numbers \(\mathsf{\{0,1,\ldots,255\}}\) are reinterpreted as \(\mathsf{\{-128,-127,\ldots,127\}}\) with the leading (most significant) bit signalling that \(\mathsf{x \to x - 256}\).
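| two’s complement | bit pattern | unsigned binary |
|---|---|---|
| −128 | 1 0 0 0 0 0 0 0 | 128 |
| ⋮ | ⋮ | ⋮ |
| −3 | 1 1 1 1 1 1 0 1 | 253 |
| −2 | 1 1 1 1 1 1 1 0 | 254 |
| −1 | 1 1 1 1 1 1 1 1 | 255 |
| 0 | 0 0 0 0 0 0 0 0 | 0 |
| 1 | 0 0 0 0 0 0 0 1 | 1 |
| 2 | 0 0 0 0 0 0 1 0 | 2 |
| 3 | 0 0 0 0 0 0 1 1 | 3 |
| ⋮ | ⋮ | ⋮ |
| 127 | 0 1 1 1 1 1 1 1 | 127 |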
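The reinterpretation rule can be written as a pair of small functions. This is only a sketch with hypothetical names (`as_twos_complement`, `to_pattern`); it works on the unsigned value of the pattern using ordinary integer arithmetic, so it does not depend on how the compiler itself stores signed numbers.

```cpp
#include <cassert>
#include <cstdint>

// Interpret an eight-bit pattern (given as its unsigned value, 0 to 255)
// as a two's complement number: patterns with the high bit set,
// i.e. values 128 and above, are shifted down by 256.
int as_twos_complement(std::uint8_t pattern) {
    int x = pattern;
    return (x >= 128) ? x - 256 : x;
}

// Inverse map: the unsigned value whose bit pattern encodes x.
std::uint8_t to_pattern(int x) {
    assert(x >= -128 && x <= 127);
    return static_cast<std::uint8_t>(x < 0 ? x + 256 : x);
}
```

For instance, `as_twos_complement(253)` returns −3 and `to_pattern(-128)` returns 128, matching the first and last columns of the table.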
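Under two’s complement, the high bit effectively contains the sign information. Still, this is quite different from having the high bit represent the sign explicitly and the remaining low bits the magnitude. Under that scheme (often called sign-magnitude), the eight-bit patterns would cover \(\mathsf{-127}\) to \(\mathsf{127}\), and the patterns 0 0 0 0 0 0 0 0 and 1 0 0 0 0 0 0 0 would encode \(\mathsf{+0}\) and \(\mathsf{-0}\).
Note that there are two representations of zero. Two’s complement has the advantage that the zero is unique.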
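For comparison, here is a sketch of the explicit sign-and-magnitude reading of an eight-bit pattern. The function names are hypothetical, and the encoder picks the pattern with a clear sign bit for zero, which is an arbitrary choice made for this example.

```cpp
#include <cassert>
#include <cstdint>

// Interpret an eight-bit pattern with the high bit as an explicit sign
// and the low seven bits as the magnitude.
int as_sign_magnitude(std::uint8_t pattern) {
    int magnitude = pattern & 0x7F;         // low seven bits
    bool negative = (pattern & 0x80) != 0;  // high bit
    return negative ? -magnitude : magnitude;
}

// Encode x (-127 to 127) under the same scheme. Zero is encoded with
// the sign bit clear, although 1000 0000 would decode to zero as well.
std::uint8_t to_sign_magnitude(int x) {
    assert(x >= -127 && x <= 127);
    int magnitude = (x < 0) ? -x : x;
    return static_cast<std::uint8_t>((x < 0 ? 0x80 : 0x00) | magnitude);
}
```

Both `as_sign_magnitude(0x00)` and `as_sign_magnitude(0x80)` return 0, which is the duplicated zero; every other value has exactly one pattern.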
A more important advantage is that addition and subtraction of two’s complement numbers can be carried out in exactly the same way as for unsigned binary numbers. The addition operation can equally be interpreted as acting on signed or unsigned integers.
The 4-digit binary computation \(\mathsf{0011_2 + 1010_2 = 1101_2}\), with a single carry out of the second-lowest bit, can be interpreted in two ways: unsigned \(\mathsf{3 + 10 = 13}\) or two’s complement \(\mathsf{3 + (-6) = -3}\).
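A short sketch makes the double interpretation explicit: one adder, two ways of reading its inputs and output. The helper names `add4` and `signed4` are made up for this example, and 4-bit registers are emulated by masking ordinary integers.

```cpp
#include <cassert>

// Add two 4-bit patterns (values 0 to 15) and keep only the low four
// bits, as a 4-bit adder would.
unsigned add4(unsigned a, unsigned b) {
    assert(a < 16 && b < 16);
    return (a + b) & 0xF;
}

// Read a 4-bit pattern as a two's complement number (-8 to 7).
int signed4(unsigned pattern) {
    assert(pattern < 16);
    int x = static_cast<int>(pattern);
    return (x >= 8) ? x - 16 : x;
}
```

The same call `add4(3, 10)` yields the pattern 13. Read as unsigned, that is 3 + 10 = 13; read through `signed4`, the inputs are 3 and −6 and the result is −3.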
Limitations
Fixed-width binary numbers can only represent a limited range of integers. The result of an operation (such as addition or multiplication) performed on a pair of representable integers may not be representable itself. This condition is called overflow.
The next example provides a simple demonstration. The overflow result can be understood in terms of arithmetic modulo \(\mathsf{2^4 = 16}\).
For unsigned binary, \(\mathsf{9 + 13 = (overflow)\ 6}\), which is \(\mathsf{22\ \text{mod}\ 16}\). For two’s complement, \(\mathsf{(-7) + (-3) = (overflow)\ 6}\), which is \(\mathsf{-10\ \text{mod}\ 16}\).
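The modular picture can be sketched in a few lines. The code below emulates a 4-bit register with ordinary `int` arithmetic; all names (`mod16`, `add_unsigned4`, `add_signed4`) are hypothetical, and the overflow flag is computed by comparing against the exact sum, which is one simple way to detect the condition rather than how hardware does it.

```cpp
#include <cassert>

const int MODULUS = 16;  // 2^4: a 4-bit register holds values modulo 16

// Reduce any integer to its representative in 0..15.
// (The built-in % can return negative results, hence the extra step.)
int mod16(int x) {
    return ((x % MODULUS) + MODULUS) % MODULUS;
}

// Add two unsigned 4-bit numbers (0..15). Returns the 4-bit result and
// reports through `overflow` whether the exact sum was out of range.
int add_unsigned4(int a, int b, bool& overflow) {
    assert(a >= 0 && a <= 15 && b >= 0 && b <= 15);
    int exact = a + b;
    overflow = (exact > 15);
    return mod16(exact);
}

// Add two 4-bit two's complement numbers (-8..7) the same way.
int add_signed4(int a, int b, bool& overflow) {
    assert(a >= -8 && a <= 7 && b >= -8 && b <= 7);
    int exact = a + b;
    overflow = (exact < -8 || exact > 7);
    int pattern = mod16(exact);                      // bits left in the register
    return (pattern >= 8) ? pattern - 16 : pattern;  // reinterpret as signed
}
```

Calling `add_unsigned4(9, 13, flag)` returns 6 with the flag set, and `add_signed4(-7, -3, flag)` likewise returns 6 with the flag set, reproducing both overflow cases in the text.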
C++ integer types
The integer types provided by C++ do not have a guaranteed exact width. Rather, the standard fixes only minimum widths and the relative ordering of their sizes.
```cpp
#include <cassert>
assert( sizeof(char) <= sizeof(short) and
        sizeof(short) <= sizeof(int) and
        sizeof(int) <= sizeof(long) );
```
The sizes, however, are standardized across 32-bit Intel machines.
| type | width |
|---|---|
| char | 8 bits, 1 byte |
| short int | 16 bits, 2 bytes, 1 word |
| int, long int | 32 bits, 4 bytes, 1 double word |
By default, the integer types are signed (the one exception is plain char, whose signedness is left to the implementation). On almost all architectures, they are represented internally using the two’s complement scheme. Unsigned binary versions can be specified by prepending the unsigned keyword: e.g., unsigned char, unsigned short int, unsigned int, unsigned long int.
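To see what a particular compiler and machine actually provide, the widths can be queried directly. The sketch below uses `sizeof` together with the standard macro `CHAR_BIT` (the number of bits in a byte); the helper names `width_in_bits` and `check_guaranteed_widths` are hypothetical. The assertions encode only what the language guarantees, so they should hold on any conforming platform, while the exact widths returned will vary.

```cpp
#include <cassert>
#include <climits>

// Width in bits of an integer type on the machine compiling this code.
template <typename T>
int width_in_bits() {
    return static_cast<int>(sizeof(T)) * CHAR_BIT;
}

// The guarantees the standard does make: minimum widths, and the same
// storage size for the signed and unsigned version of each type.
void check_guaranteed_widths() {
    assert(width_in_bits<char>()  >= 8);
    assert(width_in_bits<short>() >= 16);
    assert(width_in_bits<int>()   >= 16);
    assert(width_in_bits<long>()  >= 32);
    assert(sizeof(unsigned char)  == sizeof(char));
    assert(sizeof(unsigned short) == sizeof(short));
    assert(sizeof(unsigned int)   == sizeof(int));
    assert(sizeof(unsigned long)  == sizeof(long));
}
```

On a typical 32-bit Intel machine, `width_in_bits<int>()` would be expected to return 32; on other platforms the value may differ, which is exactly the portability issue at stake.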
From 2011 on, thanks to the C++11 language update, we can now specify signed and unsigned integer types of fixed bit width.
```cpp
#include <cassert>
#include <cstdint>
assert( sizeof(int8_t) == 1 );
assert( sizeof(int16_t) == 2 );
assert( sizeof(int32_t) == 4 );
assert( sizeof(int64_t) == 8 );
assert( sizeof(uint8_t) == 1 );
assert( sizeof(uint16_t) == 2 );
assert( sizeof(uint32_t) == 4 );
assert( sizeof(uint64_t) == 8 );
```
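With the width fixed, the representable range is fixed too, and it can be read off from `std::numeric_limits`. The following sketch checks a few of these ranges and shows the wrap-around of unsigned arithmetic modulo \(\mathsf{2^8}\); the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// An N-bit two's complement type covers -2^(N-1) to 2^(N-1) - 1;
// an N-bit unsigned type covers 0 to 2^N - 1.
void check_fixed_width_ranges() {
    assert(std::numeric_limits<std::int8_t>::min()   == -128);
    assert(std::numeric_limits<std::int8_t>::max()   == 127);
    assert(std::numeric_limits<std::uint8_t>::min()  == 0);
    assert(std::numeric_limits<std::uint8_t>::max()  == 255);
    assert(std::numeric_limits<std::int32_t>::max()  == 2147483647);
    assert(std::numeric_limits<std::uint32_t>::max() == 4294967295u);
}

// Eight-bit unsigned addition. The sum a + b is computed as an int and
// then reduced modulo 256 by the conversion back to uint8_t.
std::uint8_t add_u8(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(a + b);
}
```

For example, `add_u8(255, 1)` returns 0 and `add_u8(200, 100)` returns 44, since 300 − 256 = 44.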
Python uses a different approach. It supports unlimited-precision integers. This is done in software rather than hardware, so it’s slower; but not having to worry about overflow can be very convenient. Julia also supports arbitrary-precision arithmetic, but through a special BigInt type rather than by default.