Floating Point Representation of Numbers
First of all, we should know what floating point numbers are. As the name suggests, floating point numbers contain a floating decimal point. For example, 6.55, 0.0001, and −2,345.5432 are floating point numbers. Numbers that do not have decimal places are known as integers.
In a computer, two types of arithmetic operations are available. They are integer arithmetic and real arithmetic.
One memory location or word:
| 6 | 7 | 8 | 2 | 4 | 1 |
Table 1: A memory location storing the number 6782.41
To go beyond this fixed layout, a new convention is adopted that aims to preserve the maximum number of significant digits in a real number and to increase the range of values that can be stored. This representation is called the normalized floating-point mode of representing and storing real numbers.
In this mode, a real number is expressed as a combination of a mantissa and an exponent. The mantissa is kept less than 1 and greater than or equal to .1, and the exponent is the power of 10 that multiplies the mantissa.
For example, the number 56.78 × 10^5 is represented in this notation as .5678 E 7, where E 7 stands for 10^7. The mantissa is .5678 and the exponent is 7.
The number is stored in the memory location as:
Shifting the mantissa to the left until its most significant digit is non-zero is called normalization.
For example, the number .0000768 may be stored as .7680 E −4, because the leading zeros serve only to locate the decimal point.
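The normalization rule can be sketched in Python as follows. This is a minimal illustration and not part of the original text: the function name `normalize`, the constant `DIGITS`, and the choice to chop (rather than round) extra digits are assumptions made here to match the four-digit mantissas used in the examples.

```python
from decimal import Decimal

DIGITS = 4  # assumed mantissa length, matching the examples in this post


def normalize(value: str) -> tuple[int, int]:
    """Return (mantissa, exponent) with value ~ 0.<mantissa> x 10**exponent.

    The mantissa is returned as a DIGITS-digit integer, so 5678 stands for .5678.
    Extra digits are chopped, not rounded (an assumption of this sketch).
    """
    sign, digits, exp = Decimal(value).as_tuple()
    if not any(digits):               # zero has no normalized form; use (0, 0)
        return 0, 0
    exponent = len(digits) + exp      # position of the decimal point
    kept = (list(digits) + [0] * DIGITS)[:DIGITS]   # pad or chop to DIGITS
    mantissa = int("".join(str(d) for d in kept))
    return (-mantissa if sign else mantissa), exponent


print(normalize("56.78E5"))      # (5678, 7), i.e. .5678 E 7
print(normalize("0.0000768"))    # (7680, -4), i.e. .7680 E -4
```

Working on the digit tuple of a `Decimal` avoids the binary rounding that would creep in if the input were first converted to a Python `float`.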
The numbers that may be stored range in magnitude from .1000 × 10^−99 to .9999 × 10^99, which is obviously a much larger range than that available earlier in fixed decimal point notation.
2 Arithmetic Operations Using Normalized Floating Point Numbers
2.1 Addition and Subtraction
If two numbers represented in normalized floating point notation are to be added, the exponents of the two numbers must be made equal and the mantissa shifted appropriately. Subtraction is nothing but adding a negative number, so the principles are the same.
Example 2.1. Add the following:
1. .3456 E 7 and .4563 E 7
2. .3456 E 7 and .4563 E 9
3. .3456 E 5 and .4563 E 9
4. .6457 E 5 and .4564 E 5
5. .6457 E 99 and .4564 E 99
Solution:
1. In this problem the exponents are equal, so we add the mantissas as follows:
.3456 E 7
+ .4563 E 7
= .8019 E 7
2. Here the exponents are not equal, so we first make them equal. The operand with the larger exponent is kept as it is, and the operand with the smaller exponent is multiplied and divided by 10^2, since the difference in the exponents is 2. With a four-digit mantissa, .3456 E 7 therefore becomes .0034 E 9.
.4563 E 9
+ .0034 E 9
= .4597 E 9
3. Again the exponents are not equal, so we first make them equal. Applying the same procedure as in the previous example, we multiply and divide by 10^4. All significant digits are shifted out, so .3456 E 5 becomes .0000 E 9.
.4563 E 9
+ .0000 E 9
= .4563 E 9
4. In this example the exponents are equal, so we add the mantissas.
.6457 E 5
+ .4564 E 5
= 1.1021 E 5
The sum of the mantissas, 1.1021, has five digits and is > 1. The mantissa is therefore shifted right one place before it is stored and the exponent is increased by 1, giving .1102 E 6.
5. Here the exponents are again equal, so we add the mantissas and again get 1.1021, which is > 1. As before, we shift the mantissa right one place, but the exponent now becomes 100. The exponent part cannot store more than 99. This is known as an overflow condition, and the arithmetic unit will signal an error condition.
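The addition procedure can be sketched in Python. This is an illustrative sketch under stated assumptions, not a definitive implementation: a number is held as a hypothetical `(mantissa, exponent)` pair in which the mantissa is a four-digit integer (3456 stands for .3456), both operands are non-negative, digits shifted out are chopped, and the names `add`, `DIGITS`, and `MAX_EXP` are introduced here for illustration.

```python
DIGITS = 4     # mantissa length used in the worked examples
MAX_EXP = 99   # largest exponent that fits in the assumed two-digit field


def add(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """Add two non-negative normalized numbers given as (mantissa, exponent)."""
    (ma, ea), (mb, eb) = a, b
    if ea < eb:                      # keep the larger exponent in (ma, ea)
        (ma, ea), (mb, eb) = (mb, eb), (ma, ea)
    mb //= 10 ** (ea - eb)           # shift the smaller operand right, chopping digits
    mantissa, exponent = ma + mb, ea
    if mantissa >= 10 ** DIGITS:     # sum >= 1: shift right once, raise the exponent
        mantissa //= 10
        exponent += 1
    if exponent > MAX_EXP:
        raise OverflowError("exponent exceeds 99")
    return mantissa, exponent


print(add((3456, 7), (4563, 7)))   # (8019, 7)
print(add((3456, 7), (4563, 9)))   # (4597, 9)
print(add((3456, 5), (4563, 9)))   # (4563, 9)
```

Calling `add((6457, 99), (4564, 99))` raises `OverflowError`, which mirrors the overflow condition in the last problem.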
Example 2.2. Subtract the following:
1. .4567 E 7 and .4535 E 7
2. .8967 E −5 and .3456 E −4
3. .4567 E −99 and .4456 E −99
Solution: We apply the same concept that we discussed for addition.
1. In this problem the exponents are equal, so we subtract the mantissas as follows:
.4567 E 7
− .4535 E 7
= .0032 E 7
Normalizing, we can write this as .3200 E 5.
2. In this problem the exponents are not equal, so we have to make them equal as in the previous examples. The operand with the smaller exponent, .8967 E −5, is multiplied and divided by 10^1 to give .0896 E −4, which is then subtracted from .3456 E −4.
.3456 E −4
− .0896 E −4
= .2560 E −4
3. Again the exponents are the same, so we subtract the mantissas as follows:
.4567 E −99
− .4456 E −99
= .0111 E −99
For normalization, the mantissa is shifted to the left and the exponent is reduced by 1. The exponent would thus become −100, which cannot be stored. This is known as the underflow condition, and the arithmetic unit will signal an error condition.
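The subtraction steps, including the left shift for normalization and the underflow check, can be sketched in Python. The sketch assumes a hypothetical `(mantissa, exponent)` pair with a four-digit integer mantissa (4567 stands for .4567), non-negative operands with the first at least as large as the second, and chopping of shifted-out digits. `UnderflowError` is an exception class defined here for illustration; Python has no built-in of that name.

```python
DIGITS = 4      # mantissa length used in the worked examples
MIN_EXP = -99   # smallest exponent that fits in the assumed two-digit field


class UnderflowError(ArithmeticError):
    """Raised when the normalized exponent would drop below MIN_EXP."""


def subtract(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """Compute a - b for normalized (mantissa, exponent) pairs with a >= b >= 0."""
    (ma, ea), (mb, eb) = a, b
    mb //= 10 ** (ea - eb)               # align exponents, chopping digits
    mantissa, exponent = ma - mb, ea
    if mantissa == 0:
        return 0, 0
    while mantissa < 10 ** (DIGITS - 1): # shift left until the first digit is non-zero
        mantissa *= 10
        exponent -= 1
    if exponent < MIN_EXP:
        raise UnderflowError("exponent below -99")
    return mantissa, exponent


print(subtract((4567, 7), (4535, 7)))    # (3200, 5)
print(subtract((3456, -4), (8967, -5)))  # (2560, -4)
```

Calling `subtract((4567, -99), (4456, -99))` raises `UnderflowError`, because normalizing .0111 E −99 would need an exponent of −100.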
2.2 Division
In division, the mantissa of the numerator is divided by that of the denominator, and the denominator exponent is subtracted from the numerator exponent. The quotient mantissa is normalized to make the most significant digit non-zero, and the exponent is adjusted accordingly.
Example 2.3. Divide the following:
1. .8867 E 2 ÷ .1234 E −98
2. .7689 E 5 ÷ .3456 E 56
3. .3452 E 45 ÷ .6754 E 68
Solution:
1. .8867 E 2 ÷ .1234 E −98 = 7.1855 E 100 = .7185 E 101. The result overflows.
2. .7689 E 5 ÷ .3456 E 56 = 2.2248 E −51 = .2224 E −50
3. .3452 E 45 ÷ .6754 E 68 = .5111 E −23
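The division rule and the three solutions above can be checked with a short Python sketch. It assumes a hypothetical `(mantissa, exponent)` pair with a four-digit integer mantissa (8867 stands for .8867), positive normalized operands, and chopping of extra quotient digits. The names `divide`, `DIGITS`, `MAX_EXP`, and `MIN_EXP` are introduced here for illustration.

```python
DIGITS = 4
MAX_EXP, MIN_EXP = 99, -99   # assumed two-digit exponent field


def divide(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """Divide positive normalized numbers given as (mantissa, exponent) pairs."""
    (ma, ea), (mb, eb) = a, b
    # Integer quotient scaled by 10**DIGITS; for normalized inputs the true
    # ratio ma/mb lies between 0.1 and 10, so this has 4 or 5 digits.
    quotient = ma * 10 ** DIGITS // mb
    exponent = ea - eb
    if quotient >= 10 ** DIGITS:   # ratio >= 1: shift right once, raise the exponent
        quotient //= 10
        exponent += 1
    if exponent > MAX_EXP:
        raise OverflowError("exponent exceeds 99")
    if exponent < MIN_EXP:
        raise ArithmeticError("underflow: exponent below -99")
    return quotient, exponent


print(divide((7689, 5), (3456, 56)))    # (2224, -50)
print(divide((3452, 45), (6754, 68)))   # (5111, -23)
```

Calling `divide((8867, 2), (1234, -98))` raises `OverflowError`, since the normalized exponent would be 101.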
2.3 Multiplication
Two numbers are multiplied in the normalized floating-point mode by multiplying the mantissas and adding the exponents. After the multiplication, the resulting mantissa is normalized as in addition or subtraction, and the exponent is adjusted accordingly.
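The multiplication rule can be sketched in Python. This is a minimal sketch under assumptions, not a definitive implementation: numbers are hypothetical `(mantissa, exponent)` pairs with a four-digit integer mantissa (4567 stands for .4567), operands are positive and normalized, and the product is chopped to four digits before it is normalized. Hardware that keeps the double-length product would retain one more digit in some cases.

```python
DIGITS = 4
MAX_EXP, MIN_EXP = 99, -99   # assumed two-digit exponent field


def multiply(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """Multiply positive normalized numbers given as (mantissa, exponent) pairs."""
    (ma, ea), (mb, eb) = a, b
    mantissa = ma * mb // 10 ** DIGITS    # chop the 8-digit product to 4 digits
    exponent = ea + eb
    if mantissa < 10 ** (DIGITS - 1):     # leading zero: shift left once
        mantissa *= 10
        exponent -= 1
    if exponent > MAX_EXP:
        raise OverflowError("exponent exceeds 99")
    if exponent < MIN_EXP:
        raise ArithmeticError("underflow: exponent below -99")
    return mantissa, exponent


print(multiply((4567, 31), (3456, -12)))   # (1578, 19)
```

Because the product of two normalized mantissas is never smaller than .01, at most one left shift is needed.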
Example 2.4. Multiply the following:
1. .4567 E 31 and .3456 E −12
2. .1111 E 67 and .1345 E −87
3. .4563 E 56 and .3452 E 44
4. .5673 E −55 and .1234 E −44
Solution:
1. .4567 E 31 × .3456 E −12 = .1578 E 19
2. .1111 E 67 × .1345 E −87 = .0149 E −20 = .1490 E −21
3. .4563 E 56 × .3452 E 44 = .1575 E 100. The result overflows.
4. .5673 E −55 × .1234 E −44 = .0700 E −99 = .7000 E −100. The result underflows.
A few questions for practice
Example 2.5. Represent 657.9 × 10^67 in normalized floating point mode.
Example 2.6. Subtract the floating point numbers (.9876 E 45) − (.3456 E 47).
Example 2.7. Find the value of (1 + x)^2 where x = 0.4523 E 3.
Example 2.8. Apply all the arithmetic operations on any two normalized floating point numbers.
In computer arithmetic, floating point representation provides the following benefits:
Precision: Floating point numbers keep a fixed number of significant digits across a very wide range of magnitudes. They can handle extremely large and extremely small values, in contrast to fixed-point numbers, which have a limited range. This flexibility is essential for computations in science and engineering.
Dynamic Range: Floating-point encoding offers an expanded dynamic range. In IEEE 754 double precision, for instance, normalized numbers range in magnitude from roughly 10^−308 to 10^308. Applications like financial computations, scientific modeling, and simulations benefit from this range.
Efficient Arithmetic Operations: Addition, subtraction, multiplication, and division on floating-point numbers are carried out efficiently in hardware by floating-point units (FPUs). Scientific simulations, visual rendering, and other computationally demanding applications depend on these operations.
Standardization: The representation and arithmetic operations for floating point numbers are defined by the IEEE 754 standard. This standard allows numerical code to be portable and interoperable across platforms and programming languages.
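The range quoted above can be inspected from Python. This sketch assumes a Python implementation whose `float` is an IEEE 754 double, which is the case for CPython on common platforms. The variable names are chosen here for illustration.

```python
import sys

info = sys.float_info
largest = info.max            # largest finite double, about 1.8e308
smallest_normal = info.min    # smallest positive normalized double, about 2.2e-308

print(largest, smallest_normal)
print(info.dig)               # 15 decimal digits are always preserved
print(largest * 10)           # inf: overflow yields infinity rather than an error
```

Unlike the decimal scheme in the worked examples, where overflow stops the computation with an error, IEEE 754 multiplication overflows to a special infinity value by default.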
Keep in mind that although floating point representation offers many benefits, it also has drawbacks: because precision is finite, values may not be represented exactly, and rounding errors can accumulate during arithmetic operations. Developers must weigh these trade-offs when designing numerical algorithms.
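A well-known instance of such a rounding error can be reproduced in Python, assuming the usual IEEE 754 double-precision `float`. The variable name `total` is chosen here for illustration.

```python
import math

total = 0.1 + 0.2

print(total == 0.3)               # False: neither 0.1 nor 0.2 is exact in binary
print(math.isclose(total, 0.3))   # True: compare with a tolerance instead
```

Comparing with a tolerance, as `math.isclose` does, is one common way to keep such small errors from affecting program logic.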
