# FLOATING POINT ROUNDING ERROR

Let's take a look at the following Java code block.

```java
// Intended to count down from 5.0 to 0.1 in steps of 0.1
for (float a = 5; a != 0.0; a -= 0.1) {
    System.out.println(a);
}
```

If this code block is run, the expected output is the values from 5.0 down to 0.1, and the loop should run 50 times. But the actual output looks like this.

```
5.0
4.9
4.8
4.7000003
4.6000004
4.5000005
4.4000006
4.3000007
4.200001
4.100001
4.000001
3.900001
3.8000011
3.7000012
3.6000013
3.5000014
3.4000015
3.3000016
3.2000017
3.1000018
3.000002
2.900002
2.800002
2.7000022
2.6000023
2.5000024
2.4000025
2.3000026
2.2000027
2.1000028
2.0000029
1.9000028
1.8000028
1.7000028
1.6000028
1.5000027
1.4000027
1.3000027
1.2000027
1.1000026
1.0000026
0.9000026
0.8000026
0.70000255
0.6000025
0.5000025
0.4000025
0.30000252
0.20000252
0.10000252
2.5197864E-6
-0.09999748
-0.19999748
-0.29999748
-0.39999747
-0.49999747
-0.59999746
-0.6999975
-0.7999975
-0.89999753
-0.99999756
-1.0999975
-1.1999975
-1.2999976
-1.3999976
-1.4999976
-1.5999976
-1.6999977
-1.7999977
-1.8999977
-1.9999977
...
```

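As the output shows, the program never reaches exactly 0.0, so the loop condition `a != 0.0` never becomes false and the values keep decreasing. The reason for this is the IEEE 754 floating point representation: 0.1 cannot be stored exactly in binary, so every subtraction carries a small rounding error. This representation is explained in the following section.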
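To try this without hanging the program, the same countdown can be run with a safety cap. This is only a sketch: the class name `FloatLoopDemo`, the method `reachesZero` and the cap `maxSteps` are made up for this illustration and are not part of the original code.

```java
class FloatLoopDemo {

    // Runs the countdown for at most maxSteps iterations and reports
    // whether a ever became exactly 0.0. (maxSteps is a safety cap added
    // for this sketch; the original loop has none.)
    static boolean reachesZero(int maxSteps) {
        float a = 5;
        for (int step = 0; step < maxSteps; step++) {
            if (a == 0.0) {
                return true;
            }
            a -= 0.1; // 0.1 is a double; the result is narrowed back to float
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("Reached exactly 0.0: " + reachesZero(1000));
    }
}
```

The cap only makes the experiment safe to run. It does not fix the comparison itself.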

# IEEE 754 floating point standard

To represent a floating point number, IEEE 754 divides it into three parts.

1. Sign bit : This bit tells whether the number is positive or negative. 0 means positive and 1 means negative.

2. Exponent : In single precision the exponent is stored in 8 bits, which gives 2⁸ = 256 possible values. It is stored with a bias of 127 added to it. If the value is 12.12, which is 1.515 x 2³, the stored exponent is 3 + 127 = 130. (This will be explained in detail in the upcoming sections.)

3. Mantissa : The mantissa holds the digits of the number written in base 2 scientific notation. It contains the bits that come after the binary point. Single precision stores 23 of them.
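The three parts can be read straight out of a float with `Float.floatToIntBits`, which returns the 32 stored bits as an int. The sketch below is one way to split them. The class `FloatFields` and its method names are hypothetical helpers written for this article, not a standard API.

```java
class FloatFields {

    // Returns the raw sign bit (0 or 1) of a float.
    static int signBit(float value) {
        return Float.floatToIntBits(value) >>> 31;
    }

    // Returns the 8-bit exponent field exactly as stored (bias of 127 included).
    static int biasedExponent(float value) {
        return (Float.floatToIntBits(value) >>> 23) & 0xFF;
    }

    // Returns the 23 stored mantissa bits (the leading 1 is not stored).
    static int mantissaBits(float value) {
        return Float.floatToIntBits(value) & 0x7FFFFF;
    }

    // Formats the three fields as "sign | exponent | mantissa" in binary.
    static String layout(float value) {
        String exponent = String.format("%8s",
                Integer.toBinaryString(biasedExponent(value))).replace(' ', '0');
        String mantissa = String.format("%23s",
                Integer.toBinaryString(mantissaBits(value))).replace(' ', '0');
        return signBit(value) + " | " + exponent + " | " + mantissa;
    }

    public static void main(String[] args) {
        float sample = 12.12f; // the example value used in the list above
        System.out.println(layout(sample));
        System.out.println("stored exponent = " + biasedExponent(sample));
    }
}
```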

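IEEE 754 defines more than one format for floating point numbers. The two most common are single precision (32 bits: 1 sign bit, 8 exponent bits, 23 mantissa bits) and double precision (64 bits: 1 sign bit, 11 exponent bits, 52 mantissa bits).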

Now let's take the example number 12.7 and convert it into the IEEE 754 single precision format.

First we will convert 12 into binary.

`12 -> 1100`

Now let's take the .7 and convert it into binary.

`0.7 -> .10110011001100110011....`

After the leading 1, the group **0110** repeats forever, so for the moment let's stop at this point.

Now the number 12.7 can be written like this.

`12.7 -> 1100.10110011001100110011....`

In scientific notation the binary point is moved so that it sits right after the leftmost 1 bit.

`12.7 -> 1.100101100110011001100110011.... x 2³`

To get the biased exponent we add the 3 from 2³ to 127 (the single precision bias), which gives us the value 130. 130 in binary is this.

`10000010`

Next is the mantissa. In single precision the mantissa is stored in 23 bits. We can leave out the 1 that comes before the binary point, because in binary scientific notation the point always comes right after a 1.

`10010110011001100110011`

This is a positive number so the sign bit value will be 0. Now with all this information the IEEE 754 representation of 12.7 is this.

`0 | 10000010 | 10010110011001100110011`

The endless expansion had to be cut to fit into 23 bits, and this is where the rounding happens. When the mantissa has more than 23 bits, the computer looks at the bits from the 24th onward and rounds to the nearest representable value. If the discarded part is less than half of the last kept bit, nothing is added. If it is more than half, 1 is added to the 23rd bit (an exact tie goes to the neighbour whose last bit is 0). Here the discarded bits are `0011....`, which is less than half, so the 23 bits are kept as they are. Either way, the stored value is no longer exactly 12.7.

Let's convert this value back to decimal so we can see how the value has changed.

`1.10010110011001100110011 x 2³`

is the value we have to convert. (The leading 1 that was left out during the conversion has been put back.)

`1100.10110011001100110011`

This is the value after moving the binary point 3 places to the right. Now if we convert (2³x1+2²x1+2⁻¹x1+2⁻³x1+2⁻⁴x1+2⁻⁷x1+2⁻⁸x1+2⁻¹¹x1+2⁻¹²x1+2⁻¹⁵x1+2⁻¹⁶x1+2⁻¹⁹x1+2⁻²⁰x1) the value would be **12.69999980926513671875**, slightly less than 12.7.
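Converting a bit pattern back to decimal by hand is easy to get wrong, so it can help to let the JVM do it. The sketch below uses only standard library calls: `Float.intBitsToFloat` rebuilds a float from the three fields, and `new BigDecimal(double)` gives its exact decimal value. The class `FloatFromFields` and its method names are hypothetical and exist only for this illustration.

```java
import java.math.BigDecimal;

class FloatFromFields {

    // Packs sign (1 bit), biased exponent (8 bits) and mantissa (23 bits)
    // into one 32-bit pattern and reinterprets it as a float.
    static float assemble(int sign, int biasedExponent, int mantissa) {
        int bits = (sign << 31) | (biasedExponent << 23) | mantissa;
        return Float.intBitsToFloat(bits);
    }

    // Every float converts to a double without loss, and BigDecimal(double)
    // keeps that binary value exactly, so this is the exact stored value.
    static String exactDecimal(float value) {
        return new BigDecimal(value).toPlainString();
    }

    public static void main(String[] args) {
        // Split 12.7f into its fields, put them back together, and print
        // the exact decimal value that the float really holds.
        int bits = Float.floatToIntBits(12.7f);
        float rebuilt = assemble(bits >>> 31, (bits >>> 23) & 0xFF, bits & 0x7FFFFF);
        System.out.println(exactDecimal(rebuilt));
    }
}
```

Passing a hand-derived sign, exponent and mantissa to `assemble` is a quick way to check a manual conversion.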

This is also the reason why we get those values in the first coding example.

To overcome this problem, programmers can use the `BigDecimal` class in the `java.math` package.

# BigDecimal

**BigDecimal** has multiple constructors. An int value, a String value or a double value can be passed as the parameter to create a BigDecimal object.

`BigDecimal(double val)`

`BigDecimal(int val)`

`BigDecimal(String val)`

Now let's say you create the BigDecimal object by passing a double value, and then subtract 0.2 from 5.

```java
BigDecimal a = new BigDecimal(5);
BigDecimal b = new BigDecimal(0.2); // double constructor: keeps the exact binary value of 0.2
a = a.subtract(b);
System.out.println(a.toString());
```

This is the output that you will be getting.

`4.799999999999999988897769753748434595763683319091796875`

The reason for this is that the double 0.2 is itself only an approximation of 0.2, and the `BigDecimal(double)` constructor copies that approximation exactly, digit for digit. By passing the **MathContext** parameter we can specify the precision (the number of significant digits) that we need to round to, and also the rounding mode we need to use (in this instance the rounding mode is not specified, so the default **HALF_UP** is used).

```java
BigDecimal a = new BigDecimal(5);
BigDecimal b = new BigDecimal(0.2, new MathContext(1)); // round to 1 significant digit
a = a.subtract(b);
System.out.println(a.toString());
```

With this the output will be `4.8`.

Note : `MathContext` is needed only on object b because 5 is an integer and is already stored exactly. The `MathContext` is applied once, when b is constructed, and rounds the long approximation of 0.2 to exactly 0.2. A BigDecimal object does not carry a `MathContext` around with it, and `subtract` called without one returns the exact result, which here is 4.8.

If passing in a `MathContext` seems like too much work, passing the value as a String will simply solve the issue: `BigDecimal b = new BigDecimal("0.2");`. The reason for this is that the String constructor reads the decimal digits exactly as written, so the value is exactly 0.2. The scale (the number of digits after the decimal point) is taken from the string. In this instance it will be 1.
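The different ways of creating 0.2 can be put side by side. This is a sketch: the class `BigDecimalConstruction` and its method names are made up for the comparison, and `BigDecimal.valueOf(double)` is an extra standard library method that the article does not otherwise cover.

```java
import java.math.BigDecimal;
import java.math.MathContext;

class BigDecimalConstruction {

    // Exact binary value of the double 0.2, many digits long.
    static BigDecimal fromDouble() {
        return new BigDecimal(0.2);
    }

    // The same double, rounded to 1 significant digit at construction time.
    static BigDecimal fromDoubleRounded() {
        return new BigDecimal(0.2, new MathContext(1));
    }

    // Decimal digits taken literally from the text.
    static BigDecimal fromString() {
        return new BigDecimal("0.2");
    }

    // valueOf goes through Double.toString, which yields "0.2" for this value.
    static BigDecimal fromValueOf() {
        return BigDecimal.valueOf(0.2);
    }

    public static void main(String[] args) {
        BigDecimal five = new BigDecimal(5);
        System.out.println("double:         " + five.subtract(fromDouble()));
        System.out.println("double rounded: " + five.subtract(fromDoubleRounded()));
        System.out.println("String:         " + five.subtract(fromString()));
        System.out.println("valueOf:        " + five.subtract(fromValueOf()));
    }
}
```

Only the plain double constructor carries the binary approximation into the result. The other three start from exactly 0.2.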

As seen from the above example, `BigDecimal` has other arithmetic operations. The following methods can be used for arithmetic calculations (consider `bigDecimal` as the object that was created). Each one returns a new `BigDecimal` and leaves the original object unchanged.

```java
bigDecimal.add(BigDecimal obj)       // adds the two values
bigDecimal.subtract(BigDecimal obj)  // subtracts the obj value from the bigDecimal value
bigDecimal.multiply(BigDecimal obj)  // multiplies the two values
bigDecimal.divide(BigDecimal obj)    // divides the bigDecimal value by the obj value
```
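One thing to watch with `divide` is that the one-argument form throws an `ArithmeticException` when the exact result has an endless decimal expansion, such as 1 divided by 3. A minimal sketch of the four operations and of a rounded division follows. The class `BigDecimalArithmetic`, its helper names and the chosen scale of 10 are assumptions made for this example.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

class BigDecimalArithmetic {

    // Divides and rounds the quotient to the given number of decimal places.
    static BigDecimal divideRounded(BigDecimal dividend, BigDecimal divisor, int scale) {
        return dividend.divide(divisor, scale, RoundingMode.HALF_UP);
    }

    // Reports whether the exact, unrounded division throws. That happens for
    // an endless decimal expansion and also for division by zero.
    static boolean exactDivisionFails(BigDecimal dividend, BigDecimal divisor) {
        try {
            dividend.divide(divisor);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        BigDecimal a = new BigDecimal("5");
        BigDecimal b = new BigDecimal("0.2");

        // Each call returns a new object; a and b themselves never change.
        System.out.println(a.add(b));
        System.out.println(a.subtract(b));
        System.out.println(a.multiply(b));
        System.out.println(a.divide(b));

        BigDecimal one = new BigDecimal("1");
        BigDecimal three = new BigDecimal("3");
        System.out.println("exact 1/3 fails: " + exactDivisionFails(one, three));
        System.out.println(divideRounded(one, three, 10));
    }
}
```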

The `BigDecimal` class also has methods to compare two `BigDecimal` values. The two methods below can be used for comparisons.

```java
BigDecimal a = new BigDecimal("5");
BigDecimal b = new BigDecimal("0.2");

boolean val = a.equals(b);   // compares value and scale
int res = a.compareTo(b);    // compares value only
```

The `equals()` method returns the boolean value true only if the two objects have the same value and the same scale (so 2.0 and 2.00 are not equal), and false otherwise. The `compareTo()` method returns an int value of 1 if the calling object (a in this instance) is bigger, -1 if the calling object is smaller and 0 if both objects have equal values, regardless of scale. For the example shown above, `val = false` and `res = 1`.
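The difference between the two comparison methods shows up as soon as the same number is written with a different number of decimal places. A small sketch, with the class `BigDecimalComparison` and its method names invented for this illustration:

```java
import java.math.BigDecimal;

class BigDecimalComparison {

    // equals() is true only when both the value and the scale match.
    static boolean sameValueAndScale(BigDecimal x, BigDecimal y) {
        return x.equals(y);
    }

    // compareTo() ignores the scale and looks at the numeric value only.
    static boolean sameValue(BigDecimal x, BigDecimal y) {
        return x.compareTo(y) == 0;
    }

    public static void main(String[] args) {
        BigDecimal x = new BigDecimal("2.0");
        BigDecimal y = new BigDecimal("2.00");
        System.out.println("equals:    " + sameValueAndScale(x, y));
        System.out.println("compareTo: " + sameValue(x, y));
    }
}
```

For checks such as "has the counter reached zero", `compareTo` is usually the safer choice because it does not depend on how the values were written.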

More details about the **BigDecimal** class can be found in the official Java API documentation.

Now before I end this article, let's rewrite the original problem using the `BigDecimal` class and get the correct output.

```java
BigDecimal a = new BigDecimal("5");
BigDecimal b = new BigDecimal("0.1");
BigDecimal c = new BigDecimal("0.0");

// subtract() returns a new object, so the result is assigned back to a
for (; a.equals(c) != true; a = a.subtract(b)) {
    System.out.println(a);
}
System.out.println("Loop Successfully Finished");
```

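Note: Instead of the `equals()` check, `a.compareTo(c) != 0` can also be used. It is the safer choice, because `equals()` also compares the scale: the loop above ends only because `a` reaches 0.0 with the same scale as `c`. And remember, as we are passing String values, it is not necessary to specify the precision using `MathContext`.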

The output will be this.

```
5
4.9
4.8
4.7
4.6
4.5
4.4
4.3
4.2
4.1
4.0
3.9
3.8
3.7
3.6
3.5
3.4
3.3
3.2
3.1
3.0
2.9
2.8
2.7
2.6
2.5
2.4
2.3
2.2
2.1
2.0
1.9
1.8
1.7
1.6
1.5
1.4
1.3
1.2
1.1
1.0
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
```

This time we get the expected output, and the loop stops after 0.1. (The first line is 5 rather than 5.0 because `new BigDecimal("5")` is created with a scale of 0.)

# When to use BigDecimal

Now that you know about **BigDecimal**, is it necessary to use it for every floating point operation? The answer is no. BigDecimal should be used for critical operations where exact decimal values matter. Advanced physics calculations and accurate account details are a couple of examples where **BigDecimal** fits. For other ordinary operations that do not require exact values, float or double can be used.
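To make the accounting case concrete, here is a small sketch that adds 0.1 ten times, once with double and once with BigDecimal. The class `AccountTotals` and its method names are hypothetical and chosen only for this example.

```java
import java.math.BigDecimal;

class AccountTotals {

    // Adds 0.1 to a double total count times; rounding error builds up.
    static double sumWithDouble(int count) {
        double total = 0.0;
        for (int i = 0; i < count; i++) {
            total += 0.1;
        }
        return total;
    }

    // Adds exactly 0.1 to a BigDecimal total count times; no error builds up.
    static BigDecimal sumWithBigDecimal(int count) {
        BigDecimal total = BigDecimal.ZERO;
        BigDecimal step = new BigDecimal("0.1");
        for (int i = 0; i < count; i++) {
            total = total.add(step);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("double:     " + sumWithDouble(10));
        System.out.println("BigDecimal: " + sumWithBigDecimal(10));
    }
}
```

The double total comes out just under 1.0, while the BigDecimal total is exactly 1.0. In a ledger that kind of gap adds up over many entries.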

The videos uploaded by Krishantha Dinesh were a huge help in creating this article.

These are the other references that are used to write this blog.