Floating Point library in m68k Assembler on Amiga
09 August 2018
Part 1 - some theory
Someone told me lately: “If you haven’t developed a floating-point library, go home and do it. It’s a nice weekend project.”
I followed this advice.
I must say, it took longer than a weekend. :) But it was a great experience to see how those numbers are generated and handled, and how they 'jitter' at the last bits of precision.
The Amiga offers a number of programming languages, including C/C++ and more high level languages like Pascal or Oberon, and some Basic dialects like AMOS, BlitzBasic and others.
(I’m posting the full assembler source code at the end of the post.
As the first part of this blog I’d like to write a little about the theory of floating-point numbers.
One of the floating point standards is IEEE 754.
The sign is pretty clear, it says whether the number is positive or negative.
The 8 bit exponent basically encodes the ‘floating-point’ shift value to the left and right.
The 23 bit mantissa combines the integer part of the floating-point number and the fraction part.
The integer part in the mantissa can go through a ‘normalisation’ process, which means that the first ‘1’ in a binary form of the number matters. And everything before that is ignored, considering the number is in a 32 bit register.
Let’s take the number 12.45.
That is how it would be stored in the mantissa.
There is more to it, read upon it here if you want: https:/en.wikipedia.org/wiki/IEEE_754
Part 2 - the implementation - dec2bin (decimal to binary)
We make a few slight simplifications to the IEEE 754 standard so that this implementation is not fully compliant.
Now, how does it work in practice to get a decimal number into the computer as IEEE 754 representation.
Say, the number is:
Converting the integer part into binary form is pretty trivial. We just copy the value
As next step we have to calculate the bit length of that number because it is later stored in the exponent.
Here is the assembler code for that:
; d0 copied to d6 ; if int_part (d6) = 0 then no need to do anything cmpi.l #0,d6 beq .loop_count_int_bits_end ; now shift left until we find the first 1 ; counter in d2 .loop_count_int_bits btst.l #$1f,d6 ; bit 32 set? bne.s .loop_count_int_bits_done addq #1,d2 ; inc counter lsl.l #1,d6 bra .loop_count_int_bits .loop_count_int_bits_done move.l #32,d3 sub.l d2,d3 ; 32 - 1. bit of int move.l d3,d2 .loop_count_int_bits_end
In register d2 is the result, the bit length of the integer part.
The fraction part is a little more tricky. Bringing it into a binary form requires some thought.
I found that an algorithm that translates the fraction into binary form depends on the number of digits.
This loop can be repeated until there are no more bits in the fraction part. Or, the loop only repeats for the number of „free“ fraction bits left in the mantissa.
The threshold value, 5000 here, depends on the number of digits of the fraction part.
Here is the code to convert the fraction into binary value:
; now prepare fraction in d1 .prepare_fract_bits ; the algorithm is to: ; check if d1 > 5000 (4 digits) ; if yes -> mark '1' and substract 5000 ; if no -> mark '0' ; shift left (times 2) ; repeat until no more available bits in mantisse, which here is d3 move.l #5000,d4 ; threshold .loop_fract_bits subi.l #1,d3 ; d3 is position of the bit that represents 5000 clr.l d6 cmp.l d4,d1 blt .fract_under_threshold sub.l d4,d1 bset d3,d6 .fract_under_threshold or.l d6,d7 lsl.l #1,d1 ; d1 * 2 cmpi.l #0,d3 ; are we done? bgt .loop_fract_bits .prepare_fract_bits_end
The above code positions the fraction bit directly into the output register d7. And only so many bits are generated as there is space available in the mantissa.
Now we have the mantissa complete.
What’s missing is the exponent.
; at this point we have the mantissa complete ; d0 still holds the source integer part ; d2 still holds the exp. data ; (int part size, which is 0 for d0 = 0 because we don't hide the 'hidden bit') ; d7 is the result register ; all other registers may be used freely ; if d0 = 0 goto end cmpi.l #0,d0 beq .prepare_exp_bits_end .prepare_exp_bits ; Excess = 127 move.l #127,d0 ; we don't need d0 any longer add.l d2,d0 ; size of int part on top of excess move.l #23,d3 lsl.l d3,d0 ; shift into right position or.l d0,d7 .prepare_exp_bits_end
Notice, there is a special case. If the integer part is 0, delivered in d0, then we’ll make the exponent 0, too.
That’s basically it for the decimal to binary operation.
Test code for that is straight forward.
; dec2bin test code move.l #12,d0 ; integer part => 1010 move.l #4500,d1 ; fract part ; subroutine expects d0, d1 to be filled ; result: the IEEE 754 number is in d7 bsr dec2bin move.l #%01000001111000111001100110011001,d3 ; this what we expect cmp.l d3,d7 beq assert_pass move.l #1,d3 bra assert_end assert_pass move.l #0,d3 assert_end illegal ;include ; include "dec2bin.i"
The test code compares the subroutine output with a manually setup binary number that we expect.
Part 3 - the implementation - bin2dec (binary to decimal)
We want to convert back from the binary float number to the decimal representation with the integer part (with sign) and the fraction part in separate output registers.
In register d0 we expect the floating point number as input.
Let’s start extracting the exponent, because we need to get the integer part bit length that is encoded there.
We’ll make a copy of the input register where we operate on, because we mask out everything but the exponent bits.
.extract_exponent move.l d0,d1 andi.l #$7f800000,d1 ; mask out all but exp move.l #23,d2 lsr.l d2,d1 ; right align ; if int part = 0 cmpi.w #0,d1 beq .extract_sign subi.w #127,d1 ; d1 is now the size of int part
As next step we’ll extract the integer part bits.
.extract_mantisse_int move.l d0,d2 ; copy andi.l #$007fffff,d2 ; mask out all but mantisse move.l #23,d3 sub.l d1,d3 ; what we figured out above (int part size) lsr.l d3,d2 ; right align move.l d2,d6 ; result ; d6 now contains the int part
We also have to extract the sign bit and merge it with the integer part in register d6.
As next important and more tricky step is converting back the fraction part of the mantissa into a decimal representation.
First we have to extract the mantissa bits again, similarly as we did in the last step.
What do the ‘1’ bits in the fraction mantissa represent?
I.e.: assuming those bits:
Now, if each ‘1’ represents 5000 we have the following:
Here is the code:
clr.l d7 ; prepare output clr.l d1 ; used for division remainder move.l #1,d4 ; divisor (1, 2, 4, 8, ... ; equivalent to 2^-1, 2^-2, 2^-4, ...) .loop_fract subi.l #1,d2 ; d2 current bit to test for '1' lsl.l #1,d4 ; divisor - multiply by 2 on each loop cmpi.w #0,d4 ; loop end? if 0 we shifted out of the word boundary beq .loop_fract_end btst.l d2,d3 ; if set we have to devide beq .loop_fract ; no need to devide if 0 move.l #5000,d5 ; we devide 5000 add.l d1,d5 ; add remainder from previous calculation divu.w d4,d5 ; divide clr.l d6 ; clear for quotient add.w d5,d6 ; copy lower 16 bit of the division result (the quotient) lsl.l #1,d6 ; *2 add.l d6,d7 ; accumulate the quotient and.l #$ffff0000,d5 ; the new remainder move.l #16,d1 ; number of bits to shift remainder word lsr.l d1,d5 ; shift move.l d5,d1 ; copy new remainder bra .loop_fract .loop_fract_end
If we look at the
Let’s add a test case.
; test code for dec2bin2dec ; move.l #12345,d0 ; integer part => 1010 move.l #5001,d1 ; fract part ; subroutine expects d0, d1 to be filled ; result: the IEEE 754 number is in d7 bsr dec2bin move.l d7,d0 ; input for the back conversion bsr bin2dec cmpi.l #12345,d6 bne error cmpi.l #5001,d7 bne error moveq #0,d0 ;success illegal error moveq #1,d0 ;error illegal include "dec2bin.i" include "bin2dec.i"
Since we have now both operations, we can use dec2bin and bin2dec in combination.
We provide input for dec2bin, then let the result run through bin2dec and compare original input to the output.
I must say that there is indeed a precision loss. The last (fourth) digit can be off up-to 5, so we have a precision loss of up-to 5 thousandth.
That can clearly be improved. But for this little project this result is acceptable.
In the next „parts“ I’d like to implement operations for addition, subtraction, division and multiplication.
Here are the sources: m68k-fp-lib on GitHub