ejolson
Posts: 10233
Joined: Tue Mar 18, 2014 11:47 am

MicroPython double precision

Mon Sep 13, 2021 2:33 pm

Does anyone know if there is an official MicroPython for the Pico available that has been built with support for double-precision floating-point arithmetic?

The one I downloaded from the Getting Started page defaults to single precision.

From what I understand this is a compile-time setting, so rebuilding the UF2 is an option. Are there instructions for building a double-precision version for the Pico? Has anyone tried or done this?

hippy
Posts: 12718
Joined: Fri Sep 09, 2011 10:34 pm
Location: UK

Re: MicroPython double precision

Mon Sep 13, 2021 3:53 pm

Funnily enough I had noticed your comment about MicroPython only being single precision in the BBC Basic thread. I looked into that yesterday and found it's a simple fix -

mpconfigport.h - around line 65 ...

Code: Select all

#define MICROPY_FLOAT_IMPL                      (MICROPY_FLOAT_IMPL_FLOAT)
Needs to change to ...

Code: Select all

#define MICROPY_FLOAT_IMPL                      (MICROPY_FLOAT_IMPL_DOUBLE)
To build as a cloned port ...

Code: Select all

cd ~/pico/micropython/ports
cp -r rp2 rp2double
cd rp2double
nano mpconfigport.h           # As above
rm -r -f build
mkdir build
cd build
cmake ..
make
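A quick sanity check once the rebuilt UF2 is flashed (just a sketch, not part of the build steps): print a non-terminating fraction at the REPL and count the digits - as the outputs further down show, a float build keeps about 7 significant digits and a double build about 16.

Code: Select all

# Sketch: check which float width a MicroPython build uses.
x = 1 / 3
print(x)   # roughly 0.3333333 on a single-precision build,
           # roughly 0.3333333333333333 on a double-precision build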
My version of the kahan test mentioned -

Code: Select all

import time

#        123456789-12345
flt = 14.392726722865724
print("")
print("Floating point format test")
print("")
print(flt)
print("")
print("14.4               -> {:0.1f}".format(flt))
print("14.39              -> {:0.2f}".format(flt))
print("14.393             -> {:0.3f}".format(flt))
print("14.3927            -> {:0.4f}".format(flt))
print("14.39273           -> {:0.5f}".format(flt))
print("14.392727          -> {:0.6f}".format(flt))
print("14.3927267         -> {:0.7f}".format(flt))
print("14.39272672        -> {:0.8f}".format(flt))
print("14.392726723       -> {:0.9f}".format(flt))
print("14.3927267229      -> {:0.10f}".format(flt))
print("14.39272672287     -> {:0.11f}".format(flt))
print("14.392726722866    -> {:0.12f}".format(flt))
print("14.3927267228657   -> {:0.13f}".format(flt))
print("14.39272672286572  -> {:0.14f}".format(flt))
print("14.392726722865724 -> {:0.15f}".format(flt))

# https://www.raspberrypi.org/forums/viewtopic.php?f=144&t=316761&start=625
#
# 14.392726722865724
# Total elapsed time 8.883 seconds

print("")
print("Floating point kahan test ... please wait - takes about 80 seconds")
print("")
tstart = time.ticks_us()
n = 1_000_000
s = 0   # running sum
c = 0   # Kahan compensation (the rounding error carried forward)
for k in range(1, n+1):
    v = 1 / k + c            # next term plus the carried-over error
    snext = s + v
    c = v - (snext - s)      # recover the low-order bits lost in the addition
    s = snext
tstop = time.ticks_us()
print(s)
print("14.392726722865724 -> {:0.15f}".format(s))
print("Total elapsed time {} seconds".format((tstop-tstart)/1_000_000))
On stock firmware (float) ...

Code: Select all

Floating point format test

14.39273

14.4               -> 14.4
14.39              -> 14.39
14.393             -> 14.393
14.3927            -> 14.3927
14.39273           -> 14.39273
14.392727          -> 14.392733          <-- Diverges here (float)
14.3927267         -> 14.3927326
14.39272672        -> 14.39273262
14.392726723       -> 14.392732620
14.3927267229      -> 14.3927326202
14.39272672287     -> 14.39273262024
14.392726722866    -> 14.392732620239
14.3927267228657   -> 14.3927326202393
14.39272672286572  -> 14.39273262023926
14.392726722865724 -> 14.392732620239258

Floating point kahan test ... please wait - takes about 80 seconds

14.39273
14.392726722865724 -> 14.392727851867676
Total elapsed time 72.93056 seconds
Using firmware built for "double" ...

Code: Select all

Floating point format test

14.39272672286573

14.4               -> 14.4
14.39              -> 14.39
14.393             -> 14.393
14.3927            -> 14.3927
14.39273           -> 14.39273
14.392727          -> 14.392727
14.3927267         -> 14.3927267
14.39272672        -> 14.39272672
14.392726723       -> 14.392726723
14.3927267229      -> 14.3927267229
14.39272672287     -> 14.39272672287
14.392726722866    -> 14.392726722866
14.3927267228657   -> 14.3927267228657
14.39272672286572  -> 14.39272672286573  <-- Diverges here (double)
14.392726722865724 -> 14.392726722865725

Floating point kahan test ... please wait - takes about 80 seconds

14.39272672286573
14.392726722865724 -> 14.392726722865725
Total elapsed time 76.64488 seconds
The time penalty for going to double doesn't seem that bad here.

ejolson
Posts: 10233
Joined: Tue Mar 18, 2014 11:47 am

Re: MicroPython double precision

Mon Sep 13, 2021 4:22 pm

hippy wrote:
Mon Sep 13, 2021 3:53 pm
Funnily enough I had noticed your comment about MicroPython only being single precision in the BBC Basic thread. I looked into that yesterday and found it's a simple fix -
[...]
The time penalty for going to double doesn't seem that bad here.
Thanks for running that test!

I find it interesting that the speed penalty for double precision is dwarfed by other inefficiencies in the interpreter. From what I understand single precision is not accurate enough for banking, science, engineering or mathematics and not fast enough for machine learning.

I wonder why single precision was chosen as the default.
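A rough way to gauge how much of the kahan-test time is interpreter overhead rather than arithmetic (a sketch only - not measured on a Pico here, so the numbers are illustrative) is to time the same MicroPython loop with and without the floating-point work:

Code: Select all

# Sketch: estimate interpreter overhead vs floating-point cost.
import time

n = 1_000_000

t0 = time.ticks_us()
s = 0
for k in range(1, n + 1):
    s += k                                  # integer-only body
t_int = time.ticks_diff(time.ticks_us(), t0)

t0 = time.ticks_us()
s = 0.0
for k in range(1, n + 1):
    s += 1 / k                              # floating-point body
t_flt = time.ticks_diff(time.ticks_us(), t0)

print("integer loop:", t_int / 1_000_000, "s")
print("float loop  :", t_flt / 1_000_000, "s")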

hippy
Posts: 12718
Joined: Fri Sep 09, 2011 10:34 pm
Location: UK

Re: MicroPython double precision

Mon Sep 13, 2021 6:14 pm

ejolson wrote:
Mon Sep 13, 2021 4:22 pm
I find it interesting that the speed penalty for double precision is dwarfed by other inefficiencies in the interpreter. From what I understand single precision is not accurate enough for banking, science, engineering or mathematics and not fast enough for machine learning.

I wonder why single precision was chosen as the default.
The main reason is probably that it's done that way for the other ports, so that's how it got implemented for the RP2040.

I would also guess that most applications run on a microcontroller don't require double precision, so it makes sense to default to single precision.

jahboater
Posts: 8361
Joined: Wed Feb 04, 2015 6:38 pm
Location: Wonderful West Dorset

Re: MicroPython double precision

Mon Sep 13, 2021 6:36 pm

ejolson wrote:
Mon Sep 13, 2021 4:22 pm
I find it interesting that the speed penalty for double precision is dwarfed by other inefficiencies in the interpreter. From what I understand single precision is not accurate enough for banking, science, engineering or mathematics and not fast enough for machine learning.

I wonder why single precision was chosen as the default.
For the C language, speed is critical and yet double precision is the default (and it has always been so).

I believe single precision is used to save memory when allocating large arrays; it is rarely used to improve speed.
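For what it's worth, the size difference is easy to see from Python's array module (a desktop illustration, not from the post above):

Code: Select all

# Each single-precision element is half the size of a double.
from array import array

print(array('f').itemsize)   # 4 bytes per 32-bit float
print(array('d').itemsize)   # 8 bytes per 64-bit double
# so a million-element buffer costs roughly 4 MB as floats, 8 MB as doubles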

scruss
Posts: 5150
Joined: Sat Jun 09, 2012 12:25 pm
Location: Toronto, ON

Re: MicroPython double precision

Mon Sep 13, 2021 6:51 pm

jahboater wrote:
Mon Sep 13, 2021 6:36 pm
I believe single precision is used to save memory when allocating large arrays, it is rarely used to improve speed.
With regular computers, that's definitely true. With microcontrollers, very few of them have double-precision hardware floating point and you are working with constrained memory. The Cortex-M7 in the Teensy 4 is the only one I know of that has hardware double-precision floating-point support built in. The Cortex-M4, as seen in many other MicroPython boards, has 32-bit floating point in hardware. The Cortex-M0+ in the Pico has no floating-point support beyond the software library shipped in the ROM.
‘Remember the Golden Rule of Selling: “Do not resort to violence.”’ — McGlashan.
Pronouns: he/him

hippy
Posts: 12718
Joined: Fri Sep 09, 2011 10:34 pm
Location: UK

Re: MicroPython double precision

Mon Sep 13, 2021 6:59 pm

jahboater wrote:
Mon Sep 13, 2021 6:36 pm
For the C language, speed is critical and yet double precision is the default (and it has always been so).
As an amateur C programmer you might have to explain that to me. I thought one always got 32-bit single precision if one specified "float", 64-bit for "double", and I am guessing 128-bit for "long double". I am not seeing where any "default" comes into it.

trejan
Posts: 5137
Joined: Tue Jul 02, 2019 2:28 pm

Re: MicroPython double precision

Mon Sep 13, 2021 7:06 pm

ejolson wrote:
Mon Sep 13, 2021 4:22 pm
From what I understand single precision is not accurate enough for banking, science, engineering or mathematics and not fast enough for machine learning.
IEEE 754 floating point is a trap for the unwary due to its limitations on representing numbers. If you need high precision then you use a bignum library, not floats or doubles.

Guess the output of this short program.

Code: Select all

#include <stdio.h>

int main()
{
  double total = 0.0;

  for (int i = 0; i < 10000000; i++) {
    total += 0.2;
  }

  printf("%f\n", total);
}

scruss
Posts: 5150
Joined: Sat Jun 09, 2012 12:25 pm
Location: Toronto, ON

Re: MicroPython double precision

Mon Sep 13, 2021 8:08 pm

trejan wrote:
Mon Sep 13, 2021 7:06 pm
If you need high precision then you use a bignum library not floats or doubles.
... and yet BASIC on the TI-99/4A managed it without bignums:

Code: Select all

>list
10 T=0
20 FOR I=1 TO 10000000
30 T=T+0.2
40 NEXT I
50 PRINT T
60 END
>run
 2000000 

** DONE **
‘Remember the Golden Rule of Selling: “Do not resort to violence.”’ — McGlashan.
Pronouns: he/him

trejan
Posts: 5137
Joined: Tue Jul 02, 2019 2:28 pm

Re: MicroPython double precision

Mon Sep 13, 2021 8:24 pm

scruss wrote:
Mon Sep 13, 2021 8:08 pm
trejan wrote:
Mon Sep 13, 2021 7:06 pm
If you need high precision then you use a bignum library not floats or doubles.
... and yet BASIC on the TI-99/4A managed it without bignums:
It uses a different floating point number format. TI BASIC can accurately store 0.2 unlike IEEE 754. It isn't as efficient if you're implementing it in hardware though. IEEE 754 is basically Intel's design.
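The difference is easy to reproduce with Python's decimal module (an illustration run under CPython, not something from the posts above): a decimal type accumulates 0.2 exactly, while binary doubles drift.

Code: Select all

# Binary vs decimal accumulation of 0.2 (illustrative sketch).
from decimal import Decimal

b = 0.0                 # IEEE 754 binary64
d = Decimal(0)          # decimal floating point
step = Decimal("0.2")   # 0.2 is exact in decimal
for _ in range(10_000_000):
    b += 0.2
    d += step

print(b)   # close to, but not exactly, 2000000
print(d)   # exactly 2000000.0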

ejolson
Posts: 10233
Joined: Tue Mar 18, 2014 11:47 am

Re: MicroPython double precision

Mon Sep 13, 2021 8:44 pm

scruss wrote:
Mon Sep 13, 2021 8:08 pm
trejan wrote:
Mon Sep 13, 2021 7:06 pm
If you need high precision then you use a bignum library not floats or doubles.
... and yet BASIC on the TI-99/4A managed it without bignums:

Code: Select all

>list
10 T=0
20 FOR I=1 TO 10000000
30 T=T+0.2
40 NEXT I
50 PRINT T
60 END
>run
 2000000 

** DONE **
How long does it take a TI-99/4A to add all that up?

jahboater
Posts: 8361
Joined: Wed Feb 04, 2015 6:38 pm
Location: Wonderful West Dorset

Re: MicroPython double precision

Mon Sep 13, 2021 10:28 pm

Hippy,
hippy wrote:
Mon Sep 13, 2021 6:59 pm
As an amateur C programmer you might have to explain that to me. I thought one always got 32-bit single precision if one specified "float", 64-bit for "double", and I am guessing 128-bit for "long double". I am not seeing where any "default" comes into it.
The literal 3.14 is a double. You have to add a suffix F to make it a float, or L to make it a long double.

The library functions, for example sin(), default to double; that is, they take a double argument and return a double.
You have to use sinf() for floats and sinl() for long double.

So the familiar sin(3.14) gets a double argument as expected and returns a double.
For float that would be sinf(3.14f), not so nice.

And so on.

If "double" doesnt read well, I just do

typedef double real;

and all is well.
Last edited by jahboater on Mon Sep 13, 2021 11:11 pm, edited 1 time in total.

Paeryn
Posts: 3519
Joined: Wed Nov 23, 2011 1:10 am
Location: Sheffield, England

Re: MicroPython double precision

Mon Sep 13, 2021 10:33 pm

<edit: oh, jahboater replied whilst I was typing away...>
hippy wrote:
Mon Sep 13, 2021 6:59 pm
jahboater wrote:
Mon Sep 13, 2021 6:36 pm
For the C language, speed is critical and yet double precision is the default (and it has always been so).
As an amateur C programmer you might have to explain that to me. I thought one always got 32-bit single precision if one specified "float", 64-bit for "double", and I am guessing 128-bit for "long double". I am not seeing where any "default" comes into it.
I presume jahboater is referring to the fact that a floating point literal without a size suffix is a double. That means that in the following function

Code: Select all

float scale( float x )
{
  return x * 1.1;
}
the compiler will promote x to a double, do the multiplication and then demote the result to a float. To avoid the automatic promotion / demotion you have to tell the compiler that 1.1 is only a float e.g.

Code: Select all

float scale( float x )
{
  return x * 1.1f;
}
Also a lot of people probably just use the maths functions like sin() even when passing a float and assigning the returned value to a float (sin() takes and returns double; the single-precision version is sinf()).

And long double isn't necessarily a 128-bit float; it only has to be at least as long as a double (64-bit), but it could be Intel's extended format (80-bit). GCC has __float128 for quad-precision floats where available, and the suffix for such literals is q.
Last edited by Paeryn on Mon Sep 13, 2021 10:36 pm, edited 1 time in total.
She who travels light — forgot something.
Please note that my name doesn't start with the @ character so can people please stop writing it as if it does!

jahboater
Posts: 8361
Joined: Wed Feb 04, 2015 6:38 pm
Location: Wonderful West Dorset

Re: MicroPython double precision

Mon Sep 13, 2021 10:35 pm

trejan wrote:
Mon Sep 13, 2021 8:24 pm
It uses a different floating point number format. TI BASIC can accurately store 0.2 unlike IEEE 754. It isn't as efficient if you're implementing it in hardware though. IEEE 754 is basically Intel's design.
Since IEEE 754-2008, 0.2 can be stored exactly.
That is, IEEE 754 supports decimal as well as binary floating point.

You can use that now in C as _Decimal32, _Decimal64, and _Decimal128.
Last edited by jahboater on Tue Sep 14, 2021 8:42 am, edited 1 time in total.

jahboater
Posts: 8361
Joined: Wed Feb 04, 2015 6:38 pm
Location: Wonderful West Dorset

Re: MicroPython double precision

Mon Sep 13, 2021 10:37 pm

Paeryn wrote:
Mon Sep 13, 2021 10:33 pm
And long double isn't necessarily a 128-bit float, it only has to be at least as long as a double (64-bit) but it could be Intel's extended (80-bit). GCC has __float128 for quad precision floats where available and the suffix for such literals is q.
On the Pi, that's one advantage of 64-bit mode.
Long double is 128 bits for aarch64, and only 64 bits for ARM32.

trejan
Posts: 5137
Joined: Tue Jul 02, 2019 2:28 pm

Re: MicroPython double precision

Mon Sep 13, 2021 11:18 pm

jahboater wrote:
Mon Sep 13, 2021 10:35 pm
Since IEEE 754-2008, 0.2 can be stored, and 0.3 + 0.3 + 0.3 == 1.0
That is IEEE 754 supports decimal, as well as binary, floating-point.

You can use that now in C as _Decimal32, _Decimal64, and _Decimal128.
Yep, but DFP is optional, so not everything supports it. The FPUs in the various Pi SoCs don't. The stock GCC used for the Pico SDK doesn't support it either.

scruss
Posts: 5150
Joined: Sat Jun 09, 2012 12:25 pm
Location: Toronto, ON

Re: MicroPython double precision

Tue Sep 14, 2021 12:34 am

trejan wrote:
Mon Sep 13, 2021 8:24 pm
It uses a different floating point number format.
I know. And if you thought that DFP was slow, the bigfloats you recommended are positively glacial. Your test code was also the "never do this" example that you'd get in any numerical analysis intro.
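For the record, the textbook alternatives to that "never do this" loop look like this (a quick CPython sketch, not part of the original posts):

Code: Select all

# Naive repeated addition accumulates rounding error;
# math.fsum() or a single multiplication rounds only once.
import math

n = 10_000_000
naive = 0.0
for _ in range(n):
    naive += 0.2

print(naive)                  # drifts slightly away from 2000000
print(math.fsum([0.2] * n))   # correctly rounded sum, 2000000.0 here
print(0.2 * n)                # a single rounding, also 2000000.0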
ejolson wrote:
Mon Sep 13, 2021 8:44 pm
How long does it take a TI-99/4A to add all that up?
I really don't want to know. I used abbeyj/ti99basic: TI-99/4A BASIC as a scripting language, which claims to be a portable emulator of the VM that TI BASIC ran in.

An MSX 1, which also uses decimal floating point, would take over 13 hours to calculate that sum.
‘Remember the Golden Rule of Selling: “Do not resort to violence.”’ — McGlashan.
Pronouns: he/him

jahboater
Posts: 8361
Joined: Wed Feb 04, 2015 6:38 pm
Location: Wonderful West Dorset

Re: MicroPython double precision

Tue Sep 14, 2021 1:23 am

scruss wrote:
Tue Sep 14, 2021 12:34 am
trejan wrote:
Mon Sep 13, 2021 8:24 pm
It uses a different floating point number format.
I know. And if you thought that DFP was slow, the bigfloats you recommended are positively glacial. Your test code was the "never do this" that you'd get in any numerical analysis intro, too.
ejolson wrote:
Mon Sep 13, 2021 8:44 pm
How long does it take a TI-99/4A to add all that up?
I really don't want to know. I used abbeyj/ti99basic: TI-99/4A BASIC as a scripting language, which claims to be a portable emulator of the VM that TI BASIC ran in.

An MSX 1, which also uses decimal floating point, would take over 13 hours to calculate that sum.
I'll stick with binary; it's close enough for most things, except currency perhaps.
The example takes 16 milliseconds on a Pi4.

ejolson
Posts: 10233
Joined: Tue Mar 18, 2014 11:47 am

Re: MicroPython double precision

Tue Sep 14, 2021 1:27 am

scruss wrote:
Tue Sep 14, 2021 12:34 am
trejan wrote:
Mon Sep 13, 2021 8:24 pm
It uses a different floating point number format.
I know. And if you thought that DFP was slow, the bigfloats you recommended are positively glacial. Your test code was the "never do this" that you'd get in any numerical analysis intro, too.
ejolson wrote:
Mon Sep 13, 2021 8:44 pm
How long does it take a TI-99/4A to add all that up?
I really don't want to know. I used abbeyj/ti99basic: TI-99/4A BASIC as a scripting language, which claims to be a portable emulator of the VM that TI BASIC ran in.

An MSX 1, which also uses decimal floating point, would take over 13 hours to calculate that sum.
I thought it was maybe an emulator.

It seems a million terms is too few for C on a modern x86 processor. I adapted the C code from

viewtopic.php?p=1912396#p1912396

to create a Linux test for both single and double precision.

On a Ryzen 1700 the result was

Code: Select all

$ gcc -O2 -Wall -o kahan kahan.c
$ ./kahan 
n=1000000000

21.300481502347942
Elapsed double time 4.163 seconds

21.300481796264648
Elapsed float time 4.611 seconds
This implies double precision is about 10 percent faster. There was an even greater advantage for the PDP-11 on which the C programming language was originally developed; however, I think the Living Computers Museum

https://www.livingcomputers.org/

replaced misspiggy with an emulator so it's difficult to check.

For reference the code is

Code: Select all

#include <stdio.h>
#include <stdarg.h>

#ifdef PICO
#define NMAX 1000000
#include <pico/stdlib.h>
#include <pico/time.h>
#include "tusb.h"
unsigned int ticks_ms(){
    absolute_time_t t=get_absolute_time();
    return to_ms_since_boot(t);
}
#else
#define NMAX 1000000000
#include <time.h>
#include <unistd.h>
#include <stdint.h>
unsigned int ticks_ms(){
    static struct timespec s,t;
    clock_gettime(CLOCK_MONOTONIC_RAW,&t);
    if(!s.tv_sec) s=t;
    return (t.tv_sec-s.tv_sec)*1000+(t.tv_nsec-s.tv_nsec)/1000000;
}
int sleep_ms(uint32_t usec){
    return usleep((useconds_t)usec*1000);
}
#endif

double dosum(){
    unsigned int n=NMAX;
    double s=0;
    double c=0;
    for(unsigned int k=1;k<=n;k++){
        double v=1.0/k+c;
        double snext=s+v;
        c=v-(snext-s);
        s=snext;
    }
    return s;
}

float dosumf(){
    unsigned int n=NMAX;
    float s=0;
    float c=0;
    for(unsigned int k=1;k<=n;k++){
        float v=1.0f/k+c;
        float snext=s+v;
        c=v-(snext-s);
        s=snext;
    }
    return s;
}

extern int dogprintf(const char *restrict fmt,...) 
    __attribute__((format(printf,1,2)));
int dogprintf(const char *restrict fmt,...){
    va_list ap;
    va_start(ap,fmt);
    int r=vprintf(fmt,ap);
    fflush(stdout);
    return r;
}
#define printf dogprintf

int main(){
#ifdef PICO
    stdio_init_all();
    printf("Waiting for usb host");
    while(!tud_cdc_connected()){
      printf(".");
      sleep_ms(500);
    }
    printf("\n");
#endif
    unsigned int tstart,tstop;
    double s;
    printf("n=%u\n",NMAX);

    printf("\n");
    tstart=ticks_ms();
    s=dosum();
    tstop=ticks_ms();
    printf("%.15f\n",s);
    printf("Elapsed double time %g seconds\n",(tstop-tstart)/1000.0);

    printf("\n");
    tstart=ticks_ms();
    s=dosumf();
    tstop=ticks_ms();
    printf("%.15f\n",s);
    printf("Elapsed float time %g seconds\n",(tstop-tstart)/1000.0);

    sleep_ms(500);
    return 0;
}
I think -DPICO allows the code to still compile for the Pico, but I didn't check what the performance differences were for that platform.
Last edited by ejolson on Tue Sep 14, 2021 1:51 am, edited 1 time in total.

jahboater
Posts: 8361
Joined: Wed Feb 04, 2015 6:38 pm
Location: Wonderful West Dorset

Re: MicroPython double precision

Tue Sep 14, 2021 1:43 am

ejolson wrote:
Tue Sep 14, 2021 1:27 am
This implies double precision is about 10 percent faster. There was an even greater advantage for the PDP-11 on which the C programming language was originally developed; however, I think the Living Computers Museum
That's interesting.
Double is slightly slower on the Pi4:

Code: Select all

$ gcc -s -O2 try.c -o try
$ ./try
n=1000000000

21.300481502347942
Elapsed double time 7.626 seconds

21.300481796264648
Elapsed float time 7.149 seconds

ejolson
Posts: 10233
Joined: Tue Mar 18, 2014 11:47 am

Re: MicroPython double precision

Tue Sep 14, 2021 2:07 am

jahboater wrote:
Tue Sep 14, 2021 1:43 am
ejolson wrote:
Tue Sep 14, 2021 1:27 am
This implies double precision is about 10 percent faster. There was an even greater advantage for the PDP-11 on which the C programming language was originally developed; however, I think the Living Computers Museum
That's interesting.
Double is slightly slower on the Pi4:

Code: Select all

$ gcc -s -O2 try.c -o try
$ ./try
n=1000000000

21.300481502347942
Elapsed double time 7.626 seconds

21.300481796264648
Elapsed float time 7.149 seconds
Did you set the CPU governor to performance? If not, there may be a slight performance hit on the first timing as the CPU ramps up its clock speed.

On 32-bit Raspberry Pi OS using gcc 8.3.0 on the Pi 4B I get

Code: Select all

$ ./kahan 
n=1000000000

21.300481502347942
Elapsed double time 10.692 seconds

21.300481796264648
Elapsed float time 9.35 seconds
which is slower than your times. Is your Pi overclocked?

I tried a few other x86 machines ranging from a 486DX2 to more recent XEON and EPYC servers. In each case the single and double precision speeds were within about 10 percent of each other.

I may have misspoken about the PDP-11. Although the hardware only supported double precision, it's possible the compiler did not bother to maintain single precision accuracy at intermediate points in a computation. In that case execution speed would be essentially identical using either float or double.

Trying to time anything on the emulator generates random numbers for me. If anyone has Unix running on vintage PDP-11 hardware, here is a version of kahan.c that will compile using the Kernighan and Ritchie C compiler.

Code: Select all

#include <stdio.h>
#include <sys/types.h>
#include <sys/timeb.h>

#define NMAX 1000000l
static double toc_start;
double toc(){
    struct timeb ts;
    ftime(&ts);
    return ts.time+ts.millitm/1000.0-toc_start;
}
double tic(){
    toc_start+=toc();
    return toc_start;
}

double dosum(){
    double s=0;
    double c=0;
    long k;
    for(k=1;k<=NMAX;k++){
        double v=1.0/k+c;
        double snext=s+v;
        c=v-(snext-s);
        s=snext;
    }
    return s;
}

float dosumf(){
    float s=0;
    float c=0;
    long k;
    for(k=1;k<=NMAX;k++){
        float v=1.0/k+c;
        float snext=s+v;
        c=v-(snext-s);
        s=snext;
    }
    return s;
}

int main(){
    double s,t;
    printf("n=%lu\n",NMAX);

    printf("\n");
    tic();
    s=dosum();
    t=toc();
    printf("%.15f\n",s);
    printf("Elapsed double time %g seconds\n",t);

    printf("\n");
    tic();
    s=dosumf();
    t=toc();
    printf("%.15f\n",s);
    printf("Elapsed float time %g seconds\n",t);

    return 0;
}
Please post any results here for comparison.

Note that double precision on graphics cards can be as much as 32 times slower than single precision. What I find surprising is how close the double and single precision versions of MicroPython are, even though the Pico does everything related to floating point in software. Can that really be the case?
Last edited by ejolson on Tue Sep 14, 2021 3:59 am, edited 2 times in total.

ejolson
Posts: 10233
Joined: Tue Mar 18, 2014 11:47 am

Re: MicroPython double precision

Tue Sep 14, 2021 3:50 am

ejolson wrote:
Tue Sep 14, 2021 1:27 am
This implies double precision is about 10 percent faster.
Here is an interesting data point. If I accidentally leave the suffix off the 1.0f constant in the single-precision dosumf routine I get

Code: Select all

$ ./kahan ; # Ryzen 1700
n=1000000000

21.300481502347942
Elapsed double time 4.191 seconds

21.300481796264648
Elapsed float time 6.394 seconds
which shows single precision can turn out to be much slower than double precision if one uses the wrong type of floating point constants.

jahboater
Posts: 8361
Joined: Wed Feb 04, 2015 6:38 pm
Location: Wonderful West Dorset

Re: MicroPython double precision

Tue Sep 14, 2021 5:22 am

ejolson wrote:
Tue Sep 14, 2021 2:07 am
Did you set the CPU governor to performance? If not, there may be a slight performance hit on the first timing as the CPU ramps up its clock speed.
I changed to "performance" with similar results.
ejolson wrote:
Tue Sep 14, 2021 2:07 am
On 32-bit Raspberry Pi OS using gcc 8.3.0 on the Pi 4B I get

Code: Select all

$ ./kahan 
n=1000000000

21.300481502347942
Elapsed double time 10.692 seconds

21.300481796264648
Elapsed float time 9.35 seconds
which is slower than your times. Is your Pi over clocked?
This Pi runs Raspberry Pi OS 64-bit, GCC 11.2, CPU clock 2.1GHz.
Dropping the overclock gives:

Code: Select all

$ ./try
n=1000000000

21.300481502347942
Elapsed double time 10.68 seconds

21.300481796264648
Elapsed float time 10.015 seconds
which are close enough to your figures.

jahboater
Posts: 8361
Joined: Wed Feb 04, 2015 6:38 pm
Location: Wonderful West Dorset

Re: MicroPython double precision

Tue Sep 14, 2021 5:43 am

ejolson wrote:
Tue Sep 14, 2021 3:50 am
which shows single precision can turn out to be much slower than double precision if one uses the wrong type of floating point constants.
Yes, it is doing extra single/double conversions. Here is the loop; the conversions are the fcvt and scvtf instructions:

Code: Select all

.L10:
// try.c:48: float v=1.0/k+c;
scvtf d1, w0 // tmp103, k
// try.c:48: float v=1.0/k+c;
fcvt d2, s2 // tmp106, c
// try.c:47: for(unsigned int k=1;k<=n;k++){
add w0, w0, 1 // k, k,
fmov s3, s0 // s, <retval>
// try.c:47: for(unsigned int k=1;k<=n;k++){
cmp w0, w1 // k, tmp109
// try.c:48: float v=1.0/k+c;
fdiv d1, d4, d1 // tmp104, tmp105, tmp103
// try.c:48: float v=1.0/k+c;
fadd d1, d1, d2 // tmp107, tmp104, tmp106
// try.c:48: float v=1.0/k+c;
fcvt s1, d1 // v, tmp107
// try.c:49: float snext=s+v;
fadd s0, s0, s1 // <retval>, <retval>, v
// try.c:50: c=v-(snext-s);
fsub s2, s0, s3 // tmp108, <retval>, s
// try.c:50: c=v-(snext-s);
fsub s2, s1, s2 // c, v, tmp108
// try.c:47: for(unsigned int k=1;k<=n;k++){
bne .L10 //,
The 64-bit FDIV is probably slower too.

fanoush
Posts: 976
Joined: Mon Feb 27, 2012 2:37 pm

Re: MicroPython double precision

Tue Sep 14, 2021 12:48 pm

Paeryn wrote:
Mon Sep 13, 2021 10:33 pm
I presume jahboater is referring to the fact that a floating point literal without a size suffix is a double. That means that in the following function

Code: Select all

float scale( float x )
{
  return x * 1.1;
}
the compiler will promote x to a double, do the multiplication and then demote the result to a float. To avoid the automatic promotion / demotion you have to tell the compiler that 1.1 is only a float e.g.

Code: Select all

float scale( float x )
{
  return x * 1.1f;
}
There are also GCC options ...

Code: Select all

-fsingle-precision-constant -Wdouble-promotion -Wfloat-conversion 
that make unsuffixed constants single precision, so the 'f' is not required everywhere, and also warn if any promotion to double like you described is happening.
Useful with a Cortex-M4F to stay in hardware 32-bit floats as much as possible.
