HermannSW
Posts: 5171
Joined: Fri Jul 22, 2016 9:09 pm
Location: Eberbach, Germany

[SOLVED] Pi 3B+ single core performance better than Pi 3B?

Thu Mar 22, 2018 6:02 pm

Long ago I started a (single-core) performance comparison of high-frequency Arduinos.
Then ESP8266 and ESP32 numbers were added, and a 2.8GHz Intel number.
Later I added Pi numbers as well, updated 2 days ago with the Pi 3B number:
https://forum.arduino.cc/index.php?topi ... msg3413818
Image

So basically the Due is the most performant Arduino, "only" 83 times slower than Intel.
ESPs are better than all Arduinos, and Pis are even better.
And until today everything made sense: Pi 3B is 1200/900 = 4/3 times faster than Pi 2B (34μs versus 45μs).

Today I compiled q32.c with -O3 on my new Pi 3B+ and only got the same number as for the Pi 3B.
I would have expected it to be 1400/1200 = 7/6 times faster than the Pi 3B.
The Pi 3B+ is real: see the 191Mbit/s over LAN in (A) below, although my laptop gets 386Mbit/s with the same speedtest-cli.

The code is just an exhaustive search for the minimal 3x3 magic square consisting of distinct primes.
You can download it here, or see (B) below.
https://stamm-wilbrandt.de/en/forum/q32.c

I would have expected a search time below 30μs at least, after forcing the CPU frequency to 1.4GHz and running as root ...

What am I missing here? Why does the 3B+ show the same single-core integer performance as the 3B?

Code: Select all

pi@raspberrypi3Bplus:~ $ sudo su
root@raspberrypi3Bplus:/home/pi# echo 1400000 >  /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq 
root@raspberrypi3Bplus:/home/pi# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq 
1400000
root@raspberrypi3Bplus:/home/pi# ./q32
 47| 29|101|
113| 59|  5|
 17| 89| 71|

34us
root@raspberrypi3Bplus:/home/pi# 

(A)

Code: Select all

pi@raspberrypi3Bplus:~ $ speedtest-cli 
Retrieving speedtest.net configuration...
Testing from Kabel BW (46.223.20.147)...
Retrieving speedtest.net server list...
Selecting best server based on ping...
Hosted by PfalzKom (Ludwigshafen) [11.73 km]: 24.753 ms
Testing download speed................................................................................
Download: 191.04 Mbit/s
Testing upload speed....................................................................................................
Upload: 19.59 Mbit/s
pi@raspberrypi3Bplus:~ $ 

(B)

Code: Select all

/* determine minimal prime 3x3 magic square; for more details see bottom */

#include <stdio.h>
#include <sys/time.h>
#include <stdint.h>

uint32_t B[]={0x35145105,0x4510414,0x11411040,0x45144001};

#define Prime(i) ((B[(i)>>5] & (0x80000000UL >> ((i)%32))) != 0)

#define forall_odd_primes_less_than(p, m, block) \
  for((p)=3; (p)<(m); (p)+=2)                    \
    if (Prime((p)))                              \
      block

uint8_t p,a,b,c,d;
struct timeval tv0,tv1;

int main(void)
{
  gettimeofday(&tv1, NULL);      // wait for usec change
  do  gettimeofday(&tv0, NULL);  while (tv0.tv_usec == tv1.tv_usec);

  forall_odd_primes_less_than(p, 64,
    forall_odd_primes_less_than(a, p,
      if Prime(2*p-a)
      {
        forall_odd_primes_less_than(b, p,
          if ( (b!=a) && Prime(2*p-b) )
          {
            c= 3*p - (a+b);

            if ( (c<2*p) && (2*p-c!=a) && (2*p-c!=b) && Prime(c) && Prime(2*p-c) )
            {
              if (2*a+b>2*p)
              {
                d = 2*a + b - 2*p;   // 3*p - (3*p-(a+b)) - (2*p-a)

                if ( (d!=a) && (d!=b) && (d!=2*p-c) && Prime(d) && Prime(2*p-d) )
                {
                  gettimeofday(&tv1, NULL);

                  printf("%3u|%3u|%3u|\n%3u|%3u|%3u|\n%3u|%3u|%3u|\n",
                    a,b,c,2*p-d,p,d,2*p-c,2*p-b,2*p-a);

                  printf("\n%ldus\n",  /* cast: time_t width differs between platforms */
                    (long)(1000000*(tv1.tv_sec-tv0.tv_sec)+tv1.tv_usec-tv0.tv_usec));
                  return 0;
                }
              }
            }
          }
        )
      }
    )
  )
}

/*

up to rotation and reflection, this pattern always exists (= is p, -/+ is less/greater than p)
--?
?=?
???

proof by enumeration of all possibilities

++
 =
 --

+-  +-+ +-+
 =   =  +=-
 +- -+- -+-

        +-+
        -=
        -+-

    +--
     =
     +-

-+  -+  -++
 =  +=- +=-
 -+  -+ --+

        -+-
        +=-
         -+

    -+
    -=
     -+

--
 =
 ++



row/column/diagonal sum is 3*p

a b 3*p-(a+b)=c   - - +

  p 2*a+b-2*p=d   + = -

    2*p-a         - + +
*/
Last edited by HermannSW on Fri Mar 23, 2018 1:39 am, edited 1 time in total.
https://hermann-sw.github.io/planar_graph_playground
https://stamm-wilbrandt.de/en#raspcatbt
https://github.com/Hermann-SW/memrun
https://github.com/Hermann-SW/Raspberry_v1_camera_global_external_shutter
https://stamm-wilbrandt.de/en/Raspberry_camera.html

el_grappaduro
Posts: 14
Joined: Thu Mar 22, 2018 7:06 pm

Re: Pi 3B+ single core performance better than Pi 3B?

Thu Mar 22, 2018 7:10 pm

This is Pi 2 running at 600 MHz:

Code: Select all

root@raspberrypi:~# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq 
600000
root@raspberrypi:~# vcgencmd measure_clock arm
frequency(45)=600000000
root@raspberrypi:~# ./g32 
 47| 29|101|
113| 59|  5|
 17| 89| 71|

45us
This is the same Pi 2 running at 900 MHz:

Code: Select all

root@raspberrypi:~# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq 
900000
root@raspberrypi:~# vcgencmd measure_clock arm
frequency(45)=900000000
root@raspberrypi:~# ./g32 
 47| 29|101|
113| 59|  5|
 17| 89| 71|

29us
It seems you're running all the time at 600 MHz. Have you ever checked real clockspeeds using 'vcgencmd measure_clock arm'?

ejolson
Posts: 9482
Joined: Tue Mar 18, 2014 11:47 am

Re: Pi 3B+ single core performance better than Pi 3B?

Thu Mar 22, 2018 8:43 pm

HermannSW wrote:
Thu Mar 22, 2018 6:02 pm
Today I compiled q32.c with -O3 on my new Pi 3B+ and only get the same number as for Pi 3B.
Maybe both Pi computers are running in low power 600MHz mode because of a substandard power supply.

HermannSW

Re: Pi 3B+ single core performance better than Pi 3B?

Fri Mar 23, 2018 12:17 am

el_grappaduro wrote:
Thu Mar 22, 2018 7:10 pm
It seems you're running all the time at 600 MHz. Have you ever checked real clockspeeds using 'vcgencmd measure_clock arm'?
I don't know why you get these numbers, but that really seems to be the problem.

I just measured the frequency as you suggested, and it is indeed 1.4GHz.
But the time is worse than your Pi 2 number?!?!?

Code: Select all

pi@raspberrypi3Bplus:~ $ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
1400000
pi@raspberrypi3Bplus:~ $ vcgencmd measure_clock arm
frequency(45)=1400146000
pi@raspberrypi3Bplus:~ $ ./q32
 47| 29|101|
113| 59|  5|
 17| 89| 71|

34us
pi@raspberrypi3Bplus:~ $ whoami
pi
pi@raspberrypi3Bplus:~ $ 
ejolson wrote:
Thu Mar 22, 2018 8:43 pm
Maybe both Pi computers are running in low power 600MHz mode because of a substandard power supply.
In that case the measurement would not show 1.4GHz, right?

HermannSW

Re: Pi 3B+ single core performance better than Pi 3B?

Fri Mar 23, 2018 12:22 am

Aaah, it seems to be just a Linux timing issue for such a small amount of time: once, the Pi 3B+ showed 15μs!
But running it many more times only shows 34μs or 35μs.

I will try 1,000,000 loops, as in this blog post, to get away from these small time values:
https://www.ibm.com/developerworks/comm ... erformance

Code: Select all

pi@raspberrypi3Bplus:~ $ ./q32
 47| 29|101|
113| 59|  5|
 17| 89| 71|

15us
pi@raspberrypi3Bplus:~ $ ./q32
 47| 29|101|
113| 59|  5|
 17| 89| 71|

35us
pi@raspberrypi3Bplus:~ $ 

HermannSW

Re: Pi 3B+ single core performance better than Pi 3B?

Fri Mar 23, 2018 12:45 am

1M loops take 40s (40μs per loop), which is even worse than before.

This is the changed code:
https://stamm-wilbrandt.de/en/forum/q32.1M.c

Code: Select all

pi@raspberrypi3Bplus:~ $ diff q32.c q32.1M.c
17a18
> unsigned i, N=1000000;
23a25
> for(i=1; i<=N; ++i)
40a43,44
> if (i==N)
> {
48a53
> }
pi@raspberrypi3Bplus:~ $ 

But measuring the frequency in a second ssh session in parallel shows that it really runs at only 600MHz, although the minimum scaling frequency is 1400000?!?!?!

Code: Select all

pi@raspberrypi3Bplus:~ $ vcgencmd measure_clock arm
frequency(45)=600000000
pi@raspberrypi3Bplus:~ $ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
1400000
pi@raspberrypi3Bplus:~ $ 

So what is the correct method to force the CPU to 1.4GHz?


P.S:
The numbers make sense: 45μs at 600MHz and 29μs at 900MHz on the Pi 2, as measured by @el_grappaduro, correlate with the 15μs I measured once for the Pi 3B+:

Code: Select all

bc -ql
45/(900/600)
30.00000000000000000000
45/(1400/600)
19.28571428571428571431


P.P.S:
I found another USB hub that explicitly states it can do DC 5V and 2.5A, and another power adapter that can do 2.5A.
With that I am able to boot the Pi 3B+ successfully.
But again the current scaling frequency drops to 600MHz, although the minimum scaling frequency is set to 1.4GHz.
It seems I still have a power supply problem; it is better than before (see this thread), but still limits CPU power:
viewtopic.php?f=28&t=208773

HermannSW

[SOLVED] Re: Pi 3B+ single core performance better than Pi 3B?

Fri Mar 23, 2018 1:28 am

OK, I disconnected the 5V step-up converter from one of my robots and used it with a 600mAh 25C LiPo to power the Pi 3B+ (see image below).
Now all is fine: the 25C rating guarantees that the Pi 3B+ gets whatever it needs (25C can deliver 5A at 5V).
And now EVERY single run shows 14μs!

Code: Select all

root@raspberrypi3Bplus:/home/pi# vcgencmd measure_clock arm
frequency(45)=1400146000
root@raspberrypi3Bplus:/home/pi# ./q32
 47| 29|101|
113| 59|  5|
 17| 89| 71|

14us
root@raspberrypi3Bplus:/home/pi# ./q32
 47| 29|101|
113| 59|  5|
 17| 89| 71|

14us
root@raspberrypi3Bplus:/home/pi# ./q32
 47| 29|101|
113| 59|  5|
 17| 89| 71|

14us
root@raspberrypi3Bplus:/home/pi# ./q32
 47| 29|101|
113| 59|  5|
 17| 89| 71|

14us
root@raspberrypi3Bplus:/home/pi# ./q32
 47| 29|101|
113| 59|  5|
 17| 89| 71|

14us
root@raspberrypi3Bplus:/home/pi# ./q32
 47| 29|101|
113| 59|  5|
 17| 89| 71|

14us
root@raspberrypi3Bplus:/home/pi# ./q32
 47| 29|101|
113| 59|  5|
 17| 89| 71|

14us
root@raspberrypi3Bplus:/home/pi# 

Interestingly, 1M loops take slightly longer per loop (about 17.7μs):

Code: Select all

root@raspberrypi3Bplus:/home/pi# ./q32.1M
 47| 29|101|
113| 59|  5|
 17| 89| 71|

17674786us
root@raspberrypi3Bplus:/home/pi#

The 2.8GHz Intel CPU shows a better time per loop with 1M loops (2.5μs) than with a single run (5μs):

Code: Select all

$ ./q32
 47| 29|101|
113| 59|  5|
 17| 89| 71|

5us
$ ./q32.1M
 47| 29|101|
113| 59|  5|
 17| 89| 71|

2514477us
$ 

Image


P.S:
I just removed the 5V step-up converter and directly connected the 25C LiPo, charged to 4.06V, to the Pi 3B+.
The Pi works, but again the current frequency drops to 600MHz.
So powering the Pi 3B+ with 5V and 2.5A is essential to get high CPU frequencies working.

el_grappaduro

Re: Pi 3B+ single core performance better than Pi 3B?

Fri Mar 23, 2018 8:18 am

HermannSW wrote:
Fri Mar 23, 2018 12:17 am
ejolson wrote:
Thu Mar 22, 2018 8:43 pm
Maybe both Pi computers are running in low power 600MHz mode because of a substandard power supply.
In that case measurement would not show 1.4GHZ, right?
Linux always reports wrong clock speeds; only 'vcgencmd measure_clock arm' shows the actual value. When you checked, your Pi was idle (no performance needed), so it ran at 1400 MHz. When you run something more demanding that needs performance, it drops to 600 MHz. You get the high clock speed only when you don't need it :lol:

I was not aware of this until recently: viewtopic.php?f=63&t=208057&p=1287591#p1287370

This other vcgencmd command is interesting since it shows whether the problem has occurred since the last boot. Then you know you have to invest in a better power supply (I have 3 on the way, since all 3 of my Pis are affected).

jahboater
Posts: 8104
Joined: Wed Feb 04, 2015 6:38 pm
Location: Wonderful West Dorset

Re: [SOLVED] Pi 3B+ single core performance better than Pi 3B?

Fri Mar 23, 2018 8:31 am

Instead of the legacy gettimeofday() you might like to try the POSIX clock_gettime(CLOCK_MONOTONIC, ...).
gettimeofday() reports wall-clock time (like CLOCK_REALTIME), while CLOCK_MONOTONIC reports a high-resolution, monotonically increasing counter. CLOCK_MONOTONIC_RAW is the same but without any interference from NTP.
clock_gettime() reports the time in nanoseconds:
struct timespec {
    time_t tv_sec;   /* seconds */
    long   tv_nsec;  /* nanoseconds */
};
so to return the time in nanoseconds as a 64-bit unsigned number:
return (uint64_t)now.tv_sec * 1000000000U + (uint64_t)now.tv_nsec;
Alternatively, your trick of waiting for the clock value to change will still work.
There are several other useful clocks available, including the CPU-time clock CLOCK_PROCESS_CPUTIME_ID.

el_grappaduro

Re: Pi 3B+ single core performance better than Pi 3B?

Fri Mar 23, 2018 8:56 am

HermannSW wrote:
Fri Mar 23, 2018 12:45 am
But running frequency measurement in 2nd ssh session in parallel shows that it really runs only in 600MHz although min scaling frequency is 1400000?!?!?!
Yes. It is explained here: viewtopic.php?f=29&t=82373

I use Rpi monitor https://rpi-experiences.blogspot.com/p/rpi-monitor.html on all my Pis but was not aware of the problem, since the software uses the Linux way to get the clock speed, which seems to be wrong? I don't understand why.

HermannSW

Re: [SOLVED] Pi 3B+ single core performance better than Pi 3B?

Fri Mar 23, 2018 5:53 pm

Thanks for the information and the links.
Especially the thread about under-voltage was important to me.
Image

I often saw that symbol on the HDMI monitor, without being aware of its meaning.

Determining the CPU frequency is not that important for the runtime of q32.c, since all Pis have exactly two available frequencies. Given the short (microseconds) runtime, it is unlikely that a CPU changes its frequency during a program run. So it runs either at 600MHz or at the high frequency. For the Pi 3B+ that is either 34μs at 600MHz or 14μs at 1400MHz.

I just measured all Pis again while making sure they run at the high CPU frequency, and updated the table in the other thread:
https://forum.arduino.cc/index.php?topi ... msg3413818
Image

Then I created this diagram for only the PIs and the Intel CPU:
Image

For the Pi 3B and 3B+ the measured values make total sense:

Code: Select all

$ bc -ql
17/(1400/1200)
14.57142857142857142865
34/(1400/600)
14.57142857142857142859
Last edited by HermannSW on Wed Jun 06, 2018 12:39 pm, edited 1 time in total.

HermannSW

Re: [SOLVED] Pi 3B+ single core performance better than Pi 3B?

Sat Mar 24, 2018 9:41 pm

HermannSW wrote:
Fri Mar 23, 2018 5:53 pm
Especially the thread with undervoltage was important to me.
Image

I saw that symbol on HDMI monitor often, not being aware of its meaning.
Today I received the official Pi power supply (delivered after 1 day 👍).
I am happy that I have not seen the under-voltage symbol on the HDMI monitor with it, even while taking videos.
HermannSW wrote:
Fri Mar 23, 2018 5:53 pm
Determining CPU frequency is not that important for runtime of q32.c since all PIs have exactly two available frequencies. Given the short (microseconds) runtime it is unlikely that a CPU changes its frequency during program runs.
I should have verified that :D

These are the timings for 1000 runs of q32 (the average over the 1000 measurements is 16.111μs):

Code: Select all

pi@raspberrypi3Bplus:~ $ for((i=1; i<=1000; i++)); do ./q32 | grep us ; done | sort -n | uniq -c
    241 14us
    495 15us
     78 16us
     40 17us
     20 18us
     18 19us
     27 20us
     43 21us
     26 22us
      2 23us
      2 26us
      1 27us
      1 28us
      1 37us
      1 38us
      1 39us
      1 64us
      1 113us
      1 223us
pi@raspberrypi3Bplus:~ $ 
Since the normal values for 600MHz/1400MHz are 34μs/14μs, all the values in between mean that either my previous statement (a CPU frequency change during a q32 run is unlikely) is wrong, or that taking μs timings under Raspbian/Linux is not as accurate as I thought.
Last edited by HermannSW on Mon Apr 16, 2018 2:37 pm, edited 1 time in total.

HermannSW

Re: [SOLVED] Pi 3B+ single core performance better than Pi 3B?

Sat Mar 24, 2018 11:28 pm

I rebooted with the other new power supply I ordered from Amazon.
It can even do 5V/3A instead of the 5.1V/2.5A of the official Raspberry Pi power supply.

The previous run was done with the CPU frequency forced to 1400MHz.
That seems not to be needed; the runtime distribution is similar without forcing it:

Code: Select all

pi@raspberrypi3Bplus:~ $ cat /sys/devices/system/cpu/cpufreq/policy0/scaling_min_freq
600000
pi@raspberrypi3Bplus:~ $ for((i=1; i<=1000; i++)); do ./q32 | grep us ; done | sort -n | uniq -c
    128 14us
    488 15us
    171 16us
     64 17us
     22 18us
     30 19us
     20 20us
     24 21us
     33 22us
      2 23us
      3 24us
      1 25us
      1 26us
      2 34us
      4 35us
      2 36us
      1 42us
      1 43us
      1 45us
      2 46us
pi@raspberrypi3Bplus:~ $ 

HermannSW

Re: [SOLVED] Pi 3B+ single core performance better than Pi 3B?

Mon Feb 10, 2020 5:58 am

I realized that numbers for the Pi 4B are missing (it was not available at the time of my last posting).

Until now I determined all q32.c numbers for executables generated with "gcc -O6".
https://stamm-wilbrandt.de/en/forum/q32.c
This time I wanted to compare different optimization levels as well (2/3/6; gcc treats levels above 3 like -O3).
And I installed the clang compiler (with maximal optimization level 3).

The bash function "tst" together with the "paste" command provides tabular output.
I inserted one dummy 12us entry for each compiler+option test.
This ensures that the top rows up to 19us are displayed side by side.
Remember that you have to subtract 1 from each 12us count.
The sed replacements at the bottom are not perfect either, but I consider this table good enough.
Best runtimes are now 12/14/17/27/30 microseconds for Pi 4B/3B+/3B/2B/Zero[W].

Code: Select all

🍓 function tst { printf "%14s\n" $2; (echo "  12us";
> for((i=1; i<=$1; ++i)); do ./q32.$2; done) | grep us |
> printf "%6s\n" `cat` | sort -n | uniq -c; }
🍓 N=1000; paste -d";" <(tst N gcc.O2) <(tst N gcc.O3)  \
> <(tst N gcc.O6) <(tst N clang.O2) <(tst N clang.O3) | \
> sed "s/^;/              ;/;s/;;/;              ;/g"
        gcc.O2;        gcc.O3;        gcc.O6;      clang.O2;      clang.O3
     56   12us;     61   12us;     65   12us;      2   12us;      1   12us
    461   13us;    427   13us;    431   13us;    137   13us;    123   13us
    196   14us;    230   14us;    247   14us;    402   14us;    396   14us
    138   15us;    153   15us;    145   15us;    221   15us;    211   15us
    103   16us;     96   16us;     77   16us;    119   16us;    147   16us
     32   17us;     24   17us;     27   17us;     88   17us;    102   17us
      9   18us;      6   18us;      6   18us;     23   18us;     14   18us
      1   19us;      1   19us;      1   19us;      5   19us;      3   19us
      1   31us;      1   25us;      1   26us;      1   20us;      1   20us
      1   49us;      1   62us;      1   70us;      1   21us;      1   35us
      1   73us;      1   84us;              ;      1  147us;      1   36us
      1  834us;              ;;      1 1555us;      1   41us
      1 1307us;              ;;              ;
🍓 
Last edited by HermannSW on Mon Feb 10, 2020 6:11 am, edited 4 times in total.

ejolson

Re: [SOLVED] Pi 3B+ single core performance better than Pi 3B?

Mon Feb 10, 2020 6:04 am

HermannSW wrote:
Mon Feb 10, 2020 5:58 am

Code: Select all

🍓 function tst { printf "%14s\n" $2; (echo -e "  11us\n  12us";
> for((i=1; i<=$1; ++i)); do ./q32.$2; done) | grep us |
> printf "%6s\n" `cat` | sort -n | uniq -c; }
Making a Raspberry Pi prompt may be the best use of Unicode emoticons I've seen so far. Is it really a strawberry? Results are interesting too!

HermannSW

Re: [SOLVED] Pi 3B+ single core performance better than Pi 3B?

Mon Feb 10, 2020 6:23 am

ejolson wrote:
Mon Feb 10, 2020 6:04 am
HermannSW wrote:
Mon Feb 10, 2020 5:58 am

Code: Select all

🍓 function tst { printf "%14s\n" $2; (echo -e "  11us\n  12us";
...
Making a Raspberry Pi prompt may be the best use of Unicode emoticons I've seen so far. Is it really a strawberry?
Yes, a strawberry, the closest to a raspberry I was able to find; see this thread for details:
"Should RPF work on getting Raspberry Unicode symbol?"
https://www.raspberrypi.org/forums/view ... 6&t=262945
Image

HermannSW

Re: [SOLVED] Pi 3B+ single core performance better than Pi 3B?

Tue Jan 26, 2021 7:47 am

I just tested the new Raspberry Pi Pico microcontroller with q32.c.
It needed small changes since gettimeofday() did not work, so I used time_us_32():
viewtopic.php?f=145&t=300265&p=1804700#p1804700

Code: Select all

Compiled on Aug 13 2017, 15:25:34.
Port /dev/ttyACM0, 09:01:21

Press CTRL-A Z for help on special keys

 47| 29|101|
113| 59|  5|
 17| 89| 71|

357us

357us (compiled with -O3) is slower than the ESP microcontrollers (and the Pis, of course), but faster than all Arduinos tested:
Image

HermannSW

Re: [SOLVED] Pi 3B+ single core performance better than Pi 3B?

Mon Feb 01, 2021 8:31 pm

I don't know why, but today q32.pico.c, compiled with -O3, took only 330µs instead of 357µs on the 130MHz Pico, which is still slower than the 304µs of the 80MHz ESP8266.

I experimented today:
"Overclocking Pico with up to 270MHz works"
viewtopic.php?f=145&t=301902

Overclocked to 270MHz, the Pico took 158µs today, only slightly more than the 149µs of the 160MHz ESP32.

ejolson

Re: [SOLVED] Pi 3B+ single core performance better than Pi 3B?

Mon Feb 01, 2021 9:36 pm

HermannSW wrote:
Mon Feb 01, 2021 8:31 pm
Overclocked with 270MHz Pico took 158µs today, only slightly more than the 149µs of the 160MHz ESP32.
Is there any way to run a benchmark that takes at least a couple seconds to execute so one can get an accurate timing?

HermannSW

Re: [SOLVED] Pi 3B+ single core performance better than Pi 3B?

Tue Feb 02, 2021 8:35 am

The timing is accurately taken with time_us_32().
You can just add a for loop and measure the time for, say, 1,000 or 1,000,000 runs:
https://gist.github.com/Hermann-SW/6772 ... 543d9d3dbe
I did that years ago and got the same relative performance numbers.
For a microcontroller, 158us is a long run (the Pico executes most instructions in 1 cycle, at its 130MHz default clock).

q32.c is a simple runtime test: single core, integer arithmetic only.

There are other benchmarks you can use for checking memory use or other aspects.


While answering here, I added a diagram to that thread showing the linear dependency:
viewtopic.php?f=145&t=301902&p=1810769#p1810769
Image