- 32 bit: 580 mA
- 64 bit: 730 mA
Re: A Pi Pie Chart
the things I do for you, Eric ...
‘Remember the Golden Rule of Selling: “Do not resort to violence.”’ — McGlashan.
Pronouns: he/him
Re: A Pi Pie Chart
I find it amusing that 580 mA is the same value Tom's Hardware reports in
https://www.tomshardware.com/uk/reviews ... 2-w-review
for the Pi Zero 2 under load. I wonder if they ran the same Lorenz 96 stress test to get that number.
It seems Tom missed the bigger news that 64-bit takes
730/580=1.26
times the power while delivering
1207.32/705.102=1.71
times the performance of 32-bit mode.
Unless the additional power budget blows a fuse or causes throttling, it would seem 64-bit is greener. That's not surprising as it is the native mode of operation for the cores used in the Raspberry Pi Zero 2.
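The arithmetic above can be reproduced with a short sketch. It assumes the supply voltage is the same in both modes, so the current ratio stands in for the power ratio; `ratio` and `efficiency_gain` are hypothetical helper names, and the numbers are the ones quoted in this post.

```shell
# ratio A B: print A/B rounded to two decimals
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# efficiency_gain PERF64 PERF32 MA64 MA32:
# performance ratio divided by current ratio (assumes equal supply voltage)
efficiency_gain() {
    awk -v p64="$1" -v p32="$2" -v i64="$3" -v i32="$4" \
        'BEGIN { printf "%.2f\n", (p64 / p32) / (i64 / i32) }'
}

echo "power ratio:          $(ratio 730 580)"            # 1.26
echo "performance ratio:    $(ratio 1207.32 705.102)"    # 1.71
echo "performance per watt: $(efficiency_gain 1207.32 705.102 730 580)"   # 1.36
```

If those assumptions hold, 64-bit mode does roughly 1.36 times as much Lorenz 96 work per unit of energy while the cores are fully loaded.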
Re: A Pi Pie Chart
It's greener if you keep the cores busy all the time, but most of a computer's time is spent idling, so 32-bit's (very slightly) lower idle current might win out. Also, it's relatively easy to cause thermal throttling in 64-bit mode, which will throw performance right off.
BGA packages don't have the longest life under varying thermal load, and I'm wondering if the really complex micro-wiring/BGA package that the Zero 2 W has might be even more fragile. I'd recommend experimenting, but if the other official resellers are anything like our partner company in the US, they'll have sold out of Zero 2 Ws already.
Re: A Pi Pie Chart
Weirdly, the four-core Ampere Altra instances on the Oracle cloud are still free. The advantage is all those Linux and ARM programming skills learned with a real Raspberry Pi computer are immediately transferable. I'm still looking for the catch. I wonder what it is.
Re: A Pi Pie Chart
ejolson wrote: ↑Wed May 26, 2021 6:51 pm
The free tier consists of
- 4 Ampere Altra ARM cores with 24 GB RAM.
I ran the Pi pichart program and discovered this virtual machine is the equivalent of 107 original Raspberry Pi model B computers.
For reference the output is
Code: Select all
$ ./pichart-openmp -t "4-core Altra" # Free Oracle A1 instance
pichart -- Raspberry Pi Performance OPENMP version 36
Prime Sieve P=14630843 Workers=4 Sec=0.269277 Mops=3469.76
Merge Sort N=16777216 Workers=8 Sec=0.432369 Mops=931.272
Fourier Transform N=4194304 Workers=8 Sec=0.194219 Mflops=2375.53
Lorenz 96 N=32768 K=16384 Workers=4 Sec=0.11563 Mflops=27858.1
The 4-core Altra has Raspberry Pi ratio=107.378
Making pie charts...done.
I installed gcc version 11.2 on the 4-core ARM Neoverse-N1 instance in the free-tier Oracle cloud and obtained
Code: Select all
$ ./pichart-openmp -t "4-core Altra (gcc-11.2)"
pichart -- Raspberry Pi Performance OPENMP version 36
Prime Sieve P=14630843 Workers=4 Sec=0.27152 Mops=3441.1
Merge Sort N=16777216 Workers=8 Sec=0.435675 Mops=924.205
Fourier Transform N=4194304 Workers=8 Sec=0.158582 Mflops=2909.37
Lorenz 96 N=32768 K=16384 Workers=4 Sec=0.107189 Mflops=30051.8
The 4-core Altra (gcc-11.2) has Raspberry Pi ratio=114.664
Making pie charts...done.
I wonder whether the new ARM Neoverse-N2 based Graviton 3 processors coming to the Amazon cloud will be much faster.
https://www.nextplatform.com/2021/12/02 ... rver-chip/
Re: A Pi Pie Chart
ejolson wrote: ↑Thu May 16, 2019 5:53 pm
For reference, the runs for the other notebook computers are
Code: Select all
$ ./pichart-openmp -t "A6-9225"
pichart -- Raspberry Pi Performance OPENMP version 30
Prime Sieve P=14630843 Workers=2 Sec=0.653126 Mops=1430.55
Merge Sort N=16777216 Workers=4 Sec=1.47106 Mops=273.717
Fourier Transform N=4194304 Workers=4 Sec=1.05959 Mflops=435.427
Lorenz 96 N=32768 K=16384 Workers=2 Sec=0.274606 Mflops=11730.4
The A6-9225 has Raspberry Pi ratio=33.3926
Making pie charts...done.
There is a link to the current source code from the first post of this thread if you would like to make your own Pi pie charts.
I just performed a memory upgrade on that A6-9225 notebook computer from 4GB to 8GB. The new memory is rated the same speed as before, but the new Pi ratio is only 29.7. Unfortunately, the compiler and Linux distribution changed as well, so it's not a fair comparison.
Code: Select all
$ ./pichart-openmp -t "A6-9225 w/8GB"
pichart -- Raspberry Pi Performance OPENMP version 30
Prime Sieve P=14630843 Workers=2 Sec=0.652379 Mops=1432.19
Merge Sort N=16777216 Workers=4 Sec=1.47173 Mops=273.592
Fourier Transform N=4194304 Workers=2 Sec=1.84656 Mflops=249.856
Lorenz 96 N=32768 K=16384 Workers=2 Sec=0.252105 Mflops=12777.3
The A6-9225 w/8GB has Raspberry Pi ratio=29.6962
Making pie charts...done.
Anyway, in addition to comparing different hardware at the same point in time, checking performance of a particular machine over time is another important use of benchmarks such as the Pi pie chart program. Could it be the fan in the laptop is now clogged with dust?
Re: A Pi Pie Chart
ejolson wrote: ↑Mon Jan 17, 2022 9:00 pm
Anyway, in addition to comparing different hardware at the same point in time, checking performance of a particular machine over time is another important use of benchmarks such as the Pi pie chart program. Could it be the fan in the laptop is now clogged with dust?
Here is a revisit to the Pi 3B+ now running the 64-bit version of Void Linux. The CPU governor was set to performance, active cooling employed and no throttling observed.
Code: Select all
$ ./pichart-openmp -t "Pi 3B+ (64-bit)"
pichart -- Raspberry Pi Performance OPENMP version 36
Prime Sieve P=14630843 Workers=4 Sec=1.26867 Mops=736.462
Merge Sort N=16777216 Workers=8 Sec=1.26322 Mops=318.752
Fourier Transform N=4194304 Workers=4 Sec=2.41423 Mflops=191.106
Lorenz 96 N=32768 K=16384 Workers=4 Sec=2.21405 Mflops=1454.9
The Pi 3B+ (64-bit) has Raspberry Pi ratio=14.1929
Making pie charts...done.
$ gcc --version
gcc (GCC) 10.2.1 20201203
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Everything is faster except the merge sort, which coincidentally sorts 32-bit integers. I wonder whether sorting 64-bit integers would tell a different story.
Note also the original 32-bit runs were performed using the MIT/Intel Cilk parallel processing extensions in gcc version 6.4 which were removed in later versions of the compiler. The present runs have been performed with gcc version 10.2.1 using equivalent OpenMP calls, which unfortunately exhibit slightly different performance characteristics.
Re: A Pi Pie Chart
Pi Zero 2W - 32 bit
Code: Select all
pi@zero2w:~/piechart/pichart-36 $ ./pichart-openmp -t 'Pi Zero 2'
pichart -- Raspberry Pi Performance OPENMP version 36
Prime Sieve P=14630843 Workers=4 Sec=2.17265 Mops=430.04
Merge Sort N=16777216 Workers=8 Sec=1.28051 Mops=314.448
Fourier Transform N=4194304 Workers=8 Sec=2.6063 Mflops=177.022
Lorenz 96 N=32768 K=16384 Workers=4 Sec=4.35261 Mflops=740.068
The Pi Zero 2 has Raspberry Pi ratio=10.2443
Making pie charts...done.
64 bit result
Code: Select all
pi@zero2w64:~/pichart-36 $ ./pichart-openmp -t "Pi Zero 2"
pichart -- Raspberry Pi Performance OPENMP version 36
Prime Sieve P=14630843 Workers=4 Sec=1.80282 Mops=518.26
Merge Sort N=16777216 Workers=8 Sec=1.61981 Mops=248.58
Fourier Transform N=4194304 Workers=8 Sec=2.64514 Mflops=174.423
Lorenz 96 N=32768 K=16384 Workers=4 Sec=2.49811 Mflops=1289.46
The Pi Zero 2 has Raspberry Pi ratio=11.5851
Making pie charts...done.
History doesn’t repeat itself, it rarely even rhymes.
Re: A Pi Pie Chart
Slightly different from what I got at launch, upthread. What compiler/optimization did you use?
Re: A Pi Pie Chart
GCC 10.2. I hadn't seen your post!
Clang 11.0 does slightly better
Code: Select all
pi@zero2w:~/pichart-36 $ ./pichart-openmp -t 'Pi Zero 2'
pichart -- Raspberry Pi Performance OPENMP version 36
Prime Sieve P=14630843 Workers=4 Sec=2.10842 Mops=443.142
Merge Sort N=16777216 Workers=4 Sec=1.38864 Mops=289.963
Fourier Transform N=4194304 Workers=4 Sec=2.27309 Mflops=202.971
Lorenz 96 N=32768 K=16384 Workers=4 Sec=4.27075 Mflops=754.252
The Pi Zero 2 has Raspberry Pi ratio=10.516
Making pie charts...done.
Re: A Pi Pie Chart
lurk101 wrote: ↑Wed Mar 16, 2022 10:38 pm
GCC 10.2. I hadn't seen your post!
Clang 11.0 does slightly better
From what I can tell, clang is faster for two of the benchmark problems and slower for the other two. The rating represented by the Pi ratio suggests clang is about the same on average. Repeated runs would be needed to verify that.
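One way to check this is to repeat each build several times and compare the mean Pi ratio with its run-to-run spread. The sketch below assumes the output format shown in this thread (a final line containing `Raspberry Pi ratio=`); `extract_ratio` and `mean_sd` are hypothetical helper names, not part of pichart.

```shell
# extract_ratio: read pichart output on stdin and print only the Pi ratio value
extract_ratio() {
    sed -n 's/.*Raspberry Pi ratio=\([0-9.]*\).*/\1/p'
}

# mean_sd: read one number per line, print "mean sample-stddev count"
mean_sd() {
    awk '{ s += $1; ss += $1 * $1; n++ }
         END { if (n < 2) exit 1
               m = s / n
               v = (ss - n * m * m) / (n - 1); if (v < 0) v = 0
               printf "%.3f %.3f %d\n", m, sqrt(v), n }'
}

# Parsing one line of the output posted above:
echo 'The Pi Zero 2 has Raspberry Pi ratio=10.516' | extract_ratio   # prints 10.516

# Usage sketch (not run here): five runs of one build, summarised
# for i in 1 2 3 4 5; do ./pichart-openmp -t 'Pi Zero 2' | extract_ratio; done | mean_sd
```

If the difference between the gcc and clang means is smaller than the spread within each build, the two compilers are indistinguishable on this measure.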
Re: A Pi Pie Chart
Very close. I was just comparing the final ratios. Do you have any idea why Lorenz is significantly faster in 64-bit mode, when the others are about the same?
Re: A Pi Pie Chart
Lorenz is the only code with vectorisable floating point. While I haven't looked at the assembler, maybe in 64-bit mode the additional registers available and short vector operations play a significant role.
It's also possible that more effort was spent tuning the optimiser as people actually use 64-bit ARM for technical computing whereas 32-bit not so much.
Re: A Pi Pie Chart
I discovered it's possible to change the CPU frequency on a Pi by selecting performance for the scaling_governor and then setting scaling_max_freq to different values. Here is a script that needs to be run as root and does this for the frequency range from 600 to 1400MHz.
Code: Select all
#!/bin/bash
# Sweep the CPU clock from 600 to 1400 MHz and run the benchmark at each step (run as root).
policy=/sys/devices/system/cpu/cpufreq/policy0
echo performance > "$policy/scaling_governor"
for i in {6..14}
do
    # scaling_max_freq is given in kHz, so ${i}00000 corresponds to ${i}00 MHz
    echo "${i}00000" > "$policy/scaling_max_freq"
    sleep 1
    vcgencmd measure_clock arm
    ./pichart-openmp -t "${i}00MHz" > "${i}00MHz.txt"
done
Note that each computational task scales differently depending on how much it relies on CPU speed versus how much it relies on memory bandwidth.
The curve for the Fourier transform is surprisingly lumpy and scales the worst. At the same time, the prime sieve enjoys near-linear scaling with CPU frequency, likely because it employs a huge number of bitwise operations to conserve memory.
Last edited by ejolson on Sat Apr 09, 2022 12:22 am, edited 2 times in total.
Re: A Pi Pie Chart
Comparing similar graphs for different computers shows how the relative balance between CPU speed and memory bandwidth affects the four computational tasks performed by the Pi pie chart program. To this end, I ran the same set of tests on the Pi 4B in 32-bit mode and compared the results.
The biggest difference in the two graphs is for the Lorenz 96 dynamical simulation. Since the Lorenz 96 runs faster in 64-bit mode, memory bandwidth may form a more noticeable bottleneck in that case. This is supported by the graph, which shows the Lorenz 96 curve for the Pi 3B+ running in 64-bit mode well below the corresponding curve for the Pi 4B.
If anyone could extend the Pi 4B to higher-frequency overclock settings, that would be interesting. Just post the output of the Pi chart program at each frequency (maybe best to redo the whole curve) and I'd be happy to graph the data.
Last edited by ejolson on Sat Apr 09, 2022 4:18 pm, edited 1 time in total.
Re: A Pi Pie Chart
ejolson wrote: ↑Sun Feb 07, 2021 5:15 am
Code: Select all
$ ./pichart-openmp ; # Ryzen 7 Pro 1700 (8 cores)
pichart -- Raspberry Pi Performance OPENMP version 36
Prime Sieve P=14630843 Workers=16 Sec=0.119876 Mops=7794.13
Merge Sort N=16777216 Workers=32 Sec=0.171299 Mops=2350.59
Fourier Transform N=4194304 Workers=16 Sec=0.153614 Mflops=3003.46
Lorenz 96 N=32768 K=16384 Workers=32 Sec=0.0595906 Mflops=54055.9
My Computer has Raspberry Pi ratio=207.369
Making pie charts...done.
Here is another Ryzen result, this time for a 6-core Pro 4650G APU.
Code: Select all
$ ./pichart-openmp -t "Ryzen 5 Pro 4650G"
pichart -- Raspberry Pi Performance OPENMP version 36
Prime Sieve P=14630843 Workers=12 Sec=0.109507 Mops=8532.16
Merge Sort N=16777216 Workers=24 Sec=0.168878 Mops=2384.28
Fourier Transform N=4194304 Workers=12 Sec=0.171241 Mflops=2694.29
Lorenz 96 N=32768 K=16384 Workers=24 Sec=0.0373754 Mflops=86185.7
The Ryzen 5 Pro 4650G has Raspberry Pi ratio=232.791
Making pie charts...done.
Re: A Pi Pie Chart
Performance versus frequency results for the Pi 4B running in 64-bit mode were posted in
viewtopic.php?p=1993500#p1993500
The resulting graph looks like
What's immediately noticeable is how erratic the Lorenz 96 curve appears. Although no throttling was reported by vcgencmd, it's possible there were still heat-related performance effects.
Could certain CPU frequencies mesh with the 500 MHz AXI bus better?
I wonder if the way the Lorenz 96 curve jumps up and down is repeatable. Maybe the variations are simply due to lucky or unlucky memory allocations.
Why does it do that?
Re: A Pi Pie Chart
Here is a log of the temperatures during a run.
You can see the clock speed rising, but the temperature only rises a little, peaking at 56C at the 2100MHz stage.
Since throttling starts at 80C and the SoC apparently is OK up to 120C, these seem low.
The idle speed is 400MHz to save power.
Code: Select all
Time CPU Core Vcore Temp Health
08:44:30 400 200 0.840 33.6 OK
08:45:30 600 258 0.840 33.6 OK
08:46:30 600 258 0.840 35.0 OK
08:47:30 600 258 0.840 34.0 OK
08:48:30 600 258 0.840 35.0 OK
08:49:30 600 258 0.840 35.5 OK
08:50:30 700 288 0.840 36.5 OK
08:51:30 700 288 0.840 36.5 OK
08:52:30 700 288 0.840 35.0 OK
08:53:30 700 288 0.840 36.5 OK
08:54:30 700 288 0.840 36.5 OK
08:55:30 800 317 0.840 36.0 OK
08:56:30 800 317 0.840 37.0 OK
08:57:30 800 317 0.840 36.5 OK
08:58:30 800 317 0.840 37.9 OK
08:59:30 900 346 0.840 37.9 OK
Time CPU Core Vcore Temp Health
09:00:30 900 346 0.840 37.0 OK
09:01:31 900 346 0.840 37.9 OK
09:02:31 900 346 0.840 40.4 OK
09:03:31 1000 376 0.972 39.4 OK
09:04:31 1000 376 0.972 38.9 OK
09:05:31 1000 376 0.972 40.4 OK
09:06:31 1000 376 0.972 41.3 OK
09:07:31 1100 405 0.972 42.3 OK
09:08:31 1100 405 0.972 40.4 OK
09:09:31 1100 405 0.972 45.7 OK
09:10:31 1200 435 0.972 41.3 OK
09:11:31 1200 435 0.972 40.9 OK
09:12:31 1200 435 0.972 40.9 OK
09:13:31 1300 464 0.972 43.3 OK
09:14:31 1300 464 0.972 42.3 OK
09:15:31 1300 464 0.972 42.3 OK
Time CPU Core Vcore Temp Health
09:16:31 1400 493 0.972 43.8 OK
09:17:31 1400 493 0.972 42.3 OK
09:18:31 1400 493 0.972 46.7 OK
09:19:31 1500 522 0.972 44.8 OK
09:20:31 1500 522 0.972 43.3 OK
09:21:31 1500 522 0.972 48.7 OK
09:22:31 1600 551 0.972 47.7 OK
09:23:31 1600 551 0.972 46.2 OK
09:24:31 1700 581 0.972 44.8 OK
09:25:31 1700 581 0.972 47.2 OK
09:26:31 1700 581 0.972 45.7 OK
09:27:31 1800 611 0.972 48.7 OK
09:28:31 1800 611 0.972 49.6 OK
09:29:31 1900 640 0.972 46.7 OK
09:30:32 1900 640 0.972 49.1 OK
09:31:32 1900 640 0.972 46.2 OK
Time CPU Core Vcore Temp Health
09:32:32 2000 668 0.972 48.2 OK
09:33:32 2000 668 0.972 49.1 OK
09:34:32 2100 700 0.972 47.7 OK
09:35:32 2100 700 0.972 50.1 OK
09:36:32 2100 700 0.972 56.0 OK
09:37:32 2100 700 0.972 47.2 OK
09:38:32 2100 700 0.972 47.2 OK
09:39:32 2100 700 0.972 44.3 OK
09:40:32 2100 700 0.972 44.3 OK
09:41:32 2100 700 0.972 43.8 OK
09:42:32 2100 700 0.972 43.8 OK
09:43:32 2100 700 0.972 43.8 OK
09:44:32 2100 700 0.972 43.3 OK
09:45:32 2100 700 0.972 43.8 OK
09:46:32 2100 700 0.972 42.3 OK
09:47:32 2100 700 0.972 42.3 OK
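The peak can be picked out of a log like this mechanically. A minimal sketch, assuming the six-column layout shown above with the temperature in the fifth column; `peak_temp` is a hypothetical helper name.

```shell
# peak_temp: read a log (Time CPU Core Vcore Temp Health) on stdin and
# print "time cpu_mhz temp" for the hottest sample.
peak_temp() {
    awk '$1 == "Time" { next }     # skip the repeated column headers
         NF >= 5 && (!seen || $5 + 0 > max) {
             max = $5 + 0; row = $1 " " $2 " " $5; seen = 1
         }
         END { if (seen) print row }'
}

# Three rows copied from the log above, around the hottest sample:
peak_temp <<'EOF'
Time CPU Core Vcore Temp Health
09:35:32 2100 700 0.972 50.1 OK
09:36:32 2100 700 0.972 56.0 OK
09:37:32 2100 700 0.972 47.2 OK
EOF
# prints: 09:36:32 2100 56.0
```

Feeding the whole log through the same function gives the same row, which matches the 56C peak at the 2100MHz stage.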
Re: A Pi Pie Chart
jahboater wrote: ↑Thu Apr 14, 2022 9:23 am
Here is a log of the temperatures during a run.
You can see the clock speed rising, but the temperature only rises a little, peaking at 56C at the 2100MHz stage.
Since throttling starts at 80C and the SoC apparently is OK up to 120C, these seem low.
The idle speed is 400MHz to save power.
The temperature does seem fine. Is that a script which calls vcgencmd or your own C program for printing the stats?
If you reboot the Pi and run the tests again with a clean page allocator does the Lorenz still behave the same way at higher clock speeds? You can select to run only the Lorenz test with the -r8 option (at the expense of the Pi ratio being wrong after that).
Re: A Pi Pie Chart
It is a home-written C program that calls vcgencmd and one or two other things.
Despite spawning vcgencmd it is much faster than a shell script and lightweight enough not to artificially trigger the CPU scaling governor. The next refinement is to use the mailboxes directly, but this works. Same usage as vmstat.
Code: Select all
/*
* Raspberry Pi System Monitor
*
* This code is lightweight to avoid triggering the frequency scaling governor.
*
* It would be preferable to use the mailbox directly. However compared to the shell
* script this C version does one less call to vcgencmd (collects CPU and CORE in one go),
* removes the script overheads, and improves the presentation.
* Worst case execution time per sample (700MHz Pi1) 87ms.
*
* USAGE: pistat [delay [count]]
* Delay and count behave as vmstat see "man vmstat" for info.
*
* LIMITS (without simple code change)
* MEMORY >= 256MB && MEMORY <= 128GB
* CPU >= 100MHz && CPU < 10GHz
* CORE >= 100MHz && CORE < 1GHz
* TEMP >= 10C && TEMP < 100C
* VOLTS < 10.0
*
* stress-ng --cpu 0 --cpu-method fft
*
* Last delta: 22/04/2022 (record min and max temperatures)
*/
#define _GNU_SOURCE 1
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>
#include <fcntl.h>
#include <string.h>
#include <time.h>
#include <errno.h>
#define chr(c) (char)((c) | 48)
#define scan(p,r,c) ({ do --r; while( *p++ != c && r ); p[-1] == c; })
#define result(st) fgets(in, IOSIZE, fp); s = in + st
#define collect(com,st) fp = popen("vcgencmd " #com, "r"); result(st)
#define inline __inline__ __attribute__((always_inline))
#define error(s) { write(2, s "\n", sizeof(s)); _exit(1); }
typedef uint16_t u16;
typedef uint32_t u32;
typedef double real;
#define IOSIZE 1024
static inline char *
utoa( char *p, u32 n )
{
do
*--p = chr(n % 10);
while( n /= 10 );
return p;
}
/*
* Get the Pi description including the memory size
* For example: Raspberry Pi 4 Model B Rev 1.4 8GB
*/
static char *
model( char * const in, char * const buf )
{
const int fd = open("/proc/device-tree/model", O_RDONLY);
if( fd < 0 )
return in;
char *tail, *p = in + read(fd, in, IOSIZE);
close(fd);
/*
* Get the Pi's memory size.
* Also properly validate vcgencmd (no checks are done for the periodic update).
*/
errno = 0;
FILE *fp = popen("vcgencmd get_config total_mem", "r");
if( fp == NULL || errno )
error("cannot run vcgencmd");
fgets(buf, IOSIZE, fp);
pclose(fp);
if( memcmp(buf, "total_mem=", 10) )
error("total_mem field missing");
u32 mem = (u32)strtoul(buf + 10, &tail, 10);
if( *tail != '\n' || mem < 256 || mem > 128*1024 || errno )
error("total_mem field invalid");
char *e = buf + 8;
if( mem < 1024 )
*e = 'M';
else
{
mem /= 1024;
*e = 'G';
}
e[1] = 'B';
e = utoa(e, mem);
*p++ = ' ';
p = mempcpy(p, e, (size_t)((buf + 10) - e));
fp = popen("vcgencmd get_mem gpu", "r");
fgets(buf, IOSIZE, fp);
pclose(fp);
e = memchr(buf + 4, '\n', IOSIZE - 4);
p = mempcpy(mempcpy(p, " (gpu ", 6), buf + 4, (size_t)(e - (buf + 4)));
return mempcpy(p, "B)\n", 3);
}
// Stats so far (time spent at each CPU frequency)
static char *
time_in_state( char *p, char *buf )
{
const int fd = open("/sys/devices/system/cpu/cpufreq/policy0/stats/time_in_state", O_RDONLY);
if( fd < 0 )
return p;
const ssize_t file_len = read(fd, buf, IOSIZE);
close(fd);
bool align = false;
ssize_t rem = file_len;
for( char *t = buf, *ptr = buf; rem > 0 && scan(ptr, rem, '\n'); t = ptr )
{
if( t[6] != ' ' )
{
align = true;
break;
}
}
rem = file_len;
for( char *ptr = buf; rem > 0 && scan(ptr, rem, '\n'); buf = ptr )
{
size_t len = 7;
if( buf[6] == ' ' )
{
if( align )
*p++ = ' ';
len = 6;
}
// discard last three zero's giving MHz to match values below
p = mempcpy(mempcpy(p, buf, len - 3), buf + len, (size_t)(ptr - buf) - len);
}
return p;
}
// bump 199.99 to 200
static inline void
inc( char *p, size_t len )
{
u32 n = 0;
do
n = n * 10 + (*p++ & 15);
while( --len );
utoa(p, n + 1);
}
int
main( int argc, const char *argv[] )
{
char buf[IOSIZE], in[IOSIZE];
char *s, *p;
u32 lines = 15, delay = 0, count = 1;
size_t len;
FILE *fp;
time_t t;
u16 tmp;
if( argc > 1 )
{
char *tail;
delay = (u32)strtoul(argv[1], &tail, 10);
if( tail == argv[1] )
error("invalid delay");
if( argc > 2 )
{
count = (u32)strtoul(argv[2], &tail, 10);
if( tail == argv[2] )
error("invalid count");
}
else
count = UINT32_MAX;
}
// headers
write(1, in, (size_t)(time_in_state(model(in, buf), buf) - in));
buf[5] = buf[2] = ':';
real cur_temp, min_temp = 1000.0, max_temp = 0.0;
for( u32 i = 1; i <= count; ++i )
{
memset(p = buf + 8, ' ', 64);
// CPU MHz "frequency(48)=2100000000"
collect(measure_clock arm core, 14);
len = 3 + (s[9] != '\n');
if( s[len] == '9' )
inc(s, len);
memcpy((p += 8) - len, s, len);
// CORE MHz "frequency(1)=200000000"
result(13);
if( s[3] == '9' )
inc(s, 3);
memcpy((p += 8) - 3, s, 3);
pclose(fp);
// VOLTAGE "volt=1.2000V"
collect(measure_volts, 5);
len = 5 + (s[5] != '0');
if( s[len-2] == '0' && s[len-1] == '0' )
len -= 2;
memcpy((p += 9) - len, s, len);
pclose(fp);
// TEMPERATURE "temp=39.0'C"
collect(measure_temp, 5);
cur_temp = strtod(s, NULL);
if( cur_temp < min_temp )
min_temp = cur_temp;
if( cur_temp > max_temp )
max_temp = cur_temp;
memcpy((p += 8) - 4, s, 4);
pclose(fp);
p += sprintf(p, " %.1f %.1f", min_temp, max_temp);
// HEALTH "throttled=0x0\n"
collect(get_throttled, 10);
memcpy(&tmp, s + 2, 2);
if( tmp == 0x0A30 ) // "0\n"
memcpy((p += 8) - 3, "OK\n", 4);
else
p = stpcpy(p + 2, s);
pclose(fp);
// TIME
time(&t);
struct tm const * const now = localtime(&t);
buf[0] = chr(now->tm_hour / 10);
buf[1] = chr(now->tm_hour % 10);
buf[3] = chr(now->tm_min / 10);
buf[4] = chr(now->tm_min % 10);
buf[6] = chr(now->tm_sec / 10);
buf[7] = chr(now->tm_sec % 10);
// Print record
if( ++lines == 16 )
{
static const char header[] = "Time CPU Core Vcore Temp....Min....Max Health\n";
write(1, header, sizeof(header) - 1); // length taken from the literal itself
lines = 0;
}
write(1, buf, (size_t)(p - buf));
if( i < count )
sleep(delay);
}
}
Here is the Lorenz only run, immediately after a reboot.
Looks just as random. Interesting. The other benchmarks were reasonably linear, so why is this one different?
Code: Select all
frequency(48)=600169920
frequency(1)=258583008
pichart -- Raspberry Pi Performance OPENMP version 36
Lorenz 96 N=32768 K=16384 Workers=4 Sec=1.33693 Mflops=2409.42
The 600MHz has Raspberry Pi ratio=2.56334
Making pie charts...done.
frequency(48)=700154304
frequency(1)=288101088
pichart -- Raspberry Pi Performance OPENMP version 36
Lorenz 96 N=32768 K=16384 Workers=4 Sec=1.09299 Mflops=2947.17
The 700MHz has Raspberry Pi ratio=2.69575
Making pie charts...done.
frequency(48)=800191424
frequency(1)=317157728
pichart -- Raspberry Pi Performance OPENMP version 36
Lorenz 96 N=32768 K=16384 Workers=4 Sec=0.994161 Mflops=3240.14
The 800MHz has Raspberry Pi ratio=2.76038
Making pie charts...done.
frequency(48)=900175808
frequency(1)=346609856
pichart -- Raspberry Pi Performance OPENMP version 36
Lorenz 96 N=32768 K=16384 Workers=4 Sec=0.931206 Mflops=3459.2
The 900MHz has Raspberry Pi ratio=2.8059
Making pie charts...done.
frequency(48)=1000212864
frequency(1)=376457504
pichart -- Raspberry Pi Performance OPENMP version 36
Lorenz 96 N=32768 K=16384 Workers=4 Sec=0.847189 Mflops=3802.25
The 1000MHz has Raspberry Pi ratio=2.87301
Making pie charts...done.
frequency(48)=1100249984
frequency(1)=405435072
pichart -- Raspberry Pi Performance OPENMP version 36
Lorenz 96 N=32768 K=16384 Workers=4 Sec=0.776729 Mflops=4147.17
The 1100MHz has Raspberry Pi ratio=2.93606
Making pie charts...done.
frequency(48)=1200287104
frequency(1)=434953120
pichart -- Raspberry Pi Performance OPENMP version 36
Lorenz 96 N=32768 K=16384 Workers=4 Sec=0.71486 Mflops=4506.09
The 1200MHz has Raspberry Pi ratio=2.99763
Making pie charts...done.
frequency(48)=1300324224
frequency(1)=464247072
pichart -- Raspberry Pi Performance OPENMP version 36
Lorenz 96 N=32768 K=16384 Workers=4 Sec=0.714179 Mflops=4510.39
The 1300MHz has Raspberry Pi ratio=2.99834
Making pie charts...done.
frequency(48)=1400361344
frequency(1)=493659680
pichart -- Raspberry Pi Performance OPENMP version 36
Lorenz 96 N=32768 K=16384 Workers=4 Sec=0.646167 Mflops=4985.13
The 1400MHz has Raspberry Pi ratio=3.0743
Making pie charts...done.
frequency(48)=1500398464
frequency(1)=522452640
pichart -- Raspberry Pi Performance OPENMP version 36
Lorenz 96 N=32768 K=16384 Workers=4 Sec=0.569199 Mflops=5659.23
The 1500MHz has Raspberry Pi ratio=3.17334
Making pie charts...done.
frequency(48)=1600382848
frequency(1)=551377472
pichart -- Raspberry Pi Performance OPENMP version 36
Lorenz 96 N=32768 K=16384 Workers=4 Sec=0.553911 Mflops=5815.42
The 1600MHz has Raspberry Pi ratio=3.19502
Making pie charts...done.
frequency(48)=1700419968
frequency(1)=581831552
pichart -- Raspberry Pi Performance OPENMP version 36
Lorenz 96 N=32768 K=16384 Workers=4 Sec=0.564015 Mflops=5711.24
The 1700MHz has Raspberry Pi ratio=3.18061
Making pie charts...done.
frequency(48)=1800457088
frequency(1)=611600128
pichart -- Raspberry Pi Performance OPENMP version 36
Lorenz 96 N=32768 K=16384 Workers=4 Sec=0.66178 Mflops=4867.52
The 1800MHz has Raspberry Pi ratio=3.05601
Making pie charts...done.
frequency(48)=1900494080
frequency(1)=639997568
pichart -- Raspberry Pi Performance OPENMP version 36
Lorenz 96 N=32768 K=16384 Workers=4 Sec=0.729599 Mflops=4415.06
The 1900MHz has Raspberry Pi ratio=2.98237
Making pie charts...done.
frequency(48)=2000478464
frequency(1)=668658688
pichart -- Raspberry Pi Performance OPENMP version 36
Lorenz 96 N=32768 K=16384 Workers=4 Sec=0.638214 Mflops=5047.25
The 2000MHz has Raspberry Pi ratio=3.08384
Making pie charts...done.
frequency(48)=2100515584
frequency(1)=699996096
pichart -- Raspberry Pi Performance OPENMP version 36
Lorenz 96 N=32768 K=16384 Workers=2 Sec=0.712677 Mflops=4519.89
The 2100MHz has Raspberry Pi ratio=2.99992
Making pie charts...done.
Last edited by jahboater on Fri Apr 22, 2022 10:50 am, edited 2 times in total.
Re: A Pi Pie Chart
Here is the comparison of the two runs:
In a way the second run looks worse. Even with the normalization at the beginning giving the new run a 5 percent advantage, it manages to fall behind the original run at 1300 MHz and doesn't ever catch up in a significant way.
I find it amusing that 2000 MHz was a local maximum of performance for both runs and that the speed of the AXI bus divides evenly into 2000.
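The local maxima in the second run can be checked directly from the posted numbers. The sketch below is only an illustration: `lorenz_table` and `local_maxima` are hypothetical helper names, the log layout is assumed to match the reboot run posted above, and the first run (posted in the linked topic) is not included.

```shell
# lorenz_table: read a frequency-sweep log on stdin and print "MHz Mflops" rows.
# Assumes each step logs a "Lorenz 96 ... Mflops=X" line followed by a
# "The <N>MHz has ..." line.
lorenz_table() {
    awk '/^Lorenz 96/ { split($NF, a, "="); mflops = a[2] }
         /^The [0-9]+MHz has/ { sub(/MHz/, "", $2); print $2, mflops }'
}

# local_maxima: read "MHz Mflops" rows and print those above both neighbours.
local_maxima() {
    awk '{ f[NR] = $1; m[NR] = $2 + 0; v[NR] = $2 }
         END { for (i = 2; i < NR; i++)
                   if (m[i] > m[i - 1] && m[i] > m[i + 1]) print f[i], v[i] }'
}

# Lorenz 96 Mflops from the reboot run, transcribed from the log above.
local_maxima <<'EOF'
600 2409.42
700 2947.17
800 3240.14
900 3459.2
1000 3802.25
1100 4147.17
1200 4506.09
1300 4510.39
1400 4985.13
1500 5659.23
1600 5815.42
1700 5711.24
1800 4867.52
1900 4415.06
2000 5047.25
2100 4519.89
EOF
# prints:
# 1600 5815.42
# 2000 5047.25
```

With the per-step output files from a sweep, something like `cat *MHz.txt | lorenz_table | sort -n | local_maxima` would do the same without hand transcription, assuming the files contain the lines shown in the log.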
From what I can tell, the Lorenz 96 simulation is the only computational task in the group which might be vectorizable. Maybe NEON instructions are stalled for an extra cycle compared to non-vector operations when the AXI bus speed doesn't mesh with the CPU speed. If so, then this could explain the weirdness in the frequency scaling results.
I asked the dog developer for a second opinion, but the only reply was some barking about computer literacy and electromigration. For that I migrated Fido back to the dog house.
Re: A Pi Pie Chart
ejolson wrote: ↑Thu Jan 06, 2022 3:50 am
I installed gcc version 11.2 on the 4-core ARM Neoverse-N1 instance in the free-tier Oracle cloud and obtained
Code: Select all
$ ./pichart-openmp -t "4-core Altra (gcc-11.2)"
pichart -- Raspberry Pi Performance OPENMP version 36
Prime Sieve P=14630843 Workers=4 Sec=0.27152 Mops=3441.1
Merge Sort N=16777216 Workers=8 Sec=0.435675 Mops=924.205
Fourier Transform N=4194304 Workers=8 Sec=0.158582 Mflops=2909.37
Lorenz 96 N=32768 K=16384 Workers=4 Sec=0.107189 Mflops=30051.8
The 4-core Altra (gcc-11.2) has Raspberry Pi ratio=114.664
Making pie charts...done.
The updated compiler led to slightly slower integer benchmarks but slightly faster floating point with the net result being about 6 percent faster.
I wonder whether the new ARM Neoverse-N2 based Graviton 3 processors coming to the Amazon cloud will be much faster.
https://www.nextplatform.com/2021/12/02 ... rver-chip/
According to
https://aws.amazon.com/blogs/aws/new-am ... rocessors/
the Graviton 3 is now generally available. A 4-core instance with 8GB RAM costs 0.145 US$ per hour. Therefore, prior to the supply-chain problems an 8GB Pi 4B would pay for itself in
75/0.145 = 517 hours ≈ 21.5 days.
On the other hand the Graviton 3 processors are reportedly faster, so it might be reasonable to compare the performance-normalized cost.
To this end I spun up a c7g.xlarge instance with Amazon Linux, which seems to be based on RedHat 7, and then Ubuntu Server 22.04 for comparison.
The single and multi-core results were
Code: Select all
Pi Ratios for the c7g.xlarge EC2 Instance
          Amazon Linux        Ubuntu 22.04
         serial   4-core     serial   4-core
gcc      39.182   137.97     47.162   165.52
clang    41.597              46.773   166.58
The first thing of interest is that Ubuntu was uniformly about 20 percent faster. It is not clear to me whether this was on account of newer compilers in Ubuntu or a lucky placement of the VM instance in the cloud. In order to not spend the monthly dog-treat budget, no further investigation was performed.
I also noticed the Graviton 3 seemed about 40 percent faster than the ARM instances in the Oracle cloud. However, additional performance tuning and benchmarks would be required to draw meaningful conclusions.
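As a rough check on that 40 percent figure, here is a minimal Python sketch that divides the per-test rates from the c7g.xlarge OpenMP transcript in this post by those from the earlier 4-core Altra run. The dictionary names are my own and the numbers are simply copied from the two transcripts, so treat it as a back-of-the-envelope comparison rather than a proper benchmark analysis.

```python
# Rates copied from the two OpenMP transcripts in this thread
# (Mops for the integer tests, Mflops for the floating-point ones).
altra = {
    "Prime Sieve": 3441.1,
    "Merge Sort": 924.205,
    "Fourier Transform": 2909.37,
    "Lorenz 96": 30051.8,
}
c7g = {
    "Prime Sieve": 5141.03,
    "Merge Sort": 998.81,
    "Fourier Transform": 5655.23,
    "Lorenz 96": 41575.2,
}

# Per-test speedup of the Graviton 3 instance over the Altra instance.
speedup = {name: c7g[name] / altra[name] for name in altra}

# Ratio of the overall Pi ratios reported at the end of each transcript.
overall = 165.519 / 114.664

for name, s in speedup.items():
    print(f"{name:18s} {s:.2f}x")
print(f"{'Pi ratio':18s} {overall:.2f}x")
```

The individual tests range from roughly 1.1 times (merge sort) to 1.9 times (Fourier transform), with the overall Pi ratio about 1.44 times higher.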
At any rate, upon recalling the 4B yields a 31.42 Pi ratio, the numbers reported here suggest it would take
(517/24)*(165.52/31.42) = 113 days
for a Pi 4B to pay for itself after taking performance into account.
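For anyone who wants to redo the arithmetic with a different price or instance type, the break-even estimate can be sketched in a few lines of Python. The function and constant names are made up for illustration; the figures are the ones quoted in this post (US$ 75 for the Pi, US$ 0.145 per hour for the instance, Pi ratios of 31.42 and 165.52).

```python
# Figures quoted in this post; the names are illustrative only.
PI_PRICE_USD = 75.0            # 8GB Pi 4B price assumed in the post
INSTANCE_USD_PER_HOUR = 0.145  # 4-core Graviton 3 instance
PI_RATIO_4B = 31.42
PI_RATIO_C7G = 165.52


def breakeven_days(price, hourly_rate, perf_scale=1.0):
    """Days of cloud rental that cost as much as buying the hardware.

    perf_scale stretches the result when the rented machine is faster:
    one cloud hour then counts as perf_scale hours of local work.
    """
    return price / hourly_rate / 24.0 * perf_scale


raw_days = breakeven_days(PI_PRICE_USD, INSTANCE_USD_PER_HOUR)
normalized_days = breakeven_days(
    PI_PRICE_USD, INSTANCE_USD_PER_HOUR, PI_RATIO_C7G / PI_RATIO_4B
)

print(f"raw break-even:        {raw_days:.1f} days")
print(f"performance-normalized: {normalized_days:.1f} days")
```

This ignores electricity, networking and idle time, which are discussed next, so it is only the crude estimate and not a total cost of ownership.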
When comparing the cost trade-offs between on-premises and cloud there are other factors such as networking and electricity that need to be taken into account. While the hybrid cloud solutions promoted by IBM
https://www.ibm.com/cloud/hybrid
appear to combine the best of both options, it should be pointed out that Power10 is not generally available in the cloud while capable ARM instances may be found in many places.
Could the Raspberry Pi be part of an effective hybrid cloud strategy? As he that passeth by, and meddleth with strife belonging not to him, is like one that taketh a dog by the ears, I decided to let the sleeping dog developer lie.
Maybe I'll ask the question later on our walk to the park.
The transcript from the run is
Code: Select all
$ ./pichart-openmp -t c7g.xlarge
pichart -- Raspberry Pi Performance OPENMP version 36
Prime Sieve P=14630843 Workers=4 Sec=0.181739 Mops=5141.03
Merge Sort N=16777216 Workers=8 Sec=0.403133 Mops=998.81
Fourier Transform N=4194304 Workers=8 Sec=0.0815835 Mflops=5655.23
Lorenz 96 N=32768 K=16384 Workers=4 Sec=0.0774795 Mflops=41575.2
The c7g.xlarge has Raspberry Pi ratio=165.519
Making pie charts...done.
$ ./pichart-serial -t c7g.xlarge
pichart -- Raspberry Pi Performance Serial version 36
Prime Sieve P=14630843 Workers=2 Sec=0.7451 Mops=1253.96
Merge Sort N=16777216 Workers=2 Sec=1.58285 Mops=254.385
Fourier Transform N=4194304 Workers=1 Sec=0.319812 Mflops=1442.64
Lorenz 96 N=32768 K=16384 Workers=2 Sec=0.186274 Mflops=17292.9
The c7g.xlarge has Raspberry Pi ratio=47.1621
Making pie charts...done.
$ gcc --version
gcc (Ubuntu 11.2.0-19ubuntu1) 11.2.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
I find it reassuring, despite the other problems in the world, that there is still significant progress being made in cloud infrastructure and performance.
Re: A Pi Pie Chart
ejolson wrote: ↑Wed May 27, 2020 6:58 am
Following an updated version of the calculation given in
https://www.raspberrypi.org/forums/view ... 0#p1537398
implies the price I can charge my little brother for using the Pi 4B has been reduced from US$ 0.05712 to about US$ 0.04873 per hour.

Relative to the Graviton 3 the performance-normalized hourly charge for the Pi 4B would be
0.145*31.42/165.52 = US$ 0.02752 per hour.
Therefore, even after the recent money-printing and inflation, the new Graviton 3 instances are about twice as cost effective as the original Graviton. While that's not as cheap as the free 4-core ARM instances in the Oracle cloud, I'm sticking with Raspberry Pi since the included version of Mathematica more than offsets the cost of paying the electric bill.
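The same calculation as a short Python sketch, in case the little brother asks for an audit. The variable names are mine; the inputs are the hourly rate and Pi ratios quoted above, plus the US$ 0.04873 figure from the 2020 post.

```python
# Inputs quoted in this thread; names are illustrative only.
instance_usd_per_hour = 0.145    # 4-core Graviton 3 instance
pi_ratio_4b = 31.42
pi_ratio_c7g = 165.52
previous_charge = 0.04873        # US$ per hour from the 2020 calculation

# Scale the cloud price by the fraction of the instance's
# performance that a Pi 4B delivers.
pi_charge = instance_usd_per_hour * pi_ratio_4b / pi_ratio_c7g

# How much cheaper the performance-normalized price has become.
improvement = previous_charge / pi_charge

print(f"Pi 4B charge: US$ {pi_charge:.5f} per hour")
print(f"improvement over the 2020 figure: {improvement:.2f}x")
```

The improvement comes out near 1.77, which is what "about twice" above is rounding from.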
Re: A Pi Pie Chart
ejolson wrote: ↑Wed May 25, 2022 8:18 am
Relative to the Graviton 3 the performance-normalized hourly charge for the Pi 4B would be
0.145*31.42/165.52 = US$ 0.02752 per hour.

Since the four-core Graviton 2 instance measured previously gave a Pi ratio of about 100.148 while the new Graviton 3 measured 165.52, the new processor is 1.65 times faster, assuming the differences between Ubuntu 20.04 and 22.04 are not significant.
Another comparison between Graviton 2 and 3 processors appears in
https://www.daemonology.net/blog/2022-0 ... ton-3.html
That study found a 1.4 times speedup when compiling a number of open-source projects using the Graviton 3.
It's possible some of the speedup is due to differences in how the storage was provisioned on the respective instances. On the other hand 1.4 lies between the extremes of the speedups observed for the individual tests in the Pi chart program, which makes it a plausible result for a compute-bound task.
I'd provide a detailed analysis except the local web server here is offline due to increased security on account of the war. There is also the problem of currently being surrounded by mosquitos that are attracted to the LCD display.
Although the BARK™ may be worse than the BYTE

https://wiki.theretrowagon.com/wiki/BYT-8
those insect bites are more irritating than any bark I've known.
Re: A Pi Pie Chart
The links to the Pi Pie Chart source code from the original post all seem to be dead!
Anyone have a copy?
History doesn’t repeat itself, it rarely even rhymes.