Some time ago there was a thread on running the HPL benchmark, which solves systems of linear equations, on the Pi 3B. That thread was entitled "Pi3 incorrect results under load (possibly heat related)", hence the title of this thread. In the previous thread it was eventually determined that the incorrect results were not heat related, but caused by power transients that were often fixable by using an over-voltage setting. As the 3B+ is clocked almost 17 percent faster than the original 3B (1.4 GHz versus 1.2 GHz), I would expect a Linpack speed of about 7.5 Gflops, provided of course that it doesn't crash instead. Note the 3B attains a speed of 6.4 Gflops on the HPL benchmark.
Gareth Halfacree recently posted a number of benchmark scores for the 3B+. Unfortunately his reported Linpack result is about 200 Mflops, which is roughly 30 times slower than properly tuned Linpack timings. Is there anyone with a 3B+ and the interest to run the HPL (High-Performance Linpack) benchmark on it? I'm wondering if improvements to the onboard power regulator and thermal management allow the 3B+ to perform reliably and whether the speed is really the 7.5 Gflops I expect. You can follow the instructions for compiling and running HPL from here.
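The 7.5 Gflops estimate is just the 3B score scaled by the ratio of the clock speeds. A minimal sketch of that arithmetic, assuming performance scales linearly with the ARM clock (memory bandwidth is ignored) and taking 1200 MHz and 1400 MHz as the default clocks; the function name scaled_gflops is made up for illustration:

```shell
#!/bin/sh
# Scale a measured Gflops figure by the ratio of two clock speeds.
# Assumption: performance scales linearly with the CPU clock.
scaled_gflops() {
    # $1 = measured Gflops, $2 = clock it was measured at (MHz), $3 = new clock (MHz)
    awk -v g="$1" -v b="$2" -v n="$3" 'BEGIN { printf "%.2f\n", g * n / b }'
}

# 3B result of 6.4 Gflops at 1200 MHz, projected to 1400 MHz
scaled_gflops 6.4 1200 1400
```

This prints 7.47, which is where the "about 7.5 Gflops" expectation comes from.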
Re: Pi3B+ Hopefully Correct Results Under Load
Just for the record... what you discovered back then inspired us to use this special Linpack version to develop sane DVFS settings for a couple of other ARM boards in the meantime (Pine64 being the first one two years ago). It's important to test through all available DVFS operating points, and as such the whole procedure needs to be fully automated: https://github.com/ehoutsma/StabilityTester
After adjusting one or two variables related to sysfs nodes (or replacing with VC4 commands) it should work directly on Pi 3 B and B+.
At least the line
Code: Select all
CURVOLT=$(cat ${REGULATOR_HANDLER}${REGULATOR_MICROVOLT})
would need to be replaced with something like
Code: Select all
CURVOLT=$(vcgencmd measure_volts | cut -f2 -d= | sed 's/000//')
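As a rough sketch of the same idea without the pipeline, the voltage string can be picked apart with shell parameter expansion. This assumes vcgencmd prints a line of the form volt=1.2000V, as in the monitoring output later in this thread. The helper names parse_volts and volts_to_microvolts are made up here, and the conversion to microvolts is only a guess at what a script reading a sysfs microvolt node might expect:

```shell
#!/bin/sh
# Extract the numeric part of a line such as "volt=1.2000V".
parse_volts() {
    v=${1#*=}      # drop everything up to and including "="
    v=${v%V}       # drop the trailing unit
    printf '%s\n' "$v"
}

# Convert volts to whole microvolts (assumption: the caller wants the same
# unit that a sysfs regulator microvolt node would report).
volts_to_microvolts() {
    awk -v v="$1" 'BEGIN { printf "%d\n", v * 1000000 + 0.5 }'
}

# Sample strings are used here; on a Pi the input would come from
# "$(vcgencmd measure_volts)" instead.
parse_volts "volt=1.2000V"
volts_to_microvolts "$(parse_volts "volt=1.3750V")"
```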
Re: Pi3B+ Hopefully Correct Results Under Load
Thanks, except I didn't discover it: Vince Weaver at the University of Maine discovered that the Raspberry Pi produced incorrect results when solving systems of linear equations; Kazushige Goto developed the optimized linear algebra subroutine library that became OpenBLAS; and Jack Dongarra created the HPL (High-Performance Linpack) benchmark used to compare supercomputers.
If the Pi 3B+ achieves 7.5 Gflops, that would place it 29th among the world's fastest supercomputers in 1993. It would then have stayed on the list until 1997, at which point the top 500 supercomputers in the world all became faster than the Raspberry Pi. From my point of view 1997 is not that long ago, though I suppose it really is.
Re: Pi3B+ Hopefully Correct Results Under Load
I'm happy to report that the Pi 3B+ which I've been using does not lock up at standard clock settings. Moreover, it is able to correctly solve systems of linear equations using the OpenBLAS subroutine library tested with the Linpack benchmark. Note that I have compiled OpenBLAS from source, because the version distributed with Raspbian is slow, having been compiled for ARMv6 compatibility. For the MPI library and Fortran compiler I used
$ apt-get install libopenmpi-dev gfortran
to install the standard binary packages from Raspbian. The results indicate that the 3B+ scores 6.718 Gflops, which is about 4.76 percent faster than the original Raspberry Pi 3B at default clock settings. While this is less than the roughly 17 percent expected from the clock speed, I am quite happy that correct answers were always produced. It would be great if someone else could confirm this as a best-effort result at default settings or obtain a better one.
At any rate, it is a big improvement compared to my old 3B, which frequently crashes at the default clock settings. No heatsink was used; however, a hairdryer set to cold was directed toward the Pi during the run. As a result there was no throttling, as indicated by the output of the monitoring script below. No timings were made with the hairdryer set to hot mode, nor will they be made using my Pi!
The HPL output for the run:
Code: Select all
================================================================================
HPLinpack 2.2  --  High-Performance Linpack benchmark  --   February 24, 2016
Written by A. Petitet and R. Clint Whaley, Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================
An explanation of the input/output parameters follows:
T/V : Wall time / encoded variant.
N : The order of the coefficient matrix A.
NB : The partitioning blocking factor.
P : The number of process rows.
Q : The number of process columns.
Time : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.
The following parameter values will be used:
N : 8000
NB : 256
PMAP : Row-major process mapping
P : 1
Q : 1
PFACT : Left
NBMIN : 2
NDIV : 2
RFACT : Right
BCAST : 2ring
DEPTH : 0
SWAP : Mix (threshold = 64)
L1 : transposed form
U : transposed form
EQUIL : yes
ALIGN : 8 double precision words

- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
      ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be               1.110223e-16
- Computational tests pass if scaled residuals are less than                16.0
================================================================================
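T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR02R2L2        8000   256     1     1              50.82              6.718e+00
HPL_pdgesv() start time Mon Apr 23 17:00:48 2018
HPL_pdgesv() end time   Mon Apr 23 17:01:39 2018
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0025941 ...... PASSED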
================================================================================
Finished 1 tests with the following results:
1 tests completed and passed residual checks,
0 tests completed and failed residual checks,
0 tests skipped because of illegal input values.

End of Tests.
================================================================================
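The Gflops figure in the output above can be cross-checked from N and the wall time. A small sketch, assuming the usual HPL operation count of (2/3)N^3 + (3/2)N^2 floating point operations for the solve; the function name hpl_gflops is made up for illustration:

```shell
#!/bin/sh
# Recompute the HPL rate from the matrix order and the solve time.
# Assumption: the operation count is (2/3)*N^3 + (3/2)*N^2.
hpl_gflops() {
    # $1 = matrix order N, $2 = wall time in seconds
    awk -v n="$1" -v t="$2" \
        'BEGIN { printf "%.3f\n", ((2.0 / 3.0) * n * n * n + 1.5 * n * n) / t / 1e9 }'
}

# Values taken from the result line above: N=8000, Time=50.82
hpl_gflops 8000 50.82
```

This prints 6.718, matching the reported rate.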
Monitoring output during the run (elapsed seconds, ARM clock, temperature, core voltage, throttle flags):
Code: Select all
0 frequency(45)=600000000 temp=39.2'C volt=1.2000V throttled=0x0
3 frequency(45)=600000000 temp=38.6'C volt=1.2000V throttled=0x0
6 frequency(45)=600000000 temp=37.6'C volt=1.2000V throttled=0x0
9 frequency(45)=1400002000 temp=39.2'C volt=1.3750V throttled=0x0
12 frequency(45)=1400000000 temp=39.7'C volt=1.3750V throttled=0x0
15 frequency(45)=1400002000 temp=39.7'C volt=1.3750V throttled=0x0
18 frequency(45)=1400000000 temp=49.4'C volt=1.3750V throttled=0x0
21 frequency(45)=1400000000 temp=53.7'C volt=1.3750V throttled=0x0
24 frequency(45)=1400000000 temp=55.8'C volt=1.3750V throttled=0x0
27 frequency(45)=1400000000 temp=58.5'C volt=1.3750V throttled=0x0
30 frequency(45)=1400000000 temp=60.1'C volt=1.3750V throttled=0x0
33 frequency(45)=1400000000 temp=60.1'C volt=1.3813V throttled=0x0
36 frequency(45)=1400000000 temp=60.1'C volt=1.3813V throttled=0x0
39 frequency(45)=1400000000 temp=61.2'C volt=1.3813V throttled=0x0
42 frequency(45)=1400000000 temp=61.2'C volt=1.3813V throttled=0x0
45 frequency(45)=1400000000 temp=61.2'C volt=1.3813V throttled=0x0
48 frequency(45)=1400002000 temp=61.2'C volt=1.3813V throttled=0x0
51 frequency(45)=1399998000 temp=61.2'C volt=1.3813V throttled=0x0
54 frequency(45)=1400000000 temp=61.2'C volt=1.3813V throttled=0x0
57 frequency(45)=1400000000 temp=62.3'C volt=1.3813V throttled=0x0
60 frequency(45)=1400000000 temp=61.2'C volt=1.3813V throttled=0x0
63 frequency(45)=1400000000 temp=60.7'C volt=1.3813V throttled=0x0
66 frequency(45)=1400000000 temp=52.6'C volt=1.3750V throttled=0x0
69 frequency(45)=1400000000 temp=49.4'C volt=1.3750V throttled=0x0
72 frequency(45)=1400000000 temp=47.2'C volt=1.3750V throttled=0x0
75 frequency(45)=600000000 temp=43.5'C volt=1.2000V throttled=0x0
78 frequency(45)=600000000 temp=41.9'C volt=1.2000V throttled=0x0
81 frequency(45)=600000000 temp=40.8'C volt=1.2000V throttled=0x0
84 frequency(45)=600000000 temp=39.7'C volt=1.2000V throttled=0x0
87 frequency(45)=600000000 temp=39.7'C volt=1.2000V throttled=0x0
The output above was produced by the following script:
Code: Select all
#!/bin/bash
# Bits reported by vcgencmd get_throttled:
# 00 (0x00001): under-voltage
# 01 (0x00002): arm frequency capped
# 02 (0x00004): currently throttled
# 17 (0x20000): arm frequency capped has occurred
# 18 (0x40000): throttling has occurred
t=0
while true
do
    # Sample the ARM clock, temperature, core voltage and throttle flags
    frequency=$(vcgencmd measure_clock arm)
    temp=$(vcgencmd measure_temp)
    volt=$(vcgencmd measure_volts core)
    throttled=$(vcgencmd get_throttled)
    # One line per sample, prefixed with the elapsed time in seconds
    echo "$t $frequency $temp $volt $throttled"
    t=$((t + 3))
    sleep 3
done
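For anyone whose run does not report 0x0, the throttled value can be decoded bit by bit. A minimal sketch that uses only the bit meanings listed in the comments of the script above, so any other bits are ignored here; the function name decode_throttled is made up for illustration:

```shell
#!/bin/sh
# Decode a value printed by "vcgencmd get_throttled".
# The body runs in a subshell ( ... ) so its variables stay private.
decode_throttled() (
    raw=${1#throttled=}     # accept "throttled=0x40004" or plain "0x40004"
    value=$(( $raw ))       # shell arithmetic understands the 0x prefix
    found=0
    check() {               # check MASK DESCRIPTION
        if [ $(( value & $1 )) -ne 0 ]; then
            echo "$2"
            found=1
        fi
    }
    check 0x00001 "under-voltage"
    check 0x00002 "arm frequency capped"
    check 0x00004 "currently throttled"
    check 0x20000 "arm frequency capped has occurred"
    check 0x40000 "throttling has occurred"
    [ "$found" -eq 1 ] || echo "no flags set"
)

# Sample values; on a Pi the argument would be "$(vcgencmd get_throttled)"
decode_throttled "throttled=0x0"
decode_throttled 0x40004
```

The first call prints "no flags set", as in the run above; the second prints "currently throttled" and "throttling has occurred".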

Re: Pi3B+ Hopefully Correct Results Under Load
Before comparing Linpack benchmark speeds, we should remember that there are three official versions. Results, currently up to 2014, are available from Netlib in:
http://netlib.org/benchmark/performance.pdf
The first operates on a matrix of order 100 in a Fortran environment at 64-bit floating-point precision. In 1996, Netlib accepted my C version as suitable for PCs (there as LipackPC.c). This is the one I run on Raspberry Pi systems. On the Raspberry Pi 3B+, the double-precision speed obtained was 210 MFLOPS via a 32-bit compilation and 397 MFLOPS at 64 bits (that is useful, isn’t it?). See the following, which also includes SP results up to 605 MFLOPS:
viewtopic.php?f=31&t=44080&start=75#p1300116
The second Linpack benchmark is for solving a system of equations of order 1000, with no restriction on the method or its implementation. Then we have the High-Performance Linpack.
I ran the HPL benchmark on my original RPi 3 and results are at:
viewtopic.php?f=31&t=44080&p=1026831&hilit=hpl#p1026831
The benchmark no longer exists on my Raspbian SD card. Is there one that I can just download and run? At this time, I don’t have time to play with a complicated installation.
To me, there appears to be something seriously wrong with the implementation. I have included results for an Intel Atom that indicate the performance profiles I would expect. I would not expect those increases in MFLOPS on doubling the problem size or, assuming my affinity directives were correct, the number of cores used. Then someone might provide an explanation, justifying the results.
Re: Pi3B+ Hopefully Correct Results Under Load
RoyLongbottom wrote: Wed Apr 25, 2018 11:52 am
> The benchmark no longer exists on my Raspbian SD card. Is there one that I can just download and run? At this time, I don’t have time to play with a complicated installation.
> To me, there appears to be something seriously wrong with the implementation. I have included results for an Intel Atom that indicate the performance profiles I would expect. I would not expect those increases in MFLOPS on doubling the problem size or, assuming my affinity directives were correct, the number of cores used. Then someone might provide an explanation, justifying the results.
I have a binary that will run on the most recent version of Raspbian. As you mentioned, the compilation and runtime configuration can be complicated. There is also the question of whether vcgencmd, used to monitor temperature and clock speed while the program runs, creates noticeable overhead that slows things down, and whether OpenMPI is the best MPI library to use on the Pi.
For these reasons I am not certain my Gflop numbers are the best possible. Also for these reasons it is a better verification for someone to think through the configuration issues independently to see if they obtain speeds which are consistent or better. Even so, I'll see about putting my binary someplace for download, as such things have also helped people diagnose stability and cooling issues. It is interesting that your Pi 3B also crashed for matrices where n=8000. From a stability point of view the Pi 3B+ is much improved over the original 3B.
Re: Pi3B+ Hopefully Correct Results Under Load
This recent run of the downloadable binary independently compiled by Dr Weaver and made available here suggests 6.78 Gflops as a best-effort Linpack score for the 3B+ computer.