tkaiser wrote: ↑Sun Mar 25, 2018 5:13 pm
Not all, see the C-Ray test results and explanation: https://libre.computer/2018/03/21/raspb ... omparison/
3 different Cortex-A53 SoCs, all clocked between 1.4GHz and 1.5GHz so the 32-bit vs. 64-bit difference can be studied in detail.
Agreed. If 32-bit integers and addresses are sufficient, then 32-bit mode can be faster (due in part to memory bandwidth effects). Another comparison, using code similar but not exactly the same as in sysbench, shows 32-bit integers performing at twice the speed as 64-bit integers on x86 hardware running in 64-bit mode. At the same time 32-bit integers are 15 times faster than 64-bit integers on Raspberry Pi 3B running in 32-bit mode. It would be interesting to run the same test with the 3B in 64-bit mode to see how things change.
Re: 64-bit operating system
ejolson,
ejolson wrote: "...shows 32-bit integers performing at twice the speed as 64-bit integers on x86 hardware running in 64-bit mode."
That seemed so unlikely I had to try for myself.
My little test below runs in about 16.5 seconds no matter if the type used is a 32 or 64 bit integer on my Intel 64 bit Surface Pro. After many runs I could lean toward saying the 64 bit version is half a second or so slower.
I would think something is very broken if using 64 bit ints on a 64 bit machine is half as fast as using 32 bit ints.
Code: Select all
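#include <stdio.h>
#include <stdint.h>

/* Select the integer width under test by swapping the typedef. */
typedef uint32_t fibo_t;
//typedef uint64_t fibo_t;

/* Naive doubly recursive Fibonacci: fibo(0) = 0, fibo(1) = 1. */
fibo_t fibo (fibo_t n) {
    if (n == 0)
    {
        return 0;
    }
    if (n < 2)
    {
        return 1;
    }
    return (fibo(n - 2) + fibo(n - 1));
}

int main(void)
{
    for (fibo_t n = 0; n <= 46; n++)
    {
        fibo_t f = fibo (n);
        /* Cast so that one format string is correct for both typedefs. */
        printf("fibo(%llu) = %llu\n", (unsigned long long)n, (unsigned long long)f);
    }
    return(0);
}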
Memory in C++ is a leaky abstraction .
Re: 64-bit operating system
I suggest looking at the classic work for x86 instruction timings:-
http://www.agner.org/optimize/instruction_tables.pdf
Re: 64-bit operating system
jahboater,
Interesting document thanks. Does Intel even still publish instruction timings for their CPUs?
I'm going to give it a miss though. There is no way I'm about to rewrite my code in assembler, spending months wading through that 300 plus page document to find the optimal way to do everything. Also I'm not about to modify the code generator of my compilers to change the instruction sequences they generate.
Also, I think you will find that it is impossible to know the execution time of any single instruction in your program anymore. Modern processors are loaded with pipelines, out of order execution, speculative execution, parallel dispatch etc. All of which makes the execution time of any instruction quite variable, depending on what is around it and how your program flows.
That's before we talk about the very large variability in execution time of any instruction depending on your program's use of multiple levels of cache memory. It's pretty much not worth the effort to micro-optimize by selecting instructions when your program can be slowed by a factor of 10 or 1000 or more by bad use of cache.
Re: 64-bit operating system
Definitely.
But the discussion here seemed to be going down a hole, so some actual measured numbers make a fun change.
You were comparing 32-bit and 64-bit times, both running in 64-bit mode. As you found, they are almost identical. 64-bit division takes longer for obvious reasons; oddly, 64-bit multiply is sometimes a cycle faster, and the extra prefix byte for 16 and 64 bit instructions may make a slight difference.
This document is interesting too http://www.agner.org/optimize/microarchitecture.pdf
Index here http://www.agner.org/optimize/#manuals
Re: 64-bit operating system
jahboater,
jahboater wrote: "You were comparing 32-bit and 64-bit times - both running in 64-bit mode."
Yes, because that is what ejolson was talking about when he said "shows 32-bit integers performing at twice the speed as 64-bit integers on x86 hardware running in 64-bit mode".
Which I find rather unbelievable.
I'm not sure about the prefix bytes, I did not look so hard at the generated code. But the 64 bit fibo() function I posted above is 112 bytes longer than the 783 bytes of the 32 bit version.
Actually, I'm wondering why they are both so huge for such a short, simple function?
Re: 64-bit operating system
They are both identical in size on my PC (gcc 7.3).
Code: Select all
   text    data     bss     dec     hex filename
   1169     544       8    1721     6b9 try
Just the usual C overhead I suppose: crt0 (the startup code), the GOT for the linker, etc.
You have to write stuff in assembler to get really tiny (a few bytes) programs.
The printf needs adjusting perhaps between 32/64 bits.
Re: 64-bit operating system
fibo with 64 bit integers is the first listing below, and fibo with 32 bit integers is the second.
You can see a "48" prefix byte for "decq" compared to "decl". There are not many of them.
Code: Select all
fibo:
  0000 55            pushq  %rbp
  0001 31ED          xorl   %ebp, %ebp        # add_acc_9
  0003 53            pushq  %rbx
  0004 488D5FFE      leaq   -2(%rdi), %rbx    #, ivtmp.15
  0008 4883EC08      subq   $8, %rsp
.L3:
# try.c:9: if (n == 0)
  000c 4883FBFE      cmpq   $-2, %rbx         #, ivtmp.15
  0010 7416          je     .L4
# try.c:13: if (n < 2)
  0012 4883FBFF      cmpq   $-1, %rbx         #, ivtmp.15
  0016 7414          je     .L5
# try.c:17: return (fibo(n - 2) + fibo(n - 1));
  0018 4889DF        movq   %rbx, %rdi        # ivtmp.15,
  001b 48FFCB        decq   %rbx              # ivtmp.15
  001e E8DDFFFFFF    call   fibo
  0023 4801C5        addq   %rax, %rbp        # _2, add_acc_9
  0026 EBE4          jmp    .L3
.L4:
# try.c:11: return 0;
  0028 31C0          xorl   %eax, %eax        # _4
  002a EB05          jmp    .L2
.L5:
# try.c:15: return 1;
  002c B801000000    movl   $1, %eax          #, _4
.L2:
  0031 4801E8        addq   %rbp, %rax        # add_acc_9, tmp93
# try.c:18: }
  0034 5A            popq   %rdx
  0035 5B            popq   %rbx
  0036 5D            popq   %rbp
  0037 C3            ret
Code: Select all
fibo:
  0000 55            pushq  %rbp
  0001 31ED          xorl   %ebp, %ebp        # add_acc_9
  0003 53            pushq  %rbx
  0004 8D5FFE        leal   -2(%rdi), %ebx    #, ivtmp.15
  0007 4883EC08      subq   $8, %rsp
.L3:
# try.c:9: if (n == 0)
  000b 83FBFE        cmpl   $-2, %ebx         #, ivtmp.15
  000e 7412          je     .L4
# try.c:13: if (n < 2)
  0010 83FBFF        cmpl   $-1, %ebx         #, ivtmp.15
  0013 7411          je     .L5
# try.c:17: return (fibo(n - 2) + fibo(n - 1));
  0015 89DF          movl   %ebx, %edi        # ivtmp.15,
  0017 FFCB          decl   %ebx              # ivtmp.15
  0019 E8E2FFFFFF    call   fibo
  001e 01C5          addl   %eax, %ebp        # _2, add_acc_9
  0020 EBE9          jmp    .L3
.L4:
# try.c:11: return 0;
  0022 31C0          xorl   %eax, %eax        # _4
  0024 EB05          jmp    .L2
.L5:
# try.c:15: return 1;
  0026 B801000000    movl   $1, %eax          #, _4
.L2:
  002b 01E8          addl   %ebp, %eax        # add_acc_9, tmp102
# try.c:18: }
  002d 5A            popq   %rdx
  002e 5B            popq   %rbx
  002f 5D            popq   %rbp
  0030 C3            ret
Re: 64-bit operating system
The speed of a recursive implementation of the Fibonacci sequence mostly reflects function call overhead. Integer arithmetic, in this case, is a trivial part of the total execution time. Try running the code I mentioned if you want to see a difference.
Re: 64-bit operating system
This does a lot of divisions (probably much like sysbench).
if(n%prime[k]==0) return 0;
64-bit division takes much longer than 32-bit division, because it's doing more, and that dominates the time.
Most other instructions take the same time, even things like popcount.
Again, see http://www.agner.org/optimize/instruction_tables.pdf
I'm not sure what the point of comparing the speed of 32 and 64-bit operations in the same 64-bit mode is?
The real difference is comparing 64-bit division on a 32-bit platform (Pi) with a 64-bit platform (say Odroid C2), both with the same cpu.
pi3+ (Cortex A53 in 32 bit mode, gcc 7.3, 1.4GHz)
Found a total of 664579 primes (64-bit)
real 0m14.691s
user 0m14.680s
sys 0m0.000s
Odroid C2 (Cortex A53 in 64 bit mode, gcc 7.3, 1.68GHz)
Found a total of 664579 primes (64-bit)
real 0m2.689s
user 0m2.680s
sys 0m0.000s
Correcting for clock speed gives 3.22 sec for 64-bit mode compared to 14.69 sec for 32-bit mode.
Note that the Pi does the 64-bit division with a library call.
Now the same thing with 32-bit division that the Pi can do:-
Pi
Found a total of 664579 primes (32-bit)
real 0m3.332s
user 0m3.330s
sys 0m0.000s
odroid c2
Found a total of 664579 primes (32-bit)
real 0m2.493s
user 0m2.480s
sys 0m0.000s
Correcting for clock speed gives 2.99 sec for 64bit mode compared to 3.33 sec for 32-bit mode.
Perhaps this is a plausible measure of the 32/64 speed difference: 11.4% faster for this little benchmark.
Re: 64-bit operating system
Another way of looking at this is that the RPF did some very clever engineering with the 3B+ to extend the life of the 40nm SOC by the 16% clock speed hike. Perhaps going to 64-bit might give say 10-15% speed increase again, with no change to the SOC - extending its life even more and delaying the expensive die shrink.
Re: 64-bit operating system
In some operations on the VPU we do hit SDRAM bandwidth problems (moving video around), I do wonder if this can happen when you move to 64bit integers - you are accessing the RAM more, which may be a cause of very slight slowdowns when you have a very (very) high number of SDRAM accesses going on.
Principal Software Engineer at Raspberry Pi Ltd.
Working in the Applications Team.
Re: 64-bit operating system
jamesh wrote: ↑Mon Mar 26, 2018 9:05 am
In some operations on the VPU we do hit SDRAM bandwidth problems (moving video around), I do wonder if this can happen when you move to 64bit integers - you are accessing the RAM more, which may be a cause of very slight slowdowns when you have a very (very) high number of SDRAM accesses going on.
Yes agreed, and 64-bit pointers too.
For scalar stuff though (e.g. not arrays or big structs), there is generally less memory access because there are twice the number of registers. For example it is much less common to put local scalar variables on the stack when you have 31 integer registers and 32 float registers - and so the stack needs adjusting less often and function calls are faster (how often do you have more than 31 integer/pointer local variables!).
Re: 64-bit operating system
jahboater,
jahboater wrote: "They are both identical in size on my PC (gcc 7.3)... Just the usual C overhead I suppose."
Strangely enough the resulting executables are exactly the same size when using 32 bit ints or 64.
But, the sizes I quoted are for only the fibo() function itself, excluding the rest of the code. The fibo() using 64 bit ints is bigger. They both seem to be huge for such a small simple function. I am using GCC version 5.4.0 and -O3.
Hmm...that's odd. My generated code looks very different. fibo() with 64 bit ints comes first, then fibo() with 32 bit ints.
fibo() with 64 bit ints:
Code: Select all
0000000000400600 <fibo>:
400600: 48 85 ff test %rdi,%rdi
400603: 0f 84 36 03 00 00 je 40093f <fibo+0x33f>
400609: 48 83 ff 01 cmp $0x1,%rdi
40060d: 0f 86 26 03 00 00 jbe 400939 <fibo+0x339>
400613: 41 57 push %r15
400615: 41 56 push %r14
400617: 48 8d 47 fe lea -0x2(%rdi),%rax
40061b: 41 55 push %r13
40061d: 41 54 push %r12
40061f: 55 push %rbp
400620: 53 push %rbx
400621: 48 83 ec 68 sub $0x68,%rsp
400625: 48 89 04 24 mov %rax,(%rsp)
400629: 48 c7 44 24 08 00 00 movq $0x0,0x8(%rsp)
400630: 00 00
400632: 48 85 c0 test %rax,%rax
400635: 0f 84 fa 02 00 00 je 400935 <fibo+0x335>
40063b: 48 83 f8 01 cmp $0x1,%rax
40063f: 0f 84 bb 02 00 00 je 400900 <fibo+0x300>
400645: 48 83 e8 02 sub $0x2,%rax
400649: 48 c7 44 24 38 00 00 movq $0x0,0x38(%rsp)
400650: 00 00
400652: 48 89 44 24 10 mov %rax,0x10(%rsp)
400657: 48 85 c0 test %rax,%rax
40065a: 0f 84 ee 02 00 00 je 40094e <fibo+0x34e>
400660: 48 83 f8 01 cmp $0x1,%rax
400664: 0f 84 d8 02 00 00 je 400942 <fibo+0x342>
40066a: 48 83 e8 02 sub $0x2,%rax
40066e: 48 c7 44 24 40 00 00 movq $0x0,0x40(%rsp)
400675: 00 00
400677: 48 89 44 24 18 mov %rax,0x18(%rsp)
40067c: 48 85 c0 test %rax,%rax
40067f: 0f 84 77 02 00 00 je 4008fc <fibo+0x2fc>
400685: 48 83 f8 01 cmp $0x1,%rax
400689: 0f 84 61 02 00 00 je 4008f0 <fibo+0x2f0>
40068f: 48 83 e8 02 sub $0x2,%rax
400693: 48 c7 44 24 48 00 00 movq $0x0,0x48(%rsp)
40069a: 00 00
40069c: 48 89 44 24 20 mov %rax,0x20(%rsp)
4006a1: 48 85 c0 test %rax,%rax
4006a4: 0f 84 0b 02 00 00 je 4008b5 <fibo+0x2b5>
4006aa: 48 83 f8 01 cmp $0x1,%rax
4006ae: 0f 84 f5 01 00 00 je 4008a9 <fibo+0x2a9>
4006b4: 48 83 e8 02 sub $0x2,%rax
4006b8: 48 c7 44 24 50 00 00 movq $0x0,0x50(%rsp)
4006bf: 00 00
4006c1: 48 85 c0 test %rax,%rax
4006c4: 48 89 44 24 28 mov %rax,0x28(%rsp)
4006c9: 0f 84 52 01 00 00 je 400821 <fibo+0x221>
4006cf: 48 83 f8 01 cmp $0x1,%rax
4006d3: 0f 84 c1 01 00 00 je 40089a <fibo+0x29a>
4006d9: 48 83 e8 02 sub $0x2,%rax
4006dd: 48 c7 44 24 58 00 00 movq $0x0,0x58(%rsp)
4006e4: 00 00
4006e6: 48 85 c0 test %rax,%rax
4006e9: 48 89 44 24 30 mov %rax,0x30(%rsp)
4006ee: 0f 84 e3 00 00 00 je 4007d7 <fibo+0x1d7>
4006f4: 45 31 f6 xor %r14d,%r14d
4006f7: 48 83 f8 01 cmp $0x1,%rax
4006fb: 4c 8d 68 fe lea -0x2(%rax),%r13
4006ff: 0f 84 53 01 00 00 je 400858 <fibo+0x258>
400705: 4d 85 ed test %r13,%r13
400708: 0f 84 8e 00 00 00 je 40079c <fibo+0x19c>
40070e: 45 31 ff xor %r15d,%r15d
400711: 49 83 fd 01 cmp $0x1,%r13
400715: 4d 8d 65 fe lea -0x2(%r13),%r12
400719: 0f 84 c1 00 00 00 je 4007e0 <fibo+0x1e0>
40071f: 4d 85 e4 test %r12,%r12
400722: 74 47 je 40076b <fibo+0x16b>
400724: 0f 1f 40 00 nopl 0x0(%rax)
400728: 49 83 fc 01 cmp $0x1,%r12
40072c: 74 72 je 4007a0 <fibo+0x1a0>
40072e: 4c 89 e3 mov %r12,%rbx
400731: 31 ed xor %ebp,%ebp
400733: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
400738: 48 8d 7b fe lea -0x2(%rbx),%rdi
40073c: 48 83 eb 01 sub $0x1,%rbx
400740: e8 bb fe ff ff callq 400600 <fibo>
400745: 48 01 c5 add %rax,%rbp
400748: 48 83 fb 01 cmp $0x1,%rbx
40074c: 75 ea jne 400738 <fibo+0x138>
40074e: 49 83 fc ff cmp $0xffffffffffffffff,%r12
400752: 4e 8d 7c 3d 01 lea 0x1(%rbp,%r15,1),%r15
400757: 74 25 je 40077e <fibo+0x17e>
400759: 4d 85 e4 test %r12,%r12
40075c: 49 8d 44 24 ff lea -0x1(%r12),%rax
400761: 74 17 je 40077a <fibo+0x17a>
400763: 49 89 c4 mov %rax,%r12
400766: 4d 85 e4 test %r12,%r12
400769: 75 bd jne 400728 <fibo+0x128>
40076b: 31 c0 xor %eax,%eax
40076d: 49 01 c7 add %rax,%r15
400770: 4d 85 e4 test %r12,%r12
400773: 49 8d 44 24 ff lea -0x1(%r12),%rax
400778: 75 e9 jne 400763 <fibo+0x163>
40077a: 49 83 c7 01 add $0x1,%r15
40077e: 4d 01 fe add %r15,%r14
400781: 49 83 fd ff cmp $0xffffffffffffffff,%r13
400785: 74 24 je 4007ab <fibo+0x1ab>
400787: 4d 85 ed test %r13,%r13
40078a: 49 8d 45 ff lea -0x1(%r13),%rax
40078e: 74 17 je 4007a7 <fibo+0x1a7>
400790: 49 89 c5 mov %rax,%r13
400793: 4d 85 ed test %r13,%r13
400796: 0f 85 72 ff ff ff jne 40070e <fibo+0x10e>
40079c: 31 c0 xor %eax,%eax
40079e: eb 45 jmp 4007e5 <fibo+0x1e5>
4007a0: b8 01 00 00 00 mov $0x1,%eax
4007a5: eb c6 jmp 40076d <fibo+0x16d>
4007a7: 49 83 c6 01 add $0x1,%r14
4007ab: 4c 01 74 24 58 add %r14,0x58(%rsp)
4007b0: 48 83 7c 24 30 ff cmpq $0xffffffffffffffff,0x30(%rsp)
4007b6: 74 38 je 4007f0 <fibo+0x1f0>
4007b8: 48 8b 4c 24 30 mov 0x30(%rsp),%rcx
4007bd: 48 89 c8 mov %rcx,%rax
4007c0: 48 83 e8 01 sub $0x1,%rax
4007c4: 48 85 c9 test %rcx,%rcx
4007c7: 74 21 je 4007ea <fibo+0x1ea>
4007c9: 48 85 c0 test %rax,%rax
4007cc: 48 89 44 24 30 mov %rax,0x30(%rsp)
4007d1: 0f 85 1d ff ff ff jne 4006f4 <fibo+0xf4>
4007d7: 31 c0 xor %eax,%eax
4007d9: e9 7f 00 00 00 jmpq 40085d <fibo+0x25d>
4007de: 66 90 xchg %ax,%ax
4007e0: b8 01 00 00 00 mov $0x1,%eax
4007e5: 49 01 c6 add %rax,%r14
4007e8: eb 9d jmp 400787 <fibo+0x187>
4007ea: 48 83 44 24 58 01 addq $0x1,0x58(%rsp)
4007f0: 48 8b 74 24 58 mov 0x58(%rsp),%rsi
4007f5: 48 01 74 24 50 add %rsi,0x50(%rsp)
4007fa: 48 83 7c 24 28 ff cmpq $0xffffffffffffffff,0x28(%rsp)
400800: 74 29 je 40082b <fibo+0x22b>
400802: 48 8b 54 24 28 mov 0x28(%rsp),%rdx
400807: 48 89 d0 mov %rdx,%rax
40080a: 48 83 e8 01 sub $0x1,%rax
40080e: 48 85 d2 test %rdx,%rdx
400811: 74 12 je 400825 <fibo+0x225>
400813: 48 85 c0 test %rax,%rax
400816: 48 89 44 24 28 mov %rax,0x28(%rsp)
40081b: 0f 85 ae fe ff ff jne 4006cf <fibo+0xcf>
400821: 31 c0 xor %eax,%eax
400823: eb 7a jmp 40089f <fibo+0x29f>
400825: 48 83 44 24 50 01 addq $0x1,0x50(%rsp)
40082b: 48 8b 4c 24 50 mov 0x50(%rsp),%rcx
400830: 48 01 4c 24 48 add %rcx,0x48(%rsp)
400835: 48 83 7c 24 20 ff cmpq $0xffffffffffffffff,0x20(%rsp)
40083b: 74 30 je 40086d <fibo+0x26d>
40083d: 48 8b 74 24 20 mov 0x20(%rsp),%rsi
400842: 48 89 f0 mov %rsi,%rax
400845: 48 83 e8 01 sub $0x1,%rax
400849: 48 85 f6 test %rsi,%rsi
40084c: 74 19 je 400867 <fibo+0x267>
40084e: 48 89 44 24 20 mov %rax,0x20(%rsp)
400853: e9 49 fe ff ff jmpq 4006a1 <fibo+0xa1>
400858: b8 01 00 00 00 mov $0x1,%eax
40085d: 48 01 44 24 58 add %rax,0x58(%rsp)
400862: e9 51 ff ff ff jmpq 4007b8 <fibo+0x1b8>
400867: 48 83 44 24 48 01 addq $0x1,0x48(%rsp)
40086d: 48 8b 54 24 48 mov 0x48(%rsp),%rdx
400872: 48 01 54 24 40 add %rdx,0x40(%rsp)
400877: 48 83 7c 24 18 ff cmpq $0xffffffffffffffff,0x18(%rsp)
40087d: 74 40 je 4008bf <fibo+0x2bf>
40087f: 48 8b 4c 24 18 mov 0x18(%rsp),%rcx
400884: 48 89 c8 mov %rcx,%rax
400887: 48 83 e8 01 sub $0x1,%rax
40088b: 48 85 c9 test %rcx,%rcx
40088e: 74 29 je 4008b9 <fibo+0x2b9>
400890: 48 89 44 24 18 mov %rax,0x18(%rsp)
400895: e9 e2 fd ff ff jmpq 40067c <fibo+0x7c>
40089a: b8 01 00 00 00 mov $0x1,%eax
40089f: 48 01 44 24 50 add %rax,0x50(%rsp)
4008a4: e9 59 ff ff ff jmpq 400802 <fibo+0x202>
4008a9: b8 01 00 00 00 mov $0x1,%eax
4008ae: 48 01 44 24 48 add %rax,0x48(%rsp)
4008b3: eb 88 jmp 40083d <fibo+0x23d>
4008b5: 31 c0 xor %eax,%eax
4008b7: eb f5 jmp 4008ae <fibo+0x2ae>
4008b9: 48 83 44 24 40 01 addq $0x1,0x40(%rsp)
4008bf: 48 8b 74 24 40 mov 0x40(%rsp),%rsi
4008c4: 48 01 74 24 38 add %rsi,0x38(%rsp)
4008c9: 48 83 7c 24 10 ff cmpq $0xffffffffffffffff,0x10(%rsp)
4008cf: 0f 84 93 00 00 00 je 400968 <fibo+0x368>
4008d5: 48 8b 54 24 10 mov 0x10(%rsp),%rdx
4008da: 48 89 d0 mov %rdx,%rax
4008dd: 48 83 e8 01 sub $0x1,%rax
4008e1: 48 85 d2 test %rdx,%rdx
4008e4: 74 6c je 400952 <fibo+0x352>
4008e6: 48 89 44 24 10 mov %rax,0x10(%rsp)
4008eb: e9 67 fd ff ff jmpq 400657 <fibo+0x57>
4008f0: b8 01 00 00 00 mov $0x1,%eax
4008f5: 48 01 44 24 40 add %rax,0x40(%rsp)
4008fa: eb 83 jmp 40087f <fibo+0x27f>
4008fc: 31 c0 xor %eax,%eax
4008fe: eb f5 jmp 4008f5 <fibo+0x2f5>
400900: b8 01 00 00 00 mov $0x1,%eax
400905: 48 01 44 24 08 add %rax,0x8(%rsp)
40090a: 48 83 2c 24 01 subq $0x1,(%rsp)
40090f: 48 8b 04 24 mov (%rsp),%rax
400913: 48 83 f8 ff cmp $0xffffffffffffffff,%rax
400917: 0f 85 15 fd ff ff jne 400632 <fibo+0x32>
40091d: 48 8b 44 24 08 mov 0x8(%rsp),%rax
400922: 48 83 c4 68 add $0x68,%rsp
400926: 5b pop %rbx
400927: 5d pop %rbp
400928: 48 83 c0 01 add $0x1,%rax
40092c: 41 5c pop %r12
40092e: 41 5d pop %r13
400930: 41 5e pop %r14
400932: 41 5f pop %r15
400934: c3 retq
400935: 31 c0 xor %eax,%eax
400937: eb cc jmp 400905 <fibo+0x305>
400939: b8 01 00 00 00 mov $0x1,%eax
40093e: c3 retq
40093f: 31 c0 xor %eax,%eax
400941: c3 retq
400942: b8 01 00 00 00 mov $0x1,%eax
400947: 48 01 44 24 38 add %rax,0x38(%rsp)
40094c: eb 87 jmp 4008d5 <fibo+0x2d5>
40094e: 31 c0 xor %eax,%eax
400950: eb f5 jmp 400947 <fibo+0x347>
400952: 48 8b 4c 24 08 mov 0x8(%rsp),%rcx
400957: 48 8b 44 24 38 mov 0x38(%rsp),%rax
40095c: 48 8d 44 08 01 lea 0x1(%rax,%rcx,1),%rax
400961: 48 89 44 24 08 mov %rax,0x8(%rsp)
400966: eb a2 jmp 40090a <fibo+0x30a>
400968: 48 8b 4c 24 38 mov 0x38(%rsp),%rcx
40096d: 48 01 4c 24 08 add %rcx,0x8(%rsp)
400972: eb 96 jmp 40090a <fibo+0x30a>
400974: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
40097b: 00 00 00
40097e: 66 90 xchg %ax,%ax
fibo() with 32 bit ints:
Code: Select all
0000000000400600 <fibo>:
400600: 85 ff test %edi,%edi
400602: 0f 84 d8 02 00 00 je 4008e0 <fibo+0x2e0>
400608: 83 ff 01 cmp $0x1,%edi
40060b: 0f 86 c9 02 00 00 jbe 4008da <fibo+0x2da>
400611: 41 57 push %r15
400613: 41 56 push %r14
400615: 8d 47 fe lea -0x2(%rdi),%eax
400618: 41 55 push %r13
40061a: 41 54 push %r12
40061c: 55 push %rbp
40061d: 53 push %rbx
40061e: 48 83 ec 38 sub $0x38,%rsp
400622: 89 04 24 mov %eax,(%rsp)
400625: c7 44 24 04 00 00 00 movl $0x0,0x4(%rsp)
40062c: 00
40062d: 85 c0 test %eax,%eax
40062f: 0f 84 a1 02 00 00 je 4008d6 <fibo+0x2d6>
400635: 83 f8 01 cmp $0x1,%eax
400638: 0f 84 69 02 00 00 je 4008a7 <fibo+0x2a7>
40063e: 83 e8 02 sub $0x2,%eax
400641: c7 44 24 1c 00 00 00 movl $0x0,0x1c(%rsp)
400648: 00
400649: 89 44 24 08 mov %eax,0x8(%rsp)
40064d: 85 c0 test %eax,%eax
40064f: 0f 84 99 02 00 00 je 4008ee <fibo+0x2ee>
400655: 83 f8 01 cmp $0x1,%eax
400658: 0f 84 85 02 00 00 je 4008e3 <fibo+0x2e3>
40065e: 83 e8 02 sub $0x2,%eax
400661: c7 44 24 20 00 00 00 movl $0x0,0x20(%rsp)
400668: 00
400669: 89 44 24 0c mov %eax,0xc(%rsp)
40066d: 85 c0 test %eax,%eax
40066f: 0f 84 2e 02 00 00 je 4008a3 <fibo+0x2a3>
400675: 83 f8 01 cmp $0x1,%eax
400678: 0f 84 1a 02 00 00 je 400898 <fibo+0x298>
40067e: 83 e8 02 sub $0x2,%eax
400681: c7 44 24 24 00 00 00 movl $0x0,0x24(%rsp)
400688: 00
400689: 89 44 24 10 mov %eax,0x10(%rsp)
40068d: 85 c0 test %eax,%eax
40068f: 0f 84 d1 01 00 00 je 400866 <fibo+0x266>
400695: 83 f8 01 cmp $0x1,%eax
400698: 0f 84 bd 01 00 00 je 40085b <fibo+0x25b>
40069e: 83 e8 02 sub $0x2,%eax
4006a1: c7 44 24 28 00 00 00 movl $0x0,0x28(%rsp)
4006a8: 00
4006a9: 85 c0 test %eax,%eax
4006ab: 89 44 24 14 mov %eax,0x14(%rsp)
4006af: 0f 84 32 01 00 00 je 4007e7 <fibo+0x1e7>
4006b5: 83 f8 01 cmp $0x1,%eax
4006b8: 0f 84 8f 01 00 00 je 40084d <fibo+0x24d>
4006be: 83 e8 02 sub $0x2,%eax
4006c1: c7 44 24 2c 00 00 00 movl $0x0,0x2c(%rsp)
4006c8: 00
4006c9: 85 c0 test %eax,%eax
4006cb: 89 44 24 18 mov %eax,0x18(%rsp)
4006cf: 0f 84 d7 00 00 00 je 4007ac <fibo+0x1ac>
4006d5: 45 31 f6 xor %r14d,%r14d
4006d8: 83 f8 01 cmp $0x1,%eax
4006db: 44 8d 68 fe lea -0x2(%rax),%r13d
4006df: 0f 84 30 01 00 00 je 400815 <fibo+0x215>
4006e5: 45 85 ed test %r13d,%r13d
4006e8: 0f 84 8a 00 00 00 je 400778 <fibo+0x178>
4006ee: 45 31 ff xor %r15d,%r15d
4006f1: 41 83 fd 01 cmp $0x1,%r13d
4006f5: 45 8d 65 fe lea -0x2(%r13),%r12d
4006f9: 0f 84 b1 00 00 00 je 4007b0 <fibo+0x1b0>
4006ff: 45 85 e4 test %r12d,%r12d
400702: 74 43 je 400747 <fibo+0x147>
400704: 0f 1f 40 00 nopl 0x0(%rax)
400708: 41 83 fc 01 cmp $0x1,%r12d
40070c: 74 6e je 40077c <fibo+0x17c>
40070e: 44 89 e3 mov %r12d,%ebx
400711: 31 ed xor %ebp,%ebp
400713: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
400718: 8d 7b fe lea -0x2(%rbx),%edi
40071b: 83 eb 01 sub $0x1,%ebx
40071e: e8 dd fe ff ff callq 400600 <fibo>
400723: 01 c5 add %eax,%ebp
400725: 83 fb 01 cmp $0x1,%ebx
400728: 75 ee jne 400718 <fibo+0x118>
40072a: 41 83 fc ff cmp $0xffffffff,%r12d
40072e: 46 8d 7c 3d 01 lea 0x1(%rbp,%r15,1),%r15d
400733: 74 25 je 40075a <fibo+0x15a>
400735: 45 85 e4 test %r12d,%r12d
400738: 41 8d 44 24 ff lea -0x1(%r12),%eax
40073d: 74 17 je 400756 <fibo+0x156>
40073f: 41 89 c4 mov %eax,%r12d
400742: 45 85 e4 test %r12d,%r12d
400745: 75 c1 jne 400708 <fibo+0x108>
400747: 31 c0 xor %eax,%eax
400749: 41 01 c7 add %eax,%r15d
40074c: 45 85 e4 test %r12d,%r12d
40074f: 41 8d 44 24 ff lea -0x1(%r12),%eax
400754: 75 e9 jne 40073f <fibo+0x13f>
400756: 41 83 c7 01 add $0x1,%r15d
40075a: 45 01 fe add %r15d,%r14d
40075d: 41 83 fd ff cmp $0xffffffff,%r13d
400761: 74 24 je 400787 <fibo+0x187>
400763: 45 85 ed test %r13d,%r13d
400766: 41 8d 45 ff lea -0x1(%r13),%eax
40076a: 74 17 je 400783 <fibo+0x183>
40076c: 41 89 c5 mov %eax,%r13d
40076f: 45 85 ed test %r13d,%r13d
400772: 0f 85 76 ff ff ff jne 4006ee <fibo+0xee>
400778: 31 c0 xor %eax,%eax
40077a: eb 39 jmp 4007b5 <fibo+0x1b5>
40077c: b8 01 00 00 00 mov $0x1,%eax
400781: eb c6 jmp 400749 <fibo+0x149>
400783: 41 83 c6 01 add $0x1,%r14d
400787: 44 01 74 24 2c add %r14d,0x2c(%rsp)
40078c: 83 7c 24 18 ff cmpl $0xffffffff,0x18(%rsp)
400791: 74 2c je 4007bf <fibo+0x1bf>
400793: 8b 4c 24 18 mov 0x18(%rsp),%ecx
400797: 89 c8 mov %ecx,%eax
400799: 83 e8 01 sub $0x1,%eax
40079c: 85 c9 test %ecx,%ecx
40079e: 74 1a je 4007ba <fibo+0x1ba>
4007a0: 85 c0 test %eax,%eax
4007a2: 89 44 24 18 mov %eax,0x18(%rsp)
4007a6: 0f 85 29 ff ff ff jne 4006d5 <fibo+0xd5>
4007ac: 31 c0 xor %eax,%eax
4007ae: eb 6a jmp 40081a <fibo+0x21a>
4007b0: b8 01 00 00 00 mov $0x1,%eax
4007b5: 41 01 c6 add %eax,%r14d
4007b8: eb a9 jmp 400763 <fibo+0x163>
4007ba: 83 44 24 2c 01 addl $0x1,0x2c(%rsp)
4007bf: 8b 74 24 2c mov 0x2c(%rsp),%esi
4007c3: 01 74 24 28 add %esi,0x28(%rsp)
4007c7: 83 7c 24 14 ff cmpl $0xffffffff,0x14(%rsp)
4007cc: 74 22 je 4007f0 <fibo+0x1f0>
4007ce: 8b 54 24 14 mov 0x14(%rsp),%edx
4007d2: 89 d0 mov %edx,%eax
4007d4: 83 e8 01 sub $0x1,%eax
4007d7: 85 d2 test %edx,%edx
4007d9: 74 10 je 4007eb <fibo+0x1eb>
4007db: 85 c0 test %eax,%eax
4007dd: 89 44 24 14 mov %eax,0x14(%rsp)
4007e1: 0f 85 ce fe ff ff jne 4006b5 <fibo+0xb5>
4007e7: 31 c0 xor %eax,%eax
4007e9: eb 67 jmp 400852 <fibo+0x252>
4007eb: 83 44 24 28 01 addl $0x1,0x28(%rsp)
4007f0: 8b 4c 24 28 mov 0x28(%rsp),%ecx
4007f4: 01 4c 24 24 add %ecx,0x24(%rsp)
4007f8: 83 7c 24 10 ff cmpl $0xffffffff,0x10(%rsp)
4007fd: 74 29 je 400828 <fibo+0x228>
4007ff: 8b 74 24 10 mov 0x10(%rsp),%esi
400803: 89 f0 mov %esi,%eax
400805: 83 e8 01 sub $0x1,%eax
400808: 85 f6 test %esi,%esi
40080a: 74 17 je 400823 <fibo+0x223>
40080c: 89 44 24 10 mov %eax,0x10(%rsp)
400810: e9 78 fe ff ff jmpq 40068d <fibo+0x8d>
400815: b8 01 00 00 00 mov $0x1,%eax
40081a: 01 44 24 2c add %eax,0x2c(%rsp)
40081e: e9 70 ff ff ff jmpq 400793 <fibo+0x193>
400823: 83 44 24 24 01 addl $0x1,0x24(%rsp)
400828: 8b 54 24 24 mov 0x24(%rsp),%edx
40082c: 01 54 24 20 add %edx,0x20(%rsp)
400830: 83 7c 24 0c ff cmpl $0xffffffff,0xc(%rsp)
400835: 74 38 je 40086f <fibo+0x26f>
400837: 8b 4c 24 0c mov 0xc(%rsp),%ecx
40083b: 89 c8 mov %ecx,%eax
40083d: 83 e8 01 sub $0x1,%eax
400840: 85 c9 test %ecx,%ecx
400842: 74 26 je 40086a <fibo+0x26a>
400844: 89 44 24 0c mov %eax,0xc(%rsp)
400848: e9 20 fe ff ff jmpq 40066d <fibo+0x6d>
40084d: b8 01 00 00 00 mov $0x1,%eax
400852: 01 44 24 28 add %eax,0x28(%rsp)
400856: e9 73 ff ff ff jmpq 4007ce <fibo+0x1ce>
40085b: b8 01 00 00 00 mov $0x1,%eax
400860: 01 44 24 24 add %eax,0x24(%rsp)
400864: eb 99 jmp 4007ff <fibo+0x1ff>
400866: 31 c0 xor %eax,%eax
400868: eb f6 jmp 400860 <fibo+0x260>
40086a: 83 44 24 20 01 addl $0x1,0x20(%rsp)
40086f: 8b 74 24 20 mov 0x20(%rsp),%esi
400873: 01 74 24 1c add %esi,0x1c(%rsp)
400877: 83 7c 24 08 ff cmpl $0xffffffff,0x8(%rsp)
40087c: 0f 84 82 00 00 00 je 400904 <fibo+0x304>
400882: 8b 54 24 08 mov 0x8(%rsp),%edx
400886: 89 d0 mov %edx,%eax
400888: 83 e8 01 sub $0x1,%eax
40088b: 85 d2 test %edx,%edx
40088d: 74 63 je 4008f2 <fibo+0x2f2>
40088f: 89 44 24 08 mov %eax,0x8(%rsp)
400893: e9 b5 fd ff ff jmpq 40064d <fibo+0x4d>
400898: b8 01 00 00 00 mov $0x1,%eax
40089d: 01 44 24 20 add %eax,0x20(%rsp)
4008a1: eb 94 jmp 400837 <fibo+0x237>
4008a3: 31 c0 xor %eax,%eax
4008a5: eb f6 jmp 40089d <fibo+0x29d>
4008a7: b8 01 00 00 00 mov $0x1,%eax
4008ac: 01 44 24 04 add %eax,0x4(%rsp)
4008b0: 83 2c 24 01 subl $0x1,(%rsp)
4008b4: 8b 04 24 mov (%rsp),%eax
4008b7: 83 f8 ff cmp $0xffffffff,%eax
4008ba: 0f 85 6d fd ff ff jne 40062d <fibo+0x2d>
4008c0: 8b 44 24 04 mov 0x4(%rsp),%eax
4008c4: 48 83 c4 38 add $0x38,%rsp
4008c8: 5b pop %rbx
4008c9: 5d pop %rbp
4008ca: 83 c0 01 add $0x1,%eax
4008cd: 41 5c pop %r12
4008cf: 41 5d pop %r13
4008d1: 41 5e pop %r14
4008d3: 41 5f pop %r15
4008d5: c3 retq
4008d6: 31 c0 xor %eax,%eax
4008d8: eb d2 jmp 4008ac <fibo+0x2ac>
4008da: b8 01 00 00 00 mov $0x1,%eax
4008df: c3 retq
4008e0: 31 c0 xor %eax,%eax
4008e2: c3 retq
4008e3: b8 01 00 00 00 mov $0x1,%eax
4008e8: 01 44 24 1c add %eax,0x1c(%rsp)
4008ec: eb 94 jmp 400882 <fibo+0x282>
4008ee: 31 c0 xor %eax,%eax
4008f0: eb f6 jmp 4008e8 <fibo+0x2e8>
4008f2: 8b 4c 24 04 mov 0x4(%rsp),%ecx
4008f6: 8b 44 24 1c mov 0x1c(%rsp),%eax
4008fa: 8d 44 08 01 lea 0x1(%rax,%rcx,1),%eax
4008fe: 89 44 24 04 mov %eax,0x4(%rsp)
400902: eb ac jmp 4008b0 <fibo+0x2b0>
400904: 8b 4c 24 1c mov 0x1c(%rsp),%ecx
400908: 01 4c 24 04 add %ecx,0x4(%rsp)
40090c: eb a2 jmp 4008b0 <fibo+0x2b0>
40090e: 66 90 xchg %ax,%ax
Re: 64-bit operating system
ejolson,
ejolson wrote: "The speed of a recursive implementation of the Fibonacci sequence mostly reflects function call overhead."
Yep. Most of the code I'm running most of the time does not do intense amounts of calculation.
ejolson wrote: "Try running the code I mentioned if you want to see a difference."
OK. The 32 bit result comes first, then the 64 bit result. Quite a difference.
The 32 bit result:
Code: Select all
$ time ./a.out; time ./a.out; time ./a.out
Found a total of 664579 primes (32-bit)
real 0m0.896s
user 0m0.859s
sys 0m0.016s
Found a total of 664579 primes (32-bit)
real 0m0.927s
user 0m0.891s
sys 0m0.016s
Found a total of 664579 primes (32-bit)
real 0m0.939s
user 0m0.906s
sys 0m0.000s
The 64 bit result:
Code: Select all
$ time ./a.out; time ./a.out; time ./a.out
Found a total of 664579 primes (64-bit)
real 0m2.743s
user 0m2.734s
sys 0m0.000s
Found a total of 664579 primes (64-bit)
real 0m2.685s
user 0m2.609s
sys 0m0.016s
Found a total of 664579 primes (64-bit)
real 0m2.706s
user 0m2.656s
sys 0m0.000s
Re: 64-bit operating system
Yes, because 64-bit division is slow - even on 64-bit platforms, and it is horrifically slow on 32-bit platforms.
This prime number benchmark is mostly divisions/remainders.
What platform did you run those tests on?
Have you looked at the assembler (with -S -fverbose-asm)? You might even find the 64-bit division is being done by a library call on the Pi, where the 32-bit division is being done with the udiv instruction.
Re: 64-bit operating system
That's because you are using -O3 (and a very old compiler, the last supported version was 6.4).
I never use -O3, it produces absolutely massive code that isn't always faster (it might not fit in the i-cache for example).
-Os produces the most human readable code by the way. Most people use -O2.
You can see the extra 64-bit operand size prefix byte (48) on some instructions - here is the first insn of your fibo function:
400600: 48 85 ff test %rdi,%rdi
400600: 85 ff test %edi,%edi
It's no big deal: ARM instructions are all 4 bytes, and you need a lot more of them.
Re: 64-bit operating system
jahboater,
I'm running these tests on a Microsoft Surface Pro. Sorry I don't have a Pi to hand just now.
jahboater wrote: "That's because you are using -O3 (and a very old compiler, the last supported version was 6.4). I never use -O3, it produces absolutely massive code that isn't always faster (it might not fit in the i-cache for example). -Os produces the most human readable code by the way. Most people use -O2."
Old compiler, what? It's what is in the current Ubuntu for the Windows Subsystem for Linux.
This is contrary to my past experience where -O3 has generally been the better performer.
In that case changing to -Os shrinks the fibo() function dramatically, down to about 55 bytes. The 64 bit version is a few bytes smaller.
However -Os increases the run time to 18 seconds for 32 bits and 21 seconds for 64 bits. A significant drop in performance.
-O2 turns in much the same timing as -Os.
Memory in C++ is a leaky abstraction .
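If anyone wants to repeat the -Os / -O2 / -O3 comparison, a sketch along these lines times the recursive function from inside the program instead of relying on the shell. `time_fibo` is a hypothetical helper of mine, and `clock()` measures CPU time, not wall-clock time.

```c
#include <stdint.h>
#include <time.h>

typedef uint32_t fibo_t;   /* change to uint64_t for the 64-bit run */

/* Naive recursive Fibonacci, same shape as the test earlier in the thread. */
fibo_t fibo(fibo_t n)
{
    if (n < 2) {
        return n;
    }
    return fibo(n - 2) + fibo(n - 1);
}

/* CPU seconds spent computing fibo(n); the value is returned through *out. */
double time_fibo(fibo_t n, fibo_t *out)
{
    clock_t start = clock();
    *out = fibo(n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Build the same file once per optimisation level and compare the returned seconds for something like n = 40. When printing a uint32_t result, PRIu32 from <inttypes.h> is the portable format specifier; %lu only matches on platforms where unsigned long happens to have the same width.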
Re: 64-bit operating system
FYI, the bootable Gentoo 64-bit image for the RPi3 has now been updated to support the B+.
For more details, please see this post.
Re: 64-bit operating system
I performed a similar test using a 1.4GHz ARM Cortex-A53 core running in 64-bit mode and concluded that 9% of the speed increase was due to running in 64-bit mode. I suspect the 3% discrepancy in our results is due to the differing memory speeds, which have not been taken into account. I'll put your Raspberry Pi 3B+ timings into this table. I'm leaving your 64-bit timings out of the table for now because they appear to result from an overclocked system. It sure would be nice to have some timings of the Raspberry Pi 3B+ running in 64-bit mode to compare.
Note that the 64-bit integer Raspberry Pi 3B+ timings, though slow, are much faster than I would have expected. I understand you used gcc-7.3, but what optimizer settings were used?
Re: 64-bit operating system
Obviously for a fair comparison it would have been the same for 32 and 64 bit ints, but I can't remember if I temporarily changed -Os to -O3 for the benchmarks in my standard makefile (sorry).
-march=native -Os -mneon-for-64bits
"native" works properly with recent GCC on ARM. I believe it results in:-
-mcpu=cortex-a53 -mfpu=neon-fp-armv8
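One way to check what a given set of flags actually targets is to look at the compiler's predefined macros from inside the program. This is only a sketch; the function names are mine, and the macros tested are the usual GCC/Clang ones.

```c
#include <limits.h>
#include <stddef.h>

/* Name of the instruction set being compiled for, from predefined macros. */
const char *target_arch(void)
{
#if defined(__aarch64__)
    return "aarch64";
#elif defined(__arm__)
    return "arm (32-bit)";
#elif defined(__x86_64__)
    return "x86-64";
#elif defined(__i386__)
    return "x86 (32-bit)";
#else
    return "unknown";
#endif
}

/* 1 if the compiler was told NEON is available, 0 otherwise. */
int neon_enabled(void)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return 1;
#else
    return 0;
#endif
}

/* Pointer width in bits: 32 in 32-bit mode, 64 in 64-bit mode. */
unsigned pointer_bits(void)
{
    return (unsigned)(sizeof(void *) * CHAR_BIT);
}
```

Printing these three values from main() shows at a glance whether a build is really 32-bit or 64-bit. On the command line, gcc -march=native -Q --help=target lists what "native" resolved to, and gcc -dM -E - < /dev/null dumps every predefined macro.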
Re: 64-bit operating system
"FYI, the bootable Gentoo 64-bit image for the RPi3 has now been updated to support the B+."
Booted on a Pi3B, expanded SD and got Desktop up then shutdown and put SD card into a Pi3B+, bingo it works.

Have no network here, will test further at home.
I am impressed. it must be magic

It will be fun to play with NEON stuff now.
I'm dancing on Rainbows.
Raspberries are not Apples or Oranges
Re: 64-bit operating system
jamesh wrote: ↑Mon Mar 19, 2018 4:01 pm
More than 4GB? Not happening for years. RAM is too expensive to put that much on and keep anywhere close to the $35 price point. And RAM prices are currently INCREASING....
Agreed.
I find it amazing how much RAM we 'need' nowadays, when we were doing very similar tasks in 32MB devices not that long ago. Just badly written code in my opinion.
The 64 bit question is a valid one though, why was I assuming Raspbian is?

Anyhow, for the price of £34 (incl. delivery) I do not expect the desktop to be flying. Would I want to pay more to have more RAM? I don't know, I am close to trying a Blu-ray movie using Kodi. If that works, then why pay for more? I also think many of us still have PCs or powerful laptops and are not looking to replace them with an RPi. Although I am impressed with the performance.
Actually is it not true we may need more memory for the 64-bit? So everything comes at a price.
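A rough sketch of why that is: every pointer doubles from 4 to 8 bytes, so pointer-heavy data gets bigger. The struct and the helper below are made up for illustration, and the padding rule is the usual one on ARM and x86, not a guarantee of the C standard.

```c
#include <stddef.h>
#include <stdint.h>

/* A typical linked-list node: one pointer plus a small payload. */
struct node {
    struct node *next;
    int32_t value;
};

/* Size of such a node for a given pointer width, assuming the struct is
   padded up to a multiple of the pointer size (the common ABI rule). */
size_t node_bytes(size_t pointer_bytes)
{
    size_t raw = pointer_bytes + sizeof(int32_t);
    size_t rem = raw % pointer_bytes;
    return rem ? raw + (pointer_bytes - rem) : raw;
}
```

Under that assumption the same node takes 8 bytes with 4-byte pointers and 16 bytes with 8-byte pointers, so a list of a million nodes goes from about 8 MB to about 16 MB. Plain arrays of int or char do not grow at all.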
We first need to stop using Chromium on the RPi. It's overloaded with functionality; all the additional tasks the Chromium engine does that we don't know about just show how ridiculously reliant we are on so much memory. People think it's a necessity, being able to watch full HD stuff on their phones. I used to code on computers that had 128K and any compiling had to be done with no public having access to computers.
Having to run stuff with fewer resources (CPU, memory) teaches us respect and how to preserve what is wasted - energy. It also enables us to learn how an OS is actually built and how to tweak it. I like the idea of being in control, and that other "regular guys" don't know what I know

Just my 3c.
Richard
Re: 64-bit operating system
Gavinmc42 wrote: ↑Wed Apr 11, 2018 7:27 am
"FYI, the bootable Gentoo 64-bit image for the RPi3 has now been updated to support the B+."
Booted on a Pi3B, expanded SD and got Desktop up then shutdown and put SD card into a Pi3B+, bingo it works
Have no network here, will test further at home.
I am impressed. it must be magic
It will be fun to play with NEON stuff now.
Now that's an option I am willing to explore. Thanks for the info.

Is this how you do it?
https://wiki.gentoo.org/wiki/Raspberry_Pi
“It’s nice to be important, but it’s more important to be nice.” 

Re: 64-bit operating system
For Gentoo 64 follow Sakaki's link
https://github.com/sakaki-/gentoo-on-rpi3-64bit
I will expect LAN7515 issue, I have seen things about this in the Raspbian Linux github posts.
https://github.com/raspberrypi/linux/commits/rpi-4.14.y
Might need to stick it in a Pi3B and update soon for the Pi3B+.
As Sakaki seems to know her stuff, it may be fixed soon/already.
A real usable Pi Aarch64 Desktop?
Tried many so called ones before but the new Pi3B+ should just push it into the usable for development PC box.
Actually now thinking a PiCore version would be fun to do now, perhaps a dCore spin?
Last edited by Gavinmc42 on Wed Apr 11, 2018 11:27 am, edited 1 time in total.