rplantz
Posts: 93
Joined: Sun Jul 01, 2012 2:38 am

Cache sizes

Fri Feb 09, 2024 7:50 pm

I'm writing a book based on 64-bit Raspberry Pi OS, https://nostarch.com/introcomputerorgforarm. It's an introductory book. In the discussion about memory caches, I want to give the cache sizes of several RPi models, but I find conflicting information online. Some people say that even tools like lscpu -C might not be accurate.

I give the following information in my book:

3 A+,B,B+: L1i 4 x 32KB; L1d 4 x 32KB; L2u 512KB.
4B: L1i 4 x 48KB; L1d 4 x 32KB; L2u 1MB.
5: L1i 4 x 64KB; L1d 4 x 64KB; L2u 4 x 512KB; L3u 2MB.

where L1i is Level 1 instruction, L1d is Level 1 data, L2u is Level 2 unified, L3u is Level 3 unified, and 4 x means for each of the four cores.

Is this correct?
Last edited by rplantz on Sat Feb 10, 2024 4:52 pm, edited 1 time in total.

rpdom
Posts: 23330
Joined: Sun May 06, 2012 5:17 am
Location: Chelmsford, Essex, UK

Re: Cache sizes

Fri Feb 09, 2024 10:49 pm

Did you already ask something like this? viewtopic.php?t=337299

rplantz
Posts: 93
Joined: Sun Jul 01, 2012 2:38 am

Re: Cache sizes

Sat Feb 10, 2024 5:01 pm

Yes, but I still see conflicting specifications for the caches. For example, the paper referenced in a response to my other post shows that the Level 1 caches in RPi 3 are 16KB. But lscpu -C on my RPi 3 shows that they're 32KB.

Which is correct? Somebody, someplace knows the correct values. I'm hoping that person will answer my question so that I can give the correct numbers in my book. I don't like lying to people.

ejolson
Posts: 12155
Joined: Tue Mar 18, 2014 11:47 am

Re: Cache sizes

Sat Feb 10, 2024 5:25 pm

rplantz wrote:
Sat Feb 10, 2024 5:01 pm
Yes, but I still see conflicting specifications for the caches. For example, the paper referenced in a response to my other post shows that the Level 1 caches in RPi 3 are 16KB. But lscpu -C on my RPi 3 shows that they're 32KB.

Which is correct? Somebody, someplace knows the correct values. I'm hoping that person will answer my question so that I can give the correct numbers in my book. I don't like lying to people.
I think lscpu makes stuff up by consulting a hand crafted table, so I'd go with the officially published results over that. I recall the Pi 1 and Zero rely on cache in the integrated VPU rather than the ARM core, but I may be confused.

More meaningful would be to test level 1, 2 and 3 cache sizes empirically. Note there already exist tests that can determine the size of the working sets at which different levels of cache thrashing occur, or you could write your own.
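
As a rough illustration of what such a working-set test looks like, here is a minimal pointer-chasing sketch in Python. The names build_chain, seconds_per_hop, and sweep are made up for this example. It assumes that following a randomly ordered chain of indices defeats the hardware prefetcher, so the time per hop should rise once the chain no longer fits in a given cache level. Interpreter overhead in Python will blur or even hide those steps, so treat this as an outline of the method; a real measurement would port the same loop to C or assembly.

```python
import random
import time
from array import array


def build_chain(n, rng):
    """Return an array where chain[i] is the next index to visit.

    Sattolo's algorithm produces a single cycle through all n slots,
    so a walk touches every element before repeating.
    """
    chain = array("l", range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # j < i is what guarantees one cycle
        chain[i], chain[j] = chain[j], chain[i]
    return chain


def seconds_per_hop(chain, hops):
    """Follow the chain for `hops` steps and return average seconds per step."""
    idx = 0
    start = time.perf_counter()
    for _ in range(hops):
        idx = chain[idx]
    return (time.perf_counter() - start) / hops


def sweep(sizes_bytes, hops=200_000, seed=1):
    """Time one chain per working-set size; returns (size, seconds per hop) pairs."""
    rng = random.Random(seed)
    item = array("l").itemsize  # 4 or 8 bytes depending on the platform
    results = []
    for size in sizes_bytes:
        chain = build_chain(max(2, size // item), rng)
        results.append((size, seconds_per_hop(chain, hops)))
    return results
```

Calling sweep with sizes that double from a few KB up to a few MB and looking for jumps in the second column is the general idea; the sizes at which the jumps appear are the estimate of each cache level.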

cleverca22
Posts: 8615
Joined: Sat Aug 18, 2012 2:33 pm

Re: Cache sizes

Sat Feb 10, 2024 6:09 pm

rplantz wrote:
Sat Feb 10, 2024 5:01 pm
Yes, but I still see conflicting specifications for the caches. For example, the paper referenced in a response to my other post shows that the Level 1 caches in RPi 3 are 16KB. But lscpu -C on my RPi 3 shows that they're 32KB.
https://github.com/librerpi/rpi-open-fi ... xt#L38-L50
https://github.com/librerpi/rpi-open-fi ... .c#L63-L81

i dumped the cache id registers when on a pi3 in baremetal

if i'm decoding things right (the raw values are present, so you can try that step), it has 32kb of L1d and 32kb of L1i
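
For anyone who wants to try that decoding step, here is a minimal sketch in Python. It assumes the raw value is a 32-bit CCSIDR (Cache Size ID Register) in the layout used when FEAT_CCIDX is not implemented, read after selecting the cache level and type through CSSELR. The function name decode_ccsidr and the example value are made up for illustration; the example is built by hand and is not copied from the linked dump.

```python
def decode_ccsidr(value):
    """Decode a 32-bit CCSIDR value (layout without FEAT_CCIDX)."""
    line_bytes = 1 << ((value & 0x7) + 4)   # bits [2:0]: log2(line bytes) - 4
    ways = ((value >> 3) & 0x3FF) + 1       # bits [12:3]: associativity - 1
    sets = ((value >> 13) & 0x7FFF) + 1     # bits [27:13]: number of sets - 1
    return {
        "line_bytes": line_bytes,
        "ways": ways,
        "sets": sets,
        "size_bytes": line_bytes * ways * sets,
    }


# Hand-built example: 64-byte lines, 4 ways, 128 sets (an assumption for
# illustration, not a value taken from real hardware).
example = (127 << 13) | (3 << 3) | 2
print(decode_ccsidr(example)["size_bytes"] // 1024, "KB")  # prints "32 KB"
```

The cache size is simply line size times ways times sets, so any raw value can be checked the same way.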
ejolson wrote:
Sat Feb 10, 2024 5:25 pm
More meaningful would be to test level 1, 2 and 3 cache sizes empirically. Note there already exist tests that can determine the size of the working sets at which different levels of cache thrashing occur, or you could write your own.
the pi0-pi3 family does not have an L3 cache by default
but the open firmware could re-configure things so it has a 128kb L3 cache
but with pi3 having a 512kb L2, that would just feel wrong, L3 is supposed to be bigger!
no clue what impacts that would then have on performance

ejolson
Posts: 12155
Joined: Tue Mar 18, 2014 11:47 am

Re: Cache sizes

Sat Feb 10, 2024 6:31 pm

cleverca22 wrote:
Sat Feb 10, 2024 6:09 pm
ejolson wrote:
Sat Feb 10, 2024 5:25 pm
More meaningful would be to test level 1, 2 and 3 cache sizes empirically. Note there already exist tests that can determine the size of the working sets at which different levels of cache thrashing occur, or you could write your own.
the pi0-pi3 family does not have an L3 cache by default
but the open firmware could re-configure things so it has a 128kb L3 cache
but with pi3 having a 512kb L2, that would just feel wrong, L3 is supposed to be bigger!
no clue what impacts that would then have on performance
That's right. I originally wrote "level 3 if it exists," but the sentence became awkward. At any rate, one can still run the synthetic test past the level 2 cache size to see if anything happens.

That's interesting about the disabled 128K level 3 cache. I wonder if there is any situation where that would be useful.

rplantz
Posts: 93
Joined: Sun Jul 01, 2012 2:38 am

Re: Cache sizes

Sat Feb 10, 2024 8:55 pm

ejolson wrote:
Sat Feb 10, 2024 5:25 pm
I think lscpu makes stuff up by consulting a hand crafted table, so I'd go with the officially published results over that.
That's what I've heard. I'm trying to avoid "hand crafting" things in my book without knowing for sure.
ejolson wrote:
Sat Feb 10, 2024 5:25 pm
More meaningful would be to test level 1, 2 and 3 cache sizes empirically. Note there already exist tests that can determine the size of the working sets at which different levels of cache thrashing occur, or you could write your own.
I own only a 3. I was about to buy a 4 when the pandemic came along. I decided to hop over to the 5 but still have not gotten one.

cleverca22
Posts: 8615
Joined: Sat Aug 18, 2012 2:33 pm

Re: Cache sizes

Sun Feb 11, 2024 1:43 am

ejolson wrote:
Sat Feb 10, 2024 6:31 pm
That's interesting about the disabled 128K level 3 cache. I wonder if there is any situation where that would be useful.
on the bcm2835, there was no arm L2 cache
the VPU L2 cache was put in the path of all requests, to make up for that

but starting with the bcm2836, the arm gained its own L2, and the VPU L2 was taken out of the loop, both because it's smaller than the arm L2, and so the VPU doesn't have to share

but i can see how putting it back as a sort of L3 would reduce the dma latency for small buffers
flush something from the arm caches into the "L3" cache, then DMA from "L3", and you don't have to go out to dram twice

but only if the buffer fits within 128kb
and to keep other things from swamping the 128kb "L3", you would need to map a special 16mb chunk of ram as using that L3, and other regions not
and now dma is complicated by having to work within that special region

in theory it can work, in practice, i don't know
