carlk3
Posts: 132
Joined: Wed Feb 17, 2021 8:46 pm

CRC Calculation with DMA Sniffer

Fri Sep 15, 2023 3:35 am

I'm considering the Issue "Suggestion for calculating CRC/etc". The idea is to use the DMA Sniffer instead of software to calculate CRC for data transfers to and from SD cards. (This method is also used in pico-extras/src/rp2_common/pico_sd_card/sd_card.c.) It seems like a good idea, but I have some concerns. The Sniffer is a single, global resource, so using it will reduce opportunities for concurrency. For example, if I have two SD cards on two SPIs (or an SPI and an SDIO) that are trying to read or write simultaneously, only one can use the Sniffer; the other will have to wait (or fall back to software CRC calculation). So I'd need some kind of mutual exclusion lock. That's fine within the library, but what if some other DMA user in the system is using the Sniffer? Is there (or should there be) some sort of global acquire/release mechanism in the SDK?
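
A minimal sketch of the kind of acquire/release guard described above, assuming nothing beyond C11 atomics. The names sniffer_try_claim and sniffer_release are hypothetical and are not SDK functions. On the RP2040 the Cortex-M0+ has no native compare-and-swap, so a real build would probably sit on top of the SDK's hardware spin locks or a short critical section instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

// Hypothetical ownership flag for the single, global DMA Sniffer.
static atomic_bool sniffer_busy = false;

// Returns true if the caller now owns the Sniffer.
// A false return means "someone else has it": wait, or fall back to software CRC.
static bool sniffer_try_claim(void) {
    bool expected = false;
    return atomic_compare_exchange_strong(&sniffer_busy, &expected, true);
}

// Hands the Sniffer back so another transfer can claim it.
static void sniffer_release(void) {
    assert(atomic_load(&sniffer_busy));  // releasing an unclaimed Sniffer is a caller bug
    atomic_store(&sniffer_busy, false);
}
```

A guard like this only protects against users that agree to call it, which is why a library-private lock does not answer the question about other DMA users in the system.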

cleverca22
Posts: 7791
Joined: Sat Aug 18, 2012 2:33 pm

Re: CRC Calculation with DMA Sniffer

Fri Sep 15, 2023 5:44 am

carlk3 wrote:
Fri Sep 15, 2023 3:35 am
The idea is to use the DMA Sniffer instead of software to calculate CRC for data transfers to and from SD cards.
I've talked to somebody else who did SD on the Pico, and the idea is already dead in the water.

The DMA sniffer computes the checksum over each byte (or wider word) that goes through it.

SD computes 4 checksums, each one over every bit passing through a single data lane!!
So the checksum on D0 is computed from all the bits that went through just D0,
and the checksum on D1 is only affected by the bits on D1, with no carry from D0!!

The Pico DMA can't compute such a checksum, so you can't hardware-accelerate it.
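
A minimal bit-serial sketch of the per-lane checksums, to make the problem concrete. It assumes the usual SD data CRC (CRC-16-CCITT, polynomial 0x1021, initial value 0) and that each byte goes out as two nibbles, high nibble first, with bit n of a nibble on data line Dn. The function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// One step of CRC-16-CCITT (polynomial 0x1021, MSB first) for a single input bit.
static uint16_t crc16_step(uint16_t crc, unsigned bit) {
    unsigned feedback = ((crc >> 15) ^ bit) & 1u;
    crc = (uint16_t)(crc << 1);
    return feedback ? (uint16_t)(crc ^ 0x1021u) : crc;
}

// Four independent CRCs for a 4-bit-wide transfer, one per data line.
// Assumed wire order: high nibble of each byte first, nibble bit n on line Dn.
static void sdio4_lane_crcs(const uint8_t *data, size_t len, uint16_t crc[4]) {
    assert(data != NULL && crc != NULL);
    for (int lane = 0; lane < 4; lane++)
        crc[lane] = 0;  // SD data CRC16 starts at zero
    for (size_t i = 0; i < len; i++) {
        for (int shift = 4; shift >= 0; shift -= 4) {
            unsigned nibble = (data[i] >> shift) & 0xFu;
            // Each lane sees only its own bit; nothing carries between lanes.
            for (int lane = 0; lane < 4; lane++)
                crc[lane] = crc16_step(crc[lane], (nibble >> lane) & 1u);
        }
    }
}
```

A byte-oriented CRC engine mixes all eight bits of a byte into one register, which is exactly what the per-lane scheme forbids.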

dp11
Raspberry Pi Engineer & Forum Moderator
Posts: 1196
Joined: Thu Dec 29, 2011 5:46 pm

Re: CRC Calculation with DMA Sniffer

Fri Sep 15, 2023 5:57 am

Haven't thought this through in detail, but what if, in parallel with the existing SD PIO state machine, you had another state machine for each bit that acted as a shift register (a bit like a UART), and then used DMA to calculate the CRC? The DMA data wouldn't actually be used for anything else.

carlk3
Posts: 132
Joined: Wed Feb 17, 2021 8:46 pm

Re: CRC Calculation with DMA Sniffer

Fri Sep 15, 2023 6:11 am

cleverca22 wrote:
Fri Sep 15, 2023 5:44 am

I've talked to somebody else who did SD on the Pico, and the idea is already dead in the water.

The DMA sniffer computes the checksum over each byte (or wider word) that goes through it.

SD computes 4 checksums, each one over every bit passing through a single data lane!!
So the checksum on D0 is computed from all the bits that went through just D0,
and the checksum on D1 is only affected by the bits on D1, with no carry from D0!!

The Pico DMA can't compute such a checksum, so you can't hardware-accelerate it.
Good point, but surely the DMA Sniffer could still be used for the SPI or 1-bit-wide SDIO modes?

jayben
Posts: 559
Joined: Mon Aug 19, 2019 9:56 pm

Re: CRC Calculation with DMA Sniffer

Fri Sep 15, 2023 2:40 pm

It isn't difficult to create a lookup table that simultaneously computes 4 CRC values, given a 4-bit value. So you can calculate the CRCs on-the-fly, while you are writing the data to the I/O port, e.g.

Code:

uint64_t qcrc=0;
 
// Update 4 CRCs given a 4-bit value 'd':
qcrc = (qcrc >> 4) ^ qcrc16r_table[(d ^ (uint8_t)qcrc) & 0xf];
This won't be as fast as DMA, but it'll be quite quick, and much faster than computing each CRC individually on a 1-bit data stream.

For details, see https://iosoft.blog/zerowi-part2/
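
As a sketch of how such a table could be built: the layout below is one under which the update line above works, and it is an assumption here; the table in the linked article may be arranged differently. Bit (4*i + lane) of the 64-bit register holds bit i of that lane's CRC in reflected (right-shifting) form, using the reflected CCITT polynomial 0x8408. The helper names qcrc16r_init, qcrc16r_update and qcrc16r_lane are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

// Assumed layout: four reflected CRC16 registers interleaved bit by bit,
// so the low nibble of qcrc holds the next "output" bit of each lane.
static uint64_t qcrc16r_table[16];

static void qcrc16r_init(void) {
    // Reflected polynomial 0x8408 has bits 3, 10 and 15 set. Spread by a
    // factor of 4 they land at bits 12, 40 and 60; n selects which lanes
    // get the polynomial XORed in.
    for (uint64_t n = 0; n < 16; n++)
        qcrc16r_table[n] = (n << 12) ^ (n << 40) ^ (n << 60);
}

// Update 4 CRCs given a 4-bit value 'd' (same expression as in the post).
static uint64_t qcrc16r_update(uint64_t qcrc, uint8_t d) {
    return (qcrc >> 4) ^ qcrc16r_table[(d ^ (uint8_t)qcrc) & 0xf];
}

// Extract one lane's CRC and un-reflect it into the usual MSB-first form.
static uint16_t qcrc16r_lane(uint64_t qcrc, int lane) {
    assert(lane >= 0 && lane < 4);
    uint16_t crc = 0;
    for (int i = 0; i < 16; i++)
        if ((qcrc >> (4 * i + lane)) & 1u)
            crc |= (uint16_t)(1u << (15 - i));
    return crc;
}
```

With this layout the table costs 128 bytes, and the per-nibble work is one shift, one lookup and two XORs regardless of how many lanes are active.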

carlk3
Posts: 132
Joined: Wed Feb 17, 2021 8:46 pm

Re: CRC Calculation with DMA Sniffer

Fri Sep 15, 2023 5:45 pm

jayben wrote:
Fri Sep 15, 2023 2:40 pm
It isn't difficult to create a lookup table that simultaneously computes 4 CRC values, given a 4-bit value. So you can calculate the CRCs on-the-fly, while you are writing the data to the I/O port, e.g.
...
This won't be as fast as DMA, but it'll be quite quick, and much faster than computing each CRC individually on a 1-bit data stream.
...
Thanks. I think that what you are proposing is similar to what this code already does:
https://github.com/ZuluSCSI/ZuluSCSI-fi ... o.cpp#L126
But, to be honest, I do not yet fully understand that code.

EDIT: Nope, looks like it is a different algorithm. Now, I wonder which is better on the CM0+, but I don't want to delve into that right now.

I'm going to do some timings to see if it's even worth considering the Sniffer for the 1-bit use cases. Also, for the 1-bit cases, when doing multi-block transfers, maybe the CRC calculation could be pipelined with the DMA, like the ZuluSCSI code does for 4-bit transfers.

carlk3
Posts: 132
Joined: Wed Feb 17, 2021 8:46 pm

Re: CRC Calculation with DMA Sniffer

Tue Sep 19, 2023 5:49 am

For SPI-attached SD cards, I've been able to get a significant speedup by overlapping CRC calculation with the DMA transfer. The time to write a 512-byte block went from 377 us to 313 us. The DMA transfer of the block data takes about 244 us, but the CRC16 calculation takes only about 66 us, so there is plenty of extra time. With this change, there is nothing to be gained by using the Sniffer to calculate the CRC unless the processor cores have something better to do during that 66 us than calculating the CRC for the block while the DMA is transferring it.
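
The ordering can be sketched roughly as follows. This is not the library's actual code: block_tx_ops and its three hooks are hypothetical stand-ins for the SPI/DMA layer, and the mock_* functions exist only so the sketch can run on a host. The saving reported above (377 - 313 = 64 us) is close to the 66 us CRC time, which is what one would expect if the CRC is fully hidden inside the transfer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Bitwise CRC-16-CCITT (poly 0x1021, init 0), as used for SD data blocks.
static uint16_t crc16_ccitt(const uint8_t *data, size_t len) {
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(data[i] << 8);
        for (int b = 0; b < 8; b++)
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                                  : (uint16_t)(crc << 1);
    }
    return crc;
}

// Hypothetical transport hooks; a real driver would back these with SPI + DMA.
typedef struct {
    void (*start_tx)(const uint8_t *buf, size_t len);  // start DMA, return at once
    void (*wait_tx)(void);                             // block until DMA is done
    void (*send_crc)(uint16_t crc);                    // clock out the two CRC bytes
} block_tx_ops;

static void write_block_overlapped(const block_tx_ops *ops,
                                   const uint8_t *buf, size_t len) {
    assert(ops != NULL && buf != NULL);
    ops->start_tx(buf, len);               // DMA now runs in the background
    uint16_t crc = crc16_ccitt(buf, len);  // CPU work hidden inside the transfer
    ops->wait_tx();
    ops->send_crc(crc);
}

// Host-side stand-ins that only record the call order and the CRC sent.
static char trace[8];
static size_t trace_len;
static uint16_t sent_crc;
static void mock_start(const uint8_t *b, size_t n) { (void)b; (void)n; trace[trace_len++] = 'S'; }
static void mock_wait(void) { trace[trace_len++] = 'W'; }
static void mock_send(uint16_t c) { sent_crc = c; trace[trace_len++] = 'C'; }
```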

carlk3
Posts: 132
Joined: Wed Feb 17, 2021 8:46 pm

Re: CRC Calculation with DMA Sniffer

Tue Sep 19, 2023 4:48 pm

carlk3 wrote: With this change, there is nothing to be gained by using the Sniffer to calculate the CRC unless the processor cores have something better to do during that 66 us than calculating the CRC for the block while the DMA is transferring it.
However, that only applies when writing. For reads, one can't calculate the CRC until the data has been received. The Sniffer hardware would be useful in this case. However, one optimization for multi-block transfers might be to delay the CRC check of a block until the DMA completion wait time of the following block.
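
A rough sketch of that deferred check for a multi-block read, under stated assumptions: block_rx_ops is a hypothetical interface whose wait_rx hook returns the CRC received from the card, and the mock_* functions merely simulate a card on a host so the control flow can be exercised.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 512

// Bitwise CRC-16-CCITT (poly 0x1021, init 0).
static uint16_t crc16_ccitt(const uint8_t *data, size_t len) {
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(data[i] << 8);
        for (int b = 0; b < 8; b++)
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                                  : (uint16_t)(crc << 1);
    }
    return crc;
}

// Hypothetical hooks for the receive path.
typedef struct {
    void (*start_rx)(uint8_t *buf, size_t len);  // start DMA for one block
    uint16_t (*wait_rx)(void);                   // wait, then return the card's CRC
} block_rx_ops;

// Returns the index of the first block whose CRC fails, or -1 if all pass.
static int read_blocks_pipelined(const block_rx_ops *ops, uint8_t *buf, size_t nblocks) {
    assert(ops != NULL && buf != NULL);
    uint16_t prev_crc = 0;
    for (size_t i = 0; i < nblocks; i++) {
        ops->start_rx(buf + i * BLOCK_SIZE, BLOCK_SIZE);
        // While block i streams in, verify block i-1 on the CPU.
        if (i > 0 && crc16_ccitt(buf + (i - 1) * BLOCK_SIZE, BLOCK_SIZE) != prev_crc) {
            ops->wait_rx();  // let the in-flight transfer finish before bailing out
            return (int)(i - 1);
        }
        prev_crc = ops->wait_rx();
    }
    // The last block has no following transfer to hide behind.
    if (nblocks > 0 &&
        crc16_ccitt(buf + (nblocks - 1) * BLOCK_SIZE, BLOCK_SIZE) != prev_crc)
        return (int)(nblocks - 1);
    return -1;
}

// Host-side stand-ins: fill each block with a pattern, optionally corrupt one CRC.
static uint8_t *mock_buf;
static int mock_block, mock_bad_block = -1;
static void mock_start_rx(uint8_t *buf, size_t len) {
    mock_buf = buf;
    for (size_t i = 0; i < len; i++) buf[i] = (uint8_t)(i + (size_t)mock_block);
}
static uint16_t mock_wait_rx(void) {
    uint16_t crc = crc16_ccitt(mock_buf, BLOCK_SIZE);
    return (mock_block++ == mock_bad_block) ? (uint16_t)~crc : crc;
}
```

Only the final block's check remains exposed, so the benefit grows with the number of blocks per transfer.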
