ErikWlfn
Posts: 19
Joined: Thu Nov 04, 2021 4:52 pm

RPI4 PCI Express Bus Hangs randomly

Fri Jun 17, 2022 3:53 pm

I'm a software engineer porting a bare-metal real-time OS to the Raspberry Pi Compute Module 4. The OS is working and our applications are working. I ported our proprietary driver for the Intel 82576 Gigabit Ethernet PCI Express chip to the ARM CPU. The chip has two ports and up to an x4 PCI Express v1 interface. The Compute Module 4 has a single x1 PCI Express interface.

We are doing DMA transfers to/from an uncached area of RAM within the first 1 GB. The Ethernet works for anywhere from a few minutes to an hour, but eventually an access to the PCIe memory-mapped registers hangs. That CPU core hangs, although the other CPU cores continue to execute the software. We added code that writes progress variables, which the other cores monitor; that is how we know where the core hangs. Our PCI Express initialization code is based on the Circle software library and the Linux PCI Express driver.
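The progress-variable technique can be sketched as a per-core checkpoint code plus a heartbeat counter that a monitoring core samples periodically. This is a minimal C11 sketch, not the driver's actual code; the names (mark_progress, core_stalled) and the step codes are hypothetical, and the array is assumed to live in memory shared coherently by all cores.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_CORES 4

/* One slot per core: written only by its owner, read by the monitor. */
struct core_progress {
    _Atomic uint32_t step;      /* last checkpoint reached (hypothetical codes) */
    _Atomic uint32_t heartbeat; /* incremented at every checkpoint */
};

static struct core_progress progress[NUM_CORES];

/* Working core: call before and after each risky operation, e.g. a PCIe
   register read in the interrupt service routine. */
static void mark_progress(unsigned core, uint32_t step)
{
    assert(core < NUM_CORES);
    atomic_store_explicit(&progress[core].step, step, memory_order_relaxed);
    /* Release so the monitor sees 'step' once it sees the new heartbeat. */
    atomic_fetch_add_explicit(&progress[core].heartbeat, 1, memory_order_release);
}

/* Monitor core: last heartbeat value seen for each core. */
static uint32_t last_seen[NUM_CORES];

/* Returns true if the core has not advanced since the previous call and
   stores the checkpoint it is stuck at in *stuck_step. */
static bool core_stalled(unsigned core, uint32_t *stuck_step)
{
    assert(core < NUM_CORES);
    uint32_t hb = atomic_load_explicit(&progress[core].heartbeat, memory_order_acquire);
    bool stalled = (hb == last_seen[core]);
    last_seen[core] = hb;
    if (stalled)
        *stuck_step = atomic_load_explicit(&progress[core].step, memory_order_relaxed);
    return stalled;
}
```

The monitor should sample at an interval longer than the longest legitimate gap between checkpoints; a core that has not yet reached any checkpoint is reported as stalled on the first sample.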

I also did some testing to see what happens if software accesses non-present PCI Express addresses. A write to a non-existent address is ignored by the hardware (I expected a data abort exception). A read from a non-existent address causes a data abort exception some time after the actual read, at the point where asynchronous aborts are unmasked in the CPSR. Neither of those hangs the CPU core.

After the CPU core hangs, we tried to completely re-initialize the PCI bus from another core, and that works up until the first attempt to access the PCI Express memory mapped registers for the Ethernet chip. Then the other core also hangs. Re-initializing the PCI Express bus does not clear the condition that caused the bus hang.

I can cause a bus time-out by setting the DMA burst size to 512 bytes. Then I get a data abort on a PCIe access by the CPU core. Setting the DMA burst size to a smaller value does not cause a data abort on the CPU core.

We tested part of our application under Linux on the Compute Module 4. The PCI Express link with the Intel 82576 driver does not hang in Linux, so we believe the hardware is working properly. Our 82576 chip is connected directly to the Compute Module 4, without a PCIe socket or separate PCIe board. Our embedded OS and a nearly identical driver for the 82576 Ethernet chip are working on an Intel Atom based board that we have been selling for many years.

I am hoping someone can provide suggestions about what might cause this kind of bus hang when accessing PCI Express on the Raspberry Pi Compute Module 4. I expected that a PCIe error would cause a data abort exception, or at least that the bus cycle would time out. Instead we are seeing a bus hang. It is also strange that a complete PCI Express reset does not clear the bus hang; I believe that should also reset the Ethernet chip connected to the PCI Express bus.

This problem is preventing us from completing our product development, and it is not the only problem we are having with the PCI Express. The other problem is that we have not been able to perform DMA transfers to cached areas of RAM reliably, even though we perform the necessary cache clean and invalidate operations. The DMA transfers complete normally, but the data in RAM is not always correct: sometimes the RAM still holds the old data after a DMA transfer into it. I'm wondering if these two issues might be related.
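On a bus that does not snoop the CPU caches, "DMA completed but RAM holds old data" is commonly caused by cache lines being written back or speculatively refilled while the transfer is in flight. The thread does not confirm that cause. This is a minimal sketch of the usual maintenance ordering, assuming a 64-byte data cache line (the Cortex-A72's) and a hypothetical per-line maintenance hook that the OS would implement with the ARM clean or invalidate by-address operations.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed data cache line size; on hardware read it from the Cache Type Register. */
#define CACHE_LINE 64u

/* Round [addr, addr + len) outward to whole cache lines. Stores the
   aligned bounds and returns the number of lines covered. */
static size_t cache_line_span(uintptr_t addr, size_t len,
                              uintptr_t *start, uintptr_t *end)
{
    if (len == 0) {
        *start = *end = addr;
        return 0;
    }
    *start = addr & ~(uintptr_t)(CACHE_LINE - 1);
    *end = (addr + len + CACHE_LINE - 1) & ~(uintptr_t)(CACHE_LINE - 1);
    return (size_t)((*end - *start) / CACHE_LINE);
}

/* Hypothetical hook: one by-address maintenance op (clean or invalidate). */
typedef void (*cache_line_op)(uintptr_t line_addr);

static void cache_range(uintptr_t addr, size_t len, cache_line_op op)
{
    uintptr_t p, end;
    cache_line_span(addr, len, &p, &end);
    for (; p < end; p += CACHE_LINE)
        op(p);
    /* On hardware a DSB belongs here before the device or CPU relies on it. */
}

/* Transmit: before the device reads the buffer, write dirty lines to RAM. */
static void dma_tx_prepare(uintptr_t buf, size_t len, cache_line_op clean)
{
    cache_range(buf, len, clean);
}

/* Receive: before the device writes, invalidate (or clean and invalidate)
   so a later eviction cannot overwrite the DMA data. Because whole lines
   are affected, receive buffers should start and end on line boundaries. */
static void dma_rx_prepare(uintptr_t buf, size_t len, cache_line_op invalidate)
{
    cache_range(buf, len, invalidate);
}

/* Receive: after the DMA completes, invalidate again, since the CPU may
   have speculatively refilled lines while the transfer was in flight. */
static void dma_rx_complete(uintptr_t buf, size_t len, cache_line_op invalidate)
{
    cache_range(buf, len, invalidate);
}
```

The second invalidate after a receive completes is the step most often missed when porting from a snooping chipset.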

dp11
Raspberry Pi Engineer & Forum Moderator
Posts: 857
Joined: Thu Dec 29, 2011 5:46 pm

Re: RPI4 PCI Express Bus Hangs randomly

Fri Jun 17, 2022 5:05 pm

Given that it works under Linux, that suggests something isn't right with your driver. Have you got access to a PCIe bus protocol analyser? If not, can you hire one?

Without seeing your driver and lots of debug output, it's hard to know what's happening. For instance, is an interrupt going off at the same time as something else is happening, causing the issue? Do you have some sort of deadlock? Or a race condition?

I would very carefully check the workings of the Linux driver. Also, I'd check how the bare metal OS is setting everything up.

A single bit error can cause all sorts of issues. I have a bare metal project that worked fine for years until I tried GCC 12.1, which can now generate unaligned accesses, but my OS wasn't configured to generate aborts on them, as I knew they would never happen. That one took a while to fix.

ErikWlfn
Posts: 19
Joined: Thu Nov 04, 2021 4:52 pm

Re: RPI4 PCI Express Bus Hangs randomly

Fri Jun 17, 2022 8:52 pm

dp11 wrote:
Fri Jun 17, 2022 5:05 pm
I would very carefully check the workings of the Linux driver. Also I'd check how the bare metal OS is setting everything up.
I'm thinking that I may have to put some debug prints into the Linux kernel if the kernel log does not print enough of the register information to see how it is initializing the hardware. The PCIe code in Linux is not really a single driver; it is many drivers along with core kernel code, and some of the settings come from the device tree blob files as well.

At the moment I am looking at the PCI Express Bridge Control register (offset 0x3E) and its Secondary Discard Timeout setting. I am going to try a different discard timer setting; the bit selects between two timeout lengths. I am also going to try setting Master-Abort Mode to 1 so that master aborts on the secondary bus are reported as a system error (SERR#). I think that is what would cause a data abort exception.
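Whether a Bridge Control bit is actually implemented can be checked with a write and a read-back. For reference, in the PCI-to-PCI Bridge specification the Secondary Discard Timeout bit selects 2^10 PCI clocks when set and 2^15 when clear, and the PCI Express Base Specification marks the discard-timer bits as not applicable to PCI Express and hardwired to 0. This is a minimal sketch; cfg_read16()/cfg_write16() are hypothetical accessors, simulated here so the code runs on a host, and the simulated writable mask is not a claim about the BCM2711.

```c
#include <stdint.h>

/* Bridge Control register of a Type 1 (bridge) config header. The first
   three names and values match Linux's pci_regs.h; the discard-timeout
   name is hypothetical. */
#define PCI_BRIDGE_CONTROL          0x3e
#define PCI_BRIDGE_CTL_SERR         0x0002  /* bit 1: SERR# Enable */
#define PCI_BRIDGE_CTL_MASTER_ABORT 0x0020  /* bit 5: Master-Abort Mode */
#define BRIDGE_CTL_SEC_DISCARD_TMO  0x0200  /* bit 9: Secondary Discard Timeout */

/* Host-side simulation of one bridge: writes only change bits in
   sim_writable, mimicking bits that are hardwired to 0. Replace with the
   OS's real config-space accessors. */
static uint16_t sim_bridge_ctl;
static const uint16_t sim_writable = PCI_BRIDGE_CTL_SERR;

static uint16_t cfg_read16(unsigned off)
{
    (void)off;
    return sim_bridge_ctl;
}

static void cfg_write16(unsigned off, uint16_t v)
{
    (void)off;
    sim_bridge_ctl = (uint16_t)(v & sim_writable);
}

/* Read-modify-write 'bits' into Bridge Control, then read back. Returns
   the requested bits that did not stick (0 means all were accepted). */
static uint16_t bridge_ctl_set_verify(uint16_t bits)
{
    uint16_t v = cfg_read16(PCI_BRIDGE_CONTROL);
    cfg_write16(PCI_BRIDGE_CONTROL, (uint16_t)(v | bits));
    return (uint16_t)(bits & ~cfg_read16(PCI_BRIDGE_CONTROL));
}
```

A nonzero return tells you the bridge ignored the setting, so any change in behaviour (or lack of one) cannot be attributed to that bit.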

I'm guessing that the secondary bus is discarding a delayed transaction, and because the reporting of that master abort is disabled, we don't get an SError to tell the CPU core that the transaction aborted.

We are also going to dump out some other registers, like the PCI Command/Status registers for the bridge and the Ethernet chip. We found that a second CPU core can access configuration space after the problem happens. It can even reinitialize the PCI Express up until the first attempted access to the PCIe device's memory-mapped registers. When that access happens, the second core also hangs.

There is a secondary bus reset bit in the bridge control register, and we may try using that on a second CPU core to see if we can recover from the problem. That might tell us if the problem is the Ethernet chip. I don't think the PCI Express root complex reset will reset the Ethernet chip on the secondary bus.

The problem happens most frequently in the interrupt service routine for the Ethernet chip where it reads the interrupt status register from the Ethernet chip. That interrupt is almost guaranteed to happen while DMA is still in progress by the Ethernet chip. Sometimes the problem happens in other parts of the driver where the Ethernet chip registers are being read or written. It also seems to only happen when we are using both Ethernet ports on the chip.

I haven't been able to make the problem happen with the Compute Module 4 IO Board and an Intel Ethernet card. The Intel card has the same chip, but it also has an EEPROM with hardware settings and a MAC address. Our hardware does not use the EEPROM. We have similar hardware that has been working for years on an Atom based board with the same OS and Ethernet driver. Of course the PCI initialization is completely different for the Intel root complex, and much of it is done by the BIOS. Much of the OS on the ARM CPU and the Ethernet driver use the same C code, recompiled for the 32-bit ARM platform. I rewrote some of the OS's assembly language code, and a couple of assembly language context switching functions in the Ethernet driver.

This is also complicated by the fact that our product is a dual, triple, or quad redundant CPU system, and this problem only seems to happen when at least two CPUs are running redundantly. The redundancy is loosely synchronized via an Ethernet link between CPUs. Many Ethernet packets are sent per millisecond, and the cycle time for controlling an industrial plant is 1 ms. We're using the SPI bus to communicate with analog and digital IO cards that we sell. We ran into a lot of problems with the Intel chipset and Intel Ethernet when we got that system working on the Atom platform a few years ago. We are using the Compute Module 4 to reduce cost and also increase the performance for calculations.

ErikWlfn
Posts: 19
Joined: Thu Nov 04, 2021 4:52 pm

Re: RPI4 PCI Express Bus Hangs randomly

Fri Jun 17, 2022 9:10 pm

Setting the Master Abort Mode in the Bridge Control Register (0x3E) had no effect on the problem. The bus still hung and we got no exception.

The Secondary Discard Timer bit had no effect, and it seems to be unimplemented, since it reads back as zero even if we write it to a 1.

We dumped out the PCI Status register 0x06 for the Ethernet ports and bridge. No error bits were set after the hang.
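Decoding a Status register dump is mechanical. This is a minimal sketch using the bit values from Linux's pci_regs.h; decode_pci_status() and pci_status_clear_value() are hypothetical names. These error bits are write-1-to-clear, so writing back the observed error bits clears exactly those. On PCI Express, the Device Status register in the PCI Express capability, and the Advanced Error Reporting capability when present, hold more detailed error state than this legacy register.

```c
#include <stdint.h>
#include <stdio.h>

/* Error bits of the PCI Status register (config offset 0x06). */
#define PCI_STATUS                  0x06
#define PCI_STATUS_PARITY           0x0100  /* Master Data Parity Error */
#define PCI_STATUS_SIG_TARGET_ABORT 0x0800  /* Signaled Target Abort */
#define PCI_STATUS_REC_TARGET_ABORT 0x1000  /* Received Target Abort */
#define PCI_STATUS_REC_MASTER_ABORT 0x2000  /* Received Master Abort */
#define PCI_STATUS_SIG_SYSTEM_ERROR 0x4000  /* Signaled System Error */
#define PCI_STATUS_DETECTED_PARITY  0x8000  /* Detected Parity Error */

#define PCI_STATUS_ERROR_MASK \
    (PCI_STATUS_PARITY | PCI_STATUS_SIG_TARGET_ABORT | \
     PCI_STATUS_REC_TARGET_ABORT | PCI_STATUS_REC_MASTER_ABORT | \
     PCI_STATUS_SIG_SYSTEM_ERROR | PCI_STATUS_DETECTED_PARITY)

static const struct { uint16_t bit; const char *name; } status_bits[] = {
    { PCI_STATUS_PARITY,           "master data parity error" },
    { PCI_STATUS_SIG_TARGET_ABORT, "signaled target abort" },
    { PCI_STATUS_REC_TARGET_ABORT, "received target abort" },
    { PCI_STATUS_REC_MASTER_ABORT, "received master abort" },
    { PCI_STATUS_SIG_SYSTEM_ERROR, "signaled system error" },
    { PCI_STATUS_DETECTED_PARITY,  "detected parity error" },
};

/* Print each error bit set in 'status'; returns how many were set. */
static int decode_pci_status(uint16_t status, FILE *out)
{
    int n = 0;
    for (size_t i = 0; i < sizeof status_bits / sizeof status_bits[0]; i++) {
        if (status & status_bits[i].bit) {
            fprintf(out, "  %s\n", status_bits[i].name);
            n++;
        }
    }
    return n;
}

/* Value to write back to the Status register to clear exactly the error
   bits that were observed (non-error bits are read-only or left alone). */
static uint16_t pci_status_clear_value(uint16_t status)
{
    return (uint16_t)(status & PCI_STATUS_ERROR_MASK);
}
```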

We set the Secondary Bus Reset bit in the Bridge Control Register and cleared it after a 100 microsecond delay. Doing that following a hang after re-initializing PCIe in a different core did not allow the second core to access the Ethernet chip registers. The second core still hung on the first access to the PCIe memory mapped registers on the Ethernet chip. The configuration space accesses to the Ethernet chip worked without a problem. It seems that some part of the CPU bus to PCI bus logic is getting into a state where no memory transactions to the PCI Express bus will complete. A complete reset of the Compute Module 4 is the only way to recover from the problem.
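The reset sequence described here (set Secondary Bus Reset, wait, clear) can be sketched as follows. For comparison, the Linux kernel's secondary bus reset holds the bit for 2 ms (conventional PCI requires at least 1 ms), and the PCI Express Base Specification requires software to wait at least 100 ms after reset before sending configuration requests to a device below a port limited to 5.0 GT/s or slower; the 100 microsecond hold used in the test above is much shorter. cfg_read16(), cfg_write16() and delay_us() are hypothetical hooks, simulated here so the sketch runs on a host.

```c
#include <stdint.h>

#define PCI_BRIDGE_CONTROL        0x3e
#define PCI_BRIDGE_CTL_BUS_RESET  0x40

#define SBR_HOLD_US      2000u   /* hold time used by Linux (minimum 1 ms) */
#define SBR_SETTLE_US  100000u   /* wait before config requests to the device */

/* Host stand-ins; replace with the OS's config accessors and delay. */
static uint16_t sim_bridge_ctl;
static unsigned long sim_elapsed_us;
static int sim_reset_pulses;

static uint16_t cfg_read16(unsigned off)
{
    (void)off;
    return sim_bridge_ctl;
}

static void cfg_write16(unsigned off, uint16_t v)
{
    (void)off;
    /* Count each asserted-to-deasserted transition of the reset bit. */
    if ((sim_bridge_ctl & PCI_BRIDGE_CTL_BUS_RESET) && !(v & PCI_BRIDGE_CTL_BUS_RESET))
        sim_reset_pulses++;
    sim_bridge_ctl = v;
}

static void delay_us(unsigned long us) { sim_elapsed_us += us; }

/* Pulse Secondary Bus Reset on the bridge above the device, preserving the
   other Bridge Control bits. */
static void secondary_bus_reset(void)
{
    uint16_t ctl = cfg_read16(PCI_BRIDGE_CONTROL);
    cfg_write16(PCI_BRIDGE_CONTROL, (uint16_t)(ctl | PCI_BRIDGE_CTL_BUS_RESET));
    delay_us(SBR_HOLD_US);
    cfg_write16(PCI_BRIDGE_CONTROL, (uint16_t)(ctl & ~PCI_BRIDGE_CTL_BUS_RESET));
    delay_us(SBR_SETTLE_US);
    /* The device's config state (BARs, Command register with Memory Space
       and Bus Master Enable) returns to reset values and must be restored
       before touching its memory-mapped registers. */
}
```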

dp11
Raspberry Pi Engineer & Forum Moderator
Posts: 857
Joined: Thu Dec 29, 2011 5:46 pm

Re: RPI4 PCI Express Bus Hangs randomly

Fri Jun 17, 2022 10:13 pm

Is this now a discontinued network controller?

ErikWlfn
Posts: 19
Joined: Thu Nov 04, 2021 4:52 pm

Re: RPI4 PCI Express Bus Hangs randomly

Sat Jun 18, 2022 2:27 am

dp11 wrote:
Fri Jun 17, 2022 10:13 pm
Is this now a discontinued network controller?
It was recommended for embedded applications by Intel. Yes, they stopped selling the PCIe boards that use the chip. There are motherboards that use the chip but I suspect they are now obsolete. However, the chip is still available as a separate part. Small companies have to make large purchases (lifetime buys) leaving lots of inventory to use up. Since we have the parts and a driver with all the software fixes for that chip, using a different chip is an expensive proposition.

We are using only the bare minimum features of the chip, so most of it actually goes to waste. In my opinion we got what Intel wanted to sell at the time, not what was best for our application. But having designed around the part, we were able to meet our design goals. The reality is that the engineering needed to switch to a different part would cost more than simply using up the expensive chips that we already purchased.

So far I have no reason to believe that the Ethernet chip is the problem, since we have it working on a few Atom based products with Intel chip-sets. We have been selling those for many years. This chip is the predecessor of an entire family of chips, and a new chip gets released about every 12 to 18 months from Intel. They all have similar features, but we mostly just use one transmit queue, one receive queue, DMA and interrupts. The PHY is built into the chip and all the chips have similar PHYs. The only unusual thing about our application is that we don't have the initialization EEPROM with the MAC address and default settings. We program the defaults in software. Most of the problems that we had with the chip were solved once we got the "secret" data for initializing the chip without the EEPROM. The PCIe on the chip has always been reliable, though we have always used it with the PCIe ports on Intel chip-sets. It's also PCIe version 1, so it doesn't require much in the way of PCIe functionality. We often use two of them to get four ports on a chip-set with two PCIe x1 "slots". We also use memory mapped registers rather than IO, so that is the same as on the Compute Module 4.

The major difference on the Compute Module 4 is the lack of cache coherency between the PCIe and the CPU cores. PCIe on Intel's chip-sets does cache snooping and supports DMA directly to cached areas of RAM. At the moment we are just copying the network packets to/from uncached RAM to perform the DMA transfers.

The hang we are seeing does not appear to be due to DMA, though I suppose DMA could have somehow left the bus in an invalid state. If it were preventing RAM access, I would have expected hangs on instructions that don't access the PCIe memory-mapped registers. We consistently see hangs only when reading or writing the PCIe memory-mapped registers. We can also run without hangs by reducing the rate at which Ethernet packets are sent and received. With the 1 ms cycle time the system fails after anywhere from a few minutes to an hour, while running at a 100 ms rate, or with only a single port, seems to run indefinitely without any hangs.

At the moment I am reviewing all the available status registers in the Intel chip, PCIe and CPU system registers to determine what information would be helpful to log on a bus hang. The system registers are separate for each core, so we will probably have to log those before every PCIe access. Any suggestions will be appreciated. Since we are not getting an abort exception on the core that hangs, I don't know that data abort related registers will be useful. So far none of the registers we have examined show any error on the PCIe, and completely reinitializing the PCIe and Ethernet chip from a different CPU core does not recover from the problem. So far we have not seen a failure on Linux, but it also can't send and receive Ethernet packets at the rates of our application.
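One way to capture where a core hangs without relying on an exception is a per-core trace of register accesses: the core records each access before issuing it and marks it done afterwards, and another core inspects the trace after a hang. This is a minimal sketch with hypothetical names (pcie_trace, traced_read32, traced_write32, trace_pending); it assumes the trace lives in memory the monitoring core can observe (uncached, or followed by a DMB), which is not shown.

```c
#include <stddef.h>
#include <stdint.h>

#define TRACE_DEPTH 64u   /* entries per core */

/* One record per MMIO access (hypothetical layout). */
struct pcie_trace_entry {
    uint32_t seq;       /* per-core sequence number */
    uint32_t offset;    /* register offset within the BAR */
    uint32_t value;     /* value written, or value read once complete */
    uint8_t  is_write;
    uint8_t  done;      /* set only after the access instruction returns */
};

struct pcie_trace {
    uint32_t next;
    struct pcie_trace_entry e[TRACE_DEPTH];
};

static struct pcie_trace_entry *trace_begin(struct pcie_trace *t, uint32_t off,
                                            uint32_t val, uint8_t is_write)
{
    struct pcie_trace_entry *e = &t->e[t->next % TRACE_DEPTH];
    e->seq = t->next++;
    e->offset = off;
    e->value = val;
    e->is_write = is_write;
    e->done = 0;
    return e;
}

static uint32_t traced_read32(struct pcie_trace *t, volatile uint32_t *bar, uint32_t off)
{
    struct pcie_trace_entry *e = trace_begin(t, off, 0, 0);
    uint32_t v = bar[off / 4];
    e->value = v;
    e->done = 1;
    return v;
}

/* For a posted write, 'done' only means the store retired on the CPU. */
static void traced_write32(struct pcie_trace *t, volatile uint32_t *bar,
                           uint32_t off, uint32_t v)
{
    struct pcie_trace_entry *e = trace_begin(t, off, v, 1);
    bar[off / 4] = v;
    e->done = 1;
}

/* Run on another core after a hang: the newest entry, if not done, is the
   access the hung core is stuck in. */
static const struct pcie_trace_entry *trace_pending(const struct pcie_trace *t)
{
    if (t->next == 0)
        return NULL;
    const struct pcie_trace_entry *e = &t->e[(t->next - 1) % TRACE_DEPTH];
    return e->done ? NULL : e;
}
```

Logging per-core system registers next to each entry would follow the same pattern, at the cost of extra time per access.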

ErikWlfn
Posts: 19
Joined: Thu Nov 04, 2021 4:52 pm

Re: RPI4 PCI Express Bus Hangs randomly

Mon Jun 20, 2022 9:24 pm

I think that we might have found the conditions causing the bus hang, but they do not violate the PCI Express standards.

Our software transmits identical data to multiple Ethernet nodes. To avoid copying data, it chains the same data (body) onto multiple Ethernet headers containing a source and destination address. That creates packets made up of a 12-byte (address) DMA buffer and a larger (usually around 1300 byte) DMA buffer. The chained buffer uses two descriptors in the ring to send the entire packet. We have the Ethernet chip configured to start transmitting only after a complete packet is in the FIFO, so there cannot be a transmit DMA under-run.
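The chained layout can be sketched as one descriptor per fragment, with only the last one marked end-of-packet. This uses a hypothetical, simplified descriptor, not the 82576's real descriptor format, purely to show how a packet with a 12-byte header fragment occupies two ring slots.

```c
#include <stdint.h>

/* Hypothetical, simplified transmit descriptor (NOT the 82576 layout). */
struct tx_desc {
    uint64_t buf_addr;  /* bus address of this fragment */
    uint16_t len;       /* bytes in this fragment */
    uint8_t  eop;       /* end of packet: set only on the last fragment */
};

#define TX_RING_SIZE 8u

struct tx_ring {
    struct tx_desc d[TX_RING_SIZE];
    unsigned tail;      /* next free slot; the driver then writes it to the chip's tail register */
    unsigned free;      /* number of free slots */
};

/* Queue one packet built from nfrag fragments, e.g. a 12-byte address
   header followed by a shared ~1300-byte body. Returns 0 on success, or -1
   (queuing nothing) if the ring lacks room. */
static int tx_queue_fragments(struct tx_ring *r, const uint64_t *addr,
                              const uint16_t *len, unsigned nfrag)
{
    if (nfrag == 0 || nfrag > r->free)
        return -1;
    for (unsigned i = 0; i < nfrag; i++) {
        struct tx_desc *d = &r->d[r->tail];
        d->buf_addr = addr[i];
        d->len = len[i];
        d->eop = (uint8_t)(i == nfrag - 1);
        r->tail = (r->tail + 1) % TX_RING_SIZE;
    }
    r->free -= nfrag;
    return 0;
}
```

With this layout the chip fetches each fragment as a separate DMA read, so the short header fragment turns into its own small PCIe read request.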

That method works fine on the Intel Atom chip-set that we have been using, and it is a valid way to use the Intel Ethernet chip with the PCIe bus. We found that using multiple buffers for a packet, with the first one being 12 bytes seems to cause the bus to sometimes hang on the Compute Module 4. If we send the packet as one large buffer, we have not gotten a bus hang so far.

Since we could not get the DMA transfers using cached RAM working properly, we were already copying the packets into uncached DMA buffers. It takes about the same amount of time to copy the multi-buffer packets into a single DMA buffer as to copy them to multiple buffers. We modified the software to copy all the buffers for a packet into a single DMA buffer and use a single ring descriptor for the transfer. So far we have not seen a bus hang.
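The single-buffer workaround amounts to a gather copy into the uncached DMA buffer before filling one descriptor. This is a minimal sketch; struct seg and coalesce_packet() are hypothetical names, not taken from the driver in this thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One source fragment of a packet (e.g. the 12-byte header or the body). */
struct seg {
    const void *data;
    size_t len;
};

/* Copy all fragments into one DMA buffer so the packet needs a single
   descriptor. Returns the total length, or 0 if it does not fit in 'cap'. */
static size_t coalesce_packet(uint8_t *dma_buf, size_t cap,
                              const struct seg *segs, unsigned nseg)
{
    size_t total = 0;
    for (unsigned i = 0; i < nseg; i++) {
        if (segs[i].len > cap - total)   /* total <= cap always holds here */
            return 0;
        memcpy(dma_buf + total, segs[i].data, segs[i].len);
        total += segs[i].len;
    }
    return total;
}
```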

We don't know if the problem is caused by the short 12-byte DMA transfers, or the fact that the packet is split up into two DMA transfers occurring within a few hundred nanoseconds of each other. If the problem only happens on the first DMA burst, then this work around should avoid the problem. If the problem happens on small DMA bursts after the first DMA burst, then we may still see the problem.

Linux probably pads packets to at least 60 bytes, and we also changed the software to pad all the DMA transfers to at least one 64-byte cache line. We're hoping that these changes avoid the bus hang problem, since they are similar to how Linux is probably using the Ethernet chip and the Compute Module 4 PCIe.
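Padding to the 64-byte minimum is a zero-fill of the tail of the DMA buffer. This is a minimal sketch; pad_for_dma() and MIN_DMA_LEN are hypothetical names. For comparison, the minimum Ethernet frame is 64 bytes including the 4-byte FCS, so the data a driver supplies must be at least 60 bytes unless the MAC pads short frames itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MIN_DMA_LEN 64u   /* one 64-byte cache line */

/* Zero-fill a packet in its DMA buffer up to MIN_DMA_LEN and return the
   length to put in the descriptor. 'cap' is the buffer size. */
static size_t pad_for_dma(uint8_t *buf, size_t len, size_t cap)
{
    assert(cap >= MIN_DMA_LEN && len <= cap);
    if (len < MIN_DMA_LEN) {
        memset(buf + len, 0, MIN_DMA_LEN - len);
        return MIN_DMA_LEN;
    }
    return len;
}
```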

I'll post again after the changes have been tested for a while.

ErikWlfn
Posts: 19
Joined: Thu Nov 04, 2021 4:52 pm

Re: RPI4 PCI Express Bus Hangs randomly

Tue Jun 21, 2022 10:49 pm

The PCI Express is now working reliably with the Intel 82576 Ethernet chip. We are performing DMA transfers to/from uncached RAM and using a single DMA buffer for each Ethernet packet. We are also padding the packets to at least 64 bytes before the DMA transfer.

So far this meets our requirements for performance and timing, so I will not post further for this thread. Thank you, everyone who responded. I appreciate your help and suggestions.
