geerlingguy
Posts: 472
Joined: Sun Feb 15, 2015 3:43 am
Location: St. Louis, MO, USA

Coral TPU, PCIe on Pi 5

Wed Nov 15, 2023 9:52 pm

Some context, from dealing with issues on the Compute Module 4:
The Raspberry Pi 5 is here. And it has a newer, more awesome-r PCI Express bus. And there will soon be options for plugging all manner of PCIe devices into the Pi. One use case that is already popular is USB Coral TPUs used alongside something like Frigate for local camera image processing.

I and many others tried getting the PCIe version of the Coral TPU running on the CM4, to no avail. With the Pi 5, I think it's possible.

I had done some experiments with the flat FFC cable that I was privy to in early testing, though that cable seems to be the cause of some link errors (especially if I try uprating the port to PCIe Gen 3). So as more folks start making their own impedance-controlled cables and adapter boards, or as first-party M.2 HATs hit the market, more people will start trying all kinds of crazy things.

My latest update is I am seeing some link errors (one every 5 seconds), and errors initializing interrupts on my Pi 5 with the Coral TPU attached.

Basically:

pi@pi5:~ $ dmesg | grep apex
[    2.796341] apex 0000:01:00.0: enabling device (0000 -> 0002)
[    2.797046] apex 0000:01:00.0: Couldn't initialize interrupts: -28
[    7.902239] apex 0000:01:00.0: Apex performance not throttled due to temperature

And the device enumerates using the apex PCIe driver as installed following the Coral docs:

pi@pi5:~ $ sudo lspci -vvvv
...
0000:01:00.0 System peripheral: Global Unichip Corp. Coral Edge TPU (prog-if ff)
	Subsystem: Global Unichip Corp. Coral Edge TPU
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 39
	Region 0: Memory at 1800100000 (64-bit, prefetchable) [size=16K]
	Region 2: Memory at 1800000000 (64-bit, prefetchable) [size=1M]
	Capabilities: [80] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0W
		DevCtl:	CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 256 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
		LnkCap:	Port #1, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
			ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM Disabled; RCB 64 bytes, Disabled- CommClk-
			ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 5GT/s, Width x1
			TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR+
			 10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt+ EETLPPrefix-
			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
			 FRS- TPHComp- ExtTPHComp-
			 AtomicOpsCap: 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ 10BitTagReq- OBFF Disabled,
			 AtomicOpsCtl: ReqEn-
		LnkCap2: Supported Link Speeds: 2.5-5GT/s, Crosslink- Retimer- 2Retimers- DRS-
		LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
			 EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
			 Retimer- 2Retimers- CrosslinkRes: unsupported
	Capabilities: [d0] MSI-X: Enable- Count=128 Masked-
		Vector table: BAR=2 offset=00046800
		PBA: BAR=2 offset=00046068
	Capabilities: [e0] MSI: Enable- Count=1/32 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [f8] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [100 v1] Vendor Specific Information: ID=1556 Rev=1 Len=008 <?>
	Capabilities: [108 v1] Latency Tolerance Reporting
		Max snoop latency: 0ns
		Max no snoop latency: 0ns
	Capabilities: [110 v1] L1 PM Substates
		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
			  PortCommonModeRestoreTime=10us PortTPowerOnTime=10us
		L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
			   T_CommonMode=0us LTR1.2_Threshold=26016ns
		L1SubCtl2: T_PwrOn=10us
	Capabilities: [200 v2] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
		AERCap:	First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
		HeaderLog: 00000000 00000000 00000000 00000000
	Kernel driver in use: apex
	Kernel modules: apex

When I attempt to use it for inference, though:

[  337.156485] apex 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Transmitter ID)
[  337.156488] apex 0000:01:00.0:   device [1ac1:089a] error status/mask=000010c1/00006000
[  337.156491] apex 0000:01:00.0:    [ 0] RxErr                  (First)
[  337.156494] apex 0000:01:00.0:    [ 6] BadTLP                
[  337.156496] apex 0000:01:00.0:    [ 7] BadDLLP               
[  337.156498] apex 0000:01:00.0:    [12] Timeout               
[  337.156507] pcieport 0000:00:00.0: AER: Corrected error received: 0000:00:00.0
[  337.156511] pcieport 0000:00:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
[  337.156513] pcieport 0000:00:00.0:   device [14e4:2712] error status/mask=00001000/00002000
[  337.156516] pcieport 0000:00:00.0:    [12] Timeout               
[  337.156726] apex 0000:01:00.0: Couldn't reinit interrupts: -28
[  337.156729] pcieport 0000:00:00.0: AER: Corrected error received: 0000:00:00.0
[  337.156734] pcieport 0000:00:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
[  337.156737] pcieport 0000:00:00.0:   device [14e4:2712] error status/mask=00001000/00002000
[  337.156739] pcieport 0000:00:00.0:    [12] Timeout               
[  337.156744] apex 0000:01:00.0: Permission checking failed.
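
The `error status` words in the log above can be decoded by hand. The sketch below is only an illustration: the bit names are the ones the kernel and lspci print in this thread (`RxErr`, `BadTLP`, `BadDLLP`, `Rollover`, `Timeout`, `AdvNonFatalErr`), and `decode_correctable` is a made-up helper name, not part of any tool.

```python
# Correctable-error status bits, named as they appear in the AER log lines
# and in the lspci CESta field quoted in this thread.
CORRECTABLE_BITS = {
    0: "RxErr",
    6: "BadTLP",
    7: "BadDLLP",
    8: "Rollover",
    12: "Timeout",
    13: "AdvNonFatalErr",
}


def decode_correctable(status: int) -> list:
    """Return the names of the bits set in an AER correctable status word."""
    names = []
    for bit in range(32):
        if status & (1 << bit):
            # Fall back to a generic label for bits not listed above.
            names.append(CORRECTABLE_BITS.get(bit, f"bit{bit}"))
    return names


print(decode_correctable(0x000010C1))  # status reported by the Coral endpoint
print(decode_correctable(0x00001000))  # status reported by the root port
```

For `0x000010c1` this gives bits 0, 6, 7 and 12, matching the four lines the kernel printed for the apex device. The root port's `0x00001000` is bit 12 only.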
The question is not whether something should be done on a Raspberry Pi, it is whether it can be done on a Raspberry Pi.

jdb
Raspberry Pi Engineer & Forum Moderator
Posts: 2989
Joined: Thu Jul 11, 2013 2:37 pm

Re: Coral TPU, PCIe on Pi 5

Thu Nov 16, 2023 9:13 am

Aside from the error spam, looks like it's trying to request more interrupt vectors than the hardware is willing to support.

There are two possible mappings for routing message-signalled interrupts: the BCM root complex MSI target, and an MSI-X target external to the RC that is completely transparent to software. External MSI-X interrupts are translated into GIC SPIs, of which there are a limited number on the chip.
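
For reference, the `-28` in "Couldn't initialize interrupts: -28" is a negated errno value. A quick way to look it up, assuming a Linux system where errno 28 is `ENOSPC`:

```python
import errno
import os

code = -28  # value from "Couldn't initialize interrupts: -28"

# Kernel functions return negative errno values, so flip the sign to look it up.
name = errno.errorcode[-code]
print(name, "-", os.strerror(-code))
```

`ENOSPC` here would be consistent with the driver asking for more interrupt vectors than could be allocated, rather than anything to do with disk space.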

What happens if you swap the msi-parent target from &mip1 to &pcie1 in the devicetree?
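
A rough sketch of that swap as a text edit on devicetree source. Everything here is illustrative: `swap_msi_parent` is a hypothetical helper, and the `sample` fragment is a cut-down stand-in rather than the real node from the kernel tree. It only applies to source that still uses labels; a .dts decompiled from a .dtb typically shows numeric phandles instead.

```python
import re


def swap_msi_parent(dts_source: str, old: str = "mip1", new: str = "pcie1") -> str:
    """Rewrite every `msi-parent = <&old>;` to `msi-parent = <&new>;`."""
    pattern = re.compile(r"(msi-parent\s*=\s*<&)" + re.escape(old) + r"(>;)")
    return pattern.sub(lambda m: m.group(1) + new + m.group(2), dts_source)


# Illustrative fragment only (assumption); the real node has many more properties.
sample = """
&pcie1 {
	msi-parent = <&mip1>;
};
"""

print(swap_msi_parent(sample))
```

Note that it rewrites every reference to the old label in the text it is given, so it should be fed only the node being changed.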

https://github.com/raspberrypi/linux/bl ... 9-L1015C23
Rockets are loud.
https://astro-pi.org

geerlingguy
Posts: 472
Joined: Sun Feb 15, 2015 3:43 am
Location: St. Louis, MO, USA

Re: Coral TPU, PCIe on Pi 5

Fri Nov 17, 2023 5:00 am

Thanks! I will try that tomorrow once I get my bench set up again (moving while trying to get a few videos done on a deadline: not recommended).

It was also recommended to set:

diff --git a/arch/arm64/configs/bcm2712_defconfig b/arch/arm64/configs/bcm2712_defconfig
index 8ad2775f5..ff2c619c7 100644
--- a/arch/arm64/configs/bcm2712_defconfig
+++ b/arch/arm64/configs/bcm2712_defconfig
@@ -452,9 +452,10 @@ CONFIG_RFKILL_INPUT=y
 CONFIG_NET_9P=m
 CONFIG_NFC=m
 CONFIG_PCI=y
+CONFIG_PCI_MSI=y
 CONFIG_PCIEPORTBUS=y
 CONFIG_PCIEAER=y
-CONFIG_PCIEASPM_POWERSAVE=y
+CONFIG_PCIEASPM_PERFORMANCE=y
 CONFIG_PCIE_DPC=y
 CONFIG_UEVENT_HELPER=y
 CONFIG_DEVTMPFS=y

and recompile the kernel. I will try both options separately and see if one or the other does the trick! I'm guessing it'll be a recursive loop and we'll just uncover some other bug underneath, with my luck!

geerlingguy
Posts: 472
Joined: Sun Feb 15, 2015 3:43 am
Location: St. Louis, MO, USA

Re: Coral TPU, PCIe on Pi 5

Fri Nov 17, 2023 5:02 am

Doing a little exploration...

It seems CONFIG_PCI_MSI is set (Y) in the bcm2711 arm defconfig (but not the arm64 one): https://github.com/raspberrypi/linux/bl ... onfig#L443

And it is not set in the bcm2712 defconfig either.

jdb
Raspberry Pi Engineer & Forum Moderator
Posts: 2989
Joined: Thu Jul 11, 2013 2:37 pm

Re: Coral TPU, PCIe on Pi 5

Fri Nov 17, 2023 11:09 am

You don't need to enable MSI explicitly in the defconfig; it's selected by two other options. The fact that it appears in bcm2711_defconfig is probably because that file hasn't been regenerated since 2712 support got merged.

64-bit 2712 config:

 .config - Linux/arm64 6.1.61 Kernel Configuration
 > Search (PCI_MSI)
   Symbol: PCI_MSI [=y]
   Type  : bool
   Defined at drivers/pci/Kconfig:39
     Prompt: Message Signaled Interrupts (MSI and MSI-X)
     Depends on: PCI [=y]
     Location:
       -> Device Drivers
         -> PCI support (PCI [=y])
   (1)     -> Message Signaled Interrupts (MSI and MSI-X) (PCI_MSI [=y])
   Selects: GENERIC_MSI_IRQ [=y]
   Selected by [y]:
     - ARM_GIC_V2M [=y] && PCI [=y]
  
sudo modprobe configs && zgrep MSI /proc/config.gz:

CONFIG_GENERIC_MSI_IRQ=y
CONFIG_GENERIC_MSI_IRQ_DOMAIN=y
CONFIG_IRQ_MSI_IOMMU=y
CONFIG_HAVE_KVM_MSI=y
CONFIG_PCI_MSI=y
CONFIG_PCI_MSI_IRQ_DOMAIN=y
...
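
The same check can be scripted. A minimal sketch, assuming the `configs` module is loaded so `/proc/config.gz` exists; `msi_options` and `read_kernel_config` are made-up helper names. Unlike `zgrep MSI`, it only reports options that are actually set and skips the `# ... is not set` comment lines.

```python
import gzip


def msi_options(config_text: str) -> dict:
    """Collect set CONFIG_* options whose name contains 'MSI'."""
    options = {}
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            key, value = line.split("=", 1)
            if "MSI" in key:
                options[key] = value
    return options


def read_kernel_config(path: str = "/proc/config.gz") -> str:
    """Read the running kernel's config (requires `modprobe configs` first)."""
    with gzip.open(path, "rt") as fh:
        return fh.read()


# Small made-up sample so the parser can be tried without a Pi at hand.
sample = "CONFIG_PCI=y\nCONFIG_PCI_MSI=y\n# CONFIG_FOO is not set\nCONFIG_GENERIC_MSI_IRQ=y\n"
print(msi_options(sample))
```

On the Pi itself, `msi_options(read_kernel_config())` would give the live values.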

geerlingguy
Posts: 472
Joined: Sun Feb 15, 2015 3:43 am
Location: St. Louis, MO, USA

Re: Coral TPU, PCIe on Pi 5

Fri Nov 17, 2023 1:06 pm

Ah, that makes sense. So ASPM is the only setting there that may make a difference.
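
Before rebuilding anything, the ASPM policy the running kernel is using can be read from sysfs. A small sketch, assuming the usual format of `/sys/module/pcie_aspm/parameters/policy`, where the active policy is the word in square brackets; `active_aspm_policy` is a hypothetical helper name.

```python
from pathlib import Path
from typing import Optional

POLICY_PATH = Path("/sys/module/pcie_aspm/parameters/policy")


def active_aspm_policy(policy_line: str) -> Optional[str]:
    """Return the bracketed (active) entry from the policy line, if any."""
    for word in policy_line.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


# Example line in the format the kernel exposes (contents are illustrative).
print(active_aspm_policy("default performance [powersave] powersupersave"))

# On a machine with ASPM support compiled in, read the live value as well.
if POLICY_PATH.exists():
    print(active_aspm_policy(POLICY_PATH.read_text()))
```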

geerlingguy
Posts: 472
Joined: Sun Feb 15, 2015 3:43 am
Location: St. Louis, MO, USA

Re: Coral TPU, PCIe on Pi 5

Fri Nov 17, 2023 5:24 pm

Switching the msi-parent seems to have fixed the interrupt issue... but now I'm hitting:

# dmesg
[   67.872746] apex 0000:01:00.0: mapping size 0x1000 must be page aligned
[   67.872757] apex 0000:01:00.0: mapping size 0x1000 must be page aligned

# Python output
root@c10211e66ef2:~# python3 /usr/share/edgetpu/examples/classify_image.py --model /usr/share/edgetpu/examples/models/mobilenet_v2_1.0_224_inat_bird_quant_edgetpu.tflite --label /usr/share/edgetpu/examples/models/inat_bird_labels.txt --image /usr/share/edgetpu/examples/images/bird.bmp
F :39] Attempting to fetch value instead of handling error Failed precondition: Could not map pages : 6 (Invalid argument)
Aborted (core dumped)

trejan
Posts: 6662
Joined: Tue Jul 02, 2019 2:28 pm

Re: Coral TPU, PCIe on Pi 5

Fri Nov 17, 2023 5:31 pm

geerlingguy wrote:
Fri Nov 17, 2023 5:24 pm

[   67.872746] apex 0000:01:00.0: mapping size 0x1000 must be page aligned
[   67.872757] apex 0000:01:00.0: mapping size 0x1000 must be page aligned
It wants 4k pages, so build the bcm2711 kernel and set kernel=kernel8.img in /boot/firmware/config.txt.
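
To see why a 0x1000-byte mapping trips that message, here is a minimal sketch. It assumes the driver's check amounts to "size is a multiple of the kernel page size", and that the default Pi 5 kernel uses 16K pages while kernel8.img uses 4K pages; `is_page_aligned` is a made-up name.

```python
import os


def is_page_aligned(size: int, page_size: int) -> bool:
    """True if size is a whole number of pages."""
    return size % page_size == 0


mapping = 0x1000  # size from the dmesg line, 4096 bytes
for page_size in (4 * 1024, 16 * 1024):
    print(page_size, is_page_aligned(mapping, page_size))

# Page size of whatever kernel this script is running on.
print("running kernel page size:", os.sysconf("SC_PAGE_SIZE"))
```

With 4K pages the mapping is exactly one page. With 16K pages it is a quarter of a page, so the alignment check fails. `getconf PAGESIZE` reports the same value as the last line from a shell.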

geerlingguy
Posts: 472
Joined: Sun Feb 15, 2015 3:43 am
Location: St. Louis, MO, USA

Re: Coral TPU, PCIe on Pi 5

Fri Nov 17, 2023 5:57 pm

trejan wrote:
Fri Nov 17, 2023 5:31 pm
geerlingguy wrote:
Fri Nov 17, 2023 5:24 pm

[   67.872746] apex 0000:01:00.0: mapping size 0x1000 must be page aligned
[   67.872757] apex 0000:01:00.0: mapping size 0x1000 must be page aligned
It wants 4k pages, so build the bcm2711 kernel and set kernel=kernel8.img in /boot/firmware/config.txt.
Indeed!

I just switched to the 4k kernel (the OOTB one, no custom build needed), rebuilt the DKMS modules (see what I did), and now I can get at the TPU from Docker!

geerlingguy
Posts: 472
Joined: Sun Feb 15, 2015 3:43 am
Location: St. Louis, MO, USA

Re: Coral TPU, PCIe on Pi 5

Fri Nov 17, 2023 7:08 pm

A blog post, summarizing everything I needed to do to get it working: A PCIe Coral TPU FINALLY works on Raspberry Pi 5

Thanks so much for the help jdb (and others who have also helped in private back during the alpha testing period...). It feels like this is a nice way to finish up the week :)
Last edited by geerlingguy on Fri Nov 17, 2023 8:49 pm, edited 1 time in total.

jdb
Raspberry Pi Engineer & Forum Moderator
Posts: 2989
Joined: Thu Jul 11, 2013 2:37 pm

Re: Coral TPU, PCIe on Pi 5

Fri Nov 17, 2023 7:58 pm

Excellent.

We'll probably roll up all these accumulated hacks into a "pciex1_compat" or similar devicetree overlay which will let users toggle the various switches with dtparams until they find a working set.

geerlingguy
Posts: 472
Joined: Sun Feb 15, 2015 3:43 am
Location: St. Louis, MO, USA

Re: Coral TPU, PCIe on Pi 5

Fri Nov 17, 2023 8:11 pm

That would be greatly appreciated!

I expect we'll see some odd behavior from other devices as more people start messing with the bus.

plugwash
Forum Moderator
Posts: 3845
Joined: Wed Dec 28, 2011 11:45 pm

Re: Coral TPU, PCIe on Pi 5

Fri Nov 17, 2023 8:25 pm

geerlingguy wrote:
Fri Nov 17, 2023 7:08 pm
A blog post, summarizing everything I needed to do to get it working: A PCIe Coral TPU FINALLY works on Raspberry Pi 5
The link in the post doesn't work.

Doing some searching it appears the actual url is https://www.jeffgeerling.com/blog/2023/ ... berry-pi-5

geerlingguy
Posts: 472
Joined: Sun Feb 15, 2015 3:43 am
Location: St. Louis, MO, USA

Re: Coral TPU, PCIe on Pi 5

Fri Nov 17, 2023 8:50 pm

plugwash wrote:
Fri Nov 17, 2023 8:25 pm
geerlingguy wrote:
Fri Nov 17, 2023 7:08 pm
A blog post, summarizing everything I needed to do to get it working: A PCIe Coral TPU FINALLY works on Raspberry Pi 5
The link in the post doesn't work.

Doing some searching it appears the actual url is https://www.jeffgeerling.com/blog/2023/ ... berry-pi-5
Whoops, thanks for noticing that—I updated the link!

plugwash
Forum Moderator
Posts: 3845
Joined: Wed Dec 28, 2011 11:45 pm

Re: Coral TPU, PCIe on Pi 5

Sat Nov 18, 2023 4:17 am

Also, is that a cameo of a not-yet-announced adapter board in the article?

geerlingguy
Posts: 472
Joined: Sun Feb 15, 2015 3:43 am
Location: St. Louis, MO, USA

Re: Coral TPU, PCIe on Pi 5

Sat Nov 18, 2023 4:29 am

plugwash wrote:
Sat Nov 18, 2023 4:17 am
Also, is that a cameo of a not-yet-announced adapter board in the article?
It's a prototype board ("engineering sample") made by the same folks who did the M.2 Pineberry Pi HAT/foot. I'm trying to convince them to sell it too; I can't be the only weirdo who likes to jam every PCIe device I've ever seen into a Raspberry Pi :lol:
