
by 6by9
Thu Nov 04, 2021 3:47 pm
Forum: Troubleshooting
Topic: Pi 0 2 w does not work with 64 Bit Raspberry Pi OS
Replies: 48
Views: 2307

Re: Pi 0 2 w does not work with 64 Bit Raspberry Pi OS

The VPU, by loading data into the VRF, is largely independent of caching, whereas NEON relies on the ARM caches for efficient operation. I have noticed that if the VPU is clocked at something slow like 50 MHz, the sample function takes 70 clock cycles from uncached RAM, but when the VPU is up past 40...
by cleverca22
Thu Nov 04, 2021 3:26 pm
Forum: Troubleshooting
Topic: Pi 0 2 w does not work with 64 Bit Raspberry Pi OS
Replies: 48
Views: 2307

Re: Pi 0 2 w does not work with 64 Bit Raspberry Pi OS

The VPU, by loading data into the VRF, is largely independent of caching, whereas NEON relies on the ARM caches for efficient operation. I have noticed that if the VPU is clocked at something slow like 50 MHz, the sample function takes 70 clock cycles from uncached RAM, but when the VPU is up past 40...
by 6by9
Thu Nov 04, 2021 2:14 pm
Forum: Troubleshooting
Topic: Pi 0 2 w does not work with 64 Bit Raspberry Pi OS
Replies: 48
Views: 2307

Re: Pi 0 2 w does not work with 64 Bit Raspberry Pi OS

... ARM caches for efficient operation. Actually VPU vector code is fine under KMS. Arm doesn't use that hardware. You will lose the ability to submit QPU jobs to VPU (as Arm is now in charge of them). However, there is an Arm-side ioctl that allows QPU code to be submitted. Ooh, interesting - thanks ...
by cleverca22
Wed Nov 03, 2021 11:46 pm
Forum: Troubleshooting
Topic: Pi 0 2 w does not work with 64 Bit Raspberry Pi OS
Replies: 48
Views: 2307

Re: Pi 0 2 w does not work with 64 Bit Raspberry Pi OS

The VPU is fast and there is an example in /opt/vc/src/hello_pi/hello_fft. Nope! That's the QPU/V3D example: hello_fft uses the 3D core to do FFT stuff, but my code is using the VPU core, which is still basically unused; only the PWM audio really puts any load on it. I would ...
by jamesh
Wed Nov 03, 2021 1:53 pm
Forum: Troubleshooting
Topic: Pi 0 2 w does not work with 64 Bit Raspberry Pi OS
Replies: 48
Views: 2307

Re: Pi 0 2 w does not work with 64 Bit Raspberry Pi OS

... stop working reliably anyway. Actually VPU vector code is fine under KMS. Arm doesn't use that hardware. You will lose the ability to submit QPU jobs to VPU (as Arm is now in charge of them). However, there is an Arm-side ioctl that allows QPU code to be submitted. Ooh, interesting - thanks ...
by dom
Wed Nov 03, 2021 11:52 am
Forum: Troubleshooting
Topic: Pi 0 2 w does not work with 64 Bit Raspberry Pi OS
Replies: 48
Views: 2307

Re: Pi 0 2 w does not work with 64 Bit Raspberry Pi OS

... stop working reliably anyway. Actually VPU vector code is fine under KMS. Arm doesn't use that hardware. You will lose the ability to submit QPU jobs to VPU (as Arm is now in charge of them). However, there is an Arm-side ioctl that allows QPU code to be submitted.
by cleverca22
Thu Oct 14, 2021 9:20 pm
Forum: General discussion
Topic: Why is the pi 4 nearly 10 times slower than Intel XEON 2.4 Ghz chips?
Replies: 50
Views: 2872

Re: Why is the pi 4 nearly 10 times slower than Intel XEON 2.4 Ghz chips?

Is there an example for double-precision floating point with shaders or VPU code? The V3D/QPU shaders operate on either 32-bit floats or 8-bit floats pinned to the 0-1 range. https://github.com/matthewarcus/mandelpi/blob/master/mandel.qasm is an example of a compute shader ...
by Gavinmc42
Mon Sep 13, 2021 1:16 am
Forum: Bare metal, Assembly language
Topic: open firmware and booting custom apps fast
Replies: 51
Views: 4769

Re: open firmware and booting custom apps fast

... running a ThreadX app. Replace that with an LK one? I prefer Open firmware all around. QEMU for VPU is looking more useful. I checked years ago, the QPUs have no access to GPIO :( Temp sensors are one of the first I2C sensors I used on Pis. Made a controller with a Zero that worked pretty good ...
by cleverca22
Sat Sep 04, 2021 1:58 am
Forum: Graphics programming
Topic: HW accelerated JPEG encoding Gstreamer
Replies: 8
Views: 1097

Re: HW accelerated JPEG encoding Gstreamer

... encoded by the start_x.elf firmware using the VPU/QPUs? There is a dedicated JPEG encoder/decoder block, completely separate from the VPU and V3D/QPU, though the VPU is still running the drivers to manage the JPEG block. 0x7e005000 is the JPEG encoder block; no public docs on how to manage it exist, ...
by cleverca22
Wed Aug 11, 2021 8:07 pm
Forum: Graphics programming
Topic: STICKY: All about accelerated video on the Raspberry Pi [thanks all contributors!]
Replies: 55
Views: 5673

Re: All about accelerated video on the Raspberry Pi (check my notes)?

... and all compositing is usually done by the compositing window manager, ignoring the hw capabilities entirely. It instead goes through OpenGL (and the QPUs), because X11 was designed for differently capable GPUs with weaker composition hardware. I was trying to get gstreamer to do it without X running ...
by cleverca22
Wed Aug 11, 2021 11:00 am
Forum: Graphics programming
Topic: STICKY: All about accelerated video on the Raspberry Pi [thanks all contributors!]
Replies: 55
Views: 5673

Re: All about accelerated video on the Raspberry Pi (check my notes)?

"QPU" units which provide raw video compute that can be used in lots of ways the QPU is for running shaders and opengl, and has no direct connection to video encode/decode H264 encode actually uses a number of sub-blocks: ...
by Gavinmc42
Wed Aug 11, 2021 8:07 am
Forum: Graphics programming
Topic: STICKY: All about accelerated video on the Raspberry Pi [thanks all contributors!]
Replies: 55
Views: 5673

Re: All about accelerated video on the Raspberry Pi (check my notes)?

... Pi4/VC6 is evolving into Open source and is changing all the time. I think some gets backported to the VC4 Pis? There is some VC4 and VC6 Python QPU code around too. Start_x.elf has camera lens correction etc in it, that keeps changing. Then there is the always evolving DeviceTree. It's all a ...
by egnor
Wed Aug 11, 2021 5:55 am
Forum: Graphics programming
Topic: STICKY: All about accelerated video on the Raspberry Pi [thanks all contributors!]
Replies: 55
Views: 5673

STICKY: All about accelerated video on the Raspberry Pi [thanks all contributors!]

... OS; compression/decompression helpers (motion estimation, prediction, etc.); video capture helpers (lens correction, pixel dematrixing, etc.); "QPU" units which run shaders for 3D rendering; the Hardware Video Scaler (HVS) that does output-time scaling and compositing; Pixel Valves and encoders ...
by cleverca22
Sat Jun 05, 2021 3:40 pm
Forum: C/C++
Topic: gmp (Gnu multiple precision arithmetic lib) on Intel i7 5x/2x faster than on Pi400 32bit/64bit OS
Replies: 139
Views: 13173

Re: gmp (Gnu multiple precision arithmetic lib) 5x faster on intel i7 than on Pi400 (two samples)

... the usual Arm cores you expect (with NEON on some models); the dual-core VPU, with a 16-lane int-only vector core, where the firmware runs; the QPU, I think it's 12 cores of 16-lane vector only, but is effectively 192 cores, this is where shaders run. Because they are separate cores, you could ...
by cleverca22
Sun May 16, 2021 2:06 am
Forum: C/C++
Topic: gmp (Gnu multiple precision arithmetic lib) on Intel i7 5x/2x faster than on Pi400 32bit/64bit OS
Replies: 139
Views: 13173

Re: gmp (Gnu multiple precision arithmetic lib) 5x faster on intel i7 than on Pi400 (one sample)

... not looked into. Some very rough math says that's about 125 million 64-bit int mults per sec; not sure how that compares to the power of the Arm or QPU cores...
by kngmalza
Tue May 04, 2021 7:59 pm
Forum: Advanced users
Topic: Raspberry pi4 py-videocore6 execute a program using GPU
Replies: 0
Views: 117

Raspberry pi4 py-videocore6 execute a program using GPU

... to use py-videocore6 or achieve my result, I wrote this in my example code: from videocore6.driver import Driver from videocore6.assembler import qpu from time import monotonic from hashlib import sha256 @qpu def mine(asm, message, difficulty=1): assert difficulty >= 1 prefix = '1' * difficulty ...
by Anichang
Thu Mar 25, 2021 9:47 pm
Forum: General programming discussion
Topic: How to execute arbitrary code on VPUs?
Replies: 5
Views: 782

Re: How to execute arbitrary code on VPUs?

https://github.com/ali1234/vcpoke/blob/master/main.c#L21-L53 This implements 2 VPU functions to read/write RAM/MMIO. Due to the security in the chip design, some things can't be read by the Arm core. Awesome! Thanks! I finally understand how to use the execute_code() and execute_qpu() in mailbox.h th...
by cleverca22
Sat Mar 13, 2021 2:22 pm
Forum: Bare metal, Assembly language
Topic: RPI4 cache
Replies: 7
Views: 1084

Re: RPI4 cache

... for some DSP tasks. The part I described is the VPU, which is a dual-core CPU with vector extensions, where the firmware runs. There is ALSO the QPU, where OpenGL shaders run, and that is effectively a 192-core CPU with heavy restrictions on what it can do; that is what I would call a "traditional ...
by cleverca22
Thu Mar 04, 2021 5:13 am
Forum: Advanced users
Topic: Can I Query VC Performance Counters Using 'perf' ?
Replies: 3
Views: 442

Re: Can I Query VC Performance Counters Using 'perf' ?

... you looked at was vc4 on a pi2, with "dtoverlay=vc4-fkms-v3d" and: pi@pi2:~ $ GALLIUM_HUD='L2C-total-cache-hit,L2C-total-cache-miss;QPU-total-idle-clk-cycles,QPU-total-clk-cycles-vertex-coord-shading,QPU-total-clk-cycles-fragment-shading,QPU-total-clk-cycles-waiting-TMU' glxgears ...
by Seneral
Wed Dec 16, 2020 5:27 pm
Forum: Graphics programming
Topic: VC4CV - framework for CV using GL and QPU [help needed]
Replies: 5
Views: 1269

Re: VC4CV - framework for CV using GL and QPU [help needed]

Ok so I just got the following stall using the minimal program: pi@raspberrypi:~/VCREPO/build $ sudo ./QPUMin -c qpu_mask_tiled.bin -q 010000000000 SETUP: 10 instances processing 1/2 columns each, covering 80x60 tiles, plus 0 columns dropped -- QPU Enabled -- QPUs 1-4: 15 ...
by Seneral
Wed Dec 16, 2020 5:10 pm
Forum: Graphics programming
Topic: VC4CV - framework for CV using GL and QPU [help needed]
Replies: 5
Views: 1269

Re: VC4CV - framework for CV using GL and QPU [help needed]

Thanks for the interest, greatly appreciated. All the testing codes are below 4096 bytes, with the newest tiled threshold shader (qpu_mask_tiled.asm) being the largest at 3720 bytes with all the loops unrolled. I don't know if you're familiar with vc4asm but that is the assembler I chose. ...
by Akane
Wed Dec 16, 2020 3:26 pm
Forum: Graphics programming
Topic: VC4CV - framework for CV using GL and QPU [help needed]
Replies: 5
Views: 1269

Re: VC4CV - framework for CV using GL and QPU [help needed]

The number 500 reminds me of 4096 bytes (512 instructions) of instruction cache per slice. How long is your QPU code? I think you should bisect your problems to trace the cause. Can you offer us a minimal code that reproduces your problems? Our public libraries, which are working ...
by Seneral
Wed Dec 16, 2020 2:51 pm
Forum: Graphics programming
Topic: VC4CV - framework for CV using GL and QPU [help needed]
Replies: 5
Views: 1269

Re: VC4CV - framework for CV using GL and QPU [help needed]

... if it wasn't limited by the camera throughput. The single fullscreen shader only manages 50-60fps. Even when I limit the tiled shader to use one QPU (so all execute serially) it still outperforms the single shader with 65-75fps - even at a more fair 1664x1232 for the simplified tiled setup it ...
by Seneral
Sun Dec 13, 2020 7:07 pm
Forum: Graphics programming
Topic: VC4CV - framework for CV using GL and QPU [help needed]
Replies: 5
Views: 1269

Re: VC4CV - framework for CV using GL and QPU [help needed]

... blob detection is really lacking in performance and can't even run the full sensor resolution, so it's quite useless for calibration, too. But the QPU tiled camera program still has problems. By now I think they are less QPU related and more camera pipeline related, so if anyone has knowledge about ...
by Seneral
Sat Dec 12, 2020 6:52 pm
Forum: Advanced users
Topic: HQ camera - use it as a streaming device - Redirect camera output to certain HDMI port
Replies: 6
Views: 1570

Re: HQ camera - use it as a streaming device - Redirect camera output to certain HDMI port

... it's pretty involved (with lots of custom code), but if you want lowest latency and highest throughput, I'm currently experimenting with the QPU and a side-effect of that is that I blit the camera output to the screen for debugging. It completely skips the window manager and everything, it ...
