hglm
Posts: 30
Joined: Fri May 31, 2013 8:24 pm

Re: Experimental enhanced X driver (rpifb)

Tue Jun 18, 2013 7:59 pm

dom wrote:
ssvb wrote: That's a good point, but I don't think it directly applies here. The CPU imageblit implementation is unsurprisingly fully occupying the CPU while it is doing its stuff, so the concurrently running programs are totally out of luck and are just waiting to be scheduled. Even if less memory bandwidth is used, it is still going to be wasted. And with IRQ enabled DMA, the concurrently running applications at least are going to still have a chance to do something useful at the same time.
The dma controller has a WAITS field, which can be used to reduce the memory bandwidth (by taking longer), if this were a problem (but I doubt it will be - the DMA is quick and will not saturate the memory bus for long).
I am wondering how the kernel behaves in some extreme (but not completely uncommon) cases, such as running "cat bigfile.txt" in a console. With one DMA request following another, I wonder whether the kernel is smart enough to schedule enough CPU time for other processes that need the CPU in this case.

A related question is the behaviour of bcm_dma_wait_idle(channel), which is used to wait for completion. Looking at the source code (arch/arm/mach-bcm2708/dma.c), this does "ugly busy wait"; does that mean the kernel cannot service any other requests during this time? What happens to processes needing "near real-time" scheduling precision during a big framebuffer DMA request? Scrolling the whole framebuffer at 1920x1080x32bpp takes about 15 MB/300 MB/s = 0.05 seconds, which could cause issues with other real-time processes. Of course, an IRQ-based DMA waiting would solve this.
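
As a rough check on the figure above, the copy time can be worked out from the byte count and the DMA rate. This is only a sketch: the function names are made up for illustration, 1 MB is taken as 10^6 bytes, and 300 MB/s is simply the rate assumed in the post. Note that a single 1920x1080x32bpp frame is about 8.3 MB, so the 15 MB figure presumably counts more than one pass over the data (for example read plus write traffic); that reading is a guess.

```bash
#!/bin/bash
# Hypothetical helpers (names are not from the kernel or the driver).

# Size in bytes of one full frame: width * height * bits-per-pixel / 8.
frame_bytes() {
    local width=$1 height=$2 bpp=$3
    echo $(( width * height * bpp / 8 ))
}

# Time in microseconds to move BYTES at RATE megabytes per second.
# With 1 MB = 10^6 bytes, bytes / (rate * 10^6) seconds equals
# bytes / rate microseconds (integer division, rounded down).
copy_time_us() {
    local bytes=$1 rate_mb_s=$2
    echo $(( bytes / rate_mb_s ))
}

copy_time_us 15000000 300                        # the post's 15 MB estimate
copy_time_us "$(frame_bytes 1920 1080 32)" 300   # one 1920x1080x32bpp frame
```

The first call prints 50000 (0.05 s, matching the post) and the second prints 27648, so even a single pass over one frame would keep a busy-waiting CPU tied up for roughly 28 ms at that rate.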

ssvb
Posts: 112
Joined: Sat May 19, 2012 6:15 pm

Re: Experimental enhanced X driver (rpifb)

Tue Jun 18, 2013 11:23 pm

asb wrote:
ssvb wrote:But first we need to come up with a better driver name. It's indeed controversial to use the driver named "sunxifb" (which implies Allwinner) on Raspberry Pi. Maybe something like xf86-video-armfbdev or xf86-video-fbturbo would work better for the unified optimized driver? Any other suggestions are welcome.
fbturbo makes sense to me.
OK, I'll take care of the driver rename a bit later, after we are done with the DMA. This is a change which should not normally be done in a rush.

hglm
Posts: 30
Joined: Fri May 31, 2013 8:24 pm

Re: Experimental enhanced X driver (rpifb)

Wed Jun 19, 2013 4:53 pm

Unfortunately it looks like heavy file system access concurrent with DMA scrolling can cause a hard crash. I tried a program that concurrently does the following:

(1) Cat a big text file to the console, triggering almost continuous DMA scrolling.
(2) Read a big file from the root file system (SD card).

The result was a hard crash with loss of video signal.

I used the following program, run as root in the console:

Code: Select all

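#!/bin/bash
# Run as root: writing to drop_caches requires it.
echo Clear buffer cache
sync; echo 3 > /proc/sys/vm/drop_caches
# Pull the small text file into the buffer cache so that cat-ing it
# to the console is limited by scrolling, not by SD card reads.
echo Preloading bigfile.txt
cat bigfile.txt > /dev/null
# Background job: read the large file from the SD card and record its timing.
{ time cat bigfile > /dev/null ; } 2> result.out &
# Foreground job: dump the text file to the console (continuous DMA scrolling).
time cat bigfile.txt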
This program clears the buffer cache, ensures the file bigfile.txt (size 64K) is loaded into the buffer cache, and then concurrently cats the text file to the console and reads a different big file (bigfile, size 128MB) from the SD card. The 'time' commands serve to measure relative performance but that is irrelevant since a hard crash happens when running the program.

Apparently the concurrent DMA operations on the framebuffer and the SD card cause a problem, maybe because of the latency caused by the framebuffer operations.

dom
Raspberry Pi Engineer & Forum Moderator
Posts: 6940
Joined: Wed Aug 17, 2011 7:41 pm
Location: Cambridge

Re: Experimental enhanced X driver (rpifb)

Wed Jun 19, 2013 5:07 pm

hglm wrote: Unfortunately it looks like heavy file system access concurrent with DMA scrolling can cause a hard crash. I tried a program that concurrently does the following:
This is a little surprising as the GPU heavily uses DMA (for accelerated memcpy operations and accessing peripherals, like PWM or HDMI audio output).
Something like xbmc would be concurrently doing sdcard accesses and DMA, and that is reliable.

I wonder if a software bug is more likely. For instance if set_dma_cb was called with h=0, you would trash the whole of memory.
I'm not saying that particular case is happening, but it's another explanation for the observed failure.

ssvb
Posts: 112
Joined: Sat May 19, 2012 6:15 pm

Re: Experimental enhanced X driver (rpifb)

Wed Jun 19, 2013 5:25 pm

dom wrote:
hglm wrote: Unfortunately it looks like heavy file system access concurrent with DMA scrolling can cause a hard crash. I tried a program that concurrently does the following:
This is a little surprising as the GPU heavily uses DMA (for accelerated memcpy operations and accessing peripherals, like PWM or HDMI audio output).
Something like xbmc would be concurrently doing sdcard accesses and DMA, and that is reliable.
Thanks. It's good to know that DMA is used by more things than just SDHCI. Even if it is initiated from the GPU side.
I wonder if a software bug is more likely. For instance if set_dma_cb was called with h=0, you would trash the whole of memory.
The documentation says: "DMA Transfer Length. This specifies the amount of data to be transferred in bytes. In normal (non 2D) mode this specifies the amount of bytes to be transferred. In 2D mode it is interpreted as an X and a Y length, and the DMA will perform Y transfers, each of length X bytes and add the strides onto the addresses after each X leg of the transfer.
The length register is updated by the DMA engine as the transfer progresses, so it will indicate the data left to transfer."


But in reality if "height == 0 && width != 0" in 2D mode, then just a single row is copied. My guess is that "length register is updated by the DMA engine as the transfer progresses" means that the DMA is first decrementing "height", but when it runs down to zero, then it kinda switches into non-2D mode and begins decrementing "width" resulting in handling an extra row. I wonder if it is 2D mode that could be not very robust in general?
I'm not saying that particular case is happening, but it's another explanation for the observed failure.
Another possible explanation could be that the DMA settings are too aggressive. For example, setting burst length to something more than 5 locks up the system. I'm setting it to 4 right now. But with concurrent DMA transfers in the other channels it might be still too large.

dom
Raspberry Pi Engineer & Forum Moderator
Posts: 6940
Joined: Wed Aug 17, 2011 7:41 pm
Location: Cambridge

Re: Experimental enhanced X driver (rpifb)

Wed Jun 19, 2013 5:58 pm

The height field is (height-1) - the spec is not clear on that.
Because you subtract 1, passing in h=0 would be interpreted as (1<<16) lines of image, which would trash things.
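
To make the off-by-one concrete, here is a small shell-arithmetic sketch of the 2D length word as described above: (height - 1) in the upper 16 bits and the row width in bytes in the lower 16 bits. The field layout is taken from this thread rather than checked against the datasheet, and both function names are hypothetical.

```bash
#!/bin/bash
# Build the 2D transfer length word: ((height - 1) << 16) | width_bytes.
# Rejects sizes that would wrap around or overflow the 16-bit width field.
dma_2d_length() {
    local width_bytes=$1 height=$2
    if [ "$height" -lt 1 ] || [ "$width_bytes" -lt 1 ] || [ "$width_bytes" -gt 65535 ]; then
        echo "dma_2d_length: invalid size ${width_bytes}x${height}" >&2
        return 1
    fi
    printf '0x%08x\n' $(( ((height - 1) << 16) | width_bytes ))
}

# What an unguarded encoder would produce for the same inputs,
# with (height - 1) truncated to a 16-bit field.
dma_2d_length_unchecked() {
    local width_bytes=$1 height=$2
    printf '0x%08x\n' $(( (((height - 1) & 0xFFFF) << 16) | width_bytes ))
}

dma_2d_length 7680 1080           # full 1920-pixel rows at 32bpp
dma_2d_length_unchecked 7680 0    # h=0: the row-count field becomes all ones
```

The first call prints 0x04371e00. The second prints 0xffff1e00: the row-count field is 0xFFFF, which under the (height-1) convention asks for 1<<16 rows instead of none, so checking for h=0 before building the control block is cheap insurance.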

I can see your lockup case, and burst_length=2 avoids it for me.

Some info I sent Simon a while back:
Checked with Nick (DMA controller designer) and you are probably
seeing a lockup:

"If you do a big read burst then the DMA will ask for the required no
of beats but only consume two 128-bit words. The DMA will then stall its
read bus, write out the data and then consume some more.
This runs the risk of a system wide lockup where the stalled read is
preventing the DMA write from completing due to some circular
dependency somewhere in the AXI system.

To make this safe, DMA0 and the VPU DMA have an 8 deep FIFO fitted to
the read data path. This absorbs the extra read words and the read
bus then completes and becomes ready. The DMA then sucks the data out
of the fifo as it needs it.

Are you using DMA 0?"

Me: No, he's currently using 3. The VPU has reserved one of 0/15
(probably 15). I'll see if 0 is available.

"You should be able to do read bursts of 9 with DMA0. 8 beats in the
fifo and 1 gets eaten by the DMA. 10 might also make it if the data
is aligned, else it gets stuck in the DMA arbiter which may stall all
the other channels until it clears.

However if he is managing 5 on a DMA with no FIFO then I assume that is 2
in the DMA, 1 in the DMA arbiter, 1 in the system arbiter and 1 in the
L2 or SDRAM arbiter, and then you have a possible system lockup, so 6
would make sense."

I've had a look, and DMA0 is available. I'll export it in the
"dma.dmachans" parameter in a later build, although I need to be
careful that sdcard doesn't grab it (I think from the code it prefers
channels 2/3).
For now you can probably hack dma.c to give you channel 0 and see if
the high burst hangs go away, and if the performance difference is
worthwhile.
I think the max burst length that is safe is hard to predict. You keep lowering it until it works...

dom
Raspberry Pi Engineer & Forum Moderator
Posts: 6940
Joined: Wed Aug 17, 2011 7:41 pm
Location: Cambridge

Re: Experimental enhanced X driver (rpifb)

Wed Jun 19, 2013 6:32 pm

I've added a new flag to the DMA driver (BCM_DMA_FEATURE_BULK) which prefers channel 0.
With channel 0, your crash goes away. In fact I think burst length of 8 still works.

I'll push that later, and suggest:

Code: Select all

	ret = bcm_dma_chan_alloc(BCM_DMA_FEATURE_BULK,
				 &fb->dma_chan_base, &fb->dma_irq);
and

Code: Select all

int burst_size = (fb->dma_chan == 0) ? 8:2;
for the PR. Let me know if that sounds okay.

ssvb
Posts: 112
Joined: Sat May 19, 2012 6:15 pm

Re: Experimental enhanced X driver (rpifb)

Wed Jun 19, 2013 7:20 pm

dom wrote:I think the max burst length that is safe is hard to predict. You keep lowering it until it works...
This actually scares me a bit. What if we have even more DMA channels active at the same time (not just SD and DMA scrolling, but also add GPU DMA to the mix)? Is there any possibility of getting some subtle data corruption instead of an easily visible deadlock?

Also reducing burst length below 4 reduces performance quite significantly. But there is no significant difference between 4 and 5. And burst length 10 in channel 0 was just ~20% faster than burst length 4 in channel 2 (~440 MB/s vs. ~360 MB/s) according to my measurements, which is nice but not too critical.
I've added a new flag to dma driver (BCM_DMA_FEATURE_BULK) which prefers channel 0.
With channel 0, your crash goes away. In fact I think burst length of 8 still works.
Thanks, though it might be safer to be even more conservative if there is any theoretical risk of deadlocks when stressing concurrent DMA use even harder.
I'll push that later, and suggest:

Code: Select all

	ret = bcm_dma_chan_alloc(BCM_DMA_FEATURE_BULK,
				 &fb->dma_chan_base, &fb->dma_irq);
and

Code: Select all

int burst_size = (fb->dma_chan == 0) ? 8:2;
for the PR. Let me know if that sounds okay.
Being able to grab the DMA channel 0 surely looks good to me.

dom
Raspberry Pi Engineer & Forum Moderator
Posts: 6940
Joined: Wed Aug 17, 2011 7:41 pm
Location: Cambridge

Re: Experimental enhanced X driver (rpifb)

Wed Jun 19, 2013 8:27 pm

ssvb wrote:This actually scares me a bit. What if we have even more DMA channels active at the same time (not just SD and DMA scrolling, but also add GPU DMA to the mix)? Is there any possibility of getting some subtle data corruption instead of an easily visible deadlock?
My understanding is that the maximum burst size *is* set in stone. It is the number of register stages between two places, i.e. you don't want to have more requests in flight than places to put them.
The problem is that it's a little complicated to predict. It varies with address (i.e. cached/uncached alias). It varies with alignment (I think an unaligned source requires an extra slot, and an unaligned dest requires an extra slot).

Data corruption will not occur (not from this issue). It will just be a hard deadlock.

I believe the reason you only saw the deadlock in your stress test is that when the bus is lightly loaded, the first accesses complete before the deadlock state can occur.
You need another bus master to stall the DMA for enough cycles that its whole burst is in flight at once.
If there are more slots than the burst size, then it doesn't matter how much busier the bus gets, nothing will deadlock.
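
The "slots versus burst size" argument can be written down as a toy model. The slot counts below are the ones quoted from the DMA designer earlier in the thread; the function name and the simple "burst must not exceed slots" rule are assumptions for illustration. As noted above, the real limit also shifts with address alias and alignment, so treat the result as an upper bound at best.

```bash
#!/bin/bash
# Buffering slots on the read path, as quoted earlier in the thread.
SLOTS_NO_FIFO=$(( 2 + 1 + 1 + 1 ))  # DMA + DMA arbiter + system arbiter + L2/SDRAM arbiter
SLOTS_DMA0=$(( 8 + 1 ))             # 8-deep read FIFO + 1 beat consumed by the DMA

# Succeeds if every beat of a burst has somewhere to go even when
# another bus master stalls the channel (hypothetical helper).
burst_fits() {
    local burst=$1 slots=$2
    [ "$burst" -le "$slots" ]
}

for burst in 5 6 9 10; do
    for chan in "no-FIFO:$SLOTS_NO_FIFO" "DMA0:$SLOTS_DMA0"; do
        if burst_fits "$burst" "${chan#*:}"; then
            verdict="fits"
        else
            verdict="may deadlock"
        fi
        echo "burst $burst on ${chan%%:*}: $verdict"
    done
done
```

In this model a burst of 5 is the most a channel without the FIFO can absorb and 9 is the most for DMA0, which is in line with the numbers in the quoted email; it says nothing about the extra slots an unaligned transfer may need.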

If it makes you feel better, we do *lots* of dma memcopies on GPU (both 1 and 2 dimensional).
We use 15 as the burst size for memory to memory, and 14 for memory to/from peripheral transfers (presumably there is one fewer register stage in that path).
This is with dma channel 15 which I believe has the same buffering as channel 0. We don't get lockups or corruption.

dom
Raspberry Pi Engineer & Forum Moderator
Posts: 6940
Joined: Wed Aug 17, 2011 7:41 pm
Location: Cambridge

Re: Experimental enhanced X driver (rpifb)

Sat Jun 22, 2013 1:37 pm

DMA can be used for fast filling. Can this be plumbed into X?

dom
Raspberry Pi Engineer & Forum Moderator
Posts: 6940
Joined: Wed Aug 17, 2011 7:41 pm
Location: Cambridge

Re: Experimental enhanced X driver (rpifb)

Sat Jun 22, 2013 1:54 pm

"next" firmware branch has the kernel patch in (with use of dma channel 0). Use:

Code: Select all

sudo BRANCH=next rpi-update
to test. You will also need to build the X driver:
https://github.com/ssvb/xf86-video-sunxifb/issues/10

hglm
Posts: 30
Joined: Fri May 31, 2013 8:24 pm

Re: Experimental enhanced X driver (rpifb)

Sat Jun 22, 2013 2:19 pm

dom wrote:DMA can be used for fast filling. Can this be plumbed into X?
I implemented accelerated filling in the sunxifb driver on the sunxi platform (which uses a 2D DMA engine analogous to the RPi's), and the consensus was that this is not a really big win, except in some contrived cases. Direct framebuffer fills are not very common in the X environment (more often an off-screen buffer is cleared/filled), and, more importantly, the bandwidth of a pure CPU fill is very high, so DMA fill was actually slower than CPU fill (see also https://github.com/ssvb/xf86-video-sunxifb/issues/8).

The same argument probably applies to the RPi because it also has a high bandwidth for CPU fill of the framebuffer that is unlikely to be improved with the use of DMA.

However, when the DMA uses an IRQ and the CPU is freed to execute other processes when waiting for the fill (as is the case on the sunxi platform), the reduced CPU utilization does have some advantages. For example, in the somewhat contrived scenario of running x11perf -fillrect500 and concurrently running a CPU intensive process, the DMA fill doubles performance because the fill is essentially free while the CPU is fully occupied with the CPU intensive process.
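
The concurrent-load measurement described above could be scripted roughly as follows. This is a sketch, not the procedure actually used: the shell busy loop stands in for "a CPU intensive process", bench_under_load is a made-up name, and the benchmark command is passed in as arguments so that no particular x11perf option is assumed.

```bash
#!/bin/bash
# Run a command while a CPU-bound busy loop competes for the CPU,
# then stop the busy loop and return the command's exit status.
# Usage: bench_under_load CMD [ARGS...]
bench_under_load() {
    # Hypothetical load generator: spins until killed.
    ( while :; do :; done ) &
    local burner=$!
    local status=0

    "$@" || status=$?

    # Clean up the busy loop whether or not the command succeeded.
    kill "$burner" 2>/dev/null || true
    wait "$burner" 2>/dev/null || true
    return "$status"
}
```

Running the same x11perf fill test through bench_under_load once with CPU fill and once with DMA fill, and comparing the reported rates, would reproduce the comparison; on a single-core board the busy loop takes its share of the CPU for the whole run.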

ssvb
Posts: 112
Joined: Sat May 19, 2012 6:15 pm

Re: Experimental enhanced X driver (rpifb)

Tue Dec 10, 2013 2:42 pm

A long overdue status update. The rename of the DDX driver to xf86-video-fbturbo happened a while ago. When using it with the 3.10 kernel, window moving and scrolling become DMA accelerated. But even with the 3.6 kernel, a software optimization utilizing VFP for framebuffer read back also provides better performance than the default fbdev driver.

There are a few low hanging fruits:
* Update the DMA code in the kernel to make use of IRQ and reduce the CPU load.
* Implement hardware cursor via DispmanX and also Xv extension (with extra RGB support exposed, we can possibly even hook it into SDL for accelerated scaling and tearing elimination).
* Ensure that all the useful pixman optimizations are included in raspbian and also reach upstream.

Somewhat more difficult things to look into:
* Zero-copy implementation of X11 EGL for Raspberry Pi (basically a nice integration of OpenGL ES with X11 window system).

Of course everything depends on whether the Raspberry Pi users are interested in a faster X11 desktop or not. So far it looks like nobody cares, and the community is just waiting for a wayland silver bullet to save the day. Which is of course also fine, and means less work for me to do ;)

asb
Forum Moderator
Posts: 853
Joined: Fri Sep 16, 2011 7:16 pm

Re: Experimental enhanced X driver (rpifb)

Tue Dec 10, 2013 2:46 pm

ssvb wrote: Of course everything depends on whether the Raspberry Pi users are interested in a faster X11 desktop or not. So far it looks like nobody cares, and the community is just waiting for a wayland silver bullet to save the day. Which is of course also fine, and means less work for me to do ;)
Thanks for the status update. We most certainly do care! I intend to include fbturbo as an option in the next Foundation Raspbian image.

mikronauts
Posts: 2823
Joined: Sat Jan 05, 2013 7:28 pm

Re: Experimental enhanced X driver (rpifb)

Tue Dec 10, 2013 8:24 pm

Well done!

I care - and I am certain many others do as well.
ssvb wrote: A long overdue status update. The rename of the DDX driver to xf86-video-fbturbo happened a while ago. When using it with the 3.10 kernel, window moving and scrolling become DMA accelerated. But even with the 3.6 kernel, a software optimization utilizing VFP for framebuffer read back also provides better performance than the default fbdev driver.

There are a few low hanging fruits:
* Update the DMA code in the kernel to make use of IRQ and reduce the CPU load.
* Implement hardware cursor via DispmanX and also Xv extension (with extra RGB support exposed, we can possibly even hook it into SDL for accelerated scaling and tearing elimination).
* Ensure that all the useful pixman optimizations are included in raspbian and also reach upstream.

Somewhat more difficult things to look into:
* Zero-copy implementation of X11 EGL for Raspberry Pi (basically a nice integration of OpenGL ES with X11 window system).

Of course everything depends on whether the Raspberry Pi users are interested in a faster X11 desktop or not. So far it looks like nobody cares, and the community is just waiting for a wayland silver bullet to save the day. Which is of course also fine, and means less work for me to do ;)

dom
Raspberry Pi Engineer & Forum Moderator
Posts: 6940
Joined: Wed Aug 17, 2011 7:41 pm
Location: Cambridge

Re: Experimental enhanced X driver (rpifb)

Tue Dec 10, 2013 10:09 pm

ssvb wrote:Of course everything depends on whether the Raspberry Pi users are interested in a faster X11 desktop or not. So far it looks like nobody cares, and the community is just waiting for a wayland silver bullet to save the day. Which is of course also fine, and means less work for me to do ;)
Yes, still very interested. We'll always support the X11 desktop, and making it faster is of huge importance.
Wayland is more expensive in memory use than X11, and will probably only really be useful on a 512M board.

The next sdcard image will be 3.10 kernel, so no need to worry about 3.6 kernel for long.

Just let us know what you think should be included in raspbian, and feel free to submit PRs to (3.10) kernel.

blachanc
Posts: 466
Joined: Sat Jan 26, 2013 5:03 am
Location: Quebec,canada(french)

Re: Experimental enhanced X driver (rpifb)

Wed Dec 11, 2013 4:55 am

ssvb wrote:
Of course everything depends on whether the Raspberry Pi users are interested in a faster X11 desktop or not. So far it looks like nobody cares, and the community is just waiting for a wayland silver bullet to save the day. Which is of course also fine, and means less work for me to do ;)
I personally posted questions three times about whether we could expect improvements in the future. I am sure many other users did the same.
The reality is: there are so many new posts daily (which is great because it shows how "HOT" the Pi is) that it is really easy to miss posts.
Anyway: To answer your question directly: yes, I do care a lot about speed.
And a big Thank you for your work!!!

Ben

(and another thread to subscribe to)

Jessie
Posts: 1754
Joined: Fri Nov 04, 2011 7:40 pm
Location: C/S CO USA

Re: Experimental enhanced X driver (rpifb)

Wed Dec 11, 2013 5:20 am

ssvb wrote: A long overdue status update. The rename of the DDX driver to xf86-video-fbturbo happened a while ago. When using it with the 3.10 kernel, window moving and scrolling become DMA accelerated. But even with the 3.6 kernel, a software optimization utilizing VFP for framebuffer read back also provides better performance than the default fbdev driver.

There are a few low hanging fruits:
* Update the DMA code in the kernel to make use of IRQ and reduce the CPU load.
* Implement hardware cursor via DispmanX and also Xv extension (with extra RGB support exposed, we can possibly even hook it into SDL for accelerated scaling and tearing elimination).
* Ensure that all the useful pixman optimizations are included in raspbian and also reach upstream.

Somewhat more difficult things to look into:
* Zero-copy implementation of X11 EGL for Raspberry Pi (basically a nice integration of OpenGL ES with X11 window system).

Of course everything depends on whether the Raspberry Pi users are interested in a faster X11 desktop or not. So far it looks like nobody cares, and the community is just waiting for a wayland silver bullet to save the day. Which is of course also fine, and means less work for me to do ;)
Always interested. I won't even run most gui programs because the performance is bad. So whatever makes them go faster will be appreciated.

ssvb
Posts: 112
Joined: Sat May 19, 2012 6:15 pm

Re: Experimental enhanced X driver (rpifb)

Wed Dec 11, 2013 7:41 am

asb wrote: Thanks for the status update. We most certainly do care! I intend to include fbturbo as an option in the next Foundation Raspbian image.
Thanks. Please let me know if you encounter any issues or just have questions.

masterluke
Posts: 200
Joined: Tue Apr 17, 2012 4:10 pm

Re: Experimental enhanced X driver (rpifb)

Wed Dec 11, 2013 10:56 am

Any work done to speed up X would be really, really appreciated. I had pretty much given up hope of seeing something like this, which is why I (and I suspect others) have stopped asking for it.

asb
Forum Moderator
Posts: 853
Joined: Fri Sep 16, 2011 7:16 pm

Re: Experimental enhanced X driver (rpifb)

Wed Dec 11, 2013 10:41 pm

I've packaged up fbturbo, and you can now install it from the Foundation's Raspbian repo with

Code: Select all

sudo apt-get update && sudo apt-get install xserver-xorg-video-fbturbo
fbturbo should be used by default when you startx. If in doubt, check /var/log/Xorg.0.log. If you wish to disable fbturbo, then either apt-get remove xserver-xorg-video-fbturbo or get rid of /usr/share/X11/xorg.conf.d/99-fbturbo.conf. I'm particularly interested in any regressions: is there anything that seems slower than before, or fails to render correctly?
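
A quick way to do the log check mentioned above is to search the X server log for the driver name. The helper below is a sketch: uses_fbturbo is a hypothetical name, and it only assumes that the log mentions "fbturbo" somewhere when the driver is loaded; the exact log lines are not taken from this thread.

```bash
#!/bin/bash
# Succeeds if the given X log (default: /var/log/Xorg.0.log) mentions fbturbo.
uses_fbturbo() {
    local log=${1:-/var/log/Xorg.0.log}
    grep -qi 'fbturbo' "$log"
}
```

After startx, `uses_fbturbo && echo "fbturbo mentioned in log"` gives a quick yes/no answer. A mention is not proof that the driver initialised correctly, so read the surrounding log lines if anything looks wrong.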

Thanks again for all your work on this, ssvb.

gkreidl
Posts: 6345
Joined: Thu Jan 26, 2012 1:07 pm
Location: Germany

Re: Experimental enhanced X driver (rpifb)

Fri Dec 13, 2013 4:56 pm

Thanks, Alex

I had already compiled it myself and it seemed to work nicely. But it's a good thing to have it in the repository now.

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 32864
Joined: Sat Jul 30, 2011 7:41 pm

Re: Experimental enhanced X driver (rpifb)

Fri Dec 13, 2013 5:17 pm

This is great stuff. Do we have any figures on improvements? Anecdotal would be fine!

dom
Raspberry Pi Engineer & Forum Moderator
Posts: 6940
Joined: Wed Aug 17, 2011 7:41 pm
Location: Cambridge

Re: Experimental enhanced X driver (rpifb)

Fri Dec 13, 2013 5:22 pm

jamesh wrote:This is great stuff. Do we have any figures on improvements? Anecdotal would be fine!
I did try it yesterday. Was quite snappy. Dragging windows seemed smoother.
Personally I'd prefer some benchmarks to anecdotes, but I've not done a side by side comparison yet.

Why not try it?

ssvb
Posts: 112
Joined: Sat May 19, 2012 6:15 pm

Re: Experimental enhanced X driver (rpifb)

Fri Dec 13, 2013 5:29 pm

jamesh wrote:This is great stuff. Do we have any figures on improvements? Anecdotal would be fine!
Dragging windows should be happening without an ugly redraw trail. That's the performance issue which is specifically addressed.

As for some reproducible benchmark numbers, you can try running gtkperf. But it is a poor benchmark because the results depend heavily on the GTK theme in use; also, the workload is purely synthetic and does not represent real applications well enough.
