C_D
Posts: 18
Joined: Tue Mar 10, 2015 9:51 pm
Location: New Zealand

Speeding up rpi-fbcp

Mon Mar 16, 2015 8:21 pm

I'm not sure if this is the right place for this, but my question relates to rpi-fbcp, which uses the low-level VideoCore stuff you guys know all about (and I don't...) :)

I'm trying to run an application which uses OpenGL ES (Kivy) on a Raspberry Pi, and direct the output to a secondary monitor (not the HDMI output). To my knowledge the only way to achieve this at the moment is to use rpi-fbcp (https://github.com/tasanakorn/rpi-fbcp) to take snapshots of the primary framebuffer (fb0) and copy them to the framebuffer used by the secondary monitor (fb1).

At present I can run my Kivy app with output to an HDMI monitor and it works excellently: fast and smooth. I can also use my secondary monitor with X outputting directly to fb1, and it works well. To me this suggests that both Kivy and my secondary monitor are working correctly.

Unfortunately, when I try to put everything together I start running into problems. With rpi-fbcp mirroring fb0 onto fb1 I get a lag of nearly a second between screen updates on the HDMI monitor and the secondary monitor. This applies to both my Kivy app and an X desktop. As far as I can test, it is independent of output resolution. My secondary monitor has a native resolution of 320x240, which I can't display on an HDMI monitor, but the lag is pretty much the same for larger resolutions (which fbcp scales to fit the smaller secondary monitor).

Now I am pretty sure that fbcp works properly in other situations; if everyone had to deal with this 1 second lag then no-one would be using it at all. Does anyone know what it is about my particular setup that could be causing these problems?

AndyD
Posts: 2366
Joined: Sat Jan 21, 2012 8:13 am
Location: Melbourne, Australia

Re: Speeding up rpi-fbcp

Tue Mar 17, 2015 4:16 am

When you say lag, are you saying the updates on the secondary monitor lag behind the HDMI monitor by one second, or do you mean that you are only getting one update per second on the secondary monitor? Also, which Raspberry Pi do you have?

C_D
Posts: 18
Joined: Tue Mar 10, 2015 9:51 pm
Location: New Zealand

Re: Speeding up rpi-fbcp

Tue Mar 17, 2015 7:53 pm

Possibly both?

With both an HDMI monitor and my secondary monitor connected, I can move the mouse or click a button and watch the action display on the secondary monitor around a second later than on the HDMI monitor.

The delay is similar with just a text console displayed, and I have also observed that the top of the screen responds much better than the bottom. I can type a line of text on the top line and it seems to respond acceptably. If the screen is full and I'm typing on the bottom line then there is significant lag between pressing keys and the characters appearing on the screen.

It looks to me like the snapshot and copy operation fills fb1 from the top of the screen down, and it takes a long time to complete. The top of the screen is getting a half decent refresh rate, but the bottom is not. When a large portion of the screen redraws you can see tearing as it updates.

Is there any way of getting some feedback on how long this loop is actually taking to run? If the GPU operations are taking a lot longer than expected then the 25 ms sleep could be completely wrong.

Code:

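while (1) {
    /* Snapshot the primary display into the GPU-side resource. */
    ret = vc_dispmanx_snapshot(display, screen_resource, 0);
    /* Read the whole snapshot back into fbp, one full frame per pass. */
    vc_dispmanx_resource_read_data(screen_resource, &rect1, fbp, vinfo.xres * vinfo.bits_per_pixel / 8);
    /* Fixed 25 ms pause between copies. */
    usleep(25 * 1000);
}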
EDIT:
I am testing on a Model A+. I have made contact with someone working on a similar project (using Kivy, a RPi and a different secondary screen) and he is reporting similar performance. I will find out what model he is using too.

AndyD
Posts: 2366
Joined: Sat Jan 21, 2012 8:13 am
Location: Melbourne, Australia

Re: Speeding up rpi-fbcp

Wed Mar 18, 2015 12:22 am

I have written some code to try and work out the amount of time a snapshot takes.

Code:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/time.h>

#include "bcm_host.h"

int main(void)
{
    bcm_host_init();

    DISPMANX_DISPLAY_HANDLE_T display = vc_dispmanx_display_open(0);

    DISPMANX_MODEINFO_T info;
    vc_dispmanx_display_get_info(display, &info);

    int width = 320;
    int height = 240;
    int pitch = width * 2;

    printf("%dx%d -> %dx%d\n", info.width, info.height, width, height);

    /* The buffer only has to hold the scaled snapshot, not the full display. */
    void *data = malloc(pitch * height);

    VC_RECT_T rect;
    vc_dispmanx_rect_set(&rect, 0, 0, width, height);

    uint32_t vc_image_ptr;
    DISPMANX_RESOURCE_HANDLE_T resource =
        vc_dispmanx_resource_create(VC_IMAGE_RGB565,
                                    width,
                                    height,
                                    &vc_image_ptr);

    struct timeval start_time;
    struct timeval end_time;
    struct timeval diff;

    gettimeofday(&start_time, NULL);

    int i;
    for (i = 0 ; i < 1000 ; i++)
    {
        vc_dispmanx_snapshot(display, resource, 0);
        vc_dispmanx_resource_read_data(resource, &rect, data, pitch);
    }

    gettimeofday(&end_time, NULL);
    timersub(&end_time, &start_time, &diff);

    double time_taken = diff.tv_sec + (diff.tv_usec / 1000000.0);

    printf("%.1f snapshots per second\n", 1000.0 / time_taken);

    free(data);
    vc_dispmanx_resource_delete(resource);
    vc_dispmanx_display_close(display);

    return 0;
}
With a Raspberry Pi Model B (512MB) I get around 517 snapshots per second (720x576 -> 320x240) when the Raspberry Pi has no load. If I run the hello_videocube example, this reduces to around 154 snapshots per second. My code is just copying the snapshot data into memory, not to a secondary framebuffer, so I would expect the slowdown to be worse when the data is copied to the framebuffer.

One thing that you could try would be to reduce the usleep time in rpi-fbcp. You could actually comment out the usleep altogether and see what happens (rpi-fbcp will hog the CPU and everything else will be less responsive, but you will get some idea of what is possible).

I would expect that it is the secondary framebuffer itself that is causing the delay. The rpi-fbcp program is copying an entire screenshot to the secondary framebuffer each time vc_dispmanx_snapshot is called. The framebuffer data is copied to the little LCD display using notro's fbtft driver. The driver has to copy all the data to the TFT using the SPI interface. When you use the Desktop or the text console on the secondary display, the amount of data being copied to the LCD would mostly be significantly less.
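
To put rough numbers on the cost of a full-frame copy, here is a minimal sketch in C. The 320x240, 16-bit frame size comes from this thread; the helper names are made up for illustration, and the link throughput you pass in is an assumption you would have to measure or look up for your own display, not a known figure for fbtft or any particular panel.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed to send one complete frame to the display. */
size_t frame_bytes(size_t width, size_t height, size_t bytes_per_pixel)
{
    return width * height * bytes_per_pixel;
}

/* Upper bound on full-frame updates per second for a link that can move
 * bus_bytes_per_sec bytes each second (protocol overhead is ignored, so
 * the real figure will be lower). */
double max_full_frames_per_sec(double bus_bytes_per_sec, size_t bytes_per_frame)
{
    assert(bytes_per_frame > 0);
    return bus_bytes_per_sec / (double)bytes_per_frame;
}
```

For a 320x240 RGB565 screen, frame_bytes(320, 240, 2) is 153600 bytes. If the link could sustain a purely hypothetical 4,000,000 bytes per second, that would cap full-frame updates at about 26 per second, which is why sending fewer bytes per update matters more than taking snapshots faster.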

AndyD
Posts: 2366
Joined: Sat Jan 21, 2012 8:13 am
Location: Melbourne, Australia

Re: Speeding up rpi-fbcp

Wed Mar 18, 2015 1:10 am

I have a Raspberry Pi Model A (256MB) with a NeoSec (tinylcd) 480x320 TFT display. I put some timing code around the snapshot loop of rpi-fbcp and this is the result. With the 25 millisecond sleep in place I get around 10 snapshots per second. If I comment out the sleep altogether I get about 90 snapshots per second when the system is not running anything. This reduces to about 30 snapshots per second when omxplayer is displaying the test video. Surprisingly, even without the usleep, rpi-fbcp only takes about 13% of the CPU time (according to htop). Try it for yourself!

tvjon
Posts: 853
Joined: Mon Jan 07, 2013 9:11 am

Re: Speeding up rpi-fbcp

Wed Mar 18, 2015 8:12 am

Unsure if this helps the OP, but here are the times I get on a RPi 2 with your example code, Andy.

This is for the HDMI output, so your code exactly as you've written:

pi@pi2 /opt/vc/src/hello_pi/time $ ./time.bin
1920x1080 -> 320x240
399.5 snapshots per second


& the following for "Gert's vga666" DPI output:

pi@pi2 /opt/vc/src/hello_pi/time $ ./time.bin
1920x1080 -> 320x240
947.6 snapshots per second

Running "hello_videocube.bin" via the vga666:

pi@pi2 /opt/vc/src/hello_pi/time $ ./time.bin
1920x1080 -> 320x240
425.2 snapshots per second

Running "hello_videocube.bin" via the vga666 & showing a movie with omxplayer via the HDMI display simultaneously:

pi@pi2 /opt/vc/src/hello_pi/time $ ./time.bin
1920x1080 -> 320x240
116.4 snapshots per second

Andy, thank you for the code, & indeed all of your useful Github examples.

Oh, do you by chance know how the display_number is initialised in the

"/opt/vc/src/hello_pi/hello_video.c" example code?

bcm_host_init()

doesn't appear to initialise display_number, so I can't see where it happens.

It's explicit in most examples, just not hello_video.c

Thank you.

AndyD
Posts: 2366
Joined: Sat Jan 21, 2012 8:13 am
Location: Melbourne, Australia

Re: Speeding up rpi-fbcp

Wed Mar 18, 2015 12:05 pm

tvjon wrote:Andy, thank you for the code, & indeed all of your useful Github examples.
You are very welcome!
tvjon wrote:Oh, do you by chance know how the display_number is initialised in the

"/opt/vc/src/hello_pi/hello_video.c" example code?

bcm_host_init()

doesn't appear to initialise display_number, so I can't see where it happens.

It's explicit in most examples, just not hello_video.c

Thank you.
Unfortunately no, I don't know. I must admit I find the OpenMAX/IL code incomprehensible. I have a couple of projects that I would like to implement using GPU encoding of H264 video and JPEG files, but I have decided to try and understand the MMAL interface rather than bang my head against OpenMAX/IL.

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 32420
Joined: Sat Jul 30, 2011 7:41 pm

Re: Speeding up rpi-fbcp

Wed Mar 18, 2015 12:20 pm

AndyD wrote:
tvjon wrote:Andy, thank you for the code, & indeed all of your useful Github examples.
You are very welcome!
tvjon wrote:Oh, do you by chance know how the display_number is initialised in the

"/opt/vc/src/hello_pi/hello_video.c" example code?

bcm_host_init()

doesn't appear to initialise display_number, so I can't see where it happens.

It's explicit in most examples, just not hello_video.c

Thank you.
Unfortunately no, I don't know. I must admit I find the OpenMAX/IL code incomprehensible. I have a couple of projects that I would like to implement using GPU encoding of H264 video and JPEG files, but I have decided to try and understand the MMAL interface rather than bang my head against OpenMAX/IL.
I am currently trying to get a yuv2mpeg converter going that uses MMAL (raw YUV420 to H264 done using the HW encoder). Once it's done I'll post it on GitHub. I agree that OMX code is incomprehensible.
Principal Software Engineer at Raspberry Pi Ltd.
Working in the Applications Team.

AndyD
Posts: 2366
Joined: Sat Jan 21, 2012 8:13 am
Location: Melbourne, Australia

Re: Speeding up rpi-fbcp

Wed Mar 18, 2015 12:45 pm

jamesh wrote:I am currently trying to get a yuv2mpeg converter going that uses MMAL (raw YUV420 to H264 done using the HW encoder). Once it's done I'll post it on GitHub. I agree that OMX code is incomprehensible.
That would be great JamesH!

C_D
Posts: 18
Joined: Tue Mar 10, 2015 9:51 pm
Location: New Zealand

Re: Speeding up rpi-fbcp

Wed Mar 18, 2015 10:23 pm

Thanks so much for your help guys, it's awesome to get some assistance from people who actually know what they are doing :)

AndyD:
Running your code (pasted directly from your post above):
With nothing else running 853.9 snapshots per second.
With my Kivy app running 718.7 snapshots per second.

Taking rpi-fbcp (25 ms sleep) with your 1000 count for loop and timing in it:
With nothing else running: 37.0 snapshots per second.
With my Kivy app running: 36.2 snapshots per second.

Taking rpi-fbcp (no sleep) with your 1000 count for loop and timing in it:
With nothing else running: 659 snapshots per second.
With my Kivy app running: 500 snapshots per second.
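
A rough sanity check on these figures, sketched in C. It assumes each pass through the loop costs the no-sleep work time plus the sleep itself (the 25 ms from usleep(25 * 1000)); expected_rate is a hypothetical helper for this arithmetic, not something from rpi-fbcp.

```c
#include <assert.h>

/* Expected loop iterations per second when each iteration does its work
 * and then sleeps for sleep_seconds. work_rate is the rate measured with
 * the sleep removed, so 1.0 / work_rate is the work time per iteration. */
double expected_rate(double work_rate, double sleep_seconds)
{
    assert(work_rate > 0.0 && sleep_seconds >= 0.0);
    return 1.0 / (sleep_seconds + 1.0 / work_rate);
}
```

With the idle no-sleep figure, expected_rate(659.0, 0.025) comes out at roughly 37.7 per second, and with the Kivy figure, expected_rate(500.0, 0.025) is roughly 37.0. Both are close to the 37.0 and 36.2 measured, which suggests the sleep accounts for almost all of the loop time and the snapshot itself is cheap.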


So based on that, it looks like fbcp is doing its job just fine? I can't really notice any difference in performance from reducing the sleep in rpi-fbcp down to 10 ms. If I take the sleep out completely then I hit max CPU while trying to run fbcp and my app at the same time, so everything runs awfully. With the sleep time at 10 ms the animations seem a little smoother perhaps, but there is still a lot of lag.

I think you were right to bring up the difference between lag and refresh rate. I'm guessing that since the animation seems relatively smooth the screen must be refreshing at a reasonable rate; it's just that there is a noticeable lag between input and display. Typing on the console highlights this clearly. You can type nearly a whole line of text before characters start appearing, and then sit back and watch as they slowly turn up on the screen. This behaviour is the same for fbcp sleep times of 25, 10 and 0 ms.
AndyD wrote:I would expect that it is the secondary framebuffer itself that is causing the delay. The rpi-fbcp program is copying an entire screenshot to the secondary framebuffer each time vc_dispmanx_snapshot is called. The framebuffer data is copied to the little LCD display using notro's fbtft driver. The driver has to copy all the data to the TFT using the SPI interface. When you use the Desktop or the text console on the secondary display, the amount of data being copied to the LCD would mostly be significantly less.
Can you expand on this a little for me?

Why does using the text console or a desktop outputting directly to fb1 result in less data being copied to the LCD than when using the snapshot?

Doesn't the TFT driver have to write the entire framebuffer to the screen each refresh anyway?

It's sounding like the bottleneck is not fbcp getting data onto the framebuffer, but rather the driver then sending that data to the display. I just don't get why a desktop displayed directly on fb1 can be snappy and responsive but a desktop snapshotted across from fb0 is not.
Last edited by C_D on Wed Mar 18, 2015 11:16 pm, edited 1 time in total.

AndyD
Posts: 2366
Joined: Sat Jan 21, 2012 8:13 am
Location: Melbourne, Australia

Re: Speeding up rpi-fbcp

Wed Mar 18, 2015 10:38 pm

Hi C_D,

The timing code was really just to investigate how long the actual snapshot takes. I think it is pretty clear that the snapshot itself isn't particularly slow, but it does slow down when the GPU is busy.

When you commented out the usleep in rpi-fbcp did you get any better performance? It improved things a little for me. As I said, it was surprising (to me) that removing the usleep from the code didn't cause a huge difference in CPU load. I think that this is probably as far as I can go. The code itself is pretty simple and I think the slowest part of the process is copying each frame to the LCD itself.

AndyD
Posts: 2366
Joined: Sat Jan 21, 2012 8:13 am
Location: Melbourne, Australia

Re: Speeding up rpi-fbcp

Wed Mar 18, 2015 10:41 pm

C_D wrote:Can you expand on this a little for me?

Why does using the text console or a desktop outputting directly to fb1 result in less data being copied to the LCD than when using the snapshot?

Doesn't the TFT driver have to write the entire framebuffer to the screen each refresh anyway?
I wouldn't have thought it would, but I could easily be wrong! I will have a look at the fbtft code.

Edit: Can't find the specific code (Notro's how it works wiki page is interesting), but I think that fbtft only does partial updates (using set_addr_win).

C_D
Posts: 18
Joined: Tue Mar 10, 2015 9:51 pm
Location: New Zealand

Re: Speeding up rpi-fbcp

Wed Mar 18, 2015 11:17 pm

Haha, sorry I just edited my previous post while you were posting. I think I answered your question above.

C_D
Posts: 18
Joined: Tue Mar 10, 2015 9:51 pm
Location: New Zealand

Re: Speeding up rpi-fbcp

Wed Mar 18, 2015 11:21 pm

My screen actually doesn't use the Notro driver, however I expect it works in a very similar manner.

If the driver can tell that only a section of the screen needs to be redrawn then that could explain why rewriting the entire framebuffer every frame makes it run slowly. Sounds like I need to find more information on how that side of things works.
Last edited by C_D on Wed Mar 18, 2015 11:51 pm, edited 1 time in total.

AndyD
Posts: 2366
Joined: Sat Jan 21, 2012 8:13 am
Location: Melbourne, Australia

Re: Speeding up rpi-fbcp

Wed Mar 18, 2015 11:26 pm

C_D wrote:My screen actually doesn't use the Notro driver, however I expect it works in a very similar manner.
That is interesting, what screen are you using?
C_D wrote:If the driver can tell that only a section of the screen needs to be redrawn then that could explain why rewriting the entire framebuffer every frame makes it run slowly. Sounds like I need to find more information on how that side of things works.
I think that is the way to go forward from here.

C_D
Posts: 18
Joined: Tue Mar 10, 2015 9:51 pm
Location: New Zealand

Re: Speeding up rpi-fbcp

Wed Mar 18, 2015 11:51 pm

I am using this screen:
http://www.dfrobot.com/index.php?route= ... ct_id=1062

Which uses this driver:
https://github.com/robopeak/rpusbdisp

I expect it to be fairly similar underneath but the data is sent via USB packets rather than the SPI interface.

From skimming the driver source I see references to 'dirty rectangles', which suggests it only redraws sections that need updating. Somehow I need to make fbcp also follow that protocol and only rewrite sections of the framebuffer, not redraw the whole thing every time.

Do you experience the same lag as me when snapshotting a text console across from fb0 to fb1 as opposed to just displaying the console directly on fb1?

AndyD
Posts: 2366
Joined: Sat Jan 21, 2012 8:13 am
Location: Melbourne, Australia

Re: Speeding up rpi-fbcp

Thu Mar 19, 2015 1:54 am

C_D wrote:I am using this screen:
http://www.dfrobot.com/index.php?route= ... ct_id=1062

Which uses this driver:
https://github.com/robopeak/rpusbdisp

I expect it to be fairly similar underneath but the data is sent via USB packets rather than the SPI interface.
That would be my guess too.
C_D wrote:From skimming the driver source I see references to 'dirty rectangles', which suggests it only redraws sections that need updating. Somehow I need to make fbcp also follow that protocol and only rewrite sections of the framebuffer, not redraw the whole thing every time.

Do you experience the same lag as me when snapshotting a text console across from fb0 to fb1 as opposed to just displaying the console directly on fb1?
To be honest I haven't used rpi-fbcp in anger. I have a TFT display, but I write directly to the framebuffer for my project.

On another note, I have had a go at writing my own version of a Raspberry Pi screen copier. It is called raspi2fb. It works in the same way as rpi-fbcp, but I have added some things that I have been thinking about.
  • Allow different devices on the command line
  • Can be run as a daemon
  • Attempts to sleep only as long as needed to maintain the requested refresh rate
  • Compares each row of the snapshot to the framebuffer and only copies if different.
Let me know if it is useful and improves things at all.
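
The third point in the list, sleeping only as long as needed, can be sketched roughly as follows. This is an illustration under my own assumptions rather than the actual raspi2fb code: remaining_sleep_us and elapsed_us are hypothetical helper names, and the timestamps are assumed to come from clock_gettime(CLOCK_MONOTONIC, ...).

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Elapsed microseconds between two monotonic clock readings. */
uint32_t elapsed_us(const struct timespec *start, const struct timespec *end)
{
    int64_t us = (int64_t)(end->tv_sec - start->tv_sec) * 1000000
               + (end->tv_nsec - start->tv_nsec) / 1000;
    return (us > 0) ? (uint32_t)us : 0u;
}

/* Microseconds left to sleep in this frame, given the requested refresh
 * rate and how long the snapshot-and-copy work took. Returns 0 when the
 * work already used up the whole frame period. */
uint32_t remaining_sleep_us(uint32_t target_fps, uint32_t work_us)
{
    assert(target_fps > 0);
    uint32_t period_us = 1000000u / target_fps;
    return (work_us < period_us) ? (period_us - work_us) : 0u;
}
```

In the copy loop you would take a timestamp before the snapshot and another after the copy, then pass remaining_sleep_us(fps, elapsed_us(&before, &after)) to usleep instead of a fixed 25 ms. At 40 frames per second with 5 ms of work, that is a 20 ms sleep; if the work overruns the frame period the loop does not sleep at all.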

C_D
Posts: 18
Joined: Tue Mar 10, 2015 9:51 pm
Location: New Zealand

Re: Speeding up rpi-fbcp

Thu Mar 19, 2015 2:04 am

AndyD wrote:Compares each row of the snapshot to the framebuffer and only copies if different.
That could be just the kind of optimisation I'm looking for.

At the very least it might confirm the theory that the TFT driver only updates sections of the screen as required.

I shall try your code and see what happens 8-)

C_D
Posts: 18
Joined: Tue Mar 10, 2015 9:51 pm
Location: New Zealand

Re: Speeding up rpi-fbcp

Thu Mar 19, 2015 2:31 am

Mate, you are my hero.

Huge improvement in response for small changes on the screen. Typing text on the console for example is now possible at a reasonable rate. There is a big tear when the whole screen refreshes (e.g. all lines on the console move up at once), but it's definitely helped a lot.

This is a MASSIVE step in the right direction. I think it confirms both the cause of the problem and what's required to fix it. The root cause of the problem is that the TFT is slow to draw the screen, and the workaround is that you only get it to draw small sections of the screen. So to keep everything working you can't rewrite the whole framebuffer every frame or the screen can't keep up. Maybe it's worse for my screen than for some of the SPI TFTs? If they were all as bad as mine I would have thought this fix would have been requested long ago.

I wonder how the TFT driver knows which sections of the framebuffer have been changed? The framebuffer itself must provide that information so the TFT driver can update the right sections. It can't be comparing every pixel to see where the modified data is.

AndyD
Posts: 2366
Joined: Sat Jan 21, 2012 8:13 am
Location: Melbourne, Australia

Re: Speeding up rpi-fbcp

Thu Mar 19, 2015 3:10 am

C_D wrote:Mate, you are my hero.
Thanks, I am glad you are finding an improvement.
C_D wrote:Huge improvement in response for small changes on the screen. Typing text on the console for example is now possible at a reasonable rate. There is a big tear when the whole screen refreshes (e.g. all lines on the console move up at once), but it's definitely helped a lot.
Yes, hadn't thought about tearing, not really surprised. Need to see if FBIO_WAITFORVSYNC works for the secondary display. I will keep tinkering when I have time and post back here with any updates.
C_D wrote:This is a MASSIVE step in the right direction. I think it confirms both the cause of the problem and what's required to fix it. The root cause of the problem is that the TFT is slow to draw the screen, and the workaround is that you only get it to draw small sections of the screen. So to keep everything working you can't rewrite the whole framebuffer every frame or the screen can't keep up. Maybe it's worse for my screen than for some of the SPI TFTs? If they were all as bad as mine I would have thought this fix would have been requested long ago.
I think you may be surprised. From looking at some of the fbtft code it is clear that some screens can work at faster SPI speeds than others, so perhaps some screens have less trouble using rpi-fbcp.
C_D wrote:I wonder how the TFT driver knows which sections of the framebuffer have been changed? The framebuffer itself must provide that information so the TFT driver can update the right sections. It can't be comparing every pixel to see where the modified data is.
Yes, I am pretty sure that is all part of the framebuffer device.

AndyD
Posts: 2366
Joined: Sat Jan 21, 2012 8:13 am
Location: Melbourne, Australia

Re: Speeding up rpi-fbcp

Thu Mar 19, 2015 11:47 pm

I have now added instructions (and an init.d script) to make raspi2fb run at boot.

AndyD
Posts: 2366
Joined: Sat Jan 21, 2012 8:13 am
Location: Melbourne, Australia

Re: Speeding up rpi-fbcp

Sat Mar 21, 2015 12:54 am

I have made some more changes to raspi2fb. I did some testing and found that reading from the framebuffer (notro's fbtft, Model A Raspberry Pi) is about 3 times slower than reading from a local (malloc'ed) buffer. So I now use two screen buffers and alternate between them. I look for changes by comparing these two buffers, rather than reading from the framebuffer itself.
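
The two-buffer idea could look something like the sketch below. It is a simplified illustration, not the real raspi2fb source: struct frame_pair and both function names are invented here, and it assumes the framebuffer has the same row pitch as the local buffers and starts out matching the older buffer (for example after one initial full copy).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Two local copies of the screen: one holds the newest snapshot, the other
 * the snapshot before it. 'current' is the index of the newest one. */
struct frame_pair {
    uint8_t *buffers[2];
    int      current;
    size_t   rows;
    size_t   pitch;   /* bytes per row */
};

/* Buffer the next snapshot should be read into (the older of the two). */
uint8_t *next_snapshot_buffer(struct frame_pair *fp)
{
    assert(fp != NULL);
    return fp->buffers[1 - fp->current];
}

/* Call after a new snapshot has been read into next_snapshot_buffer().
 * Writes to 'fb' only the rows that differ from the previous snapshot,
 * then swaps the roles of the two buffers. The framebuffer is written
 * but never read. Returns the number of rows written. */
size_t present_changes(struct frame_pair *fp, uint8_t *fb)
{
    assert(fp != NULL && fb != NULL);
    const uint8_t *prev = fp->buffers[fp->current];
    const uint8_t *next = fp->buffers[1 - fp->current];
    size_t written = 0;

    for (size_t y = 0; y < fp->rows; y++) {
        size_t off = y * fp->pitch;
        if (memcmp(prev + off, next + off, fp->pitch) != 0) {
            memcpy(fb + off, next + off, fp->pitch);
            written++;
        }
    }
    fp->current = 1 - fp->current;
    return written;
}
```

All comparisons happen between the two malloc'ed buffers, so the slow framebuffer memory is only touched for rows that actually changed.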

C_D
Posts: 18
Joined: Tue Mar 10, 2015 9:51 pm
Location: New Zealand

Re: Speeding up rpi-fbcp

Sat Mar 21, 2015 2:44 am

Awesome, can't wait to test it out!

I haven't had time to look through the TFT driver code yet. Do you know if it's going to be possible to query the framebuffer directly to find out which parts of the screen have changed? I'm sure that information is available, I'm just not sure how it's presented to the driver.

AndyD
Posts: 2366
Joined: Sat Jan 21, 2012 8:13 am
Location: Melbourne, Australia

Re: Speeding up rpi-fbcp

Sat Mar 21, 2015 3:32 am

C_D wrote:Awesome, can't wait to test it out!
Great, let me know if there are any issues.
C_D wrote:I haven't had time to look through the TFT driver code yet. Do you know if it's going to be possible to query the framebuffer directly to find out which parts of the screen have changed? I'm sure that information is available, I'm just not sure how it's presented to the driver.
I think the issue is the other way around. The framebuffer driver itself only updates the TFT display when something writes to a particular part of the display via the device (/dev/fb1 etc). I think the approach I am taking, writing only the rows of pixels that differ from the pixels currently being displayed, is reasonably sensible. By considering rows, I can use the standard functions memcmp and memcpy to compare and copy the pixel data (which should be faster than iterating over the buffer itself).
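
A minimal sketch of that row-by-row approach, assuming both buffers use the same row pitch. copy_changed_rows is a name made up for this example and is not taken from raspi2fb.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy into 'fb' only those rows of 'snapshot' that differ from what 'fb'
 * already holds. Both buffers contain 'rows' rows of 'pitch' bytes.
 * Returns the number of rows that were copied. */
size_t copy_changed_rows(uint8_t *fb, const uint8_t *snapshot,
                         size_t rows, size_t pitch)
{
    assert(fb != NULL && snapshot != NULL);
    size_t copied = 0;

    for (size_t y = 0; y < rows; y++) {
        uint8_t *dst = fb + y * pitch;
        const uint8_t *src = snapshot + y * pitch;

        /* memcmp reads the destination row; memcpy runs only on a mismatch. */
        if (memcmp(dst, src, pitch) != 0) {
            memcpy(dst, src, pitch);
            copied++;
        }
    }
    return copied;
}
```

When nothing on screen has changed the function writes nothing at all, so a driver that only pushes written regions to the panel has no work to do.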

C_D
Posts: 18
Joined: Tue Mar 10, 2015 9:51 pm
Location: New Zealand

Re: Speeding up rpi-fbcp

Sat Mar 21, 2015 4:24 am

If I could isolate a 'window' of pixels that required updating, is there a sensible way of copying that across to the new framebuffer? I've got a few ideas for isolating the region of interest.
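
One possible shape for this, sketched in C under some assumptions: the two snapshots being compared are tightly packed (pitch equals width times bytes per pixel), the destination may have a different pitch, and struct rect, dirty_rect and copy_rect are hypothetical names rather than anything from rpi-fbcp or the display driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct rect { size_t x, y, w, h; };   /* in pixels */

/* Bounding box of every pixel that differs between prev and cur.
 * Returns 1 and fills *out if anything changed, 0 if identical. */
int dirty_rect(const uint8_t *prev, const uint8_t *cur,
               size_t width, size_t height, size_t bpp, struct rect *out)
{
    assert(prev != NULL && cur != NULL && out != NULL);
    size_t min_x = width, min_y = height, max_x = 0, max_y = 0;
    int found = 0;

    for (size_t y = 0; y < height; y++) {
        for (size_t x = 0; x < width; x++) {
            size_t off = (y * width + x) * bpp;
            if (memcmp(prev + off, cur + off, bpp) != 0) {
                if (x < min_x) min_x = x;
                if (x > max_x) max_x = x;
                if (y < min_y) min_y = y;
                if (y > max_y) max_y = y;
                found = 1;
            }
        }
    }
    if (!found)
        return 0;
    out->x = min_x;
    out->y = min_y;
    out->w = max_x - min_x + 1;
    out->h = max_y - min_y + 1;
    return 1;
}

/* Copy window r from src to the same position in dst, one row segment
 * at a time. Each buffer has its own pitch in bytes. */
void copy_rect(uint8_t *dst, size_t dst_pitch,
               const uint8_t *src, size_t src_pitch,
               size_t bpp, const struct rect *r)
{
    for (size_t row = 0; row < r->h; row++) {
        size_t y = r->y + row;
        memcpy(dst + y * dst_pitch + r->x * bpp,
               src + y * src_pitch + r->x * bpp,
               r->w * bpp);
    }
}
```

Because the window is copied as a series of short row segments, only the bytes inside the rectangle are written to the destination. Whether the display driver then sends just that region to the panel depends on the driver.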
