Last post update: 28/04/2014
1. Installation
The rpi-update kernel has fiq_fsm included.
sudo rpi-update
2. Overview of changes
Introducing FIQ_FSM: The flying spaghetti monster FIQ
The dwc_otg driver has been rewritten to include a much more fully-featured FIQ. This FIQ now handles select types of transaction by itself without any involvement from the dwc_otg host controller driver (HCD).
Two modes of operation are provided:
"NOP" FIQ which emulates the base FIQ implemented by gsh some time ago. This is an extremely fast interrupt handler designed solely to hold off SOF interrupts until USB transactions need queueing by the driver. All other driver functionality is unchanged when using this FIQ - there have been some minor tweaks that result in a small reduction in interrupt rate.
"FSM" FIQ
- Performs the entirety of all split transactions without driver intervention via a state machine design.
- A bitmask module parameter selects which features are enabled at boot-time.
- Implements its own stateless microframe "pipelining" allowing optimal use of a full-speed frame bandwidth.
- Performs an entire URB's worth (32 or 64 transactions total) of high-speed Isochronous transactions to an endpoint (webcam or DVB dongle)
Advantages of pushing more work into the FIQ:
- When handling a host channel interrupt, the FIQ takes approximately 0.1x the CPU time that the HCD takes to service the interrupt and perform the next transaction. This results in an increase in CPU cycles available for everything else: the reduction typically increases available cycle count by 20% under worst-case conditions.
- For SOF interrupts, the CPU time is approx 0.03x the time the driver takes. A 490ns interrupt handler is important when the interrupt generation rate is 8000 per second.
- The memory footprint of the FSM FIQ is tiny compared to the HCD. This dramatically reduces both the effect and likelihood of L1 cache evictions when servicing a FIQ interrupt which should tangentially increase responsiveness.
- The FIQ is unaffected by system interrupt latency. The FIQ is only ever disabled in minimal critical sections where the HCD is reading or writing FIQ state information.
- There's a nice side-effect of performing transactions in lock-step in the FIQ: under heavy load, there's an interrupt-aggregation effect. This means that the total time spent in the HCD interrupt handlers levels off as workload increases, rather than increasing until the Pi grinds to a halt.
- The OTG hardware has a number of bugs in it that cause scheduling problems for periodic transactions. By precisely tracking interrupt status and host channel state, these scheduling errors can be worked around or masked.
- High-speed isochronous transactions on dwc_otg are especially vulnerable to interrupt latency: by giving control of a whole batch (usually 32 or 64) of transactions to the FIQ, they can be precisely scheduled and also reap the benefit of 0.1x the CPU time.
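To put the SOF figures above in perspective, here is a back-of-the-envelope calculation in Python. It uses only the numbers quoted in this post (8000 interrupts per second, 490ns, approx 0.03x). The function and variable names are made up for illustration, and the driver-side figure is only what the 0.03x ratio implies, not a measurement.

```python
SOF_RATE_HZ = 8000            # SOF interrupts per second (from the post)
FIQ_SOF_NS = 490              # FIQ SOF handler time in ns (from the post)
FIQ_TO_HCD_SOF_RATIO = 0.03   # FIQ takes approx 0.03x the driver's time (from the post)


def cpu_fraction(rate_hz, handler_ns):
    """Fraction of one CPU spent in a handler that runs rate_hz times per second."""
    return rate_hz * handler_ns * 1e-9


# Load if the FIQ handles every SOF.
fiq_load = cpu_fraction(SOF_RATE_HZ, FIQ_SOF_NS)

# Implied driver handler time (assumption: derived from the 0.03x ratio only).
hcd_sof_ns = FIQ_SOF_NS / FIQ_TO_HCD_SOF_RATIO
# Hypothetical load if the driver had to service every SOF itself.
hcd_load = cpu_fraction(SOF_RATE_HZ, hcd_sof_ns)

print(f"FIQ SOF load: {fiq_load:.2%}")
print(f"HCD SOF load: {hcd_load:.2%}")
```

With these inputs the FIQ costs roughly 0.4% of CPU time, against about 13% if the driver had to service every SOF.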
3. Known issues / bug list
- There is a bug affecting interrupt processing with fiq_enable=0. Use fiq_enable=1.
- If the root port is disconnected on boot, or if the root port is disconnected while the FIQ is running, USB becomes unresponsive. This is because the core changes mode from host to device which changes interrupt register meanings.
GitHub issues have been logged for:
- USB audio DACs being broken - https://github.com/raspberrypi/linux/issues/575
- USB breaking in general (characterised by ethernet dropouts) - https://github.com/raspberrypi/firmware/issues/268
4. Module options in detail
- dwc_otg.fiq_enable: Support using the ARM FIQ.
- dwc_otg.fiq_fsm_enable: If 1, then the larger FSM handler is installed. If 0, the NOP FIQ is installed.
- dwc_otg.fiq_fsm_mask: Bitmask of transaction types to perform in the FIQ. Has no effect for the NOP FIQ.
Bit 0: Accelerate non-periodic split transactions
Bit 1: Accelerate periodic split transactions
Bit 2: Accelerate high-speed isochronous transactions
The default is 0x7, i.e. all options enabled.
- dwc_otg.nak_holdoff: default 8. For split transactions to full-speed bulk endpoints, this adjustable parameter specifies the hold-off time in microframes before a transaction is retried. Useful for throttling interrupt frequency. Set to 0 to disable. This can be used with either the NOP FIQ or FSM FIQ.
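As a rough illustration of how the options above fit together, the Python sketch below builds a fiq_fsm_mask from the three documented bits and prints the options in module.parameter=value form. The constant and function names are invented for this example. It also assumes the driver is built into the kernel, so the options go on the kernel command line (on Raspbian that is normally /boot/cmdline.txt).

```python
# Bit positions as listed in this post; the Python names are illustrative only.
NONPERIODIC_SPLIT = 1 << 0   # Bit 0: non-periodic split transactions
PERIODIC_SPLIT = 1 << 1      # Bit 1: periodic split transactions
HS_ISOCHRONOUS = 1 << 2      # Bit 2: high-speed isochronous transactions


def dwc_otg_params(fiq_fsm_mask, nak_holdoff=8):
    """Render the dwc_otg options as a space-separated string."""
    if not 0 <= fiq_fsm_mask <= 0x7:
        raise ValueError("fiq_fsm_mask only defines bits 0 to 2")
    return " ".join([
        "dwc_otg.fiq_enable=1",
        "dwc_otg.fiq_fsm_enable=1",
        f"dwc_otg.fiq_fsm_mask={fiq_fsm_mask:#x}",
        f"dwc_otg.nak_holdoff={nak_holdoff}",
    ])


# Example: accelerate both kinds of split transaction, but leave
# high-speed isochronous transactions to the HCD.
mask = NONPERIODIC_SPLIT | PERIODIC_SPLIT
print(dwc_otg_params(mask))
```

Leaving a bit clear hands that transaction type back to the HCD, which can help narrow down which FIQ feature a misbehaving device dislikes.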
The NAK holdoff for bulk split transactions is now adjustable.
The NAK holdoff can be used to dramatically reduce the CPU interrupt frequency when polling full-speed bulk endpoints. The default value of 8 should be used in most cases. However, a larger value can be used if all of the following are true:
- The only full-speed devices attached with bulk endpoints are serial UART adapters or similar with documented tx/rx FIFO sizes
- The data source accumulation rate is "slow" compared to USB1.1 bandwidth (slow means <0.1Mbit/s)
- Latency of returned data is not an issue (latency in ms = nak_holdoff/8)
As an example for a device with 128-byte FIFOs and a slow baud rate, the maximum values listed below can be used without risking data loss:
Baud nak_holdoff
2400 2048
4800 1024
9600 512
19200 256
38400 128
57600 64
115200 32
Control endpoints are throttled to a fixed interval of 8 microframes.
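The table values can be sanity-checked with a few lines of Python. The sketch compares the latency each nak_holdoff adds (nak_holdoff/8 ms, as stated above) with the time a 128-byte FIFO takes to fill at that baud rate. The 10 bits per byte figure assumes 8N1 framing, which this post does not state, and all names are invented for the example.

```python
MICROFRAMES_PER_MS = 8       # one microframe is 125 us
BITS_PER_BYTE_ON_WIRE = 10   # assumption: 8N1 framing (start + 8 data + stop)
FIFO_BYTES = 128             # FIFO size used in the example above

# Baud rate -> maximum nak_holdoff, copied from the table above.
TABLE = {2400: 2048, 4800: 1024, 9600: 512, 19200: 256,
         38400: 128, 57600: 64, 115200: 32}


def holdoff_ms(nak_holdoff):
    """Latency added by a given nak_holdoff, in milliseconds."""
    return nak_holdoff / MICROFRAMES_PER_MS


def fifo_fill_ms(baud, fifo_bytes=FIFO_BYTES):
    """Time for the device FIFO to fill at a continuous line rate, in milliseconds."""
    return fifo_bytes * BITS_PER_BYTE_ON_WIRE / baud * 1000.0


for baud, holdoff in TABLE.items():
    print(f"{baud:>6} baud: holdoff {holdoff_ms(holdoff):6.1f} ms, "
          f"FIFO fills in {fifo_fill_ms(baud):6.1f} ms")
```

Under these assumptions every table entry keeps the holdoff below half the FIFO fill time, so the device is polled before its FIFO can overflow.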
6. Limitations
The FIQ is still dependent on the HCD to queue periodic transactions in a timely fashion. While each individual stage of a split transaction or high-speed isochronous transaction can be performed perfectly, endpoints can still get a longer service interval if there is a long interrupt hold-off time. With typical strenuous usage of subsystems known to cause interrupt latency (heavy write activity to filesystems/SD card, heavy ethernet use) it is possible for periodic transactions to be queued too late in a full-speed frame to be performed. The FIQ will automatically time-out any transaction that could not be started in the correct frame. In most cases, this results in the transaction being re-queued for the subsequent frame (in the case of interrupt transactions) but for Isochronous transports this will cause data loss.
There is an upper bound to the amount of USB1.1 bus bandwidth that can be used per TT. This restriction is born of hardware limitations, which conspire to reduce the throughput for all types of split transaction. In effect, we can only make use of:
- Approx 45% of a downstream TT's non-periodic bandwidth
- Maximum 3 periodic transactions per frame per TT, inclusive of Isochronous
- Maximum of 752 periodic bytes per frame for an Isochronous IN or OUT endpoint
- Using large-bandwidth Isochronous transport will reduce the number of other types of transactions that can be completed in a frame.
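The periodic limits above can be turned into a quick check for a planned set of devices behind one TT. This Python sketch is only a rough model and not how the driver schedules anything: it assumes every listed endpoint needs servicing in every frame, and the PeriodicEndpoint record and check_tt function are hypothetical names.

```python
from dataclasses import dataclass

MAX_PERIODIC_PER_FRAME = 3   # per TT, inclusive of isochronous (from the post)
MAX_ISOC_BYTES = 752         # per isochronous IN or OUT endpoint (from the post)


@dataclass
class PeriodicEndpoint:      # hypothetical record, not a driver structure
    name: str
    isochronous: bool
    bytes_per_frame: int


def check_tt(endpoints):
    """Return a list of problems for the periodic endpoints behind one TT."""
    problems = []
    if len(endpoints) > MAX_PERIODIC_PER_FRAME:
        problems.append(
            f"{len(endpoints)} periodic transactions per frame "
            f"(limit {MAX_PERIODIC_PER_FRAME})")
    for ep in endpoints:
        if ep.isochronous and ep.bytes_per_frame > MAX_ISOC_BYTES:
            problems.append(
                f"{ep.name}: {ep.bytes_per_frame} isochronous bytes per frame "
                f"(limit {MAX_ISOC_BYTES})")
    return problems


# Example: keyboard, mouse, a USB audio device and a second audio device
# all behind the same hub TT.
devices = [
    PeriodicEndpoint("keyboard", False, 8),
    PeriodicEndpoint("mouse", False, 4),
    PeriodicEndpoint("audio-out", True, 192),
    PeriodicEndpoint("audio-in", True, 96),
]
for problem in check_tt(devices):
    print(problem)
```

In this example the four endpoints exceed the limit of three periodic transactions per frame, so moving one device to a different hub (and therefore a different TT) would be the thing to try.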
The driver rewrite has resulted in a much more aggressive reservation of host channels for transactions performed by the FIQ. Each host channel can theoretically be recycled for another transaction after each transfer complete interrupt (and this is what the HCD does at the cost of significant CPU time) but for a FIQ-enabled transfer the host channel is reserved for the duration of the transfer. This imposes a tighter limit on the number of active endpoints that can be communicated with simultaneously: typically, bulk transport endpoints start to slow down in throughput as contention for host channels occurs. In extreme cases, Isochronous will start to lose out on host channel contention and thus miss frames.
The microframe scheduler is currently unaware of the increased reservation period necessary for host channels used by the FIQ. It also does not accurately track the frame bandwidth required when considering a full-speed transaction and the associated guard interval.