The problem of priority inversion is well-known and.. actually, scratch that: it’s clearly not well known at all because people keep making the same mistake. So I’m writing this to try and explain the problem (with reference to CAN) in the hope that a search engine indexes this and puts it high enough up the search results for people to find.

The TL;DR for priority inversion:

Priority inversion causes a high priority activity to get delayed for a long time, causing missed deadlines, timeouts, and all kinds of consequent faults. Worse, it’s typically an intermittent problem often not found in testing, and a great source of Heisenbugs because adding instrumentation to look at it can make it go away. It happens when the high priority (and urgent) activity gets stuck behind a low priority (non-urgent) activity and lots of medium priority activities block the low priority activity, which is blocking the high priority activity.

The word ‘activity’ here can mean tasks in a real-time system, scheduled by an RTOS (real-time operating system) that chooses what to run based on priorities. This is exactly what happened on Mars on the 4th July 1997 when the Mars Pathfinder mission landed.

Mars Pathfinder

The spacecraft control system kept going through watchdog resets, triggered because a high priority task didn’t complete in time. The problem was down to priority inversion around access to a resource guarded by a semaphore. A low priority task had the resource locked, and the high priority task was blocked waiting for the low priority task to release the semaphore. But the low priority task couldn’t run because medium priority tasks were running, so it couldn’t unlock the semaphore, and so the high priority task got stuck, the watchdog triggered, the system reset, and the process repeated. You can read more about it directly from the horse’s mouth (Glenn Reeves, the NASA team lead for the Pathfinder spacecraft software).

Priority inversion also happens in CAN. The CAN protocol uses priorities (in the form of CAN IDs) to determine which frame is transmitted on the bus. But the CAN specification says nothing about how frames within a CAN controller are arbitrated. What should happen is that frames are internally arbitrated by priority, so that the frame the CAN engine enters into bus arbitration is the highest of its own frames, and then the CAN bus arbitration picks the highest priority frame bus-wide. But if a FIFO queue is used internally then we get priority inversion.

Here’s a short video showing how priority inversion plays out on CAN bus:

I’ve put together a demo using real CAN hardware to show how this really happens in a CAN system. The demo uses PyBoards - small boards based on the STM32F405 microcontroller with on-chip CAN controllers (ST’s bxCAN controller) and running MicroPython firmware with custom CAN drivers that I wrote.

Demo boards

The PyBoards fit into a custom rack cards that Canis Labs uses to test CAN-HG (CAN-HG is an augmentation of CAN that adds out-of-band data for security and performance, but that’s another story..) The Kvaser Leaf and Busmaster tools are used to see the CAN frames on the bus. The video below shows how priority inversion happens for real, and also how it’s fixed with priority queueing in the CAN drivers.

Priority inversion on CAN is not a theoretical or contrived problem. To see this, let’s take a look at the Zephyr Project CAN system. The CAN API is very simple, with just a can_send call to queue a CAN frame (a callback can be set for a “sent” event). There’s nothing inherently wrong with this API. If the number of buffer slots supports is small then there is pressure on the application developer to create their own queue in software, and inevitably this will be a FIFO queue (I’ll come back to this shortly). The problem is deeper in the drivers. Here is a code snippet from the code for the STM32 CAN controller driver:

	/* Set TX priority to chronological order */
	can->MCR |= CAN_MCR_TXFP;

The bit TXFP is described in the STM32 reference manual for the bxCAN CAN controller as follows:

“The transmit mailboxes can be configured as a transmit FIFO by setting the TXFP bit in the CAN_MCR register. In this mode the priority order is given by the transmit request order. This mode is very useful for segmented transmission.

Yep, we have a priority inversion problem with these drivers. Let’s look at the can_send function in the NXP FlexCAN drivers:

	while (true) {
		alloc = mcux_get_tx_alloc(data);
		if (alloc >= 0) {
			if (atomic_test_and_set_bit(data->tx_allocs, alloc)) {
				continue;
			}

			break;
		}

		if (k_sem_take(&data->tx_allocs_sem, timeout) != 0) {
			return CAN_TIMEOUT;
		}
	}

The NXP FlexCAN controller is a slotted buffer controller, which means it has lots of CAN frame buffer slots in its memory, allocated between sending and receiving CAN frames. The controller can be set to arbitrate between its frames by buffer number or by CAN ID. The driver selects arbitration by buffer number and then in the can_send function they allocate a buffer higher than any buffer in use, in order to preserve FIFO transmission order:

	/* mcux_get_tx_alloc is a linear on array, and binary on atomic_val_t search
	 * for the highest bit set in data->tx_allocs. 0 is returned in case of an empty
	 * tx_alloc, the next free bit otherwise.
	 * The reason to always use a higher buffer number than the current in use is
	 * that a FIFO manner is kept. The Controller would otherwise send the frame
	 * that is in the lowest buffer number first.
	 */

So this driver specifically implement FIFO buffering, which is bad enough, but worse than that is that it spins waiting for a free buffer, where “free” doesn’t mean buffer without a CAN frame in it, but one where the buffer number is higher than any frame in the controller. What will tend to happen under peak bus load is that the buffer number will ‘walk’ up to the top one, and then when that buffer contains a low priority frame, the task will spin on a sequence of CAN transmit events for ages waiting for it to be put into the controller (this could be tens or hundreds of milliseconds on a busy CAN bus). In essence, this couples priority inversion on the CAN bus to task execution, causing a task to overrun and fail in unpredictable ways. And it’s an intermittent problem that probably won’t appear in testing. Obviously this is a terrible driver design.

It may look like I am picking on the Zephyr Project, but far from it: I’ve come across it in many other CAN drivers too, even in a self-driving car system where the steering depended on sending a CAN frame with a short latency!

So how is priority inversion solved? What is required to solve it? In short, the CAN frame queueing in the CAN driver must be by CAN ID value, using the same ordering as on the CAN bus. For CAN controllers that support a large number of messages then this is usually as simple as flipping a bit to ask the controller to arbitrate internally this way (the FlexCAN and the bxCAN controllers by default come out of reset in this mode!).

For CAN controllers with only three buffers then the software drivers must keep a bigger priority queue in memory and then shuffle the frames back and forth to keep the highest priority three frames in the hardware. All these controllers provide an an abort mechanism so that a low priority CAN frame in a hardware buffer can be thrown out and replaced by a new higher priority frame. The CAN controllers are generally designed to help with this: the bxCAN has a CODE field and the FlexCAN has a LPTM field to indicate which of the transmit buffers is the lowest-priority one.

Note that the common Microchip MCP2515 standalone CAN controller has three transmit buffers, but it doesn’t arbitrate internally by CAN ID (there is a 2-bit priority field for each buffer, but this won’t work because the priorities would need to be atomically re-ordered when a new frame is put in the controller). It’s simpler to just use a single buffer and use the abort mechanism when a low-priority frame needs to be pre-empted.

One final point worth discussing is why people are so attracted to FIFO queueing of CAN frames (even to the point of disabling the default priority queueing in CAN controllers)? One answer to this is segmented messages: quite a few CAN layers define ways to map a message longer than 8 bytes into a sequence of 8-byte CAN frames, and this nearly always is done by transmitting a sequence of CAN frames with the same ID. It’s obviously vital that these frames go into FIFO order because otherwise the bigger message is scrambled. The problem with queueing by CAN priority is that under nearly all priority queueing implementations (both hardware and software), two internal frames with the same ID are entered into arbiration in an undefined transmission order (for example, the FlexCAN hardware uses an internal hardware scanning state machine that loops around all the buffers looking for the lowest priority frame). The solution to this is not, of course, to make all CAN frames go in FIFO order, but to make only frames for a segmented message go in FIFO order with respect to each other. This is actually straightforward to implement using a higher-level FIFO driver, that creates a FIFO queue of frames and then hands them to the CAN driver one-by-one as each previous frame is sent.

So to summarise:

  • Priority inversion is a big deal, it’s an intermittent problem that might escape testing but could be catastrophic when deployed
  • Priority inversion happens on CAN
  • It is fixed by having the hardware arbitrate internally by CAN ID
  • If there aren’t enough buffers in hardware then the driver should implement them in software using the abort feature of the CAN hardware and not just push the problem up to the application (which will inevitably just use a FIFO queue)
  • Where FIFO sending is important (e.g. segmented messaging) it should be done by a specific higher level FIFO queueing system that creates separate FIFO queues and puts the head item of each FIFO into the priority queue.

UPDATE 2020/07/01:

  • I have rewritten the wording on CPU execution being coupled to priority inversion to make it clear the task is spinning on a transmit event rather than the CPU itself spinning in a loop
  • As a courtesy I reported this priority inversion issue to the Zephyr Project; it was subsequently reclassified from “bug” to “enhancement” because not suffering from priority inversion is apparently a ‘nice to have’ feature of an RTOS designed for safety critical systems

UPDATE 2020/07/07:

  • Added a demo of CAN priority inversion happening on real CAN hardware