
Blocking == Technical Debt

Miro Samek, October 25, 2024

Blocking occurs every time a program waits in line for something to happen. For instance, the basic Arduino "Blink" example turns the LED on and calls the delay() function to wait for a timeout event in 1000 milliseconds. Then it turns the LED off and calls delay() to wait in line for another timeout event in 1000 milliseconds. Performed in a loop, this ends up blinking the LED.
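In code, the blocking version is essentially the standard Blink sketch, where each delay() call busy-waits and monopolizes the CPU:

    // the standard blocking "Blink": each delay() call busy-waits
    void setup() {
        pinMode(LED_BUILTIN, OUTPUT);
    }

    void loop() {
        digitalWrite(LED_BUILTIN, HIGH);  // LED on
        delay(1000);                      // block for 1000 ms
        digitalWrite(LED_BUILTIN, LOW);   // LED off
        delay(1000);                      // block for another 1000 ms
    }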

Blocking in Arduino programming is accomplished by busy-waiting, but that is not the only form of blocking. Blocking based on context switching is the cornerstone of every RTOS (Real-Time Operating System). For example, a call to the FreeRTOS vTaskDelay() blocking function allows other RTOS threads to execute while the original thread is delayed. A single thread can make multiple calls to vTaskDelay() or any other blocking RTOS primitive (e.g., xSemaphoreTake()), and the RTOS blocking mechanism "remembers" precisely where in the thread's arbitrarily complex code sequence to return after unblocking. The RTOS charges a hefty price for this capability in the form of a whole private stack for each thread and an elaborate context switch, but the convenience is considered well worth it. In fact, the intuitive sequential programming paradigm enabled by blocking is the main argument for using a traditional RTOS in the first place.
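A minimal sketch of the same blinking logic as a FreeRTOS thread might look as follows (the bsp_led_on()/bsp_led_off() board-support calls are placeholders, not part of FreeRTOS):

    #include "FreeRTOS.h"
    #include "task.h"

    extern void bsp_led_on(void);    /* placeholder board-support call */
    extern void bsp_led_off(void);   /* placeholder board-support call */

    /* Each vTaskDelay() call blocks only this thread; the RTOS context
       switch lets other threads run while this one is delayed. */
    static void blinky_thread(void *param) {
        (void)param;
        for (;;) {
            bsp_led_on();
            vTaskDelay(pdMS_TO_TICKS(1000));   /* block for 1000 ms */
            bsp_led_off();
            vTaskDelay(pdMS_TO_TICKS(1000));   /* block for another 1000 ms */
        }
    }

    void start_blinky(void) {
        /* the price: every thread needs its own private stack (here 256 words)
           in addition to the RTOS context-switch machinery */
        xTaskCreate(blinky_thread, "blinky", 256U, NULL,
                    tskIDLE_PRIORITY + 1U, NULL);
    }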

Video "To block or not to block, that is the question!"

Blocking as Technical Debt

Sequential programming based on blocking is simple and intuitive, but every blocking call incurs technical debt: it buys initial expediency in exchange for increased development costs later.

The problem with the sequential paradigm is that it hard-codes the blocking calls, which means the code can handle only the hard-coded sequence of events. This might work initially, but during ongoing development you inevitably discover other events and event sequences that must also be handled. Weaving the new events into the existing structure of hard-coded blocking calls becomes progressively more difficult. The problem is not just with sequencing but also with timing, because the blocking calls clog the control flow.
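To make this concrete, consider adding a second event to the blocking Blink sketch, say a button press (BUTTON_PIN and handle_button() below are hypothetical). The check can only run between the hard-coded delay() calls, so a press may go unnoticed for up to two seconds, or be missed entirely:

    const int BUTTON_PIN = 2;       // hypothetical button input (wired active-low)
    void handle_button(void);       // hypothetical handler (defined elsewhere)

    void loop() {                   // setup() stays as in the Blink sketch
        digitalWrite(LED_BUILTIN, HIGH);
        delay(1000);                // a button press during this second goes unnoticed...
        digitalWrite(LED_BUILTIN, LOW);
        delay(1000);                // ...and during this one, too
        if (digitalRead(BUTTON_PIN) == LOW) {
            handle_button();        // reached at most once per ~2 s
        }
    }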

An RTOS makes the sequential paradigm more extensible because it allows you to create multiple "super-loops" (called threads or tasks). Additional threads can block and wait in line for additional hard-coded events independently of the existing threads, thus relieving the timing and, to some degree, the sequencing problem. However, the added threads often need to share resources with the existing threads, and the shared resources require protection against race conditions and other concurrency hazards. An RTOS provides mutual-exclusion mechanisms (e.g., mutexes), but such mechanisms are often also based on blocking, so they only exacerbate the blocking problem. In this sense, the RTOS doubles down on blocking. The following diagram shows the perils of blocking.

Restricting Blocking as the Best Practice

The perils of blocking are widely recognized even without an RTOS. For instance, the basic Arduino "Blink" tutorial contains a recommendation for the "Blink Without Delay" example based on the non-blocking Arduino millis() function.
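The idea behind "Blink Without Delay" is to check elapsed time with millis() on every pass of loop(), so loop() never waits; roughly:

    const unsigned long INTERVAL_MS = 1000;
    unsigned long previousMillis = 0;
    int ledState = LOW;

    void setup() {
        pinMode(LED_BUILTIN, OUTPUT);
    }

    void loop() {
        if (millis() - previousMillis >= INTERVAL_MS) {
            previousMillis += INTERVAL_MS;
            ledState = (ledState == LOW) ? HIGH : LOW;  // toggle
            digitalWrite(LED_BUILTIN, ledState);
        }
        // loop() returns immediately, so other, unrelated work
        // (e.g., checking a button) can run on every pass
    }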

However, the constipation problem caused by blocking is much more acute in an RTOS. Here, concurrency experts recommend drastically restricting blocking by structuring threads as event loops with a single blocking call to a message queue at the top and non-blocking code after that [1,2]. I presented this event-driven approach in the following YouTube video:
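A minimal sketch of such a thread structure with FreeRTOS (the Event type and the dispatch_event() handler are illustrative, not FreeRTOS APIs) might be:

    #include <stdint.h>
    #include "FreeRTOS.h"
    #include "task.h"
    #include "queue.h"

    typedef struct {       /* illustrative event with a signal and a parameter */
        uint16_t sig;
        uint16_t param;
    } Event;

    extern void dispatch_event(Event const *e);  /* illustrative non-blocking handler */

    static QueueHandle_t eventQueue;

    static void active_object_thread(void *param) {
        (void)param;
        for (;;) {
            Event e;
            /* the ONLY blocking call in the thread: wait for the next event */
            (void)xQueueReceive(eventQueue, &e, portMAX_DELAY);

            /* everything below runs to completion without ever blocking */
            dispatch_event(&e);
        }
    }

    void start_active_object(void) {
        eventQueue = xQueueCreate(10, sizeof(Event));
        xTaskCreate(active_object_thread, "AO", 256U, NULL,
                    tskIDLE_PRIORITY + 1U, NULL);
    }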


RTOS as Technical Debt

Even though a traditional blocking RTOS can be used to implement event loops for otherwise non-blocking, asynchronous Active Objects, it is an awkward fit.

Slide: Paradigm Shift from RTOS (blocking) to RTEF (non-blocking)

As shown in the Venn diagram above, a traditional RTOS does not provide the mechanisms necessary for extensible, event-driven Active Objects, which must be supplied externally to the RTOS. At the same time, most RTOS services are based on blocking and, therefore, are useless and outright counterproductive for the event-driven paradigm. In that sense, the entire traditional RTOS can be considered a source of technical debt.

Blocking vs. Preemption

Embedded developers often conflate the RTOS's ability to manage blocking threads with preemptive multitasking. However, these concepts are independent. Preemptive but non-blocking multitasking has been known for decades and is used extensively, for example, in the automotive OSEK/VDX kernel [3]. The immensely popular ARM Cortex-M also implements preemptive, non-blocking multitasking in the NVIC (Nested Vectored Interrupt Controller). The NVIC allows preemption of prioritized interrupts, which all nest on a single stack (the main stack). The NVIC is an example of the Stack Resource Policy (SRP) implemented directly in hardware [4]. Please see my "Super-Simple Tasker" videos from the Embedded Online Conference:

Video: Super-Simple Tasker -- The Hardware RTOS for ARM Cortex-M, Part-1

Video: Super-Simple Tasker -- The Hardware RTOS for ARM Cortex-M, Part-2

The advantage of preemptive, non-blocking kernels is that they are adequate for non-blocking Active Objects, while being much more efficient than blocking kernels, yet still fully compatible with the Rate-Monotonic Scheduling/Analysis (RMS/RMA) method.
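For a rough flavor of how preemptive, non-blocking multitasking can ride directly on the NVIC, here is a sketch in the spirit of Super-Simple Tasker (the IRQ choice, the CMSIS device header, and process_one_event() are assumptions): a "task" is just an interrupt handler activated by pending its IRQ from software, all tasks nest on the single main stack, and a higher-priority task preempts a lower-priority one exactly like a nested interrupt.

    #include "stm32f4xx.h"   /* assumed CMSIS device header; other Cortex-M parts work similarly */

    extern void process_one_event(void);   /* assumed non-blocking, run-to-completion work */

    /* the "task" body: borrows an otherwise unused interrupt vector */
    void EXTI0_IRQHandler(void) {
        process_one_event();   /* runs to completion, never blocks */
    }

    void task_init(void) {
        NVIC_SetPriority(EXTI0_IRQn, 3U);  /* the task's priority is its interrupt priority */
        NVIC_EnableIRQ(EXTI0_IRQn);
    }

    void task_activate(void) {
        NVIC_SetPendingIRQ(EXTI0_IRQn);    /* "schedule" the task; it runs as soon as it is
                                              the highest-priority pending activation */
    }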

End Notes

Many embedded developers believe that the venerable "superloop" and traditional blocking RTOS are the only alternatives for embedded software architecture. However, other paradigms and efficient implementations exist, such as the QP real-time embedded frameworks [5,6].

The sequential paradigm enabled by blocking might be intuitive and expedient, but the question is how well it matches reality. Most real-life embedded systems must handle multiple sequences of events, so they are poorly served by software with hard-coded sequences. Therefore, choosing a more flexible, reusable, non-blocking paradigm often pays off, even if it means abandoning the sequential model. Remember: adding blocking to non-blocking code is easy. Removing blocking calls sprinkled throughout the code is like trying to "unscramble" scrambled eggs.

[1] David Cummings, "Managing Concurrency in Complex Embedded Systems," Workshop on Cyber-Physical Systems, 2010

[2] Herb Sutter, "Prefer Using Active Objects Instead of Naked Threads," Dr. Dobb's Journal, 2010

[3] OSEK/VDX, "OSEK/VDX Operating System Version 2.2.3," OSEK/VDX website, 2005

[4] T. P. Baker, "A Stack-Based Resource Allocation Policy for Realtime Processes," IEEE, 1991

[5] Quantum Leaps, QP/C Real-Time Embedded Framework, GitHub

[6] Quantum Leaps, QP/C++ Real-Time Embedded Framework, GitHub


Comment by MatthewEshleman, November 5, 2024

Agreed. We must "embrace the async."


Comment by aleph_five, November 10, 2024

I encountered this problem at my work. I have a device with lots of peripherals and functionality (BLE scanning and peripheral roles, USB, a GPS module, charge control, polling a state-of-charge IC, another microcontroller connected over UART, all of it multiplied by my inexperience). Due to time constraints, I used legacy blocking modules with FreeRTOS written by my mentor. As a result, there was no RAM left, so I had to rewrite some modules in an event-driven manner with state machines. Ironically, in pursuit of doing it quickly, I spent more time.

Sequential programming seems simple only at first glance. In my experience, the majority of embedded activities are event-driven by nature. In my opinion, the sooner a novice developer understands this, the easier it will be to design complex systems with ever-increasing functionality. It's also easier to combine several modules in one thread if the need arises. With an event-driven approach, dynamic memory allocation in embedded doesn't seem as evil. It takes more resources to allocate a stack for a task than to create an event queue.

By the way, thank you for the YouTube "State Machines" playlist! At first glance, hierarchical state machines seemed to me like something far removed from real applications, especially in embedded systems. But it was worth spending part of my vacation studying this topic in detail.

Comment by QL, November 10, 2024

Thanks for sharing your experience.

My observation that "blocking is technical debt" is not theoretical speculation but comes from my personal experience as well. At some point in my career, I spent over a year removing blocking from the software driver for a GPS chipset. The driver communicated with the chipset via a UART, so it was inundated with the usual synchronous request-reply communication, where the software would send a message to the chipset and wait (block) in line for the reply. Sound familiar?

The approach required either busy-polling for several milliseconds for every instance of such an exchange or using a blocking RTOS primitive (e.g., a semaphore). The polling solution would essentially tie up the host CPU (marketing claimed that our chipset required only 3 MIPS maximum). The RTOS solution would force the end users to provide an RTOS and a dedicated thread to run our driver, which was even more problematic. On top of this, the blocking driver required send()/receive() primitives as well as a time() primitive to request the current time from the system.

In contrast, the non-blocking solution that emerged after refactoring the previous blocking design involved several state machines, but it had minimal external requirements. The whole interface consisted only of a non-blocking gps_service() operation, which the host CPU had to call occasionally, with really loose timing requirements. The host CPU passed a UART buffer with the bytes received from the GPS chipset and the current time. The operation returned a status, which told the host CPU whether there were any bytes to send to the GPS chipset. That way, our "driver" didn't need to know how to send bytes, receive bytes, or request the time. More importantly, it was easy to integrate with whatever software architecture the host CPU used: a "superloop," an RTOS thread, or whatever else they wanted. It also consumed less than the advertised 3 MIPS of the host CPU.
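Just to illustrate the shape of such an interface (the names and signature below are hypothetical, not the actual product code), the whole driver boiled down to something along these lines:

    #include <stdint.h>

    typedef enum {        /* hypothetical status codes returned to the host */
        GPS_IDLE,         /* nothing to send to the chipset */
        GPS_TX_PENDING    /* tx_buf now holds bytes for the host to send */
    } GpsStatus;

    /* The single, non-blocking entry point. The host calls it periodically
       (loose timing), passing the bytes received from the chipset and the
       current time; internally it just advances the driver's state machines. */
    GpsStatus gps_service(uint8_t const *rx_buf, uint16_t rx_len,
                          uint32_t current_time_ms,
                          uint8_t *tx_buf, uint16_t *tx_len);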

But this was over a year of my life. So, yes, I know from experience that removing blocking is like trying to "unscramble" scrambled eggs.

Comment by sprite4, November 1, 2024

I would take polling+state machines over event driven programming anytime!

It's a lot more robust! I don't care if it uses a bit more power.

Comment by QL, November 1, 2024

I'm not sure what you mean by "polling+state machines," but polled state machines (a.k.a. input-driven state machines) are precisely NOT robust. The problem is with *race conditions* around the guards used so extensively in such state machines. I've devoted a whole video to the subject: State Machines Part-3: Input-Driven State Machines.

Comment by MatthewEshleman, November 5, 2024

sprite4 stated: "I would take polling+state machines over event driven programming anytime! It's a lot more robust! I don't care if it uses a bit more power."


I would love to hear more on this.
