When creating an application a common approach to managing application computer bandwidth is to structurally isolate the portions of the application that are doing the data processing from those implementing the data acquisition. The goal of this segmentation is to prevent the operation of the one from impacting the speed of the other. But if we think about what we learned from the posts preceding this one, we can see that this use case is just a specialized case of a general principle. Namely, that allowing processes to run in parallel reduces the likelihood that their execution will impede the operation of their peers by improving the overall utilization of the computer’s resources. Consequently, it behooves us to consider how we can allow data processing to run in parallel with the rest of the application’s code — and so garner a few of the many advantages that this approach offers.
Parallel Data Processing
It first should be noted that we already have most of the tools and concepts that we will need to build this parallel data processing capability. In fact, one possible implementation is not so different from our general approach to building state machines. So with that idea in mind, let’s consider our perspective data processor’s high-level functional requirements.
Just the One?
First, since the idea is to improve efficiency by allowing more things to happen in parallel, we need to ask ourselves the obvious question, “Why stop at just one data processor?” This question is particularly urgent if you already know that your data processing takes longer than the time required to acquire another dataset. It is an appealing idea to have 2 or more data processors that can share the processing load. However, if you are going to be running multiple instances simultaneously, the process has to be reentrant. But that’s not a problem, we’ve done reentrant processes before and know how to make that work.
Second, we need to consider the data processor’s internal structure. I have given this point away already by talking about state-machines, so let’s consider why this choice is a good one. At first glance it might seem like using a state-machine might be overkill for this application, after all you might figure that there is at most 2 states:
Wait for Data and
Process Data. This line of reasoning is valid, but it assumes that the data processing is a one-step process, so are there circumstances where this assumption isn’t valid?
If your processing is going to involve the complex analysis of potentially large datasets, you might not want to spend a lot of time processing flawed or invalid data. Hence before jumping right into the full-blown analysis it would be reasonable to have a state that does a quick sanity check on the data — and then another to handle the error that results if the data is bad. Next, consider what you are doing with the data when you are done with the analysis. There could be added limit checking of the results, and perhaps transfer of the results to a database or automated report generator — all tasks that could (and probably should) be expressed as separate states. Looking back now, by my count we are now up to 6 states — and we are still assuming that the analysis process itself is a single atomic operation that you will not want or need to breakup.
My point is that if you start with the basic outline of a state-machine and end up never having more than the two states, there’s no harm done because that basic structure is pretty lean. However if, as your project progresses, you discover that more states are needed, having that infrastructure already be in place can save a lot of time.
By the way, as a quick aside, I have often had the experience of having a customer request a significant change and then have them be amazed in how little time it takes to implement the modification. I have even had customers comment on how “lucky” I was that the code was structured in such a way that the change was even possible. Well let me tell you in the strongest terms possible that if luck was involved, it was luck of my own making. The better you design your code up front, the more “lucky” you will be when the time comes to make modifications.
The Right Kind of Communications
Third, we have to look at how we are going to communicate with these cloned state-machines. In past work on our testbed application, we have used UDEs and FGVs for communications between processes, and they might work here as well, but there would be problems. If you have one UDE or FGV feeding data to all the cloned data processors, you would have the problem of deciding which clone handles the each new update. While you don’t want to miss any updates, you certainly don’t want to process the same dataset twice either. The whole process would be fraught with opportunities for errors and race conditions. Of course you could solve that problem by creating a separate UDE or FGV for each clone, but then scalability goes down the tubes as the process(es) generating the data must keep track of how many clones there are any which one will get the next dataset.
Our way out of this conundrum is to properly apply a structure that people so often misuse: The Queue. This is the use case for which queues were born. The key feature that makes them so good in this sort of situation, is that if you have multiple receivers waiting to dequeue data from the same queue, LabVIEW guarantees that each new item inserted will only be seen by one of the receivers, and that enqueued data will be distributed to all the receivers on a first come, first served, basis. Moreover, because queues can be named, different processes running in the same instance of LabVIEW don’t need the queue reference to be sent to them. All they need to know is the name of the queue and they can get their own reference.
Yes, queues do still require polling, but at least there’s only one part of your code involved, and not the whole application. Moreover, there are two other mitigating factors present. First, in this sort of application the poll rate can often be much slower, on the order of once every few seconds. Second, because the state machine resides inside an event structure, it would take but a moment to implement logic that would allow the polling to be turned off altogether.
Let’s Look at the Test Code
As we start considering the code to implement this functionality, let’s first look at the test routines that we will use to evaluate the queue functionality. The VI
Data Processing Queue Handler.vi has the responsibility of starting everything off by initializing the data processing queue and launching a predetermined number of data processing clones. This logic takes place in the VI’s
Timeout event handler.
You will notice that to simplify the process of obtaining a queue reference, I have encapsulated that logic in a set of subVIs for interacting with the queue. The other event of significance is the handler for the
Check Queue Size UDE. While a test is running, we want to be able to monitor the number of items in the queue, but rather than simply polling the queue status, I created a UDE that flags the handler every time a new item is enqueued.
When the event fires, the handler calls the built-in queue function that returns, in addition to some other stuff that we don’t need, the number of items that are currently in the queue. Next, to feed data to the queue, I created a second test VI called
Queue Test.vi. It’s whole job is to wait a delay period specified on the front panel, and then enqueue an item into the data processing queue.
You can see that at this point, the only value in the queue data is a timestamp, but the data is defined as a typedef. Remember! Any time you are creating a datatype that will be accessed by reference, whether it be a UDE, a queue, a notifier or an event, always make the data structure a typedef.
After the data value is enqueued, the code fires the event that tells the handler to check the queue size.
Introducing the Data Processor
Turning finally to the reentrant data processing state-machine (
Data Processor.vi) we see that in addition to an event for stopping, the VI’s
Timeout event handler includes the logic for three states, the first of which is
This state’s job is to get the clone ready to start processing data. Consequently, it initializes the shift register holding the queue reference, and a second shift-register carrying a boolean value that we will discuss in a moment. The state also sets the next state to be executed to
Check for Data, and retains a timeout value of 0 ms so, assuming that there are no errors during initialization, the state machine will immediately start waiting for data to process. Note that the
Initialize state also opens the VI’s front panel. You probably would not want this feature in deliverable code except as, perhaps, a debugging option. I have included it here to make it easier for you to see the code at work.
Check for Data state starts by calling the queue subVI that is responsible for dequeuing an item. Inside this subVI, the dequeuing function is given a timeout of 0 ms so if there is not any data immediately available, the call will terminate with the timeout flag asserted. This Boolean value is inverted and passed out of the subVI to indicate to the calling code whether there is any data that needs processing. If the
Check for Data state logic finds this bit set, the code sets the next state for execution to
Process Data and sets the timeout value of 0 ms. If the bit indicating that data is available is not set, the code retains
Check for Data as the next state to execute but sets the timeout to a longer value (5 sec) read from the application’s INI file.
Before we go on to talk about the
Process Data state, we need to have a quick conversation about the boolean shift register. Normally, when the standard
Stop Application event fires, a VI wants to immediately stop what it’s doing and abort. However in one significant way, this is not a “normal” VI. In order to protect the data that has been acquired, this process should only stop if the queue driving it is empty. To create that functionality, the VI incorporates deferred shutdown logic in the form of this shift-register. Because the value is initialized to false the loop will, during normal operation, continue regardless of whether data is available or not. However, when a shutdown is requested, the event handler does not immediately stop the loop, but instead sets the shift-register value to true and branches to the
Check for Data state with a 0 ms timeout. If the queue is empty, the process will end at that time. However, if the queue is not empty, the VI will continue toggling between the
Check for Data and
Process Data states until the queue contents are exhausted.
As you would expect, the
Process Data state basically consists of processing the last data dequeued and branching back to the
Check for Data state to look for more. However, given that the only data in our test queue is a timestamp, you have probably guessed that the actual data processing to be done isn’t very expansive — and you’re right. In fact, the “data processing” consists largely of a wait, the duration of which varies at random between 4 and 6 seconds.
Putting it to the Test
To test this code, open and run
Data Processing Queue Handler.vi. You will immediately see the front panels of three data processing clones open. Move them so they aren’t overlapping each other or anything else.
Now open and run
Queue Test.vi. After a few seconds it will enqueue an item and then enqueue a new one every 6 seconds. Note that as each clone handles an item it will display that item’s timestamp on its front panel. Note also that the indicated queue depth never exceeds 1.
Now change the delay on
Queue Test.vi from 6 sec to 2 sec. You will notice that the queue depth chart is now updating faster. Likewise, the queue depth will begin to show momentary increases to a depth of two, but the chart will always drop back to 1. In other words, there might be slight delays now and again, but for the most part three clones can keep up with the flow of data.
Finally, drop the delay to 1.5 seconds. With data coming at this rate, the queue depth will continue to go up and down, but now it is always going up more than it is going down. This overall upward trend shows us that we have reached the point where three clones are getting overwhelmed by amount of data that is being enqueued.
Now if you increase the delay back to 2 seconds, the queue depth will gradually begin to decrease as the slower flow of new data allows the clones to begin catching up on the backlog. Alternatively, if you just click the stop button, the two test VIs will stop and close immediately, but the clones will continue running until the queue empties out.
Parallel Data Processing — Release 1
The Big Tease
So we have learned the basics behind creating an environment for an application that supports an expandable data processing capability. For many applications this simple structure will be more than adequate, but (as we have seen) if the data starts coming too fast the queue can grow without limit. Of course this isn’t necessarily a problem if the periods of high data generation are interspersed with periods of comparative idleness. However this sort of variability can be a bit of a two-edged sword. The periods of low data throughput can give the system time to recover from a large backlog of data, but this variability can also make it difficult to estimate how many copies of the data processing process will be needed. Pick a number that is too high, and you’re wasting computer resources. Pick a number that is too low and you could still end up with a situation where too much data is being queued — perhaps to the point of running out of memory.
Well, the next time we get together we’ll look at how to modify the basic structure we have created thus far to add the ability for the software to decide on the fly when more clones are needed, and when to kill off existing ones that aren’t being used.
Until Next Time…