Laying the good foundation, with TCP…

In case you are just joining the conversation, we are in the midst of a project to modify the testbed application that we have been slowly assembling over the past year. I would heartily recommend that you take some time and review the past posts.

To this point in our latest expansion project, we have created a remote control interface, embedded it in our testbed and performed some basic testing to verify that the interface works. Our next step is to create the first of several “middleware” processes. I call them middleware because they sort of sit between our application’s basic code and the external applications and users. In future installments we will look at middleware for .NET, ActiveX and WebSockets, but we will start with a more fundamental interface: TCP/IP.

The Roadbed for the Information Highway

Aside from giving me the opportunity to air out some tired metaphors, TCP/IP is a good place to start because it gives us the opportunity to examine the protocol that underlies a host of other connection options.

Just the basics

Although the idea of creating a “server” can have a certain mystique, there really isn’t much to it really – at least when you are working with LabVIEW. The underlying assumption to the process is that there is something monitoring the computer’s network interface waiting for a client application to request a TCP/IP connection. In network parlance, this “something” is called a “listener” because listens to the Ethernet interface for a connection. However a given listener isn’t simply listening for any connection attempt, rather the network standards define the ability to create multiple “ports” on a single interface, and then associate particular ports with particular applications. Thus, when you create the listener you have to tell it what port that it is to monitor. In theory, a port number can be any U32 value, but existing standards specify what sorts of traffic is expected on certain port numbers. For example, by default HTTP connections are expected on ports 80 or 8080, port 21 is the default for FTP and LabVIEW by default listens to port 3363. All you have to do is pick a number that isn’t being used for anything else on your computer. To create a listener in LabVIEW, there is a built-in function called TCP Create Listener. It expects a port number, and returns a reference to the listener that it creates – or an error if you pick a port to which some other application is already listening.

Once you have created the listener, you have to tell it to start listening by calling the built-in function TCP Wait On Listener. As its name implies, it waits until a connection is made on the associated port, though you will typically want to specify a timeout. When this function sees and establishes an incoming connection it outputs a new reference specific to that particular connection. A connection handler VI can then use that reference to manage the interactions with that particular remote device or process.

Finally, when you are done with your work, you kill the server by closing the listener reference (TCP Close Connection), and all connection references that you have open. Put these three phases together, you come up with something like this.

The Simplest TCP Server

This simple code creates a listener, waits for a connection, services that connection (it reads 4 bytes from it), and then quits. While this code works, it isn’t really very useful. For example, what good is a server that only waits for one connection and then quits? Thankfully, it’s not hard to expand this example. All you have to do is turn it into a mini state-machine.

One Step at a Time

As usual, the state-machine is built into the timeout event so the following states include a shift register pointing to the next state to be executed, and a second one carrying the delay that the code will impose before going to that state. But before we get into the specifics, here’s a state diagram showing the process’ basic flow.

State Diagram

Execution starts with the Initialize Listener state. It’s main job at this point is to create the TCP listener. Next is the aptly named state, Wait for Connection. It patiently waits for a connection by looping back to itself with a short timeout. As long as there is no connection established, this state will execute over and over again. This series of short waits gives other events (like the one for shutting down the server) a chance to execute.

When a connection is made, the machine transitions to the Spawn Handler state. Since it is critical that the state machine gets back to waiting for a new connection as soon as possible, this state dynamically launches a reentrant connection handler VI and immediately transitions back to the Wait for Connection state.

The state machine continues ping-ponging between these last two states until the server is requested to stop. At that point, the code transitions to the Close Listener state which disposes of the listener and stops the state machine. So let’s look at some real code to implement these logical states – which, by the way resides in a new process VI named TCP-IP Server.vi.

The Initialize Listener State

This state at present only executes once, and its job is to create the listener that initiates connections with remote clients. The TCP Create Listener node has two inputs, the first of which is the port that the listener will monitor. Although I could have hard-coded this number, I instead chose to derive this value from the application’s Server.Port property. In a standalone executable, the application reads this value from its INI file at start-up, thus making it reconfigurable after the application is deployed. If the server.tcp.port key does not exist in the INI file, the runtime engine defaults to LabVIEW’s official port number, 3363.

Initialize Listener

When running in the development environment, this value is still reconfigurable, but it is set through the My Computer target’s VI Server settings. To change this value, right-click on My Computer in the project explorer window and select Properties. In the resulting dialog box, select the VI Server Category. At this point, the port number field is visible in the Protocols section of the VI Server page, but it is disabled. To edit this value, check the TCP/IP box to enable the setting, make the desired change and then uncheck the TCP/IP box, and click the OK button. It is critical that you remember to uncheck TCP/IP before leaving this setting. If you don’t, the project will be linked to the specified port and the TCP server in the testbed application will throw an error 60 when it tries to start.

The other input to the TCP Create Listener node is a timeout. However, this isn’t the time that the node will wait to finish creating the listener. We will be testing this code on a single computer and so don’t have to worry about such things as the network going down – even momentarily. In the broader world, though, there are a plethora of opportunities for things to go wrong. For example, the network could go south while a client is in the middle of connecting to our server. This timeout addresses this sort of situation by specifying the amount of time that the listener will wait for the connection to complete, once a connection attempt starts.

The Wait for Connection State

This state waits for connection attempts, and when one comes, completes the connection. Unfortunately, LabVIEW doesn’t support events based on a connection attempt so this operation takes the form of a polling operation where the code checks for a connection attempt, and if there is none, waits a short period of time and then checks again. The short wait period is needed to give the process as a whole the chance to respond to other events that might occur.

Wait for Connection

The logic implementing this logic starts with a call to the built-in TCP Wait On Listener node with a very short (5-msec) timeout. If there is no connection attempt pending when the call is made, or an attempt is not received during that 5-msec window, the node terminates with an error code 56. The following subVI (Clear Errors.vi) looks for, and traps that error code so its occurrence can be used to decide what to do next. If the subVI finds an error 56, the following logic repeats the current state and sets the timeout to 1000-msec. If there is no error, the next state to be executed is Spawn Handler and the timeout is 0.

If there is a successful connection attempt, the TCP Wait On Listener node also outputs a new reference that is unique to that particular connection. This new reference is passed to a shift-register that makes is available to the next state.

The Spawn Handler State

In this state, the code calls a subVI (Launch Connection Handler.vi) that spawns a process to handle the remote connection established in the previous state. This connection handler takes the form of a reentrant VI that accepts two inputs: a reference to a TCP connection and a boolean input that enable debugging operations – which at the current time consists of opening the clone’s front panel when it launches, and closing it when it closes.

Spawn Handler

It is important that the connection handler be a reentrant process for two reasons: First, we want the code to be able to handle more than one connection at a time. Second, the listener need to get back to listening for another new connection as quickly as possible. We’ll discuss exactly what goes into the connection handler in a bit.

The Close Listener State

Finally, when the process is stopping, this event closes open connections, sets the timeout to -1, and stops the event loop.

Close Listener

But why are there two connections to be closed? Doesn’t the connection handler that gets launched to manage the remote connection handle closing that reference? While that point is true, the logic behind it is flawed. There is a small, but finite, delay between when the remote connection is completed and the Spawn Handler starts executing. If the command to stop should occurring during that small window of time, the handler will never be launched, and so can’t close that new connection and its associated reference.

Turning States into a Plugin

Now that we have an understanding of the process’ basic operation, we need to wrap a bit more logic around it to turn it into usable code.

Adding Shutdowns and Error Handling

To begin with, if this new process is going to live happily inside the structure we have already defined for testbed application plugins, it is going to need a mechanism to shut itself down when the rest of the application stops. Since that mechanism is already defined, all we have to do is register for the correct event (Stop Application) and add an event handler to give it something to do.

Loop Shutdown

Nothing too surprising here: When the shutdown event fires, the handler sets the next state to be executed to Close Listener and the timeout to 0. Note that it does not actually stop the loop – if it did the last state (which closes the listener reference) would never get the chance to execute. Finally, we also need to provide for error handling…

Add in Error Handling

…but as with the shutdown logic, this enhancement basically consists of adding in existing code. In this case, the application’s standard error reporting VI.

Defining the Protocol

With the new middleware plugin ensconced happily in the testbed framework, we need to create the reentrant connection handler that will handle the network interactions. However, before we can do that we need to define exactly what the communications protocol will look like. In later posts, I will present implementations of a couple of standardized protocols, but for now let’s explore the overall communications process by “rolling our own”.

As a quick aside, you may have noticed that I have been throwing around the word “protocol” a lot lately. Last time, I talked about creating a safe protocol for remote access. Then this time we discussed the TCP protocol, and now I am using the word again to describe the data we will be sending over out TCP connection. A key concept in networking is the idea of layers. We have discussed the TCP protocol for making connections, but that isn’t whole story. TCP is build on top of a lower-level protocol called IP – which is itself built on even lower level protocols for handling such things as physical interfaces. In addition, this protocol stack can also extend upwards. For example, VI Server is at least partially built on top of TCP, and we are now going to create our own protocol that will define how we want to communicate over TCP.

This layering may seem confusing, but it offers immense value because each layer is a modular entity that can be swapped out without disrupting everything else. For example, say you swap out the NIC (Network Interface Card) in your computer, the only part of the stack that needs to change are the very lowest levels that interface to the hardware.

The first thing we need to do is define the data that will be passed back and forth over the connection, and how that data will be represented while it is in the TCP communications channel. Taking the more basic decision first, let’s look at how we want to represent the data. Ideally, we want a data representation that is flexible in terms of capability, is rigorous in its data representations and easy to generate in even primitive languages like C++. The first standard that was created to fill this niche was a spin-off of HTML called XML. The problem is that while it excels in the first two points, the third is a problem because when used to encode small data structures the same features that make it incredibly flexible and rigorous, conspire to make it is very verbose. Or to put it another way, for small data structures the data density in an XML document is very low.

Fortunately, there is an alternative that is perfect for what we need to do: JSON. The acronym stands for “JavaScript Object Notation”, and as the name implies is the notation originally used to facilitate the passing of data within JavaScript applications. The neat part is that a lot of the JSON concepts map really well to native LabVIEW data structures. For example, in terms of datatypes, you can have strings, numbers and booleans, as well as arrays of those datatypes. When you define a JSON object, you define it as a collection of those basic datatypes – sort of like what we do with clusters in LabVIEW. But (as they say on the infomercials), “Wait there’s more…” JSON also allows you include other JSON objects in the definition of a new object just LabVIEW lets us embed clusters within clusters. Finally, to put icing on the cake, nearly every programming language on the planet (including LabVIEW) incorporates support for this standard.

To see how this will work, let’s consider the case of the temperature controller parameters. When wanting to configure this value, the remote application will need to send the following string: (Note: As with JavaScript itself, the presence of “white space” in JSON representations is not significant. I’m showing this “pretty-printed” to make it easier to understand.)

{
    "Target":"Dog House TC", 
    "Data":{
        "Error High Level":100,
        "Warning High Level":90,
        "Warning Low Level":70,
        "Error Low Level":60,
        "Sample Interval":1
    }
}

This string defines a JSON object that contains two items. The first is labeled Target and it holds a string identifying the specific plugin that it is wanting to configure – in this case the Dog House TC. In the same way, the name of the second item is labeled Data, but look at its value! If you think that looks like another JSON object definition, you’d be right. This sub-object has 5 values representing the individual parameters needed to configure a temperature controller. In case you’re wondering, this is what the code looks like that parses this string back into a LabVIEW data structure:

Unflattening JSON

That’s right, all it takes is one built-in function and one typedef cluster. The magic lies in the fact that the string and the cluster represent the exact same logical structure so it is very easy for LabVIEW’s built-in functions to map from one to the other.

The Unflattened JSON Data

The other thing to note is that the Sample Interval value in the cluster has a unit associated with it, in this case milliseconds. The way LabVIEW handles this situation is consistent with how it handles units in general: When converting data to a unitless form (like a JSON value) it expresses the value using the base unit for the type of data that it is. In the example shown, Sample Interval is time, and the base unit for time is seconds, so LabVIEW expresses the 1000 msecs as 1 sec in JSON. Likewise when unflattening the string back to a LabVIEW data structure, the function interprets the input value in the value’s base units as defined in the cluster.

We are about done with what our message will look like, but there are still a couple of things we need to add before we can start shooting our data down a wire. To begin with, we need to remember that Ethernet is a serial protocol and as so it’s much easier to uses if a receiver can know ahead of time how much data to be expecting. To meet that need, we will append a 2-byte binary value that is the total message. The other thing we need is someway to tell whether the message arrived intact and without corruption, so we will also append a 2-byte CRC. Moreover, to make the CRC easy for other applications to generate we will use a standard 16-bit CCITT form of the calculation. So this is what one of our command data packets will look like:

Message Format

In the same way, we can use the same basic structure for response messages. All we have to do is redefine the JSON “payload” as a JSON object with two objects: a numeric error code (where 0 = “No Error”), and a string that is a contains any data that the response needs to return. As you would expect, this string would itself be another JSON-encoded data structure.

Creating the Connection Handler

We are finally ready to implement the reentrant command handler that manages these messages, and the important part of that job is to ensure that it is fully reentrant. By that I mean that it does little good to make the VI itself reentrant if its major subVIs are not. So what is a “major” subVI? The two things to consider are:

  • How often does the SubVI execute? If the subVI rarely executes or only runs once during initialization, it might not be advantageous to make it reentrant.
  • How long does it take to execute? In the same way, subVIs that implement simple logic and so execute quickly, might not provide a lot of benefit as reentrant code.

As I am wont to do, I defined the handler’s overall structure as a state machine with three states corresponding to the three phases of the response interaction. So the first thing we need to do (and the first state to be executed) is Read Data Packet. Its job is to read an entire message from the new TCP connection, test it for validity and, if valid, pass the command on to the Process Command state.

Read Data Packet

The protocol we have defined calls for each message to start with a 2-byte count, so the state starts by reading two bytes from the interface, casting the resulting binary value to U16 number and then using that number to read the remainder of the message. Then to validate the message, the code performs a CRC calculation on the entire message, including the CRC at the end. Due to the way the CRC calculation works, if the message and CRC are valid, the result of this calculation will always be 0. Assuming the CRC checks, the code strips the CRC from the end of the string and sends the remaining part of the string to a subVI that converts the JSON object into a LabVIEW object. I chose an object-oriented approach here because it actually simplifies the code (no case structures) and it provides a clear roadmap of what I need to do if I ever decide to add more interface commands in the future. If the CRC does not check, the next state to execute is either Send Response if no error occurred during the network reads, or Stop Handler if there was.

Moving on, the Process Command state calls a dynamic dispatch method (Process Command.vi) that is responsible for interfacing to the rest of the application through the events we defined last time, and formatting a response to be sent to the caller. The object model for this part of the code has 5 subclasses (one for each command) and the parent class is used as the default for when JSON command structure does not contain a valid command object. It should surprise no one that the command processing subclass methods look a lot like the test VIs we created last time to verify the operation of the remote access processor, consequently, I am not going to take the time or space to present them all again here. However I will highlight the part that makes them different:

Parsing Response for Error or Data

This snippet shows the logic that I use to process the response coming back from the remote access engine in response to the event that reads the graph data. Because the variant returned in the response notifier can be either a text error message, or an array of real data, the first thing the code does is attempt to convert the variant into a string. If this attempt fails and generates an error, we know that the response contains data and so can format it for return to the remote caller. If the variant converts successfully to a string, we know the command failed and can pass an error back to the caller.

At this point, we now have a response ready to send back to the caller, so the state machine transitions to the Send Response state. Here we see the logic that formats and transfers the response to the caller:

Send Response

Since the core of the message is a JSON representation of a response cluster, the code first flattens the cluster to a JSON string. Note however, the string that it generates contains no extraneous white space, so it will look different very from the JSON example I showed earlier. The logic next calculates the length of the return message and the CRC of the JSON. Those two values are added to the beginning and end, respectively, of the JSON string and the concatenated result is written back to the TCP connection.

Finally, the Stop Handler state closes the TCP connection and stops the state machine loop, which also stops and removes from memory the reentrant clone that has been running.

Testing the Middleware

Finally, as always we need to again test what we have done, and to do that I have written a small LabVIEW test client program. However, if you know another programming language, feel free to write a short program to implement the transactions that we have defined. The program I created is included as a separate project. The top-level VI opens a window that allows you to select the action you want to perform, the plugin that it should target and (if required) enter the data associated with the action. Because this is a test program, it also incorporates a Boolean control that forces an invalid CRC, so you can test that functionality as well.

So open both projects and run the testbed application, nothing new here – or so it seems. Now run the simple TCP client, its IP address and port number are correct for this test scenario. As soon as the client starts, the waveform graph for displaying the plugin graph data appears, so let’s start with that. You should be able to see the data from each of the 5 testbed plugins by selecting the desired target and clicking the Send Command button. You should also be able to see all 5 graph images.

Now try generating some errors. Turn on the Force CRC Error check-box and retry the tests that you just ran successfully. The client’s error cluster should now show a CRC Error. Next turn the Force CRC Error check-box back off and try doing something illegal, like using the Set Acquisition Rate action on one of the temperature controllers. Now you should see an Update Failed error.

Continue trying things out, verifying that the thing which should work do, and that the things that shouldn’t work, don’t. If you did the testing associated with the last post, you will notice that there is a lag between sending the command and getting the results, but that is to be expected since you are now running over a network interface. Finally, assuming the network is configured correctly, and the desired ports are open, the client application should be able to work from a computer across the room, across the hall or across the world.

Testbed Application – Release 19
Toolbox – Release 16
Simple TCP Client – Release 1

Big Tease

So what is in store for next time? Well, let’s extend things a bit further and look at a way to access this same basic interface, but this time from a web browser! Should be fun.

Until Next Time…
Mike…

Expanding Data Processing Bandwidth — Automatically

Well-written software can typically deal with any performance requirement pretty easily as long as the requirement is constant. It’s when requirements change over time that things can get dicey. For example, if your test system generates a new data packet to process every second and it consistently takes 5 seconds for the data to be processed, a little simple math will tell you how much processing bandwidth you need to create to keep up with the flow of data. But how are you to properly size things when variability is inserted into the process? What if the time between data packets can vary between 100 msec and several minutes? Or what if the data processing time can change dramatically due to things like network traffic?

These are the kind of situations where the processing needs to be more than simply “flexible”, it has to be able to automatically maintain its own operation and reconfigure itself on the fly. To demonstrate one possible implementation of this “advanced” technique, we will build on the simple pieces that we have learned in the past. In some ways, good software design techniques are like Lego blocks. Each one by itself is not very impressive, but when you stick them together, magic happens. But before we can stick anything together, we need to understand…

…what we’re going to do.

You’ll notice that any of the bandwidth management challenges that I mentioned earlier can be addressed by either adding more data processing clones, or removing existing ones that are being underutilized. Consequently, the question of how to implement this self-maintenance functionality really gets down to a matter of how to dynamically manage the number of data processing clones that are currently available — which in turn boils down to answering two very simple questions:

1. How do we know we need more?

Given that the whole point of the exercise is to manage a queue, the current state of that queue will give us all the information that we need to answer this question. Specifically, we can know when more processing bandwidth is needed by monitoring how many items are currently in the queue waiting processing. When the code starts to see the depth going steadily up, it can launch additional processes to handle the data backlog. Of course, this functionality assumes that there is a process that is constantly monitoring the queue and managing that aspect of its operation — which we actually have already in the test code from last week (Data Processing Queue Handler.vi). All we have to do is repurpose this VI to be a permanent part of the final application.

2. How do we know a clone is no longer needed?

The one part of the system that knows whether a clone is being under-utilized is, in fact, the clone itself. As a part of its normal operation, it knows and can keep track of how often is it being used. Having said that, there are (at least) a couple of ways to quantify how much a clone is being utilized. We could, for instance, consider how much time the clone is spending processing data versus how much time it spends waiting to receive data to process. If the utilization percentage drops below a given limit, the clone could then shut itself down. However, for this demonstration, I’m going to use a much simpler criteria that, quite frankly, works pretty well. The code will simply keep track of how many times in a row it goes to the queue and doesn’t find any data.

Code Modifications

Before I start describing the changes that will we will need to make in order to fashion this new ability, I want to consider for a moment the thing that won’t have to change: Queue Test.vi. You might be tempted to say, “Well big deal. All it does is stuff some data into the queue every so often. Who cares if it doesn’t have to change? It’s not even deliverable code”

While that is undoubtedly true, the fact of the matter is that this test routine is important, but not because of what it is or what it does. Rather we care about Queue Test.vi because in our little test environment, it represents the rest of our application — or at least that part of it that is generating data. Consequently, the fact that it doesn’t need modification means that your main application, likewise, won’t need modification if you decide to upgrade from a data processing environment that uses a fixed number of data processors to one that dynamically manages itself.

Data Processing Queue Handler.vi

First, note that previously this routine’s primary job was to simply report how deep the data queue was — a bit of functionality that would likely have not been needed in a real application. Now however, this routine is going to be taking an active part in the process, so I started the modifications by adding the error reporting VI that will transfer errors it generates to the exception handler.

New Queue Handler Timeout Case

In addition, because the software will initially only start a single data processing clone, I also modified the timeout event handler that performs the VI’s initialization, by removing the loop around the clone launching subVI.

The remainder of the modifications to this routine occurs in the event handler for the Check Queue Size UDE. Previously this event only reported how deep the queue was getting. While it still performs that function, that queue depth information in addition now drives the logic that determines whether we have enough processing bandwidth online.

New Queue Handler Queue Size Check Case

Note that the queue depth is compared to a new configuration value called Max Queue Size that defines how large the queue can grow before a new data processing clone is launched. Regardless of whether it launches a new clone, event handler calls another new subVI that returns the number clones that are currently running. As you will see in a moment, one of the modifications to the data processing VI is the addition of logic that keeps track of the names of the clones that are running. The subVI that we are calling here returns a count of the number of names that have been recorded so far.

Data Processor.vi

Turning now the data processing code itself, the first stop is in the state-machine’s Initialize state. Here we have all the same logic that existed before, but with a couple minor additions

New Data Processor Initialize

First, there is a new subVI that registers a clone is starting up. This subVI writes the clone’s name to the FGV that is maintaining the clone count. Second, there is also a new shift register carrying a cluster of internal data that clone will need to do its work. All that is needed during initialization is to set a timestamp value. The Check for Data state is next and it has likewise seen some tweaks — the most significant of which is moving the logic for responding to the dequeue operation into a subVI.

New Data Processor Check for Data

The justification for this move lies in the fact that this logic is now also responsible for determining whether or not the clone is being adequately utilized. As I stated before, each time the clone goes to the queue and comes up empty, the logic will increment a counter that is being carried in the new shift register’s data. If this count exceeds a new configuration value Clone Idle Count, the code will branch to a new state that will shutdown the clone. Likewise, anytime the clone does get data to process, it will reset the count to 0. The changes to the Process Data state, which comes next, are pretty trivial.

New Data Processor Process Data

All that happens here is that the timestamp extracted from the data to be “analyzed” updates the new cluster data — as well as the indicator on the front panel. Finally, there is the new state: Self Shutdown

New Data Processor Self Shutdown

…which simply calls a subVI that removes the clone’s name from FGV maintaining a list of all running clones, and stops the event loop.

Let’s talk about “Race Conditions”

All we have left to do now is test this work and see the differences that it makes, but before we can do that, we need to have a short conversation about race conditions. Very often developers and instructors (myself included) will talk about the necessity of avoiding race conditions. The dirty little secret is that as long as you have multiple things happening in parallel, race conditions will always be present. The real point that these admonitions attempt to make is that you should avoid the race conditions that are unrecognized and potentially problematic.

I bring this point up because as you do the following testing, you may get the chance to see this concept in action. The way it will appear is that the system will launch a new clone immediately after one kills itself off for being underutilized. The reason for this apparent logical lapse is that a race condition exists between the part of the code that is checking to see if another clone is needed and the several places where the clones are deciding whether or not they are being used. There are two causes for this race condition, one we can ameliorate a bit and one over which we have no possible control.

Starting with the cause we can’t control, a simple immutable law of nature is that no matter how sophisticated our logic or algorithms might be, they can not see so much as a nanosecond into the future. Consequently, the first source of a race condition is that when the queue checking logic sees that there are three elements enqueued, it has no way of knowing that a currently active clone will be available in a few milliseconds. While it is true that under certain circumstances it might be possible to provide this logic with a bit of “foresight”, there is no generalized solution to this aspect of the problem. Consequently, this is an issue that we may just have to live with.

The news, however, is better for the second cause. Here the problem is that with all the clones having the same timeout between data checks, it is probable that sooner or later one of the clones is going to become “synchronized” with the others such that it is always checking the queue just after it was emptied by one of the others. However the solution to this problem lies in its very definition. The cure is to see to it that the clones do not have constant timeouts from one check to the next. To implement this concept in our test code I modified the routine that returns the delay to add a small random difference that changes each time it’s called.

The bottom line is that while completely removing all race conditions is not possible, they can be managed such that their impacts are minimized.

New Tests for New Code

The testing of the modified code starts the same as it did before: open and launch Data Processing Queue Handler.vi and Queue Test.vi. The first difference that you will notice is that only 1 clone is initially launched, but at the default data rate, 1 clone is more than adequate.

Now decrease the delay between data packets to 2 seconds. Here the queue depth will bounce around a bit but the clone count will stabilize at 3 or 4.

Finally, take the delay all the way down to 1 second. Initially the clone count may shoot up to 8 or 9, but on my system the clone count eventually settled down to 6 or 7.

At this point, you can begin increasing the delay again and the slowly the clones will start dropping out from disuse. Before you shutdown the test, however, you might want to set the delay back to 2 seconds and leave the code running while you go about whatever else you have to do today. It could be instructive to notice how other things you are doing on the same computer effect the queue operation. You might also want to rerun the test but start Queue Test.vi first and let it run for a minute or so before you startData Processing Queue Handler.vi — just to see what happens.

Further Enhancements?

So we have our basic scalable system completed, but are there things we could still do to improve its operation? Of course. For example, we know that due to timing issues which we can only partially control, the number of clones that is running at one time can vary a bit, even if the data rate is constant. One thing that could be done to improve efficiency would be to change the way clones are handled. For example, right now a data processor is either in memory and running or it is closed. One thing you could do is create a new state that a clone could be in — like loaded into memory, but inactive. You could implement this zombie state by setting the timeout to 0, thus effectively turning off the state machine.

It might also be helpful to change to queue depth limit at which a new clone is created by making it softer. Instead of launching a new clone anytime the queue depth exceeds 3, it might be useful in some situations to maintain a running average and only create a new clone if the average queue depth over the last N checks is greater than 3.

Who knows? Some of you might think of still other modifications and enhancements. The point is to experiment and see what works best for your specific application.

Parallel Data Processing — Release 2

The Big Tease

So what is in store for next time? Well in the past we have discussed how to dynamically launch and use VIs that run as separate processes. But what if the code you want to access dynamically like this happens to exist in a process that you have already compiled into a standalone application? If this application is working you don’t want to risk breaking something by modifying it. As it turns out there are ways to manage and reuse that code as well, even if it was created in an older version of LabVIEW. Next time we’ll start exploring how to do it.

Until Next Time…

Mike…

Expandable Data Processing

When creating an application a common approach to managing application computer bandwidth is to structurally isolate the portions of the application that are doing the data processing from those implementing the data acquisition. The goal of this segmentation is to prevent the operation of the one from impacting the speed of the other. But if we think about what we learned from the posts preceding this one, we can see that this use case is just a specialized case of a general principle. Namely, that allowing processes to run in parallel reduces the likelihood that their execution will impede the operation of their peers by improving the overall utilization of the computer’s resources. Consequently, it behooves us to consider how we can allow data processing to run in parallel with the rest of the application’s code — and so garner a few of the many advantages that this approach offers.

Parallel Data Processing

It first should be noted that we already have most of the tools and concepts that we will need to build this parallel data processing capability. In fact, one possible implementation is not so different from our general approach to building state machines. So with that idea in mind, let’s consider our perspective data processor’s high-level functional requirements.

Just the One?

First, since the idea is to improve efficiency by allowing more things to happen in parallel, we need to ask ourselves the obvious question, “Why stop at just one data processor?” This question is particularly urgent if you already know that your data processing takes longer than the time required to acquire another dataset. It is an appealing idea to have 2 or more data processors that can share the processing load. However, if you are going to be running multiple instances simultaneously, the process has to be reentrant. But that’s not a problem, we’ve done reentrant processes before and know how to make that work.

State-ly Expandability

Second, we need to consider the data processor’s internal structure. I have given this point away already by talking about state-machines, so let’s consider why this choice is a good one. At first glance it might seem like using a state-machine might be overkill for this application, after all you might figure that there is at most 2 states: Wait for Data and Process Data. This line of reasoning is valid, but it assumes that the data processing is a one-step process, so are there circumstances where this assumption isn’t valid?

If your processing is going to involve the complex analysis of potentially large datasets, you might not want to spend a lot of time processing flawed or invalid data. Hence before jumping right into the full-blown analysis it would be reasonable to have a state that does a quick sanity check on the data — and then another to handle the error that results if the data is bad. Next, consider what you are doing with the data when you are done with the analysis. There could be added limit checking of the results, and perhaps transfer of the results to a database or automated report generator — all tasks that could (and probably should) be expressed as separate states. Looking back now, by my count we are now up to 6 states — and we are still assuming that the analysis process itself is a single atomic operation that you will not want or need to breakup.

My point is that if you start with the basic outline of a state-machine and end up never having more than the two states, there’s no harm done because that basic structure is pretty lean. However if, as your project progresses, you discover that more states are needed, having that infrastructure already be in place can save a lot of time.

By the way, as a quick aside, I have often had the experience of having a customer request a significant change and then have them be amazed in how little time it takes to implement the modification. I have even had customers comment on how “lucky” I was that the code was structured in such a way that the change was even possible. Well let me tell you in the strongest terms possible that if luck was involved, it was luck of my own making. The better you design your code up front, the more “lucky” you will be when the time comes to make modifications.

The Right Kind of Communications

Third, we have to look at how we are going to communicate with these cloned state-machines. In past work on our testbed application, we have used UDEs and FGVs for communications between processes, and they might work here as well, but there would be problems. If you have one UDE or FGV feeding data to all the cloned data processors, you would have the problem of deciding which clone handles the each new update. While you don’t want to miss any updates, you certainly don’t want to process the same dataset twice either. The whole process would be fraught with opportunities for errors and race conditions. Of course you could solve that problem by creating a separate UDE or FGV for each clone, but then scalability goes down the tubes as the process(es) generating the data must keep track of how many clones there are any which one will get the next dataset.

Our way out of this conundrum is to properly apply a structure that people so often misuse: The Queue. This is the use case for which queues were born. The key feature that makes them so good in this sort of situation, is that if you have multiple receivers waiting to dequeue data from the same queue, LabVIEW guarantees that each new item inserted will only be seen by one of the receivers, and that enqueued data will be distributed to all the receivers on a first come, first served, basis. Moreover, because queues can be named, different processes running in the same instance of LabVIEW don’t need the queue reference to be sent to them. All they need to know is the name of the queue and they can get their own reference.

Yes, queues do still require polling, but at least there’s only one part of your code involved, and not the whole application. Moreover, there are two other mitigating factors present. First, in this sort of application the poll rate can often be much slower, on the order of once every few seconds. Second, because the state machine resides inside an event structure, it would take but a moment to implement logic that would allow the polling to be turned off altogether.

Let’s Look at the Test Code

As we start considering the code to implement this functionality, let’s first look at the test routines that we will use to evaluate the queue functionality. The VI Data Processing Queue Handler.vi has the responsibility of starting everything off by initializing the data processing queue and launching a predetermined number of data processing clones. This logic takes place in the VI’s Timeout event handler.

Queue Handler Timeout Case

You will notice that to simplify the process of obtaining a queue reference, I have encapsulated that logic in a set of subVIs for interacting with the queue. The other event of significance is the handler for the Check Queue Size UDE. While a test is running, we want to be able to monitor the number of items in the queue, but rather than simply polling the queue status, I created a UDE that flags the handler every time a new item is enqueued.

Queue Handler Queue Size Check Case

When the event fires, the handler calls the built-in queue function that returns, in addition to some other stuff that we don’t need, the number of items that are currently in the queue. Next, to feed data to the queue, I created a second test VI called Queue Test.vi. It’s whole job is to wait a delay period specified on the front panel, and then enqueue an item into the data processing queue.

Queue Test

You can see that at this point, the only value in the queue data is a timestamp, but the data is defined as a typedef. Remember! Any time you are creating a datatype that will be accessed by reference, whether it be a UDE, a queue, a notifier or an event, always make the data structure a typedef.

After the data value is enqueued, the code fires the event that tells the handler to check the queue size.

Introducing the Data Processor

Turning finally to the reentrant data processing state-machine (Data Processor.vi) we see that in addition to an event for stopping, the VI’s Timeout event handler includes the logic for three states, the first of which is Initialize.

Data Processor Initialize

This state’s job is to get the clone ready to start processing data. Consequently, it initializes the shift register holding the queue reference, and a second shift-register carrying a boolean value that we will discuss in a moment. The state also sets the next state to be executed to Check for Data, and retains a timeout value of 0 ms so, assuming that there are no errors during initialization, the state machine will immediately start waiting for data to process. Note that the Initialize state also opens the VI’s front panel. You probably would not want this feature in deliverable code except as, perhaps, a debugging option. I have included it here to make it easier for you to see the code at work.

Data Processor Check for Data

The Check for Data state starts by calling the queue subVI that is responsible for dequeuing an item. Inside this subVI, the dequeuing function is given a timeout of 0 ms so if there is not any data immediately available, the call will terminate with the timeout flag asserted. This Boolean value is inverted and passed out of the subVI to indicate to the calling code whether there is any data that needs processing. If the Check for Data state logic finds this bit set, the code sets the next state for execution to Process Data and sets the timeout value of 0 ms. If the bit indicating that data is available is not set, the code retains Check for Data as the next state to execute but sets the timeout to a longer value (5 sec) read from the application’s INI file.

Before we go on to talk about the Process Data state, we need to have a quick conversation about the boolean shift register. Normally, when the standard Stop Application event fires, a VI wants to immediately stop what it’s doing and abort. However in one significant way, this is not a “normal” VI. In order to protect the data that has been acquired, this process should only stop if the queue driving it is empty. To create that functionality, the VI incorporates deferred shutdown logic in the form of this shift-register. Because the value is initialized to false the loop will, during normal operation, continue regardless of whether data is available or not. However, when a shutdown is requested, the event handler does not immediately stop the loop, but instead sets the shift-register value to true and branches to the Check for Data state with a 0 ms timeout. If the queue is empty, the process will end at that time. However, if the queue is not empty, the VI will continue toggling between the Check for Data and Process Data states until the queue contents are exhausted.

Data Processor Process Data

As you would expect, the Process Data state basically consists of processing the last data dequeued and branching back to the Check for Data state to look for more. However, given that the only data in our test queue is a timestamp, you have probably guessed that the actual data processing to be done isn’t very expansive — and you’re right. In fact, the “data processing” consists largely of a wait, the duration of which varies at random between 4 and 6 seconds.

Putting it to the Test

To test this code, open and run Data Processing Queue Handler.vi. You will immediately see the front panels of three data processing clones open. Move them so they aren’t overlapping each other or anything else.

Now open and run Queue Test.vi. After a few seconds it will enqueue an item and then enqueue a new one every 6 seconds. Note that as each clone handles an item it will display that item’s timestamp on its front panel. Note also that the indicated queue depth never exceeds 1.

Now change the delay on Queue Test.vi from 6 sec to 2 sec. You will notice that the queue depth chart is now updating faster. Likewise, the queue depth will begin to show momentary increases to a depth of two, but the chart will always drop back to 1. In other words, there might be slight delays now and again, but for the most part three clones can keep up with the flow of data.

Finally, drop the delay to 1.5 seconds. With data coming at this rate, the queue depth will continue to go up and down, but now it is always going up more than it is going down. This overall upward trend shows us that we have reached the point where three clones are getting overwhelmed by amount of data that is being enqueued.

Queue Overrun

Now if you increase the delay back to 2 seconds, the queue depth will gradually begin to decrease as the slower flow of new data allows the clones to begin catching up on the backlog. Alternatively, if you just click the stop button, the two test VIs will stop and close immediately, but the clones will continue running until the queue empties out.

Parallel Data Processing — Release 1

The Big Tease

So we have learned the basics behind creating an environment for an application that supports an expandable data processing capability. For many applications this simple structure will be more than adequate, but (as we have seen) if the data starts coming too fast the queue can grow without limit. Of course this isn’t necessarily a problem if the periods of high data generation are interspersed with periods of comparative idleness. However this sort of variability can be a bit of a two-edged sword. The periods of low data throughput can give the system time to recover from a large backlog of data, but this variability can also make it difficult to estimate how many copies of the data processing process will be needed. Pick a number that is too high, and you’re wasting computer resources. Pick a number that is too low and you could still end up with a situation where too much data is being queued — perhaps to the point of running out of memory.

Well, the next time we get together we’ll look at how to modify the basic structure we have created thus far to add the ability for the software to decide on the fly when more clones are needed, and when to kill off existing ones that aren’t being used.

Until Next Time…

Mike…

How to Make Dynamic SubVIs

The last two posts discussed different ways to use dynamic linking and calling in situations where you want the target VI to run in parallel with the rest of the application. Another major use case for dynamic calling is to create software plugins.

I use the term “plugin architecture” to indicate a technique that strives to simplify code by facilitating runtime changes to limited portions of the code while allowing the basic logic for the function as a whole to remain intact and unchanged. For example, say you are implementing a control algorithm that has, for the sake of argument, two inputs: a position and a load. As long as the readings are accurate, timely and in the proper units, most of the control algorithm doesn’t care how these inputs are measured. It would, therefore, simplify the code base as a whole to incorporate a technique that allows the reuse all the common code by simply “plugging in” different acquisition modules that support the various different types of sensors that can measure position and load.

This isn’t rocket science

This goal might sound lofty, but the hurdle for getting into it is actually pretty low. In fact, when teach the LabVIEW Core 1 and 2 week, I will sometimes introduce what LabVIEW can do by demonstrating a simple plugin architecture that only uses concepts introduced and discussed in those two classes.

I start the demo by opening an application that I have written to implement a simple calculator. It has two numeric inputs labeled “X” and “Y” and an indicator labeled “Output”. The only other control is a popup menu that lists the two math operations it can perform: addition and subtraction. I first show that you can use the simple program to add and subtract numbers.

I then comment that it would be nice if my program could multiply numbers as well. So I drag the program window to one side but leave it running. I then open a new window and add two inputs and an output (labeled like the program). On the block diagram I drop down a multiplier, wire it up, and save the VI with the name Multiply.vi. After closing that VI’s front panel, I tell the class that although they just watched me create the ability to multiply numbers, the program which is already running has already acquired it. At this point, I pull the program front panel back over and show that the menu which only moments before said “Add” and “Subtract”, now has a third option: “Multiply”. Moreover, the new selection does indeed allow me to now multiply numbers. Finally, I make the point that this capability to dynamically expand and change functionality was created using nothing but things that they will be learning in the coming week.

While the students at this point might not be cheering, they are awake and have some motivation to learn. You better believe that when I show them the code Friday afternoon and they can recognize how the program works, they are excited. So let’s see what I can do now to motivate and excite you…

It’s all about the packaging

As we begin to get into the following use cases you should notice that the actual code needed to implement each solution is not really very complicated. If fact, in the following sections we will be dealing with many of the same functions as when we were dynamically launching separate processes. The tricky part here is that we need to fit all the data management and launch logic in a space that would otherwise be occupied by a single VI. For this reason we need to be very careful when packaging this code. To demonstrate what I mean, let’s consider three common use cases that all call a simple test VI (called amazingly enough Test.vi) that simply returns a random number. I have ordered these cases such that each example builds on what went before:

System Initialization

I like to start here when explaining these concepts because it is the simplest structure and so is easy to understand.

System Initialization

Here you see laid out all the basic pieces required to dynamically call a subVI. The first VI (the one marked “Dummy”) is the one we will use through-out this post to represent the data management needed to get the path to the VI that will be called dynamically. Depending upon your application this could come from an INI file, database, or what have you. We won’t go into a lot of detail with that VI because we have discussed the various techniques pretty thoroughly in other posts.

You should recognize the next node (Open VI Reference) as we have already used it a lot. It opens a reference to the VI specified in the input VI name or path. This action also has the effect of loading the VI into memory. Just as statically subVIs can be reentrant, you can specify it here, as well, if needed.

Once the VI is open, we pass its reference to a node that looks a lot like the Start Asynchronous Call we have already used. This node is Call by Reference and like the prior one it expects a strict VI reference and in return provides a connector pane for passing data to the target. However, unlike the asynchronous call, this node always waits until the target finishes executing before continuing, hence its representation of the target’s connector pane also allows us to get data from the target.

Finally, in order to remove the target VI from memory, we need to close the reference when we are done with it — which is what the last node does.

When using this technique it is important to keep in mind that, because it loads, executes and unloads the VI all at once, it is very inefficient when used is a situation where it will be run multiple times. Sort of like trying to use an Express VI inside a loop — but worse.

However, one place where this technique can be used very effectively is doing things like system initialization. Its not uncommon for large and complex applications to require an equally large and complex initialization process, especially the first time they are run. Although you could just write a VI to do this initialization and install it in the program, why should you burden your program with a bunch of code that may only execute one time — ever? This technique allows you to only load and execute a VI if you actually need it.

Function Substitution

This use case is perhaps the most common of the three. It uses the same nodes we just saw, but because it can be used essentially anywhere, the packaging has to be very efficient. A solution I often use is to create a mini state-machine that has states for loading, executing and unloading the target VI, as well as one more for deciding what to do first. The user interface has one (enumerated) control that allows you to specify whether you want to Load the target, Run the target, or Unload the target.

The inputs are pretty simple, and the state-machine logic mirrors that simplicity. If the function requested is Load, the Startup state transitions directly to the Open VI state which opens and buffers the VI reference. Here is the code for these two states:

Function substitution - Startup - Load

Function substitution - Open VI

If the function requested is Run (the input’s default value, by the way), the Startup state first checks to see if the VI reference is valid and if it is, transitions to the Execute state.

Function substitution - Startup - Run

Function substitution - Execute

If the reference is not valid, execution instead continues with the Open VI state which returns to the Execute state after opening the reference.

Finally, if the function requested is Unload, the Startup state transitions directly to the Close VI state:

Function substitution - Startup - Unload

Function substitution - Close VI

The result of all this work is a flexible VI that can be used in a variety of different ways depending upon system requirements. For example, if the target VI loads quickly or the calling process doesn’t have tight timing constraints, you can just install it and let the first call to both load the VI into memory and run it. Alternately, if you don’t want the first call to be delayed by that initial load, you can call it once in a non-time-critical part of the code to just Load the target and then Run it as many times as you like later.

In the same way, the Unload function, which normally isn’t needed, can be used to release the memory resources used by a large dynamic subVI when you know that you aren’t going to be needing it for a while.

This approach could even be extended to create a simple test executive that dynamically loads and unloads whole sets of VIs. In such a situation, though, you probably don’t want to be tied to a single connector pane, so you should consider changing the VI reference to non-strict and modifying the Execute state to use our old friend the Run VI method like so.

Function substitution - Execute - Run VI Method

Interprocess Communication

This use case is in many ways the most expansive of the three because it supports communications between different processes regardless of their physical location. In terms of code, there is very little difference between this case and the previous one. In fact, the only change that is really required is to the Load state. This is what the network-enabled version looks like:

Function substitution - Open VI - IPC

That new icon in front of Open VI Reference is the one that makes the magic. Its name is Open Application Reference and its job is to return a reference to the instance of LabVIEW that is hosting the VI that you want to access. To make this connection, you need to know the host-name or IP address of the computer running the LabVIEW application, and the port number the instance is using for incoming connections. If the application that you are wanting to access in on your own computer, you can either leave the host name string empty or use the name localhost. The port can be any number that isn’t already being used. One common people mistake is to simply accept the default value, which is the port that LabVIEW listens to by default. This causes problems when they go to test their application because now there are two processes (the LabVIEW development environment and their application) trying to use the same port.

An important point to remember is that while the Call by Reference node passes data back and forth, the called VI actually executes on the remote system. Hence, this sort of operation works independent of the target system’s operating system or even the version of LabVIEW that it is running.

The tasks for setting up a system to use this technique involves properly configuring the application to which you are going to be connecting — though it often doesn’t require any code changes. This amazing benefit is possible because the underlying networking is handled by the runtime engine, not the application code you write. I have on a couple occasions come into a place and added remote control functionality to an old application that was not even designed with that capability in mind.

You will, however, have to make changes to the target application’s INI file to enable networking, And you, obviously, will need to have access to the application’s source code so you know what VIs to call. Likewise, if you are accessing a computer outside your local network there can be a variety of network communications details to sort out that are beyond the scope of this post. One other thing to bear in mind is that it is always easier to connect to a remote VI if it is already in memory. The reason for this fact is that if the VI is already in memory all you need to know is the VI’s name. If you are also loading it into memory, you have to know the complete path — the annotation of which can change between platforms.

But assuming you establish the connection, what exactly can you do with it? The answer to that lies in what VIs you choose to execute from the remote process. If you execute a VI that fires an event, that event gets fired on the remote system, so you can use it as a channel for controlling that application.

Alternately, say you chose to execute a VI that is a function global variable (FGV). Depending on whether you are reading from or writing to the FGV, you are now either passing data to the remote system, or collecting data from it. By the way, this is the method I still typically use for passing data over a network between LabVIEW applications — not network-enabled shared variables. Unlike this later “enhancement”, the dynamic calling of a FGV doesn’t need to be “deployed”, is more memory efficient, has a smaller code footprint, is far easier to troubleshoot if there is a problem, and works across all releases of LabVIEW back to Version 6.

The only real limit to what you can do with this technique is your imagination.

So there are the three major use cases for the dynamic linking of subVIs. This discussion is by no means a complete consideration of the topic, but hopefully it will whet your appetite to experiment a bit. The link below is to an archive containing all three examples to get you started.

Dynamic Linking Examples

Next time: Command Line Parameters are not a relic of the past.

If you have spent any time poking into some of the more esoteric corners of the application builder, you may have noticed a checkbox in the Advanced section labeled, “Pass all command line arguments to application”, and wondered what that is all about. Well, wonder no more, that little checkbox is what we are going to discuss next time out — and while we’re at it I’ll cover a really neat use for it.

Until next time…

Mike…

Building a Proper LabVIEW State Machine Design Pattern – Pt 2

Last week’s post was rather long because (as is often the case in this work) there was a lot we had to go over before we could start writing code. That article ended with the presentation of a state diagram describing a state machine for a very simple temperature controller. This week we have on our plate the task of turning that state diagram into LabVIEW code that implements the state machine functionality.

The first step in that process is to consider in more detail the temperature control process that the state diagram I presented last week described in broad terms. By the way, as we go through this discussion please remember that the point of this exercise is to learn about state machines — not temperature control techniques. So unless your application is to manage the internal temperature of a hen-house, dog-house or out-house; don’t use this temperature control algorithm anywhere.

How the demo works

The demonstration is simulating temperature control for an exothermic process — which is to say, a process that tends to warm or release heat into the environment over time. To control the temperature, the system has two resources at its disposal: an exhaust fans and a cooler. Because the cooler is actively putting cool air into the area, it has a very dramatic effect on temperature. The fan, on the other hand, has a much smaller effect because it is just reduces the heat through increased ventilation.

When the system starts, the state machine is simply monitoring the area temperature and as long as the temp is below a defined “High Warning Level” it does nothing. When that level is exceeded, the system turns on the fan, as shown by the fan light coming on. In this state, the temperature will continue to rise, but at a slower rate.

Eventually the temperature will, exceed the “High Error Level” and when it does, the system will turn on the cooler (it has a light too). The cooler will cause the temperature to start dropping. When the temperature drops below the “Low Warning Level” the fan will turn off. This action will reduce the cooling rate, but not stop it completely. When the temperature reaches the “Low Error Level”, the cooler will turn off and the temperature will start rising again.

So let’s look at how our state machine will implement that functionality.

“State”-ly Descriptions

As I stated last week, the basic structure is an event structure with most of the state machine functionality built into the timeout case. To get us started with a little perspective, this is what the structure as a whole looks like.

State Machine Overview

Obviously, from the earlier description, this state machine will need some internal data in order to perform its work. Some of the data (the 4 limit values and the sample interval) is stored in the database, while others are generated dynamically as the state machine executes. Regardless of its source, all of this data is stored in a cluster, and you can see two of the values that it contains being unbundled for use. Timeout is the timeout value for the event structure and is initialized to zero. The Mode value is an enumeration with two values. In the Startup case is the logic that implements startup error checking and reads the initial setup values from the database. When it finishes, it sets Mode to its other value: Run. This is where you will find the bulk of the state machine logic. Note that while I don’t implement it here, this logic could be expanded to provide the ability to do such things as pause the state machine.

The following sections describe the function of each state and shows the code that implements it.

Initialization

This state is responsible for getting everything initialized and ready to run.

Initialization State

In a real system, the logic represented here by a single hardware initialization VI would be responsible for initializing the data acquisition, verifying that the system is capable of communicating with the fan and cooler, and reading their operational states. Consequently, this logic might be 1 subVI or there might be 2 or 3 subVIs. The important point is to not show too much detail in various states. Use subVIs. Likewise, do not try to expand on the structure by adding multiple initialization states.

Finally, note that while the selection logic for the next state may appear to be a default transition, it isn’t. The little subVI outside the case structure actually creates a two-way branch in the logic. If the incoming error cluster is good, the incoming state transitions (in this case, Read Input) is passed through unmodified. However, if an error is present, the state machine will instead branch to the Error state where the problem can be addressed as needed.

Read Input

This state is responsible for reading the current temperature (referred to as the Process Variable) and updating the state machine data cluster.

Read Input State

In addition to updating the last value, there are a couple other values that it sets. Both of these values relate to how the acquisition delay is implemented in the state machine. The first of these is the Timeout value, and since we want no delay between states, we set this to zero. The other value is the Last Sample Time. It is a timestamp indicating when the reading was made. You’ll see in a bit how these values are used.

This state also updates two front panel indicators, the graph and a troubleshooting value.

Test Fan Limits

This state incorporates a subVI that analyzes the data contained in the state machine data to determine whether or not the fan needs to change state.

Test Fan Limits State

The selector in this state results in what is the potential for a 3-way branch. If a threshold has been crossed, the next state will be Set Fan, if it has not, the next state will be Test Cooler Limits, and if an error has occurred, the next state will be Error.

Set Fan

Since the fan can only be on or off, all this state needs to do is reverse its current operating condition.

Set Fan State

In addition to the subVI toggling the fan on or off, the resulting Fan State is unbundled from the state machine data and the value is used to control the fan LED.

Test Cooler Limits

This state determines whether the cooler needs to be changed. It can also only be on or off.

Test Cooler Limits State

The logic here is very similar to that used to test the fan limits.

Set Cooler

Again, not unlike the corresponding fan control state, this state changes the cooler state and sets the cooler LED as needed.

Set Cooler State

Acquisition TO Wait

This state handles the part of the state machine that is in often the Achilles Heel of an implementation. How do we delay starting the control sequence again without incurring the inefficiency of polling?

Acquisition TO Wait State

The answer is to take advantage of the timeout that event structures already have. The heart of that capability is the timeout calculation VI, shown here:

Calculate Time Delay

Using inputs from the state machine data, the VI adds the Sample Interval to the Last Sample Time. The result is the time when the next sample will need to be taken. From that value, the code subtracts the current time and converts the difference into milliseconds. This value is stored back into the state machine data as the new timeout. This technique is very efficient because it requires no polling.

But wait, you say. This won’t work if some other event happens before the timeout expires! True. But that is very easy to handle. As an example, here is the modified event handler for the save data button.

Handling Interrupting Events

It looks just as it did before, but with one tiny addition. After reading, formatting and saving the data, the event handler calls the timeout calculation VI again to calculate a new timeout to the intended next sample time.

Error

This state handles errors that occur in the state machine. That being the case, it is the new home for our error VI.

Error State

Deinitialize

Finally, this state provides the way to stop the state machine and the VI running it. To reach this state, the event handler for the UDE that shuts down the application, will branch to this state. Because it is the last thing to execute before the VI terminates, you need to be sure that it includes everything you need to bring the system to a safe condition.

Deinitialize State

With those descriptions done, let’s look at how the code works.

The Code Running

When you look at the new Temperature Controller screen you’ll notice that in addition to the graph and the indicators showing the states of the fan and cooler, there are a couple of numbers below the LEDs. The top one is the amount of elapsed between the last two samples, the bottom one is the delay calculated for the acquisition timeout.

If you watch the program carefully as its running, you’ll notice something a bit odd. The elapsed time indicator shows a constant 10 seconds between updates (plus or minus a couple of milliseconds — which is about all you can hope for on a PC). However, the indicator showing the actual delay being applied is never anywhere near 10,000 milliseconds. Moreover, if you switch to one of the other screens and the back, the indicated delay can be considerably less than 10,000 milliseconds, but the elapsed time never budges from 10 seconds. So what gives?

What you are seeing in action is the delay recalculation, we talked about earlier. In order to better simulate a real-world system, I put a delay in the read function that pauses between 200- and 250-msec. Consequently when execution reaches the timeout calculation, we are already about 1/4 of a second into the 10 second delay. The calculation, however, automatically compensates for this delay because the timeout is always referenced to the time of the last measurement. The same thing happens if another event comes between successive data acquisitions.

On Deck

As always, if you have any questions, concerns, gripes or even (gasp!) complements, post ’em. If not feel free to use any of this logic as you see fit — and above all, play with the code see how you might modify it to do similar sorts of things.

Stay tuned. Next week we will take a deeper look at something we have used before, but not really discussed in detail: Dynamically Calling of VIs. I know there are people out there wondering about this, so it should be fun.

Until next time…

Mike…

Building a Proper LabVIEW State Machine Design Pattern – Pt 1

The other day I was looking at the statistics for this site and I noticed that one of the most popular post with readers was the one I wrote on the producer/consumer design pattern. Given that level of interest, it seemed appropriate to write a bit about another very common and very popular design pattern: the state machine. There’s a good reason for the state-machine’s popularity. It is an excellent, often ideal, way to deal with logic that is repetitive, or branches through many different paths. Although, it certainly isn’t the right design pattern for every application, it is a really good technique for translating a stateful process into LabVIEW code.

However, some of the functionality that state machines offer also means they can present development challenges. For example, they are far more demanding in terms of the design process, and consequently far less forgiving of errors in that process. As we have seen before with other topics, even the most basic discussion of how to properly design and use state machines is too big for a single post. Therefore, I will present one post to discuss the concepts and principles involved, and in a second post present a typical implementation.

State Machine Worst Practices

For some reason it seems like there has been a lot of discussions lately about state machine “best practices”. Unfortunately, some of the recommendations are simply not sound from the engineering standpoint. Meanwhile others fail to take advantage of all that LabVIEW offers because they attempt to mimic the implementation of state machines in primitive (i.e. text-based) languages. Therefore, rather than spinning out yet another “best practices” article, I think it might interesting to spend a bit of time discussing things to never do.

In describing bad habits to avoid, I think it’s often a good idea to start at the most general level and work our way down to the details. So let’s start with the most important mistake you can make.

1. Use the state machine as the underlying structure of your entire application

State machines are best suited for situations where you need to create a complex, cohesive, and tightly-coupled process. While an application might have processes within it that need to be tightly-coupled, you want the application as a whole to exhibit very low levels of coupling. In fact, much of the newest computer science research deprecates the usage of state machines by asserting that they are inherently brittle and non-maintainable.

While I won’t go that far, I do recognize that state machines are typically best suited for lower-level processes that rarely, if ever, appear to the user. For example, communications protocols are often described in terms of state machines and are excellent places to apply them. One big exception to this “no user-interface” rule is the “wizard” dialog box. Wizards will often be built around state machines precisely because they have complex interface functionality requirements.

2. Don’t start with a State Diagram

Ok, so you have a process that you feel will benefit from a state machine implementation. You can’t just jump in and start slinging code right and left. Before you start developing a state machine you need to create a State Diagram (also sometimes called a State Transition Diagram), to serve as a road-map of sorts for you during the development process. If you don’t take the time for this vital step, you are pretty much in the position of a builder that starts work on a large building with no blueprint. To be fair, design patterns exist that are less dependent upon having a completed, through design. However, those patterns tend to be very linear in structure, and so are easy to visualize in good dataflow code. By contrast, state machines are very non-linear in their structure so can be very difficult to develop and maintain. To keep straight what you are trying to accomplish, state machines need to be laid out carefully and very clearly. The unfortunate truth, however, is that state machines are often used for the exact opposite reason. There is a common myth that state machines require a minimum of design because if you get into trouble, you can always just, “add another state”. In fact, I believe that much of the bad advice you will get on state machines finds its basis in this myth.

But even if we buy the idea that state machines require a more through design, why insisted on State Diagrams? One of the things that design patterns do is foster their own particular way of visualizing solutions to programming problems. For example, I have been very candid about how a producer/consumer design pattern lends itself to thinking about applications as a collection of interacting processes. In the same way, state machines foster a viewpoint where the process being developed is seen as a virtual machine constructed from discrete states and the transitions between those states. Given that approach to problem solving, the state diagram is an ideal design tool because it allows you to visually represent the structure that the states and transitions create.

So what does it take to do a good state-machine design? First you need to understand the process — a huge topic on its own. There are many good books available on the topic, as well as several dedicated web sites. Second, having a suitable drawing program to create State Diagrams can be helpful, and one that I have used for some time is a free program called yEd. However fancy graphics aren’t absolutely necessary. You can create perfectly acceptable State Diagrams with nothing more than a paper, a pencil and a reasonably functional brain. I have even drawn them on a white board during a meeting with a client and saved them by taking a picture of them with my cell phone.

Moreover, drawing programs aren’t much help if you don’t know what to draw. The most important knowledge you can have is a firm understanding of what a state machine is. This is how Wikipedia defines a state machine:

A finite-state machine (FSM) or finite-state automaton (plural: automata), or simply a state machine, is a mathematical model of computation used to design both computer programs and sequential logic circuits. It is conceived as an abstract machine that can be in one of a finite number of states. The machine is in only one state at a time; the state it is in at any given time is called the current state. It can change from one state to another when initiated by a triggering event or condition; this is called a transition.

An important point to highlight in this description is that a state machine is at its core is a mathematical model — which itself implies a certain level of needed rigor in their design and implementation. The other point is that it is a model that consists of a “finite number of steps” that the machine moves between on the basis of well-defined events or conditions.

3. Ignore what a “state” is

Other common problems can arise when there is confusion over what constitutes a state. So let’s go back to Wikipedia for one more definition:

A state is a description of the status of a system that is waiting to execute a transition.

A state is, in short, a place where the code does something and then pauses while it waits for something else to happen. Now this “something” can be anything from the expiration of a timer to a response from a piece of equipment that a command had been completed successfully (or not). Too often people get this definition confused with that for a subVI. In fact one very common error is for a developer to treat states as though they were subroutines that they can simply “call” as needed.

4. Use strings for the state variable

The basic structure behind any state machine is that you have a loop that executes one state each time it iterates. To define execution order there is a State Variable that identifies the next state the machine will execute in response to an event or condition change. Although I once saw an interesting object-oriented implementation that used class objects to both identify the next state (via dynamic dispatch) and pass the state machine operational data (in the class data), in most cases there is a much simpler choice of datatype for the State Variable: string or enumeration.

Supposedly there is an ongoing discussion over which of these two datatypes make better state variables. The truth is that while I have heard many reasons for using strings as state variables, I have yet to hear a good one. I see two huge problems with string state variables. First, in the name of “flexibility” they foster the no-design ethic we discussed earlier. Think about it this way, if you know so little about the process you are implementing that you can’t even list the states involved, what in the world are you doing starting development? The second problem with state strings is that using them requires the developer to remember the names of all the states, and how to spell them, and how to capitalize them — or in a code maintenance situation, remember how somebody else spelled and capitalized them. Besides trying to remember that the guy two cubicles down can never seem to remember that “flexible” is spelled with an “i” and not an “a”, don’t forget that there is a large chunk of the planet that thinks words like “behavior” has a “u” in them…

By the way, not only should the state variable be an enumeration, it should be a typedef enumeration.

5. Turn checking the UI for events into a state

In the beginning, there were no events in LabVIEW and so state machines had to be built using what was available — a while loop, a shift register to carry the state variable, and a case structure to implement the various states. When events made their debut in Version 6 of LabVIEW, people began to consider how to integrate the two disparate approaches. Unfortunately, the idea that came to the front was to create a new state (typically called something like, Check UI) that would hold the event structure that handles all the events.

The correct approach is to basically turn that approach inside out and build the state machine inside the event structure — inside the timeout event to be precise. This technique as a number of advantages. To begin with, it allows the state machine to leverage the event structure as a mechanism for controlling the state machine. Secondly, it provides a very efficient mechanism for building state machines that require user interaction to operate.

Say you have a state machine that is basically a wizard that assists the user in setting up some part of your application. To create this interactivity, states in the timeout event would put a prompt on the front panel and sets the timeout for the next iteration to -1. When the user makes the required selection or enters the needed data, they click a “Next” button. The value change event handler for the button knows what state the state machine was last in, and so can send the state machine on to its next state by setting the timeout back to 0. Very neat and, thanks to the event-driven programming, very efficient.

On the other hand, if you are looking for a way to allow your program to lock-up and irritate your users, putting an event structure inside a state is a dandy technique. The problem is that all you need to stop your application in its tracks is one series of state transitions where the “Check UI” state doesn’t get called often enough, or at all. If someone clicks a button or changes something on the UI while those states are executing, LabVIEW will dutifully lock the front panel until the associated event is serviced — which of course can’t happen because the code can’t get to the event structure that would service it. Depending on how bad the overall code design is and the exact circumstances that caused the problem, this sort of lock-up can last several seconds, or be permanent requiring a restart.

6. Allow default state transitions

A default state transition is when State A always immediately transitions to State B. This sort of design practice is the logical equivalent of a sequence structure, and suffers from all the same problems. If you have two or more “states” with default transitions between them, you in reality have a single state that has been arbitrarily split into multiple pieces — pieces that hide the true structure of what the code is doing, complicates code maintenance and increases the potential for error. Plus, what happens if an error occurs, there’s a shutdown request, or anything else to which the code needs to respond? As with an actual sequence structure, you’re stuck going through the entire process.

7. Use a queue to communicate state transitions

Question: If default transitions are bad, why would anyone want to queue up several of them in a row?
Answer: They are too lazy to figure out exactly what they want to do so they create a bunch of pieces that they can assemble at runtime — and then call this kind of mess, “flexibility”. And even if you do come up with some sort of edge case where you might want to enqueue states, there are better ways of accomplishing it than having a queue drive the entire state machine.

Implementation Preview

So this is about all we have room for in this post. Next Monday I’ll demonstrate what I have been writing about by replacing the random number acquisition process in our testbed application with an updated bit of LabVIEW memorabilia. Many years ago the very first LabVIEW demo I saw was a simple “process control” demo. Basically it had a chart with a line on it that trended upwards until it reached a limit. At that point, an onscreen (black and white!) LED would come on indicating a virtual fan had been turned on and the line would start trending back down. When it hit a lower limit, the LED and the virtual fan would go off and the line would start trending back up again. With that early demonstration in mind, I came up with this State Diagram:

Demo State Machine

When we next get together, we’ll look at how I turn this diagram into a state-machine version of the original demo — but with color LEDs!

Until next time…

Mike…