Tree-Control Menus: A Case Study in Data Management

The last time we were together, we discussed the first of two common use cases for tree controls: displaying tabular data. This time out, we are going to look at the other major use case: using tree controls as a sort of menu system to control an application’s operation – or at least its GUI.

The Problem We’re Solving

If you look at the testbed application that we have been working on for almost a year, it’s pretty clear that much of the work has been going on “behind the scenes” and not in the GUI. Oh, it is nicely modularized thanks to a structure built around a subpanel interface, but the actual controls are really pretty bare bones. A good example of this utilitarian, but unsophisticated, structure is the use of a simple pop-up menu to select the screen to view. Right now it works pretty well because there is only a handful of plugin screens from which to choose. However, it doesn’t take much imagination to visualize the mess that would result if there were a dozen, or even hundreds, of screens available. We need better organization.

Fixing the Data

The biggest conceptual difference between our current goal and the one we worked on last time is that in our earlier discussion we were displaying data that already existed outside the application. In other words, my disk has directories that contain files and other directories whether or not I choose to create a program that can read and display the directory’s contents. By contrast, the data we are going to be displaying now only exists within the context of our program, or perhaps within the context of our test environment as a whole. One big consequence of this fact is that we have a lot more freedom to define the data’s structure and presentation.

For instance, when our testbed runs right now, there are two “acquisition” processes and three “temperature controllers”. Let’s say, for the sake of argument, that the controller functions are dispersed geographically, and what we see on the local interface is status data from three remote processes. In such a situation, we can observe that there is no “correct” way of viewing that overall structure. Depending upon who the user is and what they need to do there are (at least) two ways that these systems could be organized.

One user might want to see a top-level breakdown that groups systems based on the function they perform. With this approach to organization, you would have sections for “Data Sources” and “Temperature Controllers”. The individual screens would then be grouped under one or the other of those headings:

By Function

Alternatively, a different user might want to see the network resources grouped primarily by each system’s geographical location, with the functions for each site then grouped together like so:

By Location

However, as I said before, neither view is any more “correct” than the other. Therefore, we need to be able to support either one – and any other structure that our customers request, as well. Although this level of flexibility might seem to be a tall order, the truth of the matter is that the tree control’s basic operation is very simple, so all we are really talking about is a matter of data management. Moreover, we already have in our hands the tools we need to accomplish the job. I am talking, of course, about our database.

Creating the Data Management Structures

So in defining our data structures, we can start with what we already know: The user needs to be able to select basic menu structures by changing a single value. From this requirement it’s obvious that we’re going to need a table to identify the menu’s basic context. We will then use the values stored in that table to qualify the menu item groupings. Here is the definition for this table, and the three records we are going to insert into it:

CREATE TABLE menu_context (
  id     AUTOINCREMENT PRIMARY KEY,
  label  TEXT(100),
  CONSTRAINT contextlabel_uc UNIQUE(label)
  )
;

INSERT INTO menu_context (id, label) VALUES (0,'NULL');
INSERT INTO menu_context (label) VALUES ('By Function');
INSERT INTO menu_context (label) VALUES ('By Location');

The second table we need to define will hold the records that describe the actual menu entries. Each record defines one line of the tree control’s contents.

CREATE TABLE menu_group (
  id           AUTOINCREMENT PRIMARY KEY,
  context_id   INTEGER NOT NULL,
  item_name    TEXT(100),
  parent_id    INTEGER NOT NULL,
  sort_order   INTEGER,
  CONSTRAINT context_group_fk FOREIGN KEY (context_id) REFERENCES menu_context(id),
  CONSTRAINT self_ref_fk FOREIGN KEY (parent_id) REFERENCES menu_group(id)
  )
;

As is typical, the data for each record incorporates a primary key that uniquely identifies it. Next comes a foreign key value that relates each record to one of the menu context values defined in the menu_context table. The last three fields store the data that controls the entry’s appearance in the tree control. The item_name field contains the text that will appear for the item’s entry in the tree control. The parent_id is the ID key for the item’s parent. A key value of 0 indicates a top-level item. Note that this value relates to the id field value in the same table. This sort of self-referential relationship is common when creating tables that are, in essence, linked lists. Finally, the sort_order field defines the order in which the menu entries will be added to the tree control. This last field is necessary because we are storing the configuration data in a database – and as you will recall, a DBMS makes no promises about the order of data in query results unless you explicitly include an ORDER BY clause in the query.
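
To make the parent_id relationship concrete, here is a minimal sketch (in Python rather than LabVIEW, and with made-up row values) of how a handful of menu_group records describe a hierarchy: a parent_id of 0 marks a top-level item, any other value points at the id of another row in the same table, and sort_order controls the order of siblings.

# Hypothetical menu_group rows: (id, context_id, item_name, parent_id, sort_order)
rows = [
    (1, 1, "Data Sources",            0, 1),
    (2, 1, "Temperature Controllers", 0, 2),
    (3, 1, "Acquisition 1",           1, 1),
    (4, 1, "Acquisition 2",           1, 2),
]

def children_of(parent_id):
    # Siblings come back in display order - the same job ORDER BY sort_order does
    return sorted((r for r in rows if r[3] == parent_id), key=lambda r: r[4])

def print_menu(parent_id=0, indent=0):
    for row in children_of(parent_id):
        print("  " * indent + row[2])
        print_menu(row[0], indent + 1)

print_menu()   # Data Sources, its two children indented beneath it, then Temperature Controllers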

Now that we have a table defining the overall tree control menu structure, we need to be able to insert into that structure the entries that will represent the plugin screens. In order to accomplish that task we need a table that relates data we already have in the database (the contents of the launch_item table) to the specific tree-control entries that will serve as their parents. The following table fulfills that task:

CREATE TABLE subpanel_group_xref (
  id               AUTOINCREMENT PRIMARY KEY,
  launch_item_id   INTEGER NOT NULL,
  menu_group_id    INTEGER NOT NULL,
  menu_context_id  INTEGER NOT NULL,
  CONSTRAINT launchid_subpanel_FK FOREIGN KEY (launch_item_id) REFERENCES launch_item(id),
  CONSTRAINT groupid_subpanel_FK FOREIGN KEY (menu_group_id) REFERENCES menu_group(id),
  CONSTRAINT contextid_subpanel_FK FOREIGN KEY (menu_context_id) REFERENCES menu_context(id)
  )
;

This table might seem a strange candidate for implementing this crucial bit of functionality because it doesn’t appear to actually store any data. The table only has 4 fields, and they all appear to hold nothing but integers. The distinction here is that while most of the tables we have considered serve to store data, this table stores relationships – specifically the 3-way relationship that defines where each plugin will appear in the menu for each menu context. To see how these bits fit together we need to start considering the LabVIEW code that will read these structures and build the tree-control-based menus.

Creating the LabVIEW

The basic approach that we will take in creating the entries for the tree control is going to incorporate two distinct phases.

  1. Draw the menu structure
  2. Fill in the entries associated with the plugins

Reading the Data

The VI that is responsible for reading the menu data from the database (Config Data_DB_ADO:Read Tree Menu Structure.vi) has an enumerated input that selects the menu context the code will display. This value drives a subVI (Get Menu Context ID.vi) that reads and buffers the id value associated with the desired menu context.

Read Tree Menu Structure

In one sense, this subVI really isn’t necessary because you could theoretically perform this look-up operation using a so-called “subquery”, but this approach is far less efficient because it forces the DBMS to repeat the look-up with each query. In addition, these values are not going to change, so better to let LabVIEW remember them. To my way of thinking, however, the biggest issue with this approach is that it complicates the query itself. Given that this is the logic that maintainers (who may not be knowledgeable in SQL) are going to see, it’s a good idea to keep the SQL logic as simple as possible. The other thing to notice about the query is that it puts the entries in the correct order for display by incorporating the clause ORDER BY parent_id, sort_order ASC. Finally, you can see that I built this logic inside the generic ADO database subclass of the existing Config Data object structure.
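
The actual SQL lives on the block diagram shown above, so the exact text below is an educated guess, but it illustrates the trade-off being described: resolving the context id once in LabVIEW keeps the query that maintainers see short, whereas the subquery form buries an extra look-up inside the SQL that the DBMS must repeat on every execution.

# Hypothetical reconstruction of the menu-structure query (the literal 1 stands
# in for the value returned by Get Menu Context ID.vi)
simple_query = (
    "SELECT id, item_name, parent_id "
    "FROM menu_group "
    "WHERE context_id = 1 "
    "ORDER BY parent_id, sort_order ASC"
)

# The subquery alternative that the text argues against
subquery_version = (
    "SELECT id, item_name, parent_id "
    "FROM menu_group "
    "WHERE context_id = (SELECT id FROM menu_context WHERE label = 'By Function') "
    "ORDER BY parent_id, sort_order ASC"
)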

For reading the tree entries associated with the plugins we use this VI, which is similar to the one for reading the main menu structure, but with some important differences.

Read Tree Menu Plugin Entries

The first obvious thing is that the query is much more complex because the primary table being queried is a cross-reference table. Consequently, we have to de-reference the id numbers to derive the data we need to build the menu entries. In learning how this de-referencing works, it’s important to remember that SQL is a language created by a mathematician – specifically a mathematician who specialized in a branch of mathematics called “Set Theory”. His (incredibly optimistic) idea was that if he could create a language based on mathematical principles, he would be able to prove, in the mathematical sense, that the program was correct (read: bug free).

While his grand hope evaporated in the face of the harsh reality that most programming has surprisingly little to do with mathematics (i.e. computing an answer), the set orientation of SQL has survived. For example, when you perform a query, what you are really doing is SELECTing a data subset FROM a larger set of data – which is typically a table. However, sometimes you need to gather data from a still larger set of data that is spread across multiple tables. To do that, you need to temporarily JOIN those tables together into one large virtual dataset based ON some criteria, like matching id numbers. Get the idea?

A not-so-obvious thing about this LabVIEW code is where it is located in the Config Data object structure. Unlike the routine for reading the basic menu structure, this VI is not located in the generic ADO database subclass. Instead it can be found in the JET database subclass, and the reason for this placement lies in the query. Unlike the other query operation which was implemented in generic SQL, there are aspects of this query that utilize JET-specific syntax (specifically, all the parentheses).
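
Again, the real query is in the screenshot, so what follows is only a sketch of its likely shape (the selected columns are placeholders); the point is the nesting that JET demands – each INNER JOIN beyond the first has to be wrapped in its own set of parentheses, which is exactly the kind of dialect-specific syntax that justifies putting this VI in the JET subclass.

# Rough idea of the cross-reference query, written the way JET wants to see it
plugin_query = (
    "SELECT launch_item.id, menu_group.item_name, subpanel_group_xref.menu_group_id "
    "FROM (subpanel_group_xref "
    "INNER JOIN launch_item ON subpanel_group_xref.launch_item_id = launch_item.id) "
    "INNER JOIN menu_group ON subpanel_group_xref.menu_group_id = menu_group.id "
    "WHERE subpanel_group_xref.menu_context_id = 1"
)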

Generating the Menu Tags

With the data in hand that defines the tree-control menus, we now need to turn that data into menu entries. The first step in that transformation is to process the raw data we have acquired from the database to generate the tags that are needed to properly organize the tree entries. I won’t take up the room to show the code for this VI (Parse Tree Management Data.vi) because it’s easy to explain what it does – but feel free to check it out in the code. The VI’s primary program structure is a while loop that iterates through the raw tree-definition data, generating the tags and formatting the data needed to create the tree items. The loop carries two shift registers: one holds an array of ID numbers that the loop has already processed; the other holds an array of clusters, each element of which contains the four items that we will need to define a menu item (Parent Tag, Child Name, Child Tag and Child Only?).

With each iteration, the loop extracts the top element from the array and tests the Parent ID. If it is zero, the item is a top-level entry so the code builds its entry and continues with the next iteration. If Parent ID is anything other than 0, it searches the array of processed IDs to see if its parent has already been processed. When the comparison finds the new entry’s Parent ID it uses its tag value to synthesize the new entry’s child tag. When the new entry’s Parent ID is not found, the code adds the entry’s element back onto the bottom of the array of entries to be processed so it can be retried later. Normally, this search should never fail because the ordering in the queries should put the elements in the correct order, but this is just in case. This operation continues until there are no more entries left to process.
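
Since the block diagram isn’t reproduced here, this is a rough Python rendering of the logic just described – the two shift registers become a dictionary and a queue, and the colon-delimited tag construction is my assumption based on how the tags are parsed later on.

from collections import deque

def parse_tree_data(raw_rows):
    """raw_rows: (item_id, parent_id, item_name, child_only) tuples in query order."""
    pending = deque(raw_rows)
    tags_by_id = {}        # IDs already processed -> the tag generated for them
    menu_items = []        # (Parent Tag, Child Name, Child Tag, Child Only?)
    while pending:
        item_id, parent_id, name, child_only = pending.popleft()
        if parent_id == 0:                     # top-level entry
            parent_tag, child_tag = "", name
        elif parent_id in tags_by_id:          # parent already processed
            parent_tag = tags_by_id[parent_id]
            child_tag = parent_tag + ":" + name
        else:                                  # parent not seen yet - put it back and retry later
            pending.append((item_id, parent_id, name, child_only))
            continue                           # (real code should guard against orphaned parent IDs)
        tags_by_id[item_id] = child_tag
        menu_items.append((parent_tag, name, child_tag, child_only))
    return menu_items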

Finally, in terms of tree control infrastructure, the only thing we have left is to actually insert the entries that we have defined into the tree control. By the time we get to this point in the code we have gotten the definitions in the correct order so all we have left to do is disable front panel updates, clear the tree control, add the new tree entries and re-enable front panel updates. Again, this code is very simple so to save space I will refer you to the last post (https://www.notatamelion.com/2015/09/14/a-tree-grows-in-brooklyn-labview/) for details on the call and how it works.

Integration with the Testbed

To integrate this code with the existing testbed application requires very little work. First off, on the front panel, we remove the existing ring control that we were using to select screens and add the tree control (I’m using the one from the System-themed palette), and a System-themed enumeration that will allow the operator to switch between the two menu context values. Note that this control could also be defined as a ring with its Strings[] property populated at run time to show the available options. This implementation would be useful if you want to either allow the users to dynamically configure the basic structure of the tree-control menus, or provide different options depending on who is using the system.

New Front Panel Controls

On the block diagram, the front panel changes impact the program logic in two places. First, we need to create a new Value Change event to handle the Menu Context control. This event (which is also fired when the GUI initializes itself) is responsible for rebuilding the tree-control menu display.

Menu Context-Value Change Event

The event handler starts by calling a subVI (Get Tree Menu Data.vi) that accepts as an input a Menu Context value and internally calls the two database query VIs we discussed above. After concatenating the arrays that it gets from the two routines, it passes the raw data to the VI (Parse Tree Management Data.vi) I described that converts the raw data into tree control entries. Finally, it returns the array of tree control entries to the event handler, which passes it, along with two references, to the subVI (Draw Menu.vi) that does exactly what its name says. The first of the references is, obviously, a control reference to the tree control. The other is a VI reference to the GUI itself so the subVI can defer and then re-enable front panel updates.

The other block diagram change is to repurpose an existing value change event. The event in question used to handle the ring control that changed screens, and while it will still be a value change event, it will now be a value change event on the aptly labeled tree control, Tree.

Tree-Value Change Event

The original logic that occupied this space took the string value of the selected ring item and used it to look up the name of the associated screen in an array of strings. The string array consisted of screen labels that were generated when the GUI loaded the subpanel VIs into memory and started them running. The resulting index was then used to index the screen’s VI reference from an array of plugin screen VI references. This VI reference would, in turn, drive the subpanel’s Insert VI method to make that screen visible in the subpanel.

The modified form works basically the same, but with a couple of minor differences. Although the tree control’s value is a string, the string is the tag associated with the selected entry. Since the part we need to perform our search is the last item in the colon-delimited list, the first thing we need to do is strip off everything up to, and including, the last colon in the string. Moreover, we want this operation to be as efficient as possible – so no looping. A very efficient solution is to use the built-in Match Pattern function with the rather curious-looking pattern shown. To see how it works, consider that a dot (“.”) is a special character that matches any character. Next, the asterisk (“*”) is a special character that matches the longest possible sequence of the token that came just before it – hence, it will match the longest sequence of any characters. Finally, the colon is not a special character, so it matches just a colon. The end result is that the complete pattern will match the longest sequence of characters that is followed by a colon, and it works the same whether there is one colon in the string or a dozen. The string I want is whatever is left after the match.
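
If you prefer to see the idea in text form, here is the same greedy-match trick expressed as a Python regular expression (the tag value is made up for the example):

import re

tag = "Data Sources:Acquisition:Sine Source"   # hypothetical tree tag
# The greedy ".*" swallows as much as it can while still leaving a colon to
# match, so the match always ends at the *last* colon in the string.
screen_name = re.sub(r"^.*:", "", tag)
print(screen_name)                             # -> "Sine Source"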

The other change that was needed for the tree control is the case structure, which is there to work around a bug in the way LabVIEW handles value change events with tree controls. I configured the tree control such that only entries that are marked as Child Only are selectable. The bug is that when you click on one of the parent items, LabVIEW still fires the value change event even though the value of the tree control isn’t changing. To work around this issue, the event handler bypasses all further event processing of the “selected” item when it isn’t found in the list of screens.

Testing the Interface

As always, “the proof of the pudding is in the eating”, so let’s try running the application with its new GUI feature. One minor difference in behavior is that the subpanel now remains empty at startup until the user makes a selection. However, from the time the application starts, the user can see a complete list of all the display screens that are available. In addition, the tree will automatically reconfigure itself when a different context is selected.

Testbed Application – Release 17
Toolbox – Release 14

The Big Tease

When I started doing this work, test systems were surprisingly homogeneous. While it was true that instruments came from many different vendors, the software environment was pretty monolithic. Today, however, things have really changed. Every day it is becoming more common and accepted to have multiple applications running in parallel that were developed using a variety of development tools ranging from C++ to Java to C# and F#.

In the past we have talked about creating a LabVIEW-based backend application with the main GUI built using standard web tools such as HTML, CSS and JavaScript (https://www.notatamelion.com/2015/06/08/building-a-web-backend-in-labview/). Over the next few posts I want to consider some of the other ways that your LabVIEW application can work with external applications.

Until Next Time…
Mike…

A Tree Grows in Brooklyn LabVIEW

This time out, I want to start exploring a user interface device that is, in my opinion, dramatically under-utilized. I am talking about the so-called tree control. This structure solves a number of interface challenges that might otherwise be intractable. For example, the preferred approach to displaying large amounts of data is to avoid generating large tabular blocks of data, opting instead to display these datasets on graphs. However, there can be situations where those large tabular blocks of data are exactly what the customer wants. What a tree control can do is display this data using a hierarchical structure that makes it easier for the user to find and read the specific data they need. A good example of this sort of usage is Windows Explorer. Can you imagine how long it would take you to find anything if all Windows provided was an alphabetical list of all the files on your multi-gigabyte hard drive?

Alternatively, a tree control can provide a way of hierarchically organizing interface options. For instance, to select the screen to display in the testbed application we have been building, the program currently uses a simple pop-up menu containing a list of the available screens. This technique works well if you have a limited number of screens, but it does not scale well.

We will structure our evaluation of tree controls around two applications that demonstrate the control’s usage both as a presentation device for large datasets and as a control interface. Starting that discussion, this week we will look at how to display a large amount of data (all the files on your PC). Then in the following post we will explore its usefulness for controlling the application itself by modifying the testbed application to incorporate it.

Our Current Goal

To demonstrate this control’s ability to organize and display a large amount of tabular data, we are going to consider an example that displays a hierarchical listing of the files on your computer starting with a directory that you specify. The resulting display will represent folders as expandable headings, and for files show their size and modification date.

I picked this application as an example because it provides the opportunity to discuss an interesting concept that I have been wanting to cover for some time (i.e. recursion). Moreover, on a practical level, this application makes it easy to generate a very large set of interesting data – though it isn’t very fast. But more on that in a bit. For now, let’s start by considering what it takes to make this tree grow. Then we can look at the application’s major components.

Becoming a LabVIEW Arborist

Although tree controls and menus occupy different functional niches, their APIs bear certain similarities. For example, they both draw a distinction between what the user sees and the “tags” that are used internally to identify specific items. Likewise, when creating a child item, both APIs use the parent’s tag to establish hierarchical relationships.

A big difference between the two is that a tree control can have multiple columns like a table. In fact, one way of understanding a tree control is as a table that lets you collapse multiple rows into a header row. So in designing with this control, the first thing we need to do is decide what values are going to represent the “header rows”, and what values the “data rows”. For this, our first excursion into utilizing tree controls, the “header” rows will define the folders – so that is where we go first.

Showing the Folders

The code that we will use to add a new folder to the tree resides in a VI called Process New Directory Entry.slow.vi (the reasons for the “slow” appellation will be explained shortly).

Process New Directory Entry.slow

Because this logic resides in a subVI, the reference to the tree control comes from a control on the VI’s front panel. Next, note that the way you get the row into the control is by using an invoke node that calls the Edit Tree Items:Add Item method. I point out this fact because it tells you something important: all the data we are going to be displaying in the control are properties of the control, not its value. Consequently, they will be automatically saved as part of the control whenever you save the VI that contains the control.

Next, let’s consider the inputs to the method. The top-most item is Parent Tag. The assumption is that the method is defining a new child item, so this input defines the parent under which the new child will reside. Therefore, a Parent Tag that is a null string indicates an item with no parent (i.e. a top-level item). The next item down from the top is Child Position and its job is to tell the method where to insert the new child that it is creating. A value of -1, as is used here, tells the method to put the new child after any existing children of the identified parent. In other words, if this code is called multiple times, the children will appear in the control in the order in which they were created.

The next two input items (Left Cell String and Child Text) control what the user will see in the control. You will recall that I said that tree controls are sort of like hierarchical, collapsible tables. In that representation, the left-most cell shows the hierarchical organization through indentation. In addition, by default, every row that has other rows nested beneath it shows a small glyph indicating that the row can be expanded. The other cells in the row are like the additional columns in the table and can hold whatever data you want. When creating entries for directories, the left-most cell will contain the name of the directory, and the remainder of the row will be empty. To implement this functionality, the input path is stripped to remove the last item. This value is passed to the Left Cell String input. In addition, an empty string array is written to the Child Text input.

Next, the Child Tag input allows you to specify the value that you want to use to uniquely identify this row when creating children under it, or when reading the value of control selections. Now, the documentation says that if you don’t wire a string to this input, it will reuse the Left Cell String value as the tag, but you don’t want to depend on this feature. The problem is that tags have to be unique, so to prevent duplication, LabVIEW automatically modifies these tags to ensure that they are unique by appending a number at runtime. While it is true that the method returns a string containing the tag that LabVIEW generated, not knowing ahead of time what the tag will be can complicate subsequent operations. To avoid this issue, I like to include logic that will guarantee that the tag value that I write to this input is unique. For this application, if the parent tag is a null string (indicating a top-level item), the code takes the entire path, converts it to a string and uses the result as the child tag. If the parent tag is not null, the code generates the tag by taking the parent tag value and appending to it a slash character and the string that is feeding the child’s Left Cell String input. If the reason for this logic escapes you, don’t worry about it – you’ll see why it’s important shortly.
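
Expressed in text form (a sketch only – the function and variable names here are mine, not the VI’s), the tag rule just described looks like this:

def make_directory_tags(path, parent_tag):
    name = path.rsplit("\\", 1)[-1]      # last path component -> Left Cell String
    if parent_tag == "":                 # top-level item: the whole path becomes the tag
        child_tag = path
    else:                                # otherwise: parent tag + delimiter + name
        child_tag = parent_tag + "\\" + name
    return name, child_tag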

Finally, the Child Only? input is a flag that, when false, allows other rows to be added hierarchically beneath it.

Showing the Files

With the code handled for creating entries associated with directories, now we need to implement the logic for creating the entries that represent the files inside those directories – which, as you can see below, utilizes the same method as we saw earlier, but plays with the inputs in slightly different ways.

Process New File Entry.slow

Named Process New File Entry.slow.VI, this VI is designed to use the additional columns to provide a little additional information about the file: to wit, its size and last modification date. Therefore, the first thing the code does is call a built-in function (File/Directory Info) that reads the desired information. However, this call raises the potential of an error being generated. When errors are possible you need to spend some time thinking about what you want to have happen when they occur. In this situation, there are three basic responses:

  1. Propagate the Error: With this approach, the error would simply be propagated on through the code and be reported like any other error. This action would ensure that the error would be reported, but would stop the processing of the interface.
  2. Don’t Include this File in the List: By trapping the error and preventing it from being passed on, we can use it to block the display of files for which we can’t retrieve the desired information. This technique would allow the interface processing to run to completion by simply ignoring the error.
  3. Include the File but not its Data: A variation of Option 2, this approach would still block the error from being propagated. However, it would still create the file’s entry in the table but with dummy data like, “Not Available”, or simply “n/a” for the missing data.

So which of these options is the correct one? This is one of those situations where there is no universally correct answer. Sorting out which option is the correct one for your application is why, as a software engineering professional, you earn the “big bucks”. For the purpose of our demonstration, I picked Option 2.

Next, the New Directory Tag input is the tag that is associated with the folder in which this file resides. Finally, the Child Tag value is calculated by taking the Parent Tag value and appending to it a slash character and the name of the file stripped from the input path.

Pulling it all Together

So those are the two main pieces of code. All we have to do now is combine them into a single process that will process a starting directory to produce a hierarchical listing of its contents. The name of this VI is Process Directory.slow.vi, and this is what its code looks like:

Process Directory.slow

So you can see that the first thing it does is call the subVI we discussed for creating an entry in the tree control for the directory identified in the Starting Path input, using the tag value from the My Parent Tag input. The result is that the folder is added to the tree, and the subVI returns the tag for the new folder item. The next step is to process the folder’s contents, so the code calls the built-in List Folder function to generate lists of the directory’s files and subdirectories.

The array of file names is passed into a loop that repeatedly calls the subVI we discussed earlier that creates entries in the tree control for individual files. The array of subdirectory names drives a loop that first verifies that the first character of the name is not a dollar sign (“$”). Although this check is not technically necessary, it serves to bypass various hidden system directories (like $Recycle Bin) which would generate errors anyway. Assuming that the subdirectory name passes the test, the code calls a subVI that we haven’t looked at before – or have we? If you open this subVI and go to its block diagram, you will see this:

Process Directory.slow

Look familiar? I have not simply duplicated the logic in Process Directory.slow.vi, rather I am using a technique called recursion to allow the VI to call itself. This idea might sound more than a little confusing, but if you think about it, the idea makes a lot of sense. Look at it this way, to correctly process these subdirectories, we need to do the exact same things as we are doing right now to process the parent directory, so why not use the exact same code?

The way it works is that Process Directory.slow.vi is configured in its VI Properties as a shared clone reentrant VI. To review, when LabVIEW runs code utilizing shared clones, it creates a small pool of instances of the VI’s code in memory. When the shared clone VI is actually called, LabVIEW goes to this pool and dynamically calls one of the shared clones that isn’t currently being used. If the pool ever “runs dry” LabVIEW automatically adds more clones to the pool. It is this behavior of shared clones that is key to the way LabVIEW implements recursion. In order to see how this recursion operates, let’s consider this very basic top-level VI:

Getting Processing Started.slow

The code first clears any contents that might already exist in the tree control and then makes the first call to Process Directory.slow.vi. When the runtime engine sees that call, it goes to the pool, gets a clone of the VI and starts executing it. An important point to remember is that even though all the clones in the pool were derived from the same VI, they are at this point separate entities. It is as though you manually created several copies of the same VI, except LabVIEW did the copying for you.

When running this first clone, LabVIEW will eventually get to the call that it makes to Process Directory.slow.vi. As before, the runtime engine will go to the pool, get a second clone of the VI and start it executing, and so it will go until execution gets to a directory that only has files in it. In that case, the cloned VI will not get called and that Nth-generation clone will finish its execution. At this point LabVIEW will release the clone back to the pool for future reuse, and return to executing the clone that called the one that just finished. This calling clone may have other subdirectories to process, or it may be done – in which case it will also finish its execution, LabVIEW will release it back to the pool, and continue executing the clone that called it. This process will continue until all the clones have finished their work.
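
For readers who think better in text than in block diagrams, here is a rough Python equivalent of the recursive structure just described. The function and variable names are mine, the starting path is hypothetical, and the list of tuples simply stands in for the tree control.

import os

def process_directory(starting_path, parent_tag, tree_rows):
    """Each row is (parent tag, left cell string, child tag) - a stand-in for Add Item."""
    name = os.path.basename(starting_path) or starting_path
    my_tag = starting_path if parent_tag == "" else parent_tag + "\\" + name
    tree_rows.append((parent_tag, name, my_tag))          # entry for the directory itself
    try:
        entries = sorted(os.listdir(starting_path))
    except OSError:
        return                                            # skip anything we can't read
    for entry in entries:
        full_path = os.path.join(starting_path, entry)
        if os.path.isfile(full_path):
            tree_rows.append((my_tag, entry, my_tag + "\\" + entry))
        elif os.path.isdir(full_path) and not entry.startswith("$"):
            process_directory(full_path, my_tag, tree_rows)   # the recursive call

rows = []
process_directory(r"C:\Temp", "", rows)    # hypothetical starting path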

Some Further Points

And that, dear readers, is how the process basically works, but there are a couple important things still to cover. We need to talk about memory consumption, performance and how to interact with this control in your program once you have it populated with data.

Memory Considerations

I mentioned earlier that the information that you enter into a tree control are actually properties of the control – not its data. I also stated that as a result of that fact, said information will automatically be saved as part of the control. As a demonstration of that fact, consider that the very basic top-level VI I just showed you consumes about 14 kbytes on disk. However, as a test I turned the process loose on my PC’s Program Files (x86) directory. After it had finished processing the 14,832 folders(!) and 122,533 files(!!) contained therein, I saved the VI again. At that point, the size of the VI on disk ballooned to 2.6 Mbytes.

The solution is to always remember to delete all items from a tree control when the program using it stops. Although you obviously don’t have to worry about this sort of growth in a compiled application (a standalone application can’t save changes to itself), this convention will help to keep you from inadvertently saving extraneous information during development and artificially expanding the size of your application.

Performance Considerations

The test I did to catalog my PC’s Program Files (x86) directory also highlighted another issue: execution speed. To complete the requested processing took about an hour and a half. Doing the same processing, but minus the tree control operations, took less than a minute, so the vast majority of this time was clearly spent in updating the tree control. But what exactly was it that was taking so long? As it turns out, there are two sources of delay, the first of which is actually pretty easy to control.

The way the code is currently written, the tree control on the front panel updates its appearance after each addition – a problem by the way that is not unique to tree controls. The solution is to tell LabVIEW to stop updating the front panel for a while, and here is how to do it:

Getting Processing Started w-Defer Panel Updates

A VI’s front panel has a property called Defer Panel Updates. When you set this property to true, LabVIEW records all changes to the VI’s front panel, but doesn’t actually update it to reflect those changes. When the property is later set to false, all pending changes are applied to the front panel at once. The additions shown reduce the time to process my entire Program Files (x86) directory by 66% to just 30 minutes – which is much better, but still not great.

To reduce our processing time further, we have to take more drastic measures – starting with a fundamental change in how we add entries for individual files. The issue is that the technique we are using to add entries is very convenient because we are explicitly identifying the parent under which each child is to be placed. Consequently, we have the flexibility to add entries in essentially any order. However, as the total number of entries grows larger we begin to pay a high price for this convenience and flexibility because, under the hood, the control’s logic has to incorporate the ability to insert entries at random into the middle of existing data.

The solution to this problem is to use a different method. This method is called Edit Tree Items:Add Multiple Items to End and, as its name says, it simply appends new items to the end of the current list of entries. Of course for this to work, it means that we have to take responsibility for a lot of stuff that LabVIEW was doing for us, like adding the entries in the correct order and maintaining the indentation that preserves the hierarchical structure. Thankfully, that work isn’t very hard. For instance, here is the code for creating the new directory entry:

Process New Directory Entry

The first thing you will notice is that the invoke node is gone. The method that we will be invoking sports a single input which is an array of clusters representing the tree entries that it will add. The purpose of the logic before us is to assemble the array element that will create the parent folder’s entry in the tree.

Next, note that the information needed to define the entry is slightly different. First, we don’t need to specify a tag for the parent because we are assuming that the nodes are simply going to be added to the display in the order in which they occur in the array. However, that simplification raises a problem: how do you maintain the display’s hierarchical structure? The thing to remember is that the hierarchy is defined visually, and logically, by the indentation of the entries. Therefore the entry definition incorporates a parameter that explicitly defines the number of levels by which the new entry should be indented. Due to the way that we have been building the tags, this value is very easy to calculate. All we have to do is count the number of delimiters (“\”) in the entry’s tag and then subtract the number of delimiters in the starting path. The first part of that calculation occurs in the subVI Calculate Indent Level.vi and the second part is facilitated by a new input parameter, Indent Offset.
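
The arithmetic is simple enough to show in a couple of lines (again a sketch, with a made-up path):

def indent_level(child_tag, indent_offset):
    # Indentation = delimiters in the tag minus delimiters in the starting path
    return child_tag.count("\\") - indent_offset

indent_offset = r"C:\Program Files (x86)".count("\\")                               # 1
print(indent_level(r"C:\Program Files (x86)\SomeVendor\SomeApp", indent_offset))    # -> 2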

Make the same adaptations to the routine for adding a new file entry and you get this:

Process New File Entry

Nothing new to see here. The important part is how these two new VIs fit together and to see that we need to look at the recursive VI Process Directory.vi (I have zoomed in on just the part that has changed):

Process Directory

This logic’s core functionality is to build the array of entry definitions that the Edit Tree Items:Add Multiple Items to End method needs to do its work. The first element in this array is the entry for the directory itself, and the following elements define the entries for the files within the directory. Finally, we have to make a small change to the top-level VI as well:

Getting Processing Started

Specifically, we need to calculate the Indent Offset value based on the Starting Path input. But the important question is: does all this really help? With these optimizations in place, the processing time for my PC’s Program Files (x86) directory drops to just a hair under 10 minutes. Of course, while that improvement is impressive, it might still be too long, but the changes needed to reduce the processing time further are only necessary when dealing with very large datasets. Plus, they really have nothing to do with the tree control itself – so they will have to wait for another time.

Event Handling

The last thing we have left to cover is what happens after the tree control is populated with data. Well, like most controls in LabVIEW, tree controls support a variety of events, including ones that allow event structures to respond when the user selects or double-clicks an item in the control. But this point begs the question: what is the fundamental datatype of a tree control? By default, the datatype of a tree control is a string, and its value is the tag of the currently selected item. Alternatively, if no items are selected, the tree control’s value is a null, or empty, string.

Top Level File Explorah

Because the control’s datatype is a string, you can programmatically clear any selection by writing a null string to a Value property node or a local variable associated with the control. However, note the words “By default…”: like a few other controls (such as listboxes), tree controls can be configured to allow multiple items to be selected at once. In that case, the control’s datatype changes to an array of strings where each element is the tag of a selected item.

The other thing I wanted to point out through this example is the importance of carefully considering how to define tags for items. It may seem obvious, but if you are taking the time to put data into this control, you are probably going to want to use it in the future. It behooves you, therefore, to tag it in such a way as to allow you to quickly identify and parse values. In this example, I put together the tags such that they mirror the data’s natural structure – its file path. By mimicking your data’s natural structure you make it easier to locate the specific information that you need.

File Explorer – Release 1
Toolbox – Release 12

The Big Tease

OK, that is enough for now. Next time we will return to our testbed application and look at using tree controls as a control element. With this use case the focus shifts from volume of data, to organization of the GUI to simplify operator interactions.

Until Next Time…
Mike…

More Than One Kind of Modularity

When learning something that you haven’t done before – like .NET – it’s not uncommon to go through a phase where you look at some of the code you wrote early on and cringe (or at least sigh deeply). The problem is that you are often not only learning a new interface or API, but you are learning how to best use that interface or API. The cause of all the cringing and sighing is that as you learn more, you begin to realize that some of the assumptions and design decisions that you made were misguided, if not flat-out wrong. If you look at the code we put together last time to help us learn about .NET in general, and the NotifyIcon assembly in particular, we see a gold-plated example of just such code. Although it clearly accomplished its original goal of demonstrating how to access .NET functionality and illustrating how the various objects can relate to one another, it is certainly not reusable – or maintainable, or extensible, or any of the other “-ables” that good software needs to be.

In fact, I created the code in that way so this time we can take the lesson one step further to fix those shortcomings, and thus demonstrate how you can go about cleaning up code (of your own or inherited) that is making you cringe or sigh. Remember, it is always worth your time to fix bad design. I can’t tell you how many times I have seen people struggling with bad decisions made years before. Rather than taking a bit of time to fix the root cause of their trouble, they continue to waste hours on project after project in order to work around the problem.

Ok, so where do we start?

Clearly this code would benefit from cleaning-up and refactoring, but where and how should we start? Well, if you are working on an older code base, the question of where to start will not be a problem. You start with where the most pain is. To put it another way, start with the things that cause you the biggest problems on a day-to-day basis.

This point, however, doesn’t mean that you should just sit around and wait for problems to arise. As you are working always be asking yourself if what you are doing has limitations, or embodies assumptions that might cause problems in the future.

The next thing to remember is that this work can, and should, be iterative. In other words you don’t have to fix everything at once. Start with the most egregious errors, and address the others as you have the opportunity. For example, if you see the code doing something stupid like using a string as a state variable, you can fix that quickly by replacing the strings with a typedef enumeration. I have even fixed some long-standing bugs in doing this replacement because it resolved places where states were subtly misspelled or contained extraneous spaces.

Finally, remember that the biggest payoffs, in terms of long-term benefit, come from improved modularity that corrects basic architectural problems. As we shall see in the following discussion, I include under this broad heading modularity in all its forms: modular functionality, modular logic and modular data.

Revisiting Modular Functionality

Modular functionality is the result of taking small reusable bits of code and encapsulating them in routines that simplify access, standardize the interface or ensure proper usage. There are good examples of all these usages in the modified application code, starting with Create NotifyIcon.vi:

Create NotifyIcon VI

Your first thought might be why I bothered turning this functionality into a subVI. After all, it’s just one constructor node. Well, yes that is true but it’s also true that in order to create that one node you have to remember multiple steps and object names. Even though this subVI appears rather simple, if you consider what it would take to recreate it multiple times in the future, you realize that it actually encapsulates quite a bit of knowledge. Moreover, I want to point out that this knowledge is largely stuff for which there is no overwhelming benefit to be gained from you committing it to memory.

Next, let’s consider the question of standardizing interfaces. Our example in this case is a new subVI I created to handle the task of assigning an icon to the interface we are creating. I have named it Set NotifyIcon Icon.vi:

Set NotifyIcon Icon VI

You will remember from our previous discussion that this task involves passing a .NET object encapsulating the icon we wish to use to a property node for the NotifyIcon object. Originally, this property was combined with several others on a single node. A more flexible approach is to break up that functionality and standardize the interfaces for all the subVIs that will be setting NotifyIcon properties so they simply consist of an object reference and the data to be used to set the property, expressed in a standard LabVIEW datatype – in this case a path input. This approach also clearly simplifies access to the desired functionality.

Finally, there is the matter of ensuring proper usage. A good place to highlight that feature is in the last subVI that the application calls before quitting: Drop NotifyIcon.vi.

Drop NotifyIcon VI

You have probably been warned many times about the necessity of closing references that you open. However, when working with .NET objects, that action by itself is sometimes not sufficient to completely release all the system resources that the assembly had been using. Most of the time, if you don’t completely close out the assembly you may notice memory leaks or errors from attempting to access resources that still think they are busy. However, with the NotifyIcon assembly you will see a problem that is far more noticeable, and embarrassing. If you don’t call the Dispose method your program will close and release all the memory it was using, but if you go to the System Tray you’ll still see your icon. In fact, you will be able to open its menu and even make selections – it just doesn’t do anything. Moreover, the only way to make it go away is to restart your computer.

Given the consequences of forgetting to include this method in your shutdown sequence, it is a good idea to make it an integral part of the code that you can’t forget to include.

Getting Down with Modular Logic

But as powerful as this technique is, there can still be situations where the basic concept of modularity needs to be expressed in a slightly different way. To see such a situation, let’s look at the structure that results from simply applying the previous form of modularity to the problem of building the menus that go with the icon.

Create ContextMenu VI

Comparing this diagram to the original one from last time, you can see that I have encapsulated the repetitive code that generated the MenuItem objects into dedicated subVIs. By any measure this change is a significant improvement: the code is cleaner, better organized, and far more readable. For example, it is pretty easy to visualize what menu items are on submenus. However, in cases such as this one, this improved readability can be a bit of a double-edged sword. To see what I mean, consider that for the structure of your code to allow you to visualize your menu organization, said organization must be hard-coded into the structure of the code. Consequently, changes to the menus will, as a matter of course, require modification to the fundamental structure of the code. If the justification for modularity includes concepts like flexibility and reusability, you just missed the boat.

The solution to this situation is to realize that there is more than one flavor of modularity. In addition to modularizing specific functionality, you can also modularize the logic required to perform complex and changeable tasks (like building menus) that you don’t want to hard-code. If this seems like a strange idea to you, consider that computers spend most of their time using their generalized hardware to perform specialized tasks defined by lists of instructions called “programs”. The thing that makes this process work is a generalized bit of software called a “compiler” that turns the programs into data structures that the generalized hardware can use to perform specialized actions.

Carrying forward with this line of reasoning, what we need is a simple way of defining a menu structure that is external to our program, and a “menu compiler” that turns that definition into the MenuItem references that our program needs. So let’s build one…

Creating the Data for Our Menu Compiler

So what should this menu definition look like? Well, to answer that question we need to start with the data required to define a single MenuItem. We see that as a minimum, every item in a menu has to have a name for display to the user, a tag to identify it, and a parent tag that says if the item has a parent item (and if so which item is its parent). In addition, we haven’t really talked about it, but the order of references in an array of menu items defines the order in which the items appear in the menu or submenu – so we need a way to specify its menu position as well. Finally, because in the end the menu will consist of a list (array) of menu item references, it makes sense to express the overall menu definition that we will eventually compile into that array of references as a list (and eventually also an array).

But where should we store this list of menu item definitions? At least part of the answer to this question depends on who you want to be able to modify the menu, and the level of technical expertise that person has. For example, you could store this data in text files as INI keys, or as XML or JSON strings. These files have the advantage of being easy to generate and are readily accessible to anyone who has access to a text editor – of course that is their major disadvantage, as well. Databases on the other hand are more secure, but not as easy to access. For the purposes of this discussion, I’ll store the menu definitions in a JSON file because, when done properly, the whole issue of how to parse the data simply goes away.

To see what I mean, here is a nicely indented JSON file that describes the menu that we have been working using for our example NotifyIcon application:

[
	{
		"Menu Order":0,
		"Item Name":"Larry",
		"Item Tag":"Larry",
		"Parent Tag":"",
		"Enabled":true
	},{
		"Menu Order":1,
		"Item Name":"Moe",
		"Item Tag":"Moe",
		"Parent Tag":"",
		"Enabled":true
	},{
		"Menu Order":2,
		"Item Name":"The Other Stooge",
		"Item Tag":"The Other Stooge",
		"Parent Tag":"",
		"Enabled":true
	},{
		"Menu Order":3,
		"Item Name":"-",
		"Item Tag":"",
		"Parent Tag":"",
		"Enabled":true
	},{
		"Menu Order":4,
		"Item Name":"Quit",
		"Item Tag":"Quit",
		"Parent Tag":"",
		"Enabled":true
	},{
		"Menu Order":0,
		"Item Name":"Curley",
		"Item Tag":"Curley",
		"Parent Tag":"The Other Stooge",
		"Enabled":true
	},{
		"Menu Order":1,
		"Item Name":"Shep",
		"Item Tag":"Shep",
		"Parent Tag":"The Other Stooge",
		"Enabled":true
	},{
		"Menu Order":2,
		"Item Name":"Joe",
		"Item Tag":"Joe",
		"Parent Tag":"The Other Stooge",
		"Enabled":true
	}
]

And here is the LabVIEW code that will convert this string into a LabVIEW array (even if it isn’t nicely indented):

Read JSON String

JSON has a lot of advantages over techniques like XML: For starters, it’s easier to read, and a lot more efficient, but this is why I really like using JSON: It is so very convenient.

Starting the Compilation

Now that we have our raw menu definition string read into LabVIEW and converted into a datatype that will simplify the next step in the processing, we need to ensure that the data is in the right order. To see why, we need to remember that the final data structure we are building is hierarchical, so the order in which we build it matters. For instance, “The Other Stooge” is a top-level menu item, but it is also a submenu so we can’t build it until we have references to all the menu items that are under it. Likewise, if one of the items under it is a submenu, we can’t build it until all its children are created.

So given the importance of order, we need to be careful how we handle the data because none of the available storage techniques can on their own guarantee proper ordering. The string formats can all be edited manually, and it’s not reasonable to expect people to always type in data in the right order. Even though databases can sort the result of queries, there isn’t enough information in the menu definition to allow it to do so.

The menu definition we created does have a numeric value that specifies the order of items in their respective menus and submenus. We don’t, however, yet have a way of telling the level the items reside at relative to the overall menu structure. Logically we can see that “Larry” is a top-level menu item, and “Shep” is one level down, but we can’t yet determine that information programmatically. Still, the information we need is present in the data; it just needs to be massaged a bit. Here is the code for that task:

Ordering the Menu Items

As you can see, the process is basically pretty simple. I first rewrite the Item Tag value by appending the original Item Tag value to the colon-delimited list that starts with the Parent Tag. I then count the number of colons in the resulting string, and that is my Menu Level value. The exceptions to this processing are the top-level menu items, which are easy to identify due to their null parent tags. I simply force their Menu Level values to zero and replace the null string with a known value that will make the subsequent processing easier. The real magic, however, occurs after the loop stops. The code first sorts the array in ascending order and then reverses the array. Due to the way the 1D array sort works when operating on arrays of clusters, the array will be sorted first by Menu Level and then by Menu Order – the first two items in the cluster. This sorting, in concert with the array reversal, guarantees that the children of a submenu will always be processed before the submenu item itself.
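
As a sanity check on that description, here is a small Python rendition of the same pass. The “[top-menu]” placeholder matches the value mentioned later in the compile step; everything else about the code is simply my reading of the diagram.

def order_menu_items(raw_items):
    """raw_items: list of dicts parsed from the JSON menu definition."""
    prepared = []
    for item in raw_items:
        if item["Parent Tag"] == "":                 # top-level entry
            level, tag, parent = 0, item["Item Tag"], "[top-menu]"
        else:
            tag = item["Parent Tag"] + ":" + item["Item Tag"]
            level = tag.count(":")
            parent = item["Parent Tag"]
        prepared.append((level, item["Menu Order"], tag, parent, item))
    # Sort ascending on (level, order), then reverse the whole list so that the
    # children of any submenu always come before the submenu item itself
    return sorted(prepared, key=lambda entry: (entry[0], entry[1]))[::-1]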

Some of you may be wondering why we go to all this trouble. After all, couldn’t we just add a value to the menu definition data to hold the Menu Level? Yes, we could, but it’s not a good idea, and here’s why. In some areas of software development (like database development, for instance) the experts put a lot of store in reducing “redundancy” – which they define basically as storing the same piece of information in more than one place. The problem is that if you have redundant information, you have to decide how to respond when the two pieces of information that are supposed to be the same, aren’t. So let’s say we add a field to the menu definition for the menu level. Now we have the same piece of information stored in two different places: It is stored explicitly in the Menu Level value while at the same time it is also stored implicitly in Parent Tag.

Generating the Menu Item “Code”

In order to turn this listing into the MenuItem references we need, we will pass this sorted and ordered array into a loop that will process one element at a time. And here it is:

Compiling the Menu-1

You can see that the loop carries two shift registers. The top SR holds a 1D array of strings that consists of the submenu tags that the loop has encountered so far. The other SR also carries a 1D array but each element in it is a cluster containing an array of MenuItem references associated with the submenu named in the corresponding element of the top SR.

As the screenshot shows, the first thing that happens in the loop is that the code checks to see if the indexed Item Tag is contained in the top SR. If the tag is missing from the array it means that the item is not a submenu, so the code uses its data to create a non-submenu MenuItem. In parallel with that operation, the code is also determining what to do with the reference that is being created by looking to see if the item’s Parent Tag exists in the top SR. If the item’s parent is also missing from the array, the code creates entries for it in both arrays. If the parent’s tag is found in the top SR, it means that one or more of the item’s sibling items has already been processed so code is executed to add the new MenuItem to the array of existing ones:

Compiling the Menu-2

Note that the new reference is added to the top of the array. The reason for this departure from the norm is that, due to the way the sorting works, the menu order is also reversed, and this logic puts the items in each submenu back into their correct order. Note also that during this processing the references associated with the menu items are accumulated in a separate array that will be used to initialize the callbacks. Because the array indexing operation is conditional, only a MenuItem that is not a submenu will be included in this array.

Generating the Submenu “Code”

If the indexed Item Tag is found in the top SR, the item is a submenu and the MenuItem references needed to create its MenuItem should be in the array of references stored in the bottom SR.

Compiling the Menu-3

So the first thing the code does is delete the tag and its data from the two arrays (since they are no longer needed) and use the data thus obtained to create the submenu’s MenuItem. At the same time, the code is also checking to see if the submenu’s parent exists in the top SR. As before, if the Parent Tag doesn’t exist in the array, the code creates an entry for it, and if it does…

Compiling the Menu-4

…adds the new MenuItem to the existing array – again at the top of the array. By the time this loop finishes, there should be only one element in each array. The only item left in the top SR should be “[top-menu]” and the bottom SR should be holding the references to the top-level menu items. That array of references is in turn used to create the ContextMenu object, which is written to the NotifyIcon object’s ContextMenu property.
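Pulling the two cases together, the whole compilation pass can be sketched in a few lines of Python. Again, this is only a model: make_menu_item() and make_submenu() are stand-ins for the .NET constructor calls, the dictionary plays the role of the two shift registers, and the separate callback-reference array is omitted.

def make_menu_item(item):               # stand-in for creating a plain MenuItem
    return ("item", item["item_tag"])

def make_submenu(item, children):       # stand-in for creating a submenu MenuItem
    return ("submenu", item["item_tag"], children)

def compile_menu(ordered_items):
    # 'pending' maps a submenu tag to the MenuItem references collected for it
    pending = {"[top-menu]": []}
    for item in ordered_items:
        if item["item_tag"] in pending:
            # The tag has already shown up as somebody's parent, so this is a
            # submenu: consume the references collected for it...
            new_ref = make_submenu(item, pending.pop(item["item_tag"]))
        else:
            # ...otherwise it is an ordinary, non-submenu item
            new_ref = make_menu_item(item)
        # Hand the new reference to its parent, creating the parent's entry if
        # this is the first child seen. Inserting at the top of the list undoes
        # the reversal performed during the sort.
        pending.setdefault(item["parent_tag"], []).insert(0, new_ref)
    # When the loop finishes, only "[top-menu]" should remain
    return pending["[top-menu]"]

# Running compile_menu() on the records ordered in the earlier sketch returns
# [("item", "Larry"), ("submenu", "The Other Stooge", [("item", "The Other Stooge:Shep")])]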

What Could Possibly Go Wrong?

At this point, you can run the example code and see an iconic system tray interface that behaves pretty much as it did before, but with a few extra selections. However, we need to have a brief conversation about error checking, and frankly there are two schools of thought on this topic. There is ample opportunity for errors to creep into the menu structure. Something as simple as misspelling a parent tag name could result in an “orphan” menu that would never get displayed – or could end up being the only one that is displayed. So the question is: how much error checking do we really need to do? There are those who think you should spend a lot of time going through the logic, looking for and trapping every possible error.

Given that most menus should be rather minimal, and errors are really obvious, I tend to concentrate on the low-hanging fruit. For example, one simple check that will catch a large number of possible errors is looking to see whether, at the end of the processing, there is more than one menu name left in the top SR – and if there is, asserting an error that gives the name of the extra menu. You should probably also use this error as an opportunity to abort the application launch, since you could otherwise be left in a situation where you can’t shut down the program because the “Quit” option is missing.

Something else that you might want to consider is what to do if the external file containing the menu definitions comes up missing. The most obvious solution is to, again, abort the application launch with some sort of appropriate error message. However, depending on the application it might be valuable to provide a hard-coded default menu that doesn’t depend on external files and provides a certain minimum level of functionality. In fact, I once worked on an application where this was an explicit requirement because one of the things that the program allowed the user to do was create custom menus, the structure of which was stored in external files.

Stooge Identifier – Release 2
Toolbox – Release 11

The Big Tease

So what are we going to talk about next time? Well, something that I have seen coming up a lot lately on the user forum is the need to be able to work with very large datasets. Often, this issue arises when someone tries to display the results of a test that ran for several hours (or days!) only to discover that the complete dataset consists of hundreds of thousands of separate datapoints. While LabVIEW can easily deal with datasets of this magnitude, it should be obvious that you need to really bring your memory management “A” game. Next time we’ll look into how to plot and manage VLDs (Very Large Datasets).

Until Next Time…

Mike…

Building a Web Backend in LabVIEW

When building a web-based application, one of the central goals is often to maximize return on investment (ROI) by creating an interface that is truly cross-platform. One of the consequences of this approach is that if you are using a LabVIEW-based program to do so-called “backend” tasks like data acquisition and hardware control, the interface to that program will likely be hidden from the user. You see, the thing is, people like consistent user interfaces, and while LabVIEW front panels can do a pretty good job in a conventional application, put that same front panel in a web page and it sticks out like the proverbial sore thumb.

Plus, simply embedding a LabVIEW front panel in a web page breaks one of the basic tenets of good web design by blurring the distinction between content and appearance. Moreover, it makes you — the LabVIEW developer — at least partially responsible for the appearance of a website. Trust me, that is a situation nobody likes. You are angered by what you see as a seemingly endless stream of requests for what are, to you, “pointless” interface tweaks: “So big deal! The control is 3 pixels too far to the left, what difference does that make?” And for their part, the full-time web developers are rankled by what they experience as a lack of control over an application for which they are ultimately responsible.

If you think about it, though, this situation is just another example of why basic computer science concepts like “low coupling” are, well, basic computer science concepts. Distinguishing between data collection and data presentation is just another form of modularity, and we all agree that modularity is A Very Good Thing, right?

It’s All About the (Data)Base

So if the concept that is really in play here is modularity, then our immediate concern needs to be the structure of the interface between that part of the system which is our backend responsibility, and that which is the “web guy’s” customer-facing web-based GUI.

Recalling what we have learned about the overall structure of a web application, we know that the basic way most websites work is that PHP (or some other server-side programming tool) dynamically builds the web pages that people see based on data extracted from a database. Consequently, as a LabVIEW developer trying to make data available to the web, the basic question really is, “How do I get my data into their database?”

The first option is to approach communications with the website’s database as we would with any other database. Although we have already talked at length about how to access databases in general, there is a problem. While most website admins are wonderful people who will share with you many, many things — direct access to their database is usually not one of those things.

The problem is security. Opening access to your LabVIEW program would create a security hole through which someone not as kind and wholesome as you could wiggle. The solution is to let the web server mediate the data transfer — and provide the security. To explore this technique, we will consider two simple examples: one that inserts data and one that performs a query. In both cases, I will present a LabVIEW VI and some simple PHP code that it will be accessing.

However, remember that the point of this lesson is the LabVIEW side of the operations. The PHP I present simply exists so you can get a general idea of how the overall process works. I make no claims that the PHP code bears any similarity to code running on your server, or even that it’s very good. In fact, the server you access might not even use PHP, substituting instead some other server-side tool such as Perl. But regardless of what happens on the other end of the wire, the LabVIEW code doesn’t need to change.

As a quick aside, if you don’t have a friendly HTTP server handy and you want to try some of this stuff out, there is a good option available. An organization called Apache Friends produces an all-in-one package that will install a serviceable server on just about any Windows, Linux or Apple computer you may have sitting around not being used. Note that you do not want to actually put a server that you create in this way on the internet. This package is intended for training and experimentation so they give short shrift to things like security.

The following two examples will be working with a simple database table with the following definition:

CREATE TABLE temp_reading (
   sample_dttm  TIMESTAMP PRIMARY KEY,
   temperature  FLOAT
   )
;

Note that although the table has two columns we will only be interacting directly with the temperature column. The sample_dttm column is configured in the DBMS such that if you don’t provide a value, the server will automatically generate one for the current time. I wanted to highlight this feature because the handling of date/time values is one of the most un-standardized parts of the SQL standard. So it will simplify your life considerably if you can get the DBMS to handle the inserting or updating of time data for you.

Writing to the Database

The first (and often the most common) thing a backend application has to do is insert data into the database, but to do that you need to be able to send data to the server along with the URL you are wanting to access. Now, HTTP’s GET method, which we played with a bit last week, can pass parameters, but it has a couple of big limitations. First is the matter of security.

The way the GET method passes data is by appending it to the URL it is getting. For example, say you have a web page that is expecting two numeric parameters called “x” and “y”. The URL sent to the server would be in the form of:

http://www.mysite.com/mypage.html?x=3.14&y=1953

Consequently, the data is visible to even the most casual observer. Ironically though, the biggest issue is not the values that are being sent. The biggest security hole is that it exposes the names of the variables that make the page work. If a person with nefarious intent knows the names of the variables, they can potentially discover a lot about how the website works by simply passing different values and seeing how the page responds.

A second problem with the GET method’s way of passing data is that it limits the amount of data you can send. While the underlying internet standards don’t mandate a hard limit, in practice browsers and servers commonly cap URLs at around 2048 characters.

To deal with both of these issues, the HTTP protocol defines a method called POST that works much like GET, but without exposing the internal operation or limiting the size of the data it can pass. Here is an example of a simple LabVIEW VI for writing a temperature reading to the database.

Insert to database with PHP

This code is pretty similar to the other HTTP client functions that we have looked at before, but there are a few differences. To begin with, it is using the POST method we just talked about. Next, it is formatting and passing a parameter to the method. Likewise, after the HTTP call it verifies that the text returned to it is the success message. But where do those values (the parameter name and the success message) come from? As it so happens, I didn’t just pull them out of thin air.
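If you want to exercise the PHP from something other than LabVIEW while testing, the same transaction can be made with a few lines of Python. This is only a sketch — the URL is a placeholder for wherever you put the PHP file, while the parameter name and success string come from the PHP shown below.

import requests

# Post one reading; "tempData" and "1 Record Added" must match the PHP script
resp = requests.post("http://localhost/write_temp.php",    # placeholder URL
                     data={"tempData": 23.5})
resp.raise_for_status()
if resp.text.strip() != "1 Record Added":
    raise RuntimeError("Server reported: " + resp.text)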

Check out the URL the code is calling and you’ll note that it is not a web page, but a PHP file. In fact, what we are doing is asking the server to run a short PHP program for us. And here is the program:

<?php
	if($dbc = mysqli_connect("localhost","myUserID","myPassword","myDatabase")) {
    	// Check connection
		if (mysqli_connect_errno()) {
			// There was a connection error, so bail with the error data
			echo "Failed to connect to MySQL: " . mysqli_connect_error();
		} else {
			// Create the command string...
			$insert = "INSERT INTO temp_reading (temperature) VALUES (" . $_POST["tempData"] . ")" ;
			
			// Execute the insert
			if(!mysqli_query($dbc,$insert)){
				// The insert generated an error so bail with the error data
				die('SQL Error: ' . mysqli_error($dbc));
			}
			
			// Close the connection and return the success message
			mysqli_close($dbc);
			echo "1 Record Added";
		}
	}
?>

Now, the first thing that you need to know about PHP code is that whenever you see the keyword echo being used, some text is being sent back to the client that is calling the PHP. The first thing this little program does is try to connect to the database. If the connection fails, the code echoes back to the client a message indicating that fact and telling it why the connection failed.

If the connection is successful, the code next assembles a generic SQL statement to insert a new temperature into the database. Note in particular the phrase $_POST["tempData"]. This statement tells the PHP engine to look through the data that the HTTP method provided and insert the value associated with the parameter named tempData into the string it’s building. This is where we get the name of the parameter we have to use to communicate the new temperature value. Finally, the code executes the insert that it assembled and echoes back to the client either an error or a success message.

Although this code would work “as is” in many situations, it does have one very large problem. There is an extra communications layer in the process that can hide or create errors. Consider that a good response tells you that the process definitely worked. Likewise receiving one of the PHP generated errors tells you that the process definitely did not work. A network error, however, is ambiguous. You don’t know if the process failed because the transmission to the server was interrupted; or if the process worked but you don’t know about it because the response from the server to you got trashed.

PHP Mediated Database Reads

The other thing that backend applications have to do is fetch information (often configuration data) from the database. To demonstrate the process, we go back to the GET method:

Fetch from database with PHP

As with the first example, the target URL is for a small PHP program, but this time the goal is to read a time-ordered list of all the saved temperatures and return them to the client. However, the data needs to be returned in such a way that it will be readable from any client — not just one written in LabVIEW. In the web development world the first and most common standard is to return datasets like this as an XML string. This standard is very complete and rigorous. Unfortunately, it tends to produce strings that are very large compared to the data payload they are carrying.

Due to the size of XML output, and the complexity of parsing it, a new standard is gaining traction. Called JSON, for JavaScript Object Notation, this standard is really just the way that the JavaScript programming language defines complex data structures. It is also easy for human beings to read and understand, and unlike XML, features a terseness that makes it very easy to generate and parse.

Although there are standard libraries for generating JSON, I have chosen in this PHP program to generate the string myself:

<?php
    if($dbc = mysqli_connect("localhost","myUserID","myPassword","myDatabase")) {

    	// Check connection
		if (mysqli_connect_errno()) {
			echo "Failed to connect to MySQL: " . mysqli_connect_error();
		} else {
			// Clear the output string and build the query
			$json_string = "";
			$query = "SELECT temperature FROM temp_reading ORDER BY sample_dttm ASC";
			
			// Execute the query and store the resulting data set in a variable (called $result)
			$result = mysqli_query($dbc,$query);
			
			// Parse the rows in $result and insert the values into a JSON string
			while($row = mysqli_fetch_array($result)) {
				$json_string = $json_string . $row['temperature'] . ",";
			}
			
			// Replace the last comma with the closing square bracket and send
			// the string to the waiting client
			$json_string = "[" . substr_replace($json_string, "]", -1);
			echo $json_string;
			
			// Close the connection
			mysqli_close($dbc);
		}
	}
?>

As in the first PHP program, the code starts by opening a database connection and defining a small chunk of SQL — this time a query. The resulting recordset is stored in a data structure that the code can access on a row-by-row basis. A while loop processes this data structure, appending the data into a comma-delimited list enclosed in square brackets, which is the JSON notation for an array. This string is echoed to the client, where a built-in LabVIEW function turns it into a 1D array of floats.
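Again, purely as a test aid, the same round trip can be made from Python; because the reply is standard JSON, it parses straight into a list of floats (the URL is again a placeholder):

import requests

resp = requests.get("http://localhost/read_temps.php")     # placeholder URL
resp.raise_for_status()
temperatures = resp.json()      # e.g. [22.4, 22.6, 23.1]
print(temperatures)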

This technique works equally well for most complex datatypes, so you should keep it in mind even if you are not working on a web project. I once had an application where the client wanted to write the GUI in (God help them) C# but do all the DAQ and control in LabVIEW-generated DLLs. In this situation the C# developer had a variety of structs containing configuration data that he wanted to send me. He, likewise, wanted to receive data from me in another struct. The easy solution was to use JSON. He had a standard library that would turn his structs into JSON strings, and LabVIEW would turn those strings directly into LabVIEW clusters. Then on the return trip my LabVIEW data structure (a cluster containing a couple of arrays) could be turned, via JSON, directly into a C# struct. Very, very cool.

The Big Tease

So what is in store for next time? Well something I have been noticing is that people seem to be looking for new ways to let users interact with their applications. With that in mind I thought I might take some time to look at some nonstandard user interfaces.

For my first foray into this area I was thinking about a user interface where there are several screens showing different data, but by clicking a button you can pop the current screen out into a floating window. Of course, when you close the floating window, it will go back to being a screen that you can display in the main GUI. Should be fun.

Until Next Time…

Mike…

Expanding Data Processing Bandwidth — Automatically

Well-written software can typically deal with any performance requirement pretty easily as long as the requirement is constant. It’s when requirements change over time that things can get dicey. For example, if your test system generates a new data packet to process every second and it consistently takes 5 seconds for the data to be processed, a little simple math will tell you how much processing bandwidth you need to create to keep up with the flow of data. But how are you to properly size things when variability is inserted into the process? What if the time between data packets can vary between 100 msec and several minutes? Or what if the data processing time can change dramatically due to things like network traffic?

These are the kind of situations where the processing needs to be more than simply “flexible”, it has to be able to automatically maintain its own operation and reconfigure itself on the fly. To demonstrate one possible implementation of this “advanced” technique, we will build on the simple pieces that we have learned in the past. In some ways, good software design techniques are like Lego blocks. Each one by itself is not very impressive, but when you stick them together, magic happens. But before we can stick anything together, we need to understand…

…what we’re going to do.

You’ll notice that any of the bandwidth management challenges that I mentioned earlier can be addressed by either adding more data processing clones, or removing existing ones that are being underutilized. Consequently, the question of how to implement this self-maintenance functionality really gets down to a matter of how to dynamically manage the number of data processing clones that are currently available — which in turn boils down to answering two very simple questions:

1. How do we know we need more?

Given that the whole point of the exercise is to manage a queue, the current state of that queue will give us all the information we need to answer this question. Specifically, we can know when more processing bandwidth is needed by monitoring how many items are currently in the queue awaiting processing. When the code starts to see the depth going steadily up, it can launch additional processes to handle the data backlog. Of course, this functionality assumes that there is a process constantly monitoring the queue and managing that aspect of its operation — which we actually already have in the test code from last week (Data Processing Queue Handler.vi). All we have to do is repurpose this VI to be a permanent part of the final application.

2. How do we know a clone is no longer needed?

The one part of the system that knows whether a clone is being under-utilized is, in fact, the clone itself. As a part of its normal operation, it knows and can keep track of how often it is being used. Having said that, there are (at least) a couple of ways to quantify how much a clone is being utilized. We could, for instance, consider how much time the clone spends processing data versus how much time it spends waiting to receive data to process. If the utilization percentage drops below a given limit, the clone could then shut itself down. However, for this demonstration, I’m going to use a much simpler criterion that, quite frankly, works pretty well. The code will simply keep track of how many times in a row it goes to the queue and doesn’t find any data.
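Reduced to pseudocode, this criterion is about as simple as it sounds. Here is a sketch (the limit of 10 consecutive empty checks is an arbitrary placeholder for the configuration value we will add shortly):

# The "am I still needed?" test a clone performs on itself
def still_needed(got_data, empty_checks, limit=10):
    """Return (keep_running, updated_empty_check_count)."""
    if got_data:
        return True, 0                    # any real work resets the counter
    empty_checks += 1
    return empty_checks < limit, empty_checks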

Code Modifications

Before I start describing the changes that we will need to make in order to fashion this new ability, I want to consider for a moment the thing that won’t have to change: Queue Test.vi. You might be tempted to say, “Well big deal. All it does is stuff some data into the queue every so often. Who cares if it doesn’t have to change? It’s not even deliverable code.”

While that is undoubtedly true, the fact of the matter is that this test routine is important, but not because of what it is or what it does. Rather we care about Queue Test.vi because in our little test environment, it represents the rest of our application — or at least that part of it that is generating data. Consequently, the fact that it doesn’t need modification means that your main application, likewise, won’t need modification if you decide to upgrade from a data processing environment that uses a fixed number of data processors to one that dynamically manages itself.

Data Processing Queue Handler.vi

First, note that previously this routine’s primary job was to simply report how deep the data queue was — a bit of functionality that would likely have not been needed in a real application. Now however, this routine is going to be taking an active part in the process, so I started the modifications by adding the error reporting VI that will transfer errors it generates to the exception handler.

New Queue Handler Timeout Case

In addition, because the software will initially only start a single data processing clone, I also modified the timeout event handler that performs the VI’s initialization, by removing the loop around the clone launching subVI.

The remainder of the modifications to this routine occurs in the event handler for the Check Queue Size UDE. Previously this event only reported how deep the queue was getting. While it still performs that function, the queue depth information now also drives the logic that determines whether we have enough processing bandwidth online.

New Queue Handler Queue Size Check Case

Note that the queue depth is compared to a new configuration value called Max Queue Size that defines how large the queue can grow before a new data processing clone is launched. Regardless of whether it launches a new clone, the event handler calls another new subVI that returns the number of clones that are currently running. As you will see in a moment, one of the modifications to the data processing VI is the addition of logic that keeps track of the names of the clones that are running. The subVI that we are calling here returns a count of the number of names that have been recorded so far.
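Stripped of the LabVIEW plumbing, the decision being made in this event handler amounts to something like the following sketch. The registry list stands in for the name-tracking FGV, launch_clone() stands in for the dynamic-launch subVI, and the threshold of 3 simply mirrors the limit discussed later in testing.

registry = ["Data Processor 1"]           # one clone is started at launch

def launch_clone():
    registry.append("Data Processor %d" % (len(registry) + 1))

def check_queue_size(queue_depth, max_queue_size=3):
    if queue_depth > max_queue_size:      # backlog too deep: add bandwidth
        launch_clone()
    return len(registry)                  # current clone count, from the "FGV"

print(check_queue_size(5))                # -> 2 (a new clone was launched)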

Data Processor.vi

Turning now to the data processing code itself, the first stop is the state machine’s Initialize state. Here we have all the same logic that existed before, but with a couple of minor additions.

New Data Processor Initialize

First, there is a new subVI that registers that a clone is starting up. This subVI writes the clone’s name to the FGV that is maintaining the clone count. Second, there is also a new shift register carrying a cluster of internal data that the clone will need to do its work. All that is needed during initialization is to set a timestamp value. The Check for Data state is next, and it has likewise seen some tweaks — the most significant of which is moving the logic for responding to the dequeue operation into a subVI.

New Data Processor Check for Data

The justification for this move lies in the fact that this logic is now also responsible for determining whether or not the clone is being adequately utilized. As I stated before, each time the clone goes to the queue and comes up empty, the logic will increment a counter that is being carried in the new shift register’s data. If this count exceeds a new configuration value, Clone Idle Count, the code will branch to a new state that shuts down the clone. Likewise, anytime the clone does get data to process, it will reset the count to 0. The changes to the Process Data state, which comes next, are pretty trivial.

New Data Processor Process Data

All that happens here is that the timestamp extracted from the data to be “analyzed” updates the new cluster data — as well as the indicator on the front panel. Finally, there is the new state: Self Shutdown

New Data Processor Self Shutdown

…which simply calls a subVI that removes the clone’s name from the FGV maintaining the list of running clones, and then stops the event loop.

Let’s talk about “Race Conditions”

All we have left to do now is test this work and see the differences that it makes, but before we can do that, we need to have a short conversation about race conditions. Very often developers and instructors (myself included) will talk about the necessity of avoiding race conditions. The dirty little secret is that as long as you have multiple things happening in parallel, race conditions will always be present. The real point that these admonitions attempt to make is that you should avoid the race conditions that are unrecognized and potentially problematic.

I bring this point up because as you do the following testing, you may get the chance to see this concept in action. The way it will appear is that the system will launch a new clone immediately after one kills itself off for being underutilized. The reason for this apparent logical lapse is that a race condition exists between the part of the code that is checking to see if another clone is needed and the several places where the clones are deciding whether or not they are being used. There are two causes for this race condition, one we can ameliorate a bit and one over which we have no possible control.

Starting with the cause we can’t control, a simple immutable law of nature is that no matter how sophisticated our logic or algorithms might be, they can not see so much as a nanosecond into the future. Consequently, the first source of a race condition is that when the queue checking logic sees that there are three elements enqueued, it has no way of knowing that a currently active clone will be available in a few milliseconds. While it is true that under certain circumstances it might be possible to provide this logic with a bit of “foresight”, there is no generalized solution to this aspect of the problem. Consequently, this is an issue that we may just have to live with.

The news, however, is better for the second cause. Here the problem is that with all the clones having the same timeout between data checks, it is probable that sooner or later one of the clones is going to become “synchronized” with the others such that it is always checking the queue just after it was emptied by one of the others. However the solution to this problem lies in its very definition. The cure is to see to it that the clones do not have constant timeouts from one check to the next. To implement this concept in our test code I modified the routine that returns the delay to add a small random difference that changes each time it’s called.
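Conceptually the modification is tiny — something on the order of this sketch (the 5-second base delay is the INI file value mentioned earlier; the size of the jitter is an arbitrary choice here):

import random

def polling_delay(base_delay_sec=5.0, jitter_fraction=0.1):
    # Vary the delay a little on every call so the clones cannot stay in lockstep
    return base_delay_sec * (1.0 + random.uniform(-jitter_fraction, jitter_fraction))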

The bottom line is that while completely removing all race conditions is not possible, they can be managed such that their impacts are minimized.

New Tests for New Code

The testing of the modified code starts the same as it did before: open and launch Data Processing Queue Handler.vi and Queue Test.vi. The first difference that you will notice is that only 1 clone is initially launched, but at the default data rate, 1 clone is more than adequate.

Now decrease the delay between data packets to 2 seconds. Here the queue depth will bounce around a bit but the clone count will stabilize at 3 or 4.

Finally, take the delay all the way down to 1 second. Initially the clone count may shoot up to 8 or 9, but on my system the clone count eventually settled down to 6 or 7.

At this point, you can begin increasing the delay again and slowly the clones will start dropping out from disuse. Before you shut down the test, however, you might want to set the delay back to 2 seconds and leave the code running while you go about whatever else you have to do today. It can be instructive to notice how other things you are doing on the same computer affect the queue operation. You might also want to rerun the test, but start Queue Test.vi first and let it run for a minute or so before you start Data Processing Queue Handler.vi — just to see what happens.

Further Enhancements?

So we have our basic scalable system completed, but are there things we could still do to improve its operation? Of course. For example, we know that due to timing issues which we can only partially control, the number of clones that is running at one time can vary a bit, even if the data rate is constant. One thing that could be done to improve efficiency would be to change the way clones are handled. For example, right now a data processor is either in memory and running or it is closed. One thing you could do is create a new state that a clone could be in — like loaded into memory, but inactive. You could implement this zombie state by setting the timeout to 0, thus effectively turning off the state machine.

It might also be helpful to soften the queue depth limit at which a new clone is created. Instead of launching a new clone anytime the queue depth exceeds 3, it might be useful in some situations to maintain a running average and only create a new clone if the average queue depth over the last N checks is greater than 3.
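One way to sketch that softer, averaged threshold (both N and the limit of 3 are placeholders you would tune for your own application):

from collections import deque

class QueueDepthAverager:
    """Launch a clone only when the average backlog over the last N checks
    is too deep, rather than reacting to a single momentary spike."""
    def __init__(self, n=5, limit=3):
        self.samples = deque(maxlen=n)
        self.limit = limit

    def needs_clone(self, queue_depth):
        self.samples.append(queue_depth)
        return sum(self.samples) / len(self.samples) > self.limit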

Who knows? Some of you might think of still other modifications and enhancements. The point is to experiment and see what works best for your specific application.

Parallel Data Processing — Release 2

The Big Tease

So what is in store for next time? Well in the past we have discussed how to dynamically launch and use VIs that run as separate processes. But what if the code you want to access dynamically like this happens to exist in a process that you have already compiled into a standalone application? If this application is working you don’t want to risk breaking something by modifying it. As it turns out there are ways to manage and reuse that code as well, even if it was created in an older version of LabVIEW. Next time we’ll start exploring how to do it.

Until Next Time…

Mike…

Expandable Data Processing

When creating an application, a common approach to managing the computer’s bandwidth is to structurally isolate the portions of the application that are doing the data processing from those implementing the data acquisition. The goal of this segmentation is to prevent the operation of the one from impacting the speed of the other. But if we think about what we learned from the posts preceding this one, we can see that this use case is just a specialized instance of a general principle. Namely, allowing processes to run in parallel reduces the likelihood that their execution will impede the operation of their peers by improving the overall utilization of the computer’s resources. Consequently, it behooves us to consider how we can allow data processing to run in parallel with the rest of the application’s code — and so garner a few of the many advantages that this approach offers.

Parallel Data Processing

It should first be noted that we already have most of the tools and concepts that we will need to build this parallel data processing capability. In fact, one possible implementation is not so different from our general approach to building state machines. So with that idea in mind, let’s consider our prospective data processor’s high-level functional requirements.

Just the One?

First, since the idea is to improve efficiency by allowing more things to happen in parallel, we need to ask ourselves the obvious question, “Why stop at just one data processor?” This question is particularly urgent if you already know that your data processing takes longer than the time required to acquire another dataset. It is an appealing idea to have 2 or more data processors that can share the processing load. However, if you are going to be running multiple instances simultaneously, the process has to be reentrant. But that’s not a problem; we’ve done reentrant processes before and know how to make that work.

State-ly Expandability

Second, we need to consider the data processor’s internal structure. I have given this point away already by talking about state machines, so let’s consider why this choice is a good one. At first glance it might seem like a state machine is overkill for this application; after all, you might figure that there are at most 2 states: Wait for Data and Process Data. This line of reasoning is valid, but it assumes that the data processing is a one-step process, so are there circumstances where this assumption isn’t valid?

If your processing is going to involve the complex analysis of potentially large datasets, you might not want to spend a lot of time processing flawed or invalid data. Hence before jumping right into the full-blown analysis it would be reasonable to have a state that does a quick sanity check on the data — and then another to handle the error that results if the data is bad. Next, consider what you are doing with the data when you are done with the analysis. There could be added limit checking of the results, and perhaps transfer of the results to a database or automated report generator — all tasks that could (and probably should) be expressed as separate states. Looking back now, by my count we are now up to 6 states — and we are still assuming that the analysis process itself is a single atomic operation that you will not want or need to breakup.

My point is that if you start with the basic outline of a state-machine and end up never having more than the two states, there’s no harm done because that basic structure is pretty lean. However if, as your project progresses, you discover that more states are needed, having that infrastructure already be in place can save a lot of time.

By the way, as a quick aside, I have often had the experience of having a customer request a significant change and then having them be amazed at how little time it takes to implement the modification. I have even had customers comment on how “lucky” I was that the code was structured in such a way that the change was even possible. Well, let me tell you in the strongest terms possible that if luck was involved, it was luck of my own making. The better you design your code up front, the more “lucky” you will be when the time comes to make modifications.

The Right Kind of Communications

Third, we have to look at how we are going to communicate with these cloned state machines. In past work on our testbed application, we have used UDEs and FGVs for communications between processes, and they might work here as well, but there would be problems. If you have one UDE or FGV feeding data to all the cloned data processors, you have the problem of deciding which clone handles each new update. While you don’t want to miss any updates, you certainly don’t want to process the same dataset twice either. The whole process would be fraught with opportunities for errors and race conditions. Of course you could solve that problem by creating a separate UDE or FGV for each clone, but then scalability goes down the tubes because the process(es) generating the data must keep track of how many clones there are and which one will get the next dataset.

Our way out of this conundrum is to properly apply a structure that people so often misuse: the queue. This is the use case for which queues were born. The key feature that makes them so good in this sort of situation is that if you have multiple receivers waiting to dequeue data from the same queue, LabVIEW guarantees that each new item inserted will be seen by only one of the receivers, and that enqueued data will be distributed to the receivers on a first-come, first-served basis. Moreover, because queues can be named, different processes running in the same instance of LabVIEW don’t need the queue reference to be sent to them. All they need to know is the name of the queue and they can get their own reference.
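The behavior being described — many waiting receivers, with each enqueued item delivered to exactly one of them — is easy to see in miniature with any thread-safe queue. Here is a quick Python model of the idea (in LabVIEW, of course, the same effect comes from simply handing every clone the name of the same queue):

import queue
import threading

work = queue.Queue()                      # stands in for the named LabVIEW queue

def clone(name):
    while True:
        item = work.get()                 # each item is seen by exactly one clone
        if item is None:                  # sentinel used here just to end the demo
            break
        print(name, "processed", item)

clones = [threading.Thread(target=clone, args=("Clone %d" % i,)) for i in range(3)]
for c in clones: c.start()
for i in range(6): work.put("dataset %d" % i)
for _ in clones: work.put(None)           # one sentinel per clone
for c in clones: c.join()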

Yes, queues do still require polling, but at least there’s only one part of your code involved, and not the whole application. Moreover, there are two other mitigating factors present. First, in this sort of application the poll rate can often be much slower, on the order of once every few seconds. Second, because the state machine resides inside an event structure, it would take but a moment to implement logic that would allow the polling to be turned off altogether.

Let’s Look at the Test Code

As we start considering the code to implement this functionality, let’s first look at the test routines that we will use to evaluate the queue functionality. The VI Data Processing Queue Handler.vi has the responsibility of starting everything off by initializing the data processing queue and launching a predetermined number of data processing clones. This logic takes place in the VI’s Timeout event handler.

Queue Handler Timeout Case

You will notice that to simplify the process of obtaining a queue reference, I have encapsulated that logic in a set of subVIs for interacting with the queue. The other event of significance is the handler for the Check Queue Size UDE. While a test is running, we want to be able to monitor the number of items in the queue, but rather than simply polling the queue status, I created a UDE that flags the handler every time a new item is enqueued.

Queue Handler Queue Size Check Case

When the event fires, the handler calls the built-in queue function that returns, in addition to some other stuff that we don’t need, the number of items that are currently in the queue. Next, to feed data to the queue, I created a second test VI called Queue Test.vi. Its whole job is to wait a delay period specified on the front panel and then enqueue an item into the data processing queue.

Queue Test

You can see that at this point, the only value in the queue data is a timestamp, but the data is defined as a typedef. Remember! Any time you are creating a datatype that will be accessed by reference, whether it be a UDE, a queue, a notifier or an event, always make the data structure a typedef.

After the data value is enqueued, the code fires the event that tells the handler to check the queue size.

Introducing the Data Processor

Turning finally to the reentrant data processing state-machine (Data Processor.vi) we see that in addition to an event for stopping, the VI’s Timeout event handler includes the logic for three states, the first of which is Initialize.

Data Processor Initialize

This state’s job is to get the clone ready to start processing data. Consequently, it initializes the shift register holding the queue reference, and a second shift-register carrying a boolean value that we will discuss in a moment. The state also sets the next state to be executed to Check for Data, and retains a timeout value of 0 ms so, assuming that there are no errors during initialization, the state machine will immediately start waiting for data to process. Note that the Initialize state also opens the VI’s front panel. You probably would not want this feature in deliverable code except as, perhaps, a debugging option. I have included it here to make it easier for you to see the code at work.

Data Processor Check for Data

The Check for Data state starts by calling the queue subVI that is responsible for dequeuing an item. Inside this subVI, the dequeuing function is given a timeout of 0 ms, so if there is no data immediately available, the call will terminate with the timeout flag asserted. This Boolean value is inverted and passed out of the subVI to indicate to the calling code whether there is any data that needs processing. If the Check for Data state logic finds this bit set, the code sets the next state for execution to Process Data with a timeout value of 0 ms. If the bit indicating that data is available is not set, the code retains Check for Data as the next state to execute but sets the timeout to a longer value (5 sec) read from the application’s INI file.

Before we go on to talk about the Process Data state, we need to have a quick conversation about the boolean shift register. Normally, when the standard Stop Application event fires, a VI wants to immediately stop what it’s doing and abort. However in one significant way, this is not a “normal” VI. In order to protect the data that has been acquired, this process should only stop if the queue driving it is empty. To create that functionality, the VI incorporates deferred shutdown logic in the form of this shift-register. Because the value is initialized to false the loop will, during normal operation, continue regardless of whether data is available or not. However, when a shutdown is requested, the event handler does not immediately stop the loop, but instead sets the shift-register value to true and branches to the Check for Data state with a 0 ms timeout. If the queue is empty, the process will end at that time. However, if the queue is not empty, the VI will continue toggling between the Check for Data and Process Data states until the queue contents are exhausted.
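Boiled down, the rule carried by that boolean shift register is just this (a sketch of the decision, not of the actual event-structure code):

# Deferred shutdown: keep looping until a stop has been requested AND the
# queue has been drained
def keep_running(shutdown_requested, queue_empty):
    if not shutdown_requested:
        return True                       # normal operation
    return not queue_empty                # stopping: only quit once the queue is empty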

Data Processor Process Data

As you would expect, the Process Data state basically consists of processing the last data dequeued and branching back to the Check for Data state to look for more. However, given that the only data in our test queue is a timestamp, you have probably guessed that the actual data processing to be done isn’t very expansive — and you’re right. In fact, the “data processing” consists largely of a wait, the duration of which varies at random between 4 and 6 seconds.

Putting it to the Test

To test this code, open and run Data Processing Queue Handler.vi. You will immediately see the front panels of three data processing clones open. Move them so they aren’t overlapping each other or anything else.

Now open and run Queue Test.vi. After a few seconds it will enqueue an item and then enqueue a new one every 6 seconds. Note that as each clone handles an item it will display that item’s timestamp on its front panel. Note also that the indicated queue depth never exceeds 1.

Now change the delay on Queue Test.vi from 6 sec to 2 sec. You will notice that the queue depth chart is now updating faster. Likewise, the queue depth will begin to show momentary increases to a depth of two, but the chart will always drop back to 1. In other words, there might be slight delays now and again, but for the most part three clones can keep up with the flow of data.

Finally, drop the delay to 1.5 seconds. With data coming at this rate, the queue depth will continue to go up and down, but now it is always going up more than it is going down. This overall upward trend shows us that we have reached the point where three clones are getting overwhelmed by the amount of data being enqueued.

Queue Overrun

Now if you increase the delay back to 2 seconds, the queue depth will gradually begin to decrease as the slower flow of new data allows the clones to begin catching up on the backlog. Alternatively, if you just click the stop button, the two test VIs will stop and close immediately, but the clones will continue running until the queue empties out.

Parallel Data Processing — Release 1

The Big Tease

So we have learned the basics behind creating an environment for an application that supports an expandable data processing capability. For many applications this simple structure will be more than adequate, but (as we have seen) if the data starts coming too fast the queue can grow without limit. Of course this isn’t necessarily a problem if the periods of high data generation are interspersed with periods of comparative idleness. However this sort of variability can be a bit of a two-edged sword. The periods of low data throughput can give the system time to recover from a large backlog of data, but this variability can also make it difficult to estimate how many copies of the data processing process will be needed. Pick a number that is too high, and you’re wasting computer resources. Pick a number that is too low and you could still end up with a situation where too much data is being queued — perhaps to the point of running out of memory.

Well, the next time we get together we’ll look at how to modify the basic structure we have created thus far to add the ability for the software to decide on the fly when more clones are needed, and when to kill off existing ones that aren’t being used.

Until Next Time…

Mike…

Pay Attention to Units

The title for this post was a common quote from a favorite teacher of mine in high school. Mr Holt taught chemistry and physics and it seemed like I spent most of my life in his classroom — and I guess I did because in four years of high school I took 3 years each of chemistry and physics. Being a small school in a small Missouri town with a small budget we spent a lot of time (especially in physics) working on what would today be called, “mathematical modelling”. Back then, we just called it, “figuring out what would likely happen if we did something that we can’t really try because there isn’t enough money available to actually run the experiment”.

It was in those classes that I began to develop an understanding of, and appreciation for, all the pieces of contextual information that surrounds numbers. For example, if you have the number 4 floating around in a computer the most basic thing you can define about that number is what that 4 is counting or quantifying. Is it the mass of something (4 pounds)? Is it the cost of something in England (also, interestingly, 4 pounds)? Or is it a measure of how much your college roommate could drink before passing out (4 beers)?

In my work today, where I sometimes still have budgetary constraints, the need to remember that I have to, “…pay attention to the units…” remains. The difference is that now I have a powerful ally in the form of the facility that LabVIEW provides for assigning units to floating point numbers.

LabVIEW Units Aren’t Smarter Than You Are

Before we get into the meat of the discussion, however, we need to deal with a few concerns and considerations that developers — even some very senior developers — have about using units. While there are a few behaviors that are clearly bugs, there are many more that people think are bugs but really aren’t. We should, of course, not be surprised by this situation. Anything related to floating-point will exhibit similar issues. For example, there used to be claims that LabVIEW didn’t do math correctly because the results of some calculations were different from what you got from Excel. Such claims ignored the fact that at the time Excel did all math in single precision, while LabVIEW has always used double-precision math.

Similarly, a lot of the upset over “bugs” in units handling comes from an incomplete understanding of the entire problem. The simple truth is that there are a lot of things we do unconsciously when handling values with units, so when we try to automate those operations, things can get messy. For example, we realize that sometimes when we square a number with units we want to square the unit, so meters become meters². But sometimes we just want to square the number. Does it really make sense to convert volts into volts²? In implementing the units functionality, NI had to deal with similar ambiguity in something approaching a systematic manner, so they made some decisions and then documented those decisions. We may disagree with those decisions, but disagreement is not the same thing as a bug.

Another complaint that you sometimes hear is that using units complicates simple math. People will say things like, “…all I want to do is increment a number and LabVIEW won’t let me.” Ok, let’s look at that problem. You say you want to add 1 to a value, but 1 what? Even assuming a given type of unit, like length, how is LabVIEW supposed to know what you want to do? Do you want to add 1 meter, 1 furlong, 1 light-year?

Finally, there can be issues caused by shortcuts we take in our everyday thinking and communications that are (to put it kindly) imprecise. For example, there are two units that can be, and often are, referred to as “PSI”. However, one is a force and one is a pressure, so mathematically they are very different beasts. In the end, a good way of summarizing a lot of this discussion is that using explicit units requires us to be more precise and careful in our thinking — which is always a good thing anyway.

How Units Work

The basic idea behind LabVIEW’s implementation of units is that it draws a sharp distinction between how a number is stored and how it is expressed. The first step in making this distinction is deciding what kind of unit the developer selected and then expressing the value they entered in the “base unit” for that kind of unit. For example, lengths are converted to meters, temperatures are converted to degrees Kelvin, and times are converted to seconds, and it is this converted value that exists on the wire. Likewise, when this wire encounters an indicator, LabVIEW converts the value on the wire to the units specified in the indicator before display. It is this basic functionality that makes simple math on timestamps possible.

pi time

In this snippet, each data source (the constants) converts the number it contains into seconds using its associated time unit. For example, 3 hours is converted into 10,800 seconds (60 x 60 x 3), 14 minutes is converted into 840 seconds (14 x 60), and the last input is left as is because it is already in seconds. Because all the numbers are now expressed in the same unit, we can simply add them together and then add the result to the timestamp — which is also expressed in seconds. By the way, this is one of my favorite places to use units because it does away with so many magic numbers like 86,400 (the number of seconds in a day).
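If it helps to see the rule written out, the whole model is “convert to the base unit at the control, do the math in base units, convert back at the indicator”. A couple of lines of Python reproduce the hour and minute conversions above (the dictionary of factors is obviously just a tiny subset of what LabVIEW knows about):

# Store everything in the base unit (seconds); convert only at the edges
TO_SECONDS = {"h": 3600.0, "min": 60.0, "s": 1.0}

def to_base(value, unit):                 # what a control or constant does
    return value * TO_SECONDS[unit]

def from_base(seconds, unit):             # what an indicator does
    return seconds / TO_SECONDS[unit]

total = to_base(3, "h") + to_base(14, "min")    # 10800 + 840 = 11640 seconds
print(total, from_base(total, "min"))           # 11640.0  194.0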

In terms of selecting the units to use, we have already seen one technique. On a control or constant you can show the units label and then enter or select the unit you want. To select the units, right-click on the units label and select the Build Unit String… option from the popup menu. The resulting dialog allows you to browse the standard units that LabVIEW supports and select or build the desired units.

However, if you have an existing number to which you want to apply a unit, you need to use a Convert Unit node. It looks a lot like the Expression Node but instead of typing simple math expressions into it, you enter unit strings.

unit conversion nodes

This snippet is one that I sometimes use when specifying timeouts. Note that a Convert Unit node is used both to apply and to remove units.

When applying units, the unit string tells LabVIEW how to interpret the unitless number. It does not, as some suppose, specify the specific unit that value will have. For example, you may tell LabVIEW to interpret a particular number as feet, but internally the value will still be stored (like all lengths) in meters. When removing units, the node’s unit string tells LabVIEW how to express the internal value. In the above snippet, time is always carried as a floating point number of seconds, so the node driving the output causes LabVIEW to express the number of seconds as milliseconds.

Dealing with Temperatures

Another place I like to use units is with temperatures, but let’s be honest, things can seem to get a bit strange here — or at least they can seem strange until you understand what is going on. The root of this apparent strangeness lies in the fact that there are absolute temperatures (like the room is 72° Fahrenheit) and there are relative temperatures (like the temperature went up 5° Fahrenheit in the last hour). Moreover, as we think about and manipulate temperatures we unconsciously, and automatically, keep track of the difference between the two. Unfortunately, when using a computer nothing happens unconsciously or automatically. Things happen because we tell them to.

The first thing NI had to decide when implementing this functionality was what base unit to use for temperature, and they picked Kelvin because it is metric and 0° K is the coldest something can get. Being metric is important because it means that a change of 1° K is the same size as a change of 1° C. By the way, did you notice what your brain just did? It went from thinking about absolute temperatures to thinking about relative temperatures — without thinking about it. This is what computers can’t do. Computers need some sort of indication or hint to tell them how to think about things, and in the context of temperature, this hint needs to provide a means for drawing a distinction between the units used for absolute temperatures and the units used for relative temperatures. To see this issue in action, let’s revisit the previous time example but recast it to use temperature units:

bad temperature calculation

Here we are taking a temperature value of 20° C and incrementing it by 5° C. Unfortunately, when you run this example, the result is 298.15° C. “But,” you protest, “I thought that converting everything to the same units allowed you to perform any math you wanted with impunity. Where did this wild result come from? Surely this is a bug!”

Actually, it’s not. The only “problem” is that LabVIEW did exactly what the programmer told it to do, and not what they wanted it to do. LabVIEW added together two absolute temperature values and came up with an answer they didn’t expect. While it’s true that adding absolute temperatures probably doesn’t make any logical sense — how is LabVIEW supposed to know that? The real error is with the way the developer framed the problem, not with LabVIEW or its handling of units. What the developer obviously intended was to add a relative temperature value to the absolute temperature. To do that, you have to change the units of the increment input.

The way that LabVIEW identifies a relative temperature is to change the unit label from degC or degF, to Cdeg or Fdeg. By the way, now that you have the unit definitions correct, you can freely mix and match temperature scales. For example, you can add a 5° F increment to a 23° C room temperature and LabVIEW will happily give you the right answer (25.7778° C).

good temperature calculation
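If you want to experiment with this distinction outside LabVIEW, the same line between absolute and relative temperatures shows up in Python’s pint package (again, just an illustrative stand-in, not anything the LabVIEW code uses):

import pint

ureg = pint.UnitRegistry()
Q_ = ureg.Quantity

room = Q_(23, ureg.degC)        # an absolute temperature
bump = Q_(5, ureg.delta_degF)   # a relative temperature (an increment)

print(room + bump)              # roughly 25.78 degree_Celsius

# Adding two absolute temperatures is rejected outright:
# Q_(20, ureg.degC) + Q_(5, ureg.degC)  raises OffsetUnitCalculusError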

More Calculations

Other math operations can sometimes be confusing as well. For example, we have already discussed why you can’t simply increment or decrement numbers with units, but the multiply and divide operations do not have the same restriction. You can multiply or divide a number with units by a unitless number with no problems. This apparent inconsistency makes sense when you remember that these two operations, used in this way, simply scale the value that is there, regardless of how it is stored internally. Also, NI has decided that many operations only operate on the numeric value. For example, if you square 5 meters using the square function, you get 25 meters. If you want the operation to apply to the unit as well, you have to use the multiply function, which will generate the squared unit.

But moving beyond these issues, there are many operations that units will simplify, and they can even prevent you from making some kinds of logical errors. For instance, dividing a distance by a time will automatically generate a velocity unit (meters/sec by default). In the same way, dividing a velocity by a time will calculate an acceleration. And these conversions work the other way too: multiply a velocity by a time and you get a distance (by default, in meters).

The only negative here is that some of the unit labels can look a bit odd at first glance — though they are mathematically correct. For example, meters per second is shown as s^-1 m. However, despite the strangeness of this representation, you can alter the unit displayed by changing either of the base units shown. If you wanted feet per second, the unit string would be s^-1 ft, or miles per hour would be h^-1 mi. By the way, if you want to hide these potentially confusing representations from your users, the units string is a writable property so you could create a popup menu on a control that shows prettier labels.

unit selector

Note that the enumeration making the selection from the array has three values: Miles per Hour, Kilometers per Hour and Meters per Second.

Carrying on further, the unit system understands Ohm’s Law: divide a voltage by a resistance and you get amps. Multiplying ohms and amps returns volts, and the calculations even know that the reciprocal of a frequency is a time. All in all, this is some very cool stuff that can significantly improve your code — and perhaps prevent you from making some silly mistakes. But as with all things new, work your way into it slowly, and thoroughly double-check the results when you try something different.
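Here is that behavior sketched one last time with pint (still just a Python stand-in for LabVIEW’s unit engine): dividing compatible quantities produces the derived unit automatically, and the electrical units really do follow Ohm’s Law.

import pint

ureg = pint.UnitRegistry()
Q_ = ureg.Quantity

speed = Q_(100, "meter") / Q_(9.58, "second")
print(speed.to("mile / hour"))   # about 23.35 mile / hour

current = Q_(5, "volt") / Q_(100, "ohm")
print(current.to("ampere"))      # 0.05 ampere

period = 1 / Q_(50, "hertz")
print(period.to("second"))       # 0.02 second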

The Big Tease

There seems to be a rumor running around that I don’t like queues. Well, that is not true. They have a couple of features that make them uniquely qualified for certain kinds of operations. My next couple of posts will demonstrate a really good place to use a queue — a data processing engine that will dynamically resize itself on the fly to provide more or less processing bandwidth.

Until Next Time…

Mike…

Objectifying LabVIEW

I suppose a good place to start this post is with an admission that, in a sense, it is flying a false flag. One way that you could reasonably interpret the title is that in this post I am going to show you how to start using objects in LabVIEW. That interpretation is not correct, and the troublesome word is “start”. The fact of the matter is that you can’t use LabVIEW without interacting with objects, and many parts of it (think: VI Server) are overtly object-oriented — even without an obvious class structure. The language is built on an object-oriented foundation and so, in a very real way, has been object-oriented since Version 1.

What I am going to show you is how to simplify your work by building your own classes. As I stated in the teaser last time, the starting point for this discussion is the recommendation given in NI’s object-oriented training class that your first attempts at using explicit object-oriented techniques should be small, easy-to-manage subsystems — or, put more simply, that we need to start with baby steps.

Object-oriented baby steps

OK, so this is the point in the presentation where most presenters haul out some standard theory and moth-eaten descriptions of objects and classes — often lifted wholesale from a book on C++ programming. The problem with this approach, of course, is that we aren’t C++ programmers, and the amount of useful information we can draw from an implementation of object-oriented programming that is so fundamentally flawed is minimal at best. The approach I intend to take instead focuses on key aspects of the technique that are of immediate, practical importance to someone who is working in LabVIEW and wants to take advantage of explicitly implementing object-oriented class structures.

A Quick Glossary

The first thing we need is a vocabulary that will let us talk about the topic at hand.

OOP Clouds

Now be forewarned that some of these definitions may not exactly match what you may read elsewhere, but they are correct for the LabVIEW development environment.

  • Class — An abstract datatype.
    If you think that sounds a lot like the definition of a cluster, you’re right! Due to the way LabVIEW implements object orientation, a class is essentially a very fancy cluster. In fact, when you create a class the first item that LabVIEW inserts into it is a typedef consisting of an empty cluster. Although you don’t have to put anything into the cluster, it provides a place to put data that is private to that class.
  • Object — An instance of a class.
    As with a normal cluster, every instance of a class has its own memory space. Consequently, a class wire is in most ways the same as any other wire in LabVIEW. We are still working in a dataflow environment.
  • Property — A piece of data that tells you something about the object.
    This is why there is a cluster at the heart of the class. You want to put in that cluster information that will describe the object in a way that is meaningful to your application. Because each instance of the class is a separate wire that has its own memory space, the data contained in the cluster describes that particular object.
  • Method — A VI that is associated with a particular class and which does something useful.
    So what do I mean by, “…something useful…”? Well, that all depends on the class’ purpose. A class that is responsible for creating a visual interface might have a method that causes an object to draw itself, while a class that manages the interface to data storage would likely have a method to store or retrieve application data.

From this simple list of words we can begin to see the general shape of the arena in which we will be playing. To recap: A class is a kind of wire. An object is a particular wire. A property is data carried in the wire that describes it in a useful way, and methods use the object data to do something you need done.
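For readers who find it easier to think in text, here is a very rough Python analogue of those four words. To be clear, this is only an illustration with names I made up; LabVIEW classes are by-value wires in a dataflow world, so the mapping is conceptual rather than literal.

class TemperatureController:                 # the class: an abstract datatype
    def __init__(self, label, setpoint):
        self.label = label                   # properties: data that describes the object
        self.setpoint = setpoint

    def report(self):                        # a method: does something useful with that data
        return f"{self.label} is holding {self.setpoint} degC"

chiller = TemperatureController("Controller 1", 4.0)   # an object: one particular instance
print(chiller.report())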

Dynamic Dispatch

Now that we have a basic vocabulary in place that lets us talk about this stuff, there are a couple of concepts that we need to discuss. I want to start this exploration with the mechanism that LabVIEW uses to call methods. Referred to as dynamic dispatch, this feature is often a source of confusion to developers getting started with object-oriented programming. A good way to come to grips with dynamic dispatch is to compare and contrast it to a feature of LabVIEW with which you may already be familiar: polymorphism.

Polymorphism (from the perspective of the developer using a polymorphic subVI) is the ability of a single function to adapt to whatever datatype is wired to its inputs. For example, the low-level Add node in LabVIEW is polymorphic. Consequently, it can add scalar numerics of all types, as well as arrays of numerics of various dimensions, clusters of numerics, and even arrays of clusters of numerics.

Of course, from the perspective of the developer creating a polymorphic VI the view is much different. This flexibility doesn’t happen on its own. Rather, you have to create all the individual instance VIs that handle the various datatypes. For example, I often want to know if a value at a specific point in the code has changed from the last time this bit of code executed. So I created a polymorphic VI that performs this function. To create this subVI, I had to write variations of the same basic logic for about a half-dozen or so basic datatypes, as well as a version that used the variant datatype to catch everything else.

Dynamic dispatch (which is actually a form of polymorphism) works much the same way, but with a couple significant differences.

  • When the decision is made as to which instance VI is to be executed
    With conventional polymorphism, the decision of which instance VI to call happens as you wire in the subVI. In the case of my polymorphic subVI, as soon as I wire a U32 to the input, LabVIEW automatically selects the U32 version of the code. However, with dynamic dispatch, that decision gets put off until runtime with LabVIEW making the decision based on the datatype present on the wire as the subVI is called. Of course for that to work, you need a different kind of wire. Which brings us to the other point…

  • The criteria for choosing between VIs
    The wires that conventional polymorphism uses to select a VI all have one thing in common — they are all static datatypes. By that I mean that a wire is a U32, or a string or whatever and it can’t change on the fly. By contrast, with dynamic dispatch, the basis for selection is a wire that is an instance of a class, and the datatype of an object can be dynamic. However this variability is not infinite. A given class wire can’t hold just any object because class structure is also hierarchical.

Say you have a class named Geometric Shapes to Draw. You can define other classes (called subclasses) like Circle or Square that are interpreted by LabVIEW as being more specific instances of Geometric Shapes to Draw objects. Due to this hierarchical relationship, a given wire can be typed as a Geometric Shapes to Draw but at runtime really be carrying a Circle or Square. As a result, a dynamic dispatch VI can call different instance VIs based on the datatype at runtime.
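To make that concrete, here is a minimal Python sketch of the same Geometric Shapes to Draw idea (the class and method names are simply my stand-ins). The variable is typed, in effect, as the parent class, but the draw call that actually runs is chosen at runtime by the object on the “wire”, which is exactly what dynamic dispatch does with method VIs:

class GeometricShapeToDraw:
    def draw(self):
        raise NotImplementedError("subclasses override this method")

class Circle(GeometricShapeToDraw):
    def draw(self):
        return "drawing a circle"

class Square(GeometricShapeToDraw):
    def draw(self):
        return "drawing a square"

# Each element is "typed" as the parent, but the subclass's draw() is selected at call time.
for shape in (Circle(), Square()):
    print(shape.draw())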

However, one big thing that conventional polymorphism does have in common with dynamic dispatch, is that the power doesn’t come for free. You still have to write the method VIs for dynamic dispatch to call.

Inheritance

Remember how a moment ago I referred to class datatypes as being hierarchical? The fancy computer science concept governing the use of hierarchical class structures is called inheritance. The point of this label is to drive home the idea that not only are subclasses logically related to the classes above them in the hierarchy, but these so-called child classes also have access to the properties and methods contained in their parent classes. In other words, they can “inherit” or use data and capabilities that belong to their parents.

Handled properly, inheritance can significantly reduce the amount of code that you have to write. Handled poorly, inheritance can turn an otherwise promising project into a veritable train wreck. Which brings up our last point…

Proper Organization

Although organization isn’t really a feature of object-oriented programming, it is nevertheless critical. The simple fact of the matter is that while a disorganized, undisciplined developer might be able to get by when working in conventional LabVIEW, introducing the explicit use of classes can result in utter chaos. Every real object-oriented failure that I have seen over the years shared the same root problem: organization that was lacking or inconsistent.

So what sort of organizational things am I talking about? Well it’s a lot of the same stuff that we have talked about before. For a more general discussion of the topic you can check out a post that I wrote very early on titled, Conventional Wisdom. What I want to do right now is highlight some of the points that are particularly important for object-oriented work.

The two main conventions (directory structure and file naming) go together because the point of one is to mirror the other. But rather than simply list some rules, I’ll demonstrate how this works. To start, I will create a directory that is named for the class hierarchy that I will build inside it. So if the point of this class hierarchy is, for example, to update my program’s user interface, I would call the directory something obvious like GUI Update. Inside this directory I would then create the top-level class with the file name GUI Update.lvclass. At this time I will also create a couple of subdirectories (_subVIs and _typedefs) that I know I will undoubtedly be needing. Finally, I have learned over the years that being able to tightly control access to VIs is very important, so I will also create a project library named GUI Update.lvlib and put into it the top-level class and a virtual folder called _subclasses with its access scope set to Private.

So the parent class is set up, but what about the subclasses? I simply repeat the pattern. Let’s say the GUI Update class has subclasses for three types of controls that it will need to update: Boolean, Digital and Cluster. I create subdirectories in the parent directory that are named for the subclass that will go into each, and hierarchically name the three subclasses GUI Update_Boolean.lvclass, GUI Update_Digital.lvclass, and GUI Update_Cluster.lvclass. I am also careful to remember to add the subclass files to the _subclasses virtual folder in the library, edit their icon overlays, and set their inheritance correctly — which is to say, identify their parents. Note that while the hierarchical naming structure doesn’t automatically establish correct inheritance, this convention does make it easier to visualize class relationships in the project file.

And so I go on building each layer in my class hierarchy. With each new subclass I continue the same pattern, so if I eventually want to find, say, a subVI associated with the class GUI Update_Digital_Unsigned Word.lvclass, I know I will find it in the directory ../GUI Update/Digital/Unsigned Word/_subVIs.
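Pulling those rules together, the on-disk layout for this hypothetical GUI Update hierarchy would end up looking something like this (only the pieces mentioned above are shown; the _typedefs folders would repeat at lower levels as needed):

GUI Update/
    GUI Update.lvlib
    GUI Update.lvclass
    _subVIs/
    _typedefs/
    Boolean/
        GUI Update_Boolean.lvclass
        _subVIs/
    Digital/
        GUI Update_Digital.lvclass
        _subVIs/
        Unsigned Word/
            GUI Update_Digital_Unsigned Word.lvclass
            _subVIs/
    Cluster/
        GUI Update_Cluster.lvclass
        _subVIs/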

Having a pattern to which you stick relentlessly — even one as simple as this one — will save you immeasurable amounts of time.

Creating the Blueprint

The next thing I do when creating a class hierarchy (but the last thing I want to talk about right now) is to define how the rest of the application will interface with my new GUI Update class. This is where the access scope we have been so careful to create comes into play. In the top-level class I always create a group of VIs that have their access scope set to public. These interface VIs form the totality of the external interface to the class hierarchy and so include the functions that define what the application as a whole needs GUI Update to do for it. The logical implications of this interface layer are why I sometimes call this step in the process “Creating the Blueprint”.

In addition to providing a very clean interface, another advantage of having this “blueprint” is that if you ever need to expand your stable of subclasses, these interface VIs will serve as a list of functions you need to support in the new subclass — or at least a list of functions that you should consider implementing in the new subclass. To see what I mean, consider that the scenario we have been discussing is actually drawn from an application I created once. The list of public interface VIs was really very short: There was a method that read a value from a remote device and wrote it to the GUI object, one that looked for control value changes to write them to the remote device, and one that allowed the calling application to set control specific properties.

Of these, all GUI objects had to implement the first one because even the controls needed to be updated once a second. The reason for this constraint was that the remote device could also be reconfigured from a local interface and the LabVIEW application needed to keep itself up to date. However, the second interface method was only applicable to controls. Finally, the third interface method was implemented very rarely for the few subclasses that needed it.
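If it helps to see the shape of such a blueprint in text, here is a rough Python sketch of those three public methods (the names are mine, and the real application implemented them as public VIs in the top-level LabVIEW class, not as Python code). The first method is mandatory for every subclass, while the other two default to doing nothing so that only the subclasses that need them have to override them:

from abc import ABC, abstractmethod

class GuiUpdate(ABC):
    """The public 'blueprint': the only calls the rest of the application may make."""

    @abstractmethod
    def update_from_device(self):
        """Read the current value from the remote device and write it to the GUI object."""

    def write_changes_to_device(self):
        """Push a changed control value back to the remote device (controls only)."""

    def set_property(self, name, value):
        """Set a control-specific property; only a few subclasses need this."""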

What’s up next?

We have just about run out of space for this installment, but you may have noticed that something is missing from this post: Any actual LabVIEW code. Next time we will correct that sad situation by considering how to apply these principles to the creation of a class hierarchy that provides a common mechanism for storing and retrieving program data and setup parameters that works the same (from the application’s perspective at least) regardless of whether the program is interfacing with a database or text files.

Until Next Time…

Mike…

Talking to a Database from LabVIEW

OK, this is the third part of our discussion of how to effectively utilize databases in a LabVIEW-based application. To recap, in Part 1 we covered the basic justifications for utilizing databases and checked out some drivers that implement basic database connectivity. Part 2 expanded on that information by going into the (very basic) basics of how to design and build a database. I also provided some links for further reading, and a LabVIEW tool for executing an SQL script to turn it into an actual database. What you are looking at now is Part 3, and it presents code that implements the LabVIEW side of the data management capability we discussed earlier. If you haven’t read the other two sections, I would recommend that you do so before continuing — don’t worry, we’ll wait for you.

Those of you who did read the previous portions, feel free to talk among yourselves until the others get back.

Everybody back? Good! Let’s continue.

Example Code

The database we have implemented so far covers three basic areas in the Testbed application. Something that I didn’t mention before is that these areas were not picked at random or arbitrarily. Rather, if you look at them you will see that each one demonstrates one kind of database operation while introducing useful concepts that you will want to know about in the future.

  • Processes to Launch: This section demonstrates data that has an inherent structure as embodied in its one foreign key relationship.
  • Event Recording: Here we considered a table to which applications will eventually write. It also shows a little more structure in that it relates to two different header tables: one that identifies the application generating the event and one that identifies the type of event that is being recorded.
  • Default Sample Period: Although much of the data in a system will be structured, there is still a place for simple settings such as you might store in an INI file. This last example shows such a situation.

Carrying forward with this idea of demonstrating a variety of concepts, as we go through the code that implements the LabVIEW side of the connection, I will point out a few different techniques that you will find useful. The thing to remember is that there are engineering decisions that you have to make and no one technique or approach will serve in every possible situation.

Reading Processes to Launch

The new VI that performs this operation is still named Configuration Management.lvlib:Get Processes to Launch.vi and has the same basic structure as the INI file version (which I renamed), but the configuration file IO logic is replaced with the database IO drivers I presented earlier.

Read Processes to Launch

Although the structure is basically the same as before, there are a few changes due to the improved structure that the database provides. First, there is a new input (Launch Condition) that is tied internally to a database search term. Second, the output data structure is modified to utilize the enumeration for the launch mode, replacing the boolean value used before.

In terms of the query code itself, the large SQL statement in the string constant is for the most part pretty standard code. The statement specifies what columns we want (label, item_path, launch_mode), the table they are in (launch_item) and the WHERE clause provides the search terms that define the output dataset we want. Likewise, note that although I never read the launch_order value, I use it to sort the order of the results. If you have data that needs to be in a specific order this is an important point. Unless you explicitly tell the DBMS how to order the results, the sequence of records is totally undefined. The only real complication is contained in the WHERE clause.

You will recall from our discussion of normalization that the two primary search terms we will be using are stored as ID numbers that reference a pair of header tables. These header tables contain the human-readable labels that we want to use for our searches. This code demonstrates one approach to resolving the ID references through the use of subqueries. A subquery is (as its name suggests) a small query that occurs inside the main query. Subqueries are typically employed to look up data that the calling application doesn’t directly know. In this example, we know the application name and launch condition, but we don’t know what ID numbers are associated with those inputs. The subqueries look up those values for us so the main query can use them to get the data we want.
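To give a feel for what such a query looks like, here is a sketch written against SQLite through Python’s built-in sqlite3 module (I’m using SQLite only because it ships with Python; the post’s code talks to Jet through ADO). The launch_item columns come from the discussion above, but the header table and column names, the database file name, and the search values are all my guesses:

import sqlite3

sql = """
SELECT label, item_path, launch_mode
FROM launch_item
WHERE application_id = (SELECT id FROM application WHERE label = ?)
  AND launch_condition_id = (SELECT id FROM launch_condition WHERE label = ?)
ORDER BY launch_order
"""

conn = sqlite3.connect("testbed.db")                         # hypothetical local database
rows = conn.execute(sql, ("Testbed", "Startup")).fetchall()  # hypothetical search values
conn.close()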

The advantage of subqueries is that they allow you to specify what you want in terms that are meaningful to the calling application. The disadvantage is that they can complicate the SQL code, and the resulting code can be more DBMS-specific. In addition, with some DBMSs (for example, Jet) there can be a significant performance hit involved in subqueries. You’ll note that the example code mitigates this potential issue by buffering the search results, thus ensuring that the “hit” will only occur once.

Saving Errors to the Database

This operation is performed by a new version of Startup Processes.lvlib:Store Error.vi. As with the original version, the input error cluster provides the information for the error record, and the output error cluster is the result of the record insert.

Store Error to Database

This code shows an alternative to using subqueries to obtain the ID references. The two subVIs look up the required IDs in the database the first time they are called and buffer the results for later reuse. The two routines are very similar, so here is what the application ID VI looks like:

Get Appl_ID

The advantage of this approach is that the required SQL is much simpler and very standard. The only real “disadvantage” is that you have to create these buffers — which really isn’t much of a disadvantage. I really like this technique for situations where there are common IDs that are used over and over again. Unfortunately, this sort of modularization isn’t always possible, so in most real-world applications you will need to know both techniques.
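The buffering idea itself is easy to mimic in text: look the ID up once, remember it, and hand back the remembered value on every later call. Here is a minimal Python sketch of that pattern (SQLite again as a stand-in, with a hypothetical application header table):

import sqlite3

_id_cache = {}   # the "buffer": persists between calls

def get_application_id(conn, app_name):
    """Look up an application ID the first time it is requested, then reuse it."""
    if app_name not in _id_cache:
        row = conn.execute(
            "SELECT id FROM application WHERE label = ?", (app_name,)
        ).fetchone()
        _id_cache[app_name] = row[0]
    return _id_cache[app_name]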

Read Default Sample Period

You will recall that this is new functionality so there is no “old code” to compare it to. Here is one option for what Configuration Management.lvlib:Get Default Sample Period.vi could look like.

Read Default Sample Period

The code here is very similar to that for reading the processes to launch, the main difference being that two of the search terms are in essence hardcoded. Currently, there is only one parameter being stored in this table, but that situation could change at any time. While replicating this VI for each additional parameter would work, the result could be a lot of duplicated code. Consequently, I typically prefer to start with a lower-level VI that is something like this…

Read Misc Setup Values

…and then make the VIs like Configuration Management.lvlib:Get Default Sample Period.vi simple wrappers for the buffer.

Read Default Sample Period - Improved
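In text form, the pattern amounts to one general-purpose “read a named setup value” routine plus a trivially thin wrapper per parameter. Here is a rough Python sketch under assumed table and column names:

import sqlite3

def get_misc_setup_value(conn, parameter_name):
    """General-purpose read of one named value from the (hypothetical) misc_setup table."""
    row = conn.execute(
        "SELECT value FROM misc_setup WHERE parameter = ?", (parameter_name,)
    ).fetchone()
    return row[0]

def get_default_sample_period(conn):
    """Thin wrapper: all of the real work happens in the lower-level routine."""
    return float(get_misc_setup_value(conn, "Default Sample Period"))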

Hopefully by now you are beginning to see a bit of a pattern here. Every type of access doesn’t require a unique technique. In fact, there are really very few unique use cases — and frankly we just covered most of them. The real key is to get comfortable with these basic methods so you can customize them to address your specific requirements.

Moving On…

So we now have our database, and the VIs for accessing it. How do we roll this into our application? One way is to simply replace the old INI file versions of the VIs with the new database-oriented ones — which is what I have done for now. I say “…for now…” because there is a better solution, but that will have to wait for another day. Right now, the solution we have is serviceable, so I want to get back to a couple of more important topics. Here’s the updated application:

http://svn.notatamelion.com/blogProject/testbed application/Tags/Release 7

Until next time…

Mike…

Managing Data — the Easy Way

As I work on this blog there are times when I have to put certain topics on hold because the infrastructure doesn’t yet exist to allow me to effectively cover the topic. Well, this is one of those times. What I want to do is start getting into some topics like advanced user interfaces. However, these capabilities presume the existence of a data management capability that our testbed does not yet possess. To begin filling in that blank, the next few posts are going to start looking at techniques for utilizing databases as a data storage mechanism for LabVIEW-based applications.

But why databases? Wouldn’t it be easier to store all our configuration or test data in text files? When considering the storage of any sort of data the first thought for many developers is often to just, “…throw it in a text file…”. Therefore, this question needs to be the first thing that we discuss.

The Ubiquitous INI File

The INI file, as a method for storing configuration data, is a standard that has been around for a long time. Its main advantage is that it is a simple human-readable format that anyone with a text editor can manipulate. However, INI files also have a significant downside, and its number one problem is security. Simply put, INI files are inherently insecure because they use a simple human-readable format that anyone with a text editor can manipulate. Anybody who knows how to use Notepad can get in and play around with your program’s configuration, so unless you are very careful in terms of data validation, this openness can become an open door to program instability and failure.

A second issue is that while INI files can be subdivided into sections, and each section can contain key name and value pairs, beyond that simplistic formatting they are largely unstructured. This limitation shouldn’t be a surprise; after all, the format was developed at a time when programs were very simple and had equally simple configuration requirements. However, with many modern applications, one of the big challenges that you have to face when using INI files is that it is difficult, if not impossible, to store configuration data that exhibits internal relationships.

For example, say you have 2 or 3 basic types of widgets that you’re testing and each type of widget comes in 2 or 3 variants or models. In this scenario, some of the test parameters will be common for all widgets of a specific type while others are specific to a particular model. If you want to capture this data in a text file, you have two basic options. First you can put everything in one big table — and so run the risk of errors resulting from redundant data. Second, you can store it in a more normalized form and try to maintain the relationships manually — which is a pain in the neck, and just as error prone.

Text Data Files?

OK, so text files aren’t the best for storing configuration data. What about test data? Unfortunately, you have all the same problems — in spades — plus a few new ones: For example, consider Application Dependency.

Application dependency means that the data format or structure is specific to the program that created the file. Basically, the limitation is that outside programs have to know the structure of your file ahead of time, or they won’t be able to read your data effectively. Depending upon the complexity of your application, the information required to read the data may expose more about the inner workings of your code than you really feel comfortable publishing.

Another problem is numeric precision. You can’t save numbers in a text file, just string representations of the numbers. Consequently, numeric precision is limited to whatever the program used when the file was created. So if you think that all you need when saving the data is 3 decimal places, and then find out later that you really need 4, you’re pretty much hosed since there is no way to recreate the precision that was thrown away when the data was saved.

Finally, data in a text file usually has no, or at least inadequate, context. Everybody worries about the accuracy of their test results, but context is just as important. Context is the information that tells you how to interpret the data you have before you. Context includes things like who ran a test, when it was run, and how the test was configured. Context also tells you things like what unit of measure to use in reading numeric data, or when the instruments were calibrated.

The Case for a Database

Due to all the foregoing issues, my clear preference is to use databases to both manage configuration data and store test results. However some LabVIEW developers refuse to consider databases in a misguided effort to avoid complication. Out of a dearth of real information they raise objections unencumbered by facts.

My position is that when you take into consideration the total problem of managing data, databases are actually the easiest solution. Of course that doesn’t mean that there won’t be things for you to learn. The truth is that there will be things to learn regardless of the way you approach data management. My point is that with databases there is less for you to learn due to the outside resources that you can leverage along the way. For one simple (but huge) example, consider that you could figure out on your own the correct ways to archive and back up all your test data — or you could put the data into a database and let the corporate IT folks, who do this sort of thing for a living, handle your data as well.

So let’s get started with a few basic concepts.

What is a DBMS?

One common point of confusion is the term DBMS, which stands for DataBase Management System. A DBMS is software that provides a standardized interface for creating, maintaining and using databases. In a sense, the relationship between a DBMS and a database is exactly the same as the relationship between a word-processor, like Microsoft Word, and a letter home to your Mom. Just as a word-processor is a program for creating textual documents, so a DBMS can be seen as a program for creating databases. One big difference though, is that while word-processors are programs that you have to explicitly start before you can use them, a DBMS will typically run in the background as a Windows service. For example, if you are reading this post from a Windows-based computer, there is at least one DBMS (called Jet) running on your computer right now.

What is a database?

A database is a structured collection of data. In that definition, one word is particularly important: “structured”. One of the central insights that led to the development of databases was that data has structure. Early on, people recognized that some pieces of information are logically related to other pieces and that these relationships are just as important as the data itself. The relationships in the data are how we store the data context we discussed earlier.

By the way, as an aside, people who talk about an “Access database” are wrong on two counts, since Access is neither a database nor a DBMS. Rather, it is an application development environment for creating applications that access databases. By default, Access utilizes the Jet DBMS that is built into Windows, but it can access most others as well.

How do you communicate with a database?

It wasn’t that long ago that communicating with a database from LabVIEW was painful. The first product I ever sold was an add-on for the early Windows version of LabVIEW that was built around a code interface node that I had to write using the Watcom C compiler. Actually, looking back on it, “painful” is an understatement…

In any case, things are now dramatically different. On the database side, the creation of standards such as ODBC and later ADO (also called OLE DB) provided standardized cross-vendor interfaces. For their part, National Instruments began providing ways of accessing those interfaces from within LabVIEW (starting with Version 5). Today, accessing databases through ActiveX or .NET interfaces is a breeze. To demonstrate this point, I’ll present a package of drivers that I developed and posted on the user forum several years ago.

Getting Connected…

The VIs we will now discuss are the core routines that you will need to interact with any DBMS that supports an ADO interface — which is basically all of them. The only common DBMS that doesn’t support ADO is SQLite. Instead, it has its own DLL that you have to access directly. Still, if you want a very lightweight database engine that will run on nearly anything (including some real-time hosts) it is a good choice and there are driver packages available through the forum.

Getting back to our standard interface, the following six routines provide all the support most applications will ever need. One thing to notice is that, with the exception of one subVI that requires added code to work around a bug in certain versions of SQL Server, most of the code is very simple, which is as it should be.

To start that exploration, we’ll look at the routine that is the logical starting place for any database interaction — and incidentally is the only routine that cares what database you are using.

ADO Database Drivers.lvlib:Acquire Connection.vi

The start of any interaction with a database naturally involves establishing a connection to the DBMS that is managing it, and then identifying which specific database you want to use. This VI calls the ADO method that performs that operation.

Open Connection.vi

You’ll notice that the ActiveX method only has one required input: the Connection String. This string is a semicolon-delimited list of input parameters. Although the exact parameters that are required depend on the DBMS you are accessing, there is one parameter that is always required: Provider. It tells the ADO functionality which driver to use to access the DBMS. This is what a typical connection string would look like for connecting to a so-called Access database.

Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\mydatabase.mdb;User Id=;Password=;

Note that in this case the Data Source consists of a path to a specific file. For another DBMS this same parameter might point to an IP address, a logical name the driver defines or the name of a Windows service. But what if you don’t know what connection string to use? Well you can ask the DBMS vendor — or you can check: www.ConnectionStrings.com. Yes, there is an entire website dedicated to listing connection strings, and it covers nearly every ADO-compatible DBMS on the planet.
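Incidentally, if you ever want to poke at the same ADO interface outside LabVIEW, the call is just as simple from Python through the pywin32 COM bridge. This is only a sketch, and it assumes a Windows machine with the 32-bit Jet provider available and an .mdb file at the path shown:

import win32com.client

conn = win32com.client.Dispatch("ADODB.Connection")
conn.Open("Provider=Microsoft.Jet.OLEDB.4.0;"
          "Data Source=C:\\mydatabase.mdb;User Id=;Password=;")

print(conn.State)   # 1 (adStateOpen) once the connection is established
conn.Close()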

ADO Database Drivers.lvlib:Execute SQL Command.vi

OK, you are connected to your DBMS and you have a reference to your database. The next question is what do you want to do now? One answer to that question would be to send a command to the DBMS telling it to do something like create a new table in which you can store data, or add a new record to an existing table. This VI meets that need by sending to the DBMS command strings consisting of statements written in a language called SQL.

Execute SQL Command.vi

The core of this routine is the connection Execute method. Because this method could be used to execute a command that returns data, the 0x80 input optimizes operation by telling the method to not expect or handle any returned data. The method, instead, returns a count of the number of rows that the last command impacted.

ADO Database Drivers.lvlib:Create and Read Recordset.vi

While sending commands is important, the most common thing to do with a database connection is to use it to read data from the database. To optimize operation when you have multiple users concurrently accessing the same database, ADO creates something called a recordset. A recordset is a memory-resident copy of the requested data that you can access as needed without requiring further interaction with the database itself.

Create and Read Recordset

The three subVIs shown create a recordset using a source string consisting of an SQL query, read the recordset’s contents, and then close the recordset to free its memory. For details of how these subVIs work, check out the comments on their block diagrams.

ADO Database Drivers.lvlib:Release Connection.vi

Once you are finished using a connection, you need to tell ADO that you are done with it.

Close Connection.vi

Note that although the method called is named Close, it doesn’t really close anything. Thanks to built-in functionality called connection pooling, the connection isn’t really closed; rather, Windows just marks the connection as not being used and puts it into a pool of available connections. Consequently, the next time there is a request for a connection to the same database, Windows doesn’t have to go to the trouble of opening a new connection; it can just pull from the pool a reference to a connection that isn’t currently in use.

ADO Database Drivers.lvlib:Start Transaction.vi

A topic that database folks justifiably spend a lot of time talking about is data integrity. One way that a DBMS supports data integrity is by ensuring that all operations are “atomic”, i.e. indivisible. In simple terms, this means that for a given operation either all the changes to the database are successful, or none of them are. This constraint is easy to enforce when inserting or modifying a single record, but what about the (common) case where a single logical update entails adding or modifying several individual records?

To handle this situation, the DBMS allows you to define a transaction that turns several database operations into a single atomic operation. This VI marks the start of a transaction.

Start Transaction.vi

To mark the other end of the transaction, you have to perform either a Commit (make all the changes permanent) or a Rollback (undo all the changes since the transaction’s start).

ADO Database Drivers.lvlib:Rollback Transaction on Error.vi

This VI combines the Commit and Rollback operations into a single routine. But how does it know which to do? Simple: it looks at the error cluster. If there is no error it commits the transaction…

Commit Transaction on no Error

…but if there is an error it will rollback the transaction.

Rollback Transaction on Error
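The commit-or-rollback-on-error decision is worth internalizing no matter what stack you are on. Here is the same logic sketched with Python’s built-in sqlite3 module instead of ADO (so the call names differ, and the event table is hypothetical, but the rule is identical: no error means commit, any error means rollback):

import sqlite3

def store_event(conn, application, message):
    """Write one event record atomically."""
    try:
        conn.execute(
            "INSERT INTO event (application, message) VALUES (?, ?)",
            (application, message),
        )
        conn.commit()        # no error: make the change permanent
    except sqlite3.Error:
        conn.rollback()      # an error occurred: undo everything since the transaction began
        raise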

What’s Next

Clearly the foregoing six VIs are not a complete implementation of all that is useful in the ADO interface — for instance, they totally ignore “BLOBs”. However I have found that they address 95% of what I do on a daily basis, and here is a link to the new version of the toolbox that includes the drivers.

Toolbox Release 3

But let’s be honest, having an appropriate set of drivers is in some ways the easy part of the job. You still need a database to access, and you need to know at least a bit about the SQL language. This is where what I said earlier about leveraging the time and talents of other people comes into play. Your corporate or institutional IT department will often assist you in setting up a database — many times, in fact, they will insist on doing it for you. Likewise, with an interface that is based on command strings written in SQL, you will find a wealth of talent willing to lend a hand, perhaps in-house but certainly online. Still, having said all that, it’s good for you to understand at least a bit about how these other aspects operate. Therefore, next time we’ll create a small local database (using Jet) and start implementing the data management for our testbed application.

Until next time…

Mike…