A Tree Grows in Brooklyn LabVIEW

This time out, I want to start exploring a user interface device that in my opinion is dramatically under-utilized. I am talking about the so-called tree control. This structure solves a number of interface challenges that might otherwise be intractable. For example, the preferred approach to displaying large amounts of data is to avoid generating large tabular blocks of data, opting instead to display these datasets on graphs. However, there can be situations where those large tabular blocks of data are exactly what the customer wants. What a tree control can do is display this data using a hierarchical structure that makes it easier for the user to find and read the specific data they needs. A good example of this sort of usage is Windows explorer. Can you imagine how long it would take to you find anything of all Windows provides was an alphabetical list of all the files on your multi-gigabyte hard drive?

Alternately, a tree control can provide a way of hierarchically organize interface options. For instance, to select the screen to display in the testbed application we have been building, the program currently uses a simple pop-up menu containing a list of the available screens. This technique works well is you have a limited number of screens, but does not scale well.

We will structure our evaluation of tree controls around two applications that demonstrate its usage both as a presentation device for large datasets and as a control interface. Starting that discussion, this week we will look how display a large amount of data (all the files on your PC). Then in the following post we will explore its usefulness for controlling the application itself by modifying the testbed application to incorporate it.

Our Current Goal

To demonstrate this control’s ability to organize and display a large amount of tabular data, we are going to consider an example that displays a hierarchical listing of the files on your computer starting with a directory that you specify. The resulting display will represent folders as expandable headings, and for files show their size and modification date.

I picked this application as an example because it provides the opportunity to discuss an interesting concept that I have been wanting to cover for some time (i.e. recursion). Moreover, on a practical level, this application makes it easy to generate a very large set of interesting data – though it isn’t very fast. But more on that in a bit. For now, let’s start by considering what it takes to make this tree grow. Then we can look at the application’s major components.

Becoming a LabVIEW Arborist

Although tree controls and menus occupy different functional niches, their APIs bear certain similarities. For example, they both draw a distinction between what the user sees and the “tags” that are used internally to identify specific items. Likewise, when creating a child item, both APIs use the parent’s tag to establish hierarchical relationships.

A big difference between the two is that a tree control can have multiple columns like a table. In fact, one way of understanding a tree control is as a table that lets you collapse multiple rows into a header row. So in designing for this thing, the first thing we need to do is decide what values are going to represent the “header rows”, and what values the “data rows”. For this, our first excursion into utilizing tree controls, the “header” rows will define the folders – so that is where we go first.

Showing the Folders

The code that we will use to add a new folder to the tree resides in a VI called Process New Directory Entry.slow.vi (the reasons for the “slow” appellation will be explained shortly).

Process New Directory Entry.slow

Because this logic resides in a subVI, the reference to tree control comes from a control on the VI’s front panel. Next, note that the way you get the row into the control is by using an invoke node that instantiates the Edit Tree Items:Add Item method. I point out this fact because it tells you something important: All the data we are going to be displaying in the control are properties of the control, not values. Consequently, they will be automatically saved as part of the control whenever you save the VI that contains the control.

Next, let’s consider the inputs to the method. The top-most item is Parent Tag. The assumption is that the method is defining a new child item, so this input defines the parent under which the new child will reside. Therefore, a Parent Tag that is a null string indicates an item with no parent (i.e. a top-level item). The next item down from the top is Child Position and its job is to tell the method where to insert the new child that it is creating. A value of -1, as is used here, tells the method to put the new child after any existing children of the identified parent. In other words, if this code is called multiple times, the children will appear in the control in the order in which they were created.

The next two input items (Left Cell String and Child Text) control what the user will see in the control. You will recall that I said that tree controls are sort of like hierarchical, collapsible tables. In that representation, the left-most cell shows the hierarchical organization through indentation. In addition, by default, every row that has other rows nested beneath it shows a small glyph indicating that the row can be expanded. The other cells in the row are like the additional columns in the table and can hold whatever data you want. When creating entries for directories, the left-most cell will contain the name of the directory, and the remainder of the row will be empty. To implement this functionality, the input path is stripped to remove the last item. This value is passed to the Left Cell String input. In addition, an empty string array is written to the Child Text input.

Next, the Child Tag input allows you to specify the value that you want to use to uniquely identify this row when creating children under it, or reading the value of control selections. Now the documentation says that if you don’t wire a string to this input, it will reuse the Left Cell String value as the tag, but you don’t want to depend on this feature. The problem is that tags have to be unique so to prevent duplication, LabVIEW automatically modifies these tags to insure that they are unique by appending a number at runtime. While it is true that the method returns a string containing the key that LabVIEW generated, not knowing ahead of time what the tag will be can complicate subsequent operations. To avoid this issue, I like to include logic that will guarantee that the tag value that I write to this input is unique. For this application, if the parent tag is a null string (indicating a top-level item), the code takes the entire path, converts it to a string and uses the result as the child tag. In the parent tag is not null, the code generates the tag by taking the parent tag value and appending to it a slash character and the string that is feeding the child’s Left Cell String input. If this reason for this logic escapes you, don’t worry about it – you’ll see why it’s important shortly.

Finally, the Child Only? input is a flag that, when false, allows other rows to be added hierarchically beneath it.

Showing the Files

With the code handled for creating entries associated with directories, now we need to implement the logic for creating the entries that represent the files inside those directories – which as you can see below, utilizes the same method as we saw earlier, but plays with the inputs in a slightly different ways.

Process New File Entry.slow

Named Process New File Entry.slow.VI, this VI is designed to use the additional columns to provide a little additional information about the file: to wit, its size and last modification date. Therefore, the first thing the code does is call a built-in function (File/Directory Info) that reads the desired information. However, this call raises the potential of an error being generated. When errors are possible you need to spend some time thinking about what you want to have happen when they occur. In this situation, there are three basic responses:

  1. Propagate the Error: With this approach, the error would simply be propagated on through the code and be reported like any other error. This action would ensure that the error would be reported, but would stop the processing of the interface.
  2. Don’t Include this File in the List: By trapping the error and preventing it from being passed on, we can use it to block the display of files for which we can’t retrieve the desired error. This technique would allow the interface processing to run to completion, by simply ignoring the error.
  3. Include the File but not its Data: A variation of Option 2, this approach would still block the error from being propagated. However, it would still create the file’s entry in the table but with dummy data like, “Not Available”, or simply “n/a” for the missing data.

So which of these options is the correct one? This is one of those situations where there is no universally correct answer. Sorting out which option is the correct one for your application is why, as a software engineering professional, you earn the “big bucks”. For the purpose of our demonstration, I picked Option 2.

Next, the New Directory Tag input is the tag that is associated with the folder in which this file resides. Finally, the Child Tag value is calculated by taking the Parent Tag value and appending to it a slash character and the name of the file stripped from the input path.

Pulling it all Together

So those are the two main pieces of code. All we have to do now is combine them into a single process that will process a starting directory to produce a hierarchical listing of its contents. The name of this VI is Process Directory.slow.vi, and this is what its code looks like:

Process Directory.slow

So you can see that the first thing is does is call the subVI we discussed for creating an entry in the tree control for the directory identified in the Starting Path input, using the tag value from the My Parent Tag input. The result is that the folder is added to the tree, and the subVI returns the tag for the new folder item. The next step is to process the folder’s contents so the code calls the built-in List Folder function to generate lists of the directory’s files and subdirectories.

The array of file names is passed into a loop that repeatedly calls the subVI we discussed earlier that creates entries in the tree control for individual files. The array of subdirectory names drives a loop that first verifies that the first character of the name is not a dollar sign (“$”). Although this check is not technically necessary, it serves to bypass various hidden system directories (like $Recycle Bin) which would generate errors anyway. Assuming that the subdirectory name passes the test, the code calls a subVI that we haven’t looked at before – or have we? If you open this subVI and go to its block diagram and you will see this:

Process Directory.slow

Look familiar? I have not simply duplicated the logic in Process Directory.slow.vi, rather I am using a technique called recursion to allow the VI to call itself. This idea might sound more than a little confusing, but if you think about it, the idea makes a lot of sense. Look at it this way, to correctly process these subdirectories, we need to do the exact same things as we are doing right now to process the parent directory, so why not use the exact same code?

The way it works is that Process Directory.slow.vi is configured in its VI Properties as a shared clone reentrant VI. To review, when LabVIEW runs code utilizing share clones, it creates a small pool of instances of the VIs code in memory. When the shared clone VI is actually called, LabVIEW goes to this pool and dynamically calls one of the share clones that isn’t currently being used. If the pool every “runs dry” LabVIEW automatically adds more clones to the pool. It is this behavior relative to shared clones that is key to the way LabVIEW implements recursion. In order to see how this recursion operates, let’s consider this very basic top-level VI:

Getting Processing Started.slow

The code first clears any contents that might already exist in the tree control and then makes the first call to Process Directory.slow.vi. When the runtime engine sees that call, it goes to the pool, gets a clone of the VI and starts executing it. An important point to remember is that even though all the clones in the pool were derived from the same VI, they are at this point separate entities. It is as though you manually created several copies of the same VI, except LabVIEW did the copying for you.

When running this first clone, LabVIEW will eventually get to the call that it makes to Process Directory.slow.vi. As before, the runtime engine will go to the pool, get a second clone of the VI and start it executing, and so it will go until execution gets to a directory that only has files in it. In that case, the cloned VI will not get called and that Nth-generation clone will finish its execution. At this point LabVIEW will release the clone back to the pool for future reuse, and return to executing the clone that called the one that just finished. This calling clone may have other subdirectories to process, or it may be done – in which case it will also finish its execution, LabVIEW will release it back to the pool, and continue executing the clone that called it. This process will continue until all the clones have finished their work.

Some Further Points

And that, dear readers, is how the process basically works, but there are a couple important things still to cover. We need to talk about memory consumption, performance and how to interact with this control in your program once you have it populated with data.

Memory Considerations

I mentioned earlier that the information that you enter into a tree control are actually properties of the control – not its data. I also stated that as a result of that fact, said information will automatically be saved as part of the control. As a demonstration of that fact, consider that the very basic top-level VI I just showed you consumes about 14 kbytes on disk. However, as a test I turned the process loose on my PC’s Program Files (x86) directory. After it had finished processing the 14,832 folders(!) and 122,533 files(!!) contained therein, I saved the VI again. At that point, the size of the VI on disk ballooned to 2.6 Mbytes.

The solution is to remember to always remember to delete all items from a tree control when the program using it stops. Although you obviously don’t have to worry about this sort of growth is a compiled application (a standalone application can’t save changes to itself), this convention will help to keep you from inadvertently saving extraneous information during development and artificially expanding the size of your application.

Performance Considerations

The test I did to catalog my PC’s Program Files (x86) directory also highlighted another issue: execution speed. To complete the requested processing took about an hour and a half. Doing the same processing, but minus the tree control operations, took less than a minute, so the vast majority or this time was clearly spent in updating the tree control. But what exactly was it that was taking so long? As it turns out, there are two sources of delay, the first of which is actually pretty easy to control.

The way the code is currently written, the tree control on the front panel updates its appearance after each addition – a problem by the way that is not unique to tree controls. The solution is to tell LabVIEW to stop updating the front panel for a while, and here is how to do it:

Getting Processing Started w-Defer Panel Updates

A VI’s front panel has a property called Defer Panel Updates when you set this property to true, LabVIEW records all changes to the VI’s front panel, but doesn’t actually update it to reflect those changes. When the property is later set to false, all pending changes are applied to the front panel at once. The additions shown reduces the time to process my entire Program Files (x86) directory by 66% to just 30 minutes – which is much better, but still not great.

To reduce our processing time further, we have to take more drastic measures – starting with a fundamental change in how we add entries for individual files. The issue is that the technique we are using to add entries is very convenient because we are explicitly identifying the parent under which each child is to be placed. Consequently, we have the flexibility to add entries in essentially any order. However, as the total number of entries grows larger we begin to pay a high price for this convenience and flexibility because, under the hood, the control’s logic has to incorporate the ability to insert entries at random into the middle of existing data.

The solution to this problem is to use a different method. This method is called Edit Tree Items:Add Multiple Items to End and as it names says it simply appends new items to the end of the current list of entries. Of course for this to work, it means that we have to take responsibility for a lot of stuff that LabVIEW was doing for us, like updating the control in order and maintaining the indentation to preserve the hierarchical structure. Thankfully, that work isn’t very hard. For instance, here is the code for creating the new directory entry:

Process New Directory Entry

The first thing you will notice is that the invoke node is gone. The method that we will be invoking sports a single input which is an array of clusters representing the tree entries that it will add. The purpose of the logic before us is to assemble the array element that will create the parent folder’s entry in the tree.

Next, note that the information needed to define the entry is slightly different. First, we don’t need to specify a tag for the parent because we are assuming that the node are going to be simply added to the display in the order that they occur in the array. However, that simplification raises a problem. How do you maintain the display’s hierarchical structure? The thing to remember is that the hierarchy is defined visually, but also logically, by the indentations in the entries. Therefore the entry definition incorporates a parameter that explicitly defines the number of level which the new entry should be indented. Due to the way that we have been building the tags, this value is very easy to calculate. All we have to do is count the number of delimiters (“\”) in the entry’s tag and then subtract the number of delimiters in the starting path. The first part of that calculation occurs in the subVI Calculate Indent Level.vi and the second part is facilitated by a new input parameter Indent Offset.

Making the same adaptations to the routine for adding a new file entry and you get this:

Process New File Entry

Nothing new to see here. The important part is how these two new VIs fit together and to see that we need to look at the recursive VI Process Directory.vi (I have zoomed in on just the part that has changed):

Process Directory

This logic’s core functionality is to build the array of entry definitions that the Edit Tree Items:Add Multiple Items to End method needs to do its work. The first element in this array is the entry for the directory itself, and the following elements define the entries for the files within the directory. Finally, we have a make a small change to the top-level VI as well:

Getting Processing Started

Specifically, we need to calculate the Indent Offset value based on the Starting Path input. But the important question, is does all this really help? With these optimizations in place the processing time for my PC’s Program Files (x86) directory drops to just a hair under 10 minutes. Of course while that improvement is impressive, it might still be too long, but the changes to reduce the processing time further are only necessary if dealing with very large datasets. Plus they really have nothing to do with the tree control itself – so they will have to wait for another time.

Event Handling

The last we have left out so far is what happens after the tree control is populated with data. Well, like most controls in LabVIEW, tree controls support a variety of events including ones that allow event structures to respond when the user selects or double-clicks an item in the control. But this point begs the question: What is the fundamental datatype of a tree control? By default, the datatype of a tree control is a string, and its value is the tag of the currently selected item. Alternatively, if no items are selected, the tree control’s value is a null, or empty string.

Top Level File Explorah

Because the control’s datatype is a string, you can programmatically clear any selection by writing a null string to a Value property node or a local variable associated with the control. However, note the words “By default…” like a few other controls (such as listboxes) tree controls can be configured to allow multiple items to be selected at once. In that case, the control’s datatype changes to an array of strings where each element is the tag of a selected item.

The other thing I wanted to point out through this example is the importance of carefully considering how to define tags for items. it may seem obvious but if you are taking the time to put data into this control, you are probably going to want to use it in the future. it behooves you therefore to tag it in such as way as to allow you to quickly identify and parse values. For example, in this example I put together the tags such that they mirror the data’s natural structure – its file path. By mimicking your data’s natural structure you make it easier to locate the specific information that you need.

File Explorer – Release 1
Toolbox – Release 12

The Big Tease

OK, that is enough for now. Next time we will return to our testbed application and look at using tree controls as a control element. With this use case the focus shifts from volume of data, to organization of the GUI to simplify operator interactions.

Until Next Time…
Mike…

3 thoughts on “A Tree Grows in Brooklyn LabVIEW

  1. Regarding the performance issues, it’s often best not to stick a large amount of data in your control, but rather manage the data behind the scene and only give the control limited data. For instance, not populating the entire tree, but just the top level. When the user opens a specific item, you populate the top level of that item.

    If you want to be more user friendly, you do some prefetching and populate two levels – the one the user is on and the one of all its children. That way, when the user opens a child, it’s already populated and you use the event to populate the children of that item.

    In the specific case of the file hierarchy, you can actually also do this for the file enumeration itself and thus probably drop the whole process from the 10 minutes Mike got it down to to less than the minute it took him to scan the hierarchy. In fact, I expect it would go down to no more than a few seconds and possibly only a second or so.

    One last alternative is asynchronous updating. In this situation, you do the scanning and updating of the tree in the background, as data comes in. This allows you to keep working as data comes in. The practicality of this approach is system dependent.

Leave a Reply