Building a Testbed Database

As we continue our exploration of using databases to store data in your applications, the next thing we need to do is consider what the basic structure of a database should look like and how to construct a simple one that would be suitable for our Testbed application. Considering what we have assembled so far, we can immediately see two existing areas where a database would be useful:

  • Lookup for processes to launch: Currently this information is being stored in the INI file.
  • Recording errors that occur: Currently errors are being logged to a text file that is stored in the same directory as the application.

In both cases, moving this data into a database will offer a number of benefits in terms of configurability and reliability. But in addition to these two existing opportunities, there is a third, new value to implement:

  • The default sample period: In the existing code, this value is defined by the default value of a control on the front panel of the Display Data process. This technique works, and in many cases is a perfectly fine solution, but there are also cases where your customers will want to be able to change a value without you needing to modify the code. So we’ll make that change too.

As we go through the following discussion, an important point to remember is that project development is — in addition to everything else — a process of ongoing refinement. As you work through a project you are always going to be learning more about what it is that you are trying to accomplish and you need to be willing to incorporate those “lessons-learned” in your code. Of course this point assumes that your code is modularized such that this sort of “mid-course correction” can be implemented without doing violence to what you already have in place.

In order to break this work up into bite-sized pieces, this post will address the structure of the database itself. The VIs needed to implement this configurability will be addressed in the next installment.

Why be Normal(ized)?

According to Wikipedia, normalization (in the database sense) is “…the process of organizing the fields and tables of a relational database to minimize redundancy.” Basically, this means that a given piece of information, like a person’s name, should only be stored in one place in the database. Normalization is important because it is one of the major ways that databases ensure consistency in the data they contain. For example, if an operator’s name is only stored in one place, you don’t have to worry after the fact if “Robert Porter” and “Robert Portor” are really the same person.

Beyond this basic explanation, there is a lot of detail that we don’t need to discuss right now. In fact, there are whole books written on the topic of normalization; for further reading, I have included links to two very good ones at the end of the post. Our goal right now is simply to cover the basics so that you can understand what a real database developer puts together for you. Moreover, if the need should ever arise for you to create a database for yourself, this information certainly won’t make you an expert, but it will help you avoid a few major pitfalls.

The first step in creating a database is called Data Modelling. Data modelling is the process of looking at your data in order to understand its inherent structure. Ideally, data modelling should be something that you do early in your design process, regardless of whether you are going to be using a database. If you think about it, this point is particularly true when doing LabVIEW development. After all, LabVIEW’s core paradigm is one of dataflow, so how can you expect to properly define an application’s dataflows if you don’t understand the data’s inherent structure? Naturally, therefore, when starting work on this testbed I did some basic data modelling. Let’s go over what I discovered.

Supporting Many Applications

When studying the data that an application needs to handle, a good place to start is with a list of the data items that the code will need to complete the intended tasks. Some of this information will be common to all the tasks the code will be supporting, and some will be unique. At first glance, it might not seem like there is any information that the various functions have in common, but if we think ahead a bit, we see that there is one. To this point, we have been talking about the program that we are creating, as if it is the only one we are ever going to build and install on a user’s computer. But is this really a reasonable assumption?

In a way, this is sort of like talking about a computer that will only ever have one program installed on it. In addition, we have talked before about the advantages of seeing an “application” as the aggregation of the independent behaviors of many different processes that operate more or less independently of one another. It seems unlikely to me that the assumption we have been making will continue to hold up in the future.

In the world governed by INI files, this reality is addressed by simply having a separate INI file for each program. Databases accomplish the same thing by adding a field that identifies which application is associated with (or in database-speak, related to) each data record. But how should these relationships be recorded in the database? Well, we could just include a field in each table called something like application_name, but this solution has two problems. First, remembering what we said about normalization, this approach would produce a lot of redundant data and a lot of opportunities for things to be misspelled or mislabeled. Second, since we would be using that value to qualify the queries we perform to extract just the data related to one particular application, those string fields would end up being search terms, and indexing or searching strings is inherently inefficient.

Both problems are solved by creating a separate table that uniquely identifies each application. Primarily, it holds the application name, but it could contain other information such as version number, who wrote it, and when it was last modified. In this table, each application has one record, and each record has a unique number associated with it called the Primary Key. Any other table that needs to relate its data to a specific application can simply include a field that holds this number — and numbers are very fast to search and index. Here is the SQL code needed to create this type of table, sometimes called a header table:

CREATE TABLE appl (
    id       AUTOINCREMENT PRIMARY KEY,
    label    TEXT(40) WITH COMPRESSION,
    ver_nbr  TEXT(10) WITH COMPRESSION,
    dev_name TEXT(50) WITH COMPRESSION
  )
;

The first line contains the command (CREATE TABLE) and specifies the new table’s name (appl). Everything between the parentheses is a comma-delimited list of the columns that the table will contain. Each column definition consists of a column name and a definition of the column’s contents. Note that column names can contain spaces, but it complicates the syntax, so I don’t use them.
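
(If you ever did want a space in a name, Jet would require you to wrap the name in square brackets, as in [dev name] TEXT(50), both in this statement and in every query that touches the column.)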

The first column is named id, and the column definition presents the Jet DBMS syntax for identifying a column that you want to be an automatically generated integer primary key. Other DBMSs have mechanisms for accomplishing the same thing, but the syntax varies. The next three columns are defined as variable-length text fields of differing sizes. Note that, defined in this way, the indicated count specifies the maximum number of characters that the field can contain, but the DBMS only stores the actual number of characters you put in it. Finally, the WITH COMPRESSION keywords are needed because a few versions ago Jet was converted to storing Unicode characters, which are 2 bytes long. The effect of this change was that ASCII strings started taking up twice as much memory as needed, so the WITH COMPRESSION keywords tell Jet to store these strings as 1-byte values.

One more point. The last field (dev_name) is intended to contain the name of the developer who last modified the application. But wouldn’t what we said earlier about normalization apply to this information as well? Yes, it would. To be formally correct, there should be another table of developers to which this table relates. However, I put this field in here to highlight an important point: in database design — as with LabVIEW programming — there are trade-offs, and we need to ask ourselves whether the added normalization provides enough benefit to justify the added complication of another table. In this case, the answer I came up with was, “No”.
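
To make this a bit more concrete, here is the sort of statement that would create the testbed’s record in this table. The version number and developer name are just placeholder values, but the label is the string we will use later to look up the application’s id:

INSERT INTO appl (label, ver_nbr, dev_name) VALUES ('Testbed', '1.0.0', 'Mike');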

Processes to Launch

When we consider the information needed to launch the startup processes with the desired user feedback, we find that there are four things we need — plus, of course, a reference to the application identified in the table we just created. This is what we have:

  1. appl_id: This is the numeric reference to the table identifying the applications.
  2. label: This is the string that will be displayed to the user while the process is being launched. Previously it was derived from the name of the VI; storing it explicitly is more flexible.
  3. item_path: As when this data was stored in the INI file, this string is a relative path to the VI to be launched.
  4. launch_mode: This string indicates the mode used to launch the process VI. It is associated with an enumeration in LabVIEW that (for now at least) simply indicates whether the VI is reentrant or non-reentrant.
  5. launch_order: This is an integer column we will use to sort the query results so the process VIs are launched in the right order.

So, with this information in hand, we are ready to define our next table, right? Well, not quite. We need to consider another assumption that we have made. To this point, the code we have created dynamically loads VIs in only one place: at program startup. Will that always be true? Dynamic loading is a very powerful technique, so we will very likely want to use it elsewhere in the future. So let’s create another header table that will store a string indicating the condition under which the VI will be launched. Right now the table will only have a single value in it, “Startup”.

Turning all this discussion into SQL commands to create the needed tables, this code creates the new header table:

CREATE TABLE launch_cond (
    id                AUTOINCREMENT PRIMARY KEY,
    launch_condition  TEXT(128) WITH COMPRESSION
  )
;

…then this code inserts the one record we need now…

INSERT INTO launch_cond (launch_condition) VALUES ('Startup');

…and finally we can create the table for holding the launch items:

CREATE TABLE launch_item (
    id              AUTOINCREMENT PRIMARY KEY,
    appl_id         INTEGER NOT NULL,
    launch_cond_id  INTEGER NOT NULL,
    label           TEXT(40) WITH COMPRESSION,
    item_path       TEXT(128) WITH COMPRESSION,
    launch_mode     TEXT(40) WITH COMPRESSION,
    launch_order    INTEGER,
    CONSTRAINT launch_cond_FK FOREIGN KEY (launch_cond_id) REFERENCES launch_cond(id),
    CONSTRAINT launch_appl_FK FOREIGN KEY (appl_id) REFERENCES appl(id)
  )
;

This table contains all the same stuff we saw in the other table-creation commands, but with a couple of additions. The last two items in the column list aren’t columns. Rather, they create the relational links between this table and the two header tables by defining a pair of foreign-key constraints. The names of the constraints are important because they will be used in the error messages associated with the constraints. The constraint definitions themselves are straightforward, each specifying a column in this table that references a specified column in a specified table.
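
Although building the VIs that use these tables is a job for the next installment, it may help to preview the sort of query the launcher logic could run against them. This is only a sketch (the query we actually end up using may differ), but notice how the numeric keys let us pull one application’s startup items, in launch order, without a single application or condition name being stored in launch_item. Note too that Jet requires the parentheses when more than two tables are joined:

SELECT li.label, li.item_path, li.launch_mode
  FROM (launch_item AS li INNER JOIN appl AS a ON li.appl_id = a.id)
       INNER JOIN launch_cond AS lc ON li.launch_cond_id = lc.id
 WHERE a.label = 'Testbed'
   AND lc.launch_condition = 'Startup'
 ORDER BY li.launch_order
;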

Storing Errors

The foregoing case demonstrates how to deal with tables that store configuration data. Now we turn our attention to a table that the application writes to while running. Specifically, we will look at the table for storing errors that occur during program execution. Here are the pieces of information we know we will need going in:

  1. appl_id: This is the numeric reference to the table identifying the applications. In this case it is the application that generated the associated event.
  2. evt_dttm: This field holds timestamp values indicating when the events occurred.
  3. event_type_id: In addition to knowing when an event occurred, you also need to capture what type of event it was. Starting off we can imagine three types of events (errors, warnings and notes), but there could be more, so we’ll store the type in another header table and reference it here.
  4. evt_code: This integer column holds the error code from the LabVIEW error cluster.
  5. evt_source: Another text column, this field holds the Source string from the LabVIEW error cluster.

First comes the new header table and its three records. Note that these event types correlate to the three values of a LabVIEW enumeration (Event Types.ctl).

CREATE TABLE event_type (
    id     AUTOINCREMENT PRIMARY KEY,
    label  TEXT(40) WITH COMPRESSION
  )
;

INSERT INTO event_type (label) VALUES ('Error');
INSERT INTO event_type (label) VALUES ('Warning');
INSERT INTO event_type (label) VALUES ('Note');

And then the table for storing the events…

CREATE TABLE event (
    id             AUTOINCREMENT PRIMARY KEY,
    appl_id        INTEGER NOT NULL,
    event_type_id  INTEGER NOT NULL,
    evt_dttm       DATETIME DEFAULT now() NOT NULL,
    evt_code       INTEGER,
    evt_source     MEMO WITH COMPRESSION,
    CONSTRAINT event_appl_FK FOREIGN KEY (appl_id) REFERENCES appl(id),
    CONSTRAINT event_event_type_FK FOREIGN KEY (event_type_id) REFERENCES event_type(id)
  )
;

The first new idea on display here is in the definition of the evt_dttm field. In addition to stating the data type (DATETIME), it also specifies that the field cannot be empty (NOT NULL) and tells Jet what value to use if an insert statement does not contain a time value: now(). This built-in Jet function returns the current date and time. You also see that I introduced a new datatype, MEMO, to hold the textual description of the error. Given that it uses the same WITH COMPRESSION keywords as the TEXT fields, you might assume that it is another way of storing textual data — and you’d be right. The difference between the two is that, while both are variable-length fields, a TEXT field can hold a maximum of only 255 characters. By contrast, a MEMO field can (in theory at least) hold strings as long as 2.14 GB.
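
To see that default in action, here is a sketch of the kind of statement the error logger might execute; the error code and source string shown are just placeholders. The point to notice is that evt_dttm is never mentioned, so Jet fills it in with now() automatically:

INSERT INTO event (appl_id, event_type_id, evt_code, evt_source)
   SELECT a.id, et.id, 7, 'Open File in Read Config.vi'
     FROM appl AS a, event_type AS et
    WHERE a.label = 'Testbed'
      AND et.label = 'Error'
;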

The Default Sample Period

Finally, there are always simple, unstructured setup values, like the default sample period, so let’s set up a table for them too. Note that I picked column names that reflect the simple organization of an INI file, with sections, keys and values.

CREATE TABLE misc_setting (
    id         AUTOINCREMENT PRIMARY KEY,
    appl_id    INTEGER NOT NULL,
    p_section  TEXT(50) WITH COMPRESSION,
    p_key      TEXT(50) WITH COMPRESSION,
    p_value    TEXT(255) WITH COMPRESSION,
    CONSTRAINT miscsettings_appl_FK FOREIGN KEY (appl_id) REFERENCES appl(id)
  )
;

INSERT INTO misc_setting (appl_id, p_section, p_key, p_value)
   SELECT id, 'Data Acquisition','Sample Period','1000'
     FROM appl
    WHERE label = 'Testbed'
;

The syntax used for inserting the record we need is a bit different because it has to look up the value that will go into the appl_id field, which it does by selecting that value from the appl table as part of the insert. The Jet implementation of this type of operation is rather obscure and strays from the SQL standard, but this is what it looks like. Unfortunately, it is not the only place where Jet runs counter to the standard. For example, having the event table automatically insert the timestamp by means of the now() default value sidesteps the problem of formatting literal time values, the handling of which in Jet is also very non-standard.
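
Reading the value back out again is more conventional. As a preview of what the LabVIEW code will need to do next time, here is one way the lookup might be phrased:

SELECT p_value
  FROM misc_setting AS ms INNER JOIN appl AS a ON ms.appl_id = a.id
 WHERE a.label = 'Testbed'
   AND ms.p_section = 'Data Acquisition'
   AND ms.p_key = 'Sample Period'
;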

A Few More Things

So there we have it: the structure for our simple database — or at least as much of it as we need right now. But you may be wondering what you are supposed to do with all that SQL code. I’m glad you asked. I have created a simple utility that executes the code to create the database, and you can find it (along with the complete SQL source code file) here:

http://svn.notatamelion.com/blogProject/local database builder/Tags/Release 1

When you run the utility, it builds the new database file in the directory where the source code file is located. While I’m linking to things: it can sometimes be helpful to simply open a database file and look at its contents. If you have Access installed on your computer, you can open the resulting database file directly; otherwise, I can recommend a small, lightweight utility called Database .Net. It doesn’t require an installer, supports a bunch of different DBMSs, and does not touch the computer’s registry, so it can run from anywhere. I like to keep it on a USB thumb drive in case I need to take a quick look at the contents of a database.

Finally, here are a couple more links to books that I recommend for learning more about the issues we rushed through in this post. I really like The Practical SQL Handbook because it covers all the important points in a readable, entertaining way. A dry college text, this is not. Designing Quality Databases With IDEF1X Information Models, on the other hand, is a college text that goes into a tremendous amount of detail on data modelling and a common technique for documenting what you find — which of course makes it very thorough. This book is not a quick read, but it is worth the effort.

All we have left to cover now is the LabVIEW side of the job, which we’ll get into with the next post.

Until next time …

Mike…