Managing Data — the Easy Way

As I work on this blog there are times when I have to put certain topics on hold because the infrastructure doesn’t yet exist to allow me to effectively cover the topic. Well, this is one of those times. What I want to do is start getting into some topics like advanced user interfaces. However, these capabilities presume the existence of a data management capability that our testbed does not yet possess. To begin filling in that blank, the next few posts are going to start looking at techniques for utilizing databases as a data storage mechanism for LabVIEW-based applications.

But why databases? Wouldn’t it be easier to store all our configuration or test data in text files? When considering the storage of any sort of data the first thought for many developers is often to just, “…throw it in a text file…”. Therefore, this question needs to be the first thing that we discuss.

The Ubiquitous INI File

The INI file, as a method for storing configuration data, is a standard that has been around for a long time. Its main advantage is that it is a simple human-readable format that anyone with a text editor can manipulate. However, INI files also have a significant downside, and its number one problem is security. Simply put, INI files are inherently insecure because they use a simple human-readable format that anyone with a text editor can manipulate. Anybody who knows how to use Notepad can get in and play around with your program’s configuration, so unless you are very careful in terms of data validation, this openness can become an open door to program instability and failure.

A second issue is that while INI files can be subdivided into sections, and each section can contain key name and value pairs; beyond that simplistic formatting they are largely unstructured. This limitation shouldn’t be a surprise, after all the format was developed at a time when programs were very simple and had equally simple configuration requirements. However, with many modern applications one of the big challenges that you have to face when using INI files is that it is difficult, if not impossible, to store configuration data that exhibits internal relationships.

For example, say you have 2 or 3 basic types of widgets that you’re testing and each type of widget comes in 2 or 3 variants or models. In this scenario, some of the test parameters will be common for all widgets of a specific type while others are specific to a particular model. If you want to capture this data in a text file, you have two basic options. First you can put everything in one big table — and so run the risk of errors resulting from redundant data. Second, you can store it in a more normalized form and try to maintain the relationships manually — which is a pain in the neck, and just as error prone.

Text Data Files?

OK, so text files aren’t the best for storing configuration data. What about test data? Unfortunately, you have all the same problems — in spades — plus a few new ones: For example, consider Application Dependency.

Application dependency means that the data format or structure is specific to the program that created the file. Basically, the limitation is that outside programs have to know the structure of your file ahead of time or it won’t be able to effectively read your data. Depending upon the complexity of your application, the information required to read the data may expose more about the inner workings of your code than you really feel comfortable publishing.

Another problem is numeric precision. You can’t save numbers in a text file, just string representations of the numbers. Consequently, numeric precision is limited to whatever the program used when the file was created. So if you think that all you need when saving the data is 3 decimal places, and then find out later that you really need 4, you’re pretty much hosed since there is no way to recreate the precision that was thrown way when the data was saved.

Finally, data in a text file usually has no, or at least inadequate, context. Everybody worries about the accuracy of their test results, but context is just as important. Context is the information that tells you how to interpret the data you have before you. Context includes things like who ran a test, when it was run, and how the test was configured. Context also tells you things like what unit of measure to use in reading numeric data, or when the instruments were calibrated.

The Case for a Database

Due to all the foregoing issues, my clear preference is to use databases to both manage configuration data and store test results. However some LabVIEW developers refuse to consider databases in a misguided effort to avoid complication. Out of a dearth of real information they raise objections unencumbered by facts.

My position is that when you take into consideration the total problem of managing data, databases are actually the easiest solution. Of course that doesn’t mean that there won’t be things for you to learn. The truth is that there will be things to learn regardless of the way you approach data management. My point is that with databases there is less for you to learn due to the outside resources that you can leverage along the way. For one simple (but huge) example, consider that you could figure out on your own the correct ways to archive and backup all your test data — or you could put the data into a database and let the corporate IT folks, who do this sort of thing for a living, handle your data as well.

So let’s get started with a few basic concepts.

What is a DBMS?

One common point of confusion is the term DBMS, which stands for DataBase Management System. A DBMS is software that provides a standardized interface for creating, maintaining and using databases. In a sense, the relationship between a DBMS and a database is exactly the same as the relationship between a word-processor, like Microsoft Word, and a letter home to your Mom. Just as a word-processor is a program for creating textual documents, so a DBMS can be seen as a program for creating databases. One big difference though, is that while word-processors are programs that you have to explicitly start before you can use them, a DBMS will typically run in the background as a Windows service. For example, if you are reading this post from a Windows-based computer, there is at least one DBMS (called Jet) running on your computer right now.

What is a database?

A database is a structured collection of data. In that definition, one word is particularly important: “structured”. One of the central insights that lead to the development of databases was that data has structure. Early on, people recognized that some pieces of information are logically related to other pieces and that these relationships are just as important as the data itself. The relationships in the data are how we store the data context we discussed earlier.

By the way, as an aside, people who talk about an “Access Database” are wrong on two counts since Access is neither a database or a DBMS. Rather is it an application development environment for creating applications that access databases. By default, Access utilizes the Jet DBMS that is built into Widows, but it can access most others as well.

How do you communicate with a database?

It wasn’t that long ago that communicating with a database from LabVIEW was painful. The first product I ever sold was an add-on for the early Windows version of LabVIEW that was built around a code interface node that I had to write using the Watcom C compiler. Actually, looking back on it, “painful” is an understatement…

In any case, things are now dramatically different. On the database side the creation of standards such as ODBC and later ADO (also called OLE DB) provided standardized cross-vendor interfaces. For their part, National Instruments began providing ways of accessing those interfaces from within LabVIEW (starting with Version 5). Today accessing databases through ActiveX or .net interfaces is a breeze. To demonstrate this point, I’ll present a package of drivers that I have developed and posted on the user forum several years ago.

Getting Connected…

The VIs we will now discuss are the core routines that you will need to interact with any DBMS that supports an ADO interface — which is basically all of them. The only common DBMS that doesn’t support ADO is SQLite. Instead, it has its own DLL that you have to access directly. Still, if you want a very lightweight database engine that will run on nearly anything (including some real-time hosts) it is a good choice and there are driver packages available through the forum.

Getting back to our standard interface, the following six routines provide all the support most applications will ever need. One thing to notice is that with the exception of one subVI that requires added code to work around a bug in certain version of SQL Server, most of the code is very simple, which is as it should be.

To start that exploration, we’ll look at the routine that is the logical starting place for any database interaction — and incidentally is the only routine that cares what database you are using.

ADO Database Drivers.lvlib:Acquire Connection.vi

The start of any interaction with a database, naturally involves establishing a connection to the DBMS that is managing it, and then identifying which specific database you want to use. This VI calls the ADO method that performs that operation.

You’ll notice that the ActiveX method only has one required input: the Connection String. This string is a semicolon-delimited list of input parameters. Although the exact parameters that are required depends on the DBMS you are accessing, there is one required parameter, Provider. It tells the ADO functionality what driver to use to access the DBMS. This is what a typical connection string would look like for connecting to a so-called Access database.

Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\mydatabase.mdb;User Id=;Password=;

Note that in this case the Data Source consists of a path to a specific file. For another DBMS this same parameter might point to an IP address, a logical name the driver defines or the name of a Windows service. But what if you don’t know what connection string to use? Well you can ask the DBMS vendor — or you can check: www.ConnectionStrings.com. Yes, there is an entire website dedicated to listing connection strings, and it covers nearly every ADO-compatible DBMS on the planet.

ADO Database Drivers.lvlib:Execute SQL Command.vi

OK, you are connected to your DBMS and you have a reference to your database. The next question is what do you want to do now? One answer to that question would be to send a command to the DBMS telling it to do something like create a new table in which you can store data, or add a new record to an existing table. This VI meets that need by sending to the DBMS command strings consisting of statements written in a language called SQL.

The core of this routine is the connection Execute method. Because this method could be used to execute a command that returns data, the 0x80 input optimizes operation by telling the method to not expect or handle any returned data. The method, instead, returns a count of the number of rows that the last command impacted.

ADO Database Drivers.lvlib:Create and Read Recordset.vi

While sending commands is important, the most common thing to do with a database connection is to use it to read data from the database. To optimize operation when you have multiple users concurrently accessing the same database, ADO creates something called a recordset. A recordset is a memory-resident copy of the requested data that you can access as needed without requiring further interaction with the database itself.

The three subVIs shown create a recordset using a source string consisting of an SQL query, reads the recordset’s contents, and then closes the recordset to free its memory. For details of how these subVIs work, checkout the comments on their block diagrams.

ADO Database Drivers.lvlib:Release Connection.vi

Once you are finished using a connection, you need to tell ADO that you are done with it.

Note that although the method called is named Close, it doesn’t really close anything. Thanks to built-in functionality called connection pooling the connection isn’t really closed, rather Window just marks the connection as not being used and puts it into a pool of available connections. Consequently, the next time there is a request for a connection to the same database, Windows doesn’t have to go to the trouble of opening a new connection, it can just pull from the pool a reference to a connection that isn’t currently in use.

ADO Database Drivers.lvlib:Start Transaction.vi

A topic that database folks justifiably spend a lot of time talking about is data integrity. One way that a DBMS supports data integrity is by ensuring that all operations are “atomic”, i.e. indivisible. In simple terms, this means that for a given operation either all the changes to the database are successful, or none of them are. This constraint is easy to enforce when inserting or modifying a single record, but what about the (common) case where a single logical update entails adding or modifying several individual records?

To handle this situation, DBMS allows you to define a transaction that turns several database operations into a single atomic operation. This VI marks the start of a transaction.

To mark the other end of the transaction, you have to perform either a Commit (make all the changes permanent) or a Rollback (undo all the changes since the transaction’s start).

ADO Database Drivers.lvlib:Rollback Transaction on Error.vi

This VI combines the Commit and Rollback operations into a single routine. But how does it know which to do? Simple, it looks at the error cluster. If there is no error it commits the transaction…

…but if there is an error it will rollback the transaction.

What’s Next

Clearly the foregoing six VIs are not a complete implementation of all that is useful in the ADO interface — for instance, they totally ignore “BLOBs”. However I have found that they address 95% of what I do on a daily basis, and here is a link to the new version of the toolbox that includes the drivers.

Toolbox Release 3

But let’s be honest, having an appropriate set of drivers is in some ways the easy part of the job. You still need a database to access and you need to know at least a bit about the SQL language. This is where what I said earlier I said about leveraging the time and talents of other people come into play. Your corporate or institutional IT will often assist you in setting up a database — many times, in fact, they will insist on doing it for you. Likewise, with an interface that is based on command strings written in SQL, you will find a wealth of talent willing to lend a hand, perhaps in-house but certainly online. Still, having said all that, it’s good for you to understand at least a bit about how these other aspects operate. Therefore, next time we’ll create a small local database (using Jet) and start implementing the data management for our testbed application.

Until next time…

Mike…

3 thoughts on “Managing Data — the Easy Way”

Yair says:

2015-01-21 at 8:39 am

Actually, there’s a lightweight ODBC driver for SQLite, but you have to install it separately:
http://www.ch-werner.de/sqliteodbc/

It should probably also be pointed out that older versions (pre-LV 2011) of NI’s DB toolkit had a bug where some DBs (including SQLite) would return an error in one of the subVIs because that VI was calling a property that was unsupported.

Log in to Reply
- mlportersr says:
  
  2015-01-22 at 5:11 pm
  
  Actually, you just did! Thanks for the comment.
  
  Mike…
  
  Log in to Reply
Pingback: Building a Web Backend in LabVIEW | Not a Tame Lion

Not a Tame Lion

"Safe" isn't always an option…

Managing Data — the Easy Way

The Ubiquitous INI File

Text Data Files?