Understanding Web I/O

A question that seems to be popping up with increased urgency is how to incorporate LabVIEW into an environment that is largely web-based. As you would expect, in a technology that is developing as fast as the web, it can be hard to keep track of where things are right now. In fact, at least one major tech publisher (O’Reilly Media) is diving full throttle into electronic publishing because, with the lead time inherent in conventional publishing, web development books tend to be out of date before they can even be printed.

This is important to us because we don’t have the time or money to waste on technological dead ends. For example, I wouldn’t recommend you buy stock in companies manufacturing compact fluorescent lightbulbs. I mean who really wants to have to call in a hazmat team to clean up the mercury contamination when one breaks? The “curly light bulb” may be all the rage right now, but it is a technological dead end — to be replaced by LEDs. In the same way, I believe that many of the technologies that are in use today to access online resources are also technological dead ends.

If your company is online right now or wants to offer some service online, the standard way of doing that is with an “app”. Or rather, I should say, several apps. You will need an app for Android, one for iOS, one for Windows Mobile, and perhaps others as well. But to build them you will need at least three separate development teams because none of those operating systems are compatible with one another — or for that matter even very similar. This approach is simply not sustainable, and is another technological dead end. The solution is to build a system based on the platform-independent standards that drive the web. So we need to learn where LabVIEW fits into this picture.

The Web Landscape

The first thing to understand is that the web as a collection of static HTML pages is gone. The web of today is a dynamic environment where the web pages you view (including this one) are built on the fly and in some cases may only exist for the few milliseconds it takes for the server to transmit it to you. While the underlying technology that accomplishes this magic is fascinating, the good news is that we don’t really need to know very much about it to accomplish our goals. Instead it is adequate for our purpose to know who the “main players” are from our (rather limited) point of view and how they interact.

One of the key developments in the past few years has been the understanding that each piece of the overall web environment needs to have a clearly defined role and that the various components don’t step outside their designated roles. To see what I mean, let’s consider a few of the basic pieces that you will encounter working with LabVIEW:

HTML — The Content

In the beginning the need was simple: researchers at CERN needed a way to use this new “internet” to share research results with colleagues. But they wanted more than just a way to ship text files around. They wanted documents that could include pictures and graphs, and could be linked to one another. In short, they wanted (to borrow a buzzword from an even earlier time) hyper-text.

The approach they used to reach this goal was simple: take bare text documents and embed in them tags (which were also text) that tell a custom-designed program called a “browser” how to treat the text contained in the tags. The thing to remember is that this idea was hardly a new one — so-called “markup languages” had been used for decades in the mainframe environment to implement word processors and report generators.

Even today, HTML (hyper text markup language) documents are just text files with embedded tags. This point is an important one to remember because it means that while browsers have one way of interpreting tags, other programs — even LabVIEW, or programs you write in LabVIEW — can read the same files and parse them to extract the data, the same as they would any other text file.
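To make that point concrete, here is a quick sketch (in Python rather than LabVIEW, purely so it can be shown as text on the page) that treats a saved HTML document like any other text file and simply lists the tag names it finds. The file name is, of course, just a placeholder.

import re

# Read an HTML document exactly as you would any other text file.
# ("example_page.html" is just a placeholder name.)
with open("example_page.html", "r", encoding="utf-8") as f:
    page_text = f.read()

# A tag is nothing more than a name wrapped in angle brackets, so a
# simple pattern is enough to enumerate the tags the document uses.
tag_names = re.findall(r"</?([A-Za-z][A-Za-z0-9]*)", page_text)
print(sorted(set(name.lower() for name in tag_names)))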

CSS — How Things Should Look

As it was originally created, HTML mushed together a lot of different things. Some tags defined the structure of the document, some were links and some simply defined what the text should look like. But this approach soon began to present some limitations. For example, if the logic defining what a particular bit of text looks like is embedded in the document, everybody has to see the same thing no matter how big their browser window might be or what the physical characteristics of their screen might be.

The solution to this problem was easy. All you had to do was define a separate syntax for describing the content’s appearance, and then take all that logic and move it into a separate file, called a style sheet, with a .css file extension. Now two people could view the same HTML document and each would see it formatted to best fit their system because their browsers were using style sheets that were optimized for their system hardware.

As a practical example of this flexibility in action, consider the very webpage you are reading right now. Regardless of whether you are looking at it on a monitor on your desk, or a smartphone in the palm of your hand, everybody gets to see a nicely styled layout that fits their device — and I only had to generate the content once.

PHP — Server-Side Programmability

But even with that styling flexibility, there is still a very large problem. If all you have at your disposal is HTML, then each web page becomes a separate program that somebody has to write. Imagine, if you will, how much effort it would take to maintain a site such as Amazon if every one of its thousands of products needed its own hand-crafted and programmed web page. The solution is to automate the creation of web pages using a programming language that can automatically assemble pages from predefined common components as they are needed.

One of the most popular languages for this kind of work is PHP — though there are many others. It works by inserting itself into the HTML process in one of two ways. First, there are special tags embedded in the HTML that cause the server to momentarily stop sending the data from the HTML file and run a PHP function. The purpose of this function is to programmatically build and echo to the waiting browser some portion of the page, like a standardized menu structure or footer.

The second way PHP may appear is that some web servers (like Apache, for instance) will automatically call a predefined file called index.php if you access the site without specifying a specific page. Consequently, while it might look to you like you are accessing a webpage, you are really running a PHP program that builds the initial screen you see on the fly. In addition, if you watch your URL bar while browsing you will occasionally notice that you are accessing pages with a .php file extension.
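The PHP syntax itself is beyond what we need here, but the underlying idea of assembling pages from predefined components is easy to sketch in any language. Here is a rough illustration in Python; the component functions and the product details are invented purely for the example.

# Hypothetical reusable components shared by every page on the site.
def standard_menu():
    return "<nav><a href='/'>Home</a> <a href='/products'>Products</a></nav>"

def standard_footer():
    return "<footer>Contact us for more information.</footer>"

def product_page(name, description):
    # Every product page is assembled on the fly from the same pieces.
    return "\n".join([
        "<html><body>",
        standard_menu(),
        "<h1>" + name + "</h1>",
        "<p>" + description + "</p>",
        standard_footer(),
        "</body></html>",
    ])

print(product_page("Widget", "A very fine widget."))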

Database — Data Storage

If you have PHP building web pages dynamically, a logical question to ask is where it gets the data it uses to build the dynamic pages we see in our browsers. Not surprisingly, PHP uses a database, and it does so for the same reason that we use databases when building a test application: because it’s better to reconfigure than it is to recode. In fact, one of the major functions of PHP programs is to query a local database and pass the result to the browser for display. You will also find PHP programs storing to a database the information that you provide when filling out forms to do things like ordering pizza online or registering for a blog.

The type of database used can vary widely, though the most common is MySQL. The important thing to remember is that the specific database used to fill this particular niche in the overall web ecosystem is largely irrelevant.
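To make the division of labor concrete, here is a small sketch of that query-then-format pattern, written in Python with the built-in sqlite3 module standing in for MySQL; the products table and its contents are invented for the example.

import sqlite3

# An in-memory database stands in for the site's real MySQL server.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE products (name TEXT, price REAL)")
db.executemany("INSERT INTO products VALUES (?, ?)",
               [("Widget", 9.99), ("Gadget", 24.50)])

# Query the database and turn the result into the HTML fragment
# that would be sent on to the browser for display.
rows = db.execute("SELECT name, price FROM products ORDER BY name").fetchall()
items = "\n".join("  <li>%s: $%.2f</li>" % (name, price) for name, price in rows)
print("<ul>\n" + items + "\n</ul>")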

Getting Back to LabVIEW

Now that we understand how some of the major pieces fit together, we can begin to explore how our LabVIEW programs fit into this picture. Since the point of all this structure is to manage text files being sent to browsers, it shouldn’t be too much of a surprise to learn that one of the major ways LabVIEW programs fit in is by pretending to be browsers. In support of that “deception” LabVIEW comes with a set of HTTP client VIs that support all the major HTTP methods, as well as routines for dealing with access control and setting session parameters. As an example of how the interface works, I have created a VI that illustrates about the simplest transaction possible:

simple get method

The code starts by opening an HTTP session. It then uses the VI that implements the HTTP GET method to fetch a URL and display the contents in a string indicator. Finally, it closes the connection. As you will see, the tricky part of this operation isn’t so much how you do it, but rather what you do with the data you get back. For example, if you run this VI with the default value in the url control you get something that looks like this:

raw web results

Now you have to admit that the data in the Response indicator would seem to fit the definition that most people have for “unintelligible garbage”. But wait a minute, look at the first 6 bytes of the file. Many file formats put a short identifier in the first few bytes of the file. For example, if you look at a Zip archive, the first two characters are “PK”. Applying this understanding to the data we just read from the internet, “GIF89a” identifies files that conform to the 1989 Graphics Interchange Format standard. In other words, it’s a gif format graphic. To see the image all you have to do is provide a way to save the data to a file by adding a bit of logic to the block diagram.

gif saver
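If you would like to prototype the same transaction outside LabVIEW before wiring it up, here is a rough Python sketch of the sequence the two block diagrams implement: perform the GET, check the first six bytes for the GIF signature, and write the data to disk. The URL shown is only a placeholder, not the actual forum link.

import urllib.request

# Placeholder URL -- substitute the link to the image you want to fetch.
url = "http://example.com/some_avatar"

# Open the connection, perform the HTTP GET, and read the raw response.
with urllib.request.urlopen(url) as response:
    data = response.read()

# Many file formats identify themselves in their first few bytes;
# GIF images start with either "GIF89a" or the older "GIF87a".
if data[:6] in (b"GIF89a", b"GIF87a"):
    with open("avatar.gif", "wb") as f:
        f.write(data)
    print("Saved avatar.gif")
else:
    print("Response does not look like a GIF:", data[:6])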

You see, the URL is the link that the LabVIEW forum uses to fetch one of its predefined avatars, in this case one called “fatbot”. But there are more serious things that you can do as well. For example, by parsing the HTML that you get back from reading an actual webpage, you can obtain data that you might not otherwise be able to access. However, if you plan to do such data harvesting, be sure that you have permission to do so. Otherwise the practice could lead to serious ethical/karmic/legal issues.

So why would someone need to do this sort of data collection as part of a professional test system? One possible scenario would be that the application you need to test is inaccessible, and all you can see is the data that it publishes online. Or perhaps the web interface is the thing that you want to test.

The big risk, of course, is that if you don’t have control over the page that you’re accessing, your code can suddenly stop working because the owner changed their webpage. Ideally, you should be able to negotiate changes with the site owner, but if not you need to remember that your number one mitigation strategy is good design. While you might not be able to keep changes from breaking your code, you can at least limit the damage to perhaps one or two VIs.

Assuming that you can negotiate with the folks working on the webpage, one of the things you can request is that they surround the data you need with special tags, like so (imagine this is in the middle of a block of text on the web page):

… the test was performed by <lvData id="testOp">John Doe</lvData> using test fixture <lvData id="testLoc">5</lvData> …

This snippet defines a unique HTML tag lvData that allows the LabVIEW code that is parsing the content to quickly find the values it wants, while the id="…" clauses allow the LabVIEW application to identify the label to associate with each value. Moreover, this creation of an ad hoc tag won’t impact the viewing of the page with a standard browser because fundamental to the operation of all browsers is the simple rule: if you see a tag you don’t recognize, ignore it. Here is code that you could use to locate and parse these tags.

parse lvData
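The LabVIEW implementation is shown above, but because the job is really just string handling, the same logic is easy to prototype in any language that can do pattern matching. Here is a minimal sketch in Python using a regular expression, with the sample text taken from the snippet above.

import re

sample = ('... the test was performed by <lvData id="testOp">John Doe</lvData> '
          'using test fixture <lvData id="testLoc">5</lvData> ...')

def find_lv_data(page_text):
    # Capture the id attribute and the value from each <lvData> tag.
    pattern = r'<lvData\s+id="([^"]+)">(.*?)</lvData>'
    return dict(re.findall(pattern, page_text))

print(find_lv_data(sample))   # {'testOp': 'John Doe', 'testLoc': '5'}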

This idea could also be expanded by embedding the custom tags in HTML comments. Like this:

<!-- <lvData id="testOp">John Doe</lvData> <lvData id="testLoc">5</lvData> -->

In this implementation, the comment tag would hide the custom tags from view when the page is being displayed in a browser, but they would still be available to a LabVIEW application parsing the page as text. Using this technique you could hide data inside any page — perhaps even this one.
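Because the parser treats the page as nothing more than text, wrapping the tags in a comment changes nothing. Feeding the comment version to the find_lv_data sketch above returns exactly the same values:

hidden = '<!-- <lvData id="testOp">John Doe</lvData> <lvData id="testLoc">5</lvData> -->'
print(find_lv_data(hidden))   # {'testOp': 'John Doe', 'testLoc': '5'}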

The Big Tease

This should be enough for now. Next time we’ll look at how to use LabVIEW as a data source for a web-based application. We’ll check out the HTTP POST method, and even look at a little PHP code and some JSON data.

Until Next Time…

Mike…
