Excel doesn't have enough rows

All this mention of SQL (Structured Query Language) leads me to point out that it is designed to manage relational databases, and that to archive a huge amount of monitoring data you do not necessarily need a relational database.

I use straightforward C code to deal with such things, usually quite easily, or just awk if the problem is simple enough. Often a sed pass comes first to format the data conveniently. I have also used tools like Mathematica, but they’re overkill for most tasks.

Perhaps my comment should be in brackets: “[Off-topic, for those who want to reminisce about the Jurassic era.]”

While you can use a number of free tools, you might wish to stick with Excel or Access for visualisation.

If you have more than 2 GB, you exceed Access’s database file size limit.

But you can connect Access to SQL, e.g. MS SQL Express (client/server, but effectively single-client), SQLite (file-based), or MySQL (client/server). Load the CSV into the SQL database with its own import tool or with Access.

Hint: link to MySQL or SQLite via an ODBC driver, or to MS SQL with the MS SQL driver.

Now you can produce a query or queries to select a practical subset of the data (e.g. averaged, sampled, or a combination). The good thing is you can save these as step 1, step 2, etc.

e.g.

1. Clear table2
2. Clear table3
3. Sample table1 into table2
4. Query averages of table2 into table3
5. Export table3 into XLS
Now switch to Excel and graph table3 and other tables as you wish. Feel free to program Access to drive this in less clunky fashion, but it’s not necessary when it’s just a few basic steps 1 to 5.
Just saying that this doesn’t require moving to anything more than Access and XLS, as you aren’t requiring advanced maths when producing a standard second graph.
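For anyone without Access, the same steps-1-to-5 pipeline can be sketched with SQLite. This is only a sketch: the table names follow the post above, and the two-column (time, volts) CSV layout is an assumption.

```python
# Sketch of steps 1-5 with SQLite instead of Access. Table names follow
# the post above; the two-column (time, volts) CSV layout is an assumption.
import csv
import sqlite3

def summarize(csv_path, out_path, sample_every=10, bucket=1.0):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table1 (n INTEGER, t REAL, v REAL)")
    con.execute("CREATE TABLE table2 (t REAL, v REAL)")
    con.execute("CREATE TABLE table3 (bucket INTEGER, avg_v REAL)")
    with open(csv_path, newline="") as f:
        rows = [(i, float(t), float(v)) for i, (t, v) in enumerate(csv.reader(f))]
    con.executemany("INSERT INTO table1 VALUES (?, ?, ?)", rows)

    con.execute("DELETE FROM table2")            # step 1: clear table2
    con.execute("DELETE FROM table3")            # step 2: clear table3
    con.execute("INSERT INTO table2 SELECT t, v FROM table1 WHERE n % ? = 0",
                (sample_every,))                 # step 3: sample table1 into table2
    con.execute("""INSERT INTO table3
                   SELECT CAST(t / ? AS INTEGER), AVG(v)
                   FROM table2 GROUP BY 1 ORDER BY 1""",
                (bucket,))                       # step 4: average table2 into table3
    with open(out_path, "w", newline="") as f:   # step 5: export table3 for Excel
        csv.writer(f).writerows(con.execute("SELECT * FROM table3"))
```

The exported file is small enough for Excel to graph, which is the whole point of the exercise.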

I think the issue people who are talking about databases and Excel are missing is that this is data acquired by a digital oscilloscope. It is going to be time-series data samples, and lots of them all in a row. Neither Excel nor any sort of database is the correct tool for handling this data.

The problem seems to stem from the fact that they are in CSV format. And many people seem to equate CSV with Excel, which is really pretty odd. Files full of data separated by commas have been around since the dawn of computing; CSV was simply a format Excel supported to allow it to be vaguely useful.

Minimally the requirement will be to graph each sample set out, and you will be talking millions of samples. Maybe multichannel data, maybe separate sample runs.
After that, all the usual analysis tools might be usefully employed. And these are not your standard Excel or database analysis tools; they will be the standard signal-processing tools. All the scientific tools mentioned above have these as their bread and butter. Excel or an SQL database will never have heard of them, and will be unable to implement them in any sensible fashion.

Thanks for the replies so far.

Here’s the deal:

There is a, uh, box that flies around on some aircraft that collects data. (It’s not sensitive or classified, but I feel uncomfortable giving out more info. PM me for the info if you’re curious.) The box is taken off the aircraft and brought back to the lab. Data residing in the box is uploaded to a computer via an RS-422 interface. It’s 56K of data, total. There are two, differential, RS-422 channels: a transmit (Tx) channel and a receive (Rx) channel. Each channel consists of two wires. So there are four wires between the computer’s RS-422 card and the box.

The computer system and RS-422 interface are very old; I think it’s a 286 computer, and the RS-422 card is no longer made. Operating system is DOS. They want a new computer system and card. In addition they do not have the source code for the existing software on the (old) computer. I am supposed to come up with a new system to upload the 56K of data from the box onto a (new) laptop computer.

Another young engineer was working the project. I told him, “The first thing you need to do is find out what’s on the RS-422 cable.”

So the old system and a few boxes were shipped to our lab. He hooked everything up, and then connected a digital oscilloscope to the Tx and Rx RS-422 channels in order to collect the data. Sampling rate was 40K samples/second.

He’s gone, and I am picking up the pieces. :frowning:

The CSV files he generated are big – over 100 MB. When I open them in Excel, it cuts them off at 1 million rows. I tried plotting a few thousand data points, just to get a feel for what the data stream looks like. But it’s damn slow. Plus I would love to plot the whole thing and “zoom out,” so to speak, to see if I can see blocks of data being transferred.
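One stdlib-only way around the Excel row limit is min/max decimation: keep the minimum and maximum of each chunk of rows, so narrow pulses still survive the zoomed-out view. This is a sketch that assumes the scope CSV is two columns of time and volts with no header; the `t_col`/`v_col`/`skip_header` parameters are there to adjust for the real file layout.

```python
# Min/max decimation for a huge scope CSV: collapse every `chunk` rows to
# one (start_time, min, max) triple so a 100 MB capture plots easily.
# Column indices and header count are guesses -- adjust for the real files.
import csv

def decimate_minmax(path, chunk=1000, t_col=0, v_col=1, skip_header=0):
    """Yield one (t, v_min, v_max) triple per `chunk` rows of the CSV."""
    with open(path, newline="") as f:
        r = csv.reader(f)
        for _ in range(skip_header):
            next(r)
        buf_t, buf_v = None, []
        for row in r:
            if buf_t is None:
                buf_t = float(row[t_col])     # timestamp of the chunk's first row
            buf_v.append(float(row[v_col]))
            if len(buf_v) == chunk:
                yield (buf_t, min(buf_v), max(buf_v))
                buf_t, buf_v = None, []
        if buf_v:                             # flush the final partial chunk
            yield (buf_t, min(buf_v), max(buf_v))
```

The triples can then be plotted as a band in any tool, Excel included, since millions of samples collapse to a few thousand rows.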

A long time ago I was a software wiz, but over the last 25 years my focus has been on hardware. I am so lost on how to do software stuff nowadays.

And even if I do determine the protocol of the data transfer (baud rate, handshaking, ASCII??, etc.), I need to come up with a new system. My initial thought is LabView and a USB-to-RS-422 adapter. But I have never used LabView. I have heard DASYLab is easier to use.

This is actually the only part of the thread that interests me. :slight_smile:
mmm

He does improv comedy at a local theater. The theater offered him a management position, and he took it.

A bit strange, as he wasn’t a humorous guy. At least at work. Nice guy, though.

Oh dear, this isn’t necessarily a trivial problem to solve. OTOH, it could be.

You need to know a little about the protocol between the data logger and the host computer. It is probably a trivial protocol, but there will be one, and you need to know it to be able to decode the data. The first part of the problem is easy, but you will almost certainly need to hack code or find someone who can do it for you.

The first task is to grind the bytes out of the communication. It is almost certain that you will have a simple serial-encoded byte stream. Where you get fun is that the encoding can be one of a number of possible variants. The encoding typically has a start bit (used to synchronise the receiver), a number of data bits (usually but not always 8), optionally a parity bit (which may or may not actually be used even if present), and a stop bit or bits. Building a bit of code that can read your data stream and decode it into bytes isn’t hard, but it isn’t for a novice. A simple finite state machine does the trick, though it will take a bit of trial and error to get stable. It will work out the baud rate quickly enough.
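A minimal sketch of such a decoder, assuming the common 8N1 framing (one start bit, 8 data bits LSB-first, no parity, one stop bit) and scope samples already thresholded to 0/1. The sample rate and baud rate are the knobs you would iterate on:

```python
# Sketch of decoding 8N1 serial data out of thresholded scope samples
# (idle high, start bit low, 8 data bits LSB-first, 1 stop bit). The
# framing, sample_rate and baud values are assumptions to iterate on.

def decode_8n1(bits, sample_rate, baud):
    spb = sample_rate / baud              # samples per bit period
    out, i = [], 0
    while i < len(bits) - int(10 * spb):
        if bits[i] == 1:                  # idle / stop level: keep scanning
            i += 1
            continue
        # Falling edge found: i marks the start bit.
        byte = 0
        for n in range(8):                # sample each data bit at its centre
            center = i + int((1.5 + n) * spb)
            byte |= bits[center] << n     # LSB first
        stop = bits[i + int(9.5 * spb)]
        if stop == 1:                     # stop bit high => framing OK
            out.append(byte)
        i += int(10 * spb)                # skip past this character
    return bytes(out)
```

Note that the scope’s 40 k samples/s gives only about four samples per bit at 9600 baud, which is workable but tight; a re-capture at a higher sample rate would make the decode much more forgiving.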

However, there is a complication in working with flow control. You said there were Tx and Rx differential pairs, so no hardware flow control. This means there is software flow control, and you will need to filter that out of the stream. Again, not hard, but you need to be aware of what is going on. (Good old Ctrl-S and Ctrl-Q, most likely, but not always.)

Where you have some fun is that the host computer almost certainly talks to the data logger with some simple command set to get the data. Once you have the byte stream it is often not hard to reverse-engineer the protocol – the engineers who design these things usually don’t have a great deal of imagination, so it is likely pretty easy. But you would need to work out a range of things: data framing, endianness of the data, and so on.
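Endianness in particular is cheap to probe: unpack the same bytes both ways and see which reading gives physically plausible values. The 16-bit two-field layout here is purely hypothetical, just to show the technique.

```python
# Probe endianness of a recovered payload: interpret the same bytes as
# big-endian and little-endian 16-bit fields and compare plausibility.
# The two-field layout is hypothetical -- adjust to the real framing.
import struct

def both_endians(payload: bytes):
    big = struct.unpack(">HH", payload[:4])     # big-endian reading
    little = struct.unpack("<HH", payload[:4])  # little-endian reading
    return {"big": big, "little": little}
```

For example, the bytes `00 01 00 02` read as (1, 2) big-endian but (256, 512) little-endian; a slowly incrementing counter field usually settles the question immediately.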

This is all the hard stuff. Once you have this, the code to make a new system to do the talking to the data logger should be trivial.

A data file this big requires real programming to process and, hopefully, summarize it.

You may be able to summarize it and then write a CSV to export to MySQL. It’s a robust database that can handle big data files.

This is way beyond anything Excel or Access is designed to handle.

What language you use is entirely up to the programmer and his experience. COBOL, Visual Basic, even C++ are options. Any programmer you hire will have a language he has years of experience using. It’s a very bad idea to force a VB programmer to write in C++. It can be done but it will significantly slow down the project.

I have 25 years in COBOL and 12 years using SQL scripts. I would use that to organize and summarize the data. Then export to whatever application is best suited to analyze it.

~100 MB is by no means big data, but for reverse-engineering purposes you could simply truncate the file if you must. Again, though, I take it that for starters you just want a plot of the voltage with respect to time, or a Fourier transform, or something of that nature just to figure out the data transmission protocol? Load the file in any of the scientific software mentioned, not Excel if Excel consistently chokes on it.
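Before reaching for a Fourier transform, one crude but quick clue for an asynchronous serial line: the shortest gap between signal edges is roughly one bit period, so the sample rate divided by that gap estimates the baud rate. A sketch, assuming the capture has been thresholded to 0/1 and contains at least one pair of back-to-back bit flips:

```python
# Estimate serial baud rate from a thresholded capture: the shortest
# interval between edges approximates one bit period. Assumes the data
# contains adjacent opposite bits (e.g. a 0b...0101... pattern somewhere).

def estimate_baud(bits, sample_rate):
    edges = [i for i in range(1, len(bits)) if bits[i] != bits[i - 1]]
    gaps = [b - a for a, b in zip(edges, edges[1:])]
    return sample_rate / min(gaps) if gaps else None
```

The raw estimate will land near, not on, a standard rate; round it to the nearest of 300, 1200, 2400, 4800, 9600, etc.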

Once you know what is going on, I agree with Francis Vaughan that implementing it in pure software is not completely for novices, but the good news is that there is a lot of existing code already written to do it. The buzzword is “bit-banging.”

117 MB is not “Big Data” and it does not require “real programming to process…it”. Any modern desktop or decent laptop computer has at least 2 GB of DDRAM; loading a file of this size in memory should not task the machine. As previously noted, it can be processed with any number of data processing or general scripting applications such as Matlab, Labview, SciLab, Octave, Python, Perl Data Language, et cetera, which can also slice, filter, transform, convolute, and plot the data in numerous forms. It does not need to be loaded into a relational database or use C++, Visual Basic, or for fuck’s sake COBOL, a nearly sixty year old language designed for business applications not signal processing or data visualization. Francis Vaughan is correct that figuring out the protocol and decomming the data into useful quantities is actually the bigger challenge than actually loading and viewing the data, particularly if this is raw voltage data requiring a complex transform to convert it into a meaningful signal.

Stranger

A requirement for a “relational” database is that it support a native query language. But that is only one of the requirements: SQL doesn’t define relational, and relational doesn’t define SQL.

As it happens, having a structured query language is good for any kind of situation where you want to do queries. I use SQL for captured time data all the time…

But my captured time data has a known logical interpretation, even though it’s not relational in any way. It sounds like the OP has raw analogue data, which will need digital processing to reveal the logical content.

Now I /might/ put that data into a database and connect Excel to it to do processing, because those are tools I am familiar with, but Excel graphs aren’t, as noted, exceptionally good in any way.

Note, python is a programming language often used for number manipulation. Just saying “I’d use Python” is like saying “I’d use FORTRAN” or “I’d use C++”. You still have to explain what graph package or environment you would use.

I don’t have a 64 bit version of Office either, so I use Scilab (not, as it happens, a Python package, but not much different, since you’d probably wind up in a Matlab-compatible library) on Win64 (probably not going to work on Win32 for this dataset). Scilab’s 3D plot command is: “plot3d(t,t,z)”, where t and z hold the data. But (as I use it) it’s not as obvious as Excel. Which leads me to wonder about two tools I’ve NEVER used:

VSXu and Pure Data.

Both are explicitly designed as signal processing environments for audio-frequency data for people who don’t know anything about data. Keywords are “Visualizing” and “Multimedia works”.

One famous example of the use of a non-relational database to capture time-series data: the ATLAS experiment (of Higgs boson fame) at CERN dumped its Big Data into a computer cluster running Apache Cassandra. Cassandra is not a relational database, nor does it support SQL, but certainly you can run queries. That is all I meant: sometimes the correct data model is not a table-oriented relational database.

In any case, it has transpired that Crafter_Man is not actually archiving exabytes of data in real time, and such considerations will provide diminishing returns in his case :slight_smile:

Pure Data and VSXu (I am only familiar with the former) are something a DJ or Thomas Dolby might use to create electronic music with trippy visuals. To quickly mess about with the OP’s 117 MB file I would personally load the CSV as-is into Matlab’s signal processing toolbox, and to do the final bit banging after everything is said and done he can probably use one of those USB to RS422 adapters to hook the black box up to a laptop. (Actually, if he gets a USB <-> RS-422 interface to replace the old interface card, and downloads FreeDOS to run on a laptop to replace the old computer, and the existing software can be configured to use a serial port, who knows, it might still work… if not, some useful information may be gleaned by running it through a decompiler.)

Hmm, it occurs to me to ask what the exact oscilloscope used to capture the data was. It isn’t out of the bounds of possibility that the software system for the device itself may be able to perform the decode to byte stream needed. There certainly are such systems with the needed capability.

I’m certainly not prone to plugging Microsoft Access. But any currently supported version of Access is totally capable of handling datasets this size, and it’s relatively easy to learn to use. It will pull in your CSV file painlessly and let you do some basic reporting on the contents, which sounds like all you really need.

The limitations on Access come into play using much larger datasets (in the gigabyte range), or if you need real-time processing or data access/updates from multiple concurrent users.

You say you don’t have the source code for the existing software, but is there any way you can examine the executable that’s importing the data to determine what’s going on there? Depending on how the executable was compiled (and given its age, it could well be text-decipherable), you might be able to glean some pertinent info using a simple text editor. Just a thought.
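A `strings`-style scan is exactly this kind of look, and is trivial to improvise. Prompts, format strings and error messages in the old importer often give away baud rate, COM-port setup, and commands; the filename in the usage example is a placeholder.

```python
# A poor man's `strings`: pull printable-ASCII runs out of a binary.
# Useful for mining an old DOS executable for prompts, commands, and
# serial-port settings without a disassembler.
import re

def ascii_strings(data: bytes, min_len=4):
    """Return runs of at least `min_len` printable ASCII bytes, decoded."""
    return [m.group().decode("ascii")
            for m in re.finditer(rb"[\x20-\x7e]{%d,}" % min_len, data)]
```

Usage would be along the lines of `ascii_strings(open("IMPORT.EXE", "rb").read())` (hypothetical filename), then grep the result for “baud” or “COM”.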

I’m a little confused. You are just looking for a way to communicate over RS-422 from a new computer to the “box,” rather than use the ancient legacy 286 computer? Rather than reinvent the wheel, why don’t you use an RS-422-to-USB adapter for the interface and a serial protocol analyzer to figure out the serial bus settings?

I am assuming you have access to the “box” and to the legacy computer system. If you are trying to figure out both the transmit and receive data, you could use an RS-422 bus sniffer.

If you are trying to figure everything out just from the scope trace data, then carry on…

I don’t think LJbroken’s post was meant as spam, but the correct answer, which I would have posted if I saw this a year ago, is to use a protocol analyzer like a Saleae Logic to decode the RS-422 signals.