Excel doesn't have enough rows

An engineer at work was working on a project a couple of months ago, wherein he acquired a bunch of RS-422 data using a digital oscilloscope. They’re CSV files, and quite large (around 117 MB).

The engineer abruptly quit to become a comedian. (I am not making this up.) I was tasked with picking up the pieces on the project. I need to “decode” the RS-422 data he acquired.

So I tried bringing up one of the files in Excel. Two problems:

  1. The version of Excel I have at work is limited to 1 million rows.

  2. I tried plotting some of the data (X-Y chart). It is incredibly slow.

So what do I do? Is there better software?

Sounds like it’s time to move to a true database platform instead of Excel. Do you have MS Access at work? It’s not the most robust database software out there, but it will import CSV files with no hassle.

Got it in one. Excel is a great application, but it’s not designed for very large data sets. There are lots of databases designed to handle large amounts of data. See what your company has to offer.

There are many better software solutions. It depends on what you want to do with them, and what the data is. If it’s just time-series voltages, you could plot it with just about anything.

Much will depend upon your platform and the tools you feel comfortable with. You could grab an evaluation copy of LabView and use it. I feel sure it would do the job. You would probably then decide you want to buy a copy.

Personally, I would just read the file in Python and use matplotlib to quickly render the data. But that requires you to be comfortable with doing that. There are quite a few environments that would get you working with Python very quickly, most of them free.
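Something like this minimal sketch would do it, assuming the scope wrote a two-column time/voltage CSV with one header row (the file name and column layout are placeholders to adjust):

```python
# Minimal sketch: load a two-column scope CSV and plot it.
# "capture.csv", the column order, and the single header row are assumptions;
# adjust delimiter/skiprows to match what the oscilloscope actually wrote.
import numpy as np
import matplotlib.pyplot as plt

data = np.loadtxt("capture.csv", delimiter=",", skiprows=1)
t, v = data[:, 0], data[:, 1]

plt.plot(t, v, linewidth=0.5)   # a thin line keeps a million-point trace legible
plt.xlabel("Time (s)")
plt.ylabel("Voltage (V)")
plt.show()
```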

If you’re familiar with Matlab, you could grab a copy of SciLab. Again, an almost-must-have tool. And free.

ETA, as above, Excel is the wrong tool. It is depressing how many tasks Excel is pressed into performing that it should never have been used for.

R, with RStudio for a GUI.

For simple Excel-like column arithmetic and plotting, any number of free babby’s-first-R video lessons out there should have you up and running in an hour or so.

ETA Python works too

How many rows of data *are* there?

You might be able to use PowerPivot in Excel, an add-in that allows you to analyze >1M rows of data, but I don’t have personal experience with it. Depends on what kind of analysis/plotting you need to do.

You might try R, which is free, but I have no experience with this either. Might be a steep learning curve because it’s more of a language.

When the only tools you know how to use are a screwdriver and a hammer and the car still needs fixin’ … well a lot of less-than-stellar work results. At far greater effort than need be.

Such is the human condition.

An excess of one million rows of data may give one pause to think: Perhaps the data is improperly formatted, irrespective of the spreadsheet software expected to handle it.

What formatting?

A million isn’t very much. It just is for software designed for something completely different.

Nonsense. I frequently deal with data collected at 10 kHz or more over spans of many minutes. Getting over a million samples from a single channel is a normal occurrence (at 10 kHz, two minutes of data is already 1.2 million samples), and we have to deal with dozens of channels at that data rate for each test.

Posters who have stated that Excel is not adequate for this task (or indeed, any complex data handling task) are completely correct. It is a spreadsheet application useful for accounting and other relatively simple data analysis, trade study, and visualization purposes, but once you get into large data sets or complex transforms it is wholly inadequate. Matlab, Labview, or free alternatives such as Scilab or Python/NumPy/SciPy/Pandas which are designed for data manipulation and analysis are much better suited, especially since the first thing you are likely to want to do after looking at the raw data is filter it to remove noise or unwanted content. You can find resources online to show you the basics of using any of these systems. My preference would be Python because it is free (as in beer and speech) and highly extensible, and with the Jupyter notebook you can use it directly from a web browser instead of a command line or complicated GUI. (I would not use R for this because it is not very computationally efficient with large data sets, whereas Python with NumPy is about as efficient as any interpreted code can be.)
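A rough sketch of that kind of workflow with pandas and SciPy, again assuming a two-column time/voltage CSV; the column names, sample-rate estimate, and filter cutoff are all placeholders to be replaced once you know what the capture actually contains:

```python
# Sketch: load the capture with pandas, then low-pass filter it with SciPy
# before looking at the signal. Column names, the sample-rate estimate, and
# the 50 kHz cutoff are assumptions, not values from the real capture.
import pandas as pd
from scipy.signal import butter, filtfilt

df = pd.read_csv("capture.csv", names=["time", "voltage"], header=0)

fs = 1.0 / df["time"].diff().median()        # sample rate estimated from the timestamps
b, a = butter(4, 50e3, btype="low", fs=fs)   # 4th-order Butterworth low-pass
df["filtered"] = filtfilt(b, a, df["voltage"])

print(df.describe())
```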

Here is something to get you started: Ultimate guide for Data Exploration in Python using NumPy, Matplotlib and Pandas

Stranger

You did not say what you wanted to do with the data, but almost anything is better than Excel (a spreadsheet program) for this. As another free alternative to Matlab I suggest Octave, which is compatible, uses the same numerical libraries, has tons of plugins available, and of course reads CSV files.

117 MB is nothing; I don’t know what counts as “big data” these days, but I would guess it starts at terabytes.

Excuse me. I’m not accustomed to such data intensive circumstances as you. I stand corrected.

Excel is like a Swiss Army knife. It can do a whole lot of things, and you only need the one tool. It is not excellent at very many of them, but it gets the job done. But the OP should be using a chain saw instead of that little saw that’s in a Swiss Army knife.

If you can still use the data after splitting it, you could open it up in Notepad or WordPad or some other text editor, copy big spans of rows, and paste them into separate Excel worksheets. There are also Windows batch files and PowerShell scripts available on the web to split a file for you.
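If you’d rather not hand-copy spans of rows, a short Python script (instead of a batch file) can do the splitting; this sketch assumes one header row, and the file names and 1,000,000-row chunk size are placeholders:

```python
# Sketch: split a big CSV into pieces Excel can open, repeating the header
# row in each piece. File names and chunk size are made up for illustration.
CHUNK_ROWS = 1_000_000

with open("capture.csv") as src:
    header = src.readline()
    part, rows, out = 0, 0, None
    for line in src:
        if out is None or rows >= CHUNK_ROWS:
            if out:
                out.close()
            part += 1
            rows = 0
            out = open(f"capture_part{part}.csv", "w")
            out.write(header)
        out.write(line)
        rows += 1
    if out:
        out.close()
```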

R with R-studio is pretty standard for this type of thing.

However, I hate R, so I would recommend Python. The Anaconda distribution with associated IDE should come with all the data science libraries you will need preinstalled (pandas, numpy, scipy, mllib, etc.).

What exactly are you trying to do?

I don’t work on big data, but it’s generally taken to mean more data than can be worked with at one time. So yeah, probably in the terabyte or tens-of-terabytes range.

This is the right answer. Excel is not a database.

All relational database systems have a method for automatically reading in CSV data. It’ll be a matter of figuring out your table structure, reading in the data, and then figuring out how to write queries on it.
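As a rough sketch of that flow, using Python’s built-in sqlite3 purely as a stand-in for whichever database you actually end up with (the table and column names here are invented):

```python
# Sketch: pull a CSV into a throwaway SQLite database and query it.
# sqlite3 is only a stand-in; the table and column names are invented.
import csv
import sqlite3

conn = sqlite3.connect("capture.db")
conn.execute("CREATE TABLE IF NOT EXISTS samples (t REAL, v REAL)")

with open("capture.csv") as f:
    reader = csv.reader(f)
    next(reader)                      # skip the header row, if there is one
    conn.executemany("INSERT INTO samples VALUES (?, ?)", reader)
conn.commit()

# Then it's a matter of writing queries against it.
for row in conn.execute("SELECT MIN(v), MAX(v), COUNT(*) FROM samples"):
    print(row)
```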

Slight hijack: How funny is the engineer turned comedian?

It depends on what you are doing, but PowerPivot builds an in-memory tabular model, identical in technology to SQL Server Analysis Services - you can upsize to SSAS.

With PowerPivot, you are only limited to available RAM (4GB cap with 32bit Excel).

I do this for a living :smiley:

If it’s a one-time thing, can I suggest something simpler? Depending on what you actually need to do with that data, something like Google Fusion Tables may or may not be enough… it’s a hell of a lot easier than learning R or SQL, though. It’s more of a “medium data” solution for the scenarios between Excel and a database.

As for databases, if this is something you very rarely do, doing it in the cloud might be cheaper both in terms of licensing (analyzing 100 MB of data should only cost you a few cents, or a few dollars at most) and hardware (let their computers crunch the data on your behalf). If you choose to learn SQL, Google BigQuery is fast and cheap and allows you to do all sorts of simpler analyses in a syntax that is IMO easier than R or Python:


What do you want to DO with that data anyway? What is the input and what is the expected output? That, not the number of rows, should determine what the best method is.