 # Expanding or compressing points on a data series

I have a couple of sets of data. (Illumination levels of street lighting installations over one block.) Some sets have 30 values, some have 35, etc. The X value needs to be the same for all sets. How do I reduce or expand the sets so they all have the same number of values so I can graph it, or alternatively how do I use Excel so they occupy the same space on the X axis?

What are you graphing against what? That is, what is the y value (assuming x = illumination level)? Further, why do you have multiple values of x, and do you need to show all of them? You can’t average them? You can’t pick the highest value or the mean?
Have you considered a 3-D graph? But mainly I think you need to tell us more about what you are trying to graph.

“X” is a given point on the street. “Y” is the illumination level at that point. I walked down the street taking a reading every four paces; I thought it would be good enough just to count paces rather than drag out a measuring tape.

I do have a second chart, a bar chart just showing the average, but there are also standards for uniformity: it’s a lot darker between street lights than underneath one, so I’m trying to show that over the distance of a block with a line chart.

That was about half the details we needed.

Are we to understand that you did this data gathering on multiple blocks, and each block was a different length (in 4-pace units) so you ended up with different numbers of samples on different blocks? But each block was sampled just once to create one set of 30-ish data points?

And that somehow you want a single graph with multiple lines, each line representing the distribution along a single block from start to finish?

Is that the data you’ve got and the result you’re looking for?

I’ve similar questions to LSLGuy. Also, did you notate when you passed a streetlamp? Is the streetlamp spacing constant?

Why do the X values of data points need to match? If you’re just showing the raw data points, then it’s misleading to do any kind of tweaking like you describe. It’s much better to plot all the points on the graph as they were recorded.

That’s what I was thinking. Plot a different curve for each street, but they can all go on the same graph, where x = 0 marks the beginning of the street (or the beginning of the measurement). Some curves will be longer than others, but so what?

Assuming OP has several measurements of the same block (with different pacings on different walks), one logical way to normalize the data would be to run it through the discrete Fourier transform and then run it through the inverse transform at the desired length. This theoretical solution can be approximated simply with 4-point interpolation filters that approximate the Fourier response. (That’s how Photoshop resizes when Cubic interpolation is selected; indeed one menial way for OP to massage his data would be to plot each run separately, and then to resize each graph with Photoshop so that the graphs have equal width!)
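For the curious, here is a minimal Python sketch of that forward-then-inverse transform trick; the function name and sample data are invented for illustration:

```python
import numpy as np

def fourier_resample(y, n_out):
    """Resample a 1-D series to n_out points via forward/inverse DFT."""
    spectrum = np.fft.rfft(y)
    # Inverting the spectrum at a different length stretches or shrinks
    # the series; rescale so the amplitudes are preserved.
    return np.fft.irfft(spectrum, n=n_out) * (n_out / len(y))
```

SciPy’s `scipy.signal.resample` does essentially this same FFT-based resampling for you.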

Data is from different installations on different blocks.

I want it all on one graph, with the lines the same length.

What do you mean by the “same length”?

Do you mean some data sets span a larger range of X values, but you want it to line up with others? E.g. one plot goes from x=0 to x=5, and another plot goes from x=0 to x=7, but you want the second plot shrunk so that the x=7 point from the second plot is in line with (directly above/below) the x=5 point from the first plot? If so, then all you have to do is normalize each data set to its x range. I.e. divide every x value by the maximum x value of each data set, and use that as the x axis. Each plot will then span the range from x=0 to x=1.
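A sketch of that normalization in Python (the pace counts here are invented):

```python
def normalize_x(xs):
    """Divide every x by the data set's own maximum so it spans 0..1."""
    x_max = max(xs)
    return [x / x_max for x in xs]

# One block measured at paces 0..5, another at paces 0..7: after
# normalizing, both span exactly x = 0 to x = 1 and line up on the graph.
```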

It doesn’t sound as if what you want to do can be sensibly depicted as a graph like this. The problem is that the distance walked isn’t the right metric. You seem to want “distance from a light installation”. The problem is that as you walk away from one, you are walking towards another. Your light measurement will be lowest roughly halfway between installations.

You can plot either, absolute distance from the nearest installation (which means the plots won’t be of the same length, but will probably be more understandable), or normalised distance from the nearest installation. It isn’t clear to me that the latter makes sense.

There is a whole world of algorithms for doing the interpolation of the graph, but you can probably get away with the simplest. Piecewise linear is almost certainly fine. Lagrange interpolation if you want to be fancy.
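A piecewise-linear resample is a one-liner with NumPy’s `interp`; a rough sketch (the target point count is arbitrary):

```python
import numpy as np

def resample_linear(y, n_out):
    """Piecewise-linear resample of y onto n_out evenly spaced points."""
    x_old = np.linspace(0.0, 1.0, len(y))  # original sample positions
    x_new = np.linspace(0.0, 1.0, n_out)   # common target grid
    return np.interp(x_new, x_old, y)
```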

I agree. But even if plotting against normalized distance makes sense, interpolation is the wrong way to do it, if the goal is to show the raw data. It amounts to fabricating data points. Either show the raw data as data points, or show a fitted curve, or both. Don’t interpolate the raw data and show it as data points.

OK, so how do I do this in Excel? Right now I have three columns of data for three different streets, and some data sets turned out longer than others. I think the X value advances one point for every cell downwards, so some lines wound up longer. Since each total distance is exactly one block, it looks odd to have them different lengths.

You’ve got a right mess here. It appears you gathered your data before you had decided how to report it. That’s a poor technique as you’ve now noticed.

Clearly each of your blocks is a different length in both 4-pace units and in actual measuring tape feet.

So your problem is you want the results in units of blocks and round fractions (e.g. 1/10ths, 1/35ths, whatever) of blocks but that’s not how you gathered your data.

As well I bet the street lights are in different spots along each block. e.g. the first light is found 20 feet from a corner on one block, but is found 40 feet from the corner on a different block. An interesting question is if the streetlights themselves are spaced the same distance apart on each block. I bet the answer is almost, but not exactly. With weird spacing situations at every intersection. Which will complicate any efforts to make block-to-block visual alignments on a single graph.

So you need to decide what factors you’re trying to preserve or emphasize and what you’re willing to sacrifice in your depiction of the data. Then gather the data to support your intended display. Here’s one potential approach that might match your goals. There are many others, depending on what you’re trying to display most accurately / persuasively.

If my goal was to demonstrate (lack of) uniformity of lighting, I’d first capture data starting with the first data point directly under the first light of each block, then continue to the next light past the end of that block. Then go back and capture more data in the other direction, *before* that first point. If we call that first point #0, we might have 28 more points numbered 1 to 28 after #0, and 4 points numbered -4 to -1 before #0.

If the light spacing is uniform enough between the blocks, I’d also make sure that my measuring interval was chosen to put a data point directly under each successive light.

For display I’d align all those #0 points and add zeros above or below any shorter blocks to pad any shorter data sets to be the length of the longest data set.
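The align-and-pad step might look like this in Python (the offsets and readings are hypothetical):

```python
def pad_to_common_length(series, max_before, max_after):
    """series maps offset -> reading, with point #0 directly under the
    first light (negative offsets are the points captured before #0).
    Missing offsets are zero-filled so every block covers the same range."""
    return [series.get(i, 0.0) for i in range(-max_before, max_after + 1)]
```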

Finally I’d draw the whole thing using a multiple bar graph.

I wouldn’t do that. But if I did, it would depend on what the data shows.

In some of the data sets I used, I could just fill in zeros at the end, or duplicate / extend the last point at the end. In some cases I would have to align the peaks before filling the ends. If I had multiple peaks, I might have to split the data before I aligned the peaks and filled the ends. If the minima were more important than the maxima, I might have split and filled at the peaks instead of at the ends.

Does your block data have multiple peaks per block (multiple street lights)? Are there enough points that the data is accurate, correctly showing the maxima and minima? If the data is not oversampled, doing a forward and reverse Fourier transform to normalise the number of points will be very confusing for people who don’t understand the process.

Perhaps you would like to interpolate 30 / 35 points between each real point, which would preserve the real points while increasing the total to a consistent number.
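A sketch of that in Python: insert k linearly interpolated points between each pair of real readings, so the real readings survive unchanged (k is whatever makes the totals match; the values here are made up):

```python
def upsample_keep_points(y, k):
    """Insert k interpolated points between each pair of real samples."""
    out = []
    for a, b in zip(y, y[1:]):
        out.append(a)  # the real reading is kept as-is
        out.extend(a + (b - a) * (j + 1) / (k + 1) for j in range(k))
    out.append(y[-1])
    return out
```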

I would do this quite differently. The distance you are between lights is immaterial to the core problem. You are trying to show that there is inconsistency in the light levels, and maybe that the minimum levels are too low. Just create a histogram of samples. Your 4 paces per measurement is as good an unbiased sample mechanism as any. So all you need do to demonstrate the point is to bin the measurements in a suitably chosen histogram. If there are actual standards for illumination, you would want to put a bin boundary at the standard value(s). Play with the histogram bins - you can probably get a set of bins of the nature "X% dimmer than minimum standard, 2*X% dimmer, X% brighter … etc." Half a dozen to maybe ten bins should convey the point perfectly.
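A sketch of that binning in Python; the readings and the 0.5-lux “minimum standard” are invented, and the real bin edges would come from the actual standard:

```python
import numpy as np

readings = [0.2, 0.4, 0.6, 1.1, 0.3, 0.9, 1.5, 0.7]  # hypothetical lux values
standard = 0.5                                        # assumed minimum standard
# Bin edges placed at fractions / multiples of the standard value.
edges = [0.0, 0.5 * standard, standard, 2 * standard, 4 * standard]
counts, _ = np.histogram(readings, bins=edges)
# counts[i] is the number of readings falling between edges[i] and edges[i+1]
```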