Area Under a Curve Question

dolphinboy · January 14, 2015, 6:08pm

This is something I should know from middle school, but with today’s tools I’m hoping it’s easier to do than it was 40 years ago.

I create line graphs for work using Excel. I need to be able to bisect one of these graphs horizontally at a specific point and determine the area of the curve above and below the line. Is there some easy way to do this that doesn’t involve solving equations?

If someone can provide me a simple example of how it’s done that might help me understand it better.

Chronos · January 14, 2015, 6:24pm

Important question: How are you drawing your lines? Are you just connecting the dots, or are you fitting a smooth curve to the data points? Also, if you are connecting the dots, do you consider the jagged bunch of line segments to be the “true” curve, or are you more interested in the “fundamental” curve of which that’s just an approximation?

beowulff · January 14, 2015, 6:25pm

How do you go about generating the graphs?
A simple approximation would be to take the rectangular height of each point on the curve (above and below the horizontal line), multiply by the width of each bin, and sum them.

dolphinboy · January 14, 2015, 7:44pm

I am just connecting the dots and not fitting a smooth curve to the data points. I need to include all points under the “curve”, even the jagged bunch of line segments.

If I plot the data for a year it will be mostly flat, but for a few weeks there will be large peaks. After those weeks it becomes mainly flat again at the same level as before the peaks happen. I want to draw a line at the level of the flat area so I can determine how much area is between the flat part and the top of the various peaks.

Thudlow_Boink · January 14, 2015, 7:50pm

By slicing the region vertically at the data points, you can carve it up into regions that are easy to calculate the area of (like a rectangle with a triangle on top), and add those individual areas together.

dolphinboy · January 14, 2015, 8:06pm

Is there an application around than can do this easily?

friedo · January 14, 2015, 8:22pm

Yes, it’s the one you’re using. Given that you are creating these graphs using Excel, it should be easy to set up a formula to calculate the needed areas from the original data. But without some more details about the problem it will be difficult to show an example.

Kyrie_Eleison · January 14, 2015, 8:31pm

Look for something involving Simpson’s rule. That’s a numerical method that is more or less what Thudlow Boink was suggesting. The basic idea is to break the area under a curve into small segments, approximate their area as polygons, and sum them. Usually, the smaller the segments, the more accurate the estimate. There are curves for which this doesn’t work well, but from your description, it seems that your problem is one where it would be a reasonable approach.

A quick googling turns up some instructions for doing this in Excel. I’m too lazy to evaluate them though, so I won’t vouch for them, and more than likely, you’ll need to adapt them for your particular situation.

dolphinboy · January 14, 2015, 8:34pm

Let me try one more time to explain it. I am tracking RMA data using a simple Excel spreadsheet. On the Y axis is the number of RMAs received and on the X axis is time (weeks), so for a one year long graph I have 52 points, each with varying heights depending on the number of RMAs received for that particular week.

For most of the year the RMA volume is fairly steady, but for 2 or 3 months of the year we see a spike in RMAs. This spike is related to a particular problem we are tracking. So we have the baseline level of RMAs and an additional amount of RMAs due to this specific problem.

I want to know the area under the peaks but above the baseline so I can determine how much additional RMA activity is the result of this particular problem.

How can I get Excel to tell me that?

Leaffan · January 14, 2015, 8:40pm

Can’t this information be gleaned from the raw data? Every place I’ve worked has coded RMAs with a “reason for return.” Don’t you do that?

dolphinboy · January 14, 2015, 8:43pm

Thanks, I’ll give it a try.

beowulff · January 14, 2015, 8:52pm

Don’t think of this as a graphing problem.
This is simply the sum (each month RMA’s - baseline).

Thudlow_Boink · January 14, 2015, 9:07pm

Yeah, although I was actually thinking of the “trapezoidal rule.” For an actual curved curve, Simpson’s Rule is more accurate, but the OP was talking about a graph made up of connected straight line segments, if I understood him properly.

Kyrie_Eleison · January 14, 2015, 9:27pm

Given the additional detail offered in post #9, I too, am wondering if you’re not making this a more complicated problem than it really is. If you have a weekly baseline that represents the norm, and data representing what is really happening, why is it difficult to to compare them on a weekly basis?

I think you may have understood him better than I did. :o

Pasta · January 14, 2015, 9:33pm

Agreed. To add a concrete example, if the whole calendar is 19 weeks long and weeks 8 through 16 are the ones of interest, just take the baseline from the outside weeks (in the example below, avg = 48.2 per week under normal conditions) and subtract it off. That yields the third column here:


 
columns: week, RMAs, RMAs-baseline
 1  56   7.8
 2  42	-6.2
 3  46	-2.2
 4  55	 6.8
 5  45	-3.2
 6  46	-2.2
 7  47	-1.2
 8  86	37.8
 9  89	40.8
10  76	27.8
11  87	38.8
12  89	40.8
13  92	43.8
14  90	41.8
15  87	38.8
16  65	16.8
17  54	 5.8
18  45	-3.2
19  46	-2.2
 .   .    .
 .   .    .
 .   .    .

Then just sum up (or average, as desired) the excess rates in the weeks of interest, #8 through #16. Here, the excess in weeks 8 through 16 comes to an average of 33.4 RMAs per week.

You can choose to graph the data and overlay a line at 48.2 as well, but there’s no reason to use that graph as a calculation tool.

Chronos · January 14, 2015, 9:36pm

Don’t think in terms of the X and Y axis of the graph. Think in terms of the columns of the spreadsheet that were used to create the graph.

Punoqllads · January 14, 2015, 9:38pm

This.

If values are in column V

The sum under the baseline is the sum of min(V, baseline)
The sum over the baseline is the sum of max(V - baseline, 0)

Topic		Replies	Views
Area Between Curves in Which One of the Boundaries is Itself a Function Factual Questions	3	723	January 23, 2009
solve this math question (not homework...HONEST!) Factual Questions	14	1055	December 6, 2002
Integrals and negative area, or help Rob win an argument with his professor. Factual Questions	17	1954	February 15, 2003
I need the equation of a curve someone else drew... Factual Questions	8	1106	August 8, 2002
what exactly are differentials and integrals Factual Questions	29	2532	December 21, 2004

Area Under a Curve Question

Related topics