MS Access question: Finding the minimum value in multiple fields

I was thinking about this last night and I agree, the data is probably not normalised; suppose you have three salesmen, Mr A, Mr B and Mr C, they are in charge of selling a range of different products - including apples, oranges and bananas; you wish to keep a record of exactly how many of each item each salesman has sold…

In a spreadsheet, you’d do it like this:


         Salesman  A    B    C
Product  

Apples            35   22   12
Oranges           54   12    1
Bananas           11   33   26

And it’s tempting to set up your tables the same way; a column for each salesperson and rows for each product (or worse, and we’ll see why in a minute) a column for each product and a row for each salesperson.

So what happens when you employ Mr D? - you have to redesign your entire application to accommodate a new column (and if the columns represented single products, that’s an even bigger problem because most businesses have more products than employees and change them more frequently).

The solution is to store the data something like this:



Product   Salesman   Qty
Apples        A       35
Oranges       A       54
Bananas       A       11
Apples        B       22
Oranges       B       12
Bananas       B       33
Apples        C       12
Oranges       C        1
Bananas       C       26

That way, if we want to find out which product sells most/least globally, or it we want to find out which salesman has the least turnover, or which salesman is the best at selling apples, they are all very simple queries.

The next step is to notice that the words ‘Apples’ etc are very commonly repeated - it’s wasteful to store the data like that (and if someone types in Aplles, it will be overlooked by your query criteria) - so the product descriptions are moved into another table, thus:


ID Product
1  Apples
2  Oranges
3  Bananas

and your sales data can be stored more efficiently, like this:


ProdID    Salesman   Qty
1             A       35
2             A       54
3             A       11
1             B       22
2             B       12
3             B       33
1             C       12
2             C        1
3             C       26

The column ProdID in the sales data table is then called a foreign key and it can be joined to the ID field in the product descriptions table whenever you want to see the description alongside the sales data.
This approach has many advantages, such as when you decide to change the description of Apples to ‘Juicy Apples’ - you only need change it once, in the product description table.

The process of conforming your data to this kind of model, unlocking the true power of relational databases, is called Normalizing. There’s a lot more to it than the basics I’ve described above

You’re presuming a lot about the database structure here… normalization is powerful of course, but it’s not relevant every time different fields are comparable. For instance, what if A, B, and C stood not for different salesmen, but for different expenses on a service ticket (to use an example from a DB table I’ve been working on recently.) Of course, you wouldn’t want to ever call fields in a database ‘fieldA’, ‘fieldB’ etc outside of a very basic sample or testing environment, but they could have been simplifications because the OP didn’t want to give us too many specifics.

so, if our columns A, B, and C stand for labor, parts, and sublet, I would certainly think that it might be of interest, ‘which of these was the least for each service ticket, and how much was it.’ Admittedly, you still could normalize out ‘serviceexpensedetails’ into a seperate table, but that does make things difficult for some queries.

this LEAST function seems very interesting – it’s too bad that microsoft doesn’t appear to offer anything comparable :smack:

My favorite book for understanding how to normalize data is Database Design for Mere Mortals–it’s really really helped me in my job.

May I ask a related question? Or should I start my own thread? Here’s the question:

I’ve got a table with the following fields:
Date
Donation_ID
Donor_ID
Donation_Amount

and some others. Donation_ID is the primary key; Donor_ID is a foreign key.

I want to design a query that returns the largest donation for each donor, including the date (and donation_ID) of this donation. But I’ve never been able to figure out how to do it. I know it’s got something to do with grouping; any ideas?

Daniel

You’re right that it was a presumption; I used to contribute a lot over at http://www.access-programmers.co.uk/forums/ and I’d say that probably 80%+ of requests for help of this general sort are rooted in some kind of normalizing issue, so I think it is a fairly safe presumption, not that I meant to be at all patronising of course. For the most part, I was responding to ccwaterback’s question that appeared to be asking for a general primer on normal form.

Perhaps now would be a good time for us to ask Jadis what the database is used for an in particular, what do these three columns of data represent?

Sorry I didn’t get back to this thread sooner, but as soon as I found the workaround for my problem, I got busy working. :wink:

Mangetout, the database I’m using is about 150k records, each showing a different area code and exchange combination. I work for a telephone company and I’m trying to work out a least-cost-routing informational table, which requires that I populate Fields A, B and C with rates for each NPANXX combination from 3 separate carriers that we deal with. Once A, B and C are populated, I need to look at them, find out which one is lowest and then populate field D with it. The final step was to look at D and say “Which company offers the rate in D?” and then populate E with a company name.

This isn’t a database that gets constant additions or manipulations. The number of records is basically fixed, the only time I’d make any changes to it is if we got a new carrier and needed to find out if new carrier’s rates are better for an NPANXX combination than the existing 3 carriers, in which case I’d need to start comparing 4 fields. But we won’t go there. :wink:

I think it’s fair to say that data of the same kind relating to three different suppliers should be normalized so that it is stored in a single field with another field to identify the supplier. I do understand though that quite possibly this isn’t a very practical suggestion at this stage of maturity of your database though…

That’s not a very robust way to do it. It can be a bit of a pain in the ass, but you really should be adding some more tables, rather than adding more columns. Here’s how I’d do it (note that I have no idea what the data’s supposed to look like, or what appropriate column names would be for some things, but you should get the picture):


--------------------------NPANXX---------------------------
NPANXX_ID    Area_Code     Exchange    Best_Rate_Carrier_ID
1            123           A           2
2            123           B           3
3            456           A           1
4            456           B           2

----------Carriers----------
Carrier_ID      Carrier_Name
1               "Baby Bell"
2               "Mama Bell"
3               "Papa Bell"
-----------------Rates------------------
Rate_ID  NPANXX_ID    Carrier_ID    Rate
1        1            1             3.25
2        1            2             2.58
3        1            3             3.13
4        2            1             5.89
5        2            2             5.88
6        2            3             5.55
7        3            1             1.25
8        3            2             2.38
9        3            3             1.97
10       4            1             3.58
11       4            2             3.14
12       4            3             3.29

You may notice that I’m from the school of thought that every table should have an artificial primary key. I always put a primary key in every table, and I never use actual data as a primary key.

Anyway, with this setup, it’s a piece of cake to both add new carriers, as well as changing carrier names. Sure, you may have to use a few extra queries, but I firmly believe that it’s the best way to do it.

Sorry for not getting back sooner - but to tell you the truth Mangetout did an excelent job of covering normalization already, so I don’t have much to add :smack:

Wooooo! Preach it brother! My boss absolutely refuses to understand this concept (in fact he’s of the opinion that relational databases are a fad that will soon lose popularoty, but that’s another story) and I’m stuck with developing a system where the primary key is a piece of live data for the product, data which, under occasional circumstances needs to be changed, which means the program has to trawl through all the historical sales data and change it there also, except that some of the historical data is offline in an archive and so on… Grrr…

Products should have an identity that is a totally unique, unstructured and artificial number; that they also happen to have a barcode number such as an EAN or UPC, or that they have an ISBN or something like that is all very nice, but irrelevant; products have an identity by which the system will know them for all time; ISBNs, EANs, UPCs etc are mere attributes.
[/end rant]

I hate doing that. I always assume that every piece of data is subject to change. I even refuse to use a person’s SSN as a key. After all, if someone dies, their SSN can be recycled, and I’m not willing to risk the hassle of dealing with that later (or inflicting the problem on someone else) if it only takes few seconds of work right now to avoid it.

However, the worst database that I ever had to deal with was one for entering sales data. I didn’t design the database, but I had to redesign the front-end for entering data into it. For each sale, the data included product description, product price, client description, client contact name and phone number, order date, shipment date, etc. All in one table. It’s like the person who designed it had no concept of normalization at all. I about pulled my hair out trying to get decent functionality out of that thing. I’ll never forget having to go through all the data and change the spelling of client names so that every instance of a name was spelled the same.

Too true - in my particular case, the primary index that I’m stuck with happens to be ISBN and this is about to come and bite us on the arse; the structure of ISBNs is about to undergo a complete overhaul in the next few years to accommodate new number ranges.

Mangetout, I’ve been there. I ain’t goin’ back, I don’t envy you.

I think I would do this …

Salesman table:
Salesman-name, Salesman-ID

Product table:
Product-name, Product-ID

Sales table:
Product-ID, Salesman-ID, Product-Quantity

Am I on the right track here?

Yep. The only thing I’d change would be to add a “Sales-ID” field to the Sales table. Every record should have a primary key, and I generally avoid using composite keys as a matter of principal. In fact, given the format of the above tables, each salesman would be limited to selling a single product only once, unless you wanted to make the entire record a composite key (which is not a good idea).

Maybe have an Invoice table:
Invoice-ID, Salesman-ID, Invoice-Date

Then expand the Sales table:
Invoice-ID, Product-ID, Product-Quantity

???

Very much on the right track, yes.

Generally speaking (and there are exceptions all over the place), these are a few mantras you’ll hear DB people chanting:

Always… no no…
-Never store the same data in two different places (for example, customer name in both the customer table and in every one of their sales history records) unless it is a specifically intended as a key to join those two places.
-Never store what you can easily calculate (FirstName, SurName, FullName for example)
-Never try to store more than one piece of information in a single field (for example expecting the contents to be structured so that the first two digits denote some property of the record, the next three digits mean something else etc)
-All tables should have an artificial key to uniquely identify each record, ideally an autonumber field; If at all possible, avoid using of this key as anything other than a key (customer account number, for example).
-Avoid creating multiple tables to store the same kind of data (for example, Sales2005, Sales2006) - a lot of databases start out that way and even get a rudimentary application built around them, which runs into trouble and needs a lot of maintenance/recoding at the end of the first year.
-Avoid creating multiple fields within the same table to store the same kind of data (for example JimSales, BobSales, FrankSales) -for the same sorts of reasons as the point above - it’s much easier to split data apart out of a single large table using a query than it is to join together an indeterminate set of similar tables/fields.

I’m sure there are more.

Actually, that can be combined into one single table. I know, it’s sometimes hard to decide how far to go when you’re breaking information apart into multiple tables. In your above example you’re using the same primary key for two different tables – essentially, you have two tables that contain data for the same thing (in this case, invoices). That should be an indicator that both of those tables should be combined. It would probably be better to have something like:

Invoice-Table:
Invoice-ID, Invoice-Date, Salesman-ID, Product-ID, Product-Quantity

You could easily calculate sales for any given Salesman-ID from that table, so there’s no need for a separate “sales” table.

Before I answer, this thread is chock full of excellent advice. But the correct way to do things isn’t always feasible, so Daniel’s particular question has a less-than-ideal solution that hasn’t been mentioned, but should have been:

Use nested queries. For this exercise, we’ll be using the graphical query editor, as opposed to just typing in SQL statement.

First query:

  1. Create a query and add the Donations table
  2. Select Donor_ID and Donation_Amount as the only two fields.
  3. Set it to be a totals query. Group by Donor_ID, Max by Donation_Amount
  4. Name the Donation_Amount column “MaxDonation” (in the grid, make the Field line “MaxDonation: Donation_Amount”
  5. Save and name this query “qryMaxDonation” (or whatever you like)

Last query:

  1. Create a query and add…
    1a) …the donor table first
    1b) …the query you just created (qryMaxDonation)
    1c) …the donations table
  2. Join the second two “tables” (one’s a query) to the donor table by…
    2a) …dragging the Donor_ID from the donor table to the foreign key in each of the other two tables
    2c) Select LEFT JOIN (option #2) for what reason I’m not sure, but I always like LEFT JOINs.
  3. Add the fields Donor_ID and Donor_Name (or whatever) from the Donor table
  4. Add the fields Date, Donation_Amount and Donation_ID from the Donations table
  5. In the Criteria line under Donation_Amount, enter [MaxDonation]
  6. Save and name this query “qryMaxDonationDetail”

At this point, I’m sure you can tailor it to your specific needs. If a donor donates the same maximum donation more than once, you will get multiple lines for that donor. If this is a problem, you may need to go three levels deep in your nesting, although the HAVING clause might conceivably work. (It didn’t on initial inspection over here.)

And finally, on the long list of “Never/Always do this”, I am reminded of one by your question. Never name a field after a reserved word. “Date” is an exceptionally poor field name, though not quite as bad as “Name”. shudders

So this is my solution to this issues columns a to h (I had to shorten the names as the field was too long for MS Access)



iif([SubSGB].[a]+[SubSGB].**+[SubSGB].[c]+[SubSGB].[d]<
[SubSGB].[e]+[SubSGB].[f]+[SubSGB].[g]+[SubSGB].[h];
           iif([SubSGB].[a]+[SubSGB].**<
           [SubSGB].[c]+[SubSGB].[d];
                      iif([SubSGB].[a]<[SubSGB].**;
                      [SubSGB].[a];[SubSGB].**);
                      iif([SubSGB].[c]<[SubSGB].[d];
                      [SubSGB].[c];[SubSGB].[d]));
           iif([SubSGB].[e]+[SubSGB].[f]<
           [SubSGB].[g]+[SubSGB].[h];
                      iif([SubSGB].[e]<[SubSGB].[f];
                      [SubSGB].[e];[SubSGB].[f]);
                      iif([SubSGB].[g]<[SubSGB].[h];
                      [SubSGB].[g];[SubSGB].[h])
           )
)


Correction


iif(
	iif(
		iif([SubSPB].a < [SubSPB].b; [SubSPB].a; [SubSPB].b)<
		iif([SubSPB].c < [SubSPB].d; [SubSPB].c; [SubSPB].d);
		iif([SubSPB].a < [SubSPB].b; [SubSPB].a; [SubSPB].b);
		iif([SubSPB].c < [SubSPB].d; [SubSPB].c; [SubSPB].d)
	)<
	iif(
		iif([SubSPB].e < [SubSPB].f; [SubSPB].e; [SubSPB].f)<
		iif([SubSPB].g < [SubSPB].h; [SubSPB].g; [SubSPB].h);
		iif([SubSPB].e < [SubSPB].f; [SubSPB].e; [SubSPB].f);
		iif([SubSPB].g < [SubSPB].h; [SubSPB].g; [SubSPB].h)
	);
	iif(
		iif([SubSPB].a < [SubSPB].b; [SubSPB].a; [SubSPB].b)<
		iif([SubSPB].c < [SubSPB].d; [SubSPB].c; [SubSPB].d);
		iif([SubSPB].a < [SubSPB].b; [SubSPB].a; [SubSPB].b);
		iif([SubSPB].c < [SubSPB].d; [SubSPB].c; [SubSPB].d)
	);
	iif(
		iif([SubSPB].e < [SubSPB].f; [SubSPB].e; [SubSPB].f)<
		iif([SubSPB].g < [SubSPB].h; [SubSPB].g; [SubSPB].h);
		iif([SubSPB].e < [SubSPB].f; [SubSPB].e; [SubSPB].f);
		iif([SubSPB].g < [SubSPB].h; [SubSPB].g; [SubSPB].h)
	)
)