I help a bit with a very small family business (more along the lines of a vaguely profitable hobby), and am beginning work on a website overhaul. As part of that process I’m contemplating using the website to build a sales database. I’ve never really built a database beyond a very simple, two-table thing, and I’m using this as an excuse to learn more about them. First, let me say, that even as a kinda smart guy, these are tough to get my head around. No wonder database admins are so well paid (at least, that’s what I hear).
Anyway, please bear with me on this, as a lot of the terminology is new to me; I’ll try to be clear.
So far I’ve got a good data structure and set of tables to describe the Product, and to describe a Customer. That was the “easy” part. What I’d like to do is track sales of specific products, and be able to look it up based on a variety of information. How much of Product A did we sell in November? How much of Product A has Customer B purchased?
I also have an entity that describes an Order. Now, I understand that the relationship between a Product and an Order is “many to many” (IOW, each Product can be in many Orders, and each Order can have many Products). So, I need to create a junction entity between the two of them, ProductOrder (or some such name).
Is this where I would store information about quantity purchased? As I’m writing this all out, I actually think that it’s clear that the answer is yes.
What, now, if I want to answer the question of how much of Product A that Customer B purchased? The tables in simplified form look like this:
I’m fairly new to SQL myself but it seems to me that if you have all of your relationships set up correctly it should be easy to set up a mathematical query that will SUM all the Product Quantities where CustomerID=x AND ProductID=Y. You shouldn’t need to specify an OrderID. Your query will return all orders and SUM the amounts WHERE ProductID=Y.
You’ve actually run into one of the major pitfalls of relational design. You could store the quantity ordered in the ProductOrder relation, but there’s a bigger problem: what if you’ve got many orders in your database for Product 123, but some time in the future, you decide you’re no longer going to sell that product? You could delete 123 from the product table, but now all your ProductOrders containing that item are either gone (cascade delete) or orphaned. That’s not good if you need to re-generate old invoices or print an old receipt for somebody.
Moreover, what if Product 123 changes between then and now? What if the customer’s address or tax information changes between then and now? If you were to print a second receipt with the relation intact, it would be different than the original receipt from before the product changed. Kind of like a Doper commenting on someone’s signature, and then the person’s sig changes but the post doesn’t, and now the post makes no sense.
One of the major ways in which purely relational designs fail is when you have to deal with temporal data like this.
So my suggestion would be not to make your order history relational at all. Blasphemy! Heresy! Yeah yeah yeah. (Well, not really, if you read a good book on dimensional modeling, but that’s beside the point.)
The way I would do this is to create a “flat” Orders structure, which has no foreign keys at all, and to copy your product and user information into it when an order is placed. This allows you to answer questions about orders, and to get information about them as it was when the order was placed. It goes totally against ideas of referential integrity, but that’s OK because the rest of your database can still be perfectly relational.
So I would do something like this:
Orders table:
Order ID
Customer Name
Customer Address, etc.
OrderItems
OrderID
Product ID
Product Name
Quantity
Note that Product ID here isn’t really a foreign key to the Products table, and isn’t really required at all, because you’ll copy information there instead of relying on relations.
Now a thousand hordes of angry database developers will call for my execution.
I disagree that the Product ID isn’t really required. Yes, you’re storing all of the necessary info as needed for reporting etc. but what about history for reordoring purposes? Imagine that a customer calls up and says I’d like to order more of that stuff I got the last time, or maybe they look through their past orders on a website and click on something they’ve ordered before. We want the salesperson or the website to be able to either find the product for reordoring or respond by saying that the product is no longer available or is currently out of stock, and maybe even suggest something similar.
Sure, my point is you could use a database Product ID, or a SKU number, or something like that. I was just pointing out that in my design it’s not a fkey.
DBA checking in. Your design is fine so far. You should have a unique compound key on table ProductOrder (ProductID + OrderID) and foreign keys back to tables Product and Order.
friedo, you make a valid point regarding temporal data, but I think you take it too far by placing the product description in the order lines table. There is a balance between accurately storing information vs having to update every copy of data when an update occurs. Shipping and billing addresses, on the other hand, should be recorded per document due to the frequency they are changed/overridden at order entry (generally speaking).
Eonwe, you want to add a unique Line Number (per order) to the order lines table because an item can typically appear on multiple lines on an order for a variety of reasons (different price, different requested ship date, different configuration options, etc.) - depending on your specific application of course.
Since this is all new to me, I’m going to go slowly, and piece by piece:
Heh, thanks. That’s what I need to know. Running querries is something that is also completely new (all I’ve done so far is to SELECT this FROM that WHERE types of things), and as long as I know that certain things are possible, I look forward to learning the logic behind querries once the data is set up.
Well, that actually happens fairly often; we discontinue products fairly regularly, and/or revive some as well. I figure I’d deal with this by just having an active/unactive flag on every product. Even if a product is discontinued, it stays in the database, it just doesn’t show up on searches for current products.
So, you’re suggesting having the information in this Orders table be completely independant? So each entry in it is just a direct transcribing of the order information, and not a pull from other tables? So that no matter what may change in other tables, the Order information is static once it’s been entered? That makes sense. Though, for my purposes, which are basically to track sales internally, and display products on the website, I’m not sure that I need that kind of stability in the data. As long as I can pull a customer and their orders, I don’t care if they’ve changed their address, their name, or any of that.
But, I like the solution, though. It keeps static records.
For full functionality, though, I’d want something like CustomerID in the Orders table, right? So that, even if Customer A changes his name to Customer A1, entries for both names in that table will share the same CustomerID?
This is actually something that’s been a little unclear in the books and websites I’ve been reading. Is the compound key one field with the other two concatenated within it, or is it just the existance of the two foreign keys in the new table? It sounds like you’re saying that there are three fields needed, ProductID, OrderID, and OrderID_ProductID?
Hm, so basically identifying each line item in the order by the sequence it shows up on the order form? So, if the first item is Product A, it gets a number 1, the second item is Product B, it gets a number 2, and the third item is also Product A it gets a number 3?
This is also something I don’t forsee needing, but from a general “sales database” perspective it makes total sense. And, since I’m doing this as basically a solution in search of a problem for my own edification, I’ll incorporate it!
So would my primary key for the OrderItem table be a combination of the OrderID and Line Number?
I think this is a good idea. A nice approach is to have a column called isActive (which has datatype BIT in SQL Server) which is either equal to 0 or 1.
I think Friedo is over-complicating things for a beginner creating a first database. I would keep the relationship between Product and Order but duplicate columns that are expected to change over time, e.g, Product.CurrentPrice and Order.OrderPrice.
You could set up an audit trail with triggers to capture changing data or a data warehouse with dimensions (as Friedo mentioned) but this would be complicated and overkill for your needs.
The compound key is just the existence of the two columns that must be unique. In your case, they happen to be foreign keys. You definitely don’t need column OrderID_ProductID.
Keeping in mind you are just beginning, you may not want to do it. However, I mention that one in particular because I’ve seen how often restrictions like that (not being able to do it) quickly become a problem.
Yes, that is one way you could do it that is perfectly acceptable
Primary key=Order ID, Line Number
Note: that’s 2 fields, not 1 field that is the combination of the 2 fields