I was thinking about this last night and I agree, the data is probably not normalised; suppose you have three salesmen, Mr A, Mr B and Mr C, they are in charge of selling a range of different products - including apples, oranges and bananas; you wish to keep a record of exactly how many of each item each salesman has sold…
In a spreadsheet, you’d do it like this:
Salesman A B C
Product
Apples 35 22 12
Oranges 54 12 1
Bananas 11 33 26
And it’s tempting to set up your tables the same way; a column for each salesperson and rows for each product (or worse, and we’ll see why in a minute) a column for each product and a row for each salesperson.
So what happens when you employ Mr D? - you have to redesign your entire application to accommodate a new column (and if the columns represented single products, that’s an even bigger problem because most businesses have more products than employees and change them more frequently).
The solution is to store the data something like this:
Product Salesman Qty
Apples A 35
Oranges A 54
Bananas A 11
Apples B 22
Oranges B 12
Bananas B 33
Apples C 12
Oranges C 1
Bananas C 26
That way, if we want to find out which product sells most/least globally, or it we want to find out which salesman has the least turnover, or which salesman is the best at selling apples, they are all very simple queries.
The next step is to notice that the words ‘Apples’ etc are very commonly repeated - it’s wasteful to store the data like that (and if someone types in Aplles, it will be overlooked by your query criteria) - so the product descriptions are moved into another table, thus:
ID Product
1 Apples
2 Oranges
3 Bananas
and your sales data can be stored more efficiently, like this:
ProdID Salesman Qty
1 A 35
2 A 54
3 A 11
1 B 22
2 B 12
3 B 33
1 C 12
2 C 1
3 C 26
The column ProdID in the sales data table is then called a foreign key and it can be joined to the ID field in the product descriptions table whenever you want to see the description alongside the sales data.
This approach has many advantages, such as when you decide to change the description of Apples to ‘Juicy Apples’ - you only need change it once, in the product description table.
The process of conforming your data to this kind of model, unlocking the true power of relational databases, is called Normalizing. There’s a lot more to it than the basics I’ve described above