What is a relational database?

I have spent most of my professional career as a developer (as opposed to user) of relational databases (Ingres, Sybase and Cloudscape), so let me take a stab at answering this question.

A relational database stores and presents its data in relations, which are commonly called tables. Relations have tuples (commonly called rows), and rows have attributes (commonly called columns). A relational table is like a set of rows - there is no defined ordering to the rows in a table (i.e. the order that you get the rows when you fetch them from the database is a happenstance - they have to come back in some order, but no particular ordering of the rows is guaranteed unless you specify it as part of the query). The rows in a table all have the same number of columns - there are no “variant records” in a relational database.

Note that a table is not necessarily stored in a file by itself. A relational database hides the details of storage from the user. What makes a database relational is how it presents the data to the user, and how it allows the user to get at the data. Some relational databases put the data from many different tables into one file. Others allow the data from one table to be split among several files, and even among several different computers on a network. Some store their data in memory. Regardless of how the data are stored, a relational database will allow the user to get at the data without any knowledge of the physical storage.

The data in a relational database may be broken into more than one table. Sometimes the data in one table will refer to data in another table - for example, if you have a table of employees and another of departments, each row in the employees table may refer to a row in the departments table. In a relational database, these connections between tables are always done with data values rather than with physical pointers. In the employees-departments example, the departments table could have a column with the department number (a unique number identifying the department), and the employees table could have a column with the number of the department the employee works for. The important thing is that this link between the tables is done with normal data values. In some other types of database systems, the link between one type of record and another is done with a pointer that gives the physical location of the linked record.

In the employees-departments example, the department number in the departments table would be the primary key of the table. The primary key is a set of columns (often one column) whose values uniquely identify the rows in the table. In a relational database, every table must have a primary key (this rule from relational database theory is often poorly enforced - even some expensive, high-end relational database systems don’t require the user to define a primary key).

A foreign key is a set of columns in a table that refer (or point to) the primary key in another table (or even the same table in some cases). In the employees-departments example, the department number in the employees table is a foreign key referring to the primary key in the departments table. Foreign key values need not be unique, and usually aren’t. Relational theory doesn’t require a table to have any foreign keys.

A relational database will enforce integrity rules with respect to primary and foreign keys. It will not allow a row with a foreign key that refers to a non-existent primary key. In the employees-departments example, this means it won’t allow an employee with a department number that doesn’t have a matching department number in the departments table. This integrity rule may be enforced when inserting new employees into the employees table, or when deleting departments from the departments table.

A relational database has a non-procedural language for manipulating data. That is, there will be a query language by which the user can describe what operations are to take place without telling the database system what steps to take in performing the operations. The most common query language is SQL, or Structured Query Language. In the employees-departments example, one might want to know which departments have more than ten employees. In some non-relational database systems, one would have to write a program to step through the data to figure this out. In a relational database system, one would write a query describing the results one wanted to see - something like:

select d.department_id, d.department_name
from departments d
(select count(*)
from employees e
where e.department_id = d.department_id) > 10

Note that the query doesn’t tell the database system how to get at the data - it just describes a desired result. It’s up to the database system to figure out how to get the requested data and, if there’s more than one way to get it, which is the best way.

I believe I’ve hit most of the high points. There are other parts of the relational model, including other types of integrity constraints, views, transactions, domains, normal forms, etc.

One last important point: the relational model was invented to promote data independence. Earlier types of database systems (such as hierarchical and network) required users to know how to physically get at the data. Software written to use these systems had to navigate through indexes and physical pointers between records. If the physical design of the database changed, the programs that used the data would also have to change. Relational databases largely free people from having to hard-wire details about physical storage into their programs, which makes it possible to re-organize data without having to re-write software.