I’m pretty much a beginner at working with SQL Server databases.
I know just enough SQL to get myself in trouble.
So I’ve hired a consultant who has been very helpful in creating some JAVA apps between my raw sources and my database.
The apps are pretty slick and I’m happy with them. The consultant also created a number of tables. Some of these tables hold tens of millions of records in them.
That’s where I get concerned. Not about the table length but the possible size of the tables.
The consultant has added four fields to each table - a created by, created when, modified by, modified when field. Two of those fields are Varchar and can each have as many as 40 characters. The other two fields are date/time so I’m less concerned about them.
My question is - how much size is being added to my database by these character fields repeated millions of times?
Any DBA’s who have an opinion please sound off.
The database is an analytical DB, not transactional. Does the inclusion of modify elements make any sense?
Are you having performance issues? If not, don’t worry about it. Too much data is only bad if it affects the performance of your applications, and even then there are things you can do (like indexing or query optimization) to help performance without trimming data.
Actually I’m concerned about the size issue. The configuration is such that I can’t mirror the database but I will need to back it up and restore to another machine frequently.
So, if the database is carrying around 5 GB of extra weight that may translate into time and money for transfer costs.
Well if you want you can make the Created/Modified fields INT types and relate them to another table with names. So instead of putting the same strings over and over again, you only have the strings once, and numbers are repeated instead.
What ZipperJJ said, which is probably the more common method. It isn’t typical to have Created/Modified data purely in the data warehouse that wasn’t migrated from the transactional side, so that part sounds a bit weird.
All that said, VarChar columns on MS SQL Server are pretty efficient. They take [actual number of bytes in the column] + 2 bytes for storage, so the value ‘abc’ would take 5 bytes of storage. If the column is Varchar(40), but the average actual usage is, say, 6 characters, you’re probably pretty efficient. If the average is much larger, then as ZipperJJ mentions, the Int datatype would probably end up saving a fair amount of space.
Thanks everybody for responding. I spoke with the consultant and he said that we could trim a lot of fat out of the database by dropping the “created by” and “modified by” columns.
His development team was already bitching at him about the size of the database. They wanted to backup and download the database in order to do some local development on it and it was getting too big too manage.
Zipperjj brought up an important issue. The database is actually designed with features as he/she described. That is, some tables use integer id fields which relate to character names in another table.
So while we were using efficient methods for some of source information we were also bloating the database with repetitive character fields.
If they can remove the fields within the next few days I’ll let you all know how much space was saved.