Some questions about using/managing MySQL on my Windows/Linux computer

I’m in the process of downloading and creating database files of a whole heap of historical Major League Baseball data. We’re talking a lot of data here.

There is a table containing basic data for all games since 1871, which contains about 190,000 rows. But the biggest set of data contains play-by-play details for just about every MLB game since 1957. I’ve only imported the period 1957-1969 into the database so far, but already it’s up to around 1.5 millions rows. Adding 1970 through 2006 is probably going to take it to well over 6 million rows.

This leaves me with some questions about access, storage, and performance. I’m still pretty much a newbie with MySQL, so i was hoping to get some tips and advice.

First, the data in my database is currently stored on my C: drive in the default MySQL location: C:\Program Files\MySQL\MySQL Server 5.0\data. The total size of the data is currently around 800Mb, so i’m anticipating that the final size, once all my data is imported, will be somewhere around 4Gb, possibly greater.

Now, my computer’s hard drives are partitioned such that the C: drive is mainly for Windows and program files. There is some free space there, but i’d prefer not to fill it with a 4+ gigabyte database file. I like to leave spare space to allow for defragging, and to ensure that the system doesn’t slow down.

So i’d like to move the data to another partition, preferably on another actual hard drive. But, despite some searching, i haven’t been able to find a straightforward set of instructions for doing this. It’s not like MS Access where you can just move a self-contained database file; in MySQL (if i understand it correctly) you have to move the data, and also tell MySQL where to look for it when you open the program. If anyone could help me find some instructions that would tell me how to move my database to (for example) M:\baseball\data, i’d be most appreciative.

In a related matter, i’m also wondering about making the database available to a version of MySQL running on my Linux installation. I know that MySQL will run on multiple platforms, but can instances of MySQL on multiple platforms read from and write to the same database files? That is, if i place my data on a FAT32 partition (so that both Windows and Linux can read and write), can MySQL from both operating systems use the same data files?

Also, FAT32 has a maximum file size of 4Gb, but MySQL seems to keep all its data in one large file. Currently, all my data appears to be contained in a file called ibdata1. As i said earlier, this file is likely to end up larger than 4Gb. Is there any way around this? Can MySQL, for example, split the data file into multiple smaller files, rather than one huge file? Or am i going to have to keep it on NTFS for Windows, and put a separate file on an EXT3 partition for Ubuntu?

And any other advice is welcome. Thanks.

Not an expert on MySQL here, but this page
http://dev.mysql.com/doc/refman/4.1/en/windows-troubleshooting.html
seems to say that to move your database to a different directory, you do the following:
stop the database
change the datadir value in the MySQL configuration file
move the contents of the existing MySQL data directory
restart the MySQL server

Note: I am not your DBA. Do not rely upon this as being professional advice. You should consult a DBA in your local jurisdiction. All other standard disclaimers will apply.

Thanks Arnold. I found those, and some other instructions, in my earlier Googling, but i couldn’t seem to get them to work. I’ll give it another go, and keep investigating.

ibdata1 is used for transactional type databases. It looks like you have set up your database to a INNODB engine rather than a MyISAM engine.

If you change to using a MyISAM engine, then the data is stored as “table” data and “index” data (ie. 2 files for each table) rather than a big single file for the database.

You will also notice (very quickly) that if you delete data from your INNODB, the size of you ibdata1 does NOT get correspondingly smaller.

Unless you have a good reason for using INNODB, swap over to MyISAM.

Thanks, Caught@Work. Very useful. I noticed after i installed MySQL and set up the GUIs (Query Browser, Administrator, etc.) that the database setup was InnoDB. I just assumed, because it had chosen InnoDB with no apparent input from me, that this must be the best way to go.

While i’m here, maybe i could ask another question about table sizes.

As i said above, when i’ve imported all my play-by-play data, it’s likely to result in a pretty big database (at least by my standards). Is there any advantage or disadvantage to having all that data in a single large table, as opposed to several not-so-large ones.

For example, a decade of stats is over 1 million rows and takes up about 600-700Mb, and i have basically 50 years of stats. Should i import them as one big table, 1957-2007, or would it be better to have, for example, five tables, each with ten years worth of stats?

I realize that this will make queries a little more complicated, because if i want to search the whole database i’ll have to select five different tables for each query. The reason i ask is that i opened the query browser today and launched a query that would essentially open a whole 10-year table. After it had loaded about 600,000 rows, MySQL started to slow down, memory usage went right up, and then it froze and i had to close it.

Now, i realise that most meaningful queries (i.e., queries that i can actually read and get some use out of) won’t have anywhere near that many rows, but i was wondering if there are any tips for setting up databases that contain this many rows? Is a single table fine?

In an Oracle database this would be a good recipe for a partitioned table (one partition per year for example.) I don’t know if partitioning offers you the same advantages in MySQL, but the feature seems to exist.
http://dev.mysql.com/doc/refman/5.1/en/partitioning.html

Thanks, that looks like a possibility, although it will involve a bit of reading to work out exactly what i need to do. As i said, i’m a bit of a newbie to all this, and those instructions tend to take for granted a level of expertise that i don’t yet have. Oh well, i’ve got plenty of time to work on it before next baseball season.

OK, let me be more specific. I think your table of play-by-play data should be in a range-partitioned table, with one partition per year.
See this page.
http://dev.mysql.com/doc/refman/5.1/en/partitioning-range.html

Suppose you partition your play-by-play data table with a date range, one partition per year.
A simple SELECT * FROM PLAY_BY_PLAY_DATA will read data from all partitions, same as if you had all the data in one table.
If you write your query correctly ( SELECT * FROM PLAY_BY_PLAY_DATA WHERE YEAR(PLAY_DATE) = 1961 ) , then MySQL will only read data from one partition - this would be like querying only one table with each year in a separate table. It’s a query optimization done for you by the database engine, described as partition pruning
http://dev.mysql.com/doc/refman/5.1/en/partitioning-pruning.html

Sorry, Arnold, i hope i didn’t seem ungrateful.

I understand the principle behind partitioning, the idea that you can partition a table and yet run queries as if it were a single table, just as you describe in your most recent post.

My comment about this being beyond my level of expertise had more to do with the actual commands that i would have to enter into MySQL in order to set up the partitions in the first place. Your links explain how to do that, but when i’m learning stuff like this i prefer to go through it slowly and understand exactly what my commands are doing, rather than simply copy examples given to me by someone else. That’s why i made the comment about having plenty of time before next season; it means i can take things slowly and learn about using MySQL properly.

Also, because i’m importing my tables rather than creating them from scratch, it’s not quite clear to me how i go about partitioning a table that has already been imported into MySQL, or if that’s even possible. My guess is that i create the table structure (column names) with the partitioning structure, and THEN import the data.

I’ve spent time in the past learning how to write SQL queries, but i’ve never really spent any time learning the commands for setting up and running the database itself. As someone who’s spent most of his time in Windows, i’m used to having GUIs that make this stuff easy. Working with the command line is still a fairly foreign concept for me.

I understand. I know that in Oracle you have to start with a partitioned table, you can’t make a non-partitioned table into a partitioned table. I’m guessing MySQL is the same.

To help you with your MySQL database, I would try this GUI tool (freeware download):
http://www.toadsoft.com/toadmysql/FreewareDownload.htm

(disclaimer - this is one of the products I work on at my company, but i don’t know a lot about it yet since I’m mostly on the Oracle side. However I don’t get any added bonus or pay raises if you download this product etc. etc.)

They have a Yahoo! group that you can sign up for if you have questions. Or you can always drop me an e-mail but I can’t guarantee that I’ll answer quickly or be able to solve any issues you may have.

Thanks for the link. It looks very interesting.

Earlier today, i actually downloaded and installed Navicat 8.0, which plenty of reviews seemed to suggest was one of the better MySQL frontends available. I’m currently running the evaluation version, which will work for a month before i need to decide whether or not to buy it.

Money’s a bit tight right now, and i’m not sure i really want to shell out 95 bucks, so a freeware GUI like Toad could be exactly what i need. I’ve also seen some good reviews of the free community edition of SQLYog.