Odd MS Access question regarding CPU/HD usage

OK, this is a little esoteric, even for me; but it’s confused several people here at work.

We use MS Access to do ad-hoc design and data validations, and often load millions of rows of data into tables for analysis.

In this case, we’re loading 8 million rows of data, and then doing a “date check” and a “group by” on the rows.

During this theoretically CPU intensive process, the CPU remains steady at ~40% of capacity. (It’s a 1Gb CPU, so roughly 400Mb).

The hard drive is pegged to the wall. The hard drive activity light is on continuously, and it takes minutes to write anything else to the HD. All indications are that the PC is pounding the heck out of the HD.

The question is why is the HD being pounded when the CPU is 60% idle?

Eli

Your system’s bottleneck is the hard disk drive.
I’m guessing you never read a book on Windows system performance tuning.
If you have an old copy of the NT resource kit sitting around, it has a good chapter on that.
Your experience is fairly typical of database users in your position using typical modern hardware.

This article:
http://www.webdeveloper.com/servers/servers_rww_tuning.html

gives you a sample of what bottlenecks are all about.

The CPU is not the hard drive.

The statement “It’s a 1Gb CPU, so roughly 400Mb” makes no sense to me.

Is it possible you’re confusing the Central Processing Unit with the hard drive?

Also, it does not require a great deal of processing power to do read/write operations. It’s entirely possible for the CPU to be using only a small amount of its capacity for this task. What’s more likely is that the 40% is being used to run your query.

BTW I apologize if that came off as condescending.

8 million rows in msaccess? shudder

The above posters are smack on though.

We sometimes run into this problem at work as well. Essentially, the work you are asking of your PC in that procedure is not very demanding on the CPU, which means the CPU tries to pull data out of RAM as fast as it can. RAM, in turn, is fed from the hard drive, and that’s the problem: your hard drive just can’t keep up. It can be a limitation of the bus (is it an old IDE bus?) or of the drive itself (could be, since that sounds like a lot of data you’re trying to process).

Also, if your application justifies the man-hours, you could probably shave substantial time by converting to a real database back-end, be it MySQL (“real database” status arguable), MS SQL or Oracle.

My guess is that he has his units confused and meant to say “…It’s a 1GHz CPU, so roughly 400MHz…”.

The hard drive is “thrashing”

The operating system does its work in virtual memory, which is a combination of real memory (on chips) and swap space (on your hard drive). For a big job, such as your Access query, the operating system is using a huge amount of virtual memory, most of which is on disk. To update any byte that’s on disk, that page (block of memory) must first be moved into real memory. Bottom line: the O/S spends most of its time moving stuff on and off disk.

It’s like you’re doing your taxes, but you are only allowed to have 20 pieces of paper on your desk at any one time. If anything else is required, you have to take a piece of paper off your desk, drive to the storage unit across the city, file it and pull the paper you need, drive back, and put that one on your desk.
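The gap the analogy describes can be put in rough numbers. This is a back-of-envelope sketch only; the latency figures are era-typical assumptions, not measurements of the machine in question:

```python
# Rough, assumed latencies -- not measured on any particular machine.
RAM_ACCESS_S = 100e-9   # ~100 ns per memory access
DISK_SEEK_S = 10e-3     # ~10 ms per random disk access (seek + rotation)

accesses = 1_000_000    # a million page touches, purely illustrative

all_in_ram = accesses * RAM_ACCESS_S
all_on_disk = accesses * DISK_SEEK_S

print(f"all in RAM:  {all_in_ram:.1f} s")
print(f"all on disk: {all_on_disk:.0f} s")
print(f"disk is ~{DISK_SEEK_S / RAM_ACCESS_S:,.0f}x slower per access")
```

With numbers like these, even a tiny fraction of accesses going to disk dominates the total run time, which is exactly the "trip across the city" effect.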

How to fix it? Buying more real memory would help, of course. But I would first look at the design of your table. Adding an index on the “group by” field would be an easy thing to try.
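To make the index suggestion concrete, here is a sketch using SQLite (via Python's sqlite3) as a stand-in for the Access back end; the table and column names are made up for illustration:

```python
import sqlite3

# Build a small stand-in table in memory. In the real case this would be
# the 8-million-row Access table; here 1,000 rows show the shape of it.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (order_date TEXT, region TEXT, amount REAL)")
cur.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("2003-0%d-15" % (i % 9 + 1), "R%d" % (i % 4), i * 1.5)
     for i in range(1000)],
)

# Without an index, the GROUP BY has to scan and sort every row; with an
# index on the grouped column, the rows come back already ordered.
cur.execute("CREATE INDEX idx_orders_region ON orders (region)")

cur.execute(
    "SELECT region, COUNT(*), SUM(amount) FROM orders "
    "WHERE order_date >= '2003-01-01' GROUP BY region"
)
rows = cur.fetchall()
for region, n, total in rows:
    print(region, n, total)
conn.close()
```

The same `CREATE INDEX` idea applies in Access (via table design view or a DDL query); whether it helps depends on whether the query planner can use the index for the date filter or the grouping.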

Access (like all database systems) must be robust and operate in a safe & secure fashion.

It can’t just load chunks of data into memory and write them out when changes have been made. What if the PC crashes halfway through an operation? At best, all the work done may be lost; at worst, your database may become corrupt and even unusable.

What Access does is load a record, make changes, write it out to an unused area of the disk copy of the database (extending the space as necessary), notify the database of the new location of that record, then mark the old record as unused space.

The speed of the CPU will have little or no effect on this process at all.

(Please note that I write database applications, not databases, so this is just my - vastly simplified - understanding of the processes involved.)
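The write-somewhere-new-then-repoint pattern described above can be sketched in a few lines. This is a toy model of that post's description, not Jet's actual on-disk format:

```python
# Toy copy-on-write record store: the new copy is written first, the
# pointer is switched, and only then is the old copy marked free, so a
# crash mid-update never leaves a half-written record as the live one.
pages = {}        # page_id -> record data ("the disk")
location = {}     # record_id -> page_id currently holding the record
free_pages = set()
next_page = 0

def write_record(record_id, data):
    global next_page
    if free_pages:
        page = free_pages.pop()
    else:
        page = next_page
        next_page += 1
    pages[page] = data                 # 1. write the new copy to free space
    old = location.get(record_id)
    location[record_id] = page         # 2. switch the pointer to it
    if old is not None:
        free_pages.add(old)            # 3. only now reclaim the old copy
        del pages[old]

write_record("r1", "original")
write_record("r1", "updated")
print(pages[location["r1"]])
```

Every update here costs at least one extra disk write plus bookkeeping, which is consistent with the observation that the drive, not the CPU, is doing the work.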

Because the CPU does things so much faster than the hard disk. Every time the hard disk goes away to write or retrieve some information, which happens at glacial speed by the CPU’s standards, the CPU twiddles its thumbs and gets a nice little breather. If the Access table was stored on some faster medium, like RAM, the CPU wouldn’t be idle nearly as much.

Well, crap:

That is what I meant.

I did find out (after much prodding) that IS replaced the HD that was in the machine with an older, smaller, and more importantly, slower HD in May.

So, basically, the “easiest” answer is to replace the HD. (keeping in mind that I work for a non-profit that isn’t going to buy a new back end software for us). This won’t fix the problem, I understand, just mitigate it some.

As much as I like Mr. Slant’s suggestion of changing the back end to MySQL or something, there’s no real justification. Access can handle the job, and it’s run once per month.

Thanks!

Eli

Bear in mind that with 40% CPU usage, your ABSOLUTE MAXIMUM speed improvement would be 60%.
Thus, if your job runs for 8 hours now, you could get it down to 3.2 hours by putting in THE HARD DRIVE OF INFINITE SPEED* +2/+3, and that assumes that doing so wouldn’t, for instance, swamp your PCI/SCSI/SATA/RAM bus, a bottleneck you probably haven’t metered for.

*It’s +7/+8 against giant spiders, but who the heck uses a HARD DRIVE OF INFINITE SPEED to fight those?
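The arithmetic behind that bound is simple: if the CPU is busy only 40% of the time and the rest is disk waiting, an infinitely fast disk leaves just the CPU work. The 8-hour figure is the hypothetical from the post above:

```python
current_hours = 8.0   # hypothetical run time from the post
cpu_busy = 0.40       # fraction of time the CPU is actually working

# Best case: all disk-wait time vanishes, only CPU time remains.
best_case_hours = current_hours * cpu_busy
print(best_case_hours)
```

This is the same reasoning as Amdahl's law: the part you don't speed up (here, the 40% of CPU work) caps the overall improvement.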