Programming Effort: Which is more bang for the buck?

This is a question for programmer/DBA type people:

I’d like to increase the performance of my web application. I’m pretty sure the following two methods will give me a better performance/effort/repeatability ratio. I have never done either before; based on your recommendations, I will choose one to do first.

Adding a better caching layer for my data objects in the business-logic code…

– OR –

Adding more indexes and declaring all foreign keys in the database.

Basic background: I’d consider my application’s logic average stuff - typical business objects: people/places/things/data. The database is typical integer-keyed tables. Foreign keys are used as ways to relate tables, but are not enforced by the database. Most data is requested through stored procedures. No additional indexing capabilities are used. They retrieve data to the business-object layer on almost a 1:1 ratio - call a business object, and nearly all of its properties are requested from the database. I already cache some objects at the row level. Would adding caching at the table level help?

Or consider removing my application from the equation entirely - in your own application, which would you choose to implement first, and why?

The short and generic answer: Add indexes to the database first.

Why? Assuming you’re using a fairly modern DBMS, there’s no coding involved. Take a look at your stored procs or queries, figure out which ones are slow, and add the appropriate indexes. Voila, faster app.
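For instance, on SQL Server 2005 or later you can ask the server itself which cached queries cost the most - a rough sketch, using the real DMVs but with ordering you’d want to adapt to your own situation:

-- Top 10 cached queries by average elapsed time (SQL Server 2005+)
SELECT TOP 10
    qs.execution_count,
    qs.total_elapsed_time / qs.execution_count AS avg_elapsed_microseconds,
    st.text AS query_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY avg_elapsed_microseconds DESC;

Other DBMSs have their own equivalents (slow query logs, v$ views, and so on); the point is the server can usually tell you where the time goes before you touch any code.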

The long answer: you need to analyze your application and figure out where the slowdowns are, and do what it takes to speed those up.

Why? No two applications are the same. There’s no telling what’s slow, and you don’t even mention a specific thing (“The person search is slow.”) Adding indexes might help, then again, it might be some other logic that takes forever. You never know until you take the time to do some analysis.

A hearty second for Athena’s ideas. Until you know what’s slow, you’re just guessing at what needs changing. Hell, you might just need more server RAM. A couple hundred bucks and 20 minutes and the problem is solved.

So:

Formal method:
Step 1: Profile the app, both in the DB server and in the web server.
Step 2: Rank-order the areas from slowest to fastest.
Step 3: Estimate the effort to fix each of the top 4 slow things.
Step 4: Fix whichever ONE of the 4 has the best effort/reward ratio.
Step 5: If the results aren’t good enough, repeat from Step 1.
Corollary to Step 5: Don’t simply fix the second item from your original top-4 list. What you do in Step 4 may radically alter your profile.
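For the DB half of Step 1, a quick-and-dirty way to measure one suspect query on SQL Server (the stored proc name here is made up for illustration; other DBMSs have similar switches):

-- Measure a single suspect call's I/O and CPU (SQL Server)
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

EXEC dbo.GetPersonSearch @name = 'Bob';   -- hypothetical stored proc

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
-- The messages output then shows logical reads plus CPU and elapsed time per statement.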

Informal method: Turn on referential integrity in your DB (can be much faster on read, slightly slower on write), add indexes to match the most common join and WHERE conditions, and ensure your DB has a maintenance plan in effect which cleans up indexes, updates statistics, compacts deleted space, clears logs, etc. Then take the rest of the afternoon off.
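In SQL Server terms the informal method boils down to something like this - a sketch only, with table and column names invented for illustration:

-- Declare the foreign key the schema already implies
ALTER TABLE dbo.orders
    ADD CONSTRAINT FK_orders_users FOREIGN KEY (user_id)
    REFERENCES dbo.users (user_id);

-- Index the columns you actually join and filter on
CREATE INDEX IX_orders_user_id ON dbo.orders (user_id);

-- Routine maintenance: defragment indexes and refresh statistics
ALTER INDEX ALL ON dbo.orders REORGANIZE;
UPDATE STATISTICS dbo.orders;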

In the general case I’d probably try adding indexes to the back end first. That is likely to give you a performance boost no matter what your business and presentation layers are doing.

Before adding caching I’d want to look more at whether the requests tend to gather around a relatively small number of values or are spread out. If you are getting a high number of hits for a small number of entries, then caching in your business layer could be worthwhile, but if the requests tend to be more random, then the overhead of the caching might make things worse.
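One way to get a feel for that distribution, assuming you log data requests somewhere (the request_log table and item_id column here are hypothetical):

-- How concentrated are the reads?
SELECT TOP 20
    item_id,
    COUNT(*) AS hits
FROM dbo.request_log
GROUP BY item_id
ORDER BY hits DESC;
-- If a handful of items account for most of the hits, a cache will pay off;
-- if the hits are spread thinly, the cache will mostly miss.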

There isn’t any absolute rule since the design of the database and the usage patterns are different in each case. You can only know for sure after analyzing what you have to determine where the bottlenecks are.

On preview I third Athena and LSLGuy.

Profiling is the way to go. It may be that neither of the options in the OP is going to make a significant difference.

All the answers so far are good. I’d just like to add that without knowing the size of the DB, indexes may not even be used even if you do build them. Is the average table size in the tens of thousands of rows, hundreds of thousands, tens of millions? Lots more information is required before good advice can be given.
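Getting those numbers is cheap. On SQL Server, for example (the table name is just an illustration, and the catalog-view version assumes 2005 or later):

-- Row count and space used for one table
EXEC sp_spaceused 'dbo.users';

-- Or row counts for every table at once
SELECT t.name, SUM(p.rows) AS row_count
FROM sys.tables AS t
JOIN sys.partitions AS p
    ON p.object_id = t.object_id AND p.index_id IN (0, 1)
GROUP BY t.name
ORDER BY row_count DESC;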

Are the app and the DB running on the same machine? If not, do a simple load test - get a script to bombard your app until you get a really bad performance hit (as in, you start getting dropped connections). At this point you will probably find that one of the machines is running at 100% CPU and/or one of the machines is “thrashing” (out of physical memory, or nearly so, and pulling objects in and out of virtual memory).

Fix the problems found – those are your show-stoppers.

Consider putting your app on a separate machine just for this test, if they do run together.

I’d go with the DB index method, especially if it’s never been looked at before. Years ago I inherited an app that would, for each user, connect to a DB and run various queries. A few queries were running really slow (many tens of seconds), and I went from not having administered a DB before to creating indices that reduced the search time to sub-second levels in less than a day.

IMHO, DBs should be doing as much heavy lifting as possible, which is another great reason to make sure they’re working to their potential.

This is true. And that’s why I eventually just suggested taking my application out of the equation, reducing the question to: which method makes the most difference? Or, if you prefer, assume that this code has already been profiled and tuned.

So, given the possibility of adding better database indexes & tuning, or adding caching, which would you choose?

…Wow.

It seems to me that there is a sort of consensus here. I’m hearing “All things being equal, additional database tuning can really add a lot of performance.”

If it is helpful, I am using SQL Server, and my data is in tables in 8-10 separate ‘catalogs’ on this DB server. One catalog is the users. Another is data. Another is the relationships of users and data.

We might have a few tables that have 200,000 records, but that’s about as high as any of them go. As for rate of change, a few might hit 500,000 records in 2-4 years.

Indexes.

All an “index” does is create a B-tree (or hash) for that one column in the table. So when you do a search like:

WHERE name = 'Bob'

instead of having to scan every single row in the table, if you have “name” indexed, the database can go straight to the matching entries. Of course, you need the extra disk space for the B-tree nodes and whatnot.

So don’t just “add indexes.” Your goal is to figure out where indexes are needed, and that’s going to depend on what all WHERE statements you have in your queries.
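For the example above, assuming the name lookup really is one of your hot queries, the fix is a one-liner (table and index names invented):

-- Index the column the WHERE clause actually searches on
CREATE INDEX IX_people_name ON dbo.people (name);
-- Now WHERE name = 'Bob' can seek the B-tree instead of scanning the whole table.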

In point of fact, it may be that you already have the indexes you need, just that your WHERE statements use non-indexed elements first. For instance:

WHERE
    name = 'Bob'          -- not indexed!
    AND uid = 123456789   -- indexed

Simply switching that to:

WHERE
    uid = 123456789
    AND name = 'Bob'

May be all you need to do to boost performance by a few thousand times (theoretically, Oracle and such will make such optimizations for you, but I wouldn’t recommend trusting that.)

Oh, and in previous testing, I didn’t notice any major advantage to having foreign keys declared.

I did however gain significant performance boosts by getting rid of all usage of “NOT IN” and instead using the set-difference operator - MINUS in Oracle; PostgreSQL and SQL Server call it EXCEPT.
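A sketch of that rewrite in SQL Server syntax, with table names invented for illustration:

-- Before: NOT IN with a subquery
SELECT user_id FROM dbo.users
WHERE user_id NOT IN (SELECT user_id FROM dbo.banned_users);

-- After: set difference (EXCEPT in SQL Server/PostgreSQL, MINUS in Oracle)
SELECT user_id FROM dbo.users
EXCEPT
SELECT user_id FROM dbo.banned_users;
-- Not a drop-in replacement in every case: EXCEPT removes duplicates and treats
-- NULLs as equal, while NOT IN returns nothing if the subquery yields a NULL.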

If you’ve already profiled the application, then what percent of the time is spent doing DB activity? You could reduce the time spent hitting the DB by 100x and not see anything significant if the real bottleneck in your app is some silly quadruple-nested bubble sort over a million item list.

Assuming you’ve narrowed it down to DB performance, the questions you need to answer are: Is this database shared, or is this app its only client? (If another app is getting zippy performance out of the DB and this one isn’t, it’s a good clue that “it’s not you, it’s me.”) Is the DB being accessed over the network or on the same host? (Network performance, whether due to shared bandwidth or software issues like replication agents, could be a factor.) And is the slow DB operation the data reads (querying) or the data writes (insertions)?

And “adding indexes” is not automatic; it will also take some thought to use the right key fields appropriate to your query. These may not be the same key fields as other queries use, nor the “unique key” fields on the table. And the order of the declared fields in creating the index may matter as well.
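For example, with a composite index the leading column decides what the index can actually help with (names invented for illustration):

-- Column order matters: this index serves queries that filter on customer_id first
CREATE INDEX IX_orders_customer_date
    ON dbo.orders (customer_id, order_date);

-- Can seek the index (leading column is specified):
--   WHERE customer_id = 42 AND order_date >= '2008-01-01'
-- Generally cannot seek it (leading column missing):
--   WHERE order_date >= '2008-01-01'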

Databases are pretty much always the bottleneck. They have to do tons of disk accesses, and a disk access takes an eternity for a computer.