MySQL best practice: How to combine multiple queries into one to save on network overhead?

I understand you inherited a, shall we say, “idiosyncratic” installation. But I have a question about this part, which you’ve said in more than one post:

Am I understanding that your webservers and your MySQL server(s) aren’t co-located and instead are connected across the public internet? What the heck is up with that?

I certainly understand SOA, cloud databases, etc., as modern architecture principles, but your installation almost certainly predates that era.

You mean whichever script updates the database also updates the cache? That makes sense, as long as there’s only one (or a small number of) scripts that regularly interface with the database. I’ll check with my colleagues tomorrow.
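For what it’s worth, that pattern is usually called a write-through cache: the one script that writes to MySQL updates the cache in the same step, so readers never see stale data. A minimal sketch in Python, with plain dicts standing in for the database and the cache (all names here are invented for illustration):

```python
# Write-through cache sketch: the single "update script" is the only
# writer, so it can keep the cache in sync with the database.
database = {}  # stands in for the MySQL products table
cache = {}     # stands in for Memcached/Redis

def update_product(product_id, price):
    """The update script writes to the database and cache together."""
    database[product_id] = price
    cache[product_id] = price

def get_price(product_id):
    """Readers only ever touch the cache."""
    return cache.get(product_id)

update_product("widget", 9.99)
```

The whole scheme hinges on there being only that one writer; a second script that writes straight to MySQL would silently leave the cache stale.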

[ramble]

We ended up using Redis instead of Memcached, but they seem much the same except Redis has some really neat features (built-in master->slave->slave replication, hashes instead of just key-value pairs, etc.). The Predis library works really well with it, though another one (phpredis) is supposed to be faster, just somewhat harder to install.
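To make the hash difference concrete, here’s a sketch using plain Python dicts as stand-ins for the two data models (the key layout and field names are invented):

```python
# Memcached-style flat key-value store: every product field needs
# its own key, and fetching a product means several lookups.
flat = {
    "product:42:name":  "widget",
    "product:42:price": "9.99",
}

# Redis hashes (HSET/HGET): one key holds a field->value map, so a
# product's fields travel together and can be fetched in one command.
hashes = {
    "product:42": {"name": "widget", "price": "9.99"},
}

price = hashes["product:42"]["price"]  # roughly: HGET product:42 price
```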

Earlier today (before I read this), we got it working where the Redis master cache server polls MySQL every minute for changed products – a simple cron job. That took less than a millisecond each time if there were no changes, and about 3 ms if there was a changed product. And then a slave Redis would automatically replicate it from the master, and other slaves (if any) would replicate it from the alpha slave. And so on and so forth, the data would clone itself until it took over the universe and everything was gray goo…
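The polling step above can be sketched like this, with a list of rows standing in for MySQL and a dict for the master Redis (the column names and timestamps are made up; a real version would key off something like an `updated_at` column):

```python
# Rows the cron job would see in MySQL; updated_at is a Unix timestamp.
products = [
    {"id": 1, "price": 9.99, "updated_at": 100},
    {"id": 2, "price": 4.50, "updated_at": 150},
]
redis_master = {}  # stands in for the master Redis cache
last_poll = 0      # timestamp of the previous cron run

def poll_for_changes(now):
    """Push any product changed since the last poll into the cache."""
    global last_poll
    changed = [p for p in products if p["updated_at"] > last_poll]
    for p in changed:
        redis_master[p["id"]] = p["price"]
    last_poll = now
    return len(changed)

n1 = poll_for_changes(now=200)  # first run: both rows look "new"
products[0]["price"] = 10.99    # a product changes...
products[0]["updated_at"] = 260
n2 = poll_for_changes(now=300)  # ...and only that row is re-fetched
```

When there are no changes the query returns an empty set, which is why the no-op case is so cheap.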

I’m surprised the replication works as effortlessly and as well as it does. Even if we change the master Redis to be updated via the update script instead of a separate cron job, the replication can stay. The best part is how one alpha slave connects to the master Redis cache over the internet, and then every individual developer machine’s slave Redis connects to the alpha slave over the LAN. It only hits the live cache once over the internet, and everything else just propagates internally.

Setting all that up took just one command per slave – SLAVEOF redis.server.com 6379. Magic!
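For anyone reproducing this: the same thing can be made permanent in each slave’s redis.conf, so it survives restarts instead of being re-issued by hand (the hostname and port here are placeholders):

```
# redis.conf on the alpha slave (and, pointed at the alpha slave's
# LAN address, on each developer machine's Redis):
slaveof redis.server.com 6379
```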

[/ramble]

Hah! Good question, and sorry for the confusion. Short answer: The other servers are test environments, and to reduce programmer confusion, we wanted them to have the same live dataset.

Long answer:
In the beginning, there was just one server, hosting both the web server and MySQL.

As our team grew, we started to need more local dev environments and also some remote test servers to test out different CDNs without affecting the live server.

For a while we just occasionally exported the live MySQL db and manually imported it into whatever environments needed it. That worked OK at first, but eventually we ran into sync problems where a test database would be missing some product the live database had, which caused various errors. Our programmers would forget that they were on an older database and spend hours debugging product display issues that arose simply because their database wasn’t up-to-date.

So instead of having local, rarely-updated MySQL databases, we switched the code to make ALL the dev and test servers connect directly to the live MySQL server. On the live webserver this wasn’t an issue because it was all localhost, but once those queries started going over the internet, it would take 45+ seconds(!) to load a page.

We started looking into MySQL replication, but then somehow stumbled upon Memcached and Redis and wondered if we could kill two birds with one stone. And indeed we did… after replacing MySQL calls across the board with Redis, the product display loop saw a 5x speed improvement and the webpage as a whole sped up 2x, from 8s to <4s load time.

Eventually we might just drop MySQL out of the loop altogether, because everything we need can be stored in Redis hashes and we never make really complex SQL queries anyway. But that’ll need further testing.

Rewrite the getPrice function to do something like this (I’m not a PHP programmer, so treat this as a sketch – the connection details are placeholders, and an associative array plays the role of the “xml table” of products and prices):

    // Cache the whole product->price table in a static variable so the
    // database is queried at most once per request, no matter how many
    // times getPrice() is called.
    function getPrice($product)
    {
        static $prices = null;
        if ($prices === null) {
            $db = new mysqli('localhost', 'user', 'pass', 'shop');
            $prices = array();
            $result = $db->query('SELECT product, price FROM products');
            while ($row = $result->fetch_assoc()) {
                $prices[$row['product']] = $row['price'];
            }
        }
        return isset($prices[$product]) ? $prices[$product] : null;
    }

All I can say is that we have no issue. I really can’t say more, as it concerns the technology at my workplace. Our daily active users are in the millions.