Anyone care to help with a database design conundrum?

First, some quick background. I’m my company’s jack-of-all trades computer person. I mostly do light programming in Visual Foxpro 9, in which I am self taught. I’m also familiar with Access, VBA (though mostly in Excel, I have yet to use VBA in Access), and Crystal Reports. All of which I’m also self taught in.

I’m familiar with the concepts involved in relational databases, but have never mastered implementing them properly, so my databases tend to be hybrid flatfiles/relational. That’s not a huge concern for me right now, as I tend to get the job done, I just throw that out there to give you an idea where I’m coming from.

So here is my dilemma. I need to create a new Access database to track MLS (Multiple Listing Service, it’s where real estate agents list properties they want to sell) statistics. In the MLS there are essentially two types of properties I’ll be tracking:

  1. Sold properties - Simple as that, these properties have been sold and that won’t change.

  2. Active properties - These properties have not yet been sold and can change status at any time to Pending (there is an offer on the house, but it’s not final) to Expired or Canceled (either way they are off the MLS and I no loner care about them).

Each month I’ll go into the MLS and run a report on all listing activity for that month. That is: Number of sold properties, number of pending properties, and the total number of active properties. Where I am having trouble is in how to deal with the active properties. Let me explain. As I mentioned once a property is sold that’s that. I store it in my table and never have to worry about it reappearing (if it gets resold later it counts as a different transaction so no worries there). But with actives, as house can be on the market, for all intents and purposes, indefinitely. So each month a property is active, I need to report it as such. Yet I also need to be able to report on historical active properties. My initial thought was, well who cares if I have duplicates, each time a property is active I’ll add it to my table and note which month it was added. The problem, however, is that each month, in one MLS alone (and I could be doing this for 20 or more MLS’s in the future) there were 15,000 or so active listings, of which about 80% were active the month before. So as you can see, in no time this database could get very unwieldy. However, I only need to retain sold records after reporting on a given month.

If you’ve read this far, I’m already grateful. If you have any thoughts on how to setup a database to allow for “easy” reporting of historical sold properties as well as month by month active properties without having an insanely large database I’ll be eternally grateful! Or if you think this might be a bit much (though if I may brag I’m quick to learn) then I’d be interested to know that too.

Thanks!

Jeff

I don’t think you need to worry too much about separating sold and pending properties into different tables - in fact, I think you definitely shouldn’t. I wouldn’t worry too much about the number of records either - you can have millions of records in an Access database, especially if the table is a fairly simple one.

So I’d go for something like:
A table of properties - containing the address and general attributes of the property, maybe including a field ‘status’. Give each property an autonuber ID

A table of events, which can contain multiple records for each property - so if Property ID 1234 went on the market a month ago and sold today, this would be represented in the events table as:
1234, Listed, 2011/01/11
1234, Sold, 2011/02/11

That way, you can track anything that happens to a property and by joining the two tables in a query, list properties that experienced a specific kind of event in a specific time period.

Thanks for the tip. That will actually work well, as each property in the MLS has a unique ID (listing) number attached to it. So as new records come in, I can check against the listing number and if it is already in my table I can just add a record to the event table.

My plan was to do the monthly data imports with the built in Access import wizard, but this sounds like I’ll need to use code. Is that correct? If so, know of any good websites with sample code? I’m gonna search for myself, but it’s always nice to get recommendations from those in the know!

Thanks,

Jeff

Not necessarily. It should be easy and effective to set up a staging table, that stores the month’s data dump from MLS. Then, you can write a set of queries to make whatever changes to the data tables need to happen based on a comparison of the latest import to the current data; append queries to load your new properties to the properties table, and new events to the events table; update queries to change the current status of properties in the properties table, etc. And a delete query to clean out that staging table for when you’re done. Then, you can put all of those queries in a macro, so you can run them all each month with one click of the mouse. (You could even have the macro call the import wizard & import the data, so that it does all the work for you.)

If you go that direction, database maintenance will also be important. Access does not reclaim disk space after deletes–you need to compact the database. So a compact & repair should be part of that monthly task list as well.

Awesome, thank you!

So, here’s how I’m seeing it work.

The tables I see are one I’ll call “tblProperties” which has all MLS data except status, instead it has as ID field pointing to table two, “tblStatus” which of course has an ID (key, whatever) linking each record to “properties,” a date field, and a status field.

I import my monthly csv into a staging table I’ll call “tblTemp.” And here’s where I get a little fuzzy, not in the SQL code, but in the actually process. Are my steps then to:

  1. Append all solds from tblTemp to “tblProperties”
  2. Delete all records from tblTemp where status is sold
  3. Update query against tblProperties based on listing ID, if match is found update tblStatus with current status and date
  4. Delete from tblTemp where listing ids match in tblProperties
  5. Append existing records from tblTemp into tblProperties
  6. Delete all from tblTemp

Does that sound about right?

Let’s see… so you need to track a few things:

  • Active/Sold/Pending/etc…

-How long on market/current status

I think Mangetout’s on the right track- you have a status history table showing the entire history for a property, and then a table for the property itself.

I’d probably not use the MLS listing number as your key between them - doesn’t that number change every time a house is listed? If you use a different key, you could track a property through multiple sales.

Let’s see… so you need to track a few things:

  • Active/Sold/Pending/etc…

-How long on market/current status

I think Mangetout’s on the right track- you have a status history table showing the entire history for a property, and then a table for the property itself.

I’d probably not use the MLS listing number as your key between them - doesn’t that number change every time a house is listed? If you use a different key, you could track a property through multiple sales.

I don’t know…this is one of the few times when I don’t actually see a need to normalize your data into separate tables. One table with the data about the property, as well as fields for “Date Listed” “Date Sold” with a radio field with options for “Active”, “Pending”, “Sold”…or whatever. Then you simply have to run a query with a specific date range. Show me all the records with “Active” between these date ranges. Unless I’m missing something, you shouldn’t need to make a historical table. I guess if someone wanted to look back and see what date a property was active vice sold…but even then I’d just take the easy way and make a couple of date fields. On for each type of status a property could be.

A resource that I used fairly extensivly when I did database work was here. I’ve found a lot of great code that I could tweak to what I need. Let me know if you need any more help, I could always whip up an example database and mail it to ya.

If you’re keeping a history of all status changes, you’ll want this to be an append query. Otherwise, you’ll overwrite last month’s status & lose the history. Otherwise, this all looked fine to me.

The problem I foresee with that is that properties can and do flip back and forth between Active and Pending*. If you really want a full history of what was active when, you’ll need a history subtable.

*Way too often, judging by the curse words I learned back when my father used to sell real estate.

Right. I was going down the same track that he did, until it occurred to me that if you only kept the current status, and the status change date, you’d have no visibility into any prior activity, even if you’re technically fulfilling the requirements.

I want to thank you all for the help here, you guys rock!

Over the weekend I got a Database Demystified book, so I’ll be reading that to get a better grasp of how to properly setup a relational system with historical changes and all that.

In the mean time, though, since this report needs to get out Thursday I’m going to go with dumping everything into one table. Once I have a better grasp of RDBMSs I’ll take on the task of separating the data out.

Thanks again for the help, and I’ll probably have more questions once I finish that book.

Jeff

Yeah, I thought about that after I posted as well, even made a quick and dirty database to play around with it. And I think this runs into the “do it right, or do it the way that they’ll use it” conundrum. With all the databases I’ve ever done, I’ve set the data up right. With change history, great normalization, able to be queried six ways from Sunday. And the users never utilize those ‘advanced’ features. I’m willing to bet that GildedLily will run into the same thing. Nobody will ever need to data mine the listings to see that yes, a property has switched between pending and active 6 times in the last 4 months, what is going on there? They’ll just want to see how many properties are active in a given month, how many have been sold in a given month, and how many are pending. It’s probably my irritation at stupid users showing through, but I’ve gotten over doing things the right way and now just give them what they’ll use.

However GildedLily, as it seems like you’re just starting to get into more serious database design, I think the methods suggested by bump and SCSimmons are a better option. Do take a look at the site I linked to, because I specifically remember using a piece of code I found there for doing record history. Here it is. Note that especially when dealing with that site, the version of Access you’re using will dramatically change what you can and can’t do. Every version I’ve used has changed something in the way the database application works. Good luck!