SQL Server gurus, tell me why this is a horrible idea

I’m working within a SQL Server 2007 database engine. I’ve got a sequence of tables that store data about widgets and the companies that can order them. To enable a company to order a widget, I use a SQL script that takes a single manual entry for each value, inserts them into the various tables, then creates additional automatic entries depending on the number of business units within the company.

These scripts were written somewhere around the beginning of time by people whose names are long since lost to the annals of history. That or contractors; same basic thing. Because we’re a small organization, there have traditionally been only two people that do this process, and up to now, the SOP for screwups has essentially been “don’t”, because we don’t have a script to undo the creation process once the tran is committed.

Guess whose job it’s become to correct this.

The data in these tables are interrelated with various foreign keys (and columns that really should be fkeys but aren’t). To create the Universal Widget Deletion Script, I can trace the logic of the Create_Widget proc, collect the values that uniquely identify a specific range of entries (say, Widget #7 ordered by MegaCorp), query the relevant tables to find entries matching these values, and delete those rows.

Now, here’s the actual question. I can do that, OR I can create a script that deletes every row with a CreatedDt timestamp of the millisecond that the original entry was created. That’s the one value that all of these tables share, and per the results of my testing, it’s the same for all entries created at the same time (as it should be, since the creation script only sets the value for CreatedDt once).

The problem is, every programmer’s instinct that’s been trained into me is insisting that this is a bad idea. When that happens, it’s usually a sign that there’s something I’m missing. In this case, I think it’s a realization that there’s nothing inherently forcing the CreatedDt values to be A) identical between tables and B) unique within them; they’re not foreign keys and there are no other constraints involved. If that’s the only problem, it’s okay, because A) nobody manually adds entries without using the Create_Widget script, and B) there are only two people who do this, and we’re not both going to assign a widget at the same millisecond.

That said, I’ve been staring at db values and SQL scripts for too long, which does funny things to one’s judgement…that’s why I want to use you guys as a sanity check before I do this. As per the title, please tell me why this is a horrible idea.

Thanks!

I think it’s your best way, given the information you’ve provided. I take it that there is no UI key anywhere to be found, and if that is the case, you just have to latch onto what you can.

One helpful hint might be to write the script in VB (old VB 6 will do just fine), and then put a stop or break point just after your delete SQL statement (don’t forget the WHERE clause!) on the execute line. (Or however you have it arranged, stop just before the statement executes.)

Read your SQL statement in the debug window. Open the relevant tables. Determine whether what you’ve programmed is what you expect. Then check the next two or three the same way, and then if all is okay, remove your break point or stop and let 'er rip.

I don’t know if your server is remote, but keep in mind that running the program on the server would have a different connection string than running it remotely.

Are you looking to delete old records, or just be able to fix screwups in the future with one script?

If it’s just for the future, and you have the creation scripts…can you add a key to the tables, alongside the creation date? In the scripts, everywhere you insert CreationDate, also insert DeleteKey. Then keep those keys in a table with company names, so if you need to go do a mass delete you have the means to do it w/o relying on time.

Of course you will need to add the DeleteKey field to all the tables too. Is this something you can do?

Lib, good point about writing it in VB. Due to not wanting to translate dozens of 1,000+ line SQL stored procs to .NET and the company not wanting to spring for Visual Studio, I’ve gotten used to not being able to do line-by-line debugs in the db world, but there’s no sense in constraining myself to that in code I’m writing. That’s just the sort of thought I like to think I would’ve had if I’d slept last night.

ZipperJJ, it’s more of a screwup fix than anything else. If I needed to delete old records, I could do that a number of ways; realistically we’re not going to prune by Client -> Business unit -> Product, which is the sort of deletion this script will do. It might get repurposed for that at some point, but that’s not how I’m writing it. Adding a more solid identifier value to the creation script did occur to me, but I don’t want to mess with the tables, because this whole thing is hooked up to a SharePoint install that displays info to users based on these db values, and that’d be a nightmare to reconfigure. Didn’t mention that in the OP though, so thanks for the advice. :slight_smile:

Hey Roland, I feel your pain. None of my stuff is quite as bad as your situation, but close enough. I assume that an upgrade & rewrite isn’t really an option. :stuck_out_tongue:

I think you’ve got it figured.

Your options are between accuracy and efficiency.

The first method (to query all the tables to find the exact correct info) will be accurate. No matter what, you’d only delete the correct info.

The second method (using the timestamp) is more efficient, both in programming time and run time. As you say, the chances of this not working correctly are vewy vewy small - but there is a chance. Especially when you really don’t know who/how/what may happen with your script in the far distant future.

I’d probably go for accurate, myself, unless I just had to do the quick&dirty. I’m like that, I am.

Because the quick & dirty solution is always a horrible, horrible idea. Unless you really have to. :wink:

I feel for you, but deleting based on timestamps makes my skin crawl. It seems like the perfect example of today’s solution becoming tomorrow’s problem. Maybe someday when you start loading the database with a batch script and you forget (or are at a different job) about this delete function.

Does running a select on the timestamps - finding which records would be deleted - give you enough to find some common factor? Sometimes I’m not able to trace through the logic from the beginning, but working both ends I can get through it.

If you decide to delete based on timestamps, I’d do it in a two step process. Step one finds the info, displays it to the user, and generates the delete script. User approval is required to actually delete. Even better if you can flag suspicious deletes - if you’d wind up deleting two records from the widget table, for example. It won’t do much, but if it saves you one time…

You need to consider how the records were created during the Create_Widget proc. Were they done in a transaction? If not, then the timestamps will not be guaranteed to be identical for that “transaction” across all the tables. You will need to make sure that somebody doesn’t add another step to the end of the Create_Widget proc (after the commit statement).

You also need to consider the possibility that another process could have created a record in a table with the same millisecond timestamp.

Be aware that Microsoft SQL Server (at least up to SQL Server 2005) has a well-documented “bug” with regards to milliseconds. (I said “well-documented”, and then it took me 10 minutes to find it. Check out the DATETIME information on this page: http://dcx.sybase.com/index.php#http%3A%2F%2Fdcx.sybase.com%2F1100en%2Fmlserver_en11%2Fmicrosoftss-datatypes-ml-ref.html). Because of the rounding, it is entirely possible that some other process can start a transaction just after the first transaction ended, and possibly get the same DATETIME value.

This behavior may or may not be “fixed” in later versions of MS SQL Server.

You can test it by creating a table with a DATETIME field and then entering specific values ending with each of the digits. They will either be correct, or they will be rounded.

First I’d like to commend you on findong one of the rare copies of SQL Server 2007. We’ve only been able to buy SQL 2005 and SQL 2008 here, if only we’d been abel to get the SQL 2007!!! :slight_smile:

Second, I woudl like to second the idea that you are potentially creating tomorrow’s problems. Long after you’re promoted or take that job working for the SAS, someone’s going to scale out this application in some way, and still try to use your “fixer” script with potentially disastrous results. The reason “Best Practices” exist is because people made assumptions about how things were and then something unexpected happened. Creating an automated script to delete things by timestamp is asking for a good deal of trouble. It’s OK to use it as a helpful way of finding some suspected records, but it should not be relied upon.

On preview: feh. SQL Server 2005. Normally if I’m on message boards asking tech questions, it’s regarding SharePoint Server 2007. Forgive me, I’m wired.

Solid advice thus far. The warnings I’ve received haven’t cautioned me against anything I wasn’t already aware might cause trouble, so I’m going to go with the timestamp method for now, if only because I need to get this done quickly. I’m sure my conscience will force me to revisit it later and write a proper script.

Of course, now I’ve run into reason #2 why I don’t want to do this: I’ve never learned SQL. Or, more specifically, I am learning it, but it’s self-taught and I’ve only been at it for a couple months…but I’m the “go-to programmer”, so it falls to me, forcing me to spend time researching how to do simple tasks, or else ask questions on message boards that make me look like a moron. Here comes one now!

I now have a foreign key constraint conflict with one of my delete statements. So, I need to delete from Table1 where Value1 = (all the Value2s from Table2 where timestamp = x). There are multiple Value2s where this is true. I don’t know how to go about handling this. Any of you kind folks feel like saving me the research time?

…and to answer BobArrgh’s question, everything that happens in Create_Widget is one tran. That’s what I meant when I said that the value of the timestamp is only set once.

Just FYI for responders, I am aware of the potential downsides to the timestamp method that have been brought up thus far. S’what prompted me to start the thread. There’s no need for panic, though…right now, my company owns two licenses for SQL Server, one for me and one for the sysadmin (who’s the other one who runs these procs). Nobody is going to be running this script willy-nilly, and to say that the danger is minimal is still to overstate the case by a couple orders of magnitude. If it makes everyone feel better, I’ll promise to delete the timestamp-based proc after I use it. :wink:

Triple-post on the offchance anyone cares: solved the “delete multiple values” problem using an inner join. Learn something new every day.

I had gotten the impression from the OP that the CreatedDt value was stored in a variable, and then written to the database. If that is the case, then every reference to it will pop identical values off the stack. That would make it a sure bet for match-ups. But if that’s not the case, then there are all sorts of reasons why CreatedDt would differ from table to table.

Your assumption was correct. CreatedDt is stored in @CreatedDt, which is then assigned to the CreatedDt column of all the relevant tables (except a dozen or so layers-deep fkeys, but I’ll have to trace those the hard way anyway). This is why I feel safe using it as a stand-in unique identifier for the time being. I’d post the proc itself, but A) it’s huge and B) it’s probably proprietary. Still, I’ll be happy to clarify specifics where necessary.

Thanks for coming back to the thread, Lib. I’m at the “trace foreign keys for eight hours” stage of script development, which doesn’t really make for thrilling post material.

Yeah. God, it’s a shame. There’s nothing more difficult, irritating, or maddening than working with someone else’s code. Especially when they left it undocumented, and when they tried fancy shit when something simple would have sufficed. Been there. Bought the tee shirt. Good luck, and we’ll see you on the other side. :slight_smile:

Boy that’s the truth.

I’m guessing your using SQL Server Enterprise Manager to do your work?

Visual Studio Standard edition is only $200.

Have you done any object oriented programming? Sounds to me like you need a front end.

I was also wondering how you got a hold of a version of 2007 server ;).

The SDMB is great for questions like this. If you get into the nitty gritty, TheCodeProject is really good too. http://www.codeproject.com/ (forget the fact that their home page looks to be designed by a 3rd grader with bad taste. Nerds are like that).

enipla, yeah, I cut my teeth in app development on Borland Visual C++ and VB6, and now I work mostly in Java and VB/C# .NET, so OOP is my bread and butter. We have a front end of sorts, but it’s SharePoint, and it’s intended for end users. We don’t have an “insider” portal where we can manage things from within the environment itself. (Don’t ask me; like I said, I inherited this mess.) Wrangling that beast takes plenty of my time anyway, and I try to keep the custom stuff to a minimum.

I have VS2008 on my home machine, which I bought out-of-pocket for contract work and boring-weekend Rentacoder projects. I have no desire to drop any further cash on another license…I don’t work with this stuff often enough to make it worth my while*, and the Express editions I have on my office machine generally suit my day-to-day purposes.

Like I said, I know where you and Winsling are coming from. It’s that same instinctual shudder that made me ask the question in the first place. For now, though, time constraints being what they are, it will have to do.

*For anyone wondering what the hell it is that I do…you tell me.

SUCCESS!

The madness has been successfully untangled.

The cabal of conscientious coders will be happy to know that they ended up getting their wishes…while testing the routine on some existing widgets in the QA environment, it turned out that one of the widget-related procs generates a foreign key that has a different timestamp. Real data that had just been entered wouldn’t have any values in that table (it’s tied to a customer review of the widget, which wouldn’t happen with a newly-created one that nobody had purchased), but that pushed my already-stretched tolerance for sloppiness over the limit, so I had to find a new UI value.*

Nonetheless, problem solved. Thanks to everyone for their advice. As my final question, does getting paid overtime to be here until four in the morning debugging scripts and guzzling Mountain Dew qualify me for my professional programmer’s card?

  • Which, of course, didn’t exist, so I had to cobble one together out of a dozen interrelated values in eight different tables. I know, I know; world’s smallest violin and all that. It still sucked.

No, you’re not a professional programmer until they STOP paying you overtime :slight_smile: