So here I am, looking at working past midnight for the 20th or so night in a row. Why am I doing that? Because it seems programmers no longer have any native sense of the diagnostic process. So the people on the project team I run either dick around for days banging their heads against the wall and we miss dates, or I spend my nights going through each of their latest problems so they will at least be making progress for a little while in the morning.
It’s computers, for goddamn sake; it’s not like being a medical diagnostician or a psychological diagnostician. Everything goes in discrete steps you can watch and repeat as many times as you need. It amazes me that people can reach adulthood without understanding the basic process of eliminate and narrow, when you can control every single variable.
It annoyed me, when I was working directly with non-tech people, how many had zero sense of diagnosing. But it seems the majority of professional CS people, even with years of experience, just don’t get it. When the way they try doesn’t work, they scratch their heads, then try a completely different approach that is asinine. Can’t get your JDBC connection working? Just try to convert the 35 million row dataset to a CSV. Your Java works locally but not on the server? Just make the assumption we will be converting the production infrastructure to Windows for your piddly-ass enhancement. Can’t get your PL/SQL query joins right? Just pull everything into the batch process lists and make a quad-nested-loop, brute-force walk and compare to find what you need.
I wasted 3 hours trying to explain how to diagnose to one guy today. We have a query that pulls 100k records. It was running out of resources on entry 60123. So, obvious first thoughts: most likely either that entry is corrupt or the process is corrupt. 1. Exclude that entry and run the whole rest of the feeds. If it succeeds, that points to a bad entry. 2. Exclude the first 1000 entries and run. If it crashes at approximately entry 61123, that is, after chewing through about the same number of entries as before, then it is likely a process issue such as not releasing resources. If neither happens then we try something else, but those are the best first steps.
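For anyone who wants those two tests spelled out, here is a rough Python sketch. Everything in it is made up for illustration: `run_and_record`, `diagnose`, and `make_process` are hypothetical names, not anything from the real batch job. It assumes entry ids are just consecutive positions in the feed, that `make_process()` hands back a fresh process for each run, and that "approximately" means within an arbitrary 10% of the skip size.

```python
from typing import Callable, Optional, Sequence


def run_and_record(entry_ids: Sequence[int],
                   process: Callable[[int], None]) -> Optional[int]:
    """Process entries in order; return the id that failed, or None on success."""
    for entry_id in entry_ids:
        try:
            process(entry_id)
        except Exception:
            return entry_id  # record WHERE it failed, not just that it failed
    return None


def diagnose(all_ids: Sequence[int],
             make_process: Callable[[], Callable[[int], None]],
             suspect: int,
             skip: int = 1000) -> str:
    """Run the two first-step tests and say which way they point."""
    # Test 1: same run minus the suspect entry.
    remaining = [i for i in all_ids if i != suspect]
    if run_and_record(remaining, make_process()) is None:
        return "bad entry"

    # Test 2: drop the first `skip` entries; does the failure move by about `skip`?
    failed_at = run_and_record(all_ids[skip:], make_process())
    tolerance = max(1, skip // 10)  # "approximately": arbitrary 10% slack (assumption)
    if failed_at is not None and abs(failed_at - (suspect + skip)) <= tolerance:
        return "process issue"
    return "inconclusive"
```

The point is less the code than the habit: every run changes exactly one thing, and every run writes down where it died, so each result rules something in or out.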
I felt like I was talking to the guards at Swamp Castle. He just kept running it and reporting that it failed or succeeded. The first 1000 entries run OK? What the hell does that tell us? It failed but you didn’t record where it failed; how does that help?
Arggh, just a basic understanding of the process of controlling variables, then re-testing and observing to focus on the problem, is worth so much more than knowing every little syntax quirk of the programming language.
Sorry if it’s a bit disjointed; it was composed in dozens of breaks while I waited for DIAGNOSTIC TESTS (DUN DUN DUNNNNNN) to complete.
Too late to edit. *Before anyone snarks the obvious, running that entry alone as a test doesn’t do anything. Most entries build on previous ones in this set, so a run without predecessors is such a different trial it doesn’t help. Cutting the first thousand does change things a bit, and on occasion that is where the problem lies, but for this data set it is the best first test.
I manage a Service Desk and see this all the time. You know the joke: “Why are you looking for your keys here if you dropped them over there? Because the light is better.” That is how most people seem to troubleshoot.
They attack what they know, particularly problems they’ve solved before, even if the symptoms don’t fit.
I find it’s easier for me, since all my technical expertise is long past the point of usefulness, to just ask simple, basic questions like “What’s changed?”
How about - the coworker reports that their batch of cookies has come out flat and funny-looking. When asked to figure it out, he puts another batch in the oven and says, “yup, it happened again.” Someone who knows about baking might check things like whether the baking soda/powder is expired or if the person who mixed the cookie dough was melting the butter instead of letting it come to room temperature. Instead, the coworker just does the same thing over again, gets the same result, and shrugs uselessly.
And then, when finally convinced he needs to try something else to get to the heart of the problem, he first turns the oven 90 degrees to the left, then tries replacing the flour with dry-mix cement.
My complaint isn’t about the tech. It’s about the new generation of computer programmers (get off my lawn) not having a clue how to diagnose and fix problems in an organized and efficient manner.
The tech only makes it that much easier, because your oven doesn’t have a switch that will make your baking powder tell you:
INITIALIZING LEAVENING
Detecting Acid = 25%
Detecting base = 0%
checking air temperature = 325
ERROR leavening process failed
And that’s it. Even then it doesn’t help because error messages are rarely specific enough to diagnose a problem. If someone doesn’t have an understanding of the process involved there’s no way they’re going to be able to deduce the cause of a problem.
All software problems are caused by one of two things: there’s either a 0 where a 1 should be, or a 1 where a 0 should be. When you have no idea which one of the zillions of 0s and 1s is involved, you have to have a means of narrowing down the possible places where it could be. It seems simple enough to consider where the code was last working properly, and what conditions led to that, but I can’t find a way to teach that to people.
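If it helps anyone, "where was it last working" can be mechanized as a binary search over an ordered history, which is the same idea `git bisect` applies to commits. A minimal Python sketch, with hypothetical names (`first_bad`, `is_good`), assuming the first version is known good, the last is known bad, and once it breaks it stays broken:

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(versions: Sequence[T], is_good: Callable[[T], bool]) -> T:
    """Return the earliest bad version.

    Assumes versions[0] is good, versions[-1] is bad, and that every
    version after the first bad one is also bad.
    """
    lo, hi = 0, len(versions) - 1  # invariant: versions[lo] good, versions[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(versions[mid]):
            lo = mid  # the break happened somewhere after mid
        else:
            hi = mid  # the break happened at mid or earlier
    return versions[hi]
```

Each test cuts the remaining suspects in half, so a hundred candidate versions take about seven runs instead of a hundred. The same halving works on input data, config changes, or anything else you can put in order and test.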