Dealing with code at a higher level

Greetings everyone,

At my job I deal with a lot of embedded code, 98% of which is C, although some bits are written in assembly. When I say a lot, I mean dozens of divergent versions (sometimes substantially modified, sometimes not) of a codebase that’s close to a million lines and thousands of files. It carries a lot of legacy support as well as historical things we don’t have time to update.

I use SlickEdit as my IDE. The build system is complex, but let’s just say it involves Perl, batch files, a tree of INI files and GNU Make, and the code is compiled using GNU C. I get to fix my own and other people’s bugs in different version splits of this code base.

I am getting sick of trying to figure out how things work using grep! Tracking down global variables, messages, event flags, thread entry points, pipes, function pointers and callbacks, and even co-routines that are not clearly described anywhere is taking its toll on my soul. Sometimes I feel like a cryptographer trying to decipher the Voynich manuscript.

[starts acting like a five year old]

There’s gotta be a better way than just grep and navigating using tags. I can’t believe nobody has yet made a usable static code visualizer, much less something that lets you modify and write new code at a higher level. I have a makefile. Something can parse that makefile and figure out which defines and includes every single file in the project is compiled with; it could even run the compiler and have it dump translation units. Now, I am not asking anybody to try to solve the halting problem, I just want some feedback about what I’m looking at! Static analysis can’t tell me everything in all cases, but for any information I want I will settle for a reasonable attempt at providing a non-exhaustive set.

When my cursor is sitting in the middle of a function, I want it to be constantly aware of the entry point to the entire program, loops and branches. I want a reasonable guess as to what thread(s) I could be in and what functions could’ve called me. I want to be able to provide hypothetical facts (e.g. it was this function that called me) without leaving this code view, and turn each one on and off as an assumption. I want to be able to teach it about specific messaging mechanisms (pipe implementation, semaphores, etc.) and be able to follow messages down pipes reasonably. I want a reasonable guess as to what other threads the thread I’m in communicates with. I want a diagram for everything. I want it overlaid, floating over the code, on a single keypress, and I should be able to navigate the code using any diagram, be it a call graph, a co-routine guess, a pipe diagram, a hypothetical call stack, etc.

If I’m sitting in a Perl script I want color coding that is actually useful: sensitive to data types, branches, regexes and objects. It’s not really helpful that it’s making “print” bold – that doesn’t make other people’s Perl any more readable.

Just because it doesn’t know the instruction set of the assembly file I’m editing doesn’t mean it’s that hard to guess and color code accordingly. It’s assembly for fuck’s sake, it should be able to figure out the instruction set without me telling it about everything. I can touch up its guess later.

I want a hypothetical guess at potential timelines: there are always easy-to-find functions that are guaranteed NOT to be running wherever you are in the code. For example, if you are in a() and the only place a() and b() are launched is

void f(void)
{
    a();
    b();
}

We can know that if we’re in a(), probably no other thread is running b(). Same thing with things like semaphores that can be used to guess at timelines. My grep is tired.

[stops acting like a five year old]
I’ve studied compiler design. I’ve studied programming languages. I’ve studied computability. I know how incredibly difficult what I described is to actually make into a commercial product. It seems static analysis people often generalize things as “Oh, halting problem, so we won’t even try”. The halting problem only tells us what we can’t do in every case; it says nothing against doing it in some cases. Here, I can write you a useless program in ten minutes that finds some infinite loops and always terminates, and that doesn’t violate the halting problem. It’s hard, but it’s not much harder than things like voice recognition and face recognition, etc.

We’re living in 2007 now. We have incredibly fast CPUs, huge reserves of RAM and enormous hard drives. We have reasonably mature higher level languages. We have thousands of software engineers, some of them in very inexpensive parts of the world. The best we can do in the IDE arena is Visual Studio, Visual SlickEdit and SourceInsight? Each is a color-coded Notepad with some fancy toolbar buttons for two-letter vi commands, an ugly inflexible call graph generator and a checkbox-GUI version of Make, all of them slow as bloody hell. :dubious: End-user PC application developers at least have RAD tools like Delphi, and while that’s better than nothing, I strongly feel that the development tool arena is a good fifteen years behind the rest of the software world. The mere fact that there is no (to the best of my knowledge) good 3D static code visualizer is a testament to that. And as much as I hate UML and Java, why isn’t there something that gives you a UML view of a Java project, and allows you to edit code by visually modifying the UML rather than the code itself?

I’m kind of busy now but if this doesn’t change in a few years I’m going to start rounding up some brains and venture capital. I just described the above to a few friends and some said they’d pay thousands out of pocket to have something like that. Hell, I’d pay thousands. Maybe we’re just spoiled idiots. Who knows.

Don’t wait. Do it now. You will always be too busy.

I’m not qualified enough to address the rest of your rant, but in this case, I can assure you there is. Borland Together Architect allows for UML round-tripping - modify the UML via drag-and-drop, and the code is updated. There are also open-source plug-ins for Eclipse that do the same thing, but I haven’t used them much.

Scientific Toolworks has software that may help alleviate some of your headaches: it can help you search for variables, navigate control structure, and track down functional dependencies. I haven’t used it myself, but I’m lucky enough that I don’t have to deal with legacy code. See http://www.scitools.com/products/understand/cpp/product.php

Curiously, decent programming tools are one area where Computer Science has been grossly deficient. The literature on the academic side is sparse and frankly not very interesting apart from a few neat examples, and the process of converting a concept into a polished product is so daunting that very few people try. The state of the art in usable products right now is still depressingly primitive.

Here I thought this was going to be a Python thread or something. 'Fraid I can’t help much.

It was called COBOL and it was usually stored in files managed by PanValet or Librarian that came with extended search programs.

Too bad.

:smiley:

Well, it would help if C weren’t such a nightmare to parse, or if there were a standard Makefile format (before you say POSIX, try compiling FreeBSD with GNU Make and get back to me).

Yeah, we really should solve the problem of machine-readable things being machine-readable first.

I can’t figure out if you’re being serious here or not. Have you ever tried to write a C parser? The language is a nightmare. For example, what does the following line of code do?

f(x);

The preprocessor exacerbates the problem greatly.
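For anyone who hasn’t run into it, here is a small sketch of why `f(x);` can’t be parsed without knowing what `f` is at that point. The function names `as_call` and `as_declaration` are just illustrative. In the first, `f` names a function, so the line is a call. In the second, a block-scope typedef hides the function, so the same tokens declare a variable `x` with a parenthesized declarator.

```c
#include <assert.h>

static int calls;

static void f(int v)
{
    calls += v;
}

int as_call(void)
{
    int x = 1;
    calls = 0;
    f(x);            /* f is a function here: this is a call */
    return calls;
}

int as_declaration(void)
{
    typedef int f;   /* hides the function f inside this block */
    f(x);            /* same tokens, now a declaration: int x; */
    assert(sizeof x == sizeof(int));
    x = 42;
    return x;
}
```

A parser has to track typedef names through every scope (and through every header the preprocessor pulls in) before it can even decide whether a line is a statement or a declaration.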

I’ll give C a pass because the language grew organically in the 1970s. There’s no excuse, however, for that monstrosity called C++. The language was standardized in 1998. No one has managed to fully comply with that standard. If the compiler guys, who surely get the best and brightest in the field, can’t get it right, people writing tools don’t stand a chance.

I’m making a reference to the fact that at least one machine can accurately parse any working program in any real compiled programming language: the compiler. For third-party IDEs that support many compilers this is somewhat of a challenge, true. However, Microsoft Visual C++ and the like have absolutely no excuse for not saving the translation objects generated during the build: the build system ships with the IDE and is developed by the same vendor. If their IDE team has no contact with their compiler team, shame on them. Hell, they could output segments of the expression tree into Visio on demand and that would already be ten times more useful than what it can do now.

It’s not that easy, groman. The code that the IDE sees is not the same as the code the compiler sees thanks to the preprocessor.
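A contrived sketch of that gap, with every name made up (`LEGACY_PIPES`, `send_event`, and the two backends are hypothetical). Which function `backend_used` actually calls depends on a define that lives in the build system, not in the file the editor is showing.

```c
#include <assert.h>

enum backend { BACKEND_NONE, BACKEND_PIPE, BACKEND_QUEUE };

static enum backend last_backend = BACKEND_NONE;

/* Hypothetical build flag: imagine some targets are built with -DLEGACY_PIPES. */
#ifdef LEGACY_PIPES
#define send_event(queue, event) legacy_pipe_write((queue), (event))
#else
#define send_event(queue, event) queue_push((queue), (event))
#endif

void legacy_pipe_write(int queue, int event)
{
    assert(queue >= 0);
    (void)event;
    last_backend = BACKEND_PIPE;
}

void queue_push(int queue, int event)
{
    assert(queue >= 0);
    (void)event;
    last_backend = BACKEND_QUEUE;
}

enum backend backend_used(void)
{
    /* Grepping for callers of queue_push or legacy_pipe_write finds
     * nothing on this line; only the preprocessor knows the target. */
    send_event(1, 2);
    return last_backend;
}
```

Built without `-DLEGACY_PIPES`, the call goes to `queue_push`; with it, to `legacy_pipe_write`. An editor that only sees the source text has to guess which.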

No, it is just that easy to see the code. What isn’t easy is doing something useful with it, but it’s not impossible. In fact the IDE doesn’t even need to parse it: the compiler already parses it, and the debugger is capable of reading the debug symbols and connecting every single assembly instruction with the corresponding line of code, so the line # information travels through the entire compilation process. The IDE invokes the compiler, and in the case of GNU C, and probably other compilers as well, the compiler is capable of dumping all of its translation units, which is what the IDE would have to parse. In the case of GNU C I believe it’s XML.

There’s nothing stopping the IDE from seeing the same preprocessed code by running the preprocessor as well. In fact, a lot of the static analysis tools that DO exist rely on you to supply a command line for each file to generate preprocessed output using YOUR compiler. I’ve yet to see one that could figure this out automatically based on your project files/makefile/etc.
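As a rough sketch of the missing piece: once a tool has recovered the defines and include directories for a file (the hard part, not shown here), assembling the preprocess-only command is trivial. The struct and function names below are hypothetical; `-E`, `-D` and `-I` are the usual GCC options for preprocess-only, macro definition and include path. Paths with spaces are not quoted in this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-file settings, as a tool might recover them from a makefile. */
struct file_build_info {
    const char *source;
    const char *const *defines;   /* NULL-terminated, e.g. "DEBUG=1" */
    const char *const *includes;  /* NULL-terminated include directories */
};

/* Append prefix+value to buf at *used; return 0 on success, -1 if it does not fit. */
static int append(char *buf, size_t size, size_t *used,
                  const char *prefix, const char *value)
{
    int n = snprintf(buf + *used, size - *used, "%s%s", prefix, value);
    if (n < 0 || (size_t)n >= size - *used)
        return -1;
    *used += (size_t)n;
    return 0;
}

/* Build "gcc -E -D... -I... file" in buf; return 0 on success, -1 on overflow. */
int preprocess_command(const struct file_build_info *info, char *buf, size_t size)
{
    size_t used = 0;

    assert(info != NULL && buf != NULL && size > 0);
    if (append(buf, size, &used, "gcc", " -E") != 0)
        return -1;
    for (const char *const *d = info->defines; d != NULL && *d != NULL; ++d)
        if (append(buf, size, &used, " -D", *d) != 0)
            return -1;
    for (const char *const *i = info->includes; i != NULL && *i != NULL; ++i)
        if (append(buf, size, &used, " -I", *i) != 0)
            return -1;
    return append(buf, size, &used, " ", info->source);
}
```

With defines `TARGET_ARM` and `DEBUG=1`, include directories `include` and `drivers/hal`, and source `src/main.c`, this produces `gcc -E -DTARGET_ARM -DDEBUG=1 -Iinclude -Idrivers/hal src/main.c`. The output of that command is the code the compiler actually sees for that file.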