Is reverse compiling COBOL relatively easy or hard?

I think that reverse compiling C is considered hard, while .NET executables even preserve the variable names, etc.

So, how about COBOL? At first glance the language seems unstructured and not far removed from assembly. In practice, is a lot of information lost in the compile/decompile process beyond the human-readable variable and method names?

Also, is reverse compiling legacy COBOL even an issue in the real world, or have real-world institutions generally preserved the original source code intact?

I have been doing COBOL programming for about 25 years. Reverse compilers exist, and they produce usable source code in terms of the verbs and general structure. The variable names and paragraph names are gibberish. It’s pretty hard to work with but it can be done and I have done it. Once.

Most COBOL shops have been using standard source control software for a long, long time, so the issue of decompiling to recreate lost source doesn’t come up that often. It must have been enough of an issue at one time to justify the creation of reverse compilers, but it has not come up much in my career.

This has less to do with language and more to do with debugging symbols left in the binary by the compiler, which works the same whether the binary is machine code (as it usually is in C) or bytecode (as it usually is with, say, C# or VB.Net).

Conversely, I’m pretty sure there are ways to strip that extra data from a CLR binary if you really want to do it.

Finally: it isn’t really the variable names that determine how easy it is to reconstruct readable code; it’s how much optimization the compiler does to turn the source code into whatever it is you’re decompiling. A C compiler, for example, does a lot more to turn C into machine code than I’ve ever observed either Sun’s Java compiler or the Mono C# compiler doing to turn their respective languages into their respective bytecodes.

Partially this is because both the JVM and the CLR are stack machines, which rules out a lot of optimizations, and the rest is down to the fact that those bytecodes are expected to be run by systems that compile bytecode to machine code on the fly (or ‘just-in-time’, aka JIT), optimizing based on observed runtime behavior as the program executes. Modern computer chips can’t do nearly as much as a JIT to help the efficiency of unoptimized machine code, though what they can do never fails to surprise me.

Anyway, from all that we can conclude that in the case of, say, an MVS COBOL compiler targeting a z Series mainframe it’s probably going to be a lot closer to the case of the C compiler than the C# compiler, just because the z Series is a register machine and, even though it’s microcoded (is it?), it makes more sense for the compiler to do optimizations than to try and push everything into the microcode engine and hardware. (An OS/400 COBOL compiler targeting AS/400 bytecode could be another story; in some ways, it’s actually closer to the C#/Java model.)

I’m an old COBOL programmer. I’m surprised it’s still being used. Is it just changes to old programs or are new systems being written? It’s pretty clunky. On a related note, is EBCDIC still around? I always liked that format.

EBCDIC is still around, much to my horror.

I still write COBOL. The PC versions are really good, especially Micro Focus COBOL. The new standards include OBJECT support and HTML. There’s even JavaBean support. I still write traditional programs like I have since the late 1980s. But the new stuff is there for people that want it.

I’ve never run across a need to decompile back to COBOL. We had multiple copies of our source code, in both Prod and Test environments. We had tape backups, then later CD-ROM. All our source code and documentation easily fits on a couple of CD-ROMs.

I used EBCDIC nearly 25 years ago on a Honeywell mainframe. The old COBOL 69 compiler’s file system was EBCDIC. At one time I had even learned the octal codes. :wink: We moved past that system in 1990.

After thinking…

I do recall our old shop standard from the 1980s didn’t allow us to use the Compute statement. One of our Systems Programmers had looked at the assembly code of a compiled program. Simple statements like Add, Subtract, Multiply were handled very efficiently. The Compute statement was broken up into a series of Add, Subtract, Multiply, etc. statements at the assembly level. The module that handled the Compute statement only got loaded if it was needed. The conclusion was that the Compute statement on this old compiler was much less efficient and used more memory. So, we were told not to use it. I was only a beginning programmer then and did what I was told. I never questioned whether they were right or not.
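
For anyone who never ran into such a standard, here’s a minimal sketch of the kind of rewrite it forced. The program and data names are invented for illustration, and exactly how COMPUTE expanded varied by compiler:

       IDENTIFICATION DIVISION.
       PROGRAM-ID. NOCOMPUT.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-PRICE     PIC S9(5)V99 COMP-3 VALUE 19.95.
       01  WS-QTY       PIC S9(3)    COMP-3 VALUE 4.
       01  WS-DISCOUNT  PIC S9(5)V99 COMP-3 VALUE 5.00.
       01  WS-TOTAL     PIC S9(7)V99 COMP-3.
       01  WS-PRINT     PIC -(7)9.99.
       PROCEDURE DIVISION.
      * The forbidden one-liner would have been:
      *     COMPUTE WS-TOTAL = WS-PRICE * WS-QTY - WS-DISCOUNT
      * The shop standard made us spell it out instead:
           MULTIPLY WS-PRICE BY WS-QTY GIVING WS-TOTAL
           SUBTRACT WS-DISCOUNT FROM WS-TOTAL
           MOVE WS-TOTAL TO WS-PRINT
           DISPLAY "TOTAL:" WS-PRINT
           STOP RUN.

The individual verbs map almost one-for-one onto single arithmetic operations, which is presumably why that old compiler handled them so cheaply.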

Later, with a new VAX system and a different compiler, we used all the new COBOL-85 statements: COMPUTE, EVALUATE, END-IF, STRING, etc.
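
For readers who missed that transition, EVALUATE was the big one: a real case construct to replace walls of nested IFs. A minimal sketch, with the record-type codes and paragraph names invented for illustration:

           EVALUATE WS-REC-TYPE
               WHEN "PH"   PERFORM PROCESS-PO-HEADER
               WHEN "SD"   PERFORM PROCESS-SALES-DETAIL
               WHEN "CM"   PERFORM PROCESS-COMMENT
               WHEN OTHER  PERFORM REPORT-BAD-RECORD
           END-EVALUATE

And END-IF finally let you close an IF explicitly instead of relying on a period, which ended a whole class of subtle bugs.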

Ah, I believe those versions are called ADD 1 TO COBOL. :wink:

This was a big issue about a dozen years ago, when many companies were working frantically to convert old COBOL programs where the source had been lost, before Year 2000 arrived. There were even commercial products sold (for big bucks) to de-compile COBOL programs.

But like Crotalus says, they weren’t real useful: you ended up with source code that worked, but was un-understandable (like an APL listing :slight_smile: ) and nearly un-maintainable. The task was hard because COBOL compilers have been fine-tuned for decades and are very optimized for the specific hardware, so the object code is often far removed from the source statements. When you know that the program will be running only on an IBM Z-series mainframe, you can do lots of tricks to use that hardware in the most efficient way, something you can’t do with your PC program when you don’t even know if it will be running under a Windows, Mac, or Linux OS.

Also, it was generally only very old, infrequently modified programs; newer ones were under your source control system, so the source code was available. In the shops where I was working back then, if we had a program with lost source, we usually found it easier to just look at what the program was doing, and then write a new version to do that job. (The trick was that often, on the very oldest ones, there was little or no documentation of what ‘job’ that old program did. Or it had been modified over the years, and one or more tasks weren’t mentioned in the documentation, but were very critical.) In many cases, there were newer, simpler ways to do the task; we once replaced a lengthy antique COBOL-F (!) program with 4 DFSORT commands.

And COBOL is still used for a vast amount of processing in the real world.
Your credit card bill, your monthly bank statement, your electricity bill, your rental-car reservation, your cellphone bill, etc. probably came from a COBOL program. And those COBOL programs are still being actively maintained and upgraded today – they are extremely valuable, vital assets to the companies. (Right now, many of these companies are starting to worry because all their experienced COBOL programmers are reaching retirement age.)

…GIVING COBOL :slight_smile:

Ah, COBOL, how I miss you. :frowning:

Can anyone recommend a good, free COBOL compiler that works in Windows XP?

You made me snort coffee. Ah, nostalgia.

We still rent time and space on a mainframe and use COBOL in our workplace. Theoretically that will end this year.

About 6 or 7 years ago I tried experimenting with this on a Windows platform: http://www.thekompany.com/products/kobol/

Although not free, you might call it “affordable”. It’s been a while and I only played around with it for a short bit, so I can’t provide an honest evaluation.

Some free versions. Never used them, don’t know how good or bad they are:
http://tiny-cobol.sourceforge.net/

I believe you will need a C compiler for all of these options.

The advantage of COBOL on the old IBM 370-series mainframes was that a lot of the statements translated directly to assembly code. 370 Assembler has instructions (IIRC) like ADD DECIMAL A(L,D),B(L,D), where L = length in digits and D = decimal digits. (Or, ADD PACKED DECIMAL…)
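
If I may fill in the half-remembered mnemonic (treat this as my assumption, not gospel): the S/370 packed-decimal add was AP, ‘Add Decimal’, a storage-to-storage instruction with explicit operand lengths. So an ADD on two COMP-3 fields could come out as a single instruction, something like:

      * WS-BALANCE is 5 bytes packed; WS-PAYMENT is 4 bytes packed.
       01  WS-BALANCE   PIC S9(7)V99 COMP-3.
       01  WS-PAYMENT   PIC S9(5)V99 COMP-3.
      * The COBOL statement:
           ADD WS-PAYMENT TO WS-BALANCE
      * could plausibly compile to one packed-decimal instruction:
      *     AP   WS-BALANCE(5),WS-PAYMENT(4)

That near one-to-one mapping is part of why decompiled COBOL can at least recover the verbs and general structure.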

The difficulty, as alluded to, was the discovery of the variable names. COBOL was originally record-oriented: data came in on those 80-column punch cards, or as fixed-size records on tape or disk. The same record could be redefined several times, as (for example) a purchase order header, a sales detail, a comment, etc.

So one command could reference a certain spot in memory (the record buffer) and say bytes 6-10 were a packed decimal with 2 digits after the decimal point. Another command would say bytes 3 through 20 were a text field. Several different versions of the record might have the same bytes 6-10 as decimal, but in one version it’s a price, in another it’s a total, and in another it’s hours worked. If you try to decipher what a particular variable would be from tracing through the program, you might get very confused.
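
To make that concrete, here’s a minimal sketch of the kind of layout being described, with the names and offsets invented for illustration. Under an FD, each 01 level implicitly redefines the same 80-byte record buffer:

       FD  CARD-FILE.
      * Positions 1-2 say which layout applies to this record.
       01  PO-HEADER-REC.
           05  PH-REC-TYPE    PIC XX.
           05  FILLER         PIC XXX.
      * Bytes 6-10: packed decimal, 2 digits after the decimal point.
           05  PH-ORDER-TOTAL PIC S9(7)V99 COMP-3.
           05  FILLER         PIC X(70).
       01  SALES-DETAIL-REC.
           05  SD-REC-TYPE    PIC XX.
      * Bytes 3-20: a text field overlaying the same storage.
           05  SD-CUST-NAME   PIC X(18).
           05  FILLER         PIC X(60).
       01  COMMENT-REC.
           05  CM-REC-TYPE    PIC XX.
           05  CM-TEXT        PIC X(78).

A decompiler sees only one 80-byte buffer and raw offsets; which meaning bytes 6-10 carry depends on the record type at run time, which is exactly why tracing a recovered variable through the program gets so confusing.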

Unlike modern programs, old COBOL compilers would drop as much information as possible (e.g., useful debug data like variable names), since old mainframes were severely limited. A coworker from the ’70s mentioned he learned 370 Assembler to rewrite programs because the COBOL he was writing was too big to compile in the 40K of RAM the mainframe of the time had.

Of course, if you are talking about more modern versions of COBOL, it’s also possible they simply translate COBOL into intermediate code that another compiler can handle. Many vendors of multiple languages tend to do this sort of thing.

Hello, I need a reverse compiler. How can I acquire one?

IF the shop used a decent change control system, there was rarely (should really be never) a need to de-compile; the source was guaranteed to match the executable module.

I did work at a shop which did not do decent change control. This was an insurance company, a company that lives and dies by ‘Claim Adjudication’: the decision to pay or decline a claim.
They did not have the source for their ‘Adjudication’ module; they thought they could piece together the source.
Would love to hear how they ported THAT function to client/server.

How they got into the situation:

The programmer says, “OK, it’s fixed/upgraded. Here are the:

  1. Source code
  2. Load module (the .exe)”

Their ‘Change Control’ geniuses then copied both the source and the load modules.

After 20 years, they have absolutely no confidence that the load module they are running has anything whatsoever to do with the source they have.
Lesson: when it comes time to change the program, you move only the source member(s), you RE-COMPILE in a controlled environment, and you re-test through every test in the book.
ONLY if it passes the ‘system test’ does it go live.

The best ‘de-compilers’ for COBOL could only do assembler-level output. The modern optimizing compilers made hash of any attempt to re-create the original COBOL.

The output of a COBOL compile and link would be machine-level code.

The only machine instruction I know is:
‘11’, meaning ‘Set Buffer Address’

Now you know why you should really, really appreciate the modern languages.

I can’t imagine a site that doesn’t re-compile the new version when it is moved into production; their programs must have been running very inefficiently.

There are all kinds of options that can be set on the compiler to control how the object code is generated. Mostly they are set either to generate debugging help or to generate the most efficient code. (Generally, those goals are opposites of each other.)

So the load module generated by the programmer would be quite different from the one generated for the production environment, and usually wouldn’t work well at all.
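
As a hedged illustration (these option names are from IBM Enterprise COBOL; other compilers spell them differently), the difference can be as small as the compiler-directing CBL statement at the top of the source member:

      * Developer's compile: symbolic debugging support, no optimization.
       CBL TEST,NOOPTIMIZE
      * The production compile would instead use:
      *     CBL NOTEST,OPTIMIZE

Same source member, very different load modules, which is exactly why you recompile in the controlled environment rather than promoting the programmer’s binary.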