I saw a freelance project online for reimplementing a big open-source C program (which has no unit tests) in Java. But I can't find much discussion online of translating programs from one language to another, nor any mention of special-purpose tools for it, such as tools for verifying the correctness of a translation, or for partial translation.
Am I doing a poor job searching? Or is this task so exotic and rarely undertaken nowadays that nobody bothers to write about it or build tools for it?
A quick Google search for “convert c to java” turns up several tools.
I wrote a proposal years ago for converting a proprietary language to Java. The problem I found was that certain idioms don't translate well (especially when going from a procedural language to a much stricter OO one). To handle that you wind up doing something like decompilation, which of course produces unmaintainable code. So you either do it as a one-time thing, never to be maintained, in which case you might as well use an emulator, or as an incomplete first pass, with customized translation tools and the tricky parts rewritten by humans.
That's not what I mean. I am aware that automated line-by-line translators exist, such as ones from C# to VB.NET. The question is mostly about the testing and verification of correctness.
On second thought, this is quite reminiscent of the problem of verifying the correctness of a new version against a previous reference version in Subversion, a task for which I know of no well-known tools (though there are patents and some implementations from small ISVs). But I would guess that the specifics of the translation task make it distinct from comparing versions in the same language.
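For the testing side, one widely used technique is differential (back-to-back) testing: feed the same inputs to the original and the translated code and compare the outputs. Below is a minimal Java sketch, assuming both versions can be called in-process as functions of a single `int`. The class and method names are hypothetical, and the "reference" lambda is only a stand-in for the C original, which in a real project would more likely be driven as a separate process.

```java
import java.util.Random;
import java.util.function.IntUnaryOperator;

final class DifferentialTest {
    /**
     * Runs both implementations on `trials` pseudo-random inputs and returns the
     * first input on which they disagree, or null if no disagreement was found.
     */
    static Integer findMismatch(IntUnaryOperator reference, IntUnaryOperator candidate,
                                int trials, long seed) {
        Random rng = new Random(seed); // fixed seed so a failure can be reproduced
        for (int i = 0; i < trials; i++) {
            int x = rng.nextInt();
            if (reference.applyAsInt(x) != candidate.applyAsInt(x)) {
                return x;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Stand-in for the C original: `unsigned int` division, x / 10u.
        IntUnaryOperator reference = x -> Integer.divideUnsigned(x, 10);
        // A naive line-by-line translation that dropped C's `unsigned`.
        IntUnaryOperator naive = x -> x / 10;
        // A corrected translation.
        IntUnaryOperator fixed = x -> Integer.divideUnsigned(x, 10);

        System.out.println("naive mismatch at: " + findMismatch(reference, naive, 10_000, 42L));
        System.out.println("fixed mismatch at: " + findMismatch(reference, fixed, 10_000, 42L));
    }
}
```

Random comparison like this can only find differences; it cannot prove that two programs are equivalent, and its value depends heavily on how well the inputs cover the original's behavior.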
It sounds like what you’re looking for is either called a transcompiler or a source-to-source compiler. IIRC, there’s a theorem to the effect that a transcompiler exists for every pair of languages, but that doesn’t guarantee that it’s easy to implement, or that its output will be at all readable. That latter bit seems like the big stumbling block here, and the main reason why you wouldn’t want to do this for a large codebase.
If I understand you correctly, you want to translate C to Java and somehow demonstrate (without testing) that the Java program is equivalent to the C program. Is that correct? I am something of a code dinosaur, but I would be really surprised if something like this existed.
In my experience, reimplementing a codebase in another language is rare because no one is really sure of everything that codebase is doing. This makes testing difficult (perhaps impossible).
I've been programming in more languages than I care to remember for over 40 years now and have never seen anything like what you describe. It might be technically possible to take machine code compiled from one language and decompile it into source code in another language, but that code would be very user-unfriendly, with meaningless variable names and no comments.
I've worked on a few projects where a system was rewritten from one language to another, but it was always done by humans, not machines. Generally the best method is to start with a clean slate and duplicate the functionality of the program in the new language, using the old code as a reference. This also gives you an opportunity to incorporate new features and improve the user interface at the same time.
ETA: I've never yet seen a project where, 6 months after release, there weren't a lot of "why didn't we do it this way instead of that way?" moments. A total rewrite lets you make those changes as well.
Certain C features (e.g. unions, casts, pointer arithmetic) will be difficult or at least very ugly to translate into Java. Anything that compromises type safety, basically.
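To make the union and pointer-arithmetic point concrete, here is a hypothetical C idiom (reading a `float`'s bits through a union) and two ways a translation might render it in Java, plus C pointer arithmetic turned into array indexing. This is a sketch with made-up names, not the output of any particular tool; `Float.floatToRawIntBits` and `ByteBuffer` are standard JDK APIs.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class UnionSketch {
    // Hypothetical C original:
    //   union { float f; uint32_t bits; } u;
    //   u.f = value;
    //   return u.bits;
    // Idiomatic Java: the JDK exposes the raw bit pattern directly.
    static int bitsViaIntrinsic(float value) {
        return Float.floatToRawIntBits(value);
    }

    // Mechanical alternative: emulate the union's shared storage with a 4-byte buffer.
    // Byte order only matters if the C code also reads individual bytes of the union;
    // it is pinned here so that such reads would be deterministic.
    static int bitsViaBuffer(float value) {
        ByteBuffer storage = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN);
        storage.putFloat(0, value); // write through the "f" member
        return storage.getInt(0);   // read through the "bits" member
    }

    // C pointer arithmetic such as `*(p + i)` becomes an explicit base offset plus index.
    // Java adds a bounds check that the C code never performed.
    static int sumBytes(byte[] data, int offset, int count) {
        int sum = 0;
        for (int i = 0; i < count; i++) {
            sum += data[offset + i] & 0xFF; // mask to mimic C's unsigned char
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(bitsViaIntrinsic(1.0f)));    // 3f800000
        System.out.println(Integer.toHexString(bitsViaBuffer(1.0f)));       // 3f800000
        System.out.println(sumBytes(new byte[] {1, 2, (byte) 0xFF}, 0, 3)); // 258
    }
}
```

The first form is readable but requires the translator (or a human) to recognize the idiom; the buffer form can be generated mechanically but is exactly the kind of ugly output that makes translated code hard to maintain.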
Also complicating your life is the fact that C has all kinds of "undefined behavior": constructs that compile with no error or warning, but which have no guaranteed behavior at runtime. Unfortunately, some C programmers are unaware of these constructs, or just don't care about the matter as long as everything runs as expected on their own target platform.
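One concrete case: in C, shifting a 32-bit `int` by 32 or more bits, or overflowing a signed `int`, is undefined, whereas Java defines both (the shift distance of an `int` is masked to its low 5 bits, and arithmetic wraps around). A line-by-line translation therefore silently picks one behavior for code whose C meaning was never pinned down. The sketch below, with hypothetical method names, shows Java's defined results and one possible policy of failing loudly instead.

```java
final class UndefinedBehaviorSketch {
    // In C, `x << n` with n < 0 or n >= the width of int is undefined behavior.
    // Java defines it: only the low 5 bits of the shift distance are used for int.
    static int javaShift(int x, int n) {
        return x << n;
    }

    // One possible policy for a translation: reject shift distances that are undefined in C.
    // Note: this checks only the distance; shifting into the sign bit of a signed int
    // is a separate C hazard that is not handled here.
    static int checkedShift(int x, int n) {
        if (n < 0 || n >= Integer.SIZE) {
            throw new ArithmeticException("shift distance " + n + " is undefined in C");
        }
        return x << n;
    }

    // Signed overflow is also undefined in C; Java wraps around silently.
    // Math.addExact throws instead, surfacing places where the C code relied on
    // whatever its original platform happened to do.
    static int checkedAdd(int a, int b) {
        return Math.addExact(a, b);
    }

    public static void main(String[] args) {
        System.out.println(javaShift(1, 32));      // 1, because 32 & 31 == 0
        System.out.println(Integer.MAX_VALUE + 1); // -2147483648
        try {
            checkedShift(1, 32);
        } catch (ArithmeticException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```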
I think that translating a codebase into another language is fairly rare, and is usually done only because nobody is using the source language any more, which means that every one of these projects is a one-off. In particular, if you had a working application in C, you would be crazy to translate it into Java: what possible gain could justify that work and risk?
I was involved in a project like this in the run-up to Y2K: a company had a home-grown language that they were tired of supporting, so they had us translate it into standard COBOL. This turned out to be way harder than anyone expected, by the way. It is hard for me to imagine a situation where enough people are still using the old language, yet want to move to a new one, that it would pay to develop an automated translator.
The issue of proving that two representations are equivalent is related to the problem of proving that a program is correct. This is a matter for research right now, not anything that industrial code-producers are going to do.
I'd say the big advantage of reimplementing a codebase, whether or not a language change is involved, is getting rid of all the dead wood and changing the design in a way that better suits the current requirements of the software.
Unless the issue is strictly one of low-level functionality (e.g. language A works on platform X, language B does not, and we need to support platform X), just blindly converting from one language to another does not seem to me to be a big win for anyone. And even if there is some low-level requirement like that, I'm sure some human intervention will be required to get the software to work WELL on platform X.
The first thing you'd have to do is somehow derive a formal specification of the behavior of the code in language A from the program itself, which would be a hell of a job. Then you'd have to verify the translation into language B against that specification, which would be another horrendous job. Obviously this only makes sense for code big enough to make manual translation, or automatic translation plus manual verification, impractical. I'm a bit rusty on verification, but nothing I've seen makes me think this is any more practical today than it was 30 years ago.
As an example of the problem, when I was in grad school we had a Multics system. Pascal programs were run by translating them into PL/1 and then running that version. That was fine for student programs, but my dissertation involved modifying the Pascal compiler, and I wanted to get it to compile itself. It took a lot of manual intervention, because Wirth and Jensen used 60-bit sets, assuming a CDC machine, which broke badly on our 32-bit Honeywell machine. Thus, verification of even this fairly simple case would have needed to know about the underlying hardware.
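A rough Java analogue of that kind of failure, assuming a Pascal `set of 0..59` stored as a bit mask. The names are hypothetical, and this is an illustration rather than a reconstruction of the Multics setup: the mask fits in a 64-bit `long`, but the same code retargeted at a 32-bit `int` neither crashes nor warns, it just computes the wrong set.

```java
final class PascalSetSketch {
    // A Pascal `set of 0..59` represented as a bit mask in a 64-bit long:
    // bit i is set when element i is a member. All 60 elements fit.
    static long include(long set, int element) {
        if (element < 0 || element > 59) {
            throw new IllegalArgumentException("element out of range 0..59: " + element);
        }
        return set | (1L << element);
    }

    static boolean contains(long set, int element) {
        return element >= 0 && element <= 59 && (set & (1L << element)) != 0;
    }

    // The same logic written against a 32-bit int, as a translation targeting a narrower
    // word might do. Java masks the int shift distance to 5 bits, so element 40 lands on
    // bit 8: no error is raised, the set is just silently wrong.
    static int includeNarrow(int set, int element) {
        return set | (1 << element);
    }

    static boolean containsNarrow(int set, int element) {
        return (set & (1 << element)) != 0;
    }

    public static void main(String[] args) {
        long wide = include(0L, 40);
        System.out.println(contains(wide, 40)); // true
        System.out.println(contains(wide, 8));  // false

        int narrow = includeNarrow(0, 40);
        System.out.println(containsNarrow(narrow, 40)); // true
        System.out.println(containsNarrow(narrow, 8));  // true, although 8 was never added
    }
}
```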
I've written a variety of high-level language translators. There are two major problems to overcome in every case. First, the converted code must be completely compatible with the original, across platforms; any small functional change can set off a domino effect of problems. Second, the converted code must be reasonably usable for future maintenance and development. Without that second property, a cross-compiler that simply generates object code for a different environment is a better solution. I've seen more and more of these tools, but I don't hear about many success stories. It almost always takes a lot of human intervention to identify compatibility issues and to apply techniques unique to the application that keep the converted code maintainable. Unless the code base is very large, a better approach is to create a tool that uses the original code as a template to generate new code in the new language or environment. In any such project, there must be a serious testing effort to ensure that the converted code operates properly. Even in the old days, when these processes were very slow and large conversions stretched over months or years, I would answer clients' questions about the length of the process by pointing out that the conversion would be done faster than they could verify the results.
It is, however, an excellent opportunity to evaluate and document an existing system. And even though it is a difficult process, for a major platform transition, across hardware and operating systems as well as languages, it is a much faster and more reliable means of migration than a rewrite, and many rewrites are never completed.