Can you work on getting a Java project to build without accessing source code?

Suppose I need some help getting a big, complex Java project with a bunch of outside dependencies to build, so I hire somebody to do it through TeamViewer. But I don’t want that somebody to be able to either mess up my machine or examine the project’s source code.

Obviously, messing up the machine can be prevented by using an OS running in a virtual machine.

Now back to source code protection. I could prohibit files from being uploaded onto the net. I could sit there and watch it all happen, to discourage any malicious activities as well as just the plain reading of source code. Or I could use spyware type apps to monitor what is happening in my absence. But that all sort of sucks.

So I wonder - could the “get a Java project to build” operation be completely decoupled from examination of the source code? Let’s say we had an app that showed the user the file structure of the project as well as the lines with reported compile errors, if any. The user could add and remove jar files, edit the Ant script, set environment variables and do other stuff of this sort, but viewing source code would either not be allowed at all or be very limited.

Would that work? Or maybe, “would that work in a pretty big percent of cases”?

Incidentally, the question focuses on Java because this seems to be the most popular technology where builds are thought to be complex. E.g. AFAIK builds in .NET are not considered particularly hard.

You have two different classes of problem here.

One class of problem is that module (file) X won’t compile because it has a syntax error. Nobody’s gonna fix that without seeing the code in question. So we can dismiss out of hand the idea of solving that category of build problems using untrusted contractors.

The other class of problem is that an arbitrarily large collection of modules / files won’t compile as a group. Causes might be that a dependency is missing, or the build script doesn’t match the actual dependency graph, or there’s a typo in a file name, or you’ve got a build dependency loop, or …

(I don’t know diddly about Java, so this next cause might not be possible.) Perhaps modules A & B both depend on module X, but they depend on different versions of X. And a completed build can only have one version of any given module.
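
For what it’s worth, this can happen in Java: on a plain classpath only one copy of a given class normally gets loaded, so two jars that want different versions of the same library are a classic source of trouble. Here is a minimal sketch of how a tool might flag it, assuming each module’s required libraries and versions are already known. The module names, library names, and versions are made up for illustration.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: all module/library names and versions are invented.
class VersionConflicts {

    // Input: module name -> (library name -> required version).
    // Output: every library requested in more than one version,
    // mapped to the sorted set of versions requested.
    static Map<String, Set<String>> findConflicts(Map<String, Map<String, String>> requirements) {
        Map<String, Set<String>> versionsByLib = new TreeMap<>();
        for (Map<String, String> libs : requirements.values()) {
            for (Map.Entry<String, String> e : libs.entrySet()) {
                versionsByLib.computeIfAbsent(e.getKey(), k -> new TreeSet<>()).add(e.getValue());
            }
        }
        // Keep only the libraries with two or more distinct versions.
        versionsByLib.values().removeIf(versions -> versions.size() < 2);
        return versionsByLib;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> reqs = Map.of(
            "A", Map.of("X", "1.0"),
            "B", Map.of("X", "2.0", "Y", "3.1"));
        System.out.println(findConflicts(reqs)); // prints {X=[1.0, 2.0]}
    }
}
```

Note that nothing here needs the source code itself, only the declared requirements, which is why this kind of failure is a plausible candidate for outside help.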

Etc. I can imagine another dozen flavors of global build failures.

This second class of problem (global build failures) could *potentially* be fixed without reference to the source code. At least for some subset of failure modes.

You’d need a tool which could map the dependency tree directly from the source code and then deliver the map, the build script, and any error messages to your untrusted contractor. He/she could then resolve the inconsistencies in the script & resubmit the changed build script to you and your tool. Iterate until success or failure to make progress.
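
To make the “map the dependency tree” idea concrete, here is a rough sketch that pulls only the `import` lines out of Java source text, so that the resulting map (and not the code) is what gets handed over. The class name `ImportMapper`, the file name, and the sample source are all hypothetical. A real tool would need a proper parser: this regex misses fully qualified names used inline and same-package references, and it would also pick up an import that sits inside a comment.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: a regex-based import extractor, not a real parser.
class ImportMapper {

    // Matches "import a.b.C;", "import static a.b.C.m;" and "import a.b.*;"
    // when they appear at the start of a line.
    private static final Pattern IMPORT = Pattern.compile(
        "^\\s*import\\s+(?:static\\s+)?([\\w.]+(?:\\.\\*)?)\\s*;", Pattern.MULTILINE);

    // Input: file name -> full source text.
    // Output: file name -> sorted set of imported names.
    static Map<String, Set<String>> mapImports(Map<String, String> sourcesByFile) {
        Map<String, Set<String>> result = new TreeMap<>();
        for (Map.Entry<String, String> e : sourcesByFile.entrySet()) {
            Set<String> imports = new TreeSet<>();
            Matcher m = IMPORT.matcher(e.getValue());
            while (m.find()) {
                imports.add(m.group(1));
            }
            result.put(e.getKey(), imports);
        }
        return result;
    }

    public static void main(String[] args) {
        String source = "package app;\n"
            + "import java.util.List;\n"
            + "import org.example.x.Widget;\n"
            + "class A { List<Widget> ws; }\n";
        // prints {A.java=[java.util.List, org.example.x.Widget]}
        System.out.println(mapImports(Map.of("A.java", source)));
    }
}
```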

But even then there are broad areas where this approach won’t work. For example, suppose module A’s source code references module XYZ. But that’s really a typo. The reference ought to be to module XYX. All your untrusted contractor can tell you is “We have a dangling reference here”; he can’t tell you what it ought to be. *If* there’s also a module XYX in the build script that appears to be unreferenced, he might put 2 & 2 together, but that’d be a tentative conclusion. And if XYX is referenced from more than one place (extremely likely), then there’s no hope; he’s got no basis to decide whether the cause is a missing (or misnamed) source file versus it being a bad outgoing ref. And if it is a bad outgoing ref, he’s got no basis to decide which one it ought to be instead.
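
The XYZ / XYX situation can be sketched in a few lines, assuming the contractor has a reference map and the list of modules that actually exist. The module names come from the example above plus an invented module B; the class name `DanglingRefs` is hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of what can be detected from a reference map alone.
class DanglingRefs {

    // Names that something refers to but that do not exist in the build.
    static Set<String> dangling(Map<String, Set<String>> refs, Set<String> known) {
        Set<String> result = new TreeSet<>();
        for (Set<String> targets : refs.values()) {
            result.addAll(targets);
        }
        result.removeAll(known);
        return result;
    }

    // Modules that exist in the build but that nothing refers to.
    static Set<String> unreferenced(Map<String, Set<String>> refs, Set<String> known) {
        Set<String> result = new TreeSet<>(known);
        for (Set<String> targets : refs.values()) {
            result.removeAll(targets);
        }
        return result;
    }

    public static void main(String[] args) {
        // A refers to XYZ (the typo) and to B; B refers to nothing.
        Map<String, Set<String>> refs = Map.of(
            "A", Set.of("XYZ", "B"),
            "B", Set.<String>of());
        Set<String> known = Set.of("A", "B", "XYX");

        System.out.println(dangling(refs, known));     // prints [XYZ]
        System.out.println(unreferenced(refs, known)); // prints [A, XYX]
    }
}
```

The output shows why the conclusion stays tentative: the entry point A turns up as “unreferenced” right next to XYX, and nothing in the data says that XYZ was meant to be XYX.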

Finally, if you were going to actually try this approach and you were serious about the secrecy of your code, you’d need to obfuscate all the names before you gave any info to your untrusted contractor. In fact, you’d probably want to obfuscate the namespace relationships as well.
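
The core of such a name obfuscator is just a two-way lookup table, sketched below with a hypothetical `NameObfuscator` class and made-up package names. Flat aliases like `m1`, `m2` also hide the package hierarchy. The hard part, which this sketch does not attempt, is applying the mapping consistently to every build script, error message, and file path in both directions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a reversible real-name <-> alias table.
class NameObfuscator {
    private final Map<String, String> aliasByName = new HashMap<>();
    private final Map<String, String> nameByAlias = new HashMap<>();

    // Returns a stable alias for the given real name, creating one on first use.
    String obfuscate(String realName) {
        String alias = aliasByName.get(realName);
        if (alias == null) {
            alias = "m" + (aliasByName.size() + 1);
            aliasByName.put(realName, alias);
            nameByAlias.put(alias, realName);
        }
        return alias;
    }

    // Maps an alias back to the real name; returns null for an unknown alias.
    String reveal(String alias) {
        return nameByAlias.get(alias);
    }

    public static void main(String[] args) {
        NameObfuscator names = new NameObfuscator();
        System.out.println(names.obfuscate("com.acme.billing.Invoice")); // prints m1
        System.out.println(names.obfuscate("com.acme.Ledger"));          // prints m2
        System.out.println(names.obfuscate("com.acme.billing.Invoice")); // prints m1
        System.out.println(names.reveal("m2"));                          // prints com.acme.Ledger
    }
}
```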

Long before you got your bi-directional obfuscator working perfectly, you could have solved your original build script problem.

Late add …
90% of the value-add your contractor could give you was already done for him when you built the tool which reads the source code & generates the true dependency map. Once you have that tool running, you don’t need the contractor.

In practice this sounds unworkable, and will almost certainly be a frustrating experience for all concerned.

In order for this to be remotely feasible, you’d need to be able to guarantee up front that there are no syntax errors or incorrect references to libraries, because those are going to require viewing the source code to resolve. It is hard for me to imagine a scenario where you can confidently make this claim and the build isn’t working already!

Either way, you’d do so much work to set up this environment that you may as well spend that time fixing the build yourself.

Have the person you hire to do the build script sign a non-disclosure agreement and let them have the source code.

Conceivably. You could set up some sort of batch system where the build-person could submit a build script (batch file, shell script, etc) and then view the output to see if it succeeded. You’d still have to somehow filter out commands that would print the source code as “error” messages (e.g. “cat * 1>&2”). And it may be difficult to find someone willing to work with those limitations for a reasonable rate - as a consultant, I’d certainly charge extra.
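
As an illustration of the filtering problem, here is a sketch that passes through only lines shaped like javac’s diagnostic headers (`File.java:LINE: error: message`) and drops everything else, including the echoed source line and caret that javac prints underneath. The class name `BuildOutputFilter` and the sample output are hypothetical. It is easy to defeat: a build script could simply print source lines dressed up in that same format, which is part of why this setup is so hard to make airtight.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical sketch: keep only javac-style diagnostic header lines.
class BuildOutputFilter {

    // e.g. "src/A.java:3: error: cannot find symbol"
    private static final Pattern DIAGNOSTIC =
        Pattern.compile("^\\S+\\.java:\\d+: (error|warning): .*");

    static List<String> filter(List<String> rawLines) {
        return rawLines.stream()
            .filter(line -> DIAGNOSTIC.matcher(line).matches())
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> raw = List.of(
            "src/A.java:3: error: cannot find symbol",
            "        Widget w = new Widget();",   // echoed source line: dropped
            "        ^",
            "  symbol:   class Widget",
            "1 error");
        // prints [src/A.java:3: error: cannot find symbol]
        System.out.println(filter(raw));
    }
}
```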

What does build complexity have to do with trivial syntax errors? And why would I care if somebody sees the names of the files in a 100K-line project?

I think the real problem is usually building a project with a large number of “freely available” *.jar files (1.jar, 2.jar, … 100.jar) obtained from 3rd party sources. That’s when the trouble starts.

So yes, an alternative might be to get a compiled project with all that hard-to-build stuff but without my code, and then add my code myself. Except a problem may arise where references to the added SDKs are recognized in some of my files and not in others. Or maybe the sandwich will find another clever way to fall buttered side down.

So you think that all that stands between the current jar hell and the hypothetical automagic insta-build jar heaven is the lack of a dependency extractor that could be built by an undergrad in intro to compilers? Or maybe some slightly more elaborate tool based on that, which could be built by two grad students for their thesis?

Maybe you are absolutely right, but then we gotta wonder, where have these two grad students been for the last 15 years?

Anybody can decompile your classes and get source code through the back door, unless you obfuscate it.

If you trust someone enough to have them write code for this big important project, why do you not trust them with your source code?

bump. Incidentally, I think that the question I am asking in post #8 is actually of greater general interest than the OP question.

Maybe one way to look at the problem of the build, at least to the extent that I was exposed to it in Eclipse, is that this is like working with a black box that spits out complaints but does not give me a graphical explanation of what is actually going on. E.g. if I dumped a bunch of my code files and 3rd party jars into one place, presumably all I want is for the dependencies to be visible where I am using them. Is that accurate? Or is there a lot more to Java build than just getting a connected graph of dependencies?

Could the whole problem be solved by making a visual interface showing the various sections of the graph and then permitting the user to explicitly say, ok, these classes should be visible in this file, so that the relevant Ant script would be generated automatically from such input?

I’m having a bit of trouble understanding the question.

If you have hundreds of dependencies, then you explicitly account for them in your build process. If you need a very specific version of Oracle’s JDBC driver jar, then you put it under source control, so the build process has access to the exact same files every time.

Make everything so it can be built outside of any IDE, using Ant.

Then you can easily drop it into a slick automated build environment such as Jenkins (formerly known as Hudson).

This is not simple, but it is not rocket science for a seasoned Java developer, so I find it hard to see hiring on somebody to design the build without accessing source code.

If you are OK with a much more structured approach, then use Maven. It places rigid constraints on how you build your project, but at least it is pretty darned good at finding hundreds of needed jars and bringing them down from your own repository or a remote Maven repository, in order to build the project the exact same way every time.

And dependency checking stuff doesn’t necessarily give the whole picture. As soon as Java Reflection and dynamic classloading are involved, then the hard dependencies in the byte code on other classes do not show the full list of dependencies.
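
A small illustration of the reflection point: in the sketch below the class to load is only a string at compile time, so neither the imports nor the compiled class references reveal the dependency. The class name `ReflectiveLoad` is made up, and `java.util.ArrayList` stands in for whatever class a real project would name in a config file.

```java
// Hypothetical sketch: a dependency that no static analysis of references will see.
class ReflectiveLoad {

    // The class name is just a string, e.g. read from a config file at run time.
    static Object instantiate(String className) throws ReflectiveOperationException {
        Class<?> type = Class.forName(className);
        return type.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        Object list = instantiate("java.util.ArrayList");
        System.out.println(list.getClass().getName()); // prints java.util.ArrayList
    }
}
```

If the named class is not on the classpath, the build still succeeds and the failure only shows up at run time as a `ClassNotFoundException`.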

minor7flat5, thanks for bringing up Maven.

WRT the “seasoned Java developer”, you know, there is this whole trend in my thinking that’s called “how do we make things so that people who are not seasoned could still do them”. Obviously that has its limits, and for any level of technology the seasoned will beat the non-seasoned, but still progress is supposed to march on and make things easier to learn and easier to do. After all, it’s hard to become seasoned if the technology is set up so that the 1st step is easy enough with the tutorial whereas the 10th step requires 10 years of experience.