Need a computationally complex algorithm

So I’m working on a project that will end up being computationally expensive. I’ve been assigned to set up Amazon EC2 to do some number-crunching, but we don’t have our algorithm yet. I’m also looking into GPU-based processing as well.

I need an algorithm that is easily adjustable. It would be even better if it could terminate itself after a certain amount of time.

For example: something that might take about 5 minutes on a normal computer. I just don’t want something that will suck up lots of compute time on EC2 and cost me a lot of money, is all.

Anyone got any suggestions?

First of all, EC2 is approximately the coolest thing ever.

That said, I’m a bit mystified about your need for the algorithm to be computationally complex; if you just need something in place for testing your EC2 instance, why not just have some process that counts to a few billion and then terminates?
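Something like this would do it, I think (a throwaway sketch; the loop bound is arbitrary, and the volatile is only there so the compiler can’t optimize the whole loop away):

// Trivial placeholder workload: count to a few billion, then exit.
#include <cstdint>
int main() {
    volatile std::uint64_t sink = 0;
    for (std::uint64_t i = 0; i < 5000000000ULL; ++i) sink = i;   // a few billion stores
    return 0;
}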

If you want something more intensive, one of my favorite benchmark tasks is using GraphViz to plot a directed graph of N randomly generated nodes where N == BIG_NUMBER.
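If you wanted to try that, a rough sketch of the generator might look like this (the node count is a placeholder; GraphViz’s layout step is where the time actually goes):

// Rough sketch: print a random directed graph in DOT format on stdout.
// Usage (hypothetical): ./gen > big.dot && dot -Tsvg big.dot -o big.svg
#include <cstdio>
#include <cstdlib>
int main() {
    const int N = 20000;                        // BIG_NUMBER -- tune to taste
    std::printf("digraph G {\n");
    for (int i = 0; i < N; ++i)                 // one random outgoing edge per node
        std::printf("  n%d -> n%d;\n", i, std::rand() % N);
    std::printf("}\n");
    return 0;
}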



// Trivial CPU-burner: keep taking sin() forever until the process is killed.
#include <cmath>
int main() {
    volatile double x = 0.5;        // volatile so the compiler can't optimize the loop away
    while (true) { x = std::sin(x); }
}


You could look up some NP-hard problems, but maybe the easiest thing to do is to write the least efficient sort you can think of and apply it to a large list of randomly generated numbers.

How about some factorials? Something like (10^20)! should keep the wheels spinning.

tim

It sounds like your best bet would be an approximation or combinatorial algorithm, because these are well studied and there are known, fairly simple implementations for them that run in non-polynomial time. Hell, a simple knapsack or TSP implementation could be quite short (maybe 20-30 lines of code) but could run for minutes, hours, or days depending on how large the problem is.

Further, most of these algorithms are specifically designed to halt after a certain number of iterations or a time limit, so it would be fairly simple to meet your “5 mins on a normal computer” requirement.

I can think of some pretty inefficient sort algorithms.

Backtracking on the 0-1 knapsack problem is relatively painless to implement, and can easily eat up a lot of your CPU’s time. It also parallelizes well, if that’s a concern.
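For the curious, a bare-bones backtracking sketch might look something like this (the item count, weights, and values are all made up; the exponential branching is the whole point):

// Bare-bones 0-1 knapsack by backtracking: try every include/exclude choice.
// Runtime is roughly O(2^n), so n in the mid-30s already burns serious CPU time.
#include <cstdio>
#include <cstdlib>
#include <vector>

int n;                                   // number of items
int capacity;                            // knapsack capacity
std::vector<int> weight, value;

// Best total value achievable with items i..n-1 and 'remaining' capacity left.
int best(int i, int remaining) {
    if (i == n) return 0;
    int skip = best(i + 1, remaining);                        // branch 1: exclude item i
    int take = 0;
    if (weight[i] <= remaining)                               // branch 2: include item i
        take = value[i] + best(i + 1, remaining - weight[i]);
    return take > skip ? take : skip;
}

int main() {
    n = 34;                              // made-up problem size -- raise it to run longer
    capacity = 10 * n;
    for (int i = 0; i < n; ++i) {
        weight.push_back(1 + std::rand() % 20);
        value.push_back(1 + std::rand() % 100);
    }
    std::printf("best value = %d\n", best(0, capacity));
    return 0;
}

To parallelize it, you can hand the subtrees below the first few include/exclude decisions to different threads or machines.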

woops, hit post accidentally, will post update in a bit

So guys, thought I’d check back in with you folks to let you know how it’s going. In case you’re interested, I just decided to go with a bubble sort. I wrote a quick C++ program that creates an array of size 2^x, fills it with random numbers, and then bubble sorts it, keeping track of how long the sort takes. For the record I have an iMac: an Intel Core Duo running at 1.83 GHz with 1 GB of RAM.
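For reference, the gist of the program was something like this (a simplified sketch, not my exact code):

// Simplified sketch: fill an array of 2^x random ints, bubble sort it, report the time.
#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <vector>

int main(int argc, char** argv) {
    int x = (argc > 1) ? std::atoi(argv[1]) : 14;        // exponent: 2^x elements
    std::size_t n = std::size_t(1) << x;
    std::vector<int> a(n);
    for (std::size_t i = 0; i < n; ++i) a[i] = std::rand();

    std::clock_t start = std::clock();
    for (std::size_t i = 0; i < n; ++i)                  // plain O(n^2) bubble sort
        for (std::size_t j = 0; j + 1 < n - i; ++j)
            if (a[j] > a[j + 1]) { int t = a[j]; a[j] = a[j + 1]; a[j + 1] = t; }
    double secs = double(std::clock() - start) / CLOCKS_PER_SEC;

    std::printf("2^%d elements: %.2f s\n", x, secs);
    return 0;
}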

The results aren’t very impressive, I must say. For the EC2 side of the experiment I chose the fastest instance type available, which turns out to be one with 8 Intel Xeons at 2.33 GHz. I’m posting my results below:



Bubble sort time (in seconds) by array size
Size      My iMac     EC2 (8x Xeon)
-----------------------------------
2^14         2.81          2.1
2^15        11.05          8.19
2^16        43.82         32.3
2^17       174.41        128.76


So did I choose a bad test?
Is this just the nature of complexity?
Any ideas?

A few bonus-point questions:
Why does ssh treat my backspace key like a delete key?
Why doesn’t CLOCKS_PER_SEC work right with g++? When I start a C++ project in Xcode it works, but when I compile with g++ in the shell it isn’t recognized. (gcc claims it recognizes .cpp files as C++, but it wasn’t compiling them correctly, so I use g++ instead.) Any ideas?

Well, your implementation is probably not taking advantage of the 8 processors (right?), so do these numbers seem reasonable for a single 2.33 GHz Xeon? Seems like it’s at least in the ballpark…

What exactly makes you think this was a failure?

How do I optimize it for multiprocessor systems?

Well, quicksort might be a better choice since it uses a divide-and-conquer approach: basically, you’d farm out each split of the list to a different processor.

ETA: I am a first year comp sci student and know absolutely nothing about multiproc programming.

You need to parallelise the problem. Ideally, each part should be entirely separate.

Your program needs to start processes or threads. Let’s say you’re generating a Mandelbrot set. You might start a thread for each pixel, or each line. Let the OS handle where to place each thread if it can do so. Be warned that multi-threaded programming can be fraught with danger - deadlocks and the like. I strongly recommend you read a book or three about multithreading and parallel programming.
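Here’s a bare-bones sketch of the “one thread per chunk of lines” idea using POSIX threads (the sizes and the per-pixel work are placeholders - crank them up for a real burn; compile with g++ -O2 file.cpp -lpthread):

// Sketch: split a row-by-row computation (think Mandelbrot lines) across worker threads.
#include <pthread.h>
#include <cstdio>
#include <cmath>

const int ROWS = 4096, COLS = 65536, THREADS = 8;
static double result[ROWS];              // one number per row, just so there's output

struct Range { int first, last; };       // half-open range of rows handled by one thread

void* worker(void* arg) {
    Range* r = static_cast<Range*>(arg);
    for (int row = r->first; row < r->last; ++row) {
        double acc = 0.0;
        for (int col = 0; col < COLS; ++col)             // stand-in for the per-pixel work
            acc += std::sin(row * 0.001 + col * 0.001);
        result[row] = acc;               // rows are independent, so no locking is needed
    }
    return 0;
}

int main() {
    pthread_t tid[THREADS];
    Range ranges[THREADS];
    const int chunk = ROWS / THREADS;
    for (int t = 0; t < THREADS; ++t) {
        ranges[t].first = t * chunk;
        ranges[t].last  = (t == THREADS - 1) ? ROWS : (t + 1) * chunk;
        pthread_create(&tid[t], 0, worker, &ranges[t]);
    }
    for (int t = 0; t < THREADS; ++t)
        pthread_join(tid[t], 0);
    std::printf("done; result[0] = %f\n", result[0]);
    return 0;
}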

AFAIK, ssh doesn’t do anything to or with the delete key. I’m pretty sure the “problem” is on the other system; I think you probably need to set up a special keystroke translation. For instance, there’s a note on OS X Hints about this if you’re going from OSX to a Linux box (although there must be easier solutions).

You’ll have to provide more information, I think. Are you including time.h? What platform are you using? It may make a difference: for instance, on Linux, the string.h header includes the function mempcpy; on OSX, that function doesn’t exist. A similar thing happens with LAPACK functions, although with the platforms switched. (And now I need to modify my code for cross-platform compatibility…grumble, grumble.)

Ah. I had no idea what EC2 was until I looked it up after my first post. So, the issue, I think, is as Quartz says – to take advantage of what EC2 offers, you need a parallelizable problem. Furthermore, I get the impression that EC2 will simply run whatever code you upload, simply supplying the hardware resources. Is that right? Or do they supply some type of infrastructure that will automatically distribute and manage your code?

I’m thinking the easiest program would be a process that spawns multiple sub-processes that each do the same stupid yet computationally expensive calculation (say, a Fibonacci or factorial). Perhaps just a simple shell script would do. Alternately, you might write a multi-threaded program that does the same thing. Java is awfully nice for this, especially with the recently added (as of 1.5) java.util.concurrent package.

Oddly enough, this is akin to what I’m working on right now: distributing the analysis of hyper-spectral images pixel by pixel using LAPACK, in case you really need an example of a computationally expensive task that is easily distributed.

Ultimately, it’s not the computational complexity you’re concerned with here; it’s the computational power and parallelism. If I understand the situation correctly, that is.

As noted, you need to multithread - split your task into independent units that can run on the multiple processing units. This Microsoft Sample Code implements multiple sort algorithms using multithreading; it may or may not help, since it’s pretty old and it’s Windows-specific.

Si

Aw crap, my apologies for the triple post. Two more items I wanted to mention, though.

(1) Don’t believe “them” when “they” say gcc will recognize C++ code by the file name’s extension. Use g++ specifically. (gcc will compile a .cpp file as C++, but unlike g++ it doesn’t link in the C++ standard library, so you tend to get link errors.)

(2) Don’t use Fibonacci or factorial for testing, as I’d think they’d be more of a memory load than anything else. Besides which, Fibonacci does addition and you should use multiplication or division (or maybe even trig functions, as Cosmic Relief mentions). Although Fibonacci might be nice to test the limit of how many individual processes they can handle.

On preview – yay, no triple post faux pas for me! :)

Well, I tried both ctime and time.h. I realize it should be the same, but I tried it anyway. Anyone know why my gcc isn’t working? I tried on both OSX and Linux. It’s weird, though, because I was using Xcode, right? And I did it two separate ways. The first was starting a project in Xcode; then it worked. The other was creating a blank text file, editing it in Xcode, and compiling it in the shell with g++. I feel like it’s a compiler settings issue. I know for a fact that Xcode uses gcc for compiling C++.

As far as distributing anything goes, I don’t think they offer any kind of advantage over physical servers. By that I mean it appears that you need to write some kind of glue code that will individually work with each server and manage everything. The advantage is that you can decrease or increase the number of servers depending on your particular load. But it’s not some kind of magic button; it still has to boot up the OS and do all of the normal stuff.

I did pass by some help documents that seemed to be aimed at helping people with this, though.

But I don’t know how much we’ll really need to worry about all of this anyway. It could be just a one-time processing of our data. In which case, we could just run multiple instances and have each handle a specific part of the data and then we’d be done. But we’re only using it for private reasons, so it’s not like they’ll need to be open to the web or serve any applications or anything.

But it’s pretty cool to be able to log into a behemoth like that for 80 cents an hour.

How easy it is to optimize for multiple processors depends on what the problem is. For a big ugly computation I did for my research recently, I parallelized it just by running multiple instances of the code, each working on a different chunk of the data. I could do this because the value at each data point, for my problem, was completely independent of the value at any other data point. Had I a thousand processors to work with instead of just the two on my desktop, I could have easily and efficiently used all of them, and finished my calculation five hundred times faster.

However, in some calculations, the results of one part will depend on other parts, sometimes in very complicated ways. Sometimes, there will be a clever way to re-arrange such a problem so that it parallelizes well; sometimes there won’t be. And I don’t think it’s even possible to tell in advance which problems fall into which category.