Help me spend $250,000, For Science!

The university and the institute I (indirectly) work for are very impressed with my research. The institute wants me to start solving some practical problems they have, and the university is excited by the possibility, albeit very remote in my opinion, that this work is a step towards general artificial intelligence. So they’ve given me $250,000 to spend on equipment, which is great. Unfortunately, I’m not really sure what to spend it on. The two of them seem to think that with a better computer, I could do more work. I’m not actually sure where they got this idea, because it isn’t something I’ve ever mentioned, but here we are.

I was thinking of getting one of the personal AI supercomputers from NVIDIA (NVIDIA DGX Platform), but I thought, let’s see what the Teeming Millions think. Maybe somebody knows about something that might be very useful. For those unaware, my research is on using artificial intelligence to solve natural computing problems.

So, I have CDN$250,000 burning a hole in my pocket. I can only spend it on equipment, I cannot hire anybody nor can I use it to pay for conference travel and such (I have funding for that already anyway). Any thoughts?

Can you buy a Ferrari 488 and install Google home in the dashboard and call it KITT version 2.0? That’s what I would do. I have no useful advice but congrats on your grant.

Why can’t you rent processing power rather than buy it? I thought various companies offered hardware to rent when needed to perform large scale calculations.

Would you hire more people if you could? Can you divert some of that money into existing equipment expenses and free up funding for more people?

Also, your user name is way more sinister now.

I’d love to hear more details on your approach. The equipment you need depends on the approach you’re taking.

a. Are you using deep learning networks, made using some variant of existing algorithms? If so, you want some servers with multiple NVidia cards in them. It depends on how you are constructing these networks and how much data must pass between them (as an example, a classifier that breaks an image into labeled, classified boxes, like YOLO, takes a large input image but produces only a tiny amount of output data).

This determines what hardware is most cost effective. For example, if you have a lot of sparsely coupled networks, where the data traffic between them is low, you can use separate computers, each with several off-the-shelf NVidia graphics cards, and they will talk to each other through an off-the-shelf gigabit switch. 250k, even Canadian, would buy a ton of these - they’d only cost 2-5k each, depending on configuration (there’s a rough code sketch of this kind of setup after point c below).

b. If you’re not using existing algorithms, you may not be able to make the algorithms you use work on graphics cards. CUDA is very limited in what kinds of algorithms it can run efficiently, and if you are at early stages, you may not have the resources (massive amounts of labor) to program your prototype algorithms into CUDA anyway. This is what the Xeon Phi is for. You still have to be a skilled programmer who understands parallelism to use them, but you can get yourself some Xeon Phi based servers for between 5 and 20k Canadian, depending on how many accelerator cards are in them, how many CPU sockets and cores for the main CPU, etc.

c. Do you have to buy equipment before some deadline or can you bank this money for future expenses? Because the most cost effective way to go is to rent equipment through AWS or other cloud services until you know what you need.
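Going back to scenario (a) for a moment: this is a minimal sketch, assuming a recent TensorFlow/Keras stack, of what data-parallel training on one of those multi-GPU boxes might look like. The model, the data, and all the sizes are placeholders, not anything specific to your research.

```python
# Minimal sketch: data-parallel training across all GPUs in one box using
# TensorFlow's MirroredStrategy. Model, data, and sizes are placeholders.
import numpy as np
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()  # picks up every visible GPU
print("Replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(256, activation="relu", input_shape=(128,)),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Dummy data, just to show the training call; each batch gets split across GPUs.
x = np.random.rand(10_000, 128).astype("float32")
y = np.random.randint(0, 10, size=(10_000,))
model.fit(x, y, batch_size=256, epochs=2)
```

For the multi-box, gigabit-switch version, essentially the same code works with MultiWorkerMirroredStrategy swapped in and a bit of cluster configuration; the point is that the off-the-shelf tooling handles the parallelism for you.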

What I’m hearing is that you’re not an expert computer engineer. You need one, unfortunately, or you’re going nowhere. (And I’m not volunteering; my skills are in different areas and I haven’t finished my Masters.) I’m guessing you’re a mathematician? Unfortunately, it takes very specific skills to go from a paper description or a Python/Matlab prototype algorithm to something that runs at high speed, reliably, and uses multiple computers or thousands of separate compute units in a graphics card in parallel. And a heck of a lot of work - a full custom solution involves many person-years of development.

If you’re doing something that involves just cascading neural networks together, where those networks use a standard implementation supported through tensorflow or theano or something, then you can do it yourself. In that case, I’d just rent a machine through AWS (about $1 an hour) until you know for certain that these off-the-shelf tools are going to work. Then I’d just read a guide like Slav Ivanov’s “The $1700 great Deep Learning box: Assembly, setup and benchmarks” and either build one or get a professionally made server with similar specs in it.
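Before you spend a dime of the 250k, a quick sanity check like this one, run on the ~$1 an hour rental (or on the $1700 box once it’s built), tells you whether the off-the-shelf stack actually sees and uses the GPU. Nothing here is specific to your research code; it just assumes a recent TensorFlow install.

```python
# Quick sanity check for a rented or home-built GPU box: is a GPU visible,
# and does a large op actually run on it? Assumes a recent TensorFlow.
import time
import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")
print("Visible GPUs:", gpus)

device = "/GPU:0" if gpus else "/CPU:0"  # fall back to CPU so this always runs
with tf.device(device):
    a = tf.random.normal((4096, 4096))
    b = tf.random.normal((4096, 4096))
    start = time.time()
    result = tf.matmul(a, b).numpy()  # .numpy() forces the computation to finish
print(f"4096x4096 matmul on {device}: {time.time() - start:.3f} s")
```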
Some major figures in AI have concluded that the best way to build a real one is to build agents that interact with and manipulate the real world - our world - starting with rudimentary manipulations (picking up blocks like a small child does) and gradually increasing in sophistication. The various evolutionary algorithms you might use are going to be far more stable if the data they must solve comes from a real robot, where you cannot cheat to reach the goal, unlike a crude simulated environment where you maybe can.

You could easily spend all 250k on real robotics hardware; it wouldn’t even be hard. I’d look for something off the shelf, ready to go, and consider that you’ll need a bunch of ancillary equipment. A test cell to put it in. Multiple cameras and data-acquisition computers. Protective Plexiglas safety shields. Space. 250k is probably not even enough money, actually, at least not for a fancy setup.

As a side note, politics-wise and in terms of your attractiveness to grad students: if you can make a real robot do something useful with your fancy ideas, you’ll probably be able to squeeze more money out of the administration, and more grad students will want to help. Robots that actually do stuff are a lot more interesting than some numbers on a screen.

Why rent hardware when you can buy time on the Amazon EC2 system?
Edited to add: I think the rates are quite reasonable.

Does the university or institute have a data center already? If so, talk with the admins to get their advice as well. It may make configuration and maintenance much easier if you get hardware that they are already familiar with.

That’s what I meant: rent the processing power on an as-needed basis. Due to economies of scale, and the fact that places that rent will constantly be updating their own hardware (the Nvidia processors could be out of date in a few years), renting seems the better option.

Does Nvidia offer any kind of option to buy time on their processors?

No, but Amazon does. Anywhere from the p2.xlarge, which has 61 gigs of RAM, four virtual CPUs, and a single GPU for about 90 cents an hour, to the p2.16xlarge, which is about 16 bucks an hour but has a crazy 16 GPUs.

It looks like they come fully configured and tested, ready to go with all the drivers installed (and, most importantly, verified to work on that hardware).

Our poster here would probably be best served by building himself a single machine equivalent to the Amazon server: get a server motherboard from Newegg, a CPU, a single GPU, some RAM, and install all the same software on it. Use this machine day to day to develop and test his code (so he isn’t paying 90 cents an hour just to sit at a terminal). Then, when he needs more power and has the code working without errors locally, rent the p2.16xlarge, run a shell script real quick to sync the latest version of his software over, and start running on all 16 GPUs connected by custom switches, oh yeah…
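The “shell script real quick” step might look something like this sketch; the hostname, key path, directories, and the train.py entry point are all made up for illustration, not his actual setup.

```python
# Rough sketch of the sync-and-run step: push the local working copy to a
# rented GPU instance and kick off a run there. Hostname, key, paths, and
# the train.py entry point are hypothetical placeholders.
import os
import subprocess

REMOTE = "ubuntu@ec2-xx-xx-xx-xx.compute-1.amazonaws.com"  # placeholder address
KEY = os.path.expanduser("~/.ssh/my-aws-key.pem")          # placeholder key file
LOCAL_SRC = "./my_project/"
REMOTE_DIR = "my_project/"

# Copy only what changed since the last sync.
subprocess.run(
    ["rsync", "-avz", "-e", f"ssh -i {KEY}", LOCAL_SRC, f"{REMOTE}:{REMOTE_DIR}"],
    check=True,
)

# Start the job; nohup plus the redirect keeps it running after we disconnect.
subprocess.run(
    ["ssh", "-i", KEY, REMOTE,
     f"cd {REMOTE_DIR} && nohup python train.py > run.log 2>&1 &"],
    check=True,
)
print("Job launched; tail run.log on the remote machine to watch progress.")
```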

And yeah, you’re totally right, this is going to be a lot more efficient to rent. Whatever ideas he’s working on are things that probably need a lot of code development time and a lot of pain, and during those periods, it’s dumb to have some monster collection of GPUs just depreciating. Amazon can be renting that same hardware to someone else when he’s not using it, and it’s just more efficient.

10,000 Raspberry Pis? (Powering and connecting them is left as an exercise for the reader.) :wink:

Brian

You can’t even spend it on contracts, e.g. for software development & support? How about just “buying” custom designed software that can help you?

So, you’re not sure that you can use $250k worth of equipment. Is there anyone else in your research group who can? You could maybe buy the fancy-schmancy setup that they need, find some way to justify it for your work, use it occasionally yourself, and mostly just let them run their projects on it.

This is by far the best suggestion. You win the thread! :slight_smile:

I literally don’t have hiring capability. I’m just a low level research drone from sector 7G.

<beep> Nothing to worry about human. <beep>

Hmmm… I should ask that. That’s a good idea. I wouldn’t mind offloading some of the coding work.

I cannot really talk about it in great detail until after the papers are published.

All my programs are CUDA capable (all customized code), but I don’t evaluate them with massive parallelism because it weakens the results for publication.

The grant is for two years.

I am a computer scientist, with a strong focus on mathematics in my undergrad (two courses shy of a double major). As above, all my code supports single-threaded, multithreaded, and massively parallel execution modes.

Nothing I’m doing has much to do with robotics, except in some tangential distant future kind of way.

A few people mentioned renting time so I won’t quote each of you specifically.

See, this is where it gets kind of silly. I already have all the supercomputer time I could possibly need: I have an account on Compute Canada and can use it whenever I want, which isn’t very often, because for publication massive parallelism isn’t that impressive unless massive parallelism is the point. I didn’t ask for this grant. I was asked by the dept head to make a presentation because my work is doing impressive stuff, and so I did. Next thing I know, I have this money that I did not ask for and really cannot use in any meaningful way. I suspect the dept head asked for it on my behalf (more on this in a second), as my supervisor says he didn’t.

Ok, so I’m certainly not going to turn it down. It looks too good on a CV. But I cannot simply not use it (apologies for the double negative). And if I spend it, I kind of have to make use of the equipment, which I don’t really need. But it’s fine. If I got the DGX, say, or stacks of nVidia Titans, I would use it and get some results and make everybody happy. I would find some really hard problem that requires massive parallelism and solve it. I get publication(s), the money is spent and used well, everybody is happy.

So, I’m not faculty right now. But a faculty position is opening up and I think the dept head would like me to fill it. And that’s fine by me, I want a faculty position and I like this university. So me getting this grant is good for all concerned because I’ll have some equipment for when I can get my own minions… grad students.

This is exactly what I’m thinking. I really don’t need any equipment. But the DGX station or a stack of Titans can certainly help out the lab. And that’s where I’m leaning and I think that’s why the dept head got the money. And I don’t blame him really. He saw a chance to get a big influx of cash for the lab and he took it.

Ah, I get it, so it’s “Merry Christmas, here’s something for you that I’ll want to borrow”.

EDIT: And the faculty position opening up soon means that a bit of brown-nosing certainly couldn’t hurt.

Well, if I had a mountain of money to spend, I’d feel a duty to be efficient about it. This is why we keep saying you should rent Amazon time: you have to agree that if you build a 16-GPU monster (probably 30-50k, so you’d be able to buy five to eight of them), then even if you do your own massive parallelism, you’re going to be sitting at a terminal for hours with the big rigs idle while you fight a bug.

Another consideration is that in computing, each marginal dollar spent on a single node has diminishing returns. Past a certain price range, a processor that costs twice as much is only 20% faster, and if you double the price again after that, you’re buying overclocking equipment for another 3%.

So I’m just saying, you could buy a mountain more of slightly lower-end gear. Or, for some problem types, you could go for the extreme bleeding edge and make the most amazing single node you can get, with phase-change cooling.
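To put very rough numbers on the rent-vs-buy point, using only the ballpark figures already floating around this thread (and ignoring power, admin time, and the exchange rate):

```python
# Back-of-envelope rent-vs-buy comparison, using the rough numbers from this
# thread: a 16-GPU rig at ~$40k (midpoint of the 30-50k guess) vs. renting
# the 16-GPU instance at ~$16/hour. Ignores power, admin time, and currency.
BUY_COST = 40_000           # up-front cost of the big rig
RENT_RATE = 16.0            # dollars per hour for p2.16xlarge-class rental
GRANT_HOURS = 2 * 365 * 24  # the grant runs two years

break_even_hours = BUY_COST / RENT_RATE
print(f"Break-even: about {break_even_hours:,.0f} hours of full-size runs")
print(f"Utilization needed over the grant: {break_even_hours / GRANT_HOURS:.0%}")
# Whether buying wins comes down to whether the rig would really be busy that
# often, or mostly sitting idle while the code gets debugged.
```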

You could also look into more general IT equipment. If you collaborate with other institutions, upgrading a conference room and fitting it with a good video-conferencing system may be a worthwhile investment. And maybe an interactive digital whiteboard and/or a video wall.

Agree with scr4.

Or, let’s talk about AI as applied to a specific problem. Say, playing StarCraft 2 without cheating. An AI system able to do that needs more than just some GPUs. It would need a set of computers that take the video signal, captured from a memory buffer of the system playing the game, and classify it. You would need to iteratively train that classifier.

Which means you really want a rack of equipment. You want PCs running the game, separate ones running the classifier, and probably separate ones running the planner and simulator*. Probably 4 computers per instance of the game, and 2 of those computers have 1 or more high-end GPUs or Nvidia machine-learning accelerators. And then you want to train the agent in parallel so it has many thousands of hours of experience. So there’s your 250k right there: you could probably afford between 10 and 30 full setups, depending.
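Roughing that out, with the only firm input being the four-machines-per-setup split described above (the per-machine prices are my own guesses, purely for illustration):

```python
# Rough sizing of the rack described above. The 2 GPU boxes + 2 plain boxes
# per setup comes from the post; the per-machine prices are illustrative guesses.
BUDGET = 250_000
GPU_BOX = 5_000    # assumed: machine with one or more high-end GPUs
PLAIN_BOX = 1_500  # assumed: machine running the game or capturing data

per_setup = 2 * GPU_BOX + 2 * PLAIN_BOX
setups = BUDGET // per_setup
print(f"~${per_setup:,} per setup -> roughly {setups} setups,")
print("before racks, switches, cabling, cameras, and any safety gear.")
```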

There are a huge number of similar problems that work out like that.

As a side note, this kind of setup is not something you can do with your existing Compute Canada account. Heck, I think it would be a pain to set up via AWS. It’s an example of a problem you could contribute to where you actually do need that kind of money.

To get convergence, your agents would need to be playing the game for many thousands of hours. Or you could use GTA5 and work on self-driving. It need not even be a game; you could use many other simulation packages to simulate your problem space. For example, robotic motion or quadcopter flight: developing an AI able to do any of that stuff needs a simulator and enough iterations to converge.

*The simulator is the component that uses neural networks to predict future game states, which vary based on the actions your agent takes.
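To make the convergence point concrete, here is a toy version of the loop involved; the environment, the agent, and the “converged” test are all trivial stand-ins, not anyone’s real setup.

```python
# Toy illustration of why experience piles up: an agent interacting with a
# simulated environment, episode after episode, until a success criterion
# holds. Everything here is a stand-in for a real simulator and agent.
import random

class ToyEnvironment:
    """Trivial 'walk to +10' task standing in for a real game or simulator."""
    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        self.state += action                  # action is -1 or +1
        reached_goal = self.state == 10
        done = reached_goal or abs(self.state) > 50
        return self.state, (1.0 if reached_goal else 0.0), done

def random_agent(state):
    return random.choice((-1, 1))             # no learning, just shows the loop

env = ToyEnvironment()
episodes = 0
while True:
    state, done, total_reward = env.reset(), False, 0.0
    while not done:
        state, reward, done = env.step(random_agent(state))
        total_reward += reward
    episodes += 1
    if total_reward > 0:                      # crude stand-in for "converged"
        break

print(f"Goal reached after {episodes} episode(s); a real agent needs vastly "
      "more experience, which is where the hardware goes.")
```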

Refrigerator for the beer …

You know, I’m actually really glad I posted this thread. I wasn’t going to ask, but this has been very helpful. I’ve been thinking about this solely as computers, and in particular computers for “me” (even though I don’t need any). However, this thread has helped me clarify what has been on the edge of my thoughts: that the best thing here is to look at it from a lab and dept POV. So long as I can justify that the money is “helping” my research in some fashion, I think everybody is happy. Cool.

Thanks everybody!

Of course, if anybody has any additional ideas, please feel free to post them!

Now, to go to the Ferrari and KITT upgrade store. :slight_smile:

A robotic fridge that brings you a cold beer.

The grad students would probably erect statues to me and worship me as their god.

Hmm…