Specific, technical questions like this have a 50/50 chance of meeting with success. Cultivating a list of topic-specific forums is good advice. But the SDMB has everybody. And just to prove your snide remarks wrong, Ms Patriot Grill, I will answer the OP in detail.
First of all, what exactly are you trying to do? It’s very difficult to beat a GPU. It has unimaginable memory bandwidth and processing capability. Only if your task is very non-parallel or has a lot of conditional code, and is integer-based, should you consider an FPGA. (FPGAs also win out when you need low, very low, or extremely low latency, or when you’re focused on the digital signal interface aspects–think PCB wires–rather than computation.)
Which product? There are two classes of FPGAs. Mid-range devices that cost $20-200 (meaning Cyclone and Spartan), and high-end devices that have ridiculous prices and are sold to the military or telecom. Besides having more gates, the high-end devices will have faster I/O pins that can do gigabit speeds. They’ll also have a few more megahertz (alas, no gigahertz), more memory, etc. But mid-range devices have gotten quite powerful, and the high-end parts just cost too much.
Memory access. FPGAs have two kinds of internal memory and can interface to several kinds of external memory. Internally they have flop memory, which is extremely fast. Every “gate” in an FPGA is actually a piece of memory (a look-up table), and if you use them all at once you get extreme bandwidth (terabytes/s). But you won’t have much storage (a few tens of KB) and you’ll use up all your gates. Think of these as CPU registers. Then there are special-purpose memory cells, which are also fast and have more capacity (a megabyte or so). Think of these as cache. Externally, you can use DDR2 DRAM, but also SRAM and various other exotic technologies. It will be nowhere near as wide or as fast as GPU memory; it’ll be similar to CPU memory, at best. Whichever memory you use, know that it will be a pain to use.
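To put rough numbers on that hierarchy, here’s a back-of-envelope sketch in Python. Every figure in it (flop count, block RAM count, fabric clock, DDR2 channel width and rate) is an assumed ballpark for a hypothetical mid-range part, not a datasheet number:

```python
CLOCK_HZ = 300e6  # assumed fabric clock

# Distributed flop memory: say 50,000 flip-flops, all readable every cycle.
flops = 50_000
flop_bw = flops / 8 * CLOCK_HZ  # bytes/s if every bit moves each cycle

# Block RAM: say 200 blocks, each with a 36-bit port.
brams = 200
bram_bw = brams * 36 / 8 * CLOCK_HZ

# External DDR2: one 64-bit channel at 400 MT/s.
ddr2_bw = 64 / 8 * 400e6

print(f"flop memory : {flop_bw / 1e12:.2f} TB/s")  # terabytes/s, but tiny capacity
print(f"block RAM   : {bram_bw / 1e9:.0f} GB/s")
print(f"DDR2 channel: {ddr2_bw / 1e9:.1f} GB/s")
```

The point is the three-orders-of-magnitude cliff: internal flops are terabytes per second, external DRAM is single-digit gigabytes per second, same as a CPU’s memory channel.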
Communication with a PC. PCI-Express is fastest (gigabytes/s, just like GPUs). You’ll need a high-end FPGA for that, and PCI-e will be quite difficult to implement. Some FPGAs can also drop into AMD CPU sockets; that, actually, is the fastest and most convenient option. On the other end of the spectrum, an ancient serial or parallel port link is very easy. Any sort of no-protocol data pipe is easy.
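To get a feel for what the link choice costs you, here’s a toy comparison of moving 100 MB over each kind of link. The rates are illustrative ballparks, not measured figures:

```python
payload = 100e6  # bytes to move

# Assumed link rates in bytes/s.
links = {
    "serial port (115.2 kbaud)": 115_200 / 10,  # ~10 bits per byte on the wire
    "parallel port (EPP)": 1e6,
    "PCIe x4 (payload rate)": 1e9,
}

for name, rate in links.items():
    print(f"{name}: {payload / rate:,.1f} s")
```

Hours over a serial port, a tenth of a second over PCIe. Whether that matters depends entirely on how chatty your application is.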
How much computation? That depends immensely. Complex if() statements that would confuse a CPU go by many per clock cycle. Yet floating-point ops, of which a GPU can do a hundred per clock cycle, are much more difficult. In terms of raw number-crunching, look at the number of multipliers the FPGAs have (typically a few hundred). These can each do one 16-bit integer multiplication per cycle (typically 300-600 MHz). Really, it’s not much. Like I said, it only makes sense for non-parallelizable problems. (Or others where you’re in it because you need a chip.)
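The raw-number-crunching estimate works out like this (multiplier count and clock are assumed mid-range figures, not any specific part):

```python
multipliers = 300  # dedicated 16-bit multiplier blocks (assumed)
clock_hz = 450e6   # somewhere in the 300-600 MHz range (assumed)

peak_macs = multipliers * clock_hz  # 16-bit multiplies per second
print(f"peak: {peak_macs / 1e9:.0f} billion 16-bit multiplies/s")
# Sustaining that peak means hand-scheduling data into every
# multiplier on every single cycle.
```

A hundred-odd billion narrow integer multiplies per second sounds like a lot until you remember a GPU does that in 32-bit floating point without you writing a line of HDL.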
Oh, and did I mention how difficult it all is? It’s actually easy to make a system-on-a-chip by slapping together a virtual microcontroller, other virtual peripherals, etc. If you have a specific role for the FPGA (bypassing Xbox DRM?), you can code a small custom component. But running a complex custom app at very high throughput means coding that whole app manually, all in a hardware description language. You have to think of dataflows cycle by cycle. And if you factor in external memory and external PC interfaces, the task is immense. Profoundly immense.
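To give a taste of what “cycle by cycle” means, here’s a toy Python model of a two-stage pipeline (multiply, then accumulate), written the way you’d reason about it in an HDL: every variable is a register, and all registers update simultaneously on the clock edge. This is just an illustration, not real HDL:

```python
def simulate(samples, coeff):
    """Scale-and-sum a list through a 2-stage pipeline, one clock per loop pass."""
    mult_reg = 0  # stage-1 output register (the multiply result)
    acc_reg = 0   # stage-2 output register (the running sum)
    for x in samples + [0, 0]:  # two extra cycles to flush the pipeline
        # Registers update simultaneously on the clock edge, so compute
        # all next-state values from the *current* register values first.
        next_mult = x * coeff
        next_acc = acc_reg + mult_reg
        mult_reg, acc_reg = next_mult, next_acc
    return acc_reg

print(simulate([1, 2, 3], coeff=2))  # the sum arrives two cycles late
```

Notice that the answer only appears two cycles after the last input, and that you had to reason about which register holds what on which edge for even this trivial circuit. Now imagine doing that for a whole application, plus a DRAM controller, plus a PC interface.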
Stick to GPUs and x86 clusters.