You might be better off focusing on cache. CPU makers spend a huge amount of their area (~50%) on cache and a lot of time optimizing it.
The tricky part is designing an algorithm that scales proportionally to the amount of cache. It’s easy to design algorithms that need, say, 2 MB of cache and then fall off a cliff if you have less than that. But what you want is something that, roughly speaking, goes twice as fast if you have twice the cache. Ideally it’s also sensitive to latency, so L1 is worth more than L2.
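To make the shape of that concrete, here’s a rough Python sketch (purely illustrative, not a real proof-of-work; all the names are made up): the size of the hot buffer is a knob, so shrinking the available cache slows you down gradually instead of hitting a wall at one fixed size.

    import hashlib
    import os
    import struct

    def cache_bound_kernel(seed: bytes, mem_bytes: int, rounds: int = 1 << 16) -> bytes:
        # Data-dependent reads/writes over a buffer of mem_bytes; pick mem_bytes
        # to roughly match the cache level you want the hot set to live in.
        words = mem_bytes // 32
        buf = [hashlib.sha256(seed + struct.pack("<Q", i)).digest() for i in range(words)]
        h = hashlib.sha256(seed).digest()
        for _ in range(rounds):
            idx = struct.unpack("<Q", h[:8])[0] % words  # unpredictable index
            h = hashlib.sha256(h + buf[idx]).digest()
            buf[idx] = h                                 # keep the whole buffer hot
        return h

    # e.g. a ~1 MB working set sized for L2, or a 16 MB one sized for L3
    digest = cache_bound_kernel(os.urandom(32), mem_bytes=1 << 20)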
Of course the ASIC can dedicate most of its die to cache, but it won’t be as well-optimized as Intel’s stuff, so it’ll never be more than a small multiple of the performance per unit area.
That doesn’t sound like something you could actually do in the real world. The problem is that for mining in general, you need to have a math problem that takes a lot of resources to do one way, but can be easily verified once it’s finished. But there’s nothing about mathematics that is tied into obscure x86 addressing modes and instructions, so I don’t see how you’d actually create a system of mining that needs it. Sure you can specify ‘an algorithm’ that says to do a bunch of x86-specific stuff, but I don’t see how you can actually force a miner to run that algorithm instead of something else that produces the same result without the silly x86 fiddling.
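For reference, the asymmetry being described is the usual hashcash shape: finding a nonce takes on the order of 2^difficulty hash attempts, but checking one takes a single hash. Toy Python version below; note that nothing in it cares what ISA the hashing runs on, which is exactly the problem.

    import hashlib

    def mine(header: bytes, difficulty_bits: int) -> int:
        target = 1 << (256 - difficulty_bits)
        nonce = 0
        while True:
            h = hashlib.sha256(header + nonce.to_bytes(8, "little")).digest()
            if int.from_bytes(h, "big") < target:  # expensive: ~2**difficulty_bits tries
                return nonce
            nonce += 1

    def verify(header: bytes, nonce: int, difficulty_bits: int) -> bool:
        h = hashlib.sha256(header + nonce.to_bytes(8, "little")).digest()
        return int.from_bytes(h, "big") < (1 << (256 - difficulty_bits))  # cheap: one hash

    nonce = mine(b"block header", difficulty_bits=16)
    assert verify(b"block header", nonce, difficulty_bits=16)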
Also tying your ‘open’ cryptocurrency to a single hardware platform (even a large one) runs counter to one of the core design goals and causes significant issues if it’s supposed to last in the long term. And you certainly can’t force miners or clients to run specific code without completely abandoning the portability and decentralization goals.
Yep, Dr. Strangelove mentioned that above.
It’s a valid point but I don’t think it eliminates the idea. The algorithm needs to maximize its mapping onto the x86 profile from an execution perspective. You don’t need to specifically state that instruction X or Y or addressing mode Z is used, but the more you can structure it to run most effectively using those attributes, the more expensive it becomes to create an ASIC that is the best performer.
Even if any individual subset of the calculations can be optimized on an ASIC, if reasonable performance across the different subsets requires a broad enough range of x86 attributes, then the ASIC has to account for all of them, and its cost goes up until it matches or exceeds Intel’s design and manufacturing of an x86 CPU.
Agreed. This is just a fun puzzle to try to solve. The best solution would minimize complexity while maximizing effectiveness on the target platform, which today is x86 CPU if trying to be the most accessible to average joe.
I just realized that it’s easy to make my cache idea work: you just change the payout model.
You use one of the memory-hard algorithms like scrypt. But you allow variable memory use: the user can have it use 1 MB, 2 MB, 32 MB, or whatever is optimal for their system. The chosen size gets baked into the verified block.
Then, the system pays the user in proportion to the chosen difficulty. Doesn’t have to be completely linear but that’s probably the easiest.
CPUs with big caches get paid more than CPUs with small ones, but not by the orders of magnitude you’d get if you had a fixed problem size that was just a little too big for the cache.
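A rough sketch of how that could look, using hashlib.scrypt as the memory-hard core (this assumes a Python build with OpenSSL scrypt support; the linear reward rule and all the names are just illustrative). The miner’s chosen cost parameter is committed into the proof, the verifier re-runs it at that same cost, and the payout scales with the memory declared:

    import hashlib

    REWARD_PER_MB = 1.0  # assumed linear payout per MB of declared working set
    R, P = 8, 1          # scrypt block-mix and parallelism parameters

    def prove(header: bytes, nonce: int, n: int) -> bytes:
        # Memory use is roughly 128 * R * n bytes; n is the miner's choice
        # and is bound into the hash via the salt.
        return hashlib.scrypt(header + nonce.to_bytes(8, "little"),
                              salt=n.to_bytes(8, "little"), n=n, r=R, p=P, dklen=32)

    def verify(header: bytes, nonce: int, n: int, target: int) -> bool:
        # The verifier re-runs the proof at the declared cost n, which is part of the block.
        return int.from_bytes(prove(header, nonce, n), "big") < target

    def reward(n: int) -> float:
        mem_mb = 128 * R * n / (1 << 20)
        return REWARD_PER_MB * mem_mb  # paid in proportion to the chosen difficulty

    # A machine with a big cache might pick n = 2**14 (~16 MB); a small one n = 2**10 (~1 MB).
    ok = verify(b"header", nonce=0, n=2**10, target=1 << 255)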
ASICs would help a bit, but you can only stuff so much SRAM on a die. And they don’t benefit from massive parallelism: sure, you can divide your cache into 64 components or whatever, but then each one gets paid at 1/64 the rate, so it’s a wash. You’d get maybe a 2x or 3x difference by throwing out the non-SRAM stuff, which probably isn’t enough to erase the lead Intel/AMD have from their decades of optimization, not to mention their process lead.
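Quick sanity check of the “wash” arithmetic under the linear payout assumed above: splitting a fixed SRAM budget into many independent units doesn’t change the total reward rate, so the only real ASIC win is stripping out the non-SRAM overhead.

    TOTAL_SRAM_MB = 64

    def total_reward(units: int, reward_per_mb: float = 1.0) -> float:
        per_unit_mb = TOTAL_SRAM_MB / units
        return units * per_unit_mb * reward_per_mb  # each unit is paid for its own slice

    assert total_reward(1) == total_reward(64)  # 64 small units earn the same as 1 big one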