uggghh… i had written a nice, elegant, full reply and then accidentally turned off my computer before sending.
anyway, the two main points to take away from it:
I would argue that setting the equalizer to be identical to compare two sets of speakers is precisely the opposite of what you should do. If one set has poor bass, but it is entirely correctable through the equalizer, then why in the world should one hold that against it? In the ideal test, you should tune the equipment as much as possible to bring out the full potential of the speakers. If you were to conduct a purely synthetic test, then of course you’d want identical inputs. For a real-world test, you should have real-world conditions. In fact, the results may turn out differently with different equipment that has different tuning capabilities. You’d also need to collect different groups of listeners (divided up by taste and hearing acuity) to get a full picture. You’d also need to retune the entire setup for each participant in your study. Better yet, you’d have to let them do the tuning themselves.
If you say that that’s too much work, then you’d probably be right. However, let us not lose sight of what an ideal assessment of quality is. Of course, when you’re stepping away from the ideal you get into another gray mess as well. If you don’t let everyone retune the setup for themselves, then perhaps you should keep the equalizer the same to avoid biasing the test in any direction. Or rather, better to just run the study many times with a varying, random selection of equalizer settings. However, averaging over individual preference like that will get you into big trouble no matter what you do (e.g., the public loves crazy bass and lossy DSPs like surround sound or whatnot, while pure audiophiles hate them).
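To make the averaging problem concrete, here’s a minimal sketch of the “repeat the study with random equalizer settings” idea. Everything in it is invented for illustration: the scoring model, the bass numbers, and the listener preferences are all hypothetical, not data from any real study.

```python
import random

random.seed(0)

def listener_score(speaker_bass, eq_boost, preferred_bass):
    """Made-up preference model: score (0..1) is higher the closer the
    equalized bass level lands to what this listener prefers."""
    effective = speaker_bass + eq_boost
    return max(0.0, 1.0 - abs(effective - preferred_bass))

def run_study(speaker_bass, listeners, n_runs=1000):
    """Average listener score over many runs, each with a random EQ setting."""
    total = 0.0
    for _ in range(n_runs):
        eq = random.uniform(-0.5, 0.5)  # fresh random EQ setting per run
        total += sum(listener_score(speaker_bass, eq, p)
                     for p in listeners) / len(listeners)
    return total / n_runs

# Two hypothetical listener camps with opposite tastes.
bass_lovers = [0.9, 0.8, 1.0]   # want strong bass
purists     = [0.3, 0.4, 0.2]   # want a flatter response

# Averaging across both camps blurs exactly the differences you care about.
print(run_study(speaker_bass=0.5, listeners=bass_lovers + purists))
print(run_study(speaker_bass=0.2, listeners=bass_lovers + purists))
```

The point of the toy model: once you average over listeners who want opposite things, the aggregate score can end up nearly the same for quite different speakers, which is the “big trouble” mentioned above.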
Second, double-blind A-B tests are not the standard of ideal examination either. They suffer from “multiple-choice test” syndrome. SAT instructors always tell their students that if they come to a question they’re not sure of (but have some hint about), then the longer they sit thinking about it, the lower the chance of getting it right. It’s kind of like when you were a kid and you could sometimes repeat a word to yourself enough times that it almost got disconnected from its meaning and became very alien-sounding. Our minds just get confused that way. If the A-B pairs are varied each time, then that effect can be minimized. However, if you sit a person down to go A-B between two synchronized versions of the same song for 10 minutes, then by the end he won’t be able to tell apart his hand from his face (so to speak). An ideal A-B test would take place over many days spread out over many listening sessions, and the As and Bs should even have breaks between them (the way wine tasters have to clear their palate, although i don’t quite see how a cracker does that).
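The spread-out protocol described above could be sketched as a simple schedule generator. All the numbers here (days, trials per session, break lengths) are hypothetical placeholders, not a validated design:

```python
import random

random.seed(1)

def make_schedule(n_days=5, trials_per_day=4, break_minutes=5):
    """Generate a hypothetical A-B trial schedule: a few short trials per
    day, randomized presentation order, and a break after each trial to
    'clear the palate' and avoid listening fatigue."""
    schedule = []
    for day in range(1, n_days + 1):
        for trial in range(1, trials_per_day + 1):
            order = random.choice([("A", "B"), ("B", "A")])  # counterbalance order
            schedule.append({"day": day, "trial": trial,
                             "order": order, "break_after_min": break_minutes})
    return schedule

for entry in make_schedule()[:3]:
    print(entry)
```

Randomizing which of A or B comes first in each trial is the standard counterbalancing trick; spreading trials across days is the part most quick studies skip.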
Admittedly, i haven’t effectively argued for my belief that straying from the ideal (in the way that past researchers have done) will impact the results enormously. However, i think it will impact the results enormously.
In particular, if i had a [double-blind] week with that super amp you mentioned, i’m sure i could discern any difference. Of course, maybe the whole thing is a scam and they just found some crappy over-priced model. Ideally, you would test across all expensive amps. Ideally, you would also look for cases where a playback device produces some super-carefully aligned harmony that only an amp of the super-duper class could reproduce. Perhaps if you set it to play back ordinary, improperly aligned/tuned music, it just wouldn’t hold much of an advantage. Such special cases might make a big difference for some (who possess properly tuned equipment and recordings).
Of course, i’m by no means disagreeing with the general conclusion that high-end equipment poses no more than a subtle benefit. I’m just pointing out that previous studies have had a margin of error that may be even greater than the span of most definitions of “subtle”. I.e., there’s no point in citing them.
Anyway, the people on this forum have this strange notion that just cuz someone does a study and stamps it with the ACME stamp of science, its value is that of gold. Well, guess what, if studies were so great, then THEY WOULDN’T CONTRADICT EACH OTHER ALL THE F***** TIME. I’m sorry, this is a bit of a non sequitur. However, i just wanted to insert that after my dissection of the ways a study could go wrong in one particular field. Maybe some of you will even come away having learned to cry “cite” just a tad bit less often.