Poll: Do you recognize the term "Fuzz Testing", and are you a Software Developer

Judging by the followup posts, most of us developers who said we weren’t familiar with the term are familiar with the method, just not by that name. I’ve heard it called “dump testing” (from dumping garbage into the program), “Maggie-testing” (simulating a notional toddler named Maggie mashing on an input device), or (prosaically) just “input testing”.

Jargon diverges across companies and specializations. Just because coders working in, say, services management software don’t call something by the same name the security wonks use doesn’t mean they don’t know about it.

But you do need to know what it’s supposed to do. Feed garbage into the program, and it’ll do… something. Maybe it does what it’s supposed to do, or maybe it does something bad. But if, given a random input, you knew exactly what the program was supposed to do, you could just write a program that did that: and that would be a replacement for the code you’re trying to test.

Or is it just that, of the things it’s not supposed to do, there’s some small subset that are things you know it’s not supposed to do? In that case, you could catch bugs that cause it to do those known wrong things, but still might not catch bugs that cause it to do other wrong things.

That’s fair, but still–it seems that anyone who follows security news at all would have encountered it. I’m not in security either, but I do read about the major exploits, and fuzz testing comes up a lot. Google’s Project Zero is a major proponent and has found a large number of exploits via that method.

Maybe people have seen it before, but the jargon didn’t stick for whatever reason. That happens too, IME–you skim past stuff that you don’t immediately recognize.

That’s certainly a big part of it. Programs generally shouldn’t crash, for one. If an image decoder library crashes on a malformed file, then there’s something wrong with it–it needs to do more input validation or something.
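
As a concrete illustration of that “shouldn’t crash” property: modern fuzzing tools such as LLVM’s libFuzzer only need a tiny harness around the code under test. A sketch, where decode_image() is a hypothetical stand-in for the parser being tested:

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical stand-in for whatever parser is under test. */
    extern void decode_image(const uint8_t *data, size_t size);

    /* libFuzzer calls this entry point over and over with mutated
       inputs; the only property checked here is "does not crash". */
    int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
    {
        decode_image(data, size);
        return 0;
    }

Build with something like clang -fsanitize=fuzzer,address and the tool takes it from there, mutating inputs and reporting any crash it provokes.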

Fuzz testing can also be used to catch regressions. Suppose you write some code and it passes your fuzzed test suite–great! Now keep up that automated testing forever (with the same random seed) and compare against the “gold” outputs. If you ever add a bug, the fuzzing might catch it. Even if nothing seems “wrong,” if there’s any difference at all, there might be a problem.
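
A sketch of what that regression setup can look like, with transform() and load_gold() as hypothetical stand-ins for the code under test and the stored known-good outputs:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical stand-ins: the code under test, and a loader for the
       "gold" output recorded for case i when the suite last passed. */
    extern size_t transform(const unsigned char *in, size_t len,
                            unsigned char *out);
    extern size_t load_gold(int i, unsigned char *out, size_t cap);

    int main(void)
    {
        srand(12345); /* fixed seed: the same "random" inputs every run */
        for (int i = 0; i < 1000; i++) {
            unsigned char in[64], out[256], gold[256];
            for (size_t j = 0; j < sizeof in; j++)
                in[j] = (unsigned char)(rand() & 0xff);
            size_t n = transform(in, sizeof in, out);
            size_t g = load_gold(i, gold, sizeof gold);
            /* Any difference at all from the gold output is flagged,
               even if nothing seems "wrong". */
            if (n != g || memcmp(out, gold, n) != 0) {
                fprintf(stderr, "case %d: output differs from gold\n", i);
                return 1;
            }
        }
        puts("all cases match gold");
        return 0;
    }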

Fuzz testing isn’t a replacement for other types of testing, but it can be a good addition. And it’s pretty cheap in terms of developer time. Unit tests are time-consuming, and are themselves never the same quality as the code they’re testing, so there’s always a chance they’re really catching a bug in themselves. But a generic fuzzing framework can be written once at a high quality level and then used to produce an unlimited number of variations.

I’m not a developer; I only manage some systems. Here is a real-world example of when I used fuzzing (as I knew it). I’m using real names: let the guilty be shamed.

This was a long time ago, back when there was a big difference in memory and speed between servers and desktop computers. SPSS came out with a Unix version of their statistical software. One method of interacting with it was as a service running on a server. Windows computers would run SPSS, which would submit code to the service, which listened on a particular port.

As a systems administrator, I was very hesitant to have some random piece of software running on my server that would let anyone talk to it. So to test it, I ran the extremely simple fuzz test of

nc localhost 12345 < /dev/urandom

All that does is feed random data into the port where SPSS was listening. The correct response is for SPSS to do its best Gary Coleman impression, and then close the connection. Instead, SPSS crashed with a segmentation fault. That strongly suggests to me that at the absolute minimum it is trivial for someone to implement a denial of service against the SPSS software. More likely, it is possible to send specially crafted data (instead of random) which gets the SPSS service to do anything you want.

So running one simple fuzzing command let me know this was not something I wanted to offer to my users, and that if they needed it, I would have to be extremely careful to ensure it was isolated from potential bad actors.
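
For contrast, the well-behaved version of that listener isn’t complicated. A sketch, with parse_request() as a hypothetical stand-in for SPSS’s protocol parser (a real server would loop over reads; this is the single-read version for brevity):

    #include <unistd.h>
    #include <sys/types.h>

    extern int parse_request(const char *buf, ssize_t len); /* hypothetical */

    void handle_client(int fd)
    {
        char buf[4096];
        ssize_t n = read(fd, buf, sizeof buf);
        if (n <= 0 || !parse_request(buf, n)) {
            /* Garbage in, polite refusal out: the Gary Coleman response. */
            static const char msg[] = "error: malformed request\n";
            (void)write(fd, msg, sizeof msg - 1);
        }
        close(fd);
    }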

Come to think of it, a black-box test (of any sort) isn’t very thorough, either. For instance, for a lot of software, the correct response to random data is to give an error message of some sort. And at the black-box level, the error message might be all you can see it doing. But what if, with the right (wrong) input, it gives an error message and also starts overwriting files, or something?

Sure, there are always cases where Test X definitively proves that you do have a bug. In fact, for any given bug, there will always be some test that proves that you have it. But no matter how thorough you are, you can never be sure that you don’t have any bugs.

I guess the bottom line is to just run as many tests as you can (limited both by your imagination in coming up with tests and by the testing resources you have available), and in that context, a cheap, easy test like fuzzing makes good sense.

This is all great info, everyone! The term is not as well known as I’d thought. I now have some alternate names for it, some example tools, and some examples of how it helps everyone. Ignorance fought! And keep it coming!

(From “The Last Bug”, by Lou Ellen Davis.)

https://lamport.azurewebsites.net/tla/tla.html
https://tla.msr-inria.inria.fr/tlaps/content/Home.html

I never heard of it either - my BS is from 1973. But here is something to support your point on randomness. New microprocessor designs get tested by running programs on a simulated model of them. This is done on a compute ranch with thousands of processors all running tests. (It helps to be a company that makes computers to do this.) The best way of testing is not to run tests written by the designers, but to assemble random snippets of code to run. This finds all sorts of corner cases the designers never think of.

The programs have to be generated to make some sort of sense - so they are pseudo-random, not truly random, but this is the standard method these days.

When are you done? When the rate of bugs found per week levels off at a low value.
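
A toy sketch of that constrained-random idea (a real generator models register state, memory maps, and exception behavior; the five opcodes and 32 registers here are made up for illustration, but the seeded, constrained randomness is the same):

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        static const char *ops[] = { "add", "sub", "and", "or", "xor" };
        srand(42); /* fixed seed, so any failing test is reproducible */
        /* Constrained so the output "makes some sort of sense":
           real opcodes and valid register numbers only. */
        for (int i = 0; i < 10; i++)
            printf("%s r%d, r%d, r%d\n",
                   ops[rand() % 5], rand() % 32, rand() % 32, rand() % 32);
        return 0;
    }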

I have heard of bebugging, which is when a team inserts bugs into a supposedly finished program and then measures how many of these known bugs the test cases find. Simulating tests for hardware faults works exactly the same way.
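
The payoff of bebugging is the arithmetic it enables: if the tests catch seeded and genuine bugs at roughly the same rate, the fraction of seeded bugs found estimates the fraction of genuine bugs found. A sketch of that capture-recapture estimate, with made-up numbers:

    #include <stdio.h>

    int main(void)
    {
        /* Made-up numbers for illustration. */
        double seeded = 10.0;      /* bugs deliberately inserted      */
        double seeded_found = 8.0; /* of those, found by the tests    */
        double real_found = 20.0;  /* genuine bugs found by the tests */

        /* If the tests caught 8/10 of the seeded bugs, assume they
           also caught about 8/10 of the genuine ones. */
        double est_total = real_found * seeded / seeded_found; /* 25 */
        printf("estimated real bugs: %.0f (so ~%.0f still lurking)\n",
               est_total, est_total - real_found);
        return 0;
    }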

I’m a software developer, and this is exactly my response.

But we tend not to use it, because unit testing and functional testing already include testing for error situations.

(And we have QA for the rest!)

Good. We can show that our programs are correct as long as the specification is correct. Now we just need a specification specification language to prove that the specification is correct. But then we need a specification specification specification to show that our specification specification is correct…

The thing to remember is that automated testing can be more thorough than manual testing. It’s much faster, can consistently re-test known problem cases, and can find new problems by providing random input.

It’s specifications all the way down.

I heard a rumor once of a program that was completely bug free.

That’s probably the “hello world” program.

You can, perhaps, be confident in the source code you wrote for “hello world”. But that source code will call some routine like printf(), or something equivalent to it: How confident are you that printf() is completely bug-free? And how confident are you in the compiler you use to compile your “hello world”? Or the compiler that compiled that compiler? Or the opcodes built into the processor (or processors) running all of those programs? Or, since modern processors are too complex for any human to understand in their entirety, the programs (or compilers or opcodes) used to design those processors?

It’s maybe not quite turtles all the way down, but it’s still a heck of a lot of turtles.

Most versions of hello.c have bugs, if there is a requirement to alert the user to an error when printf() fails, such as when output is redirected to a file on a full-to-capacity disk.
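
For the record, the version that meets that requirement isn’t long. A sketch; note that the error frequently only surfaces when buffered output is flushed, so checking printf() alone isn’t enough:

    #include <stdio.h>

    int main(void)
    {
        /* printf() returns a negative value on an output error. */
        if (printf("Hello, world!\n") < 0) {
            perror("printf"); /* stdout is broken; complain on stderr */
            return 1;
        }
        /* The failure often only shows up at flush time, e.g. stdout
           redirected to a file on a full disk. */
        if (fflush(stdout) == EOF) {
            perror("fflush");
            return 1;
        }
        return 0;
    }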

How about a show of hands: Who here has ever checked the return value of printf() for an error or incomplete write condition?

Heck, let’s make it a poll:

  • I am a programmer and I have checked the return value of printf() at least once in my life.
  • I am a programmer and I have never checked the return value of printf().
  • I am not a programmer. What are you nerds talking about?

That is a poll for C and C-related programmers, not programmers in general. I think it is failing its error check.