# Sudoku Generation Algorithm

There have been a number of threads lately talking about Sudoku, a Japanese (?) numeric crossword-like logic puzzle. They’re apparently something of a fad these days; while I’ve never seen them in the newspaper, a number of Dopers’ papers apparently print them now.

Anyway, I enjoy them, and I’ve got a program that generates them for my PocketPC. But…it seems to generate a fairly small set of them (it claims 5000 or so, but I’ve gotten the same one several times), the set it does generate seems pretty similar in form, and the user interface, to put it gently, sucks.

So I want to write a program to generate (not solve, which is easy) them myself. I’m not immediately seeing an algorithm, however.

Sudoku rules: a 9x9 grid, broken up into a 3x3 grid of 3x3 spaces. Each row and column of the main grid must have each numeral from 1 - 9 exactly once. Each 3x3 subgrid must have each numeral from 1 - 9 exactly once. A “puzzle” is a grid with some of the numbers filled in initially: the solver must start from this configuration.

A “valid puzzle” is one which has a single unique solution from the initial grid: no more, no less. An empty grid would not be a valid puzzle, for example (since there are a very, very large number of possible combinations). Nor would a puzzle with only one or two squares filled in: again, there would be multiple combinations.

It’s easy (well, not trivial, but relatively easy) to generate a “solved” grid of numbers. From this, I must determine a set of “clue” numbers to leave in the grid that (a) allow enough information to solve the puzzle, and (b) allow only a single solution. I’m pretty sure that (a) is a subset of (b).

Even better, many Sudoku programs claim to be able to produce puzzles of varying difficulty. The “difficulty” appears to be more than just a simple “number of clue squares” – I’ve seen “hard” puzzles with more initial numbers filled in than “easy” ones, for example. Ideally, I’d like to have a user-selected difficulty as part of the algorithm.

Just to make this tough, I’d like the algorithm to be comprehensive: it should be able to generate ANY valid puzzle – I don’t want it to be like the app that I’ve got that seems to generate only a few different puzzles, with a few simple variations.

So, anyone whose been doing these for a while care to guess (or tell) an algorithm for generating them? Plain English is fine, no need to drag this down to the code level.

I’ve been wondering this myself since I first tried to do a Sudoku puzzle.

Generate them using a constraint satisfaction algorithm. Click here.

Yes, they’re a Japanese puzzle.

They’re certainly a fad here. The puzzle magazine with which I’m currently torturing myself has 113 puzzles, 41 of which are sudoku puzzles. (One of them is a very diabolical one that is really five sudoku puzzles, one being in the center position and the other four sharing one corner with it. Two of the remaining sudoku puzzles jack up the difficulty level by being 16x16 puzzles.)

I’ve been wondering for a few days now exactly how many possible sudoku puzzles there are, not counting reflections and rotations. Does anyone know of a way to determine this number short of counting every single sudoku puzzle?

Btw, the Monterey Herald and Puzzle Japan both have online puzzles to solve.

Here is the Wikipedia’s article on sudoku.

From the Wikipedia link above:

So the puzzle isn’t Japanese; however, it became popular thanks to Japan.

Start with a full solution. Remove a number. If removing that number would result in a grid with more than one solution, put it back. Recurse until no number can be removed without creating additional solutions.

I think it should work, though I’m not sure it’s optimal. Verifying that the grid at step t has only one solution may not be polynomial time.

You could turn it into a randomized algorithm by selecting the number to remove randomly. The run it several times until you’ve found a starting grid with acceptably few numbers initially filled in.

(I looked at the paper linked by Dominic after typing this and they suggest essentially the same algorithm, additionally giving a way to determine if a grid has more than one solution.)

We discussed it a bit here, but I don’t think we ever came up with a definite answer.

All the Sudokus I’ve seen have their initial clues symmetric about the center cell (like crossword puzzles). So you’d have to remove two numbers - a cell and the matching one on the opposite side (unless you remove the center cell itself).

Here is a link to a translation of a step by step Sudoku creation tutorial from Japan. There is a harder tutorial, but the translator died before it was complete. It looks like it would be fairly simple to implement as a program.

Some nice links here. Dominic Mulligan’s method is a little light on details in the generation section, and is mainly a matter of (admittedly automated) trial and error. It has the advantage of creating a minimum spanning set of “disclosures.” (CurtC, the symmetric property doesn’t appear to be necessary; many of the machine-generated sudoku’s I’ve seen don’t have symmetric disclosures.)

Voyager’s link is closer to what I had in mind; it generates a soduku by adding numbers one at a time to an empty grid, always validly. It, too, has a bit of the “here magic happens” when selecting the disclosed numbers at the beginning, but I think there’s a process here that I can make work.

What’s left is this concept of “difficulty,” which is mentioned in Dominic Mulligan’s link, but not really defined. I still don’t understand what makes a given Sudoku “harder” than another.

In all honesty, I was hoping that a simpler algorithm was well-known. Given that it’s not (or it’s a trade secret), perhaps I’ll switch from intending to implement this to just looking for a better existing program.

I have a program from http://www.sudoku.com/ that works pretty well, 30-day trial and \$14.95 to buy. (It showed a false virus report with one version of AntiVir, but the virus def was fixed the next week. Ask me about “those stupid Italian boot ROMs” sometime.)

The current (December) Games Magazine has a very brief discussion on making your own, but not nearly specific enough to write code.

Was discussing this with a friend recently and decided to see whether I could find an answer. Haven’t yet, but found something which might be of interest to programmer types. This site has code (open source) for a composer (as well as a solver). Alas, IANACG, so this means nothing to me.

If I were making it, I would just have it be a random number generator and a solver. You would pick out a random number at a random location, then see if it was solvable. If not, you add another number.

This allows you to use different solving techniques so that you can vary difficulty. If you want to create a hard puzzle, you use a solver that does some of the more fancy deduction and will brute-force out to X permutations. If you want an easy one, you can use a solver which only can use some basic figuring.

One way to estimate difficulty is to try solving it algorithmically. It is straightforward to write a Sudoku-solving program which follows the same simple rules that people use. [I’ve implemented a solver like this, but not with difficulty scoring.] This can be a simple recursive application of the following rules:
(1) If the grid is inconsistent, return “inconsistent”;
(2) If the grid is full, report it and return “solved”;
(3) Apply heuristics, e.g. the simplest heuristic
(3a) If there is a cell with a single possible assignment, after ruling out values in the same row, column, or block, fill it in and recurse
(3b,c,etc.) (Add heuristics until you get bored)
(4) Guess: pick a cell; for each value from 1 to 9 fill the cell with this number and recurse

To estimate the difficulty from this, assign a “difficulty score” to each rule. The simplest heuristics, like (3a) above, get a very low difficulty score; more complicated heuristics involving multiple elements or more subtle patterns get higher scores. The guess-and-backtrack rule (4) gets a score based on the difficulty of checking the consistency of the guess (that is, based on the score of the reasoning needed to rule out each incorrect guess). The total score is the sum of all of the scores of the individual rules used in solving the puzzle.

I generated some interesting puzzles online here, which has a backend of gnome-sudoku (sourceforge link). In the manner of coders everywhere, there seems to be no documentation, but if you RTFS I imagine you could find something out.

Seems like you could also eliminate two numbers being “switched” for example, if the two replaces the seven, and vice versa, it seems to me that that’d be, essentially, the same puzzle. Does anybody know what this is called? I suppose it would work for “rotating” through three symbols, as well.

The “difficulty” level of a sudoku is related to the techniques required to solve.
A sudoku that can be solved by simple row/column/square elimination is easy.
The next solving technique is column/row and cell elimination, where a number in a column/row can only go into a single cell - thus eliminating that number from the rest of the cell.
There are also pair cell eliminations - where two entries in a cell can take the same two numbers - even if other cells could take those numbers, they can be eliminated.
The most complex solving techniques involve looking at three or more cells together - I still cannot get my head around it.
Note that NONE of these techniques involve a guess and backtrack approach. Sudoku can all be solved by logic techniques alone - that is why a unique solution to the puzzle is required.

Go to www.sudokuslam.com - they have a sumo mode that gives help to solve the puzzles, and hints on what to look at next (and an explanation of the technique used to solve the hint). It will help you understand the concept of puzzle difficulty.

Si

There’s a nineteen line Sudoku solver written in Object Caml here.

But you’re testing to see whether it’s solvable by using your algorithm, right? This technique will generate only puzzles that your algorithm already knows how to solve. It would also not be very efficient, because solvers can take a while for difficult situations (at least on my Palm, which, in addition to the newspaper, is where I play them).

I’ve read that for Sudoku puzzles, there is a very simple and fast check to see whether a puzzle has a unique solution. This test doesn’t tell you what the solution is, just whether there is zero, one, or more than one solution.

My Palm seems to generate the puzzles by starting with a random, completed puzzle, then taking away cells (of course, symmetric about the center cell), and seeing if the resulting grid still has only one solution. If it does, it then takes away another pair of cells and checks that. Eventually, it will take away cells that have more than one solution, so it backs up one step, to the last one that had a unique solution, and presents that as the puzzle.