Sudoku puzzle statistics

Gordon Royle at The University of Western Australia has a collection of 47793 inequivalent 17 clue sudoku puzzles. They provide a very useful dataset for exploring varous aspects of sudoku and for testing software.

Does the number of clues determine the difficulty of a Sudoku puzzle?

Some believe that the difficulty of a Sudoku puzzle depends solely on the number of initial clues, and is independent of the way the clues are arranged on the grid or the particular clues given. Below is an analysis of the variation in difficulty of the 17 clue puzzles. Were the difficulty to depend solely on the number of clues, all the 17 clue puzzles should have very similar ratings.

I calculated the difficulty rating for the subset (40553) of the 47793 puzzles which can be solved using only the hint algorithms in SourGumdrop - i.e. without guessing. This was done twice. Once using the "provocative" standard algorithm scores I first posted several years ago, and which are used to calculate the scores for SourGumdrop's built-in puzzles, and the second using the scores H1 = 1, H2 = 6,H3 = 6, L1 = 2, L2 = 2, N2 = 3, N3 = 3, N4 = 6, XW = 9, SF = 9, JF = 20, which probably more accurately reflect the difficulty of recognising and using each algorithm. The results are plotted in Figure 1.

SourGumdrop: score distributions
Figure 1. The distribution of difficulty ratings for 40533 17 clue Sudoku puzzles. The green points represent the standard scores and the red points an alternative set. The Y value represents the number of times each of the scores shown along the X axis was achieved.

For the standard algorithm scores the difficulty ratings vary between 6 and 61. The 6 corresponds to 6 Hidden Singles and the 61 to 23 Hidden Singles, plus 4 Hidden Pairs, plus 7 Hidden Triples, plus 1 Naked Pair, plus 1 X-wing: i.e. 23*1 + 4*2 + 2*2 + 7*2 + 1*3 + 1*9 = 61. The mean score is 30 with a standard deviation of 6. So we would expect 70% of 17 clue Sudoku puzzles to have a difficulty rating between the equivalent of 24 to 36 Hidden Singles, but over all there is a 10-fold difference in their possible values. For the alternate scores the difficulty rating ranged from 6 to 93.

Some might take this to show that there is a wide range of variation of difficulty for puzzles with the same number of clues, but it would be interesting to perform the same analysis on representative sets of puzzles with 18, 19,...,30 clues.

For the puzzles built in to SourGumdrop, chosen for their range of required algorithms, and each having 28 clues, the difficulties vary from 16 to 51 - so having a mid score of around 33. Comparison with the data from above shows the following: 280 of the 17 clue puzzles have lower scores than the lowest of SourGumdrop's 28 clue puzzles; 24 have scores higher than SourGumdrop's hardest at 51; the mean scores of the two datasets are not dissimilar.

The relative effectiveness of Sudoku algorithms

In another round of analysis the hint algorithms in SourGumdrop were employed to try to solve the 17 clue puzzles and the effectiveness of each algorithm was measured. The Simplest first strategy solved 40553 of the 47793 puzzles. The Random choice strategy solved a very similar number, but naturally, being a random process, the exact number varied with each run of the program.

Algorithm   A     A%       B     B%     C      C%   C/B    D       E
H1      1093696  94.4   194857  15.3  782934  16.1  4.0  21368  3763992
H2        38516   3.3    70379   5.5  407200   8.4  5.8      0   439739
H3         5267   0.5    27765   2.2  155586   3.2  5.6      0   180883
L1        20800   1.8   192722  15.2  507885  10.5  2.6     24  1071730
L2          229   0.0   191967  15.1  455686   9.4  2.4     12  1054669
N2           48   0.0   126364   9.9  475450   9.8  3.8     70    83176
N3            7   0.0   148016  11.7  750430  15.5  5.1    798   634253
N4            3   0.0   172777  13.6  944163  19.5  5.5    188   875674
XW           18   0.0    48999   3.9  105268   2.2  2.1      0    22394
SF            3   0.0    45822   3.6  131394   2.7  2.9      0   107115
JF            0   0.0    50744   4.0  135587   2.8  2.7      0   172844
Totals  1158587 100.0  1270412 100.0 4851583 100.0
Figure legend: H1 = Hidden single, H2 = Hidden pairs, H3 = Hidden triples, L1 = Locked candidate 1, L2 = Locked candidate 2, N2 = Naked pairs, N3 = Naked triples, N4 = Naked quads, XW = X-wing, SF = Swordfish, JF = Jellyfish. A: Number of applications of each algorithm using strategy Simplest first successfully - i.e. the counts are only from solved puzzles. B: Number of successful applications of each algorithm using strategy Random choice. C: Number of candidates removed by each algorithm using strategy Random choice. C/B: The values in column C divided by those in column B, i.e. the candidates removed per application of each algorithm. D: Number of puzzles solved by the repeated use of each algorithm alone. E: Number of candidates removed by the repeated use of each algorithm alone. A%, B%, C%: The figures for the corresponding columns as percentages.

These results from applying the Random choice strategy on such a large number of puzzles provide a good assessment of the usefulness of each algorithm. Column B% shows the likelihood of finding the pattern corresponding to each of the algorithms. Column C% the relative effectiveness of each algorithm - for example Naked quads is most effective, being around 6 or 7 times more effective than X-wing, Swordfish and Jellyfish. The C/B column shows the average number of candidates removed for each occurrence of each pattern. D shows how outstandingly effective hidden single on its own is for solving random puzzles. Why is Naked quads judged to be so effective? In the random processing each algorithm had an equal chance of being selected; the Naked quads pattern occured frequently (columns B and B%) and removed, on average 5.5 candidates (column C/B) per occurence.

E provides a set of counts which can be used as benchmarks for testing new computer programs: does the new program achieve similar counts. If not, either SourGumdrop is bugged or the new program is. If you are suspicious about any of these numbers please contact me. In February 2007 I recoded my Python algorithms in C to enable me to generate interesting new puzzles more quickly. The results in this table were very useful in revealing bugs: if the Python and C did not agree, at least one of them was wrong. If a particular algorithm was producing a surprising proportion of hits it could be bugged. For example, I discovered that I had misunderstood the defintion of hidden triple when coding the original version of the program: the pattern I was looking for was far too restrictive and SourGumdrop was finding an anomalously small number.

An example of a hidden triple discovered by the bugfixed version of SourGumdrop.

SourGumdrop: a minimal hidden triple pattern
Figure 1. An example of a minimal hidden triple pattern. Symbols 2,4,5 provide the solutions for the three cells, so the symbol 6 can be removed. Normally 2 or 3 members of the set occur in each cell. Here, for one cell, only a single member occurs.