Sudoku puzzle statistics
Gordon Royle at The University of Western Australia has a collection of 47793 inequivalent 17 clue sudoku puzzles. They provide a very useful dataset for exploring varous aspects of sudoku and for testing software.
Does the number of clues determine the difficulty of a Sudoku puzzle?
Some believe that the difficulty of a Sudoku puzzle depends solely on the number of initial clues, and is independent of the way the clues are arranged on the grid or the particular clues given. Below is an analysis of the variation in difficulty of the 17 clue puzzles. Were the difficulty to depend solely on the number of clues, all the 17 clue puzzles should have very similar ratings.
I calculated the difficulty rating for the subset (40553) of the 47793 puzzles which can be solved using only the hint algorithms in SourGumdrop - i.e. without guessing. This was done twice. Once using the "provocative" standard algorithm scores I first posted several years ago, and which are used to calculate the scores for SourGumdrop's built-in puzzles, and the second using the scores H1 = 1, H2 = 6,H3 = 6, L1 = 2, L2 = 2, N2 = 3, N3 = 3, N4 = 6, XW = 9, SF = 9, JF = 20, which probably more accurately reflect the difficulty of recognising and using each algorithm. The results are plotted in Figure 1.
For the standard algorithm scores the difficulty ratings vary between 6 and 61. The 6 corresponds to 6 Hidden Singles and the 61 to 23 Hidden Singles, plus 4 Hidden Pairs, plus 7 Hidden Triples, plus 1 Naked Pair, plus 1 X-wing: i.e. 23*1 + 4*2 + 2*2 + 7*2 + 1*3 + 1*9 = 61. The mean score is 30 with a standard deviation of 6. So we would expect 70% of 17 clue Sudoku puzzles to have a difficulty rating between the equivalent of 24 to 36 Hidden Singles, but over all there is a 10-fold difference in their possible values. For the alternate scores the difficulty rating ranged from 6 to 93.
Some might take this to show that there is a wide range of variation of difficulty for puzzles with the same number of clues, but it would be interesting to perform the same analysis on representative sets of puzzles with 18, 19,...,30 clues.
For the puzzles built in to SourGumdrop, chosen for their range of required algorithms, and each having 28 clues, the difficulties vary from 16 to 51 - so having a mid score of around 33. Comparison with the data from above shows the following: 280 of the 17 clue puzzles have lower scores than the lowest of SourGumdrop's 28 clue puzzles; 24 have scores higher than SourGumdrop's hardest at 51; the mean scores of the two datasets are not dissimilar.
The relative effectiveness of Sudoku algorithms
In another round of analysis the hint algorithms in SourGumdrop were employed to try to solve the 17 clue puzzles and the effectiveness of each algorithm was measured. The Simplest first strategy solved 40553 of the 47793 puzzles. The Random choice strategy solved a very similar number, but naturally, being a random process, the exact number varied with each run of the program.
Algorithm A A% B B% C C% C/B D E H1 1093696 94.4 194857 15.3 782934 16.1 4.0 21368 3763992 H2 38516 3.3 70379 5.5 407200 8.4 5.8 0 439739 H3 5267 0.5 27765 2.2 155586 3.2 5.6 0 180883 L1 20800 1.8 192722 15.2 507885 10.5 2.6 24 1071730 L2 229 0.0 191967 15.1 455686 9.4 2.4 12 1054669 N2 48 0.0 126364 9.9 475450 9.8 3.8 70 83176 N3 7 0.0 148016 11.7 750430 15.5 5.1 798 634253 N4 3 0.0 172777 13.6 944163 19.5 5.5 188 875674 XW 18 0.0 48999 3.9 105268 2.2 2.1 0 22394 SF 3 0.0 45822 3.6 131394 2.7 2.9 0 107115 JF 0 0.0 50744 4.0 135587 2.8 2.7 0 172844 Totals 1158587 100.0 1270412 100.0 4851583 100.0
These results from applying the Random choice strategy on such a large number of puzzles provide a good assessment of the usefulness of each algorithm. Column B% shows the likelihood of finding the pattern corresponding to each of the algorithms. Column C% the relative effectiveness of each algorithm - for example Naked quads is most effective, being around 6 or 7 times more effective than X-wing, Swordfish and Jellyfish. The C/B column shows the average number of candidates removed for each occurrence of each pattern. D shows how outstandingly effective hidden single on its own is for solving random puzzles. Why is Naked quads judged to be so effective? In the random processing each algorithm had an equal chance of being selected; the Naked quads pattern occured frequently (columns B and B%) and removed, on average 5.5 candidates (column C/B) per occurence.
E provides a set of counts which can be used as benchmarks for testing new computer programs: does the new program achieve similar counts. If not, either SourGumdrop is bugged or the new program is. If you are suspicious about any of these numbers please contact me. In February 2007 I recoded my Python algorithms in C to enable me to generate interesting new puzzles more quickly. The results in this table were very useful in revealing bugs: if the Python and C did not agree, at least one of them was wrong. If a particular algorithm was producing a surprising proportion of hits it could be bugged. For example, I discovered that I had misunderstood the defintion of hidden triple when coding the original version of the program: the pattern I was looking for was far too restrictive and SourGumdrop was finding an anomalously small number.
An example of a hidden triple discovered by the bugfixed version of SourGumdrop.