Introduction
Now that we know the nuts and bolts of a performance test, it’s time to learn how to use this technique to track down and solve errors in our code.
Although plenty of information about perft testing and how to implement it exists, you won’t easily find a well-explained methodology to perform the test, format the results properly, and interpret them. In this devlog, we’ll cover another step-by-step guide for debugging a chess program using perft testing. We’ll start with an example to illustrate the methodology, and then I’ll list a couple of errors I found using this technique.
A practical guide for debugging using perft testing
Step 1 - Identify the position
We need to run a perft test on a set of positions we chose and identify the problematic ones. In this example, we found a mismatch in the number of moves for this position.
![]() |
---|
Perft Test results |
Our code is generating more moves at depth 4 than expected. We need to test this position further to find out the issue.
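As a rough sketch of how this first step could be automated, the snippet below runs a small suite of positions and reports the shallowest depth at which each one fails. The node counts are the published ones from the Perft Results page [2]. The `perftFn(fen, depth)` callback and the `findFirstMismatches` name are assumptions made for illustration; they stand in for whatever adapter you write around your own engine.

```javascript
// Reference node counts for depths 1..4, taken from the Perft Results page [2].
const PERFT_SUITE = [
    {
        name: "Initial position",
        fen: "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1",
        expected: [20, 400, 8902, 197281],
    },
    {
        name: "Kiwipete",
        fen: "r3k2r/p1ppqpb1/bn2pnp1/3PN3/1p2P3/2N2Q1p/PPPBBPPP/R3K2R w KQkq - 0 1",
        expected: [48, 2039, 97862, 4085603],
    },
];

// perftFn(fen, depth) is a hypothetical adapter around your own engine.
// Returns one entry per failing position, at the shallowest failing depth.
function findFirstMismatches(perftFn, suite = PERFT_SUITE) {
    const mismatches = [];
    for (const { name, fen, expected } of suite) {
        for (let depth = 1; depth <= expected.length; depth++) {
            const actual = perftFn(fen, depth);
            if (actual !== expected[depth - 1]) {
                mismatches.push({ name, fen, depth, expected: expected[depth - 1], actual });
                break; // the shallowest failing depth is the one worth debugging
            }
        }
    }
    return mismatches;
}
```

Each entry in the returned array gives a position and a depth to start the drill-down from.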
Step 2 - Perform a single-depth test in your program
We’ll use the perftTest function we created in the last devlog to output results categorized by moves, similar to Stockfish.
To do this, we will create a new board object using the FEN representation of the tested position. We can print the board to the console and verify it is the same position.
![]() |
---|
Creating a new board |
Now we run the perftTest function. The first depth that showed incorrect results was depth 4. We’ll start from that depth.
![]() |
---|
Perft Test results for depth 4 |
Step 2 - Perform a single-depth test in Stockfish
We use the position command with the fen keyword to indicate that we are passing a FEN string to set up the board. Inside the FEN, the w field indicates that White is to move, and the KQkq field indicates that both colors have castling rights on both sides. Again, we can print the board using the d command and check that everything is set up correctly.
![]() |
---|
Setting up the board in Stockfish |
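To make the FEN fields a bit more tangible, here is a minimal sketch that splits a FEN string into its six space-separated fields. The function name and the shape of the returned object are my own choices for illustration, and the sketch does not validate the piece placement.

```javascript
// Split a FEN string into its six fields:
// placement, side to move, castling rights, en passant square,
// halfmove clock, and fullmove number.
function parseFenFields(fen) {
    const [placement, sideToMove, castling, enPassant, halfmove = "0", fullmove = "1"] =
        fen.trim().split(/\s+/);
    return {
        placement,
        whiteToMove: sideToMove === "w",
        castling: {
            whiteKingSide: castling.includes("K"),
            whiteQueenSide: castling.includes("Q"),
            blackKingSide: castling.includes("k"),
            blackQueenSide: castling.includes("q"),
        },
        enPassant: enPassant === "-" ? null : enPassant,
        halfmoveClock: Number(halfmove),
        fullmoveNumber: Number(fullmove),
    };
}
```

Printing the parsed fields next to the board is a quick way to confirm that both programs start from the same side to move and the same castling rights.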
Next, we use the go perft command along with the depth value to run the test.
![]() |
---|
Perft Test results for depth 4 in Stockfish |
Step 3 - Compare results
The next step is to look for differences in the numbers generated by both programs. This will help us pinpoint the starting moves that lead to incorrect results in the following moves.
This task might be easy for our example because we only have 6 starting moves to compare. However, when we have a list of more than 10 starting moves, comparing them one by one will surely become tedious and difficult to do.
For this reason, we’ll remember that we are in the 21st century and modern technology is available to us.
Wise words. Taken from [1] |
We can use a spreadsheet application like Excel to paste and format the results nicely so we can make a comparison.
![]() |
---|
Perft test results in Excel |
I used the feature Text to Columns on the Data tab to separate moves and numbers into two columns. Then, on the same tab, I used Sort A to Z to order moves alphabetically and be able to compare them side by side.
![]() |
---|
Comparison of perft Test results in Excel. |
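If you would rather stay in code than open a spreadsheet, the same comparison can be sketched in a few lines. This is only an illustration: it assumes each result line looks like `d2d4: 560` (Stockfish style) or `d2d4 560` (the style our perftTest prints), and the function names are made up for this example.

```javascript
// Parse per-move perft output into a Map of move -> node count.
// Lines that don't look like "<move>: <count>" or "<move> <count>"
// (for example "Nodes searched: ...") are ignored.
function parseDivide(text) {
    const counts = new Map();
    for (const line of text.split(/\r?\n/)) {
        const match = line.trim().match(/^([a-h][1-8][a-h][1-8][nbrq]?):?\s+(\d+)$/i);
        if (match) counts.set(match[1].toLowerCase(), Number(match[2]));
    }
    return counts;
}

// Return only the moves whose counts differ, sorted alphabetically.
// A move missing from one side is treated as a count of 0.
function compareDivide(oursText, referenceText) {
    const ours = parseDivide(oursText);
    const reference = parseDivide(referenceText);
    const moves = [...new Set([...ours.keys(), ...reference.keys()])].sort();
    return moves
        .map((move) => {
            const a = ours.get(move) ?? 0;
            const b = reference.get(move) ?? 0;
            return { move, ours: a, reference: b, difference: a - b };
        })
        .filter((row) => row.difference !== 0);
}
```

Passing the two pasted outputs to `compareDivide` and printing the result with `console.table` gives roughly the same view as the sorted spreadsheet, minus the rows that already match.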
Step 4 - Select a bad move and apply it on the board.
With the previous table, we can determine which moves are generating inaccurate results down the line. The goal of this debugging methodology is to find a sequence of moves that leads to a board position that contains an error. We’ll follow this sequence of moves until we reach the faulty position. At that point, diagnosing the problem will be a piece of cake.
To find the second move in our sequence, we need to make the first move on the board first. Any move that has a difference in numbers will suit. In this example, all moves lead to a buggy position, so we can choose any of them and move forward.
In our program, we’ll use the functions provided by the board object to generate moves for White and then apply one of them to the board. I selected a pawn push from d2 to d4. Then, we can print the board to verify the move was done correctly.
![]() |
---|
Making a move in our program |
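In Stockfish, we can make moves by adding the moves keyword at the end of the position command and then typing the sequence of moves. Stockfish plays those moves starting from the given FEN and updates the side to move and castling rights on its own. If you pass a FEN of the position after the move instead, make sure to adjust the fields for the playing color and castling rights accordingly.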
![]() |
---|
Making a move in Stockfish |
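Typing these commands by hand gets repetitive, so here is a small sketch that builds them as strings. It only uses the commands already shown in this devlog (position, d, and go perft); the helper name and its parameters are assumptions for illustration.

```javascript
// Build the Stockfish commands for one perft run.
// fen:   FEN of the position BEFORE the listed moves
// moves: moves played so far, in coordinate notation (e.g. "d2d4", "b2a1r")
// depth: depth to pass to "go perft"
function buildStockfishCommands(fen, moves, depth) {
    const movesPart = moves.length > 0 ? ` moves ${moves.join(" ")}` : "";
    return [
        `position fen ${fen}${movesPart}`,
        "d", // print the board to double-check the setup
        `go perft ${depth}`,
    ];
}
```

The returned lines can be pasted into the Stockfish console one after another.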
Step 5 - Perform a test for the next depth
Performing the same process as before will give us the next set of moves to compare. However, since we made a move on the board, we effectively went one level deeper into the game tree. We only have 3 levels left to explore (including the current one). For this reason, the next perft test must be done for depth 3.
![]() |
---|
Perft Test results for depth 3 |
Notice that the test was performed with Black to move, because White has already played the first move of the sequence.
![]() |
---|
Perft Test results for depth 3 in Stockfish |
![]() |
---|
Perft Test results comparison for depth 3 in Excel |
For depth 3, most moves have a difference of 0, which means they don’t create inaccuracies later. However, the moves b2a1r and b2b1r show a difference in the number of moves.
Both moves represent a black pawn reaching the 1st rank and promoting to a rook. We could already interpret from this that something is wrong with the way our move generation considers promotion. Furthermore, an error with castling might be occurring since the moves involve rooks.
This might be enough information for us to look at the code and spot an error. But if we are still clueless, the surest way to proceed is to go deeper into the game.
I selected b2a1r as the second move.
Step 6 - Repeat until you reach the last depth
We’ll repeat the same process over and over until we reach the last level of the tree (the fourth one in this example): select a bad move, make it on the board, run the test one depth lower, compare the results, and adjust parameters like the playing color and castling rights accordingly.
![]() |
---|
Perft Test results comparison for depth 2 in Excel |
Again, we can select any move here that has a non-zero difference. I selected a4b3.
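The repetition in Steps 2 to 6 is mechanical enough that it could be scripted. The following is a minimal sketch under some assumptions: `divideOurs` and `divideReference` are hypothetical callbacks that take the moves played so far plus a depth, and return a Map of move to node count (for instance, by calling perftTest and by parsing Stockfish's output). The function simply follows the first move whose counts differ.

```javascript
// Follow mismatching moves down the tree until the faulty position is reached.
// Returns the line of moves played, the depth where it stopped, and the
// per-move differences found there.
function findFaultyLine(divideOurs, divideReference, startDepth) {
    const line = []; // moves played so far from the tested position
    for (let depth = startDepth; depth >= 1; depth--) {
        const ours = divideOurs(line, depth);
        const reference = divideReference(line, depth);
        const allMoves = [...new Set([...ours.keys(), ...reference.keys()])].sort();
        const differences = allMoves
            .map((move) => ({
                move,
                ours: ours.get(move) ?? 0,
                reference: reference.get(move) ?? 0,
            }))
            .filter((row) => row.ours !== row.reference);

        // Nothing differs here, so there is nothing left to follow.
        if (differences.length === 0) return { line, depth, differences };

        // A move that exists on only one side is already the bug we are after.
        const oneSided = differences.some(
            (row) => !ours.has(row.move) || !reference.has(row.move)
        );
        if (depth === 1 || oneSided) return { line, depth, differences };

        line.push(differences[0].move); // go one level deeper along a bad move
    }
    return { line, depth: 0, differences: [] };
}
```

The returned line is the sequence of moves to replay on the board, and the differences at the final depth are the candidates to inspect by hand.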
Step 7 - Look for the incorrect moves at the last depth
When there’s only 1 depth left, we have reached a position that creates inaccurate results. All that is left is to determine which moves are missing or extra, and why.
From the previous table, we observe that most positions generate 1 extra move. We have to look for that extra move and figure out why the move generation algorithm is producing it.
![]() |
---|
Perft Test results for depth 1 in Stockfish |
At depth 1, every move counts as exactly 1 node, which is the position reached by the move itself. Stockfish shows that this position has 38 possible legal moves.
![]() |
---|
Perft Test results for depth 1 |
The length of the array of moves is 39, which is 1 more than what Stockfish calculated. In this case, we can easily detect that a queen-side castling move is repeated, which shouldn’t be possible.
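Spotting a repeated move by eye works for short lists, but a few lines of code can do it for us. The sketch below assumes both move lists are arrays of strings in the same notation (for example `"e1c1"`); the function name is made up for this example.

```javascript
// Compare two depth-1 move lists.
// duplicated: moves our program generated more than once
// extra:      moves only our program generated
// missing:    moves only the reference generated
function diffMoveLists(ours, reference) {
    const tally = (moves) => {
        const counts = new Map();
        for (const move of moves) counts.set(move, (counts.get(move) ?? 0) + 1);
        return counts;
    };
    const ourCounts = tally(ours);
    const referenceCounts = tally(reference);
    return {
        duplicated: [...ourCounts].filter(([, n]) => n > 1).map(([move]) => move),
        extra: [...ourCounts.keys()].filter((move) => !referenceCounts.has(move)),
        missing: [...referenceCounts.keys()].filter((move) => !ourCounts.has(move)),
    };
}
```

A repeated castling move like the one in this example would show up in `duplicated`, while `extra` and `missing` stay empty.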
Step 8 - Diagnose the problem
From the sequence of moves we performed, we can infer that the problem lies in how the algorithm calculates castling moves after a pawn promotes to a rook.
After carefully pondering and looking at the code, I realized that my castling calculations didn’t consider the possibility of new rooks being added to the board. I reasoned that if castling rights hadn’t been broken so far, the rooks must be in the corners of the board and they could castle with the king. This is true until you add a new rook in a square different from the corners. Even if this new rook has not moved and castling rights are not lost after a promotion, this rook cannot castle with the king.
Step 9 - Solve the problem
The solution was to check that rooks were in the corners of the board. If they aren’t, they definitely cannot castle. We explained this piece of code in the devlog about castling.
```javascript
class Rook extends Piece {
    // A rook can only castle from one of its color's corner squares.
    isOnInitialSquare() {
        let isRookOnInitialSquare = this.color === E_PieceColor.White ?
            (this.rank === 1 && this.file === 1) || (this.rank === 1 && this.file === 8) :
            (this.rank === 8 && this.file === 1) || (this.rank === 8 && this.file === 8);
        return isRookOnInitialSquare;
    }
}
```
![]() |
---|
Addition to code to fix the bug (in green) |
Example 1. Promotion not debugged properly
I found an error when performing a test on the following position.
This position is known as Kiwipete and it’s notable for helping find many bugs [2]. After performing a full perft test, there was a mismatch at depth 4.
![]() |
---|
Perft test results for Kiwipete |
By following the steps previously explained, I arrived at depth 4 by going a1b1, h3g2, and a2a3, resulting in this position:
Then I generated legal moves and started looking for inconsistencies.
![]() |
---|
Legal moves generated by the program |
![]() |
---|
Legal moves generated by Stockfish |
My program generates 47 moves and Stockfish 53, so 6 moves are missing from my program.
If we look at the position reached at depth 4, Black can perform two promotion moves: g2h1 and g2g1. These moves are generated by our program. However, Stockfish shows something a little different.
![]() |
---|
Promotions generated by Stockfish |
Stockfish also generated both promotions, but 4 different versions of each one. Stockfish acknowledges the fact that a pawn can be promoted to a knight, a bishop, a rook, and a queen. Therefore, it generates a separate promotion for each type. This is denoted by the letter at the end of the move (r for rook, b for bishop, n for knight, q for queen). Our program does not do this. It treats a promotion as a single move and the information of the new piece is added later. That’s why there’s a difference in the number of moves.
This made me realize that the testing code was not handling promotions correctly. The number of possible moves after promoting to a queen differs from the number of moves after promoting to a knight. To properly perform a perft test when there’s a promotion, we need to consider all 4 cases.
There are two ways to solve this. The first is to generate 4 moves for every promotion in the move generation algorithm. The second is to change how the testing code counts promotions. I didn’t want to change how moves are generated because that involves tweaking other parts of the program. Rewriting the testing code is easier because it is loosely coupled to the rest of the program.
```javascript
function perftTest(board, depth, debug = false, playingColor = E_PieceColor.White) {
    if (depth === 0) {
        return 1;
    }

    let moves = board.generateMoves(playingColor);
    let numberOfPositions = 0;

    for (let move of moves) {
        // Promotions are counted once per piece type by a dedicated function.
        if (move.flag === E_MoveFlag.Promotion) {
            numberOfPositions += perftTestPromotion(move, board, depth, debug, playingColor);
            continue;
        }

        board.makeMove(move);
        playingColor = OppositePieceColor(playingColor);

        let positions = perftTest(board, depth - 1, false, playingColor);
        numberOfPositions += positions;

        board.unmakeMove();
        playingColor = OppositePieceColor(playingColor);

        if (debug) {
            console.log(MoveToString(move) + " " + positions + "\n");
        }
    }
    return numberOfPositions;
}

function perftTestPromotion(promotion, board, depth, debug, playingColor = E_PieceColor.White) {
    let typesToPromote = [E_PieceType.Knight, E_PieceType.Bishop, E_PieceType.Rook, E_PieceType.Queen];
    let numberOfPositions = 0;

    // Count the same promotion move once for each piece the pawn can become.
    for (let pieceType of typesToPromote) {
        let promotionString = MoveToString(promotion) + pieceColorTypeToKey(playingColor, pieceType);
        promotion.newPieceType = pieceType;

        board.makeMove(promotion);
        playingColor = OppositePieceColor(playingColor);

        let positions = perftTest(board, depth - 1, false, playingColor);
        numberOfPositions += positions;

        board.unmakeMove();
        playingColor = OppositePieceColor(playingColor);

        if (debug) {
            console.log(promotionString + " " + positions + "\n");
        }
    }
    return numberOfPositions;
}
```
In the perftTest function covered before, we add a special function called perftTestPromotion that counts moves for promotions. Inside this function, we go over each possible piece a pawn can be promoted to, and we run a perft test one level deeper as usual.
![]() |
---|
Legal moves generated after fixing the error |
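For comparison, the first option (the one I didn’t take) would expand every promotion into four separate moves inside the move generator. The snippet below is only a sketch of that idea, not code from this project: the move objects with `from`, `to`, and `isPromotion` fields, and the `promoteTo` property, are hypothetical.

```javascript
// Suffixes used in coordinate notation: knight, bishop, rook, queen.
const PROMOTION_SUFFIXES = ["n", "b", "r", "q"];

// Replace each promotion with four moves, one per piece type.
// Non-promotion moves are passed through unchanged.
function expandPromotions(moves) {
    return moves.flatMap((move) =>
        move.isPromotion
            ? PROMOTION_SUFFIXES.map((suffix) => ({ ...move, promoteTo: suffix }))
            : [move]
    );
}
```

With this approach the plain perftTest loop would count promotions correctly without a special case, at the cost of touching every part of the program that consumes generated moves.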
Example 2. Protected pieces
Let’s look at a more challenging example.
We start again at the Kiwipete position and perform the sequence a1b1, f6d5, and e5f7. The result is this position:
Here’s the number of moves our program and Stockfish output, respectively:
![]() |
---|
Legal moves generated by the program |
![]() |
---|
Legal moves generated by Stockfish |
An extra move. What a surprise!
The comparison reveals that our program generates the move e8f7. This move is illustrated in the following board. The king is trying to move to the red square and capture the white knight.
Capturing the white knight would mean stupidly walking the black king into an attack by the white queen, which makes the move illegal.
From the fact that this move is made by the king and that the captured piece is being protected by the queen, it is evident that the problem lies in how the program deals with protected pieces.
However, the reason why this particular case is not considered correctly is not clear. Indeed, it took me quite some time of reasoning to find the answer.
What’s the error?
Initially, the calculation of protected pieces was done along with the calculation of pinned pieces. If you think about it, both concepts are fairly similar from the perspective of sliders. Take a look at the following boards:
These scenarios are almost identical except for the pawn’s color. In the first position, the white pawn is a pinned piece because it’s protecting the king from the rook. In the second position, the black pawn is a protected piece because the rook is protecting the pawn from the king.
These similarities made me think I could kill two birds with one stone. Can’t blame myself though. The brain is wired to look for patterns. Unfortunately, this time it created an error in my logic.
This is known as a “psychological set” and describes how we tend to expect new phenomena (protected pieces) to behave similarly to phenomena we have seen before (pinned pieces). Therefore, we overlook important differences because we assume they operate the same way and that we already understand them. [3]
Stupid brain. Taken from [4] |
Here’s how the code worked:
```javascript
class MoveGenerator {
    generateMoves(board, piecesDict, pieceColor) {
        // ... previous code ...
        // if there's no intersection
        if (intersection === 0n && !isSliderBesidesKing) {
            // There are two or more pieces in between slider and king.
            // Therefore, slider is not checking king and there are no pinned pieces.
        } else if (intersection === 0n && isSliderBesidesKing) {
            checkers.push(slider);
        } else if ((intersection & board.getEmptySpaces()) === emptySpaceBetweenKingAndSlider) {
            // There are no pieces in between slider and king. Slider is distant-checking the king.
            checkers.push(slider);
        } else {
            // There's one piece in between slider and king.
            // Check whether that piece is an ally of the king.
            let isPieceAnAlly = (intersection & board.getOccupied(king.color)) > 0n;
            if (isPieceAnAlly) {
                // piece is pinned
                moveFilterForPinnedPieces[intersection] = rayFromSliderToKing | slider.position;
            } else {
                // piece is an enemy, so the slider protects it
                protectedPieces |= intersection;
            }
        }
    }
}
```
If you remember, this code is similar to the one we reviewed when talking about using opposite-ray attacks to calculate pinned pieces. However, here we are calculating checkers, pinned pieces, and protected pieces at once.
In the last else statement, we conclude that there’s one piece between the king and the slider. If this piece is an ally, then we have a pinned piece. If it is an enemy piece, the piece is protected by the slider from being captured.
There’s nothing wrong with this logic. The mistake is found earlier in the code:
```javascript
class MoveGenerator {
    generateMoves(board, piecesDict, pieceColor) {
        // ... previous code ...
        let slider = enemyPiece;
        let sliderRays = slider.getSlidingRays();
        let rayFromSliderToKing = GetRay(slider.rank, slider.file, king.rank, king.file, false, true);

        // if there's no possible ray between slider and king
        if (rayFromSliderToKing === 0n) {
            // king is not within slider's range, do nothing.
            continue;
        }
        // ... following code ...
    }
}
```
The purpose of this piece of code is to omit calculations for any slider that cannot reach the king. For example, an enemy rook that is not in the same file or rank as the king cannot attack it, so there won’t be any pinned pieces.
This works well for pinned pieces, but does not apply to protected pieces. A demonstration of that is the incorrect move we found earlier.
Here the queen cannot reach the king within its attacking directions. However, that doesn’t mean the white knight is not a protected piece.
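To see this in isolation, here is a tiny sketch that computes every square a slider covers, independently of where the king is. It deliberately uses plain coordinates and a Set instead of the bitboards used in the program, and it places only the three pieces that matter for this bug (queen on f3, knight on f7, king on e8). All names are made up for this example.

```javascript
// The eight directions a queen slides in, as [file step, rank step].
const QUEEN_DIRECTIONS = [
    [1, 0], [-1, 0], [0, 1], [0, -1],
    [1, 1], [1, -1], [-1, 1], [-1, -1],
];

// Convert "f3" to { file: 6, rank: 3 } and back.
function toCoords(square) {
    return { file: square.charCodeAt(0) - 96, rank: Number(square[1]) };
}
function toSquare(file, rank) {
    return String.fromCharCode(96 + file) + rank;
}

// Walk each ray from the slider and stop at the first occupied square.
// That square is included: an enemy there is attacked, an ally is protected.
function sliderCoverage(from, directions, occupied) {
    const { file, rank } = toCoords(from);
    const covered = new Set();
    for (const [fileStep, rankStep] of directions) {
        let f = file + fileStep;
        let r = rank + rankStep;
        while (f >= 1 && f <= 8 && r >= 1 && r <= 8) {
            const square = toSquare(f, r);
            covered.add(square);
            if (occupied.has(square)) break;
            f += fileStep;
            r += rankStep;
        }
    }
    return covered;
}

const occupied = new Set(["f3", "f7", "e8"]);
const coverage = sliderCoverage("f3", QUEEN_DIRECTIONS, occupied);
console.log(coverage.has("f7")); // true: the knight is protected by the queen
console.log(coverage.has("e8")); // false: the queen has no ray to the king
```

The knight’s square is covered even though no ray connects the queen and the king, which is exactly the case the early `continue` was skipping.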
Fundamentally, the error was stringing these two pieces of code together, one after the other. Therefore, the solution comes down to modifying the sequence of statements.
What is the solution?
Before solving the error, the generateMoves function was a big amalgamation of code that was responsible for calculating checkers, pinned pieces, and protected pieces, all in one go.
![]() |
---|
generateMoves() before the fix. |
To tame this abomination, I created three functions that deal with each task separately.
![]() |
---|
generateMoves() after the fix. |
Looks better, doesn’t it? That is SRP [5], baby! Distributing responsibilities into three little functions not only helps keep the code neat and organized, it is also mentally less demanding: worrying about each thing separately is easier than keeping track of everything at the same time.
No one’s skull is really big enough to contain a modern computer program (Dijkstra 1972), which means that we as software developers shouldn’t try to cram whole programs into our skulls at once; we should try to organize our programs in such a way that we can safely focus on one part of it at a time. The goal is to minimize the amount of a program you have to think about at any one time. - Steve McConnell [3]
Also, SRP makes the code more maintainable and decoupled. If any function produces an error, we know where to find and correct it. Changing code from one function won’t affect the others.
Conclusion
As we can see, this methodology is really helpful because it isolates problematic positions from a million possibilities and lets us find predictable and reproducible errors. Then, a careful analysis of the nature of the position or the characteristics of the extra moves will point us toward the cause of the problem.
What’s left is for you to follow these steps and debug your own program. Trust me, it will be a fun process, and resolving the errors will give you great satisfaction.
References
- [1] LMac4U. (2016, July 6). Spongebob Technology GIF. Tenor. https://tenor.com/es/view/spongebob-technology-patrick-hung-gif-5567497
- [2] Perft results. Perft Results - Chessprogramming wiki. (2024, March 15). https://www.chessprogramming.org/Perft_Results
- [3] McConnell, S. C. (2004). Code complete: A practical handbook of software construction (2nd ed.). Microsoft Press
- [4] Trendizisst_. (2020, May 17). Brain Trash GIF. Tenor. https://tenor.com/view/brain-trash-spongebob-squidward-dumb-gif-17233216
- [5] Wikimedia Foundation. (2024, December 29). Single-responsibility principle. Wikipedia. https://en.wikipedia.org/wiki/Single-responsibility_principle