Puzzle testing

For the past two years, I’ve helped a friend prepare for his annual puzzlehunt. As the test coordinator, my role is a little harder to explain than if I were authoring puzzles or promoting the event. I found myself drawing a lot of parallels to tech industry work, so I decided to write up some thoughts on why I chose this approach and how well it’s been working so far.

Puzzlehunt formats vary, but in contrast to traditional puzzles like crosswords and sudoku, they don’t come with instructions on what you need to do. Also, unlike escape rooms, there isn’t generally a staff member attending to you at all times, actively facilitating your gameplay to keep things moving along. That means that in addition to checking for puzzle correctness, you need to find out where people tend to get stuck and then either revise the puzzle itself or write relevant hints. It’s a tricky balance between being vague enough to give the players room for creativity, exploration, and learning, and being specific enough that they can tell they’re on the right track and don’t flounder around too long.

What kind of testing does PuzzleBang need?

  • PuzzleBang is written over the course of a few months, but revisions may be needed up until shortly before launch. This means that test feedback needs to be turned around as fast as possible. At the same time, it needs to be nuanced enough to convey how and why players got stuck.
  • PuzzleBang runs in conjunction with a week-long university conference. Puzzles are released one day at a time. Pre-written hints are released throughout the first few hours after a puzzle goes live. This means that support might be needed at any point during the week.
  • PuzzleBang is published as a custom site. This means there are very few constraints on puzzle formats, which makes it possible to incorporate sound or video, provide arbitrary interactions, and potentially hide hints in the source code. It also means that the infrastructure itself needs to be tested, not just the puzzle content.

How do test sessions work?

  1. Prepare materials. Game Control prepares for testing by creating a private spreadsheet. Each puzzle gets its own tab. The tab is seeded with a direct link to the puzzle, all scheduled hints, the solution, and a description of any known issues. This allows us to test puzzle content without being blocked by site infrastructure work.
  2. Recruit participants. I reach out to friends to ask them if they’d be interested in giving feedback. The typical test solving team is 2-3 players who already know each other, have their own computers with reliable internet access, and have at least some experience with puzzles. I bring all the materials and I handle notes, so there’s no pre-work or lingering to-dos for the solvers: everything is self-contained within our allotted time.
  3. Set expectations. At the start of our video call, after chatting a bit, I give an official introduction to our testing session. More than anywhere else in the process, this is the part where I draw inspiration from the UX researchers I’ve worked with. I remind the solvers that the puzzles are a work in progress and that the goal isn’t to solve well or solve quickly, but to provide actionable feedback to Game Control. One hurdle is getting people to ask for hints: just about everyone wants to keep working on their own. I try to address this by emphasizing that asking for hints is necessary for testing the hints themselves.
  4. Facilitate the session. Throughout the test session, I need to remain actively engaged. There isn’t always much to say, so I take notes on their thought process just to keep myself focused and not wandering away to other browser tabs. It also helps me provide more details afterwards in case Game Control finds my summarized feedback unclear. Staying engaged also helps me volunteer appropriate hints when people might not think to ask for one, or redirect people if I know they’re going down a time-wasting path. This is always a tough balance, since I want to see their full thought process, but I don’t want them to get frustrated by losing a lot of time to something unproductive.
  5. Summarize insights. At the end of the session, when thanking participants for their time, I emphasize how much insight they’ve provided and how much more polished the puzzling experience will be as a result. This is true every single time: identifying pain points is the most obvious outcome, but occasionally a test session goes perfectly smoothly, which might actually indicate that the author should not revise any further and should leave the puzzle as-is. After they log off, I jot down everything else I remember in the spreadsheet, pull out specific action items, and tag Game Control to alert them to the feedback.

Some of what I’ve written above is aspirational: I’ve run sessions with solo players, and I don’t always remember to cover all the key points during my intro speech. I have a tendency to ramble when explaining PuzzleBang and puzzlehunts and how they differ from escape rooms. For the most part, I think participants still have a positive experience and Game Control still gets the feedback they need, even when things don’t quite go according to plan.

There’s one exception that I think can derail a session though, and the main reason I’m writing this post in the first place is to warn myself about it in the future: I can’t be in both roles at once. If I’m facilitating, I can’t also participate as a player. All sorts of anti-patterns come out of trying to do both, like trying to contact the puzzle author through side channels to ask for help, or getting really stuck on something and not knowing that it’s the wrong path. Even if that’s a realistic outcome during the actual hunt, it just doesn’t fit with the video call test approach. During the actual hunt, you can reasonably assume that you’re stuck because you just haven’t thought of the right thing yet. During a test session, you might be completely blocked because of a bug in the implementation.

Collaborating on PuzzleBang continues to be a rewarding experience. Having something concrete to work on makes it easier to reach out to someone online. It’s a shared experience that provides something to look at, not just another video call with disorienting levels of near-eye contact. Most importantly, it feels good to know that the puzzle author’s hard work will pay off because players will have a smoother experience.