I wrote in 2016 about tournament design in general. Today I’ll talk about specific designs and consider Dan Gant’s proposed tournament design.
I use words precisely. A “game” is a single game. A “match” is a sequence of games between the same 2 players. A “pairing” is the choice of which 2 players play a given game or match against each other. (I don’t say “matchup” here, but I reserve the word for race matchups, terran versus zerg and so on.)
With n players there are n * (n - 1) / 2 possible pairings. Each pairing is potentially independent; A might beat all other comers but lose to X, Y, or Z, and you don’t know until they play. One of the goals of a tournament is usually to declare “a winner” even though there may be no unambiguously strongest player—there may be cycles of A beats B beats C beats A, as in the Condorcet paradox. That’s only one reason that tournament design is hard, but it’s enough!
tournament designs
Here are some popular tournament designs. Many more exist. There are also many ways to combine them; these are simple tournament designs that can be plugged together as modules to create complex tournament designs.
Round robin means each player plays each other. The SSCAIT round robin phase is a double round robin where each player plays each other twice, and the CIG and AIIDE academic competitions are similar but with many more rounds. A round robin tournament collects evidence evenly across all pairings, and traditionally it also weights the evidence evenly: Winning a game against a weak competitor is 1 point, winning a game against a strong competitor is 1 point.
Knockout means that losing players are eliminated. In single elimination, whoever loses one game is out; in double elimination, two losses and out; there are higher variants. Knockout tournaments usually feature fixed brackets, where the pairings of winners and losers are laid out beforehand. If the players are seeded 1 to 10, then pairing 1 with 10, 2 with 9 and so on gives an advantage to the higher seeds, so they are more likely to meet in final rounds. A knockout samples few pairings and gives less evidence than other designs that the winner deserves it, or in other words, the luck of the pairing is a big factor.
Grouping the competitors into sub-tournaments can be one stage of a tournament. Pro Starcraft tournaments commonly have a “group stage” where the players are divided into groups of 4 who play among themselves. The top 2 finishers of the 4 move on to the next stage, which has a knockout format. Grouping selects some pairings to gather evidence about, and ignores others. Luck again; did you get the group of death?
Progressive elimination designs come in many shapes, but in general, the tournament runs through steps or stages, and after each step the weakest competitors are dropped. For example, the players play a round robin, then 1 or more of the competitors with the lowest scores are eliminated; slice off those rows and columns from the crosstable. The remaining players keep their surviving results in the crosstable and add to them by playing another round robin, and the process repeats. This kind of design collects some evidence about all pairings, and deliberately collects more about the top competitors; it cares more about the strongest competitors, which have the best chances to win.
The Swiss system attempts to identify a winner with fewer total games. A Swiss tournament has a fixed number of rounds, much less than the number of players, where each player plays one game per round. In the first round, players are normally paired by seeding order (if players are 1 to 10, pair 1 with 6, 2 with 7, on to 5 with 10). In later rounds, players are paired against others with the same score: Winners against winners, losers against losers. If the number of players with a given score is odd, one player has to be paired up or down; there are rules about how to do this fairly given the seeding order. Compared to designs that collect more evidence, a Swiss tournament has a higher chance of players ending up tied at the top. Swiss makes sense when the number of players is large and the number of games per player must be kept small; I would recommend it over 4-player groups in pro Starcraft qualification tournaments (the ones to choose players before the main tournament starts).
matches
Anywhere that I wrote “game” above, the players could play a match of multiple games instead. That’s common in Starcraft knockout tournaments: You are knocked out, or at least knocked down to the loser’s bracket, if you lose a best-of-k match for some odd number k. A k-round round robin can be seen as a single round robin of k-game matches. A match can be taken as a tournament between 2 competitors, and like a bigger tournament, there are different way to design it.
Best of k means that the first player to win over half of k games wins the match. The remaining games may not be played—in Starcraft, they aren’t; in other sports, they sometimes are.
Ahead by k means that the players play games until one of them pulls ahead of the other with a lead of k games. If the players are evenly matched, then the match might continue for a long time, especially if k is big. You collect more evidence to decide a close call, but you don’t know how long it will take.
Statistical decision procedures are also possible, though I’ve never seen them used in a public match—they’re hard to explain to viewers. The match continues until one player is significantly ahead by some chosen statistical criterion, functionally equivalent to, say, “A is at least 67% likely to be better than B given these results.”
the Dan Gant proposal
Dan Gant of purple fame proposes using match scoring rather than game scoring for a round robin tournament. My 2016 tournament design post points out that this would have changed the results of AIIDE 2015. The proposed details, to quote Dan:
To create a better event that’s pure round robin, I think the formula is this:
1. Bots are ranked on how many opponents they had winning matchups against
2. First tiebreaker: Head-to-head record against all tied opponents
3. Second tiebreaker: Overall win%
Dan says “matchup” where I would use “pairing”. I foreshadowed this above when I pointed out that a k-round round robin can be seen as a single round robin of k-game matches. Dan says, instead of adding up the game results to get a competitor’s score, add up the match results. In the SSCAIT round robin phase, k=2, so a bot scores 1 if it defeated an opponent 2-0 and otherwise scores 0. The idea is that we should only care about how many opponents a bot can defeat—to beat more enemies is better, even if you lose occasional games, and he believes that leads to better incentives for bot authors. It’s a reasonable argument. To be a top bot currently, you must score nearly perfectly against most weaker bots, and the effort to get that near-perfect score is not as interesting as the effort to beat your next competitor up. That’s the claim, and it makes sense to me. I can also see that the claim may be more appealing to top bot authors than to others. Steamhammer’s tricks like burrowed zerglings will be countered by top bots soon if not already, and may never be understood by lower-ranked bots, so I expect that the long-run effect will be to score higher against lower-ranked opponents; that doesn’t make the feature less fun or interesting to me.
With a score range of only 0 to 1 per opponent, rather than 0 to k, there will be more tied rankings, so he proposes tiebreakers. With k=2 in SSCAIT, the first tiebreaker amounts to “I had 4 ties and 3 losses, you had 3 ties and 4 losses, so I’m ahead.” With k=100 or more in AIIDE, the first tiebreaker will hardly matter, and only the second tiebreaker will count.
It’s simple to modify the scheme to avoid tied matches, if that’s something you care about. For SSCAIT, instead of a k-game match with k=2, go with a best-of-k match with k=3. Based on the rate of 1-1 matches in this year’s crosstable, I estimate this would require about 10% more games. There would still be ties in the rankings, and breaking the ties might be more complicated, so I don’t consider best-of-k a better choice overall for SSCAIT.
Scoring matches instead of games means throwing away information, at least until tiebreakers are applied. How much information you are throwing away depends on k, so the argument for this design also depends on k. If k is large, you’ll often know which player is better in a given pairing long before all k games are played. Given that you go with match scoring, that is an argument for choosing something other than a fixed k-game match for each pairing.
Different tournaments have different goals. Dan’s proposal is in part calling for the goal of different bot author incentives to be incorporated in tournament goals. It’s perfectly reasonable, and may help the community maintain interest and activity over the long haul. If some tournament organizers agree, then there should be more discussion about what kinds of design best meet the goal. I also think it’s good that we have different tournaments with different goals, so it’s good if not all organizers agree.
If a large academic tournament like CIG or AIIDE accepts the argument, then I have a different proposal. I think a progressive elimination tournament might perhaps meet both academic “let’s measure everything” goals and Dan’s “give authors good incentives” goals. These tournaments have n players with n in the tens and k rounds for k near 100. They are round robin. A progressive elimination design might go like this: Play 50 rounds with all players. Then drop the lower half and play 50 more rounds (so far this is like CIG 2016 except that first stage results should carry over). Repeat once or twice more. The result is fewer total games, all pairings have at least half as much evidence collected, and pairings between top competitors are much more deeply investigated. Lower-ranked competitors were eliminated so detailed results against them do not affect the ranking of the final winners.
Maybe a more gentle form of progressive elimination would be better. I feel it does make sense to sample more data from higher-ranked bots than from the tail, though perhaps not so much more. As far as I can see, some design in a progressive elimination style could meet more goals.