How to make 1v1 arena balanced and fair

If I say "1v1 rated arena", the first thing that comes to mind is probably not "balance". It is no secret that some classes in WoW do significantly better in 1v1 than others. Not only are some classes just stronger overall in PvP, but certain classes are also very effective counters to other classes. This makes rated 1v1 a controversial prospect. In this article, I will present solutions that would enable a fair and balanced 1v1 rated arena system in WoW PvP.

Why rated 1v1 arenas?

There are two key arguments for rated 1v1 arenas. The first is to simply provide more types of rated PvP content for people to do. In WoW, we currently have less variety of rated PvP content than we did 10 years ago (due to the removal of 5v5). There are obvious benefits to Blizzard for increasing the types of content available to players (people spend more time playing the game). For players, having more stuff to do that is competitive and rewarding is beneficial (and there is a separate argument to be made for avoiding the LFG for one's mental well-being). The second reason for introducing 1v1 rated arenas is that it provides a very manageable way of learning the basics of PvP. A bursty meta such as the one we are experiencing in the first season of Shadowlands is far more noticeable at lower rating than at higher rating (where the gameplay slows down considerably), and so giving the option of a 1v1 can provide a less complex entry to rated PvP. The gear rewards from a 1v1 bracket would probably have to be a fair bit more modest than the other brackets to reflect the importance of team play. Nevertheless, rewards such as unique titles and/or mounts could provide strong incentives for players at all ratings.

Fairness and balance

Before we dive in, let me be clear about some definitions. The definition of "fair" I am going to adopt in the context of 1v1 is as follows: a system is fair if everyone has equal access to rewards irrespective of their class. In other words, 1v1 would only be fair if playing a weaker class did not automatically make you less likely to obtain rewards from 1v1. Equally skilled players playing different classes should, generally, obtain the same rewards from a 1v1 rated arena bracket.

In the context of 1v1, I am also adopting the definition of "balance" to mean balanced matchmaking. Balanced matchmaking means that each player should be matched with an opponent who they have a 50% chance to win against, irrespective of their class. Indeed, this is a fundamental assumption of Elo systems: two Chess players with the same rating (e.g. 1600) will each win 50% of the time. We will return to how we can use this definition to create a system of balanced matchmaking later in the article.

Making a fair system

A fair system is a system that ensures that any class played with equal skill gives equal access to PvP rewards. Both proposals below assume that the skill distribution of players is roughly the same for every class.

Proposal 1: Class-specific ladders

The simplest way of achieving fairness is just to have a class-specific ladder, with rewards tied to a character's relative performance on that ladder. This is an idea that has been raised numerous times, but was perhaps most recently discussed at the Allcraft PvP summit prior to the release of the Shadowlands expansion. On class-specific ladders, an Arms Warrior will not be competing against Windwalker Monks; instead, an Arms Warrior will be competing against other Warriors on the same ladder. Rewards might be given based on percentiles on the ladder, for example to players reaching the 80th percentile (roughly the current percentile for 1600 rating in 2v2).

A major advantage of this approach is its simplicity: Blizzard could implement this fairly easily with only a small number of adaptations to the existing PvP system. A major drawback of a class-specific ladder is that any rewards would have to be calculated based on the relative rankings on the ladder (e.g. percentiles), since each class would have a completely different rating distribution. Thus, you couldn't give rewards based on 1800 rating in 1v1 since the ratings are, by definition, not comparable across classes. This clashes with Blizzard's general approach to PvP gearing, which is to gate PvP rewards behind rating.
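To make the percentile idea concrete, here is a minimal Python sketch. The ladder contents, the function names and the 80th percentile cutoff are illustrative assumptions and not anything Blizzard has published; the point is only that the same rating can earn a reward on one class ladder and not on another.

```python
from bisect import bisect_left


def percentile_rank(rating, ladder_ratings):
    """Percentage of players on the ladder with a rating strictly below `rating`."""
    ordered = sorted(ladder_ratings)
    below = bisect_left(ordered, rating)
    return 100.0 * below / len(ordered)


def earns_reward(rating, ladder_ratings, cutoff_percentile=80.0):
    """True if `rating` sits at or above the cutoff percentile of its own class ladder."""
    return percentile_rank(rating, ladder_ratings) >= cutoff_percentile


# Hypothetical ladders: the ratings are made up for illustration only.
warrior_ladder = [1200, 1300, 1350, 1400, 1450, 1500, 1550, 1600, 1700, 1900]
monk_ladder = [1400, 1500, 1550, 1600, 1650, 1700, 1750, 1800, 1900, 2100]

print(earns_reward(1700, warrior_ladder))  # True: 80% of Warriors are rated lower
print(earns_reward(1700, monk_ladder))     # False: only 50% of Monks are rated lower
```

A 1700 rating is worth a reward on the (weaker) Warrior ladder here but not on the Monk ladder, which is exactly why a fixed rating threshold cannot be shared across class-specific ladders.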

Proposal 2: Class handicaps

An alternative to the class-specific ladder is to introduce an overall class handicap. Suppose that the average rating of all players is 1500. Demon Hunters have an average rating of 1400, while Monks have an average rating of 1600. The class handicap of a Monk in this case would be -100, whereas the class handicap of a Demon Hunter would be +100. Your adjusted rating, and the basis for calculating any rewards and your position on the ladder, would factor in the fact that you are playing an underpowered (or overpowered) class in 1v1.

A slightly more sophisticated version of this would also allow the rating handicap to change with rating. For example, it is plausible that a Demon Hunter may have a +100 rating handicap at 1500 rating, but a +200 rating handicap at 2000 rating. This would be a straightforward extension of the class handicap model.
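One simple way to let the handicap vary with rating is to interpolate between a few anchor points. The sketch below assumes linear interpolation and uses the sign convention from the previous paragraph, where an underpowered class receives a positive handicap. The anchor values, the function name and the choice of linear interpolation are all assumptions made for illustration.

```python
def handicap_at(rating, anchors):
    """Piecewise-linear handicap, held constant outside the anchor range.

    `anchors` is a list of (rating, handicap) pairs.
    """
    anchors = sorted(anchors)
    if rating <= anchors[0][0]:
        return float(anchors[0][1])
    if rating >= anchors[-1][0]:
        return float(anchors[-1][1])
    for (r0, h0), (r1, h1) in zip(anchors, anchors[1:]):
        if r0 <= rating <= r1:
            t = (rating - r0) / (r1 - r0)  # position between the two anchors, 0..1
            return h0 + t * (h1 - h0)


# Hypothetical Demon Hunter anchors: +100 at 1500 rating, +200 at 2000 rating.
demon_hunter_anchors = [(1500, 100), (2000, 200)]

print(handicap_at(1750, demon_hunter_anchors))  # 150.0
print(handicap_at(1300, demon_hunter_anchors))  # 100.0 (clamped below the first anchor)
```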

The figure below shows the rating distribution for Demon Hunters and Monks in 2v2. This is of course not the same data that would be available in 1v1, but it is the closest thing that is currently available. The left plot shows the actual rating distribution, and we can see that Demon Hunters have, on average, a significantly lower rating (on account of being a generally weaker 2v2 class at the minute).

Demon Hunters have an average rating of 1340, while Monks have an average rating of 1452. The average rating across all players is 1406. Thus, for this 2v2 data, the handicap for Demon Hunters would be +66, and the handicap for a Monk would be -46. The right-hand plot shows what happens if we apply the class-based handicaps to the rating distribution. These distributions now look essentially the same, aside from a few peaks around reward ratings (1600, 1800, 2100, 2400) that are now misaligned.
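The arithmetic behind these handicaps is just "overall average minus class average". A short Python sketch using the 2v2 averages quoted above (the function names are my own and purely illustrative):

```python
def class_handicaps(mean_by_class, overall_mean):
    """Fixed handicap per class: overall average rating minus the class average."""
    return {cls: overall_mean - mean for cls, mean in mean_by_class.items()}


def adjusted_rating(rating, cls, handicaps):
    """Rating used for rewards and ladder position after applying the class handicap."""
    return rating + handicaps[cls]


# Averages quoted in the text for the 2v2 data.
handicaps = class_handicaps({"Demon Hunter": 1340, "Monk": 1452}, overall_mean=1406)

print(handicaps)                                         # {'Demon Hunter': 66, 'Monk': -46}
print(adjusted_rating(1600, "Demon Hunter", handicaps))  # 1666
print(adjusted_rating(1600, "Monk", handicaps))          # 1554
```

Under this scheme a 1600 rated Demon Hunter would be treated as 1666 for reward purposes, while a 1600 rated Monk would be treated as 1554.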

It therefore seems that a simple fixed rating class handicap could work in principle for 2v2, which provides some preliminary evidence that it might be a viable strategy for countering excessive class imbalances in 1v1. The advantage of this system over a class-specific ladder would be that you could put everyone on the same ladder, and give rewards based on adjusted rating rather than percentiles on a class-specific ladder.

Making a balanced system

A natural objection to rated 1v1 is that it is far too "compositionally" sensitive. An 1800 rated Warrior might only win against a similarly rated Mage 15% of the time, and an 1800 rated Mage might only win against a similarly rated Rogue 15% of the time. In extreme cases, this essentially becomes a game of rock-paper-scissors, which makes for pretty dire gameplay. The way that Elo rating systems are meant to work is that if two players with equal rating (e.g. Chess players) are playing against each other, they should each win 50% of the time. Due to WoW not being perfectly balanced around 1v1, this will generally not be the case once we factor in classes. To be clear, all things being equal, your average win rate would be 50%, but it would not be 50% against all classes. Certain classes counter other classes and we would get a wide distribution of win probabilities for different class vs class matchups.

Ideally, we would like to have a 1v1 system where each matchup is created such that there is a 50% win chance for both opponents. This would be a system that had balanced matchmaking. The way we can achieve this is with a matchup handicap. A matchup handicap is an answer to the question: at what rating X would a Rogue beat a Mage of 1600 rating 50% of the time? Under this system, a handicap would be given to all class-based matchups, and would be estimated from 1v1 arena data. For example, Mage vs Rogue might have a 150 rating handicap in favour of the Mage (so that a 1600 rated Mage would be considered equal to a 1450 rated Rogue). Mage vs Warrior might have a 120 rating handicap in favour of the Warrior (so that a 1600 rated Mage would be considered equal to a 1720 rated Warrior). A 1600 rated Mage that queues for 1v1 would therefore be matched with a 1450 rated Rogue, a 1720 rated Warrior, and so on. Rating change would then be calculated based on the matchup handicap-adjusted rating rather than your actual rating. Thus, a 1600 rated Mage facing a 1720 rated Warrior would gain 12 rating for a win and lose 12 rating for a loss (this is what you would gain or lose against an opponent of equal rating under the standard arena system). The figure below illustrates an example 1v1 handicap for Mages. Note that these values are hypothetical so if they don't make sense, don't worry; these would be estimated from actual 1v1 data using the method described in the appendix.
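The matchmaking and rating update described above can be sketched in a few lines of Python. The offsets are the hypothetical Mage values from the text. The K-factor of 24 is an assumption, chosen only because it reproduces the ±12 change for an even match, and the table and function names are made up for this example. The offset stored here is the negative of the handicap \(\theta\) used in the technical appendix.

```python
K = 24        # assumed K-factor: gives +12 / -12 when the expected score is 0.5
BETA = 400.0  # standard Elo scale parameter

# Hypothetical offsets: MATCHUP_OFFSET[(a, b)] is what gets added to a player's
# rating (class a) to obtain the rating of an evenly matched opponent (class b).
MATCHUP_OFFSET = {
    ("Mage", "Rogue"): -150,    # a 1600 Mage is considered equal to a 1450 Rogue
    ("Mage", "Warrior"): +120,  # a 1600 Mage is considered equal to a 1720 Warrior
}


def target_opponent_rating(rating, own_class, opp_class):
    """Rating of an opponent of `opp_class` against whom the win chance is 50%."""
    return rating + MATCHUP_OFFSET[(own_class, opp_class)]


def rating_change(rating, opp_rating, own_class, opp_class, won):
    """Elo update computed on the handicap-adjusted opponent rating."""
    adjusted_opp = opp_rating - MATCHUP_OFFSET[(own_class, opp_class)]
    expected = 1.0 / (1.0 + 10.0 ** ((adjusted_opp - rating) / BETA))
    return K * ((1.0 if won else 0.0) - expected)


print(target_opponent_rating(1600, "Mage", "Rogue"))         # 1450
print(target_opponent_rating(1600, "Mage", "Warrior"))       # 1720
print(rating_change(1600, 1720, "Mage", "Warrior", True))    # 12.0
print(rating_change(1600, 1720, "Mage", "Warrior", False))   # -12.0
```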

Fortunately, such a system is relatively straightforward to implement. I have worked out the mathematical details for those interested (see technical appendix at the end of the article). Like any solution it has advantages and disadvantages. A huge advantage is that each player would be going into an arena fight with a 50% win probability. This very likely improves the player experience of rated 1v1s quite significantly. A disadvantage of the system is that fairness is only guaranteed if the handicap model is an accurate one. I have derived a solution for a fixed rating handicap for a given class-vs-class matchup. This may not be the best model to describe how a 1v1 handicap actually works. As with the class handicap, it may be that a Warrior at 1500 rating has a +120 rating handicap against a Mage, whereas a Warrior at 2000 rating has a +190 rating handicap against a Mage. Fortunately, as with the class handicap, the matchup handicap can be extended to incorporate this type of behaviour relatively easily.

Rating-based class handicaps are essentially a way of factoring out differences that arise due to class imbalance. We're not balancing the classes; instead we're accounting for the fact that classes are imbalanced in our rating system. However, you rely on a decent amount of data for each class-vs-class matchup. The fact that there are only 66 possible non-mirror matchups in 1v1 means that this problem is very tractable for a game with as many active players as WoW. It also means that it's much more difficult to do for something like 2v2 (where you have 2556 possible non-mirror matchups) and 3v3 (where you have 132 132 possible non-mirror matchups).

Technical appendix

In traditional Elo, the win probability of player A when facing player B (with ratings \(r_A\) and \(r_B\), respectively) is given as $$ P(y=1 | r_A, r_B) = \frac{1}{1 + 10^{(r_B-r_A)/\beta}} $$ where \(y=1\) denotes a win for player A and \(\beta=400\) is a parameter that is normally fixed in Elo systems.

We can introduce a simple fixed rating handicap \(\theta\) for a given class versus class matchup: $$ P(y=1 | r_A, r_B; \theta) = \frac{1}{1 + 10^{(r_B - r_A + \theta)/\beta}}. $$ This is the mathematical form of the statement that a Warrior of 1800 rating should beat a Mage of 1800-\(\theta\) rating 50% of the time: the exponent vanishes exactly when \(r_B = r_A - \theta\). \(\theta\) can be either positive or negative. If the two classes are equally matched, \(\theta = 0\) and no adjustment is made.
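As a quick sanity check, the handicap-adjusted win probability can be written as a small Python function. The function name and example ratings are illustrative; the formula is the one given above.

```python
def win_probability(r_a, r_b, theta=0.0, beta=400.0):
    """Probability that player A beats player B under Elo with a matchup handicap theta."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a + theta) / beta))


# With no handicap, equal ratings give an even match.
print(win_probability(1800, 1800))             # 0.5

# With theta = 100, an 1800 rated player A is even with a 1700 rated player B ...
print(win_probability(1800, 1700, theta=100))  # 0.5

# ... and is the underdog against an equally rated player B (about 0.36).
print(win_probability(1800, 1800, theta=100))
```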

Given a dataset with rating data and observed match outcomes, we can calculate the likelihood of the parameter value \(\theta\).

For simplicity, let \(q^{(i)}\) denote the win probability of player A in match \(i\), given ratings \(r_A^{(i)}, r_B^{(i)}\) and handicap \(\theta\): $$ q^{(i)} = \frac{1}{1 + 10^{(r_B^{(i)} - r_A^{(i)} + \theta)/\beta}} $$ With \(y^{(i)} = 1\) if player A won match \(i\) and \(y^{(i)} = 0\) otherwise, the likelihood is $$ \mathcal{L}(\theta | y, r_A, r_B) = \prod_i (q^{(i)}) ^ {y^{(i)}} (1-q^{(i)})^{1-y^{(i)}} $$

The log likelihood is $$ \begin{align*} \log \mathcal{L} &= \sum_i y^{(i)} \log q^{(i)} + (1-y^{(i)}) \log (1-q^{(i)})\\ &= \sum_i \log\left( y^{(i)} q^{(i)} + (1-y^{(i)})(1-q^{(i)})\right) \end{align*} $$ where the second line holds because each \(y^{(i)}\) is either 0 or 1.

The log likelihood is concave in \(\theta\), so this is a well-behaved single variable optimisation problem, and in practice we can just do scalar maximisation on the log likelihood to estimate \(\theta\). However, having the derivative of the log likelihood allows us to do neat things like stochastic gradient descent (on the negative log likelihood) to estimate \(\theta\) on the fly. Using \(\frac{dq^{(i)}}{d\theta} = -\frac{\ln 10}{\beta}\, q^{(i)} (1-q^{(i)})\), we get $$ \frac {d \log \mathcal{L}}{d\theta} = \frac{\ln 10}{\beta} \sum_i \left( q^{(i)} - y^{(i)} \right). $$
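Below is a minimal Python sketch of the estimation, run on simulated data because no real 1v1 data exists. It uses the gradient \(\frac{\ln 10}{\beta}\sum_i (q^{(i)} - y^{(i)})\), which follows from differentiating \(\sum_i y^{(i)} \log q^{(i)} + (1-y^{(i)})\log(1-q^{(i)})\) with respect to \(\theta\). Since this gradient is decreasing in \(\theta\), the maximum can be found by bisection on its sign; that is my choice of scalar optimiser, not the only option. The simulation settings, the step size and all function names are assumptions for illustration.

```python
import math
import random

BETA = 400.0


def win_probability(r_a, r_b, theta, beta=BETA):
    """Handicap-adjusted Elo win probability for player A."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a + theta) / beta))


def log_likelihood(theta, matches):
    """Sum of log q for wins and log (1 - q) for losses."""
    total = 0.0
    for r_a, r_b, y in matches:
        q = win_probability(r_a, r_b, theta)
        total += math.log(q if y == 1 else 1.0 - q)
    return total


def gradient(theta, matches):
    """d(log likelihood)/d(theta) = (ln 10 / beta) * sum(q - y)."""
    return (math.log(10) / BETA) * sum(
        win_probability(r_a, r_b, theta) - y for r_a, r_b, y in matches
    )


def estimate_theta(matches, lo=-1000.0, hi=1000.0, tol=1e-6):
    """Bisection on the sign of the gradient (valid because it is decreasing in theta)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if gradient(mid, matches) > 0:
            lo = mid  # log likelihood still increasing: the maximum is to the right
        else:
            hi = mid
    return (lo + hi) / 2.0


def sgd_step(theta, r_a, r_b, y, step=2000.0):
    """One on-the-fly update after a single match (gradient ascent on the log likelihood)."""
    q = win_probability(r_a, r_b, theta)
    return theta + step * (math.log(10) / BETA) * (q - y)


def simulate_matches(theta_true, n, seed=0):
    """Made-up matches: opponents are drawn near the 'fair' rating r_a - theta_true."""
    rng = random.Random(seed)
    matches = []
    for _ in range(n):
        r_a = rng.uniform(1200, 2200)
        r_b = r_a - theta_true + rng.gauss(0, 100)
        y = 1 if rng.random() < win_probability(r_a, r_b, theta_true) else 0
        matches.append((r_a, r_b, y))
    return matches


matches = simulate_matches(theta_true=120.0, n=5000)
theta_hat = estimate_theta(matches)
print(round(theta_hat, 1))  # should land near the true value of 120
```

With a few thousand simulated matches the estimate should recover the true handicap to within a handful of rating points; `sgd_step` shows how the same gradient could be applied one match at a time instead of refitting on the whole dataset.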