This Friday, Scotland will be involved in a draw for a World Cup finals for the first time since 1997. We haven't qualified yet of course, but we will be represented by the UEFA Playoff Path A placeholder.
With up to six different continents represented in a World Cup, there are many constraints placed on the draw to ensure geographical separation - that is, to ensure teams from the same continent cannot be drawn into the same group as each other. In the case of Europe, there are thirteen teams to spread across eight groups, meaning there will have to be groups with two European teams. FIFA have stipulated that every group must have at least one European team, meaning five groups will have two.
The teams involved in the draw have been placed into four pots as shown below. Qatar, as the hosts, are placed in pot 1 and will automatically be drawn into Group A. The other 28 qualified teams are then allocated to the pots in accordance with their current FIFA ranking, and the three playoff placeholders are placed into pot 4.
Pot 1 | Pot 2 | Pot 3 | Pot 4 |
---|---|---|---|
Qatar (A) | Mexico (N) | Senegal (F) | Cameroon (F) |
Brazil (S) | Netherlands (E) | Iran (A) | Canada (N) |
Belgium (E) | Denmark (E) | Japan (A) | Ecuador (S) |
France (E) | Germany (E) | Morocco (F) | Saudi Arabia (A) |
Argentina (S) | Uruguay (S) | Serbia (E) | Ghana (F) |
England (E) | Switzerland (E) | Poland (E) | UEFA Playoff Path A (E) |
Spain (E) | USA (N) | South Korea (A) | Intercontinental Playoff (A/S) |
Portugal (E) | Croatia (E) | Tunisia (F) | Intercontinental Playoff (N/O) |
The letters in brackets after each team indicate the continent to which they belong, as follows: A = Asia, E = Europe, F = Africa, N = North America, O = Oceania, S = South America.
Note that in the case of the intercontinental playoff placeholders, these cannot be placed in the same group as a team from either of the continents involved. This means that in the case of the Asia-South America playoff, they cannot be placed in the same group as an Asian or a South American team. As there are seven such teams in the first three pots, it is possible that they will already know which group they are going to be placed in, before pot 4 is drawn!
Another such situation could arise for the European playoff placeholder. As every group must contain at least one European team, and as there is only one European team in pot 4, they might be forced into a group if it has no European team after the first three pots are drawn.
Quirky possibilities such as this are the cause of some surprising differences in the probability of Scotland (or Wales or Ukraine) drawing different teams. While it makes sense that Scotland are less likely to face a European team than a non-European team from any given pot, it is surprising to learn that we are nearly twice as likely to draw Mexico as to draw Uruguay!
The probabilities of the UEFA Playoff Path A winners drawing each team from each pot are shown below:
How do these differences in probability come about? Well, as mentioned above, there are situations where a team in pot 4 might be forced into a specific group if there is only one group they can legally go into, but there is more to it than that. To fully understand, we need to look at the draw procedure.
The four pots will be drawn in sequence from 1 to 4. To draw a pot, the balls are taken out one at a time, and each team drawn is placed in the first group they can legally be placed into. In some cases, it might appear that a team can legally be placed into a group, but they actually cannot because of the teams remaining in their pot.
For example, suppose we are drawing pot 4 and have three teams remaining - Saudi Arabia, Ecuador and Canada - to place into groups F, G and H. Group F contains Belgium, Denmark and Senegal. Group G contains France, Netherlands and Morocco. Group H contains Brazil, Mexico and Serbia. If we were to draw Saudi Arabia, it looks at first like there is no reason they cannot join Group F or Group G, each of which has an African and two European teams. However, if you look at Group H, you will see that it contains both a North and a South American team. This means that neither Ecuador nor Canada is allowed to be placed into this group, and so Ecuador and Canada will need to be placed into groups F and G. Saudi Arabia are therefore placed into Group H.
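The example above can be checked by brute force. The following is a minimal sketch, not the code used for the article: it assumes each group is written as a string of confederation letters, and the helper name `legal_assignments` is made up for illustration. It tries every way of handing the three remaining groups to the three remaining teams and keeps only the assignments with no confederation clash.

```python
from itertools import permutations

# Confederation letters of the teams already in each open group.
groups = {'F': 'EEF', 'G': 'EEF', 'H': 'SNE'}
# Confederation letter of each team still in pot 4.
teams = {'Saudi Arabia': 'A', 'Ecuador': 'S', 'Canada': 'N'}


def legal_assignments(groups, teams):
    """Return every team -> group assignment with no confederation clash."""
    names = list(teams)
    found = []
    for order in permutations(groups):
        # order[i] is the group given to names[i] in this candidate assignment.
        if all(teams[name] not in groups[group]
               for name, group in zip(names, order)):
            found.append(dict(zip(names, order)))
    return found


for assignment in legal_assignments(groups, teams):
    print(assignment)
```

Both surviving assignments put Saudi Arabia in Group H, which is why the draw has to skip groups F and G for them.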
Because of the sort of complications described above, there is no straightforward way to calculate the probabilities of any given two teams being drawn together - at least not once you get as far down as pot 4. In order to calculate these probabilities, we need to write a computer program to simulate the draw.
What follows is an in-depth look at how this simulation was performed and how the probabilities were calculated. This might be overly technical for readers with no programming or statistics background, but if you're interested, please read on.
I set out with the idea that if I could accurately simulate the method by which the draw is performed, I could run every possible permutation and then calculate the percentage of cases in which the UEFA pot 4 team drew a team from each continent in each group. Within this process, it is not necessary to represent each individual team separately; it is only necessary to represent the continents. This reduces the number of permutations significantly, but still leaves nearly twenty billion permutations for the full draw!
The simulation was implemented in Python. I decided I would define a function
called `draw_pot`, which would take the sequence of the teams coming out of the
pot (encoded as single letters in a string), and optionally the state of the
groups already (after previous pots have been drawn). This function then returns
the state of the groups after assigning these teams.
Below are shown some simple test cases showing the behaviour of this function (you can see the full test module here):
```python
def test_pot_1(self):
    self.assertEqual(
        draw_pot('AEEEEESS'),
        ('A', 'E', 'E', 'E', 'E', 'E', 'S', 'S')
    )

def test_pot_2_simple(self):
    self.assertEqual(
        draw_pot('EEEEESNN', ('A', 'E', 'E', 'E', 'E', 'E', 'S', 'S')),
        ('AE', 'EE', 'EE', 'EE', 'EE', 'ES', 'SN', 'SN')
    )

def test_pot_2_clash(self):
    self.assertEqual(
        draw_pot('EEEEESNN', ('A', 'E', 'E', 'E', 'E', 'S', 'E', 'S')),
        ('AE', 'EE', 'EE', 'EE', 'EE', 'SN', 'ES', 'SN')
    )
```
In the third test case, you can see that the sixth team drawn out (`S`,
representing South America) was placed in the seventh group as the sixth group
already contained a South American team.
The algorithm used to implement this function is as follows:
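1. Take the next team from the sequence and determine how many teams from its confederation are allowed in a single group.
2. Take the first available group, i.e. the first group that has not yet received a team from this pot.
3. If the group already contains that many teams from the confederation (the value from step `1`), then go back to step `2` with the next available group.
4. Otherwise, place the team in the group and call the function recursively for the remaining teams. If the recursive call fails, go back to step `2` with the next available group.

This function will fail if there is no group the team can legally be placed in,
meaning that, one level up on the call stack, we will go back to step `2`. This
simulates the behaviour described in the "Saudi Arabia" example above, where a
team looks like they can legally be placed in a group, but doing so would create
an impossible situation later on in the draw.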
In step `1`, when determining how many teams from the confederation are allowed
into a group, we check how many groups already have two European teams. If five
groups already have two European teams, we set the value to one even if the team
is from Europe. In this way we prevent a sixth group from having two European
teams, thus enforcing the policy of every group having at least one. The
implementation of this function, and all the functionality described in this
article, can be found here.
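For readers who want to see the shape of such a function, here is a minimal backtracking sketch written from the description above. It is not the linked implementation: the helper names `max_allowed` and `_place` are made up, failure is signalled with `None` internally and a `ValueError` at the top level, and the intercontinental playoff placeholders (which clash with two confederations at once) are not modelled.

```python
def max_allowed(team, groups):
    """Step 1: how many teams from this confederation may share one group."""
    if team != 'E':
        return 1
    doubles = sum(1 for group in groups if group.count('E') >= 2)
    # Once five groups hold two European teams, no further doubles are allowed.
    return 1 if doubles >= 5 else 2


def _place(sequence, groups, size):
    """Place the teams in order; return the final groups, or None on failure."""
    if not sequence:
        return groups
    team, rest = sequence[0], sequence[1:]
    limit = max_allowed(team, groups)
    for i, group in enumerate(groups):
        if len(group) != size:           # already has a team from this pot
            continue
        if group.count(team) >= limit:   # confederation clash
            continue
        candidate = groups[:i] + (group + team,) + groups[i + 1:]
        result = _place(rest, candidate, size)
        if result is not None:           # the rest of the pot fits as well
            return result
    return None                          # caller moves on to its next group


def draw_pot(sequence, groups=None):
    """Assign the teams in `sequence` to groups, in the order drawn."""
    if groups is None:
        groups = ('',) * len(sequence)
    groups = tuple(groups)
    size = min(len(group) for group in groups)  # size of groups still open
    result = _place(sequence, groups, size)
    if result is None:
        raise ValueError('no legal placement for this sequence')
    return result


# The "Saudi Arabia" example: five groups are complete ('XXXX' is filler),
# and an Asian, a South American and a North American team remain.
groups = ('XXXX',) * 5 + ('EEF', 'EEF', 'SNE')
print(draw_pot('ASN', groups)[-3:])  # ('EEFS', 'EEFN', 'SNEA')
```

Immutable tuples are used so that abandoning a failed branch needs no explicit undo step; each recursive call simply works on its own copy of the groups.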
This function allows us to generate the groups based on the order the teams come
out of the pot. Now we need a function which can generate all the permutations.
The `pot_permutations` function takes a string representing all the countries in
the pot, and optionally a prefix, and returns the permutations for that pot:
```python
def test_pot_1(self):
    self.assertEqual(
        pot_permutations('SSEEEEE', 'A'),
        (
            'AEEEEESS', 'AEEEESES', 'AEEEESSE', 'AEEESEES', 'AEEESESE',
            'AEEESSEE', 'AEESEEES', 'AEESEESE', 'AEESESEE', 'AEESSEEE',
            'AESEEEES', 'AESEEESE', 'AESEESEE', 'AESESEEE', 'AESSEEEE',
            'ASEEEEES', 'ASEEEESE', 'ASEEESEE', 'ASEESEEE', 'ASESEEEE',
            'ASSEEEEE',
        )
    )

def test_multiple_confederations(self):
    self.assertEqual(
        pot_permutations('ABBC'),
        (
            'ABBC', 'ABCB', 'ACBB', 'BABC', 'BACB', 'BBAC',
            'BBCA', 'BCAB', 'BCBA', 'CABB', 'CBAB', 'CBBA',
        )
    )
```
The `prefix` parameter is necessary for pot 1, where Qatar are guaranteed to be
the first team drawn from the pot. With these two components, we are ready to
run through all the permutations pot by pot and figure out how many instances of
each possible state there are after each pot.
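One simple way to get the behaviour shown in those tests is sketched below. This is an assumption about how such a function could be written, not the linked implementation: it leans on `itertools.permutations` and removes duplicates with a set, which is wasteful but perfectly adequate for eight balls.

```python
from itertools import permutations


def pot_permutations(pot, prefix=''):
    """Return the distinct orderings of `pot`, sorted, each with `prefix` in front."""
    # permutations() treats repeated letters as different balls, so the same
    # string appears many times; the set keeps one copy of each.
    distinct = {''.join(order) for order in permutations(pot)}
    return tuple(prefix + order for order in sorted(distinct))


print(len(pot_permutations('SSEEEEE', 'A')))  # 21
print(pot_permutations('ABBC')[:3])           # ('ABBC', 'ABCB', 'ACBB')
```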
There's just one problem. Pot 1 has 21 permutations, Pot 2 has 168, Pot 3 has 560, and Pot 4 has 10,080. Multiply these numbers together and you get nearly twenty billion!
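The counts for the first three pots can be checked directly from the confederation letters in the pots table, since the number of distinct orderings of a string with repeated letters is a multinomial coefficient. The sketch below uses a made-up helper, `distinct_orderings`, and takes the pot 4 figure as quoted rather than recomputing it, because it depends on how the playoff placeholders are encoded.

```python
from collections import Counter
from math import factorial


def distinct_orderings(pot):
    """Number of distinct orderings of a string with repeated letters."""
    total = factorial(len(pot))
    for repeats in Counter(pot).values():
        total //= factorial(repeats)
    return total


pot_1 = distinct_orderings('SSEEEEE')   # Qatar is fixed first, so seven balls vary
pot_2 = distinct_orderings('EEEEESNN')
pot_3 = distinct_orderings('FFFAAAEE')
pot_4 = 10_080                          # figure quoted in the article

total = pot_1 * pot_2 * pot_3 * pot_4
print(pot_1, pot_2, pot_3, total)       # 21 168 560 19914854400
```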
When solving a computational problem, I always start off with the approach of "if in doubt, use brute force" - i.e. run it for every possible permutation, and see if that is fast enough. Having implemented these components, I started it running. Pot 1 completed instantly, Pot 2 took a couple of seconds. After waiting about ten minutes for Pot 3 to complete, I went for a walk. I thought through the numbers in my head and realised it would probably take about six months before Pot 4 would be completed, so this approach was clearly not fast enough. As I continued my walk, I realised that the way to solve this was to use a Monte Carlo method.
A Monte Carlo method is a technique for estimating probabilities across a large number of permutations. If it is impractical to run every permutation, you can take a random sample and run for those. As long as the sample is large enough and there is no bias in the sampling, you can have a high degree of confidence in the accuracy of the results.
To implement this, I created a method which runs forever, maintaining a count of the number of iterations. After every 1000 iterations, it prints the probability of each of a number of events - i.e. the number of times each event occurred, divided by the number of iterations. As the output is printed and the number of iterations ticks up, these numbers become more and more fixed and we have a higher degree of confidence in their accuracy.
Before we begin, we generate the permutations for each of the four pots. Then,
on each iteration we select a permutation at random for each pot, concatenate
these together, and pass them into `draw_pot`. The events we are looking for are
labelled as `1E` - European team in pot 1, `2N` - North American team in pot 2,
etc - and refer to the other teams in the same group as the UEFA pot 4 team.
In this way we end up with an output that looks like the following:
```
610000 1A:0.307787 1E:0.204631 1S:0.487582 2E:0.219751 2N:0.615033 2S:0.165216 3A:0.367298 3E:0.075815 3F:0.556887
```
This output line shows that after 610,000 iterations, `1A` (i.e. Qatar in
Scotland's group) had occurred in 30.8% of cases, `1E` (i.e. a European team in
Scotland's group) had occurred in 20.5% of cases, etc.
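The counting loop itself is small. The sketch below shows the general idea on a toy event rather than the full draw: the function names `monte_carlo` and `first_ball_of_pot_2` are made up, the loop stops after a fixed number of iterations instead of running forever, and the toy simulation just reports the confederation of the first ball out of a shuffled pot 2.

```python
import random
from collections import Counter


def monte_carlo(simulate, iterations, report_every=1000, seed=None, verbose=False):
    """Estimate event probabilities as (times seen) / (iterations run)."""
    rng = random.Random(seed)
    counts = Counter()
    for i in range(1, iterations + 1):
        counts.update(simulate(rng))  # simulate() returns the events that occurred
        if verbose and i % report_every == 0:
            line = ' '.join(f'{event}:{counts[event] / i:.6f}'
                            for event in sorted(counts))
            print(i, line)
    return {event: counts[event] / iterations for event in counts}


def first_ball_of_pot_2(rng):
    """Toy stand-in for the draw: label the first ball out of pot 2."""
    pot = list('EEEEESNN')
    rng.shuffle(pot)
    return ['2' + pot[0]]


estimates = monte_carlo(first_ball_of_pot_2, iterations=20_000, seed=1)
for event in sorted(estimates):
    print(event, round(estimates[event], 3))
```

For this toy event the exact answers are known (5/8, 2/8 and 1/8), so it is easy to see the estimates settle towards them as the iteration count grows. Swapping in a function that performs a full random draw and returns labels such as `1E` or `2N` would give output in the same format as the line above.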
If we were to run this again, we might find some variance in these numbers even after such a large number of iterations. Based on the variance observed, the results appear to have an error of something in the range of ±0.5%.
The events `1A`, `1E` etc are implemented by the `extract_pot_confederations`
function. There is also an `extract_first_three_confederations` function
implemented which extracts the permutations of confederations from the first
three pots. And of course if there were other events whose probability you
wanted to investigate, such as how likely is it for a given team to be in a
given group, or simply how likely is it for another country, such as Canada, to
be drawn with any given team, you would simply need to write a new extract
method and pass that into the Monte Carlo simulation.
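To give an idea of what such an extract function might look like, here is a hypothetical sketch. It assumes each finished group is a four-letter string in pot order, that the UEFA pot 4 team is the only `E` in the fourth position, and that the two intercontinental placeholders are written as `X` and `Y`; none of these encoding details are confirmed by the article, and the real `extract_pot_confederations` may differ.

```python
def extract_pot_confederations(groups, target='E'):
    """Return labels such as '1S', '2N', '3A' for the target team's group mates."""
    for group in groups:
        if len(group) == 4 and group[3] == target:
            # Label each of the first three teams with its pot number.
            return tuple(f'{pot}{confederation}'
                         for pot, confederation in enumerate(group[:3], start=1))
    raise ValueError('no completed group has the target team in the pot 4 slot')


# One possible final state of the draw, written as confederation letters.
groups = ('AEFS', 'EEAF', 'EEFA', 'ENEX', 'ESAN', 'SNAE', 'SEEF', 'EEFY')
print(extract_pot_confederations(groups))  # ('1S', '2N', '3A')
```

A different question, such as Canada's likely opponents, would only need a different `target` (here `'N'`) or a different rule for picking out the group of interest.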
I hope you have enjoyed reading about these investigations and it has given some insight into the World Cup draw procedures. Of course, it is all just a bit of intellectual curiosity, and even though there is only a 0.3% chance of Scotland being drawn in a group with England, Uruguay and Iran, it could still happen. We can do millions of Monte Carlo simulations, but on Friday the draw will be performed just once, and the outcome of that is the one that matters.