Ranking Methods I: PageRank

This is the first part of the series "Ranking Methods in football". The series introduces a variety of methods to determine the strongest teams at various levels of football. This article explains how Google's PageRank can be used as a rating method for domestic, continental and worldwide football. The procedure comes with the desirable feature of adjusting for strength of schedule: winning against stronger teams is more beneficial than winning against weaker teams. We use this method to rank several European domestic leagues and to find the strongest teams in Europe and the strongest team in the world, all for the 2016/17 season. Be prepared for a surprise in the English Premier League!

31.07.2017

The PageRank algorithm was developed by Sergey Brin and Larry Page in the mid-1990s. When they later founded Google, PageRank became the key ingredient their search engine used to rank the most relevant results for user queries. The key ideas of PageRank are described below.

To determine a ranking of webpages, PageRank essentially counts the number of links leading to each webpage. The more links point to a page, the higher the assumed quality of the page. However, not all links are treated as equal. Links from pages that themselves receive a lot of links weigh more than links from pages that receive few. That implies that the rank of a webpage depends on the rank of the pages that link to it, which in turn depends on the rank of the pages that link to them, and so on. Although this seems like a never-ending iterative process, it is in fact very easy to obtain PageRank scores for webpages with some linear algebraic trickery.
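To make the "never-ending" recursion concrete, here is a minimal sketch of one common way to compute such scores, namely power iteration: start with equal scores and repeatedly pass each page's score along its outgoing links until the values stop changing. The function name `pagerank`, the toy pages A, B and C, and the default damping value of 0.85 are assumptions for illustration, not something taken from the original implementation.

```python
def pagerank(links, damping=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for PageRank on a weighted directed graph.

    links maps each source node to a dict {target: weight}.
    Returns a dict {node: score}; the scores sum to 1.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(max_iter):
        # Nodes without outgoing links spread their score evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        # Every other node passes its score along its links, split by weight.
        for source, targets in links.items():
            total = sum(targets.values())
            for target, weight in targets.items():
                new_rank[target] += damping * rank[source] * weight / total
        change = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank


# Toy web: A and B both link to C, and C links back to A (hypothetical pages).
links = {"A": {"C": 1.0}, "B": {"C": 1.0}, "C": {"A": 1.0}}
scores = pagerank(links)
```

In this toy example C ends up on top because two pages link to it, A comes second because it is linked from the highly rated C, and B comes last because nothing links to it.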

But what does ranking webpages have to do with football? Substituting webpages with teams and "links to" with "lost against" in the above description, we get a ranking method for football teams that comes with a nice additional feature: the ranking adjusts for strength of schedule. A team winning against a lot of "low quality" teams is considered to be of lower quality itself than a team that might have won fewer games, but against higher quality teams.

To calculate the PageRank of football teams, it is best to think of a football competition as a directed graph. Nodes represent teams and links represent the games played between them, where links always point from the loser to the winner. If a team lost several times against the same team, we can either add several links or attach a weight attribute to the link. Tied games are a special case: here we add "half" an edge in both directions.
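The graph construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: results are assumed to come as `(home, away, home_goals, away_goals)` tuples, repeated losses are accumulated as link weights, and the function name `build_loss_graph` and the teams A, B and C are hypothetical.

```python
from collections import defaultdict


def build_loss_graph(results):
    """Build a weighted directed graph from match results.

    results: iterable of (home, away, home_goals, away_goals) tuples.
    Returns {loser: {winner: weight}}; a tie adds 0.5 in both directions.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for home, away, home_goals, away_goals in results:
        if home_goals > away_goals:
            graph[away][home] += 1.0  # away team lost: link away -> home
        elif away_goals > home_goals:
            graph[home][away] += 1.0  # home team lost: link home -> away
        else:
            graph[home][away] += 0.5  # tie: "half" an edge each way
            graph[away][home] += 0.5
    return {loser: dict(winners) for loser, winners in graph.items()}


# Hypothetical results: A beats B twice, A and C draw once.
results = [("A", "B", 2, 0), ("B", "A", 0, 1), ("A", "C", 1, 1)]
graph = build_loss_graph(results)
```

Here `graph` contains a link of weight 2 from B to A (two losses) and links of weight 0.5 between A and C in both directions.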

The chart below shows the PageRank scores of the English Premier League for the most recent seasons.

So according to PageRank, Liverpool was the strongest team last season and not Chelsea. This is very surprising, given the fact that Liverpool finished fourth. How is this possible? Before we give an explanation for this, check out the chart below which shows the differences in ranks between the "3-1-0" point system and PageRank.

Liverpool had a fairly good run against all top clubs last year. They did not lose a single game against the top three (one win and one tie against each). This strong record weighs heavily in the calculation of PageRank, since (obviously) Chelsea, Tottenham and Manchester City themselves had a very strong record last season. However, Liverpool dropped a fair number of points against supposedly weaker teams (for example, only a tie and a loss against Bournemouth). While this cost them the Premier League title, it did not weigh as heavily in PageRank, so that (technically!) Liverpool was the best team in last year's Premier League season!

The inverse effect can be observed for Stoke City and Southampton. Both had horrendous records against the top ten clubs. Stoke City won only one match against a top ten club, which coincidentally was against Southampton. Southampton's only victory against a higher-ranked club was the 1:0 against Everton.
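For readers who want to reproduce such a comparison, the rank differences between the points table and PageRank can be computed as sketched below. The team names and all numbers are made up for illustration, and ties in score are broken alphabetically here, which is a simplification (real league tables use goal difference and other criteria).

```python
def positions(scores):
    """Map each team to its 1-based position, highest score first."""
    ordered = sorted(scores, key=lambda team: (-scores[team], team))
    return {team: pos for pos, team in enumerate(ordered, start=1)}


def rank_differences(points, pagerank_scores):
    """Table position minus PageRank position for every team.

    A positive value means the team is ranked higher by PageRank
    than by the "3-1-0" points table.
    """
    table_pos = positions(points)
    pr_pos = positions(pagerank_scores)
    return {team: table_pos[team] - pr_pos[team] for team in points}


# Hypothetical league: points from the table and made-up PageRank scores.
points = {"A": 30, "B": 28, "C": 25, "D": 10}
pagerank_scores = {"A": 0.25, "B": 0.20, "C": 0.40, "D": 0.15}
diffs = rank_differences(points, pagerank_scores)
```

In this made-up league, team C is third in the table but first by PageRank, so its difference is +2, while A and B each drop one place.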

If you are interested in other leagues besides the Premier League, check out the selection below, all showing the PageRank scores of the last season.

The differences between the "3-1-0" point system and PageRank for the selected leagues are shown below.

Of course, we are not restricted to using PageRank just for domestic leagues. In fact, it is far more interesting to see how PageRank performs in ranking the teams of a whole confederation. The chart below shows the Top 20 in Europe, based on the results of all domestic leagues, the UEFA Champions League and the UEFA Europa League during the 2016/17 season.

The ranking looks fairly reasonable. Real Madrid, last year's Champions League winner, is certainly a worthy top-ranked club. FC Barcelona, as Spanish champion, is also an acceptable second place. One could maybe argue that Juventus, Italy's champion and Champions League finalist, should be ranked third instead of Atlético Madrid (or even second!). The best English team is Manchester United (9th), and the Premier League champion Chelsea is only ranked 17th. This is easily explained by the fact that Chelsea participated in neither the Champions League nor the Europa League. Manchester United, on the other hand, won the Europa League.

What works at the continental level also works on a global scale! Using all games played between 1 July 2016 and 1 July 2017 in 206 domestic leagues and 19 continental tournaments, we can determine the best club in the world (according to PageRank, of course!) for that period.

You may notice that this ranking is almost the same as the European one. Only Jeonbuk FC (South Korea, 14th), Uanl Tigres (Mexico, 17th) and Al-Quwa Al-Jawiya (Iraq, 18th) manage to break into the global Top 20. You can explore the complete global ranking in the table below. The PageRank scores are rescaled by dividing all scores by the maximum value, so the top-ranked club has a score of 1.
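The rescaling step is simple enough to show in a couple of lines. This is just an illustrative sketch; the function name `rescale` and the example scores are hypothetical.

```python
def rescale(scores):
    """Divide every score by the maximum, so the best team gets 1.0."""
    top = max(scores.values())
    return {team: score / top for team, score in scores.items()}


# Made-up raw PageRank scores for three teams.
raw_scores = {"A": 0.4, "B": 0.2, "C": 0.1}
scaled = rescale(raw_scores)
```

Rescaling does not change the order of the teams; it only makes the scores easier to read relative to the top club.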

PageRank is a very useful method to rank teams since it takes the strength of schedule into account. While in domestic leagues all victories weigh the same, PageRank adjusts for winning against stronger or weaker teams. It is applicable without any adjustments to domestic leagues, confederations and even on a global level. A huge benefit is also that the ranking is parameter-free. Well, technically there is one parameter, but the so-called "damping factor" has mostly mathematical significance. For the interested reader: it is set to 0.99 for all calculations. Parameter-free rankings are far more objective than methods that rely on tweaking parameters (like the method we currently use for our main world ranking!). There is no need to justify certain choices of weighting (like "Why are games in Europe equally important as games in Oceania?"), since this is all done implicitly during the calculation. Winning against, for instance, the UEFA Champions League winner has a far higher impact than winning against a team on Aruba (no offence!).
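For readers wondering where the damping factor enters, the following is the standard textbook form of weighted PageRank, written in the football setting. It is given here as an assumption about the general shape of the calculation, not as a statement of the exact formula used for the rankings above. The symbols are introduced for illustration: r_i is the score of team i, N is the number of teams, w_{ji} is the weight of the link from team j to team i (the number of times j lost against i, with ties counting one half), and d is the damping factor.

```latex
r_i = \frac{1 - d}{N} + d \sum_{j} \frac{w_{ji}}{\sum_{k} w_{jk}} \, r_j
```

With d = 0.99, almost all of a team's score comes from the teams that lost against it, and only the remaining 1% is spread evenly over all teams. Teams that never lost have no outgoing links and need separate handling; a common convention is to spread their score evenly over all teams.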