This is a followup on my last blog article. which covered inferring the team ratings based on the match market odds.
We can apply this to the Premier League. There is going to be no scraping and nearly no programming, so should be easy to follow.
It is all based on google sheets which is freely available to all.
Final ratings/match projections here.
1. Get the data
So maybe technically this is scraping but it is very straight forward to populate some raw data into a googlesheet. Just type the below into the first cell.
=IMPORTDATA("https://www.football-data.co.uk/mmz4281/2223/E0.csv")
And voila all the data is populated.
2. Narrow the data
Okay so we dont want to see everything. So leave the raw data as it is and creating a 2nd tab (Results), to only bring in the date, time, teams and pinnacle odds.
I've tried to be clever with this, maybe you think its confusing or redundant. I think it's convenient as I now can pick which data to bring in by changing the heading. Want to switch to 365 odds? Then easy. Want to bring in the handicap odds, easy too, just add the columns you want.
Type the formula into the first cell in the second row and drag the formula down and across. The first row just type in the fields from the raw data that you want to narrow down to.
=index('fb-data.co.uk'!2:2,match(A$1,'fb-data.co.uk'!$1:$1,0))
Add some formatting and now we have all the data and just the data we want to use.
3. Remove the margin
So our data we are focusing on is the H-D-A odds. The bookie will have a margin here. I will remove this using the power method so that we get our 'true' probabilities. True in the sense they sum to 100% and also in the sense that Pinaccle will be more accurate than my (and probably your) opinion.
To use the power method we need to solve for k so that all the probabilities (in %) raised to the power k add to 100%. This will apply a fav-longshot bias which we know is a characteristic of the markets .
To do this in google sheets you do need to do some programming (in Excel you could use solver). But I have already done it and you can just use the same custom function. (Copy and Paste the below code into the Appscript, under the menu Extensions->App Script).
function solve_power(probs)
{
var iterations = 20 ;
var t = 1 ;
var max = 2 ;
var min = 0.5 ;
var k = 1 ;
for ( var i = 0; i < iterations; i ++ )
{
k = (min+max)/2 ;
var sum = 0 ;
for ( var j = 0; j < probs[0].length ; j ++ )
{
sum += Math.pow(1/+probs[0][j],k) ;
}
if ( sum <= t )
{
max = k ;
}
else
{
min = k ;
}
}
return k ;
}
Now we can simply use in a formula and pass in the array just as any other built in spreadsheet formula. It will populate the k we need to raise the probabilities by to remove the margin.
Type and drag into Column K =solve_power(H2:J2)
For example below, to remove the margin from pinaccle's HDA odds for the Crystal Palave vs Aresnal match using the power method we have the below:
And we can fill in the 'true' probabilities by raising the cells with the Home, Draw and Away odds to this k.
Drag down for the rest of the matches.
4. Infer the Ratings
So in exactly the way the previous blog post describes. Infer the market ratings of each team based on the HDA true odds.
The formula we use in column O is =1/L2+1/(2*M2)
The formula we use in column P is =(1/L2-power(O2,2))/(O2-power(O2,2))
The key formula we use in column Q which will give the implied ratings difference is =-400*log((1-O2)/O2,10)
The logic of these formulae are found in the previous linked blog.
5. Look up most up to date ratings for home and away teams
So now we are going to go down row by row and compare our current ratings for each team with the market implied ratings. To do this we need to look back to the last match each team played and copy down their rating. At the beginning we wont have a previous match, so we will just default to 1000.
For the home team, look for the row the last home match and last away match. We will only use the latest, but this is the first step.
=row() - MATCH(D2, sort($D$1:D1,row($D$1:$D1),0), 0)
=row() - MATCH(D2, sort($E$1:$E1,row($E$1:$E1),0), 0)
And the same for the away team.
=row() - MATCH(E2, sort($D$1:$D1,row($D$1:$D1),0), 0)
=row() - MATCH(E2, sort($E$1:$E1,row($E$1:$E1),0), 0)
This completes column R, S, T and U.
So you can see the formula worked. For line 23 for example for the home team Tottenham it has returned 7 and 20 in columns R and S. Indeed these rows do correspond to Tottenhams previous home and away matches.
For the rows at the top where there hasnt been a previous match yet, they just populate with some error value.
So we will look back at the biggest of these two rows and pull in the closing line rating.
This complete Column V (for home) and W (equivalent but for away team).
=iferror(if(iferror(R2,0)>iferror(S2,0),index(Z:Z,R2), index(AA:AA,S2)),1000)
This is where we handle the error as well. Where a previous fixture was not found then we just set to 1000.
6. Calculate and compare the difference, adjust the ratings.
So we can quickly add something for Home Field Advantage (HFA) and find the current estimate of the ratings difference.
Column X is simply
=V2+AB2-W2
So we go back and compare this to the market inferred ratings difference, and adjust each team by the same amount necessary to match with the market. The market is King.
How much our current rating is wrong is in Column Y =Q2-X2
So add half this to the current home rating to get the new home rating in Column Z =V2+Y2/2
And subtract half this to the current away rating to get the new away rating in Column AA =W2-Y2/2
This brings our current rating up to date based on all the information in the pinaccle closing line. This is our new best guess for the team's rating and this will be used as a starting point for the teams next match when looking back for the ratings.
7. Home Field Advantage.
We have accounted for HFA above in column AB, but how? It is just a quick estimate from the average of previous matches. I have given it a bit of a headstart with 200 times 50 just because I know it will fall around this, but eventually with enough data sampled the data will weigh heavier on the calculation. This is just quick and simple. Different teams could have different HFA (they do). It can change over time (it has been). And it can almost disappear in a pandemic (it did almost). For now just to keep things going we are using this quick dirty formula described.
=(50*200 + sum(Q$1:Q1))/(200+count((Q$1:Q1)))
8. Thats it!
Make sure you drag your formula down all the relevant rows. For each match the googlesheet will calculate each teams new rating. The most recent rating will be our best market derived guess at the teams underlying rating.
I've added another sheet to copy out and sort the ratings. Its a single formula in the first cell. It pulls the date, home team and new home ratings unioned with the date away team and new away ratings, and sorts by the date descending.
=SORT({Results!B2:B,Results!D2:D,Results!Z2:Z;Results!B2:B,Results!E2:E,Results!AA2:AA},1, FALSE)
9. The Results
Finally I created a summary of the ratings.
Create a new sheet called Team Ratings.
On the first row just add column headers as you wish. Then in the second row first cell get the list of all the teams =UNIQUE({Results!D2:D; Results!E2:E})
Then in Column B filter the ratings for each team in Column A and transpose it so the results are populated along the row, rather than down the column. The Team Ratings History has been sorted in descending order, so the first value in this list will be the latest most up to date market based rating.
=transpose(filter('Team Ratings History'!C:C,'Team Ratings History'!B:B=A2))
It has done a good job, rating Man City highest with 1285 points, Liverpool next (just about) with 1161 rating points and Arsenal just behind with 1144.
You can have a scan of the ratings week to week.
There you can see the market inferred ratings reacting, for example when Haaland was injured, Man City's rating dropped from 1322 (3 weeks back when he was fully fit) to 1289 (vs Leicester closing line) and to 1285 (vs Fulham closing line, the latest).
You can see how the market reacted to Chelsea's 4-1 hammering against Brighton the previous week we had a rating of 1098 for Chelsea. The closing line vs Arsenal the latest we have then dropped them to 1094 rating. So in fact its not a huge down grade for Chelsea, I suppose taking into account that the xG score was a little closer than the 4-1 scoreline suggested.
It is interesting to see certain trends and reactions.
Just for a note on presenting the results, I cannot sort in the sheet "Team Ratings" because I have used special kind of Array Functions (Sort, Unique, Filter) in the Team Ratings sheet. So for final presentation create a new Pivot Table off this sheet.
Just dragging the Team and the Latest Rating, and you can order it now. The above is the ratings as of 2022-11-07.
10. So was it worth it?
What I will probably use the rating for is for long term outlook. There are markets for the next round of matches (and some times 2 rounds) on betfair and the sportsbook. These markets will always be King, and trump our ratings and estimates.
Above I have compared my ratings with clubelo ratings and with the betfair market. Highlighted green where my odds are closer to the market than club elo. I think the method described is genuinely more accurate than club elo (as I'd expect since it is derived from the market). It is quicker and easier to calculate in my opinion, we've come to a pretty accurate ratings after 13 matches for each team where as Elo is described as only being accurate after about 30 matches.
Important Note: The ratings are derived from the market closing line. Anything after the closing line does not show up in my ratings. That could be part of the reason that above I am 2.79 on Chelsea but the early market is 3.25. Maybe the 3.25 is big, the early market is not renowned for being too accurate (only about €8,000 matched on that match at time of writing). That might be part of it, but another big part of it is definetely because the ratings is only based on info up to closing line excludes any information received after. Newcastle scored 4 away versus Southampton and looked pretty good, meanwhile Chelsea lost at home. This happened after the closing line and will not be accounted for in my ratings.
The next step would be to have a sort of hybrid where we derive the ratings from the closing line, but then also make the usual Elo adjustment based on the result. We could then learn to make further adjustments based on player availability, etc.
Comments
Post a Comment