Believing the numbers: when do goals for and against stabilise?
At this point of the year it's important that we try and establish what numbers can be believed. In other words: how small is too small for a sample size? Consider, for example, that through six gameweeks last season:
- Aston Villa ranked 10th in goals scored at home and 3rd on the road. They wound up being the 2nd worst attacking side in the league, managing to notch just one goal more than Stoke.
- Tottenham placed 19th in home defense, shipping an average of 2.5 goals per game. They finished the year as one of the league's best defenses, ranking 4th at home and 5th on the road.
- QPR conceded just one goal in their first three away games. They went on to concede 40 more in the next 16, at an alarming rate of 2.5 a game, enough to rank behind everyone outside of lowly Blackburn.
We could go on and on. The point is that six games is not enough to write teams off or to buy into short-lived success. When, then, is long enough? Using data from the past two seasons (ideally we'd use a much larger sample, but I don't have it to hand and, honestly, I don't want to do the legwork to get it into the right form), we can plot a team's goals scored/conceded per game on a weekly cumulative basis against their final attacking/defensive record, and then try to locate when the GPG rate becomes sufficiently predictive (I'm arbitrarily setting this at 75% correlation). Let's look at the trends:
The first thing to note is that defensive goals per game seem to stabilise at a similar rate whether at home or away, reaching 80% correlation somewhere around gameweek 12. That said, even at the stage we're currently at this year, it seems we have a decent set of data to predict year-end results. That suggests there is a pretty good chance that the likes of West Brom and West Ham are for real, while formerly great defenses like Liverpool and City might continue to struggle. Of course, as noted in the introduction, some of these trends will reverse since, if the correlation holds, they're only ~65% predictive at this point. Even so, we're already at a point where one can't simply say that City are great and West Ham are terrible because "that's the way it is".
The second point to note is that for goals scored, the results at home seem to be much more reliable than those on the road. Indeed, we see close to 80% correlation after just seven gameweeks at home, while the away data is all over the place, only settling down in the second half of the year. Without looking deeper into the numbers I'm not really sure why this is, other than the obvious fact that it's hard to score away from home, so the goal totals feeding into GPG are smaller. Even one-goal swings in that scenario will therefore have a large impact on this analysis. As a side note, we also observe that the r-squared of the home scoring data stabilises at 70% as early as gameweek 10, which suggests that not only is this data predictive of the overall picture, but it also applies to a decent proportion of teams. This suggests that teams like Fulham, Southampton and West Brom could, and possibly should, continue to have success based on their performances to date.
So what does this mean? For the next week or so, not too much. I'm still not ready to give up on City's defense and I'm still not willing to believe that Fulham are an elite attacking side at the Cottage. We should, however, be looking out for the below (approximate!) dates, at which we can assess where teams stand and, with some certainty, where they will finish up:
- Home goals scored: Gameweek 7
- Away goals scored: Gameweek 21
- Home goals conceded: Gameweek 13
- Away goals conceded: Gameweek 12
Try to keep these dates in mind before you buy your third Everton midfielder/forward or build your defense around a pair of West Ham defenders.
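For anyone who wants to repeat the exercise, here is a minimal sketch of the approach described above: compute each team's cumulative goals per game after every gameweek, correlate those figures against the final rate across all teams, and report the first gameweek from which the correlation stays at or above the cut-off. The function names, the toy goal logs and the "stays above" rule are my own assumptions for illustration, not the exact calculation behind the graphs, and in practice you would run it separately for home/away and scored/conceded.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either has no variance)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def cumulative_gpg(goals):
    """Running goals-per-game after each match in a team's goal log."""
    rates, total = [], 0
    for played, scored in enumerate(goals, start=1):
        total += scored
        rates.append(total / played)
    return rates


def correlation_by_gameweek(goals_by_team):
    """Correlation between GPG-to-date and final GPG, one value per gameweek."""
    running = {team: cumulative_gpg(goals) for team, goals in goals_by_team.items()}
    teams = list(running)
    final = [running[team][-1] for team in teams]
    n_games = min(len(running[team]) for team in teams)
    return [
        pearson([running[team][week] for team in teams], final)
        for week in range(n_games)
    ]


def first_stable_week(correlations, threshold=0.75):
    """First 1-based gameweek from which the correlation never drops below threshold."""
    for week in range(len(correlations)):
        if all(c >= threshold for c in correlations[week:]):
            return week + 1
    return None


# Hypothetical goal logs, purely to show the shape of the input (not real data).
toy_goals = {
    "Team A": [2, 0, 3, 1, 2, 2],
    "Team B": [0, 1, 0, 1, 0, 1],
    "Team C": [1, 3, 0, 0, 1, 1],
}
correlations = correlation_by_gameweek(toy_goals)
print(first_stable_week(correlations))
```

Note that the final gameweek always correlates perfectly with itself, so the interesting part is how early the line climbs above the threshold and stays there.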
Comments
Suppose you calculate the average number of shots each attacking team takes within the 18 yard box, then average those figures across the fixtures that have been played. For example, Man City:
Opponent:  SOT   liv   QPR   stk   ARS   ful   Ave.
Shots:     6.9   10.2  6.5   6.6   8.8   8.2   7.9
Then compare that with the actual average number of shots conceded within the 18 yard box over the first 6 fixtures (4.8).
You have a legit reason to believe the Man City defence is comparatively better than average, since they have conceded 3.1 fewer shots in the box each game than their average opponent usually takes. (This measure ignores the home/away split because of the small sample size.)
This post is more about why you shouldn't look at goal stats yet, but could feel more comfortable doing so after X weeks, depending on what you're looking at. Shot data is useful for sure but as with everything there are exceptions (Liverpool last year) so I'm just saying we can rely on goals scored/conceded a bit after a while.
I think there is a tendency in the stat community in general to overlook the obvious so this was a bit of a counter argument to the proposition that historical goals scored/conceded is totally useless. Not rocket science, agreed, but worth a quick post I thought.
Plus, you know, pretty graphs! :)
Look at GW25 ... by that week, all four metrics have reached 90% of their predictive value. You could theoretically use that point in time as the week to use your wildcard. It would still give you about a third of the season left to score points, and you'd have a very good feeling for which teams should perform well over the remaining games.
Of course by that time, a lot of the key squad members will be set. But perhaps you could round out your team better and gain something like 5-10% per week over others in your league.
Just a thought.
Let's say the 15% loss in correlation is in direct proportion to points. That would mean that from gameweek 28 onwards, in each gameweek until the end of the season, I would have 15% fewer points than someone else who goes with the 90% stats. But before that, the other stats drop to an average low of 60%, starting at gameweek 13 (which is again the average of 7, 21, 13 and 12). So I gain a 15% higher correlation for 15 gameweeks and lose 15% correlation for 10 gameweeks. Really good choice, that 70-80% range. Love it.
Actually I have long hoped for an analysis of wildcard strategies. Last year I bet on Chelsea early; finding that a disaster, I used my WC around GW3 or 4 to replace my Chelsea-heavy lineup with RVP and a few other options. I was able to come back to place in the top 3 in my league, but it took a year's effort and I never really challenged for the lead.
This year I am in first, and have not yet used my WC.
Step 1) Get a table of the average quantity of shots each PL team has made within the 18 yard box
2) Insert these quantities into each fixture that has been played (i.e. the Man City example) and then average across all fixtures
3) Compare this value with the average quantity of shots 'conceded' in the 18 yard box for the specific team, in this example Man City
So my step 2 answer was 7.9 and my step 3 answer was 4.8. Based on the assumption that offensive performances are the same each week (a bad assumption, but useful here), the Man City defence has conceded 3.1 fewer shots per game than we would have expected them to.
Also, as it turns out, so far they have conceded 1.6 fewer shots in the 18 yard box per game than last season.
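The three steps above can be sketched in a few lines. The per-opponent figures are the ones quoted in the Man City example earlier in the thread; the function and variable names are just illustrative assumptions.

```python
def expected_shots_conceded(opponent_avg_shots):
    """Step 2: average the opponents' usual shots-in-the-box across fixtures played."""
    return sum(opponent_avg_shots.values()) / len(opponent_avg_shots)


# Step 1: average shots in the 18 yard box for each opponent faced so far
# (figures taken from the Man City example in the comments).
man_city_opponents = {
    "SOT": 6.9, "liv": 10.2, "QPR": 6.5,
    "stk": 6.6, "ARS": 8.8, "ful": 8.2,
}

# Step 3: compare with what the team has actually conceded per game.
actual_conceded = 4.8
expected = expected_shots_conceded(man_city_opponents)
difference = expected - actual_conceded

print(f"expected {expected:.1f}, actual {actual_conceded:.1f}, "
      f"difference {difference:.1f}")
```

A positive difference means the defence is allowing fewer shots in the box than its opponents usually manage, under the (admittedly rough) assumption that attacking output is the same every week.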
If you are aiming to win the overall game, you need to be going for 70-80% correlation, at the latest.
If, however, you aim to win your not-so-competitive mini-league, a better strategy could be to aim for a 90% correlation wildcard.
There's always the question of rising prices though and the likely inability to get the optimal team in around that time.
Those damn prices are rising so fast. I feel that there must have been a big change to the algorithm and it will have a bigger impact on our season than we realize at the moment.
Very interesting indeed!
So by gameweek 12/13, we can use this season's stats to predict how teams will do going forward with a *reasonable* amount of certainty? (bar away goals)
I was considering my WC for around week 11/12/13 as we have one in December and I never saw waiting for a DGW to be that attractive (they can be tricky).
So, dumb question ahoy, if I wanted to get ahead of the pack GW 11/12 would be the time? Can always use free transfers to sort out the inevitable mistakes, but by that time I think you can be fairly confident in which of the big hitters will perform and how the other sides may fare long term.
Or at least last year, it seemed by mid November everyone was catching onto the trends.
You would also, I believe, need to take into account likely regression and the underlying performance stats (again as per @ChemiKills). In those first 6 games Villa scored 7 goals, but this was from just 15 shots on target, i.e. an unsustainable conversion rate of close to 50% (7/15 is roughly 47%). Also, Villa's opposition in the first 6 GW conceded 43 goals from 130 SoT, a collective average of 1.2 Goals Against and 3.7 SoT. Against these teams Villa averaged 1.4 goals and a very poor 2.5 SoT per game. Considering that the average SoT per game over last season was 5 for home sides and 4 for away sides, Villa's average of 2.5 in those first games was actually a strong indicator of how very poor they would be throughout the season. Hope this makes sense.
E.g. are more people playing some kind of price game, trying to finish up with the most valuable team?
Or is a site like Fantasy Football Scout, which I believe is by far the most popular one on the subject, creating massive bandwagons?
The FFS league only has around 13k members so I don't think they could move the markets that much.
I remember in GW1 players like Aguero, Dzeko etc. all went from like 2% ownership to 30%+ without experiencing any price rises, as the prices were fixed for GW2. Players were able to get rid of all their GW1 mistakes and bring in the form players without triggering any price rises, which calmed the market a lot. Right now everyone is still scrambling to cut the dead weight and get the in-form players.
The point of the piece was really to say that we shouldn't be giving up on City defenders or falling over ourselves to buy Fulham forwards purely based on the fact that they've done well to date.
By GW7 for 3/4 metrics we're seeing only 40-65% correlation which would leave somewhere between 8-12 teams who won't follow this trend. There are a number of ways we could try and weed these out, such as looking at shots on goal, strength of schedule etc but that wasn't my intention here.
My point was supposed to be as we approach GW10 and beyond, we do need to start considering that perhaps great attacking units/defenses of old can't be relied on based on name alone. That was really my only point and I appreciate it's not a deep one, hence the lack of write up.
I really like the idea of looking at shots on goal, particularly those inside the box, which seem to correlate best with future goals, but that's a much, much deeper level for another post. I have essentially done exactly what ChemiKills suggests in my goals forecast piece, and I will soon add a new column which will show how many 'expected' goals a team "should" have scored or conceded based on their shot numbers. I also plan to add a similar idea for the captain picks.
I talked about the shot regression idea a couple of weeks back:
http://premierleaguefantasy.blogspot.ca/2012/09/judging-team-success-shots-inside-box.html
Thanks for the comments guys, it's getting really fun to write here and get everyone's feedback. On that note, @shots_on_target is launching a new forum to discuss this kind of thing which could be really cool. More on that to come soon I hope.
Great analysis but I do agree with what others (specifically chemikills and shotsontarget) have said. There are two fundamental problems in this piece:
1. Goals are simply too infrequent to be reliable for much of anything statistically at gameweek 5, 6, or even 10. There is certainly some relationship, but it is weak due to the lack of positive outcomes available. Thus we should probably use SOG or some other metric that correlates highly with goals scored yet occurs with much more frequency.
2. Unless I am mistaken, there is no adjustment for opponent strength in this analysis. That is a major flaw given the limited sample sizes at play here. A team like Southampton has had a killer schedule (@ MCI, @ EVE, @ ARS, v MUN), so one would expect their numbers to be severely skewed this early in the season, while they will regress towards the "truth" as the schedule strength normalizes over the course of the season. We need a way to analyze these factors.
Regardless of shot totals you absolutely can rely on goal data history with some certainty once you get into the GW20+ period, and that makes sense as strength of schedule is no longer an issue and shot data will likely have levelled out.
To be clear, I am saying that using past GPG history is NOT viable for the short term. As I say, in the stat community this isn't exactly a breakthrough, but understand that a lot of people aren't using stats the way a few on here are, and I often read that team X is in good form etc. This post was really meant to quash that claim quickly, nothing more.
Apologies if all that wasn't clear from the original piece; my mistake for writing it late at night and then not proofreading.
The good news is that I am very much on the same page as chemikills and shots_on_target with regards to shot data, and as I say, I have already baked some of that into my GPG weekly analysis which I will probably expand on next week.
That helps clarify things a bit.
Hope my early post didn't come off as too critical. I think maybe my thirst for a definitive predictive model is making me a bit cranky! :)
Not that it wasn't a good piece or relevant, it's just that I think we, or at least I, are getting used to being schooled by your articles.
You often make me think of the Fantasy game in new light, so people perhaps sought out a "deeper" meaning than you were trying to convey.
Would it be possible to do the same analysis for earlier seasons to see if the same GPG confidence is arrived at by certain gameweeks?
Great stuff Chris.