Fantasy Football Rankings & Statistical Analysis Part 2

Hi again, this is the second and final part of my blog series on fantasy football. This part covers statistical analysis, so the techniques are slightly more advanced than the rankings in part 1. I think it is also more interesting, and since it’s not something you’ll see in every fantasy football blog post, it should offer something unique.

At a high level, the analysis simply looks at various variables and how they affect fantasy points. This way you know what factors to look at when making your start and sit decisions. The analysis is two-fold.

One type is R Squared, or Coefficient of Determination, analysis: how much of the variation in one variable is explained by the variation in another? For more information, the following link explains it well: R Squared.
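To make the idea concrete, here is a minimal sketch of how R Squared can be computed for a one-variable linear fit, where it equals the squared correlation between the two variables. The function name and the numbers are made up for illustration and are not data from this analysis.

```python
def r_squared(x, y):
    """R Squared for a one-variable linear fit: the squared Pearson correlation."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy ** 2) / (sxx * syy)


# Made-up numbers, not data from this analysis.
wind_mph = [0, 5, 10, 15, 20]
avg_points = [19.0, 18.5, 17.0, 17.5, 15.0]
print(round(r_squared(wind_mph, avg_points), 3))
```

A value near 1 means the points sit close to a straight line, while a value near 0 means the line explains almost none of the variation.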

In addition, I ran linear regression models to see how one (independent/predictor) variable relates to another (dependent/response) variable. The models use an independent variable to see whether and how much it affects the dependent variable. One part of the output of the model is the coefficient/slope of the regression. The coefficient represents how much the dependent variable is predicted to change when the independent variable changes by one unit. For example, if the independent variable is yards gained per game on offense and the dependent variable is points scored on offense, a coefficient of 0.5 means that for every additional yard gained, a team is predicted to score an extra 0.5 points on offense. (Please note this is just an example with made up numbers and not an actual regression I ran.)

In addition, the output provides information on whether the slope is significant, which indicates whether a relationship exists. Usually, and in the case of this analysis, a p-value (prob.) of 0.05 is the standard used for significance, meaning that if the p-value is less than 0.05, the coefficient for the independent variable is significant.

The final major part of a regression model is the constant (C), which represents the dependent variable’s value when the independent variable equals zero (think of the y-intercept of a line). For more information on regression analysis, including coefficients, constants, predictor/response variables, and p-values, please refer to this site: Regression Analysis.
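For readers who want to see the mechanics, here is a rough sketch of a one-variable least-squares fit that returns the coefficient, the constant, and the coefficient’s t-statistic. The function name and data are made up for illustration; this is not the software or the data I used. Turning the t-statistic into a p-value requires the t distribution, which a statistics package such as SciPy (scipy.stats.linregress) handles; as a rule of thumb, with a reasonably large sample an absolute t-statistic above about 2 corresponds to a p-value below 0.05.

```python
import math


def simple_regression(x, y):
    """Least-squares fit of y = slope * x + constant, plus the slope's t-statistic."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    constant = mean_y - slope * mean_x
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters).
    sse = sum((b - (slope * a + constant)) ** 2 for a, b in zip(x, y))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    t_stat = slope / std_err
    return slope, constant, t_stat


# Made-up numbers in the spirit of the yards-vs-points example.
yards_per_game = [300, 320, 340, 360, 380, 400]
points_per_game = [17, 20, 19, 24, 23, 27]
slope, constant, t_stat = simple_regression(yards_per_game, points_per_game)
print(f"slope={slope:.3f} constant={constant:.2f} t={t_stat:.2f}")
```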

An interesting case that we will see later in this post is when R Squared is low but the variable is significant. That means the model has limited predictive power but does show that a relationship exists. This site explains the phenomenon: Low R Squared but Significant. If a model does have predictive power, predictions should be made only within the range of the data, using the following formula: Prediction = Coefficient * Independent Variable Data Point + Constant.
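The prediction formula, along with the rule about staying within the data range, can be sketched as follows. The function name, parameters, and numbers are hypothetical and only illustrate the idea.

```python
def predict(coefficient, constant, x, x_min, x_max):
    """Return coefficient * x + constant, refusing to extrapolate outside the data range."""
    if not (x_min <= x <= x_max):
        raise ValueError(f"{x} is outside the observed range [{x_min}, {x_max}]")
    return coefficient * x + constant


# Made-up model: coefficient 0.5, constant 3.0, fitted on x values from 0 to 40.
print(predict(0.5, 3.0, 20, x_min=0, x_max=40))
```

Raising an error for out-of-range inputs is just one design choice; the point is that the fitted line says nothing reliable about values it never saw.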

We start off our analysis with two types of weather conditions that are easily quantifiable for such analysis. The first is temperature.

stats_temperature

Three points about the data in the above chart (scatter plot): 1) the data includes only standard games from QBs (>10 pass attempts), 2) the data is no longer restricted to above average quarterbacks like it was in the first part, and 3) the x-axis is every unique temperature that games were played in over the last 4 seasons and the y-axis is the average fantasy points for each unique temperature. Similar parameters will be used in the data going forward.
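The data preparation described above can be sketched roughly like this: drop games with 10 or fewer pass attempts, then average fantasy points for each unique temperature. The field names and sample games are made up for illustration and do not reflect how my data was actually stored.

```python
from collections import defaultdict

MIN_PASS_ATTEMPTS = 10  # "standard" games are those with more than 10 attempts


def average_points_by_temperature(games):
    """Map each unique temperature to the mean fantasy points of qualifying games."""
    totals = defaultdict(lambda: [0.0, 0])  # temperature -> [sum of points, game count]
    for game in games:
        if game["pass_attempts"] > MIN_PASS_ATTEMPTS:
            bucket = totals[game["temperature"]]
            bucket[0] += game["fantasy_points"]
            bucket[1] += 1
    return {temp: total / count for temp, (total, count) in sorted(totals.items())}


# Made-up games; the last one is excluded for having too few pass attempts.
games = [
    {"temperature": 30, "pass_attempts": 35, "fantasy_points": 12.0},
    {"temperature": 30, "pass_attempts": 28, "fantasy_points": 18.0},
    {"temperature": 72, "pass_attempts": 40, "fantasy_points": 21.0},
    {"temperature": 72, "pass_attempts": 4, "fantasy_points": 1.5},
]
print(average_points_by_temperature(games))
```

Each key-value pair in the result is one dot on the scatter plot, which is also why temperatures with very few games can produce outlier averages.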

In regards to the above statistical analysis, the data shows that temperature is not a good predictor of fantasy points. The R Squared is a very small 0.0016, and the p-value for the temperature coefficient is well above 0.05, so it is not significant. Originally I thought that if there were no significance, it would be because fantasy points don’t improve or decline linearly with temperature: going from freezing to warm should improve fantasy performance, but very hot temperatures may cause performance to dip again. However, the scatter plot doesn’t seem to confirm this theory. If it did, we’d see an upside-down U (parabolic) shape. One interesting thing to note is a few outlier averages at some of the freezing temperatures, which are due to the small sample of games played in such weather. It may seem surprising that the same thing doesn’t show up at the hot end of the spectrum, where there are also fewer games played, but the samples there are larger than at the cold end.

Next, we take a look at the other easily quantifiable weather condition, wind:

stats_wind

Here’s a plot of wind speed in miles per hour vs. the average fantasy points at each wind speed. At about 0.18, the R Squared isn’t as large as one would hope, but it’s not negligible: about 18% of the variation in fantasy points can be explained by the variation in wind. The regression shows that wind is, in fact, significant, although just barely, with a p-value of about 0.04. From the regression, we see that there’s an inverse relationship between wind and fantasy points. This seems rather intuitive, as simple physics suggests that high winds hurt the passing game. However, in my last blog post I showed that among average to above average fantasy performers, the drop in average fantasy points going from low wind to high wind was remarkably small (20.7 to 20.3). The question thus becomes whether that drop is enough to be statistically significant, or whether the below average QBs, who weren’t included in the previous analysis but are included here, experience much bigger drops.

At 0.18 R Squared, I would be wary of making fantasy point predictions. In other words, if the forecast calls for 10 MPH winds, I wouldn’t place bets that my team’s QB would score 17 fantasy points (~10*-0.152401+18.84745). However, I would be a bit more worried about a high wind than a low wind.
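Here is a quick check of the arithmetic behind that figure. The constant of 18.84745 comes from the text; the slope of -0.152401 fantasy points per MPH is an assumption, chosen because it is the value that reproduces the roughly 17 points quoted for a 10 MPH wind.

```python
CONSTANT = 18.84745        # constant quoted for the wind regression
SLOPE_PER_MPH = -0.152401  # assumed slope, consistent with ~17 points at 10 MPH


def projected_points(wind_mph):
    """Projected average fantasy points at a given wind speed."""
    return SLOPE_PER_MPH * wind_mph + CONSTANT


for mph in (0, 5, 10):
    print(mph, round(projected_points(mph), 2))
```

With an R Squared of 0.18, these projections are best read as the direction and rough size of the effect, not as a forecast for any one game.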

Next, we take a look at pass D from different angles:

stats_def-pass-yards

stats_def-passer-rating

The above analysis takes a look at how pass defense relates to fantasy points. The top analysis relates to passing yards allowed per game and the bottom relates to passer rating allowed per game. Once again, as mentioned in my previous blog post, passer rating includes the following: completion percentage, yards per attempt, touchdowns per attempt, and interceptions per attempt.

This is a simple analysis that confirms what we logically believe. The worse the pass defense is (the more passing yards per game or the higher the passer rating it allows), the more fantasy points the QB facing it will score.

The R Squared is in the mid 0.60s for both. Not only is that high, but the two values are nearly identical, which is an interesting insight. In regards to the regression, both p-values are effectively 0. The coefficient for defensive pass yards allowed per game is about 0.09, so with each extra yard the opposing defense allows through the air, your QB projects to an extra 0.09 fantasy points. In regards to passer rating, the coefficient suggests that each additional point of passer rating allowed by the opposing defense projects to an additional 0.26 fantasy points. All in all, neither of these bits of analysis is surprising; they serve more as confirmation. That said, the sizable R Squared may allow for better predictions than most of the other variables we look at.

Far more interesting is opposing run defense and its relationship to fantasy points:

stats_def-run-yards

This is an interesting case. The R Squared is relatively small at around 0.08. However, the regression tells a different story. The coefficient has a p-value of 0.002, so the relationship is strongly significant. As mentioned earlier in this post, this may mean that it’s hard to make actual predictions of fantasy points based on the run defense that a team is facing, even though the evidence that a relationship exists is strong. The direction of the coefficient (positive) was actually the opposite of what I expected, as I thought that a poor run defense meant more rushing attempts and thus fewer passing attempts. That doesn’t seem to be the case. It’s possible that a poor run D is indicative of poor play against mobile quarterbacks, although there are obviously other factors involved in a poor run D. This result may also be attributable to an improved running game increasing time of possession and thus allowing for more pass attempts, since the offense ends up staying on the field.

If this theory is correct, I worried that perhaps the entire significance of the run defense is due to its relationship with pass defense. If that’s the case, then maybe there’s no use in using run defense as a predictor at all, since we have pass defense stats. Therefore I decided to run a multiple regression model with two independent variables, defensive rush yards per game and defensive pass yards per game, and with average fantasy points as the dependent variable. If run defense became insignificant in this model, we’d know that its significance was due to its relationship with pass defense.
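For the curious, here is a rough sketch of how a regression with two independent variables can be fit by least squares. The function name and the data are made up for illustration; it only recovers the constant and the two coefficients. The significance test I care about (whether run defense keeps a low p-value once pass defense is in the model) also needs standard errors, which a statistics package would report and which are not shown here.

```python
def fit_two_predictors(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 via the normal equations."""
    rows = [[1.0, a, b] for a, b in zip(x1, x2)]
    # Build the augmented 3x4 system (X^T X | X^T y).
    system = [
        [sum(r[i] * r[j] for r in rows) for j in range(3)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(3)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(system[r][col]))
        system[col], system[pivot] = system[pivot], system[col]
        for r in range(3):
            if r != col:
                factor = system[r][col] / system[col][col]
                system[r] = [a - factor * b for a, b in zip(system[r], system[col])]
    return [system[i][3] / system[i][i] for i in range(3)]


# Made-up defenses: rush yards/game allowed, pass yards/game allowed, avg QB points.
rush_allowed = [90, 100, 110, 120, 130, 105]
pass_allowed = [250, 220, 260, 230, 240, 270]
avg_points = [16.1, 14.8, 18.9, 17.0, 18.2, 19.5]
constant, rush_coef, pass_coef = fit_two_predictors(rush_allowed, pass_allowed, avg_points)
print(f"constant={constant:.2f} rush={rush_coef:.3f} pass={pass_coef:.3f}")
```

Each coefficient here is the predicted change in fantasy points per extra yard allowed of that type while the other type is held fixed, which is exactly why this setup can separate the two defenses.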

stats_rush-and-pass-d

The results are encouraging for people who are interested in non-obvious insights. Run defense remains highly significant with a p-value of 0.0006, which confirms a solid relationship with fantasy points. However, since the R Squared is so much higher for pass defense, I would still look at the pass defense first. If I had a choice between two similar caliber quarterbacks playing against similar caliber pass defenses, I might use the run defense as a tiebreaker.

Next, we take a look at the opposing quarterback and try to see whether we can derive fantasy point projections for our fantasy quarterback by looking at the guy behind center on the team he plays against. One analysis looks at the opposing quarterback’s fantasy points in a given season, and the other looks at the QB over the 4 seasons of data that I have used.

stats_opp-qb-fp-given-season stats_opp-qb-fp-total

Originally, my hypothesis was that the better the opposing QB is, the more a given team’s QB is forced to keep up and throw the ball. However, the analysis above contradicts this hypothesis. The R Squared is very low for both, and the regression shows no significance for either independent variable. An interesting note is that the R Squared is higher and the p-value is lower when looking at a QB over the 4 seasons rather than in the given season. To me, it would make more logical sense for the R Squared and p-value to be flipped, since the given season is more relevant to how the QB is playing when the game takes place. However, this could be due to the larger sample size of the 4-year average or just random variation. In any event, neither analysis meets the standard for significance.

Finally, I took a look at fantasy points in a given game by the opposing QB to test the above hypothesis one final time. I looked at the opposing QB’s actual fantasy points in each game:

stats_opp-qb-fp-per-game

This bit of analysis seems to hint that my theory is true. The R Squared isn’t huge, but at around 0.22 it’s not negligible, and the variable has a strongly significant coefficient (p-value=0.0001). So there appears to be a relationship.

The question becomes: what useful insight can we gain from this? The opposing QB’s fantasy points in a given game are not a useful metric, since that information only becomes available after the game is over. To solve this problem I pondered the reason for the gap between the opposing QB’s actual points in a game and his average in a given season. One obvious answer is that one is a 16-game average and the other is an actual number, which is much more precise and relevant. However, that’s not good enough for people trying to make fantasy decisions. Therefore, I decided to test your own QB’s defense, to see whether a better predictor of the points actually scored by the opposing QB is not his average fantasy points in a given year or over the last 4 years, but the defense he faces. To test this, I used what I called “same team’s pass yards/gm defense” (the defense of your QB’s team rather than the opponent’s) as the independent variable, and once again I used average fantasy points as the dependent variable. (Please note that I tried using passer rating defense in this analysis, but it had no significance.)

stats_same-team-def-pass-yards

Here again the R Squared isn’t very large, at around 0.10. However, the variable is highly significant (p-value=0.0083). This is good news because the quality of a defense is something that can be projected even before a season starts, and thus lends itself to interesting analysis. It may make sense to draft a good QB whose defense projects to be bad (example: Drew Brees lately) rather than a good QB whose defense projects to be good.

To make sure this insight was valid, I took a closer look at the data. The question I asked myself was whether the QB’s fantasy points are the result of a poor defense, where he has to try to keep up with the opposing QB’s offense, or whether his defense looks poor because the opposing QB is trying to keep up with him. The former has more predictive power. I ran a regression (see below) against the same team’s defense in terms of points per game allowed. To my surprise, the R Squared was only ~0.01 and the p-value wasn’t significant (~0.33). Additionally, the coefficient was negative. This may hint that the QB’s defense looks poor because his opponents rack up a lot of passing yards without necessarily scoring. Therefore, it may, in fact, mean that opposing QBs try to keep up with the QB in question and not the other way around. That said, I wouldn’t throw the baby out with the bathwater and ignore this finding completely. There may be various factors that I haven’t considered. Therefore, I still believe my recommendation above, to use this information as a tiebreaker in deciding between similarly ranked QBs, is valid.

Same Team PPG Def

And that’s it for this blog post and for my fantasy football analysis. I will update this blog with new insights that I discover about sports, as I discover them.

Once again, the websites I used as data sources are:

QB Stats

Defensive Stats

Weather Information