Deconstructing Rushing Yardage

For a nice weekend diversion, here’s another stab at pro football sabermetrics. I’m new at this, so if anyone’s interested in this subject, I’d welcome any feedback, particularly on the statistical analysis.

I wanted to look at rushing yards, which is an interesting data set. I’m amazed at the wide dispersion of rushing gains in football. Basically, it works like this: when running the ball, the key is to break out for a big gain.

While this sounds obvious to anyone familiar with football, the numbers are pretty striking. Consider that roughly 92% of a team’s rushing yards come on just half its rushing plays. The other half accounts for just 8%.
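
For anyone who wants to check a figure like this against their own play-by-play data, here is a minimal Python sketch. The list `sample_gains` is made up for illustration (it is not the FootballOutsiders data), and `top_half_share` is a hypothetical helper name. It sorts the runs from longest to shortest and asks what fraction of total yards came from the better half.

```python
def top_half_share(gains):
    """Fraction of total rushing yards that came from the better half of the plays."""
    total = sum(gains)
    if total == 0:
        raise ValueError("total yardage is zero, so the share is undefined")
    ordered = sorted(gains, reverse=True)  # longest runs first
    half = len(ordered) // 2               # number of plays in the top half
    return sum(ordered[:half]) / total


# Made-up yards gained on 12 running plays (illustrative only).
sample_gains = [-2, 0, 1, 2, 2, 3, 3, 4, 6, 9, 15, 38]

share = top_half_share(sample_gains)
print(f"Top half of plays: {share:.0%} of yards")
```

How ties at the median get split between the two halves is a judgment call; this sketch simply cuts the sorted list in the middle.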

I could go so far as to say that we should no longer look at rushing as the sum total of yardage, but rather as a question: did the runner break out or not? I also wanted to see if a particular break-out point could be identified.

For my data, I went to FootballOutsiders.com and bought the play data sets for the 2005, 2006 and 2007 seasons (the final week of the 2005 regular season isn’t included). I then separated out the 40,000+ running plays.

Here are some stats: the average run is for 4.24 yards, the median run is for 3 yards and the mode is 2 yards. With both the median and the mode sitting below the average, the numbers suggest that the histogram is strongly skewed to the right, with a long tail of big gains.
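
The same three summary numbers are easy to compute with Python’s standard `statistics` module. This is only a sketch on made-up gains (`sample_gains` is hypothetical, not my actual data set), but it shows the pattern to look for: when the mean sits above the median, and the median above the mode, a handful of long runs is pulling the average up.

```python
import statistics

# Made-up yards gained on 12 running plays (illustrative only).
sample_gains = [-2, 0, 2, 2, 2, 3, 3, 4, 5, 6, 9, 21]

mean_gain = statistics.mean(sample_gains)      # pulled up by the 21-yard run
median_gain = statistics.median(sample_gains)  # the middle of the sorted gains
mode_gain = statistics.mode(sample_gains)      # the most common single gain

print(f"mean={mean_gain:.2f}  median={median_gain}  mode={mode_gain}")
```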

What I wanted to do was construct a running play as a series of odds. Think of it as a probability game. If you gain one yard, what’s the probability that you’ll gain at least one more? Let’s say you pass that threshold and now have a two-yard gain. What’s the probability that you’ll gain at least one more? As I expected, the probability that you’ll gain one more yard increases the farther down the field you go. In other words, rushing gains accelerate.
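
Here is a rough sketch of that probability game in Python. For each X, it counts the runs that reached at least X yards and asks what fraction of them also reached X+1. The function name `continuation_odds` and the sample list are made up for illustration, and the sketch assumes touchdown runs have already been filtered out of the list.

```python
def continuation_odds(gains, max_yards):
    """Estimate P(gain >= X+1 | gain >= X) for X = 1..max_yards."""
    odds = {}
    for x in range(1, max_yards + 1):
        reached = sum(1 for g in gains if g >= x)
        if reached == 0:
            break  # no run got this far, so there is nothing left to estimate
        went_further = sum(1 for g in gains if g >= x + 1)
        odds[x] = went_further / reached
    return odds


# Made-up yards gained on 12 running plays (illustrative only).
sample_gains = [-2, 0, 1, 2, 2, 3, 3, 4, 6, 9, 15, 38]

odds = continuation_odds(sample_gains, max_yards=10)
for x, p in odds.items():
    print(f"reached {x:>2} yards -> {p:.0%} chance of at least {x + 1}")
```

With only a dozen plays the odds bounce around, which is the same small-sample noise that shows up at the far end of the real data.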

Here’s a graph of what rushing gains look like. The chart shows: if you gain at least X yards, what are the odds you’ll get at least X+1?
image749.png

As you can see, as the running back breaks out, the yards are increasingly easier to gain. Gains beget more gains: that’s the key. (If you’ve noticed a stock market equivalent, then you’ve probably read this blog before.)

Here are a few items I need to mention. There’s an unusual depression and spike at the 9- and 10-yard marks. That’s probably due to NFL accounting, which creates an unusually high number of 9-yard gains (plays that are just short of a first down). For my purposes, I’m side-stepping that issue since it’s not what I’m looking at, nor does it seem to have an impact on the trend I’ve found.

You’ll also notice that the data become much more volatile as the run goes down the field. This is due to less data. For example, the outlier at the 76-yard mark (80%) is simply due to three runners being tackled there. Remarkably, Fred Taylor accounted for two of those runs! I should add that the graph refers to runs that are stopped, not touchdowns.

The hardest yard to gain is going from 3 yards to 4 yards. That’s just a 78.00% chance. But 4 to 5 is 78.11%, so it’s not much better, and 5 to 6 is 78.52%. After that the yards are increasingly easier to gain. It would seem that the defense dominates the first 18 feet (6 yards) past the line of scrimmage. After that, the field slowly turns in the runner’s favor. Runs of 7 or more yards make up about 20% of all runs but 60% of all yards gained.
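
The split between long runs and everything else can be checked with a few lines of Python. This is a sketch on made-up numbers: `breakout_shares` is a hypothetical helper, and the 7-yard default threshold is taken from the paragraph above.

```python
def breakout_shares(gains, threshold=7):
    """Return (share of plays, share of yards) for runs of at least `threshold` yards."""
    long_runs = [g for g in gains if g >= threshold]
    play_share = len(long_runs) / len(gains)
    yard_share = sum(long_runs) / sum(gains)
    return play_share, yard_share


# Made-up yards gained on 10 running plays (illustrative only).
sample_gains = [0, 1, 2, 2, 3, 3, 4, 5, 8, 12]

play_share, yard_share = breakout_shares(sample_gains)
print(f"{play_share:.0%} of runs produced {yard_share:.0%} of the yards")
```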

Here’s a histogram of the rushing gains with a log scale. I had to exclude the extremes since the ones and zeros don’t work well with a log scale.
image750.png
It’s that long, fat tail that’s so, so important.
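
For readers who want to rebuild the numbers behind a chart like this, here is one way to do it in Python. This is a sketch of one reading of the exclusion rule, not necessarily how the chart above was built: it tallies how many runs ended at each yardage, drops any yardage that occurred only once (since log10 of 1 is 0, and yardages that never occurred have no log at all), and takes the base-10 log of what is left. The helper `log_counts` and the sample data are made up for illustration.

```python
import math
from collections import Counter


def log_counts(gains):
    """Map each yardage to log10 of how often it occurred, skipping counts of 1."""
    counts = Counter(gains)
    return {yards: math.log10(n) for yards, n in sorted(counts.items()) if n > 1}


# Made-up data: 100 two-yard runs, 10 three-yard runs, one 40-yard run.
sample_gains = [2] * 100 + [3] * 10 + [40]

logged = log_counts(sample_gains)
for yards, value in logged.items():
    print(f"{yards:>3} yards: log10(count) = {value:.2f}")
```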

Here’s a spreadsheet of my data.

Posted by on December 14th, 2008 at 2:18 pm

