NFL Data Analysis
Names
Alexander Scheibe, Addison Atkin, Seth Jordan, Henry Severson, Melanie Yadgar
Introduction
For our project, we will design models that make predictions from historical NFL game-by-game data. Our data covers a wide variety of features, from football statistics such as passing yards, touchdowns, and win rates to other factors such as weather, player data, and stadium data. Our analysis will look beyond game scores to these other factors to see whether and how they affect game outcomes. By using machine learning, we aim to transform raw data into meaningful predictions and uncover patterns that may not be immediately obvious. Ultimately, we hope our models will let us predict which team will win the upcoming Super Bowl and teach us more about how accurate Vegas betting odds are.
Dataset
Questions
- Player Performance Prediction: Can we predict a player's performance (e.g., passing yards, touchdowns) based on their historical data and game conditions?
- Betting Data Analysis: How accurately does betting data predict a team's success in a given week? Does betting accuracy increase as the season progresses?
- Team Performance Analysis: How do environmental factors (e.g., weather, stadium) impact team performance (e.g., win/loss rate)?
Variables
Independent Variables:
Player Performance:
- Completions: Number of successfully completed passes by a player.
- Attempts: Number of attempted passes, regardless of whether they were completed.
- Passing Yards: Total yards gained through passing plays.
- Receiving Yards: Yards gained by a player who catches a pass.
- Rushing Yards: Yards gained by running with the football.
- Touchdowns: Number of times a player scores a touchdown.
Game Conditions:
- Weather (Temperature, Wind MPH, Humidity): Respective atmospheric conditions during the game.
- Weather Detail: Indicates whether the stadium is indoor or outdoor.
- Stadium: Stadium name.
Betting Data:
- Over/Under Line: A betting line predicting the total points scored by both teams combined.
- Team Favorite ID: The team expected to win a game as indicated by betting odds.
- Spread Favorite: Point spread assigned to the favored team; a negative value (e.g., -5.5) means the favorite is expected to win by that many points.
Dependent Variables:
Player Performance (as Dependent Variables):
- Passing Yards: Total yards gained through passing plays.
- Touchdowns: Number of times a player scores a touchdown.
Team Record:
- Wins: Total number of games won by a team.
- Home Score: Points scored by the home team in a game.
- Away Score: Points scored by the away team in a game.
Betting Data:
- Betting Accuracy: A measure of how accurately betting odds have predicted game outcomes in the past.
Methods
Player Performance Prediction (Q1):
- Linear regression for predicting continuous performance metrics such as yards gained.
- Soft-margin SVM for classifying players into distinct performance categories. It may also help with the high-dimensional data we anticipate.
- Decision trees for identifying key factors influencing player performance.
- Feature engineering for binning numerical performance data into categories.
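The binning and regression steps listed above can be sketched as follows. This is a minimal illustration, not our final pipeline: the `targets` and `receiving_yards` values are copied from the ten preview rows of `nfl_data` shown later, while the bin edges (25 and 75 yards) and the labels `low`/`medium`/`high` are placeholder assumptions that we would tune on the full dataset.

```python
import numpy as np
import pandas as pd

# Ten player-game rows (targets and receiving yards from the nfl_data preview)
games = pd.DataFrame({
    "targets": [4, 5, 5, 3, 2, 4, 8, 3, 7, 4],
    "receiving_yards": [100, 25, 4, 94, 27, 23, 64, 11, 81, 17],
})

# Assumed bin edges; open-ended at both sides so negative yardage is still binned
bins = [-np.inf, 25, 75, np.inf]
labels = ["low", "medium", "high"]

# Feature engineering: turn the continuous yardage into performance categories
games["yards_bin"] = pd.cut(games["receiving_yards"], bins=bins, labels=labels)

# Simple linear regression (ordinary least squares) of yards on targets
slope, intercept = np.polyfit(
    games["targets"].to_numpy(), games["receiving_yards"].to_numpy(), deg=1
)

print(games["yards_bin"].value_counts().to_dict())
print(f"receiving_yards ~ {slope:.2f} * targets + {intercept:.2f}")
```

The categories produced by `pd.cut` could serve as class labels for the SVM or decision tree, while the fitted line is the kind of baseline we would compare richer models against.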
Betting Data Analysis (Q2):
- Decision tree classification of whether the favored team wins, bringing in multiple betting data points to determine which are most useful.
- Logistic regression using the betting line for predicting the probability of a team's success.
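Before fitting either model, the Betting Accuracy variable itself has to be computed. The sketch below shows one way to do that, assuming the data has already been reduced to one row per game; the scores, teams, and favorites are the ten games visible in the `nfl_data` preview shown later. Treating a tie as neither a correct nor an incorrect pick is our own assumption.

```python
import pandas as pd

# One row per game, taken from the ten games in the nfl_data preview
games = pd.DataFrame({
    "schedule_week": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "team_home_abbreviation": ["CLE", "TEN", "JAX", "ATL", "TEN", "DEN", "TEN", "TEN", "CAR", "TEN"],
    "team_away_abbreviation": ["TEN", "IND", "TEN", "TEN", "BUF", "TEN", "LAC", "TB", "TEN", "KC"],
    "score_home": [13, 17, 20, 10, 7, 16, 23, 27, 30, 35],
    "score_away": [43, 19, 7, 24, 14, 0, 20, 23, 20, 32],
    "team_favorite_id": ["CLE", "TEN", "TEN", "ATL", "TEN", "DEN", "TEN", "TEN", "CAR", "KC"],
})

home_won = games["score_home"] > games["score_away"]
tied = games["score_home"] == games["score_away"]

# Keep the away abbreviation unless the home team won
games["winner"] = games["team_away_abbreviation"].where(
    ~home_won, games["team_home_abbreviation"]
)

# Assumption: ties are dropped rather than counted for or against the favorite
decided = games[~tied].copy()
decided["favorite_won"] = decided["winner"] == decided["team_favorite_id"]

# Overall accuracy and accuracy per week (for the "does it improve?" question)
accuracy = decided["favorite_won"].mean()
accuracy_by_week = decided.groupby("schedule_week")["favorite_won"].mean()

print(f"Favorite won {accuracy:.0%} of decided games")
```

Note that `team_favorite_id` holds abbreviations, so it is compared against the abbreviation columns rather than the full team names. This small sample follows a single team and says nothing about betting accuracy league-wide.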
Team Performance Analysis (Q3):
- Linear regression for analyzing how environmental factors quantitatively impact team performance (win/loss rate).
- Decision trees to understand the impact of categorical environmental factors (like type of stadium) on team performance.
- KNN to predict team performance based on similarity to historical games in similar environmental conditions.
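To make the KNN idea concrete, here is a small from-scratch sketch written with NumPy only. The six "historical games" are made-up values chosen to form two obvious clusters, not results from our data, and the helper name `knn_predict` is hypothetical. In the actual project we would more likely use a library implementation such as scikit-learn's `KNeighborsClassifier`.

```python
import numpy as np

# Hypothetical historical games: [temperature (F), wind (mph)]
X = np.array([
    [72.0, 0.0], [35.0, 15.0], [30.0, 20.0],
    [68.0, 5.0], [75.0, 3.0], [28.0, 18.0],
])
# Hypothetical labels: 1 if the home team won, 0 otherwise
y = np.array([1, 0, 0, 1, 1, 0])


def knn_predict(X_train, y_train, x_new, k=3):
    """Predict a binary label by majority vote among the k nearest games."""
    # Standardize features so temperature and wind contribute on a similar scale
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant features
    Z = (X_train - mean) / std
    z = (np.asarray(x_new, dtype=float) - mean) / std

    # Euclidean distance from the new game to every historical game
    dist = np.linalg.norm(Z - z, axis=1)
    nearest = np.argsort(dist)[:k]

    # Majority vote; use an odd k so the vote cannot tie
    return int(y_train[nearest].mean() > 0.5)


pred_warm = knn_predict(X, y, [70.0, 2.0])
pred_cold = knn_predict(X, y, [32.0, 17.0])
print(pred_warm, pred_cold)
```

Standardizing first matters because KNN depends on distances: without it, whichever feature has the larger numeric range would dominate the neighbor search.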
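import pandas as pd
# Show full cell contents and every column when displaying DataFrames
pd.set_option('display.max_colwidth', None)
pd.set_option('display.max_columns', None)
# Load the combined player, game, betting, and weather data
nfl_data = pd.read_csv('nfl_data.csv')
# Preview the first 10 rows
nfl_data.head(10)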
| | id | name | position | team | week | season | season_type | completions | attempts | passing_yards | passing_tds | interceptions | sacks | sack_yards | sack_fumbles | sack_fumbles_lost | passing_air_yards | passing_yards_after_catch | passing_first_downs | passing_2pt_conversions | carries | rushing_yards | rushing_tds | rushing_fumbles | rushing_fumbles_lost | rushing_first_downs | rushing_2pt_conversions | receptions | targets | receiving_yards | receiving_tds | receiving_fumbles | receiving_fumbles_lost | receiving_air_yards | receiving_yards_after_catch | receiving_first_downs | receiving_2pt_conversions | target_share | air_yards_share | fantasy_points | fantasy_points_ppr | total_yards | ypa | ypc | ypr | touches | count | comp_percentage | pass_td_percentage | int_percentage | rush_td_percentage | rec_td_percentage | total_tds | td_percentage | passer_rating | rookie_season | round | overall | ht | wt | forty | vertical | offense_snaps | teams_offense_snaps | snap_pct | years_played | Unnamed: 0 | schedule_date | schedule_season | schedule_week | schedule_playoff | team_home | score_home | score_away | team_away | team_favorite_id | spread_favorite | over_under_line | stadium | stadium_neutral | weather_temperature | weather_wind_mph | weather_humidity | weather_detail | team_home_abbreviation | team_away_abbreviation |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | A.J. Brown | WR | TEN | 1 | 2019 | REG | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 4 | 100 | 0 | 0 | 0 | 54 | 56 | 2 | 0 | 0.166667 | 0.372414 | 10.0 | 13.0 | 100 | 0.0 | 0.0 | 33.33 | 3 | 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000 | 0 | 0.000 | 33.87 | 2019.0 | 2.0 | 51.0 | Jun-00 | 226.0 | 4.49 | 36.5 | 25.0 | 60.0 | 0.42 | 1.0 | 12414 | 9/8/19 | 2019 | 1 | False | Cleveland Browns | 13 | 43 | Tennessee Titans | CLE | -5.5 | 44.0 | FirstEnergy Stadium | False | NaN | NaN | NaN | NaN | CLE | TEN |
1 | 1 | A.J. Brown | WR | TEN | 2 | 2019 | REG | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 5 | 25 | 0 | 0 | 0 | 25 | 8 | 1 | 0 | 0.192308 | 0.168919 | 2.5 | 5.5 | 25 | 0.0 | 0.0 | 8.33 | 3 | 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000 | 0 | 0.000 | 33.87 | 2019.0 | 2.0 | 51.0 | Jun-00 | 226.0 | 4.49 | 36.5 | 27.0 | 59.0 | 0.46 | 1.0 | 12440 | 9/15/19 | 2019 | 2 | False | Tennessee Titans | 17 | 19 | Indianapolis Colts | TEN | -3.5 | 43.5 | Nissan Stadium | False | NaN | NaN | NaN | NaN | TEN | IND |
2 | 1 | A.J. Brown | WR | TEN | 3 | 2019 | REG | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 5 | 4 | 0 | 0 | 0 | 69 | -2 | 0 | 0 | 0.131579 | 0.163507 | 0.4 | 1.4 | 4 | 0.0 | 0.0 | 4.00 | 1 | 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000 | 0 | 0.000 | 33.87 | 2019.0 | 2.0 | 51.0 | Jun-00 | 226.0 | 4.49 | 36.5 | 39.0 | 80.0 | 0.49 | 1.0 | 12443 | 9/19/19 | 2019 | 3 | False | Jacksonville Jaguars | 20 | 7 | Tennessee Titans | TEN | -2.0 | 38.0 | TIAA Bank Field | False | NaN | NaN | NaN | NaN | JAX | TEN |
3 | 1 | A.J. Brown | WR | TEN | 4 | 2019 | REG | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 3 | 94 | 2 | 0 | 0 | 43 | 51 | 3 | 0 | 0.136364 | 0.330769 | 21.4 | 24.4 | 94 | 0.0 | 0.0 | 31.33 | 3 | 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.667 | 2 | 0.667 | 33.87 | 2019.0 | 2.0 | 51.0 | Jun-00 | 226.0 | 4.49 | 36.5 | 26.0 | 62.0 | 0.42 | 1.0 | 12461 | 9/29/19 | 2019 | 4 | False | Atlanta Falcons | 10 | 24 | Tennessee Titans | ATL | -3.5 | 46.5 | Mercedes-Benz Stadium | False | 72.0 | 0.0 | NaN | indoor | ATL | TEN |
4 | 1 | A.J. Brown | WR | TEN | 5 | 2019 | REG | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 2 | 27 | 0 | 0 | 0 | 25 | 2 | 1 | 0 | 0.095238 | 0.242718 | 2.7 | 4.7 | 27 | 0.0 | 0.0 | 13.50 | 2 | 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000 | 0 | 0.000 | 33.87 | 2019.0 | 2.0 | 51.0 | Jun-00 | 226.0 | 4.49 | 36.5 | 37.0 | 58.0 | 0.64 | 1.0 | 12486 | 10/6/19 | 2019 | 5 | False | Tennessee Titans | 7 | 14 | Buffalo Bills | TEN | -3.0 | 39.5 | Nissan Stadium | False | NaN | NaN | NaN | NaN | TEN | BUF |
5 | 1 | A.J. Brown | WR | TEN | 6 | 2019 | REG | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 4 | 23 | 0 | 1 | 0 | 39 | 5 | 2 | 0 | 0.125000 | 0.138298 | 2.3 | 4.3 | 23 | 0.0 | 0.0 | 11.50 | 2 | 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000 | 0 | 0.000 | 33.87 | 2019.0 | 2.0 | 51.0 | Jun-00 | 226.0 | 4.49 | 36.5 | 38.0 | 66.0 | 0.58 | 1.0 | 12493 | 10/13/19 | 2019 | 6 | False | Denver Broncos | 16 | 0 | Tennessee Titans | DEN | -1.5 | 41.0 | Sports Authority Field at Mile High | False | NaN | NaN | NaN | NaN | DEN | TEN |
6 | 1 | A.J. Brown | WR | TEN | 7 | 2019 | REG | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | -2 | 0 | 0 | 0 | 0 | 0 | 6 | 8 | 64 | 0 | 0 | 0 | 56 | 17 | 4 | 0 | 0.275862 | 0.284264 | 6.2 | 12.2 | 62 | 0.0 | -2.0 | 10.67 | 7 | 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000 | 0 | 0.000 | 33.87 | 2019.0 | 2.0 | 51.0 | Jun-00 | 226.0 | 4.49 | 36.5 | 39.0 | 62.0 | 0.63 | 1.0 | 12514 | 10/20/19 | 2019 | 7 | False | Tennessee Titans | 23 | 20 | Los Angeles Chargers | TEN | -2.5 | 42.5 | Nissan Stadium | False | NaN | NaN | NaN | NaN | TEN | LAC |
7 | 1 | A.J. Brown | WR | TEN | 8 | 2019 | REG | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 3 | 11 | 1 | 0 | 0 | 32 | 1 | 1 | 0 | 0.090909 | 0.094955 | 7.1 | 9.1 | 11 | 0.0 | 0.0 | 5.50 | 2 | 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.500 | 1 | 0.500 | 33.87 | 2019.0 | 2.0 | 51.0 | Jun-00 | 226.0 | 4.49 | 36.5 | 36.0 | 59.0 | 0.61 | 1.0 | 12530 | 10/27/19 | 2019 | 8 | False | Tennessee Titans | 27 | 23 | Tampa Bay Buccaneers | TEN | -2.0 | 45.5 | Nissan Stadium | False | NaN | NaN | NaN | NaN | TEN | TB |
8 | 1 | A.J. Brown | WR | TEN | 9 | 2019 | REG | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 7 | 81 | 0 | 0 | 0 | 102 | 25 | 4 | 0 | 0.189189 | 0.310976 | 8.1 | 12.1 | 81 | 0.0 | 0.0 | 20.25 | 4 | 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000 | 0 | 0.000 | 33.87 | 2019.0 | 2.0 | 51.0 | Jun-00 | 226.0 | 4.49 | 36.5 | 50.0 | 72.0 | 0.69 | 1.0 | 12535 | 11/3/19 | 2019 | 9 | False | Carolina Panthers | 30 | 20 | Tennessee Titans | CAR | -3.5 | 43.0 | Bank of America Stadium | False | NaN | NaN | NaN | NaN | CAR | TEN |
9 | 1 | A.J. Brown | WR | TEN | 10 | 2019 | REG | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 4 | 17 | 0 | 0 | 0 | 89 | 12 | 1 | 0 | 0.210526 | 0.354582 | 1.7 | 2.7 | 17 | 0.0 | 0.0 | 17.00 | 1 | 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000 | 0 | 0.000 | 33.87 | 2019.0 | 2.0 | 51.0 | Jun-00 | 226.0 | 4.49 | 36.5 | 47.0 | 50.0 | 0.94 | 1.0 | 12557 | 11/10/19 | 2019 | 10 | False | Tennessee Titans | 35 | 32 | Kansas City Chiefs | KC | -6.0 | 49.0 | Nissan Stadium | False | NaN | NaN | NaN | NaN | TEN | KC |
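The preview shows one row per player per game, so game-level columns such as the scores and the betting line repeat for every player who appeared in that game. For the betting and team-performance questions we will probably want one row per game. A minimal sketch of that reduction follows; the second player is a hypothetical placeholder, and using date plus the two team names as the game key is our assumption.

```python
import pandas as pd

# Three player-game rows; the first two belong to the same game
rows = pd.DataFrame({
    "name": ["A.J. Brown", "Player B (hypothetical)", "A.J. Brown"],
    "schedule_date": ["9/8/19", "9/8/19", "9/15/19"],
    "team_home": ["Cleveland Browns", "Cleveland Browns", "Tennessee Titans"],
    "team_away": ["Tennessee Titans", "Tennessee Titans", "Indianapolis Colts"],
    "score_home": [13, 13, 17],
    "score_away": [43, 43, 19],
})

# Assumed game key: a date plus both teams identifies a single game
game_keys = ["schedule_date", "team_home", "team_away"]
game_cols = game_keys + ["score_home", "score_away"]

# Keep only game-level columns, then collapse the repeated rows
games = rows[game_cols].drop_duplicates(subset=game_keys).reset_index(drop=True)

# Parse the M/D/YY strings so games can be sorted chronologically
games["schedule_date"] = pd.to_datetime(games["schedule_date"], format="%m/%d/%y")

print(games)
```

Working from this game-level table avoids counting a single game once per player when computing win rates or betting accuracy.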