Group 2 Members:

  • Dayna Karls
  • Ryan Herrington
  • Cody Nelson
  • Nick Mitchinson
  • Ethan Yu

Descriptions of Questions:

Given a player with a known height and weight, what offensive position (QB, RB, WR, TE, or OL) are they most likely to play in the NFL?

This data and models will be helpful for deciding a new position for players that need a position change in the NFL. Position changes are common in the NFL in order to best use players’ skills effectively. With these models, coaches should be able to plug in a player’s height and weight and get a position recommendation for the player.

In [15]:
import pandas as pd

data = pd.read_csv('Basic_Stats.csv')
data.head()
Out[15]:
Age Birth Place Birthday College Current Status Current Team Experience Height (inches) High School High School Location Name Number Player Id Position Weight (lbs) Years Played
0 NaN Grand Rapids , MI 5/23/1921 Notre Dame Retired NaN 3 Seasons 71.0 NaN NaN Evans, Fred NaN fredevans/2513736 NaN 185.0 1946 - 1948
1 NaN Dayton , OH 12/21/1930 Dayton Retired NaN 1 Season 70.0 NaN NaN Raiff, Jim NaN jimraiff/2523700 NaN 235.0 1954 - 1954
2 56.0 Temple , TX 9/11/1960 Louisiana Tech Retired NaN 1 Season 74.0 NaN NaN Fowler, Bobby NaN bobbyfowler/2514295 NaN 230.0 1985 - 1985
3 30.0 New Orleans , LA 9/30/1986 LSU Retired NaN 5 Seasons 73.0 NaN NaN Johnson, Quinn NaN quinnjohnson/79593 NaN 255.0 2009 - 2013
4 25.0 Detroit , MI 3/31/1992 Central Michigan Active Pittsburgh Steelers 3rd season 77.0 Clintondale HS Clinton Twp.,Macomb Co., MI Walton, L.T. 96.0 l.t.walton/2552444 DE 305.0 NaN

Variables:

The columns we are interested in from this dataset are: Name, Player ID, Position, Height (inches), and Weight (lbs).

We will be finding the predicted position for each players, to test the effectiveness of models for choosing a position for players with a specific height and weight.

Methods:

We will first have to clean the data, specifically the position column since the dataset only has positions for current players. Since the data set is large, and we want positions to be accurate, it makes the most sense to drop the examples with missing position data.

Next, we will need to use one-hot encoding on the remaining position data to make columns for each individual offensive position.

Our model will need to be a classifier. The two best options for this would be k-NN and Decision Tree algorithms, so we will create a model for both and determine which performs better while tuning the size of k and tree depth.