import pandas as pd
df = pd.read_csv("games.csv")
df.head()
| | GAME_DATE_EST | GAME_ID | GAME_STATUS_TEXT | HOME_TEAM_ID | VISITOR_TEAM_ID | SEASON | TEAM_ID_home | PTS_home | FG_PCT_home | FT_PCT_home | ... | AST_home | REB_home | TEAM_ID_away | PTS_away | FG_PCT_away | FT_PCT_away | FG3_PCT_away | AST_away | REB_away | HOME_TEAM_WINS |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2022-12-22 | 22200477 | Final | 1610612740 | 1610612759 | 2022 | 1610612740 | 126.0 | 0.484 | 0.926 | ... | 25.0 | 46.0 | 1610612759 | 117.0 | 0.478 | 0.815 | 0.321 | 23.0 | 44.0 | 1 |
1 | 2022-12-22 | 22200478 | Final | 1610612762 | 1610612764 | 2022 | 1610612762 | 120.0 | 0.488 | 0.952 | ... | 16.0 | 40.0 | 1610612764 | 112.0 | 0.561 | 0.765 | 0.333 | 20.0 | 37.0 | 1 |
2 | 2022-12-21 | 22200466 | Final | 1610612739 | 1610612749 | 2022 | 1610612739 | 114.0 | 0.482 | 0.786 | ... | 22.0 | 37.0 | 1610612749 | 106.0 | 0.470 | 0.682 | 0.433 | 20.0 | 46.0 | 1 |
3 | 2022-12-21 | 22200467 | Final | 1610612755 | 1610612765 | 2022 | 1610612755 | 113.0 | 0.441 | 0.909 | ... | 27.0 | 49.0 | 1610612765 | 93.0 | 0.392 | 0.735 | 0.261 | 15.0 | 46.0 | 1 |
4 | 2022-12-21 | 22200468 | Final | 1610612737 | 1610612741 | 2022 | 1610612737 | 108.0 | 0.429 | 1.000 | ... | 22.0 | 47.0 | 1610612741 | 110.0 | 0.500 | 0.773 | 0.292 | 20.0 | 47.0 | 0 |
5 rows × 21 columns
This dataset, titled "NBA Games Data," was obtained from Kaggle. Each row corresponds to one game played between the 2003 and 2022 seasons, and each column records a statistic for that game.
Link to dataset: https://www.kaggle.com/datasets/nathanlauga/nba-games?resource=download
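The description above can be checked directly against the data. The sketch below is illustrative only: `sample` is a stand-in built from the five rows shown in the `df.head()` output (a subset of columns), and `summarize` is a hypothetical helper, not part of the dataset or of pandas. It counts games, confirms that `GAME_ID` is unique per row, reports the season range, and checks that `HOME_TEAM_WINS` agrees with the final score.

```python
import pandas as pd

# Stand-in for the full file: the five rows shown in df.head() above,
# restricted to a few columns. With the real data, pass df instead.
sample = pd.DataFrame({
    "GAME_ID": [22200477, 22200478, 22200466, 22200467, 22200468],
    "SEASON": [2022, 2022, 2022, 2022, 2022],
    "PTS_home": [126.0, 120.0, 114.0, 113.0, 108.0],
    "PTS_away": [117.0, 112.0, 106.0, 93.0, 110.0],
    "HOME_TEAM_WINS": [1, 1, 1, 1, 0],
})

def summarize(games: pd.DataFrame) -> dict:
    """Hypothetical helper: basic sanity checks on the games table."""
    # Recompute the label from the score to see whether it matches the column.
    home_won = (games["PTS_home"] > games["PTS_away"]).astype(int)
    return {
        "n_games": len(games),
        "unique_game_ids": games["GAME_ID"].nunique(),
        "first_season": int(games["SEASON"].min()),
        "last_season": int(games["SEASON"].max()),
        "label_matches_score": bool((home_won == games["HOME_TEAM_WINS"]).all()),
    }

summary = summarize(sample)
print(summary)
```

On the full dataset, rows with missing point totals would fail the score comparison, so a mismatch there may indicate missing values rather than a wrong label.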
Which statistics from an NBA game are most valuable for predicting whether the home team wins?
Can we build a model from these features that predicts whether the home team wins a future game?
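One possible first pass at the first question is to rank the numeric columns by the strength of their correlation with `HOME_TEAM_WINS`. This is a minimal sketch under stated assumptions, not the analysis itself: `sample` again holds only the five rows from the `df.head()` output, the choice of `feature_cols` is an assumption, and correlation is used here as a simple screening step rather than as a measure of predictive value.

```python
import pandas as pd

# Stand-in for the full file: values copied from the df.head() output above.
# Five rows are far too few for meaningful results; use df on the real data.
sample = pd.DataFrame({
    "FG_PCT_home": [0.484, 0.488, 0.482, 0.441, 0.429],
    "AST_home": [25.0, 16.0, 22.0, 27.0, 22.0],
    "REB_home": [46.0, 40.0, 37.0, 49.0, 47.0],
    "FG_PCT_away": [0.478, 0.561, 0.470, 0.392, 0.500],
    "AST_away": [23.0, 20.0, 20.0, 15.0, 20.0],
    "REB_away": [44.0, 37.0, 46.0, 46.0, 47.0],
    "HOME_TEAM_WINS": [1, 1, 1, 1, 0],
})

# Assumed feature set: box-score statistics that are not the label itself.
feature_cols = [c for c in sample.columns if c != "HOME_TEAM_WINS"]

# Pearson correlation of each feature with the binary win label.
corr = sample[feature_cols].corrwith(sample["HOME_TEAM_WINS"])

# Order features from strongest to weakest absolute correlation.
ranking = corr.reindex(corr.abs().sort_values(ascending=False).index)
print(ranking)
```

Point totals are left out of `feature_cols` on purpose: `PTS_home` and `PTS_away` determine the outcome directly, so including them would tell us nothing about which other statistics matter. The same ranking on the full dataset could then guide which columns to pass to a classifier for the second question.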