import requests
import pandas as pd
from bs4 import BeautifulSoup

# Download the per-game stats page and locate the stats table by its id
url = "https://www.basketball-reference.com/leagues/NBA_2025_per_game.html"
response = requests.get(url)
response.raise_for_status()  # stop early if the request failed
soup = BeautifulSoup(response.content, "html.parser")
table = soup.find("table", {"id": "per_game_stats"})

# Column names assigned to the <td> cells of each data row
headers = ["Player", "Age", "Team", "Pos", "G", "GS", "MP", "FG", "FGA", "FG%", "3P", "3PA", "3P%",
           "2P", "2PA", "2P%", "eFG%", "FT", "FTA", "FT%", "ORB", "DRB", "TRB", "AST", "STL", "BLK",
           "TOV", "PF", "PTS", "Awards"]

# Collect the cell text of every row; rows without <td> cells give an empty list and are skipped
rows = []
for row in table.find_all("tr"):
    data = [cell.get_text() for cell in row.find_all("td")]
    if data:
        rows.append(data)

# All values are still strings at this point
df = pd.DataFrame(rows, columns=headers)
df.head()
| | Player | Age | Team | Pos | G | GS | MP | FG | FGA | FG% | ... | ORB | DRB | TRB | AST | STL | BLK | TOV | PF | PTS | Awards |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Giannis Antetokounmpo | 30 | MIL | PF | 9 | 9 | 34.8 | 12.9 | 21.2 | .607 | ... | 2.3 | 10.4 | 12.8 | 5.2 | 0.4 | 0.9 | 2.7 | 3.6 | 31.6 | |
| 1 | Anthony Davis | 31 | LAL | PF | 9 | 9 | 35.1 | 10.8 | 18.7 | .577 | ... | 2.1 | 8.3 | 10.4 | 2.8 | 1.3 | 2.0 | 2.2 | 1.2 | 31.2 | |
| 2 | Jayson Tatum | 26 | BOS | SF | 11 | 11 | 36.0 | 9.5 | 20.5 | .465 | ... | 0.5 | 7.2 | 7.6 | 5.0 | 1.6 | 0.5 | 2.9 | 2.5 | 30.5 | |
| 3 | Nikola Jokić | 29 | DEN | C | 10 | 10 | 38.1 | 10.8 | 19.2 | .563 | ... | 4.5 | 9.2 | 13.7 | 11.7 | 1.7 | 1.0 | 4.1 | 2.0 | 29.7 | |
| 4 | LaMelo Ball | 23 | CHO | PG | 10 | 10 | 33.4 | 10.2 | 23.0 | .443 | ... | 1.0 | 3.9 | 4.9 | 6.2 | 1.5 | 0.3 | 4.7 | 4.1 | 29.4 | |
5 rows × 30 columns
Description of the question:
Our goal is to answer the question: who are the top MVP candidates based on the efficiency score?
We will identify the players with the highest efficiency scores to determine the leading MVP candidates.
The efficiency score is defined by the following formula, where G is the number of games played and the other symbols are column names from the table above:

$$ \text{Efficiency} = \frac{\text{PTS} + \text{TRB} + \text{AST} + \text{STL} + \text{BLK} - (\text{FGA} - \text{FG}) - (\text{FTA} - \text{FT}) - \text{TOV}}{\text{G}} $$
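As a minimal sketch, the formula could be applied to a DataFrame shaped like `df` as shown here. The helper name `add_efficiency`, the `divide_by_games` flag, and the two sample players are assumptions made for illustration, not part of the original notebook. The formula divides by G, which fits season totals; if the inputs are already per-game averages, the flag lets that division be skipped.

```python
import pandas as pd

# Columns the formula needs; the names match the `headers` list of the scraped table.
STAT_COLS = ["PTS", "TRB", "AST", "STL", "BLK", "FGA", "FG", "FTA", "FT", "TOV", "G"]


def add_efficiency(df: pd.DataFrame, divide_by_games: bool = True) -> pd.DataFrame:
    """Return a copy of df with a numeric 'EFF' column (hypothetical helper)."""
    out = df.copy()
    # Scraped cells are strings; convert them to numbers, turning blanks into NaN.
    out[STAT_COLS] = out[STAT_COLS].apply(pd.to_numeric, errors="coerce")
    raw = (
        out["PTS"] + out["TRB"] + out["AST"] + out["STL"] + out["BLK"]
        - (out["FGA"] - out["FG"])   # missed field goals
        - (out["FTA"] - out["FT"])   # missed free throws
        - out["TOV"]
    )
    out["EFF"] = raw / out["G"] if divide_by_games else raw
    return out


# Made-up season totals for two fictional players, stored as strings like scraped data.
sample = pd.DataFrame({
    "Player": ["Player A", "Player B"],
    "PTS": ["200", "100"], "TRB": ["80", "40"], "AST": ["50", "20"],
    "STL": ["10", "5"], "BLK": ["10", "5"],
    "FGA": ["150", "90"], "FG": ["75", "40"],
    "FTA": ["40", "20"], "FT": ["30", "15"],
    "TOV": ["25", "10"], "G": ["10", "10"],
})

result = add_efficiency(sample)
# Rank players from highest to lowest efficiency.
print(result.sort_values("EFF", ascending=False)[["Player", "EFF"]])
```

For Player A the numerator is 350 - 75 - 10 - 25 = 240, so the score over 10 games is 24.0; Player B comes out at 10.5.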
Variables:
We will use a dataset of complete player statistics for the 2025 NBA season from Basketball Reference. Its variables include PTS, TRB, AST, FG%, MP, and PER. Such a dataset is well suited to this project because it lets us explore a wide range of player performance measures.
Methods:
Model 1: Decision Tree Classification
Model Goal: We will use a decision tree to predict whether a player is a potential MVP candidate based on their statistical performance. The efficiency score will be a key predictor, alongside other important statistics such as PTS, TRB, and AST.
A fitted decision tree also reports feature importances. These show which statistical indicators (e.g., PTS, TRB, AST, MP, and PER) are most influential in predicting player efficiency and MVP candidacy, which helps us understand the key factors that drive player performance.
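The idea can be sketched with scikit-learn as follows. Everything about the data here is an assumption: the players are randomly generated, `EFF` is a simplified stand-in for the efficiency score, and the rule "top 10% by EFF is a candidate" is a hypothetical label, since the scraped table contains no MVP labels.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic per-game stats for 300 made-up players (NOT real NBA data).
rng = np.random.default_rng(0)
n_players = 300
features = pd.DataFrame({
    "PTS": rng.uniform(2, 32, n_players),
    "TRB": rng.uniform(1, 14, n_players),
    "AST": rng.uniform(0, 12, n_players),
    "MP": rng.uniform(8, 38, n_players),
})
# Simplified stand-in for the efficiency score, built only from the synthetic columns.
features["EFF"] = features["PTS"] + features["TRB"] + features["AST"]

# Hypothetical labelling rule: the top 10% of players by EFF count as MVP candidates.
is_candidate = (features["EFF"] >= features["EFF"].quantile(0.9)).astype(int)

# Hold out a quarter of the players, keeping the class balance in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    features, is_candidate, test_size=0.25, stratify=is_candidate, random_state=0
)

tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)

test_accuracy = tree.score(X_test, y_test)
# One importance value per feature; the values sum to 1.
importances = pd.Series(tree.feature_importances_, index=features.columns)

print(f"test accuracy: {test_accuracy:.2f}")
print(importances.sort_values(ascending=False))
```

Because the synthetic label is derived from `EFF`, the importances mainly confirm that the tree rediscovers the labelling rule. With real MVP labels they would be more informative.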
Model 2: Support Vector Machine (SVM)
We plan to train an SVM on each player's statistics (PTS, EFF, and so on) and classify players as MVP candidates or not, in order to see which abilities affect a player's chances of winning MVP.
We will also evaluate the model's accuracy score and compare it with the other models to check whether it classifies most players correctly.
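A rough sketch of this evaluation, assuming scikit-learn and the same kind of synthetic data, is shown here. The features, the top 10% labelling rule, the RBF kernel, and the majority-class baseline are illustrative choices rather than the notebook's confirmed setup.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic features for 300 made-up players (NOT real NBA data).
rng = np.random.default_rng(1)
n_players = 300
pts = rng.uniform(2, 32, n_players)
eff = pts + rng.uniform(1, 26, n_players)  # simplified stand-in for the efficiency score
X = np.column_stack([pts, eff])

# Hypothetical labelling rule: the top 10% of players by EFF count as MVP candidates.
y = (eff >= np.quantile(eff, 0.9)).astype(int)

# SVMs are sensitive to feature scale, so standardize inside the pipeline.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
# Baseline that always predicts the majority class ("not a candidate").
baseline = DummyClassifier(strategy="most_frequent")

# 5-fold cross-validated accuracy for both models.
svm_scores = cross_val_score(svm, X, y, cv=5, scoring="accuracy")
baseline_scores = cross_val_score(baseline, X, y, cv=5, scoring="accuracy")

print(f"SVM accuracy:      {svm_scores.mean():.3f}")
print(f"baseline accuracy: {baseline_scores.mean():.3f}")
```

Since MVP candidates are rare, a model that never predicts a candidate already scores high on accuracy. Comparing against such a baseline shows whether the SVM has learned anything beyond the class imbalance.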
Model 3: Logistic Regression
We will also fit a logistic regression model that uses PTS and other statistics as features to estimate the probability of a player being selected as MVP.
We will then apply a high probability threshold to classify players and produce the list of players who have a chance of being selected as MVP in 2025.
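The thresholding step might look like the sketch below, again on synthetic data. The cutoff of 0.8 is a hypothetical value chosen for illustration (the text only calls for a "high" threshold), and the noisy labelling rule is likewise an assumption.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features for 300 made-up players (NOT real NBA data).
rng = np.random.default_rng(2)
n_players = 300
pts = rng.uniform(2, 32, n_players)
ast = rng.uniform(0, 12, n_players)
X = np.column_stack([pts, ast])

# Hypothetical noisy label: top 10% by a made-up score of PTS + AST + noise.
score = pts + ast + rng.normal(0, 3, n_players)
y = (score >= np.quantile(score, 0.9)).astype(int)

model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X, y)

# Column 1 of predict_proba is the estimated probability of class 1 (candidate).
mvp_probability = model.predict_proba(X)[:, 1]

THRESHOLD = 0.8  # hypothetical "high" cutoff; 0.5 is the default used by predict()
shortlist = np.flatnonzero(mvp_probability >= THRESHOLD)
default_list = np.flatnonzero(mvp_probability >= 0.5)

print(f"players flagged at {THRESHOLD}: {len(shortlist)}")
print(f"players flagged at 0.5: {len(default_list)}")
```

Raising the threshold can only shrink the list: every player flagged at 0.8 is also flagged at 0.5, so the shortlist trades recall for fewer false positives.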