import pandas as pd
import matplotlib.pyplot as plt

# Load the shopping behavior dataset and preview the first five rows.
df = pd.read_csv("archive/shopping_behavior_updated.csv")
df.head()
| | Customer ID | Age | Gender | Item Purchased | Category | Purchase Amount (USD) | Location | Size | Color | Season | Review Rating | Subscription Status | Shipping Type | Discount Applied | Promo Code Used | Previous Purchases | Payment Method | Frequency of Purchases |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 55 | Male | Blouse | Clothing | 53 | Kentucky | L | Gray | Winter | 3.1 | Yes | Express | Yes | Yes | 14 | Venmo | Fortnightly |
1 | 2 | 19 | Male | Sweater | Clothing | 64 | Maine | L | Maroon | Winter | 3.1 | Yes | Express | Yes | Yes | 2 | Cash | Fortnightly |
2 | 3 | 50 | Male | Jeans | Clothing | 73 | Massachusetts | S | Maroon | Spring | 3.1 | Yes | Free Shipping | Yes | Yes | 23 | Credit Card | Weekly |
3 | 4 | 21 | Male | Sandals | Footwear | 90 | Rhode Island | M | Maroon | Spring | 3.5 | Yes | Next Day Air | Yes | Yes | 49 | PayPal | Weekly |
4 | 5 | 45 | Male | Blouse | Clothing | 49 | Oregon | M | Turquoise | Spring | 2.7 | Yes | Free Shipping | Yes | Yes | 31 | PayPal | Annually |
Customer ID: A unique identifier assigned to each individual customer, facilitating tracking and analysis of their shopping behavior over time.
Age: The age of the customer, providing demographic information for segmentation and targeted marketing strategies.
Gender: The gender identification of the customer, a key demographic variable influencing product preferences and purchasing patterns.
Item Purchased: The specific product or item selected by the customer during the transaction.
Category: The broad classification or group to which the purchased item belongs (e.g., clothing, electronics, groceries).
Purchase Amount (USD): The monetary value of the transaction in United States Dollars (USD), indicating the cost of the purchased item(s).
Location: The geographical location where the purchase was made, offering insights into regional preferences and market trends.
Size: The size specification (if applicable) of the purchased item, relevant for apparel, footwear, and certain consumer goods.
Color: The color variant or choice associated with the purchased item, influencing customer preferences and product availability.
Season: The seasonal relevance of the purchased item (e.g., spring, summer, fall, winter), impacting inventory management and marketing strategies.
Review Rating: A numerical or qualitative assessment provided by the customer regarding their satisfaction with the purchased item.
Subscription Status: Indicates whether the customer has opted for a subscription service, offering insights into their level of loyalty and potential for recurring revenue.
Shipping Type: Specifies the method used to deliver the purchased item (e.g., standard shipping, express delivery), influencing delivery times and costs.
Discount Applied: Indicates if any promotional discounts were applied to the purchase, shedding light on price sensitivity and promotion effectiveness.
Promo Code Used: Notes whether a promotional code or coupon was utilized during the transaction, aiding in the evaluation of marketing campaign success.
Previous Purchases: Provides information on the number or frequency of prior purchases made by the customer, contributing to customer segmentation and retention strategies.
Payment Method: Specifies the mode of payment employed by the customer (e.g., credit card, cash), offering insights into preferred payment options.
Frequency of Purchases: Indicates how often the customer engages in purchasing activities, a critical metric for assessing customer loyalty and lifetime value.
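The column descriptions above can be checked against the data itself. The following is a minimal sketch, not part of the original notebook: `summarize_columns` is a hypothetical helper, and `sample` is a small frame rebuilt from a few columns of the `df.head()` output so that the block runs without the CSV file. On the real data you would pass `df` instead.

```python
import pandas as pd

# Assumption: a hand-built sample using the five rows shown by df.head(),
# restricted to a few columns so the example is self-contained.
sample = pd.DataFrame({
    "Customer ID": [1, 2, 3, 4, 5],
    "Age": [55, 19, 50, 21, 45],
    "Category": ["Clothing", "Clothing", "Clothing", "Footwear", "Clothing"],
    "Review Rating": [3.1, 3.1, 3.1, 3.5, 2.7],
    "Subscription Status": ["Yes", "Yes", "Yes", "Yes", "Yes"],
})


def summarize_columns(frame):
    """Hypothetical helper: one row per column with dtype, distinct values, and missing count."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "n_unique": frame.nunique(),
        "n_missing": frame.isna().sum(),
    })


summary = summarize_columns(sample)
print(summary)
```

Columns with a very small `n_unique` (such as Subscription Status) are likely categorical flags, while a numeric dtype on Review Rating would confirm that the rating is stored as a number rather than as text.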
# plt.scatter(df.Age, df["Review Rating"])
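# Reload the data so this cell can be re-run without joining the dummy columns twice.
df = pd.read_csv("archive/shopping_behavior_updated.csv")
# One-hot encode Gender; dtype=int gives the 0/1 columns shown below (newer pandas defaults to booleans).
df = df.join(pd.get_dummies(df.Gender, dtype=int))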
df.head()
| | Customer ID | Age | Gender | Item Purchased | Category | Purchase Amount (USD) | Location | Size | Color | Season | Review Rating | Subscription Status | Shipping Type | Discount Applied | Promo Code Used | Previous Purchases | Payment Method | Frequency of Purchases | Female | Male |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 55 | Male | Blouse | Clothing | 53 | Kentucky | L | Gray | Winter | 3.1 | Yes | Express | Yes | Yes | 14 | Venmo | Fortnightly | 0 | 1 |
1 | 2 | 19 | Male | Sweater | Clothing | 64 | Maine | L | Maroon | Winter | 3.1 | Yes | Express | Yes | Yes | 2 | Cash | Fortnightly | 0 | 1 |
2 | 3 | 50 | Male | Jeans | Clothing | 73 | Massachusetts | S | Maroon | Spring | 3.1 | Yes | Free Shipping | Yes | Yes | 23 | Credit Card | Weekly | 0 | 1 |
3 | 4 | 21 | Male | Sandals | Footwear | 90 | Rhode Island | M | Maroon | Spring | 3.5 | Yes | Next Day Air | Yes | Yes | 49 | PayPal | Weekly | 0 | 1 |
4 | 5 | 45 | Male | Blouse | Clothing | 49 | Oregon | M | Turquoise | Spring | 2.7 | Yes | Free Shipping | Yes | Yes | 31 | PayPal | Annually | 0 | 1 |
hist = plt.hist(df.Category)
What factors help a company predict what demographic (Age, Gender) is purchasing certain items?
How does seasonality affect which items are purchased, and how many?
What type of customer is most likely to have a subscription?
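A first pass at the three questions above could start from simple aggregations. The sketch below is only one possible starting point under stated assumptions: the helper names (`category_by_gender`, `purchases_by_season`, `subscription_rate_by`) are hypothetical, and `sample` is rebuilt from the five rows of the `df.head()` output so the block runs on its own. Five rows are far too few to support any conclusion; on the real data you would pass `df`.

```python
import pandas as pd

# Assumption: a hand-built sample from the five rows shown by df.head().
sample = pd.DataFrame({
    "Age": [55, 19, 50, 21, 45],
    "Gender": ["Male", "Male", "Male", "Male", "Male"],
    "Category": ["Clothing", "Clothing", "Clothing", "Footwear", "Clothing"],
    "Purchase Amount (USD)": [53, 64, 73, 90, 49],
    "Season": ["Winter", "Winter", "Spring", "Spring", "Spring"],
    "Subscription Status": ["Yes", "Yes", "Yes", "Yes", "Yes"],
})


def category_by_gender(frame):
    """Count purchases in each Category for each Gender (question 1)."""
    return pd.crosstab(frame["Gender"], frame["Category"])


def purchases_by_season(frame):
    """Number of purchases, total spend, and mean spend per Season (question 2)."""
    return frame.groupby("Season")["Purchase Amount (USD)"].agg(["count", "sum", "mean"])


def subscription_rate_by(frame, column):
    """Share of customers with Subscription Status == 'Yes' within each group of `column` (question 3)."""
    subscribed = frame["Subscription Status"].eq("Yes")
    return subscribed.groupby(frame[column]).mean()


print(category_by_gender(sample))
print(purchases_by_season(sample))
print(subscription_rate_by(sample, "Gender"))
```

Calling `subscription_rate_by(df, "Season")` or passing another categorical column would compare subscription rates across other groupings in the same way.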