# A tibble: 1,458 × 8
faa name lat lon alt tz dst tzone
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 04G Lansdowne Airport 41.1 -80.6 1044 -5 A America/…
2 06A Moton Field Municipal Airport 32.5 -85.7 264 -6 A America/…
3 06C Schaumburg Regional 42.0 -88.1 801 -6 A America/…
4 06N Randall Airport 41.4 -74.4 523 -5 A America/…
5 09J Jekyll Island Airport 31.1 -81.4 11 -5 A America/…
6 0A9 Elizabethton Municipal Airport 36.4 -82.2 1593 -5 A America/…
7 0G6 Williams County Airport 41.5 -84.5 730 -5 A America/…
8 0G7 Finger Lakes Regional Airport 42.9 -76.8 492 -5 A America/…
9 0P2 Shoestring Aviation Airfield 39.8 -76.6 1000 -5 U America/…
10 0S9 Jefferson County Intl 48.1 -123. 108 -8 A America/…
# ℹ 1,448 more rows
Yet another example:
planes
# A tibble: 3,322 × 9
tailnum year type manufacturer model engines seats speed engine
<chr> <int> <chr> <chr> <chr> <int> <int> <int> <chr>
1 N10156 2004 Fixed wing multi… EMBRAER EMB-… 2 55 NA Turbo…
2 N102UW 1998 Fixed wing multi… AIRBUS INDU… A320… 2 182 NA Turbo…
3 N103US 1999 Fixed wing multi… AIRBUS INDU… A320… 2 182 NA Turbo…
4 N104UW 1999 Fixed wing multi… AIRBUS INDU… A320… 2 182 NA Turbo…
5 N10575 2002 Fixed wing multi… EMBRAER EMB-… 2 55 NA Turbo…
6 N105UW 1999 Fixed wing multi… AIRBUS INDU… A320… 2 182 NA Turbo…
7 N107US 1999 Fixed wing multi… AIRBUS INDU… A320… 2 182 NA Turbo…
8 N108UW 1999 Fixed wing multi… AIRBUS INDU… A320… 2 182 NA Turbo…
9 N109UW 1999 Fixed wing multi… AIRBUS INDU… A320… 2 182 NA Turbo…
10 N110UW 1999 Fixed wing multi… AIRBUS INDU… A320… 2 182 NA Turbo…
# ℹ 3,312 more rows
And a final example:
airlines
# A tibble: 16 × 2
carrier name
<chr> <chr>
1 9E Endeavor Air Inc.
2 AA American Airlines Inc.
3 AS Alaska Airlines Inc.
4 B6 JetBlue Airways
5 DL Delta Air Lines Inc.
6 EV ExpressJet Airlines Inc.
7 F9 Frontier Airlines Inc.
8 FL AirTran Airways Corporation
9 HA Hawaiian Airlines Inc.
10 MQ Envoy Air
11 OO SkyWest Airlines Inc.
12 UA United Air Lines Inc.
13 US US Airways Inc.
14 VX Virgin America
15 WN Southwest Airlines Co.
16 YV Mesa Airlines Inc.
A diagram
Relational diagram
Primary Keys
A primary key is a variable or set of variables that uniquely identifies each observation
When more than one variable is needed, the key is called a compound key.
Q: What are the primary keys for airports, planes, airlines?
Foreign Keys
A foreign key is a variable (or set of variables) that corresponds to a primary key in another table.
Typically, we want the foreign key to be the same name as the primary key. It makes it easier to interpret what’s happening.
Q: Find a foreign key that is not identical to it’s primary key. Why might this be good design?
Surrogate Keys
There isn’t a very good primary key for flights. So, you could make one:
library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
left_join(), inner_join(), right_join(), full_join(), semi_join(), and anti_join()
You should learn them all! But in the interest of time, we will focus on my favorite, left_join.
left_join
left_join is a mutating join because it adds columns, like mutate.
it takes two or three arguments.
flights_small = flights |>select(year, time_hour, origin, dest, tailnum, carrier)# left_join(flights_small, airlines)# or, with a pipe:flights_small |>left_join(airlines)
Joining with `by = join_by(carrier)`
# A tibble: 336,776 × 7
year time_hour origin dest tailnum carrier name
<int> <dttm> <chr> <chr> <chr> <chr> <chr>
1 2013 2013-01-01 05:00:00 EWR IAH N14228 UA United Air Lines Inc.
2 2013 2013-01-01 05:00:00 LGA IAH N24211 UA United Air Lines Inc.
3 2013 2013-01-01 05:00:00 JFK MIA N619AA AA American Airlines Inc.
4 2013 2013-01-01 05:00:00 JFK BQN N804JB B6 JetBlue Airways
5 2013 2013-01-01 06:00:00 LGA ATL N668DN DL Delta Air Lines Inc.
6 2013 2013-01-01 05:00:00 EWR ORD N39463 UA United Air Lines Inc.
7 2013 2013-01-01 06:00:00 EWR FLL N516JB B6 JetBlue Airways
8 2013 2013-01-01 06:00:00 LGA IAD N829AS EV ExpressJet Airlines I…
9 2013 2013-01-01 06:00:00 JFK MCO N593JB B6 JetBlue Airways
10 2013 2013-01-01 06:00:00 LGA ORD N3ALAA AA American Airlines Inc.
# ℹ 336,766 more rows
# A tibble: 63 × 9
year time_hour origin dest tailnum carrier type engines seats
<int> <dttm> <chr> <chr> <chr> <chr> <chr> <int> <int>
1 2013 2013-01-01 06:00:00 LGA ORD N3ALAA AA <NA> NA NA
2 2013 2013-01-02 18:00:00 LGA ORD N3ALAA AA <NA> NA NA
3 2013 2013-01-03 06:00:00 LGA ORD N3ALAA AA <NA> NA NA
4 2013 2013-01-07 19:00:00 LGA ORD N3ALAA AA <NA> NA NA
5 2013 2013-01-08 17:00:00 JFK ORD N3ALAA AA <NA> NA NA
6 2013 2013-01-16 06:00:00 LGA ORD N3ALAA AA <NA> NA NA
7 2013 2013-01-20 18:00:00 LGA ORD N3ALAA AA <NA> NA NA
8 2013 2013-01-22 17:00:00 JFK ORD N3ALAA AA <NA> NA NA
9 2013 2013-10-11 06:00:00 EWR MIA N3ALAA AA <NA> NA NA
10 2013 2013-10-14 08:00:00 JFK BOS N3ALAA AA <NA> NA NA
# ℹ 53 more rows
What if there are multiple matches??
Put this in Rstudio to see what happens… you get a helpful error!
Joining with `by = join_by(year, month, day, hour)`
Warning in left_join(flights_no_good, weather_no_good): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 1 of `x` matches multiple rows in `y`.
ℹ Row 5 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
"many-to-many"` to silence this warning.