We are going to use the species data from data carpentry to explore with the dplyr
, tidyr
and purrr
packages. The data are available at http://kbroman.org/datacarp/portal_data_joined.csv. Here we attach the packages we will use with the library
command.
library(dplyr)
library(tidyr)
library(purrr)
library(readr)
surveys <- read_csv("../data/surveys.csv")
## Parsed with column specification:
## cols(
## record_id = col_integer(),
## month = col_integer(),
## day = col_integer(),
## year = col_integer(),
## plot_id = col_integer(),
## species_id = col_character(),
## sex = col_character(),
## hindfoot_length = col_integer(),
## weight = col_integer()
## )
surveys %>%
filter(!(is.na(hindfoot_length) & is.na(weight)),
!is.na(sex)) %>%
group_by(species_id, sex) %>%
summarize(hflen = mean(hindfoot_length, na.rm = TRUE),
wt = mean(weight, na.rm = TRUE))
## # A tibble: 48 x 4
## # Groups: species_id [?]
## species_id sex hflen wt
## <chr> <chr> <dbl> <dbl>
## 1 BA F 13.16129 9.161290
## 2 BA M 12.64286 7.357143
## 3 DM F 35.71269 41.609685
## 4 DM M 36.18823 44.353134
## 5 DO F 35.48633 48.531250
## 6 DO M 35.69878 49.135102
## 7 DS F 49.58317 117.749548
## 8 DS M 50.30194 122.290984
## 9 NL F 32.02419 154.282209
## 10 NL M 32.57743 165.652893
## # ... with 38 more rows
Here is a spread table for one of the traits.
surveys %>%
filter(!(is.na(hindfoot_length) & is.na(weight)),
!is.na(sex)) %>%
group_by(species_id, sex) %>%
summarize(hflen = mean(hindfoot_length, na.rm = TRUE)) %>%
spread(sex, hflen)
## # A tibble: 25 x 3
## # Groups: species_id [25]
## species_id F M
## * <chr> <dbl> <dbl>
## 1 BA 13.16129 12.64286
## 2 DM 35.71269 36.18823
## 3 DO 35.48633 35.69878
## 4 DS 49.58317 50.30194
## 5 NL 32.02419 32.57743
## 6 OL 20.28767 20.75833
## 7 OT 20.28165 20.25740
## 8 OX 19.66667 18.80000
## 9 PB 25.88267 26.43199
## 10 PE 20.27206 20.13213
## # ... with 15 more rows
How would you do this for two traits? Turns out this is more challenging, and depends on what you want to compare. The simplest thing is to arrange
or subset
the earlier table by sex
. If you want four columns on one line, for sex
and the two traits, you will have to be creative. What you want to do is avoid typing the trait names or the sexes if you can. I don’t have an easy solution!
The following gives us a list of 2-trait tables by sex. Notice the use of map
from package purrr
to act separately on each sex
.
tmp <- surveys %>%
filter(!(is.na(hindfoot_length) & is.na(weight)),
!is.na(sex))
tmp <- split(tmp, tmp$sex)
tmp2 <- map(tmp, function(x) {
x %>%
group_by(species_id, sex) %>%
summarize(hflen = mean(hindfoot_length, na.rm = TRUE),
wt = mean(weight, na.rm = TRUE))
})
tmp2
## $F
## # A tibble: 23 x 4
## # Groups: species_id [?]
## species_id sex hflen wt
## <chr> <chr> <dbl> <dbl>
## 1 BA F 13.16129 9.16129
## 2 DM F 35.71269 41.60968
## 3 DO F 35.48633 48.53125
## 4 DS F 49.58317 117.74955
## 5 NL F 32.02419 154.28221
## 6 OL F 20.28767 31.06582
## 7 OT F 20.28165 24.83090
## 8 OX F 19.66667 21.00000
## 9 PB F 25.88267 30.21088
## 10 PE F 20.27206 22.82218
## # ... with 13 more rows
##
## $M
## # A tibble: 25 x 4
## # Groups: species_id [?]
## species_id sex hflen wt
## <chr> <chr> <dbl> <dbl>
## 1 BA M 12.64286 7.357143
## 2 DM M 36.18823 44.353134
## 3 DO M 35.69878 49.135102
## 4 DS M 50.30194 122.290984
## 5 NL M 32.57743 165.652893
## 6 OL M 20.75833 32.068410
## 7 OT M 20.25740 23.695382
## 8 OX M 18.80000 21.000000
## 9 PB M 26.43199 33.816498
## 10 PE M 20.13213 20.584906
## # ... with 15 more rows
What about this? It gives small tables by species. The challenge is that species that do not have records for both sexes will mess up any easy combining.
tmp <- surveys %>%
filter(!(is.na(hindfoot_length) & is.na(weight)),
!is.na(sex))
tmp <- split(tmp, tmp$species_id)
tmp2 <- map(tmp, function(x) {
x %>%
select(-species_id) %>%
group_by(sex) %>%
summarize(hflen = mean(hindfoot_length, na.rm = TRUE),
wt = mean(weight, na.rm = TRUE))
})
tmp2
## $BA
## # A tibble: 2 x 3
## sex hflen wt
## <chr> <dbl> <dbl>
## 1 F 13.16129 9.161290
## 2 M 12.64286 7.357143
##
## $DM
## # A tibble: 2 x 3
## sex hflen wt
## <chr> <dbl> <dbl>
## 1 F 35.71269 41.60968
## 2 M 36.18823 44.35313
##
## $DO
## # A tibble: 2 x 3
## sex hflen wt
## <chr> <dbl> <dbl>
## 1 F 35.48633 48.53125
## 2 M 35.69878 49.13510
##
## $DS
## # A tibble: 2 x 3
## sex hflen wt
## <chr> <dbl> <dbl>
## 1 F 49.58317 117.7495
## 2 M 50.30194 122.2910
##
## $NL
## # A tibble: 2 x 3
## sex hflen wt
## <chr> <dbl> <dbl>
## 1 F 32.02419 154.2822
## 2 M 32.57743 165.6529
##
## $OL
## # A tibble: 2 x 3
## sex hflen wt
## <chr> <dbl> <dbl>
## 1 F 20.28767 31.06582
## 2 M 20.75833 32.06841
##
## $OT
## # A tibble: 2 x 3
## sex hflen wt
## <chr> <dbl> <dbl>
## 1 F 20.28165 24.83090
## 2 M 20.25740 23.69538
##
## $OX
## # A tibble: 2 x 3
## sex hflen wt
## <chr> <dbl> <dbl>
## 1 F 19.66667 21
## 2 M 18.80000 21
##
## $PB
## # A tibble: 2 x 3
## sex hflen wt
## <chr> <dbl> <dbl>
## 1 F 25.88267 30.21088
## 2 M 26.43199 33.81650
##
## $PE
## # A tibble: 2 x 3
## sex hflen wt
## <chr> <dbl> <dbl>
## 1 F 20.27206 22.82218
## 2 M 20.13213 20.58491
##
## $PF
## # A tibble: 2 x 3
## sex hflen wt
## <chr> <dbl> <dbl>
## 1 F 15.55062 7.974394
## 2 M 15.62078 7.885287
##
## $PH
## # A tibble: 2 x 3
## sex hflen wt
## <chr> <dbl> <dbl>
## 1 F 25.90000 30.85000
## 2 M 25.54545 31.45455
##
## $PI
## # A tibble: 1 x 3
## sex hflen wt
## <chr> <dbl> <dbl>
## 1 M 22.5 19.42857
##
## $PL
## # A tibble: 2 x 3
## sex hflen wt
## <chr> <dbl> <dbl>
## 1 F 19.75000 19.31250
## 2 M 20.26316 18.68421
##
## $PM
## # A tibble: 2 x 3
## sex hflen wt
## <chr> <dbl> <dbl>
## 1 F 20.27978 22.12567
## 2 M 20.53002 20.80120
##
## $PP
## # A tibble: 2 x 3
## sex hflen wt
## <chr> <dbl> <dbl>
## 1 F 21.67662 17.18067
## 2 M 21.83864 17.18449
##
## $PX
## # A tibble: 2 x 3
## sex hflen wt
## <chr> <dbl> <dbl>
## 1 F 19 20
## 2 M 20 18
##
## $RF
## # A tibble: 2 x 3
## sex hflen wt
## <chr> <dbl> <dbl>
## 1 F 17.52727 13.68421
## 2 M 17.50000 12.44444
##
## $RM
## # A tibble: 2 x 3
## sex hflen wt
## <chr> <dbl> <dbl>
## 1 F 16.39188 11.10220
## 2 M 16.49151 10.12323
##
## $RO
## # A tibble: 2 x 3
## sex hflen wt
## <chr> <dbl> <dbl>
## 1 F 14.75 11.0
## 2 M 16.00 9.5
##
## $RX
## # A tibble: 1 x 3
## sex hflen wt
## <chr> <dbl> <dbl>
## 1 M 18.5 15.5
##
## $SF
## # A tibble: 2 x 3
## sex hflen wt
## <chr> <dbl> <dbl>
## 1 F 27.50000 69.00000
## 2 M 26.34783 53.43478
##
## $SH
## # A tibble: 2 x 3
## sex hflen wt
## <chr> <dbl> <dbl>
## 1 F 29.02817 78.75641
## 2 M 27.98333 65.17742
##
## $SO
## # A tibble: 2 x 3
## sex hflen wt
## <chr> <dbl> <dbl>
## 1 F 25.63333 55.32258
## 2 M 25.72727 55.70000
##
## $SS
## # A tibble: 2 x 3
## sex hflen wt
## <chr> <dbl> <dbl>
## 1 F NaN 57
## 2 M NaN 130
Suppose you want to bring in taxonomic information from another file. While you can use left_join(surveys, species)
to create a larger table, what can we do keeping the two tables separate? This can be useful if both tables are large, or if they are configured in different ways. Here
species <- read_csv("../data/species.csv")
## Parsed with column specification:
## cols(
## species_id = col_character(),
## genus = col_character(),
## species = col_character(),
## taxa = col_character()
## )
Here we want summaries by genus
. Notice the use of map_df
from purrr
to get results as a data frame.
species_taxa <- split(species, species$genus)
map_df(species_taxa,
function(x, surveys) {
# Identify species in this taxa
taxa_sp <- x$species_id
surveys %>%
filter(!(is.na(hindfoot_length) & is.na(weight)),
!is.na(sex),
species_id %in% taxa_sp) %>%
group_by(sex) %>%
summarize(hflen = mean(hindfoot_length, na.rm = TRUE),
wt = mean(weight, na.rm = TRUE))
},
surveys,
.id = "genus")
## # A tibble: 20 x 4
## genus sex hflen wt
## <chr> <chr> <dbl> <dbl>
## 1 Baiomys F 13.16129 9.161290
## 2 Baiomys M 12.64286 7.357143
## 3 Chaetodipus F 23.82453 23.763824
## 4 Chaetodipus M 23.92986 24.712219
## 5 Dipodomys F 37.87090 55.244360
## 6 Dipodomys M 37.91624 56.243034
## 7 Neotoma F 32.02419 154.282209
## 8 Neotoma M 32.57743 165.652893
## 9 Onychomys F 20.28219 26.780959
## 10 Onychomys M 20.40312 26.246466
## 11 Perognathus F 15.82996 8.574803
## 12 Perognathus M 15.76056 8.204182
## 13 Peromyscus F 20.26602 22.491649
## 14 Peromyscus M 20.29880 20.644279
## 15 Reithrodontomys F 16.43876 11.220080
## 16 Reithrodontomys M 16.50682 10.159941
## 17 Sigmodon F 27.94872 71.696000
## 18 Sigmodon M 27.31915 61.336842
## 19 Spermophilus F NaN 57.000000
## 20 Spermophilus M NaN 130.000000