
Extract a character column into multiple columns using regex groups in a column of nested data frames
Source: R/nest_extract.R
nest_extract.Rd
nest_extract() is used to extract capturing groups from a column in a nested
data frame using regular expressions into new columns. If the groups don't
match, or the input is NA, the output will be NA.
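The NA behaviour described above can be seen on a plain (non-nested) data frame by calling tidyr::extract() directly, which this page says nest_extract() wraps. This is a minimal sketch; the toy data frame `df` and its values are made up for illustration.

```r
# Hypothetical toy data: one matching string, one NA, one non-matching string
df <- data.frame(comb = c("a-b", NA, "nomatch"), stringsAsFactors = FALSE)

out <- tidyr::extract(
  df,
  comb,
  into  = c("var1", "var2"),
  regex = "([[:alnum:]]+)-([[:alnum:]]+)"
)

# Rows where the input is NA or the regex does not match get NA in both columns
out
```

Only the first row yields values for var1 and var2; the NA input and the string without a hyphen both produce NA in the new columns.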
Usage
nest_extract(
  .data,
  .nest_data,
  col,
  into,
  regex = "([[:alnum:]]+)",
  remove = TRUE,
  convert = FALSE,
  ...
)
Arguments
- .data
A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr).
- .nest_data
A list-column containing data frames.
- col
Column name or position within .nest_data (must be present within all nested data frames in .nest_data). This is passed to tidyselect::vars_pull(). This argument is passed by expression and supports quasiquotation (you can unquote column names or column positions).
- into
Names of new variables to create as character vector. Use NA to omit the variable in the output.
- regex
A string representing a regular expression used to extract the desired values. There should be one group (defined by ()) for each element of into.
- remove
If TRUE, remove input column from output data frame.
- convert
If TRUE, will run type.convert() with as.is = TRUE on new columns. This is useful if the component columns are integer, numeric or logical. NB: this will cause string "NA"s to be converted to NAs.
- ...
Additional arguments passed on to tidyr::extract() methods.
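As a small illustration of the into and convert arguments, the sketch below calls tidyr::extract() directly on a flat data frame, since these arguments are documented here as being passed through to it. The data frame `df` and its "id-12" style values are hypothetical.

```r
# Hypothetical toy data: a prefix and a number separated by a hyphen
df <- data.frame(x = c("id-12", "id-7"), stringsAsFactors = FALSE)

out <- tidyr::extract(
  df,
  x,
  into    = c(NA, "num"),                       # NA drops the first capture group
  regex   = "([[:alpha:]]+)-([[:digit:]]+)",    # one group per element of `into`
  convert = TRUE                                # run type.convert() on new columns
)

# `out` keeps only `num`, stored as an integer column rather than character
str(out)
```

With the default remove = TRUE the input column x is dropped, so the result holds just the converted num column.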
Value
An object of the same type as .data. Each object in the column .nest_data
will have new columns created according to the capture groups specified in
the regular expression.
Details
nest_extract() is a wrapper for tidyr::extract() and maintains the functionality
of extract() within each nested data frame. For more information on extract()
please refer to the documentation in 'tidyr'.
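Conceptually, applying extract() within each nested data frame resembles mapping tidyr::extract() over the list-column. The sketch below is an assumed equivalent written with dplyr and purrr, not the package's actual implementation; the tibble `df` and the column names `grp` and `data` are made up for illustration.

```r
# Hypothetical toy data with a grouping column and a string column to split
df <- tibble::tibble(
  grp  = c("g1", "g1", "g2"),
  comb = c("a-b", NA, "d-e")
)

# Nest everything except `grp` into a list-column called `data`
nested <- tidyr::nest(df, data = -grp)

# Apply tidyr::extract() to every nested data frame in the list-column
manual <- dplyr::mutate(
  nested,
  data = purrr::map(
    data,
    ~ tidyr::extract(
      .x,
      comb,
      into  = c("var1", "var2"),
      regex = "([[:alnum:]]+)-([[:alnum:]]+)"
    )
  )
)

# Each nested tibble now holds var1 and var2 in place of comb
manual$data[[1]]
```

The outer data frame keeps its shape; only the contents of the list-column change, which matches the Value section above.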
See also
Other tidyr verbs:
nest_drop_na(),
nest_fill(),
nest_replace_na(),
nest_separate(),
nest_unite()
Examples
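set.seed(123)
gm <- gapminder::gapminder
# add a character column of "x-y" strings (with some NAs) to split later
gm <-
  gm %>%
  dplyr::mutate(comb = sample(c(NA, "a-b", "a-d", "b-c", "d-e"),
                              size = nrow(gm),
                              replace = TRUE))
# nest everything except continent into the list-column country_data
gm_nest <- gm %>% tidyr::nest(country_data = -continent)
# split comb into var1 and var2 within each nested data frame
gm_nest %>%
  nest_extract(country_data,
               col = comb,
               into = c("var1","var2"),
               regex = "([[:alnum:]]+)-([[:alnum:]]+)")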
#> # A tibble: 5 × 2
#>   continent country_data
#>   <fct>     <list>
#> 1 Asia      <tibble [396 × 7]>
#> 2 Europe    <tibble [360 × 7]>
#> 3 Africa    <tibble [624 × 7]>
#> 4 Americas  <tibble [300 × 7]>
#> 5 Oceania   <tibble [24 × 7]>