Extract a character column into multiple columns using regex groups in a column of nested data frames

nest_extract() uses a regular expression to turn each capturing group matched in a character column into its own new column within every nested data frame. If the groups don't match, or the input is NA, the output will be NA.

Usage

nest_extract(
  .data,
  .nest_data,
  col,
  into,
  regex = "([[:alnum:]]+)",
  remove = TRUE,
  convert = FALSE,
  ...
)

Arguments

.data

A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr).

.nest_data

A list-column containing data frames.

col

Column name or position within .nest_data (must be present within all nested data frames in .nest_data). This is passed to tidyselect::vars_pull().

This argument is passed by expression and supports quasiquotation (you can unquote column names or column positions).

into

Names of new variables to create, as a character vector. Use NA to omit the variable in the output.
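As a small illustration of omitting a group with NA, the sketch below calls tidyr::extract() directly on a toy data frame (df and its column x are made up for this example). Since nest_extract() is described as a wrapper around extract(), the same into value should behave the same way inside each nested data frame.

```r
# Toy input; the column name `x` is purely illustrative
df <- data.frame(x = c("a-b", "c-d"), stringsAsFactors = FALSE)

# The regex has two groups, but the second name is NA,
# so only the first group becomes a column
res <- tidyr::extract(
  df, x,
  into  = c("first", NA),
  regex = "([[:alnum:]]+)-([[:alnum:]]+)"
)

names(res)
#> [1] "first"
```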

regex

A string representing a regular expression used to extract the desired values. There should be one group (defined by ()) for each element of into.
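The following sketch shows the one-group-per-name rule and the NA behaviour for non-matching input. It uses tidyr::extract() on a made-up data frame rather than nest_extract() itself, on the assumption that the wrapper passes regex through unchanged.

```r
# Toy input: two matching strings, one non-matching string, one NA
df <- data.frame(comb = c("a-b", "b-c", "ab", NA), stringsAsFactors = FALSE)

# Two names in `into`, so the regex defines two groups
res <- tidyr::extract(
  df, comb,
  into  = c("var1", "var2"),
  regex = "([[:alnum:]]+)-([[:alnum:]]+)"
)

# "ab" has no hyphen, so neither group matches and both columns are NA;
# the NA input likewise yields NA in both columns
res
```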

remove

If TRUE, remove input column from output data frame.

convert

If TRUE, will run type.convert() with as.is = TRUE on new columns. This is useful if the component columns are integer, numeric or logical.

NB: this will cause string "NA"s to be converted to NAs.
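To make the effect of convert concrete, here is a minimal sketch using tidyr::extract() on a toy data frame (the names df, key and n are illustrative only). It compares the column types with and without conversion and shows the "NA" string caveat.

```r
# Toy input; the third row deliberately starts with the string "NA"
df <- data.frame(x = c("id-1", "id-2", "NA-3"), stringsAsFactors = FALSE)
rx <- "([[:alnum:]]+)-([[:alnum:]]+)"

as_chr   <- tidyr::extract(df, x, into = c("key", "n"), regex = rx)
as_typed <- tidyr::extract(df, x, into = c("key", "n"), regex = rx,
                           convert = TRUE)

class(as_chr$n)     # "character": groups are always extracted as strings
class(as_typed$n)   # "integer": type.convert() recognised whole numbers

as_chr$key[3]       # the literal string "NA"
as_typed$key[3]     # a real missing value after conversion
```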

...

Additional arguments passed on to tidyr::extract() methods.

Value

An object of the same type as .data. Each object in the column .nest_data will have new columns created according to the capture groups specified in the regular expression.

Details

nest_extract() is a wrapper for tidyr::extract() and maintains the functionality of extract() within each nested data frame. For more information on extract() please refer to the documentation in 'tidyr'.
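Conceptually, the wrapper behaviour described above amounts to applying extract() to every element of the list-column. The sketch below does this by hand with lapply() on a made-up nested tibble (nested, grp and items are hypothetical names). It is meant only to illustrate the idea and is not necessarily how nest_extract() is implemented.

```r
# A small nested tibble: one list-column of data frames
nested <- tibble::tibble(
  grp   = c("g1", "g2"),
  items = list(
    tibble::tibble(comb = c("a-b", NA)),
    tibble::tibble(comb = c("d-e", "b-c", "a-d"))
  )
)

# Apply tidyr::extract() to each nested data frame in turn
nested$items <- lapply(nested$items, function(d) {
  tidyr::extract(
    d, comb,
    into  = c("var1", "var2"),
    regex = "([[:alnum:]]+)-([[:alnum:]]+)"
  )
})

# Each nested data frame now holds var1 and var2 instead of comb
nested$items[[2]]
```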

Examples

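library(nplyr)
library(magrittr)  # provides the %>% pipe used below

set.seed(123)
gm <- gapminder::gapminder

# Add a character column of hyphen-separated pairs, with some NAs
gm <-
  gm %>%
  dplyr::mutate(comb = sample(c(NA, "a-b", "a-d", "b-c", "d-e"),
                              size = nrow(gm),
                              replace = TRUE))

# Nest everything except continent into a list-column of data frames
gm_nest <- gm %>% tidyr::nest(country_data = -continent)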

gm_nest %>% 
  nest_extract(country_data,
               col = comb,
               into = c("var1","var2"),
               regex = "([[:alnum:]]+)-([[:alnum:]]+)")
#> # A tibble: 5 × 2
#>   continent country_data      
#>   <fct>     <list>            
#> 1 Asia      <tibble [396 × 7]>
#> 2 Europe    <tibble [360 × 7]>
#> 3 Africa    <tibble [624 × 7]>
#> 4 Americas  <tibble [300 × 7]>
#> 5 Oceania   <tibble [24 × 7]>
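
The printed result only shows the list-column, so the new columns are not visible there. One way to inspect them is to unnest the result. The sketch below does this on a small made-up tibble (nested, grp, items and id are hypothetical names) so that it runs on its own, assuming nplyr and tidyr are installed.

```r
library(nplyr)

# A small nested tibble standing in for gm_nest
nested <- tibble::tibble(
  grp   = c("g1", "g2"),
  items = list(
    tibble::tibble(id = 1:2, comb = c("a-b", NA)),
    tibble::tibble(id = 3:4, comb = c("d-e", "b-c"))
  )
)

res <- nest_extract(
  nested, items,
  col   = comb,
  into  = c("var1", "var2"),
  regex = "([[:alnum:]]+)-([[:alnum:]]+)"
)

# Flatten the list-column to see var1 and var2 alongside the outer columns
flat <- tidyr::unnest(res, items)
flat
```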