step_untokenize creates a specification of a recipe step that will convert a tokenlist into a character predictor.

step_untokenize(
  recipe,
  ...,
  role = NA,
  trained = FALSE,
  columns = NULL,
  sep = " ",
  skip = FALSE,
  id = rand_id("untokenize")
)

# S3 method for step_untokenize
tidy(x, ...)

Arguments

recipe

A recipe object. The step will be added to the sequence of operations for this recipe.

...

One or more selector functions to choose variables. For step_untokenize, this indicates the tokenlist variables to be converted back into character. See recipes::selections() for more details. For the tidy method, these are not currently used.

role

Not used by this step since no new variables are created.

trained

A logical to indicate if the step has been trained by recipes::prep.recipe().

columns

A list of tibble results that define the encoding. This is NULL until the step is trained by recipes::prep.recipe().

sep

A character to determine how the tokens should be separated when pasted together. Defaults to " ".

skip

A logical. Should the step be skipped when the recipe is baked by recipes::bake.recipe()? While all operations are baked when recipes::prep.recipe() is run, some operations may not be able to be conducted on new data (e.g. processing the outcome variable(s)). Care should be taken when using skip = TRUE as it may affect the computations for subsequent operations.

id

A character string that is unique to this step to identify it.

x

A step_untokenize object.

Value

An updated version of recipe with the new step added to the sequence of existing steps (if any).

Details

This step will turn a tokenlist back into a character vector. Internally, it calls paste to join the tokens back together into a single string, separated by sep.
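As a rough illustration of the joining described above, the following base R sketch pastes a hand-written vector of tokens back into one string. It assumes the step behaves like paste() with collapse set to sep; the tokens vector and the variable names are hypothetical and are not part of the package.

```r
# Hypothetical tokens standing in for one tokenized document
tokens <- c("twice", "upon", "a", "time")

# Join the tokens with a single space, mirroring the default sep = " "
joined_default <- paste(tokens, collapse = " ")

# Join the same tokens with another separator, analogous to sep = "_"
joined_underscore <- paste(tokens, collapse = "_")

print(joined_default)
print(joined_underscore)
```

A separator other than " " is mostly useful when the tokens themselves can contain spaces, for example after creating n-grams.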

See also

step_tokenize() to turn character into tokenlist.

Examples

library(recipes)
library(textrecipes)
library(modeldata)
data(okc_text)

okc_rec <- recipe(~ ., data = okc_text) %>%
  step_tokenize(essay0) %>%
  step_untokenize(essay0)

okc_obj <- okc_rec %>%
  prep()

juice(okc_obj, essay0) %>%
  slice(1:2)
#> # A tibble: 2 x 1
#>   essay0
#>   <fct>
#> 1 twice upon a time there was a boy who died twice and lived happily ever after…
#> 2 i'm chill and steady br i'm a teacher amp musician br i like playing outside …

juice(okc_obj) %>% slice(2) %>% pull(essay0)
#> [1] i'm chill and steady br i'm a teacher amp musician br i like playing outside dislike school nights br and i'm very very lucky
#> 750 Levels: _updates_ br br br i've written this as pretty much a big stream of info it's not in any way creative prose it's also not the ideal delivery mechanism but hopefully serves to give you some insight on who i am what i'm about where i'm going what i'm looking for and what's important to me br br humor i would say my humor is pretty unique i take things i have learned about a person and dream up scenarios where i play back my knowledge of her him in slightly exaggerated ways bringing in the context of the current moment and weaving in any and all additional persons around me i can use all that data i've been collecting about you your world and the greater universe and transform it into some vociferous laughter or so i say br br destroying banality with every breath br br br br gt now the schpeel br br i grew up rural only child dad immigrated from germany in his mid twenties mom has euro immigrant parents big but odd farm till adolescence i stayed in my home town for school then headed from east dc metro to west sf with a few years in palo alto menlo park san bruno for good in my late twenties general plan well established but new old challenges continue to press for a lifelong solution i've been here for about 12 years now minus travel time year in portland and two winters in tahoe including this one br br i've come to see art in everything and music and boundless energy sf bay is truly a bubble but a sustainable bubble of intellect soul and know how that shapes its world each and every day i am honored to be a part of that br br i am adventurous inventive and fiery ...

tidy(okc_rec, number = 2)
#> # A tibble: 1 x 3
#>   terms  value id
#>   <chr>  <chr> <chr>
#> 1 essay0 <NA>  untokenize_eE9C3

tidy(okc_obj, number = 2)
#> # A tibble: 1 x 3
#>   terms  value id
#>   <quos> <chr> <chr>
#> 1 essay0 " "   untokenize_eE9C3