step_pos_filter() creates a specification of a recipe step that will filter a tokenlist based on part-of-speech tags.

step_pos_filter(
  recipe,
  ...,
  role = NA,
  trained = FALSE,
  columns = NULL,
  keep_tags = "NOUN",
  skip = FALSE,
  id = rand_id("pos_filter")
)

# S3 method for step_pos_filter
tidy(x, ...)



recipe
A recipe object. The step will be added to the sequence of operations for this recipe.


...
One or more selector functions to choose variables. For step_pos_filter, this indicates the tokenlist variables to be filtered. See recipes::selections() for more details. For the tidy method, these are not currently used.


role
Not used by this step since no new variables are created.


trained
A logical to indicate if the quantities for preprocessing have been estimated.


columns
A list of tibble results that define the encoding. This is NULL until the step is trained by recipes::prep.recipe().


keep_tags
Character vector of part-of-speech tags to keep. See details for the complete list of tags. Defaults to "NOUN".


skip
A logical. Should the step be skipped when the recipe is baked by recipes::bake.recipe()? While all operations are baked when recipes::prep.recipe() is run, some operations may not be able to be conducted on new data (e.g. processing the outcome variable(s)). Care should be taken when using skip = TRUE as it may affect the computations for subsequent operations.


id
A character string that is unique to this step to identify it.


x
A step_pos_filter object.


An updated version of recipe with the new step added to the sequence of existing steps (if any).


Possible part-of-speech tags for the spacyr engine are: "ADJ", "ADP", "ADV", "AUX", "CONJ", "CCONJ", "DET", "INTJ", "NOUN", "NUM", "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X" and "SPACE". For more information, see the spaCy documentation on part-of-speech tagging.
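The filtering idea can be sketched in base R, without spacyr or textrecipes. This is only an illustration under stated assumptions: the tags below are assigned by hand rather than produced by spaCy, and filter_by_pos() is a hypothetical helper, not a function from textrecipes. It keeps the tokens whose tag appears in keep_tags, document by document.

```r
# Hand-tagged tokens (assumption: these tags are illustrative and were
# not produced by spacyr).
tokens <- list(
  c("This", "is", "a", "short", "tale", ","),
  c("With", "many", "cats", "and", "ladies", ".")
)
pos <- list(
  c("PRON", "AUX", "DET", "ADJ", "NOUN", "PUNCT"),
  c("ADP", "ADJ", "NOUN", "CCONJ", "NOUN", "PUNCT")
)

# Hypothetical helper (not part of textrecipes): for each document,
# keep only the tokens whose tag is listed in keep_tags.
filter_by_pos <- function(tokens, pos, keep_tags = "NOUN") {
  Map(function(tok, tag) tok[tag %in% keep_tags], tokens, pos)
}

nouns <- filter_by_pos(tokens, pos)
nouns_and_adjectives <- filter_by_pos(tokens, pos, keep_tags = c("NOUN", "ADJ"))
```

Passing several tags, as in the second call, keeps every token that matches any of them; the order of tokens within each document is preserved.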

See also

step_tokenize() to turn character variables into tokenlists.

Other tokenlist to tokenlist steps: step_lemma(), step_ngram(), step_stem(), step_stopwords(), step_tokenfilter(), step_tokenmerge()


# \dontrun{
library(recipes)

short_data <- data.frame(text = c("This is a short tale,",
                                  "With many cats and ladies."))

okc_rec <- recipe(~ text, data = short_data) %>%
  step_tokenize(text, engine = "spacyr") %>%
  step_pos_filter(text, keep_tags = "NOUN") %>%
  step_tf(text)

okc_obj <- prep(okc_rec)

juice(okc_obj)
#> # A tibble: 2 x 3
#>   tf_text_cats tf_text_ladies tf_text_tale
#>          <dbl>          <dbl>        <dbl>
#> 1            0              0            1
#> 2            1              1            0
# }