step_pos_filter creates a specification of a recipe step that will filter a tokenlist based on part of speech tags.

step_pos_filter(
  recipe,
  ...,
  role = NA,
  trained = FALSE,
  columns = NULL,
  keep_tags = "NOUN",
  skip = FALSE,
  id = rand_id("pos_filter")
)

# S3 method for step_pos_filter
tidy(x, ...)

## Arguments

recipe
A recipe object. The step will be added to the sequence of operations for this recipe.

...
One or more selector functions to choose variables. For step_pos_filter, this indicates the variables to be encoded into a tokenlist. See recipes::selections() for more details. For the tidy method, these are not currently used.

role
Not used by this step since no new variables are created.

trained
A logical to indicate if the recipe has been baked.

columns
A list of tibble results that define the encoding. This is NULL until the step is trained by recipes::prep.recipe().

keep_tags
Character variable of part of speech tags to keep. See Details for the complete list of tags. Defaults to "NOUN".

skip
A logical. Should the step be skipped when the recipe is baked by recipes::bake.recipe()? While all operations are baked when recipes::prep.recipe() is run, some operations may not be able to be conducted on new data (e.g. processing the outcome variable(s)). Care should be taken when using skip = TRUE as it may affect the computations for subsequent operations.

id
A character string that is unique to this step to identify it.

x
A step_pos_filter object.

## Value

An updated version of recipe with the new step added to the sequence of existing steps (if any).

## Details

Possible part of speech tags for the spacyr engine are: "ADJ", "ADP", "ADV", "AUX", "CONJ", "CCONJ", "DET", "INTJ", "NOUN", "NUM", "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X" and "SPACE". For more information, see https://spacy.io/api/annotation#pos-tagging.
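To make the effect of keep_tags concrete, the following base R sketch mimics the filtering on a single document. It is only an illustration, not the package's implementation: the tags are assigned by hand rather than produced by spacyr, and filter_by_pos is a hypothetical helper that does not exist in any package.

```r
# Hypothetical tokens with hand-assigned part of speech tags for one document.
# In a real recipe the tags would come from the tokenizer engine.
tokens <- c("This", "is", "a", "short", "tale", ",")
tags   <- c("PRON", "AUX", "DET", "ADJ", "NOUN", "PUNCT")

# Hypothetical helper: keep only the tokens whose tag appears in keep_tags.
filter_by_pos <- function(tokens, tags, keep_tags = "NOUN") {
  stopifnot(length(tokens) == length(tags))
  tokens[tags %in% keep_tags]
}

# Default behaviour keeps nouns only.
nouns <- filter_by_pos(tokens, tags)

# Several tags can be kept at once.
nouns_and_adjectives <- filter_by_pos(tokens, tags, keep_tags = c("NOUN", "ADJ"))

print(nouns)
print(nouns_and_adjectives)
```

Passing a longer vector to keep_tags widens the filter; tokens with any other tag are dropped, which is why punctuation and function words disappear from the result.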

See step_tokenize() to turn a character variable into a tokenlist.

## Examples

# \dontrun{
library(recipes)

short_data <- data.frame(text = c("This is a short tale,",
#> 2            1              1            0
# }