Data wrangling

Convert categorical features to integer

Convert features with non-numeric values (categoricals) to integers. Each category is mapped to an integer in the range 0 to n_categories-1. There are two options for handling unknown values (categories not seen when the encoder was fitted) during transformation:

  • raise an error when a new, unknown value is present during transformation,
  • assign a constant value to unknown values, for example -1.
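Below is a minimal sketch of both options using scikit-learn's OrdinalEncoder. It assumes a small made-up DataFrame with a hypothetical color column. The handle_unknown parameter selects the behavior. With "error" (the default), transform raises a ValueError. With "use_encoded_value" plus unknown_value, the given constant is written instead. Passing dtype=np.int64 is a choice made here so the output holds integers, because the encoder returns floats by default.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical training data with one categorical column
train = pd.DataFrame({"color": ["red", "green", "blue", "green"]})
# New data with a category ("yellow") that was not seen during fitting
new = pd.DataFrame({"color": ["blue", "yellow"]})

# Option 1: raise an error for unknown values (scikit-learn's default)
strict = OrdinalEncoder(handle_unknown="error", dtype=np.int64).fit(train)
try:
    strict.transform(new)
except ValueError as err:
    print("Unknown value found:", err)

# Option 2: encode unknown values as the constant -1
lenient = OrdinalEncoder(
    handle_unknown="use_encoded_value", unknown_value=-1, dtype=np.int64
).fit(train)
print(lenient.categories_)     # categories are sorted alphabetically
print(lenient.transform(new))  # "blue" -> 0, unknown "yellow" -> -1
```

The categories are sorted before codes are assigned, so here blue becomes 0, green 1 and red 2. The constant for unknown values should lie outside that range, which is why -1 is a common choice.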

The object used for preprocessing is called an encoder. This recipe uses OrdinalEncoder from scikit-learn.

You can reuse the fitted encoder object on a new dataset. See the Save to pickle recipe to save the object for later use.
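The sketch below shows one way to reuse a fitted encoder. It saves the encoder with Python's standard pickle module, loads it back, and transforms a new dataset. The column names, data, and file name are hypothetical, and the Save to pickle recipe may generate different code.

```python
import os
import pickle
import tempfile

import numpy as np
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical data: fit on one dataset, reuse the encoder on another
train = pd.DataFrame({"city": ["Paris", "Berlin", "Paris"], "size": ["S", "M", "L"]})
new = pd.DataFrame({"city": ["Berlin", "Rome"], "size": ["L", "S"]})

encoder = OrdinalEncoder(
    handle_unknown="use_encoded_value", unknown_value=-1, dtype=np.int64
)
encoder.fit(train)

# Save the fitted encoder (the file name is an arbitrary choice)
path = os.path.join(tempfile.gettempdir(), "ordinal_encoder.pkl")
with open(path, "wb") as f:
    pickle.dump(encoder, f)

# Later, for example in another session: load it and encode new data
with open(path, "rb") as f:
    loaded = pickle.load(f)

encoded = pd.DataFrame(loaded.transform(new), columns=new.columns, index=new.index)
print(encoded)
```

Only load pickle files from sources you trust, because unpickling can execute arbitrary code. It is also safest to load the encoder with the same scikit-learn version that saved it.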


Required packages

You need the packages below to use the code generated by the recipe. All packages are installed automatically in MLJAR Studio.

pandas>=1.0.0

scikit-learn>=1.0.0

Interactive recipe

You can use the interactive recipe below to generate code. This recipe is available in MLJAR Studio.

The recipe below assumes that the following variables are available in your notebook:

  • df_1 (type DataFrame)
  • df_2 (type DataFrame)

Python code

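The following is a rough sketch of code along the lines of what the recipe does, not the code MLJAR Studio generates. It assumes that df_1 is used to fit the encoder and df_2 is the new dataset to transform. That split, the small stand-in DataFrames, and the choice of -1 for unknown values are all assumptions. Non-numeric columns are selected with pandas' select_dtypes.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Stand-ins for the notebook variables df_1 and df_2 (hypothetical data)
df_1 = pd.DataFrame({"sex": ["male", "female", "female"], "age": [22, 38, 26]})
df_2 = pd.DataFrame({"sex": ["female", "unknown"], "age": [35, 54]})

# Non-numeric (categorical) columns of the DataFrame used for fitting
categorical_cols = df_1.select_dtypes(exclude="number").columns.tolist()

# Unknown values in new data are encoded as -1 instead of raising an error
encoder = OrdinalEncoder(
    handle_unknown="use_encoded_value", unknown_value=-1, dtype=np.int64
)
encoder.fit(df_1[categorical_cols])


def encode(df: pd.DataFrame, enc: OrdinalEncoder, cols: list) -> pd.DataFrame:
    """Return a copy of df with the given columns replaced by integer codes."""
    out = df.copy()
    out[cols] = enc.transform(df[cols])
    return out


df_1_encoded = encode(df_1, encoder, categorical_cols)
df_2_encoded = encode(df_2, encoder, categorical_cols)
print(df_1_encoded)
print(df_2_encoded)
```

Fitting a single encoder on df_1 and applying it to both DataFrames keeps the integer codes consistent. Fitting separate encoders could assign different codes to the same category.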