The error "ValueError: y contains previously unseen labels" occurs because the LabelEncoder's inverse_transform step was given values that do not correspond to any label it saw during fitting.
To fix this error, you can add a check, before calling inverse_transform, that the values being passed in are ones the LabelEncoder knows about. One way to do this is to use the classes_ attribute of the LabelEncoder object, which stores the unique labels seen during the fit (or fit_transform) step.
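To make the role of classes_ concrete, here is a minimal sketch on toy data (the colour labels are made up for illustration and are not from your dataset). It shows that each label's code is its position in classes_, and that inverse_transform raises the error when given a code outside that range:

```python
from sklearn.preprocessing import LabelEncoder

# Toy labels, assumed purely for illustration
le = LabelEncoder()
codes = le.fit_transform(["red", "green", "blue", "green"])

# classes_ holds the sorted unique labels; a label's code is its position here
print(list(le.classes_))
print(list(codes))

# The only values inverse_transform accepts are 0 .. len(classes_) - 1
valid_codes = set(range(len(le.classes_)))

try:
    le.inverse_transform([0, 3])  # 3 is not a valid code for three classes
    raised = False
    message = ""
except ValueError as exc:
    raised = True
    message = str(exc)

print(raised, message)
```

So the values to validate before calling inverse_transform are the integer codes, not the label strings stored in classes_.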
Here's an example of how you could modify the code:
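X_train_obj_imputed_encoded = pd.DataFrame(impute_test_rf.transform(X_train_obj), columns=X_train_obj.columns)
# inverse_transform expects integer codes, so the valid values are 0 .. len(le.classes_) - 1
valid_codes = set(range(len(le.classes_)))
# Inverse transform the encoded data, column by column
X_train_obj_imputed = X_train_obj_imputed_encoded.apply(lambda series: pd.Series(
    le.inverse_transform(series.dropna().astype(int)),
    index=series.dropna().index
).reindex(series.index) if set(series.dropna().unique()).issubset(valid_codes) else series)
In this modified code, the inverse_transform step is only applied to a series when every non-null value is a valid code, that is, an integer position in le.classes_. The check compares against these positions rather than against le.classes_ itself, because le.classes_ holds the original labels while inverse_transform takes the encoded integers. For any series that contains values outside this range (for example, a non-integer value produced by imputation), the original encoded values are kept.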
This should resolve the ValueError you encountered.
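If you want to try the idea in isolation first, here is a small self-contained sketch on made-up data. The helper name safe_inverse_transform and the toy columns are assumptions for illustration only; they are not part of your code or of scikit-learn. It decodes a column only when every non-null value is a valid integer code, and otherwise leaves the column as it is:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def safe_inverse_transform(series, encoder):
    """Hypothetical helper: decode a column only if all non-null values are valid codes."""
    valid_codes = set(range(len(encoder.classes_)))
    present = series.dropna()
    if not set(present.unique()).issubset(valid_codes):
        return series  # leave the encoded values untouched
    decoded = pd.Series(
        encoder.inverse_transform(present.astype(int)),
        index=present.index,
    )
    # Restore the original index so any missing rows stay missing
    return decoded.reindex(series.index)


# Toy encoder and data, assumed for illustration
le = LabelEncoder().fit(["blue", "green", "red"])

encoded = pd.DataFrame({
    "ok": [0.0, 2.0, 1.0],   # every value is a valid code
    "bad": [0.0, 1.5, 3.0],  # 1.5 and 3.0 do not map to any class
})

decoded = encoded.apply(lambda s: safe_inverse_transform(s, le))
print(decoded)
```

Here the "ok" column is decoded back to labels, while the "bad" column keeps its encoded values, so you can inspect which columns still need attention (for instance by rounding imputed values before decoding, if that suits your data).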