This is why we use a one-hot encoder to perform "binarization" of the category and include the result as features to train the model. Another example: suppose you have a 'flower' feature which can take the values 'daffodil', 'lily', and 'rose'. One-hot encoding converts the 'flower' feature into three binary features, one per category, where exactly one of them is 1 for each sample.
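The flower example above can be sketched with scikit-learn's `OneHotEncoder` (the sample data here is made up for illustration):

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical 'flower' column with three categories
flowers = np.array([["daffodil"], ["lily"], ["rose"], ["lily"]])

encoder = OneHotEncoder()
encoded = encoder.fit_transform(flowers).toarray()  # dense for readability

print(encoder.categories_[0])  # ['daffodil' 'lily' 'rose']
print(encoded)
# Each row is a binary indicator vector, one column per category:
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]
```

Each original value becomes its own binary column, so a linear model can learn a separate weight for every category instead of treating the labels as ordered numbers.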
Since SGDClassifier is a linear model, the encoding you choose for categorical data matters a great deal. Did you check the coefficients of your model? Did you use a OneHotEncoder to encode your categories? Essentially, how "unseen categories" are encoded will have an impact on the predictions...
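A minimal sketch of that point: with `handle_unknown="ignore"`, a category not seen at fit time encodes to an all-zeros row, so a linear model such as SGDClassifier gets no contribution from that feature and falls back on the intercept (the category names here are assumed for illustration):

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Fit on three known categories; 'tulip' is never seen during fit
enc = OneHotEncoder(handle_unknown="ignore")
enc.fit(np.array([["daffodil"], ["lily"], ["rose"]]))

row = enc.transform(np.array([["tulip"]])).toarray()
print(row)  # [[0. 0. 0.]] -- no column fires for the unseen category
```

Without `handle_unknown="ignore"`, the transform would instead raise an error on 'tulip', so the choice controls both whether prediction succeeds and what the model sees.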
```python
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

labelencoder_y_1 = LabelEncoder()
y = labelencoder_y_1.fit_transform(y)

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
...
```