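class sklearn.preprocessing.OneHotEncoder(*, categories='auto', drop=None, sparse=True, dtype=<class 'numpy.float64'>, handle_unknown='error') Encodes categorical features as a one-hot numeric array. The input to this transformer should be an array-like of integers or strings, denoting the values taken on by categorical (discrete) features. The features are encoded using a one-hot (aka "one-of-K"...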
'uses IE', 'uses Safari']
>>> enc = preprocessing.OneHotEncoder(categories=[genders, locations, browsers])
>>> # Note that there are missing categorical values for the 2nd and 3rd
>>> # feature
>>> X = [['male', '
import numpy as np
import pandas as pd
from category_encoders import OneHotEncoder
# category_encoders supports DataFrames directly

# Create a small training set
train_set = pd.DataFrame(np.array([['male', 10], ['female', 20], ['male', 10],
                                   ['female', 20], ['female', 15]]),
                         columns=['Sex', 'Type'])
train...
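# Initialize the OneHotEncoder object
ohe = OneHotEncoder()
4. Apply one-hot encoding
# One-hot encode the selected DataFrame columns (columns 1 to 3) and convert the result to a NumPy array
df_transformed = ohe.fit_transform(df.iloc[:, 1:4]).toarray()
# Get the categories of each encoded feature
df_transformed2 = ohe.categories_
# Get the one-hot encoded...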
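Another way to convert nominal features into an encoding that scikit-learn models can use is one-of-K, also known as one-hot or dummy encoding. This encoding is implemented in the OneHotEncoder class. The class transforms each categorical feature with n_categories possible values into a binary feature vector of length n_categories, in which exactly one entry is 1 and all the others are 0.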
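As a minimal sketch of the description above, the following encodes a single feature with three possible values. The column of color names is made up for illustration and does not come from the original text.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical single-column data with three distinct categories.
colors = np.array([["red"], ["green"], ["blue"], ["green"]])

enc = OneHotEncoder()
# The default output is sparse; toarray() gives a dense NumPy array.
encoded = enc.fit_transform(colors).toarray()

# Categories are stored in sorted order: blue, green, red.
print(enc.categories_)
# Each row has length n_categories (3) and contains exactly one 1.
print(encoded)
```

Because the categories are sorted, "red" maps to the last position of the vector and "blue" to the first.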
print(enc.categories_)
[array(['女', '男'], dtype=object), array([0, 1, 2], dtype=object), array([0, 1, 2, 3], dtype=object)]
An example:
from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder()
...
categories="auto" tells the encoder to determine the categories of each feature automatically from the data.
from sklearn.preprocessing import OneHotEncoder
data2 = data.copy()
data2_fit = OneHotEncoder(categories="auto").fit(data2.iloc[:, 1:3])
data2_result = data2_fit.transform(data2.iloc[:, 1:3]).toarray()
...
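To make the effect of categories="auto" concrete, here is a small sketch comparing it with an explicit category list. The sample rows and the extra "from Asia" category are illustrative assumptions, not data from the original.

```python
from sklearn.preprocessing import OneHotEncoder

# Hypothetical two-feature data: gender and location.
X = [["male", "from US"], ["female", "from Europe"]]

# "auto": categories are inferred from the values seen during fit.
auto_enc = OneHotEncoder(categories="auto").fit(X)

# Explicit list: one list of categories per feature, which may include
# values that never occur in the training data ("from Asia" here).
explicit_enc = OneHotEncoder(
    categories=[["female", "male"], ["from Asia", "from Europe", "from US"]]
).fit(X)

auto_width = auto_enc.transform(X).toarray().shape[1]          # 2 + 2 columns
explicit_width = explicit_enc.transform(X).toarray().shape[1]  # 2 + 3 columns
print(auto_width, explicit_width)
```

Listing the categories explicitly is one way to keep the output width stable when the training data does not contain every possible value.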
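First, the categories present in the raw data must be determined, which is what the categories parameter controls, so that OneHotEncoder knows the categories of each feature.
import numpy as np
from sklearn.preprocessing import OneHotEncoder
# Create the OneHotEncoder object
# (scikit-learn 1.2 renamed the sparse parameter to sparse_output)
encoder = OneHotEncoder(categories='auto', drop=None, sparse=True, dtype=np.float64)
# One-hot encode the features
encoder.fit(...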
from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder(handle_unknown='ignore')
X = [['Male', 1], ['Female', 3], ['Female', 2]]
enc.fit(X)
print(enc.categories_)
a = enc.transform([['Female', 1], ['Male', 4]])
...
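The snippet above is cut off before its result. The following self-contained sketch, which assumes the same small dataset, shows how handle_unknown='ignore' differs from the default 'error' when transform meets a value that was not seen during fit (the value 4 in the second column).

```python
from sklearn.preprocessing import OneHotEncoder

X = [["Male", 1], ["Female", 3], ["Female", 2]]

# With handle_unknown="ignore", an unseen category becomes an all-zero block.
lenient = OneHotEncoder(handle_unknown="ignore").fit(X)
out = lenient.transform([["Female", 1], ["Male", 4]]).toarray()
print(out)

# With the default handle_unknown="error", the same input raises ValueError.
strict = OneHotEncoder().fit(X)
try:
    strict.transform([["Male", 4]])
    raised = False
except ValueError:
    raised = True
print(raised)
```

In the second output row, the three columns for the numeric feature are all 0, because 4 is not among the fitted categories 1, 2 and 3.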
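data.head()
from sklearn.preprocessing import OneHotEncoder
# One-hot encoding
X = data.iloc[:, 1:-1]  # extract the sex and age features
enc = OneHotEncoder(categories='auto').fit(X)  # categories='auto' determines each feature's categories automatically
result = enc.transform(X).toarray()  # toarray converts to a dense array; without it the result is a sparse matrix of 0s and 1s...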
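The comment about toarray can be checked with a short sketch. The three sample rows are invented for illustration, and scipy.sparse.issparse is used only to inspect the type of the result.

```python
from scipy import sparse
from sklearn.preprocessing import OneHotEncoder

# Hypothetical sex and age features.
X = [["F", 20], ["M", 30], ["F", 30]]

enc = OneHotEncoder(categories="auto").fit(X)

# By default, transform returns a SciPy sparse matrix.
encoded = enc.transform(X)
is_sparse = sparse.issparse(encoded)

# toarray() converts it to a dense NumPy array of 0s and 1s.
dense = encoded.toarray()
print(is_sparse)
print(dense)
```

Keeping the sparse form is usually preferable for features with many categories, since most entries of a one-hot matrix are 0.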