# Check for duplicate rows
df.duplicated().sum()
0
# Check for missing values
df.isnull().sum()
Rank 0
Name 0
Platform 0
Year 271
Genre 0
Publisher 58
NA_Sales 0
EU_Sales 0
JP_Sales 0
Other_Sales 0
Global_Sales 0
dtype: int64
Check data types
# Check data types
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16598 entries, 0 to 16597
Data...
data/sales/items.csv: names of the items sold; data/sales/shops.csv: names of the shops; data/sales/test.csv: the test set, containing item IDs and shop IDs. Now read these five datasets in turn.
import pandas as pd
import warnings
warnings.filterwarnings("ignore")
df_train = pd.read_csv('./sales/sales_train_v2.csv')
df_categories =...
subtitle = f"{coffee_name}{int(coffee_data['sum'].sum())}"  # subtitle text for each subplot
axes[x, y].set_title(subtitle)  # set the subplot title
axes[x, y].set_xlabel('month', size=16)  # x-axis label
axes[x, y].set_ylabel('sum', size=16)  # y-axis label
figure3.suptitle('coffee month sales',size...
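train.csv contains 9 columns in total: Store is the ID of the store; DayOfWeek is the day of the week; Date is the date on which the Sales figure was recorded; Sales is the historical sales amount; Customers is the number of customers who visited the store; Open indicates whether the store was open; Promo indicates whether the store ran a promotion that day; StateHoliday and SchoolHoliday indicate whether the day was a state holiday or a school...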
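The nine columns listed above can be read into typed records along the following lines. This is only a minimal sketch using the standard library: the two rows are made up to show the layout, and the header spellings (Store, Date), the ISO date format, the 0/1 flags, and keeping StateHoliday as text are all assumptions rather than facts taken from the file.

```python
import csv
import datetime as dt
import io
from dataclasses import dataclass

# Two made-up rows that only illustrate the nine-column layout (not real data).
SAMPLE_CSV = """Store,DayOfWeek,Date,Sales,Customers,Open,Promo,StateHoliday,SchoolHoliday
1,5,2015-07-31,5000,500,1,1,0,1
2,7,2015-08-02,0,0,0,0,0,0
"""


@dataclass
class StoreDay:
    store: int            # store ID
    day_of_week: int      # day of the week as stored in the file
    day: dt.date          # date the sales figure belongs to
    sales: int            # sales amount for that day
    customers: int        # number of customers that day
    is_open: bool         # whether the store was open
    promo: bool           # whether a promotion ran that day
    state_holiday: str    # kept as text (assumption: codes may be non-numeric)
    school_holiday: bool  # whether a school holiday applied


def parse_row(row):
    """Convert one csv.DictReader row (all strings) into a typed record."""
    return StoreDay(
        store=int(row["Store"]),
        day_of_week=int(row["DayOfWeek"]),
        day=dt.date.fromisoformat(row["Date"]),  # assumes YYYY-MM-DD
        sales=int(row["Sales"]),
        customers=int(row["Customers"]),
        is_open=row["Open"] == "1",
        promo=row["Promo"] == "1",
        state_holiday=row["StateHoliday"],
        school_holiday=row["SchoolHoliday"] == "1",
    )


def load_rows(text):
    """Parse CSV text into a list of StoreDay records."""
    return [parse_row(r) for r in csv.DictReader(io.StringIO(text))]


rows = load_rows(SAMPLE_CSV)
open_days = [r for r in rows if r.is_open]
print(len(rows), len(open_days))
```

Filtering on is_open before modelling is a common first step, since rows for closed stores carry no sales information.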
[-5:,:].sum().sort_values(ascending=False)
FGE_near5 = pd.DataFrame(data=FGE_near5, columns=['Genre_sales'])
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 6))
sns.barplot(x=FGE.index, y='Genre_sales', data=FGE, ax=ax1)
sns.barplot(x=FGE_near5.index,y='Genre_sales',data=FGE_near5,ax...
data=df_groupby_train_m, color='0.75')
axes[2].set_title("Sales (groupby by month)", fontsize=20)
# Linear-regression fit
axes[2] = sns.regplot(x='time', y='mean', data=df_groupby_train_m,
                      scatter_kws=dict(color='0.75'),
                      line_kws={"color": "red"}, ...
"filename":"competitive-data-science-predict-future-sales.zip", }, "Montreal Bixi Bike Data": { "source":"kaggle", "name":"supercooler8/bixi-bike-montreal", "path":"bixi_bike_data", "filename":"bixi-bike-montreal.zip", },
# Write the data to the database
sales = pd.read_csv("/Users/lizhongyao/Desktop/mysite/data/sales_train.csv")
# Convert the date format
sales.date = sales.date.apply(lambda x: datetime.datetime.strptime(x, '%d.%m.%Y'))
sales.to_sql('sales', engine, index=False)
item_cat = pd.read_csv("/Users/lizhongyao/Desktop/mysite/data/item...
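The date conversion above relies on the day.month.year format string '%d.%m.%Y'. The following is a small standalone sketch of that one step, with made-up sample strings and a hypothetical helper name (parse_sales_date). It shows why an explicit format matters: a value such as 02.01.2013 is 2 January, not 1 February.

```python
from datetime import datetime


def parse_sales_date(text):
    """Parse a day.month.year string such as '02.01.2013' (hypothetical helper)."""
    return datetime.strptime(text, "%d.%m.%Y")


# Made-up sample values in the same day.month.year layout.
raw_dates = ["02.01.2013", "15.10.2015"]
parsed = [parse_sales_date(d) for d in raw_dates]

# ISO strings sort chronologically and are unambiguous once stored.
iso = [d.date().isoformat() for d in parsed]
print(iso)  # ['2013-01-02', '2015-10-15']
```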
# Merge data
candidates_bestsellers = pd.merge(unique_transactions, bestsellers_previous_week, on="week")
display(candidates_bestsellers)
test_set_transactions = unique_transactions.drop_duplicates("customer_id").reset_index(drop=True)
test_set_transactions["week"] = test_week
display(test_set_...
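• sales: department
• salary: salary level
There are many departments. table(HR$salary) shows how many records fall into each salary level (high, low, medium):
  high    low medium
  1237   7316   6446
Recode low as 1, medium as 2, and high as 3:
> HR$salary[HR$salary=="low"]<-"1"
> HR$salary[HR$salary=="medium"]<-"2"
...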
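For comparison with the Python excerpts elsewhere in this collection, the same recoding can be sketched in Python. This is an illustration rather than the author's method: it maps the levels to integers instead of the strings "1", "2", "3" used in the R code, and it rebuilds the column from the counts reported by table(HR$salary) instead of loading the HR data.

```python
from collections import Counter

# Ordinal codes stated in the text: low -> 1, medium -> 2, high -> 3.
SALARY_CODES = {"low": 1, "medium": 2, "high": 3}


def recode_salary(levels):
    """Replace each salary label with its ordinal code."""
    return [SALARY_CODES[level] for level in levels]


# Stand-in column rebuilt from the reported counts (not the real data).
salary = ["high"] * 1237 + ["low"] * 7316 + ["medium"] * 6446
codes = recode_salary(salary)

counts = Counter(codes)
print(sorted(counts.items()))  # [(1, 7316), (2, 6446), (3, 1237)]
```

Integer codes keep the ordering low < medium < high usable in comparisons and models, which string codes would not.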