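Whenever you do any aggregation, it's a good idea to include a count (n()) or a count of non-missing values (sum(!is.na(x))), so you can check how much data actually supports your conclusions. For example, let's look at the planes (identified by their tail number) with the highest average delays:

    delays <- not_cancelled %>%
      group_by(tailnum) %>%
      summarise(
        delay = mean(arr_delay)
      )
    ...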
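Here is a minimal sketch of the same idea on made-up data. The `flights_toy` table and its values are hypothetical, not taken from nycflights13, and the sketch assumes dplyr is installed. Missing delays are dropped with `na.rm = TRUE`, and the two count columns show how many rows and how many non-missing values sit behind each mean:

```r
library(dplyr)

# Hypothetical toy data: arrival delays (minutes) per aircraft tail number
flights_toy <- tibble(
  tailnum   = c("N1", "N1", "N1", "N1", "N2", "N3", "N3"),
  arr_delay = c(10, 5, 20, NA, 300, -5, 15)
)

delays_toy <- flights_toy %>%
  group_by(tailnum) %>%
  summarise(
    delay = mean(arr_delay, na.rm = TRUE),  # average delay, ignoring missing values
    n     = n(),                            # rows in the group, including NA delays
    n_obs = sum(!is.na(arr_delay))          # non-missing values behind the mean
  ) %>%
  arrange(desc(delay))

print(delays_toy)
```

In this toy data the aircraft with the highest average delay, `N2`, is backed by a single flight, which is exactly the situation the counts are meant to expose.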
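    Batting %>%
      group_by(playerID) %>%
      summarise(total = sum(G)) %>%
      arrange(desc(total)) %>%
      head(5)

This lets you write the code in the same order you think through the data processing, one step at a time. It is easy to write and easy to read, and it follows the left-to-right order of natural language. Compare the same calls written as nested function calls, without the pipe:

    head(arrange(summarise(group_by(Batting, playerID), total ...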
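To check that the piped and nested forms really compute the same thing, here is a small sketch on a hypothetical stand-in for the `Batting` table. `batting_toy` and its numbers are made up, and the sketch assumes dplyr is installed:

```r
library(dplyr)

# Hypothetical stand-in for the Batting table: games played (G) per row
batting_toy <- tibble(
  playerID = c("a", "a", "b", "c", "c", "c"),
  G        = c(100, 50, 120, 10, 20, 30)
)

# Piped version: read left to right, one step per line
piped <- batting_toy %>%
  group_by(playerID) %>%
  summarise(total = sum(G)) %>%
  arrange(desc(total)) %>%
  head(2)

# Nested version: the same calls, read from the inside out
nested <- head(arrange(summarise(group_by(batting_toy, playerID), total = sum(G)), desc(total)), 2)

print(identical(piped, nested))
```

Both versions run the same dplyr functions in the same order; the pipe only changes how the code is written.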
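Use group_by and summarize to aggregate by book title, author and publisher, computing each book's number of reviews and mean rating.

    library(wordcloud)
    library(dplyr)  # provides %>%, group_by() and summarise()
    sales_by_book <- read.csv("bestsellers.csv")
    sales_by_book$Name <- tolower(sales_by_book$Name)
    sales_by_book <- sales_by_book %>% group_by(Name) %>% summarise(Sales = ...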
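A sketch of grouping by several keys at once, as the sentence above describes. The data is hypothetical and review-level (one row per review); `reviews`, `Author`, `Publisher` and `Rating` are assumed names, not columns confirmed to exist in `bestsellers.csv`. The `.groups` argument requires dplyr 1.0 or later:

```r
library(dplyr)

# Hypothetical review-level data: one row per review
reviews <- tibble(
  Name      = c("Book A", "Book A", "book a", "Book B"),
  Author    = c("X", "X", "X", "Y"),
  Publisher = c("P1", "P1", "P1", "P2"),
  Rating    = c(4, 5, 3, 4)
)

book_stats <- reviews %>%
  mutate(Name = tolower(Name)) %>%        # normalise titles, as the snippet does
  group_by(Name, Author, Publisher) %>%   # one group per title/author/publisher
  summarise(
    n_reviews   = n(),                    # number of reviews in the group
    mean_rating = mean(Rating),           # average rating in the group
    .groups = "drop"                      # return an ungrouped result
  )

print(book_stats)
```

Lower-casing first matters: without it, "Book A" and "book a" would land in separate groups.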
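group_by()

    grouped <- group_by(df, v1, v2)  # group df by v1 and v2
    newdata <- summarise(grouped, mean_age = mean(age), sum_sale = sum(sales))

The two snippets above have the same effect. data.table does in a single call what we just needed group_by and summarise together to achieve, and the code stays readable and easy to extend.

What was covered above...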
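A sketch comparing the two styles on a small hypothetical `df` (the values are made up; the column names follow the snippet), assuming both dplyr and data.table are installed. In data.table the summaries go in `j` and the grouping columns in `by`, all inside one `[` call:

```r
library(dplyr)
library(data.table)

# Hypothetical data using the column names from the snippet
df <- data.frame(
  v1    = c("a", "a", "b", "b"),
  v2    = c("x", "x", "x", "y"),
  age   = c(20, 30, 40, 50),
  sales = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# dplyr: group first, then summarise
grouped <- group_by(df, v1, v2)
newdata <- summarise(grouped, mean_age = mean(age), sum_sale = sum(sales), .groups = "drop")

# data.table: summaries in j, grouping columns in by, one call
dt <- as.data.table(df)
newdata_dt <- dt[, .(mean_age = mean(age), sum_sale = sum(sales)), by = .(v1, v2)]

print(newdata)
print(newdata_dt)
```

One difference to keep in mind: dplyr sorts the groups, while data.table's `by` keeps them in order of first appearance (`keyby` sorts). The toy data is chosen so both orders coincide.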
Compute descriptive statistics for each of two datasets, grouped by the same variable (using group_by and summarize), then merge the two sets of statistics.

    # Aggregate Millennium Falcon for the total quantity of each color
    millennium_falcon_colors <- millennium_falcon %>%
      group_by(color_id) %>%
      summarize(total_quantity = sum(quantity))
    # Aggregate Star Destroyer for the ...
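A self-contained sketch of both steps, using tiny made-up part lists in place of the real LEGO tables (the data and the `star_destroyer` table are hypothetical), assuming dplyr is installed. `full_join()` keeps colors that appear in only one set, and `suffix` tells the two `total_quantity` columns apart:

```r
library(dplyr)

# Hypothetical part lists: one row per part, with a color id and a quantity
millennium_falcon <- tibble(color_id = c(1, 1, 2, 3), quantity = c(4, 6, 2, 5))
star_destroyer    <- tibble(color_id = c(1, 2, 2, 4), quantity = c(3, 1, 7, 2))

# Step 1: summarise each set by the shared key
millennium_falcon_colors <- millennium_falcon %>%
  group_by(color_id) %>%
  summarize(total_quantity = sum(quantity))

star_destroyer_colors <- star_destroyer %>%
  group_by(color_id) %>%
  summarize(total_quantity = sum(quantity))

# Step 2: merge the two summaries; colors missing from one set get NA
colors_joined <- millennium_falcon_colors %>%
  full_join(star_destroyer_colors, by = "color_id",
            suffix = c("_falcon", "_star_destroyer"))

print(colors_joined)
```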
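    df22 <- df11 %>% mutate(Gender = as.factor(Gender))
    # number of levels and number of missing values of Gender
    summ22 <- summarise(df22, nlevels = nlevels(Gender), nmiss = sum(is.na(Gender)))

23. Summarizing data by a categorical variable

    # row count and mean of Minute (ignoring NAs) for each level of Class_2
    summ24 <- summarise(group_by(df, Class_2), n = n(), mean = mean(Minute, na.rm = TRUE))
    #...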
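`funs()` is deprecated in current dplyr. When several columns need the same summary, the `across()` helper (dplyr 1.0 or later) covers the same ground. A sketch on hypothetical data; the `Second` column and all values are made up:

```r
library(dplyr)

# Hypothetical data: a categorical column and two numeric columns with NAs
df <- tibble(
  Class_2 = c("A", "A", "B", "B", "B"),
  Minute  = c(10, NA, 5, 15, 25),
  Second  = c(30, 40, 50, NA, 10)
)

summ24 <- df %>%
  group_by(Class_2) %>%
  summarise(
    n = n(),  # rows per class, NAs included
    # the same NA-aware mean applied to each listed column, named <column>_mean
    across(c(Minute, Second), ~ mean(.x, na.rm = TRUE), .names = "{.col}_mean"),
    .groups = "drop"
  )

print(summ24)
```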
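    summarise(group_by(dt, type), total = sum(dist))

The pipe operator %>%

dplyr also provides an operator, %>% (re-exported from magrittr): you start with the name of the data and then apply several operations to it in turn. For example:

    Batting %>%
      group_by(playerID) %>%
      summarise(total = sum(G)) %>%
      arrange(desc(total)) %>%
      head(5)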
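Note: is it also possible to create a table with a 4-row summary? I think the Count = n() command could be used for this? E.g.:

    Group  Number of Rows  Perc
    a      20              0.6
    b      20              0.7
    c      50              0.9
    d      10              0.24

Or an overall summary (i.e., across the whole table, what percentage of rows have "diff" equal to 1?):

    d = sum(train_data$diff) / length(train_data$diff)

Thanks...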
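One way to answer both questions, sketched on hypothetical data: `train_data` here is made up, with a 0/1 `diff` column and an assumed `Group` column, and dplyr is assumed to be installed. `n()` gives the row count per group, and the mean of the logical vector `diff == 1` gives the share of rows where `diff` is 1:

```r
library(dplyr)

# Hypothetical data: a grouping column and a 0/1 indicator
train_data <- tibble(
  Group = c("a", "a", "b", "b", "b", "c"),
  diff  = c(1, 0, 1, 1, 0, 0)
)

# One row per group: row count and share of rows with diff == 1
by_group <- train_data %>%
  group_by(Group) %>%
  summarise(
    n_rows = n(),
    perc   = mean(diff == 1)  # mean of a logical vector = proportion of TRUE
  )

# Whole-table summary: the same summarise() call without group_by()
overall <- train_data %>%
  summarise(n_rows = n(), perc = mean(diff == 1))

print(by_group)
print(overall)
```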
    # create two columns with the sum and length of TC in each group which you can use later
    # for average calculation
    summarize(new = n_distinct(PC1), n = n(), TC_sum = sum(TC)) %>%
      group_by(PN, GOT) %>%
      summarise(TOT_new = sum(new), meanTC = sum(TC_sum) / sum(n))
    #Sou...
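The reason for carrying `n` and `TC_sum` forward is that `sum(TC_sum) / sum(n)` reproduces the row-level mean of `TC`, whereas averaging the per-group means would weight every sub-group equally. A sketch on hypothetical data, assuming dplyr is installed; the finer grouping column `sub` is an assumption, since the snippet's first `group_by()` is cut off:

```r
library(dplyr)

# Hypothetical data; `sub` stands in for the finer grouping used in the first step
df <- tibble(
  PN  = c("p1", "p1", "p1", "p1"),
  GOT = c("g", "g", "g", "g"),
  sub = c("s1", "s1", "s1", "s2"),
  PC1 = c("x", "y", "x", "z"),
  TC  = c(1, 2, 3, 10)
)

# Step 1: per fine group, keep the pieces needed to rebuild an average later
step1 <- df %>%
  group_by(PN, GOT, sub) %>%
  summarise(new = n_distinct(PC1), n = n(), TC_sum = sum(TC), .groups = "drop")

# Step 2: roll up to (PN, GOT); sum(TC_sum) / sum(n) is the mean over the original rows
step2 <- step1 %>%
  group_by(PN, GOT) %>%
  summarise(TOT_new = sum(new), meanTC = sum(TC_sum) / sum(n), .groups = "drop")

# For contrast: averaging the step-1 means gives each sub-group equal weight
mean_of_means <- mean(step1$TC_sum / step1$n)

print(step2)
print(mean_of_means)
```

Here `meanTC` is 4, matching `mean(df$TC)`, while the mean of the sub-group means is 6.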
Using mean + rowSums: