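"Data Mining" study notes: the FP-growth algorithm. Improving on Apriori: Apriori has clear drawbacks when mining association rules from transactions. When the data volume is very large and the minimum support threshold is low, Apriori must scan the transaction database repeatedly, and in an implementation the candidate-combination step in particular nests many loops, so mining becomes inefficient. One of the classic algorithms that improves on this is FP-growth. FP...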
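To make the cost described above concrete, here is a minimal sketch of level-wise, Apriori-style mining. The function name `apriori` and the toy data are illustrative assumptions, not taken from the excerpt, and the code is a simplified teaching version rather than an optimized implementation. Note the full database pass at every level and the nested loops over transactions and candidates.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Hypothetical helper: return {itemset: support count} for frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every distinct item as a 1-itemset.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # One full scan of the database per level; this is the repeated pass
        # that becomes expensive on large data with a low threshold.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates,
        # keeping only those whose k-item subsets are all frequent.
        k += 1
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in level for s in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent


# Toy data (an assumption for illustration only).
toy = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
result = apriori(toy, min_support=3)
```

With a lower `min_support`, more itemsets survive each level, so both the number of candidates and the number of database passes grow.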
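The FP-growth algorithm is described in the paper Han et al., "Mining frequent patterns without candidate generation", where "FP" stands for frequent pattern. Given a dataset of transactions, the first step of FP-growth is to calculate item frequencies and identify the frequent items. Unlike Apriori-like algorithms, the second step of FP-growth uses a prefix-tree (FP-tree) structure to encode the transactions without generating candidate sets explicitly, which...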
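The two steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the names `FPNode`, `build_fp_tree` and `header` are hypothetical, ties in frequency are broken alphabetically for determinism, and only tree construction is shown (the recursive mining phase is omitted).

```python
from collections import Counter


class FPNode:
    """Hypothetical node type: one item on a shared prefix path."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    # Step 1: count item frequencies and keep only the frequent items.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: n for i, n in counts.items() if n >= min_support}

    root = FPNode(None)
    header = {i: [] for i in frequent}  # item -> all tree nodes holding it

    # Step 2: insert each transaction as a path, most frequent items first,
    # so transactions with common frequent items share a prefix.
    for t in transactions:
        items = sorted(
            (i for i in set(t) if i in frequent),
            key=lambda i: (-frequent[i], i),
        )
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header, frequent


# Toy data (an assumption for illustration only); "d" is infrequent.
toy = [["a", "b", "c", "d"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
root, header, frequent = build_fp_tree(toy, min_support=3)
```

No candidate itemsets are generated here: the database is read twice (once to count, once to insert), and the counts stored along shared paths carry the support information used later during mining.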
use another algorithm, for example FP Growth, which is more scalable. See this blog for some details on Apriori vs. FP Growth. Or do both of the above points by using FPGrowth in Spark MLlib on a cluster. And the nice thing is: you can stay in your familiar RStudio environment!
Algorithm 1: TD-FP-Growth
Input: a transaction database, with items in each transaction sorted in lexicographic order; a minimum support, minsup.
Output: frequent patterns above the minimum support.
Method: build the FP-tree; then call mine-tree(∅, H);
Procedure mine-tree(X, H)...