In general, FPGrowth can be used with the following code:
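import org.apache.spark.ml.fpm.FPGrowth
import spark.implicits._  // encoders needed by createDataset, map and toDF

// Each string is one transaction; items are separated by spaces.
val dataset = spark.createDataset(Seq(
  "1 2 5",
  "1 2 3 5",
  "1 2")
).map(t => t.split(" ")).toDF("items")

val fpgrowth = new FPGrowth().setItemsCol("items").setMinSupport(0.5).setMinConfidence(0.6)
val model = fpgrowth.fit(dataset)

// Show the frequent itemsets.
model.freqItemsets.show()
// Show the generated association rules.
model.associationRules.show()
// Apply the rules to each row and add the resulting consequents as a prediction column.
model.transform(dataset).show()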
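To make the meaning of setMinSupport(0.5) more concrete, here is a minimal plain-Scala sketch that counts support by brute force over the same three transactions. It is only an illustration and not the FP-growth algorithm that Spark runs; the names SupportSketch, support and frequentItemsets are made up for this example.

```scala
// Plain Scala, no Spark required. Brute-force support counting for illustration only.
object SupportSketch {
  // The same three transactions as in the Spark example above.
  val transactions: Seq[Set[String]] = Seq(
    Set("1", "2", "5"),
    Set("1", "2", "3", "5"),
    Set("1", "2")
  )

  // Fraction of transactions that contain every item of `itemset`.
  def support(itemset: Set[String]): Double =
    transactions.count(t => itemset.subsetOf(t)).toDouble / transactions.size

  // All non-empty itemsets whose support is at least `minSupport`.
  def frequentItemsets(minSupport: Double): Set[Set[String]] = {
    val items = transactions.flatten.toSet
    items.subsets().filter(s => s.nonEmpty && support(s) >= minSupport).toSet
  }

  def main(args: Array[String]): Unit = {
    frequentItemsets(0.5).foreach { s =>
      println(s"${s.toSeq.sorted.mkString("[", ", ", "]")} -> ${support(s)}")
    }
  }
}
```

Item "3" appears in only one of the three transactions, so every itemset containing it falls below the 0.5 threshold and is dropped. Enumerating all subsets is exponential in the number of items, which is why this approach is only suitable for a toy input like this one.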
model.freqItemsets returns a DataFrame with the columns "items" (Array) and "freq" (Long). To process it further, you can first convert it with the following code:
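import scala.collection.mutable

// Turn each Row into an (itemset, frequency) pair.
// The map needs spark.implicits._ in scope to supply an encoder for the result.
model.freqItemsets.map { row =>
  (row.getAs[mutable.WrappedArray[String]](0).toSet, row.getLong(1))
}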
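One thing the converted (itemset, frequency) pairs make easy is looking up frequencies by itemset, for example to compute the confidence of a rule by hand. The sketch below assumes the pairs have already been brought to the driver and put into a Map; the frequencies are counted by hand from the three example transactions rather than copied from Spark output, and ConfidenceSketch and confidence are hypothetical names.

```scala
// Plain Scala, no Spark required.
object ConfidenceSketch {
  // Hand-counted from the three example transactions. In a real job these pairs
  // would come from collecting the converted Dataset to the driver.
  val freq: Map[Set[String], Long] = Map(
    Set("1") -> 3L, Set("2") -> 3L, Set("5") -> 2L,
    Set("1", "2") -> 3L, Set("1", "5") -> 2L, Set("2", "5") -> 2L,
    Set("1", "2", "5") -> 2L
  )

  // confidence(A => B) = freq(A union B) / freq(A).
  // Returns None when either itemset is not among the frequent itemsets.
  def confidence(antecedent: Set[String], consequent: Set[String]): Option[Double] =
    for {
      whole <- freq.get(antecedent ++ consequent)
      part  <- freq.get(antecedent)
    } yield whole.toDouble / part

  def main(args: Array[String]): Unit = {
    println(confidence(Set("1", "2"), Set("5")))
    println(confidence(Set("5"), Set("1")))
  }
}
```

Because the conversion uses toSet, the key does not depend on item order, so Set("2", "1") and Set("1", "2") find the same entry. A rule is kept only if its confidence reaches the value passed to setMinConfidence, which is 0.6 in the example above.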
When reposting, please credit the original source: https://yun.8miu.com/read-25018.html