> # rules with rhs containing "Survived" only
> rules <- apriori(titanic.raw,
+ parameter = list(minlen=2, supp=0.005, conf=0.8),
+ appearance = list(rhs=c("Survived=No", "Survived=Yes"),
+ default="lhs"),
+ control = list(verbose=FALSE))
> rules.sorted <- sort(rules, by="lift")
> inspect(rules.sorted)
> library(arules)
> # find association rules with default settings
> rules <- apriori(titanic.raw)
> inspect(rules)
  lhs               rhs         support   confidence lift
1 {}             => {Age=Adult} 0.9504771 0.9504771  1.0000000
2 {Class=2nd}    => {Age=Adult} 0.1185825 0.9157895  0.9635051
3 {Class=1st}    => {Age=Adult} 0.1449341 0.9815385  1.0326798
4 {Sex=Female}   => {Age=Adult} 0.1930940 0.9042553  0.9513700
5 {Class=3rd}    => {Age=Adult} 0.2848705 0.8881020  0.9343750
6 {Survived=Yes} => {Age=Adult} 0.2971377 0.9198312  0.9677574
7 {Class=Crew}   => {Sex=Male}  0.3916402 0.9740113  1.2384742
...
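The support, confidence and lift columns can be checked by hand. The following is a minimal base R sketch for rule 7, {Class=Crew} => {Sex=Male}. It assumes that titanic.raw holds the same counts as R's built-in Titanic table (an assumption, though the figures agree with the output above), and it does not use arules at all.

```r
# Assumption: R's built-in Titanic table has the same counts as titanic.raw.
df <- as.data.frame(Titanic)        # columns: Class, Sex, Age, Survived, Freq
n  <- sum(df$Freq)                  # total number of people

lhs <- df$Class == "Crew"           # antecedent of rule 7
rhs <- df$Sex == "Male"             # consequent of rule 7

# support    = P(lhs and rhs)
# confidence = P(rhs | lhs)
# lift       = P(rhs | lhs) / P(rhs)
support    <- sum(df$Freq[lhs & rhs]) / n
confidence <- sum(df$Freq[lhs & rhs]) / sum(df$Freq[lhs])
lift       <- confidence / (sum(df$Freq[rhs]) / n)

print(round(c(support = support, confidence = confidence, lift = lift), 7))
```

A lift above 1 means the rhs is more frequent among records matching the lhs than in the data as a whole; a lift below 1, as in rules 2 and 4 to 6, means it is less frequent.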
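We then set rhs=c("Survived=No", "Survived=Yes") in appearance to make sure that only "Survived=No" and "Survived=Yes" will appear in the rhs of rules, and default="lhs" so that all other items can appear only in the lhs. Setting minlen=2 excludes rules with an empty lhs, such as rule 1 in the default output.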
> str(titanic.raw)
'data.frame': 2201 obs. of 4 variables:
$ Class : Factor w/ 4 levels "1st","2nd","3rd",..: 3 3 3 3 3 3 3 3 3 3 ...
$ Sex : Factor w/ 2 levels "Female","Male": 2 2 2 2 2 2 2 2 2 2 ...
$ Age : Factor w/ 2 levels "Adult","Child": 2 2 2 2 2 2 2 2 2 2 ...
$ Survived: Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
This page shows an example of association rule mining with R. It demonstrates association rule mining, pruning redundant rules and visualizing association rules.
This example uses the Titanic dataset, which can be downloaded as "titanic.raw.rdata" from the Data page.
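If the download is not at hand, a data frame with the same shape can be rebuilt from the Titanic table that ships with R. This is only a sketch: it assumes titanic.raw was derived from that built-in table, and titanic.rebuilt is a hypothetical name used here to avoid confusion with the downloaded file.

```r
# Assumption: titanic.raw has the same content as R's built-in Titanic table,
# expanded to one row per person. `titanic.rebuilt` is a hypothetical name.
df  <- as.data.frame(Titanic)              # columns: Class, Sex, Age, Survived, Freq
idx <- rep(seq_len(nrow(df)), df$Freq)     # repeat each combination Freq times
titanic.rebuilt <- df[idx, c("Class", "Sex", "Age", "Survived")]

# Re-create the factors so that the levels are sorted alphabetically
titanic.rebuilt[] <- lapply(titanic.rebuilt, function(x) factor(as.character(x)))
rownames(titanic.rebuilt) <- NULL

str(titanic.rebuilt)
```

The result has 2201 rows and four factor columns, so it can stand in for titanic.raw in the calls on this page.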
In the above result, rule 2 provides no extra knowledge in addition to rule 1, since rule 1 already tells us that all 2nd-class children survived. Generally speaking, when a rule (such as rule 2) is a super rule of another rule (such as rule 1), that is, it has the same rhs and its lhs contains every item of the other rule's lhs, and the former has the same or a lower lift, the former rule (rule 2) is considered to be redundant. Below we prune redundant rules.
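> # find redundant rules
> # (recent arules versions may return a sparse matrix here; if the NA
> #  assignment below fails, try is.subset(..., sparse=FALSE))
> subset.matrix <- is.subset(rules.sorted, rules.sorted)
> # keep only pairs above the diagonal: an earlier (higher-lift) rule vs. a later one
> subset.matrix[lower.tri(subset.matrix, diag=TRUE)] <- NA
> redundant <- colSums(subset.matrix, na.rm=TRUE) >= 1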
> which(redundant)
[1] 2 4 7 8
> # remove redundant rules
> rules.pruned <- rules.sorted[!redundant]
> inspect(rules.pruned)
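The upper-triangle trick used above can be seen in isolation on a tiny example. The sketch below is base R only; the four rules are hypothetical, are assumed to be already sorted by decreasing lift, and are written simply as the set of items in their lhs and rhs together.

```r
# Hypothetical rules, assumed already sorted by decreasing lift.
# Each rule is represented by all of its items (lhs plus rhs).
rule.items <- list(
  r1 = c("Class=2nd", "Age=Child", "Survived=Yes"),
  r2 = c("Class=2nd", "Age=Child", "Sex=Female", "Survived=Yes"),
  r3 = c("Class=1st", "Sex=Female", "Survived=Yes"),
  r4 = c("Class=1st", "Sex=Female", "Age=Adult", "Survived=Yes")
)

# m[i, j] is TRUE when every item of rule i also appears in rule j
n <- length(rule.items)
m <- matrix(FALSE, n, n, dimnames = list(names(rule.items), names(rule.items)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    m[i, j] <- all(rule.items[[i]] %in% rule.items[[j]])
  }
}

# Blank out the diagonal and lower triangle so only pairs with i < j remain,
# i.e. an earlier rule that is contained in a later one
m[lower.tri(m, diag = TRUE)] <- NA

# A rule is flagged when at least one earlier rule is a subset of it
redundant <- colSums(m, na.rm = TRUE) >= 1
print(which(redundant))
```

Because the rules are sorted by lift first, any earlier rule has the same or a higher lift, so a column with at least one TRUE above the diagonal marks a super rule that adds nothing.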
Package arulesViz supports visualization of association rules with scatter plots, balloon plots, graphs, parallel coordinates plots, etc.
> library(arulesViz)
> plot(rules)
> plot(rules, method="graph", control=list(type="items"))
> plot(rules, method="paracoord", control=list(reorder=TRUE))