This page shows an example of association rule mining with R. It demonstrates mining association rules, pruning redundant rules and visualizing association rules.

The Titanic Dataset

The Titanic dataset is used in this example. It can be downloaded as "titanic.raw.rdata" at the Data page.

> str(titanic.raw)
'data.frame': 2201 obs. of 4 variables:
 $ Class   : Factor w/ 4 levels "1st","2nd","3rd",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ Sex     : Factor w/ 2 levels "Female","Male": 2 2 2 2 2 2 2 2 2 2 ...
 $ Age     : Factor w/ 2 levels "Adult","Child": 2 2 2 2 2 2 2 2 2 2 ...
 $ Survived: Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...

Association Rule Mining

> library(arules)
> # find association rules with default settings
> rules <- apriori(titanic.raw)
> inspect(rules)
  lhs               rhs         support   confidence lift
1 {}             => {Age=Adult} 0.9504771 0.9504771  1.0000000
2 {Class=2nd}    => {Age=Adult} 0.1185825 0.9157895  0.9635051
3 {Class=1st}    => {Age=Adult} 0.1449341 0.9815385  1.0326798
4 {Sex=Female}   => {Age=Adult} 0.1930940 0.9042553  0.9513700
5 {Class=3rd}    => {Age=Adult} 0.2848705 0.8881020  0.9343750
6 {Survived=Yes} => {Age=Adult} 0.2971377 0.9198312  0.9677574
7 {Class=Crew}   => {Sex=Male}  0.3916402 0.9740113  1.2384742
...

We then set rhs=c("Survived=No", "Survived=Yes") in appearance to make sure that only "Survived=No" and "Survived=Yes" will appear in the rhs of rules. Setting default="lhs" allows all other items to appear in the lhs, and minlen=2 excludes rules with an empty lhs.

> # rules with rhs containing "Survived" only
> rules <- apriori(titanic.raw,
+                  parameter = list(minlen=2, supp=0.005, conf=0.8),
+                  appearance = list(rhs=c("Survived=No", "Survived=Yes"),
+                                    default="lhs"),
+                  control = list(verbose=F))
> rules.sorted <- sort(rules, by="lift")
> inspect(rules.sorted)

Pruning Redundant Rules

In the above result, rule 2 provides no extra knowledge in addition to rule 1, since rule 1 tells us that all 2nd-class children survived.
Generally speaking, when a rule (such as rule 2) is a super rule of another rule (such as rule 1), that is, it contains all items of the other rule plus at least one more, and the former has the same or a lower lift, the former rule (rule 2) is considered to be redundant. Below we prune redundant rules. Because rules.sorted is ordered by decreasing lift, a rule is flagged as redundant when any rule ranked above it is a subset of it. Note that recent versions of arules return a sparse matrix from is.subset() by default, so the call may need sparse=FALSE for the NA assignment below to work.

> # find redundant rules
> subset.matrix <- is.subset(rules.sorted, rules.sorted)
> subset.matrix[lower.tri(subset.matrix, diag=T)] <- NA
> redundant <- colSums(subset.matrix, na.rm=T) >= 1
> which(redundant)
[1] 2 4 7 8
> # remove redundant rules
> rules.pruned <- rules.sorted[!redundant]
> inspect(rules.pruned)

Visualizing Association Rules

Package arulesViz supports visualization of association rules with scatter plots, balloon plots, graphs, parallel coordinates plots, etc.

> library(arulesViz)
> plot(rules)
> plot(rules, method="graph", control=list(type="items"))
> plot(rules, method="paracoord", control=list(reorder=TRUE))
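As a cross-check on the support, confidence and lift columns printed by inspect(), the figures for the rule {Class=Crew} => {Sex=Male} can be recomputed by hand. The sketch below uses base R only and assumes that the built-in Titanic contingency table (package datasets) holds the same 2201 passengers as titanic.raw; the variable names are illustrative, not part of arules.

```r
# Built-in 4-way table (Class x Sex x Age x Survived), flattened to one row per cell.
# Assumption: it matches titanic.raw, which is not loaded here.
counts <- as.data.frame(Titanic)   # columns: Class, Sex, Age, Survived, Freq
n <- sum(counts$Freq)              # total number of passengers

crew <- counts$Class == "Crew"     # cells matching the lhs item
male <- counts$Sex == "Male"       # cells matching the rhs item

# support    = P(lhs and rhs)
# confidence = P(rhs | lhs)
# lift       = confidence / P(rhs)
support    <- sum(counts$Freq[crew & male]) / n
confidence <- sum(counts$Freq[crew & male]) / sum(counts$Freq[crew])
lift       <- confidence / (sum(counts$Freq[male]) / n)

print(round(c(support = support, confidence = confidence, lift = lift), 7))
```

If the assumption about the data holds, the three values should agree with the support, confidence and lift reported for {Class=Crew} => {Sex=Male} in the first inspect() output (0.3916402, 0.9740113, 1.2384742). A lift above 1 means the crew were more likely to be male than passengers overall.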

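The redundancy check in the pruning step can be hard to read, so here is a minimal base-R sketch of the same upper-triangle logic on four hypothetical rules. The rules are made up for illustration and are assumed to be already sorted by decreasing lift; each one is stored as the full set of its items (lhs plus rhs), which is a simplification of how arules represents rules.

```r
# Hypothetical toy rules, assumed to be sorted by decreasing lift.
rules.items <- list(
  c("Class=2nd", "Age=Child", "Survived=Yes"),
  c("Class=2nd", "Sex=Female", "Age=Child", "Survived=Yes"),
  c("Class=1st", "Sex=Female", "Survived=Yes"),
  c("Class=1st", "Sex=Female", "Age=Adult", "Survived=Yes")
)

# subset.matrix[i, j] is TRUE when every item of rule i also occurs in rule j.
subset.matrix <- sapply(rules.items, function(rj)
  sapply(rules.items, function(ri) all(ri %in% rj)))

# Keep only entries above the diagonal: comparisons of rule j against
# rules ranked before it (i < j), i.e. with the same or a higher lift.
subset.matrix[lower.tri(subset.matrix, diag = TRUE)] <- NA

# Rule j is redundant if at least one earlier rule is a subset of it.
redundant <- colSums(subset.matrix, na.rm = TRUE) >= 1
print(which(redundant))
```

Here rules 2 and 4 are flagged, because rule 1 is contained in rule 2 and rule 3 is contained in rule 4. The check relies entirely on the sort order, so it is only meaningful after sorting by lift.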



