Frequent-Itemsets

This repository contains implementations of the Apriori algorithm and the SON algorithm, which are used to discover all frequent itemsets.

Part I - Apriori:

[Input]
The input file is a single line containing a nested JSON array. Each basket within the outer array is a JSON array of integers, where each integer is an item number. A sample file is as follows:

[[1, 2], [1, 2, 3], [1, 3, 4], [2, 3, 4], [3, 4]] # 5 baskets
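Reading this format can be sketched in a few lines of Python. This is only an illustration: the helper name `load_baskets` is hypothetical and not taken from the repository, and the sketch assumes the trailing `# 5 baskets` note above is an annotation rather than part of the real file, since JSON has no comment syntax.

```python
import json

def load_baskets(text):
    """Parse one line of nested JSON into a list of baskets (hypothetical helper)."""
    raw = json.loads(text)
    # Each basket becomes a frozenset so that subset tests are cheap later on.
    return [frozenset(basket) for basket in raw]

sample = "[[1, 2], [1, 2, 3], [1, 3, 4], [2, 3, 4], [3, 4]]"
baskets = load_baskets(sample)
print(len(baskets))  # 5
```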

[Output]
Print the candidates C(k) and the frequent itemsets L(k) of each pass, one per line as a SORTED list, until C(k) or L(k) is empty. An example is as follows:

C1: [[1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12]]
L1: [[1], [2], [3], [4], [5], [6]]
C2: [[1, 2], [1, 3], [1, 4], [1, 5], [1, 6], [2, 3], [2, 4], [2, 5], [2, 6], [3, 4], [3, 5], [3, 6], [4, 5], [4, 6], [5, 6]]
L2: [[1, 2], [1, 3], [2, 3], [2, 6], [3, 5], [4, 5]]
C3: [[1, 2, 3]]
L3: []
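The pass-by-pass loop behind this output can be sketched as follows. This is a minimal illustration rather than the repository's code: the function name `apriori`, the `min_support` parameter, and the choice of an absolute count threshold of 2 are all assumptions, since the support threshold is not stated here. The sketch is run on the five sample baskets from the [Input] section, not on the data that produced the example above.

```python
from itertools import combinations

def apriori(baskets, min_support):
    """Return the C(k)/L(k) output lines; min_support is an assumed absolute count."""
    baskets = [frozenset(b) for b in baskets]
    lines = []
    # C1: every distinct item, as sorted 1-tuples.
    candidates = sorted({(item,) for b in baskets for item in b})
    k = 1
    while True:
        lines.append(f"C{k}: {[list(c) for c in candidates]}")
        if not candidates:
            break
        # L(k): candidates contained in at least min_support baskets.
        frequent = [c for c in candidates
                    if sum(1 for b in baskets if b.issuperset(c)) >= min_support]
        lines.append(f"L{k}: {[list(c) for c in frequent]}")
        if not frequent:
            break
        # C(k+1): size-(k+1) itemsets whose every size-k subset is in L(k).
        frequent_set = set(frequent)
        items = sorted({i for c in frequent for i in c})
        candidates = [c for c in combinations(items, k + 1)
                      if all(s in frequent_set for s in combinations(c, k))]
        k += 1
    return lines

sample = [[1, 2], [1, 2, 3], [1, 3, 4], [2, 3, 4], [3, 4]]
lines = apriori(sample, min_support=2)
print("\n".join(lines))
```

With the assumed threshold of 2, the last two lines printed are `C3: [[1, 2, 3]]` and `L3: []`: the triple survives pruning because all of its pairs are frequent, but it appears in only one basket.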

Part II - SON Algorithm:

[Input]
The input of this phase has two parts: the chunks of baskets, in the same format as the phase 1 input, and the candidates found in phase 1.

[Output]
One line per global frequent itemset with its global count, in the following format:
[[1, 2], 18]
[[1, 3], 20]
[[1], 31]
[[2], 22]
[[3], 22]
[[4], 12]
[[6], 15]
[[2, 3], 13]
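The counting step that produces lines of this shape can be sketched as below. The function name `son_phase2`, the `min_support` parameter, and the sample chunks and candidates are illustrative assumptions, not taken from the repository. The sketch counts every phase 1 candidate across all chunks and keeps those that reach the assumed global threshold.

```python
import json
from collections import Counter

def son_phase2(chunks, candidates, min_support):
    """Count each candidate over all chunks; keep the globally frequent ones."""
    # Normalise candidates to sorted tuples and drop duplicates, keeping order.
    cands = list(dict.fromkeys(tuple(sorted(c)) for c in candidates))
    counts = Counter()
    for chunk in chunks:
        for basket in chunk:
            items = set(basket)
            for c in cands:
                if items.issuperset(c):
                    counts[c] += 1
    return [[list(c), counts[c]] for c in cands if counts[c] >= min_support]

# Hypothetical data: the five sample baskets split into two chunks.
chunks = [[[1, 2], [1, 2, 3]], [[1, 3, 4], [2, 3, 4], [3, 4]]]
candidates = [[1], [3], [3, 4], [1, 2]]
result = son_phase2(chunks, candidates, min_support=3)
for row in result:
    print(json.dumps(row))
```

Here `[1, 2]` is dropped because it occurs in only two baskets overall. This is the purpose of the second pass: an itemset that is frequent within one chunk may still be infrequent globally.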
