Title | Signature-based Tree for Finding Frequent Itemsets |
Publication Type | Journal Article |
Year of Publication | 2023 |
Authors | Benelhadj, MEH |
Secondary Authors | Deye, MM |
Tertiary Authors | Slimani, Y |
Journal | JOURNAL OF COMMUNICATIONS SOFTWARE AND SYSTEMS |
Volume | 9 |
Issue | 1 |
Start Page | 70 |
Pagination | 70-80 |
Date Published | MARCH 2023 |
Keywords | Data compression, Data mining, Data storage, Signature, Tree structure |
Abstract | The efficiency of a data mining process depends on the data structure used to find frequent itemsets. Two approaches are possible: use the original transaction dataset, or transform it into another, more compact structure. Many algorithms use a tree as the compact structure, such as the FP-Tree and its associated algorithm, FP-Growth. Although this structure reduces the number of dataset scans to only two, its efficiency depends on two criteria: (i) the size of the support (small or large); (ii) the type of transaction dataset (sparse or dense). Depending on these two criteria, the generated tree can become very large. In this paper, we propose a new tree-based structure that emphasizes transactions rather than itemsets. Hence, we avoid the problem of support values that have a negative impact on the generated tree. |
DOI | 10.24138/jcomss-2022-0065 |
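As an illustration of the background the abstract refers to, the following is a minimal sketch of the standard two-scan FP-Tree construction as it is commonly described. It is not the signature-based structure proposed in the paper. The names `Node`, `build_fp_tree`, `count_nodes`, and the sample `transactions` are hypothetical and chosen only for this example; the support is assumed to be an absolute count.

```python
from collections import Counter


class Node:
    """One FP-Tree node: an item, its count, and its children keyed by item."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build an FP-Tree with two passes over the transactions (illustrative only)."""
    # Scan 1: count the support of every single item.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {item: c for item, c in counts.items() if c >= min_support}

    # Scan 2: insert each transaction, keeping frequent items only,
    # ordered by descending support (ties broken alphabetically).
    root = Node(None)
    for t in transactions:
        items = sorted(
            (i for i in set(t) if i in frequent),
            key=lambda i: (-frequent[i], i),
        )
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root


def count_nodes(node):
    """Number of nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(child) for child in node.children.values())


# Hypothetical sample dataset, assumed for this sketch only.
transactions = [
    ["a", "b", "d", "e"],
    ["b", "c", "e"],
    ["a", "b", "d", "e"],
    ["a", "b", "c", "e"],
    ["a", "b", "c", "d", "e"],
    ["b", "c", "d"],
]

large_support_tree = build_fp_tree(transactions, min_support=5)
small_support_tree = build_fp_tree(transactions, min_support=2)
print(count_nodes(large_support_tree), count_nodes(small_support_tree))
```

On this small sample, the tree built with the larger support threshold has 2 nodes, while the one built with the smaller threshold has 9, which is the kind of support-dependent growth the abstract describes.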