Published May 4, 2020 | Version v1
Dataset | Open Access

Datasets for Itemset, Sequence and Tree Mining

Authors/Creators

  • RPI

Description

This record includes three datasets, one each for itemset, sequence, and tree mining.

dense_db.zip

Contains a collection of itemset datasets, both real (chess, connect, mushroom, pumsb) and synthetic (T10I4D100K, T40I10D100K), among others. They were used in papers on frequent, closed, and maximal itemset mining, for example: Mohammed J. Zaki and Ching-Jui Hsiao. Efficient algorithms for mining closed itemsets and their lattice structure. IEEE Transactions on Knowledge and Data Engineering, 17(4):462–478, April 2005. doi:10.1109/69.846291; and Karam Gouda and Mohammed J. Zaki. GenMax: an efficient algorithm for mining maximal frequent itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223–242, November 2005. doi:10.1007/s10618-005-0002-x.
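To make the three notions concrete, here is a minimal Python sketch; it is not the code used in the cited papers. It assumes each transaction is one line of whitespace-separated integer item IDs (a common layout for these benchmarks; the exact layout inside dense_db.zip is an assumption here). It mines all frequent itemsets with an Eclat-style depth-first intersection of tid-sets, then filters them into closed itemsets (no proper superset with the same support) and maximal itemsets (no frequent proper superset). All function names are hypothetical.

```python
def read_transactions(lines):
    """Parse one transaction per line: whitespace-separated integer item IDs (assumed layout)."""
    return [frozenset(int(tok) for tok in line.split()) for line in lines if line.strip()]


def eclat(transactions, minsup):
    """Return {itemset: support} for every itemset whose absolute support is >= minsup,
    using Eclat-style depth-first intersection of tid-sets."""
    tidsets = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # candidates: (item, tidset) pairs; each tidset belongs to prefix + item
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            frequent[itemset] = len(tids)
            # Join with the remaining candidates to build the next level under this prefix
            suffix = []
            for other, other_tids in candidates[i + 1:]:
                common = tids & other_tids
                if len(common) >= minsup:
                    suffix.append((other, common))
            if suffix:
                extend(itemset, suffix)

    roots = sorted((item, tids) for item, tids in tidsets.items() if len(tids) >= minsup)
    extend(frozenset(), roots)
    return frequent


def closed_and_maximal(frequent):
    """Split frequent itemsets into closed (no proper superset with equal support)
    and maximal (no frequent proper superset). Quadratic; for illustration only."""
    closed, maximal = set(), set()
    for x, sup in frequent.items():
        supersets = [y for y in frequent if x < y]
        if not supersets:
            maximal.add(x)
        if all(frequent[y] < sup for y in supersets):
            closed.add(x)
    return closed, maximal


if __name__ == "__main__":
    db = read_transactions(["1 2 3", "1 2", "1 2", "2 4"])
    freq = eclat(db, minsup=2)
    closed, maximal = closed_and_maximal(freq)
    print(freq)
    print(closed, maximal)
```

In this toy database, {1} is frequent but not closed, because every transaction containing item 1 also contains item 2. The pairwise closed/maximal filter is only meant for small examples; the algorithms in the cited papers prune the search space directly instead of enumerating every frequent itemset first.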

 

plandata.zip

A planning dataset for sequence mining. It was used in Mohammed J. Zaki, Neal Lesh, and Mitsunori Ogihara. PLANMINE: predicting plan failures using sequence mining. Artificial Intelligence Review, 14(6):421–446, December 2000. Special issue on Applications of Data Mining. doi:10.1023/A:1006612804250.
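In sequence mining, a pattern is a list of itemsets, and a sequence supports it if each itemset of the pattern is contained in a distinct event of the sequence, in order. The following Python sketch illustrates that support test. The input layout it parses (one event per line as `sid eid n item_1 ... item_n`) is an assumption for illustration; this page does not document the format of plandata.zip, and the function names are hypothetical.

```python
from collections import defaultdict


def read_sequences(lines):
    """Group lines of the assumed form 'sid eid n item_1 ... item_n' into
    {sid: [itemset, ...]} with events ordered by eid. Layout is hypothetical."""
    events = defaultdict(list)
    for line in lines:
        tok = line.split()
        if not tok:
            continue
        sid, eid, n = int(tok[0]), int(tok[1]), int(tok[2])
        events[sid].append((eid, frozenset(int(t) for t in tok[3:3 + n])))
    return {sid: [items for _, items in sorted(evs, key=lambda e: e[0])]
            for sid, evs in events.items()}


def contains(sequence, pattern):
    """True if every itemset in pattern is a subset of a distinct event of
    sequence, in order. Greedy earliest matching is sufficient for this test."""
    pos = 0
    for element in pattern:
        while pos < len(sequence) and not element <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next pattern element must match a strictly later event
    return True


def support(db, pattern):
    """Number of sequences in db that contain pattern."""
    return sum(contains(seq, pattern) for seq in db.values())
```

Matching each pattern element to the earliest possible event never rules out a match that a later choice would allow, which is why a single left-to-right scan per sequence suffices.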

 

cslogs.zip

The CSLOGS data was used for tree mining, e.g., in Mohammed J. Zaki. Efficiently mining frequent trees in a forest: algorithms and applications. IEEE Transactions on Knowledge and Data Engineering, 17(8):1021–1035, August 2005. Special issue on Mining Biological Data. doi:10.1109/TKDE.2005.125.
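Tree-mining tools often store each ordered, labeled tree as a preorder string of node labels, with -1 marking a backtrack from a child to its parent. The Python sketch below decodes such a string into label and parent arrays and computes per-tree label support, which is the starting point for counting frequent single-node subtrees. That cslogs.zip uses exactly this encoding, and how any per-record header fields are laid out, are assumptions here; the function names are hypothetical.

```python
from collections import Counter


def decode_tree(tokens):
    """Decode a preorder string encoding, where each non-negative token is a node
    label and -1 means 'move back up to the parent', into (labels, parents).
    parents[i] is the index of node i's parent, or -1 for the root."""
    labels, parents, stack = [], [], []
    for tok in tokens:
        value = int(tok)
        if value == -1:
            stack.pop()  # backtrack: leave the current node
            continue
        labels.append(value)
        parents.append(stack[-1] if stack else -1)
        stack.append(len(labels) - 1)  # the new node becomes the current node
    return labels, parents


def label_support(forest):
    """Count, for each label, how many trees in the forest contain it at least once
    (per-tree support, the usual measure for frequent single-node subtrees)."""
    counts = Counter()
    for labels, _ in forest:
        counts.update(set(labels))
    return counts
```

For example, `"1 2 -1 3 4 -1 -1"` encodes a root labeled 1 with children 2 and 3, where node 3 has a child labeled 4. Support is counted once per tree, not once per occurrence, so a label repeated within one tree still contributes 1.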

 

Files (34.3 MB total)

  • 1.1 MB, md5:2e5080ec31c155dd4aa5f281233028e8
  • 12.0 MB, md5:7d9045c4e533ffb1f51fa641029e943a
  • 21.2 MB, md5:9563db062017c5e8b3ee94deb9365bba
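The md5 checksums listed for the files can be used to confirm that a download is intact. Below is a minimal Python sketch using the standard `hashlib` module. Because the listing pairs each checksum with a size rather than a file name, it only tests whether a file's digest is one of the three published values; the function names are hypothetical.

```python
import hashlib

# md5 checksums as listed on this record page
PUBLISHED_MD5 = {
    "2e5080ec31c155dd4aa5f281233028e8",
    "7d9045c4e533ffb1f51fa641029e943a",
    "9563db062017c5e8b3ee94deb9365bba",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, reading it in chunks to bound memory use."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published(path):
    """True if the file's md5 digest equals one of the published checksums."""
    return md5_of(path) in PUBLISHED_MD5
```

For example, after downloading the archives into the current directory, `matches_published("cslogs.zip")` should return `True` if that download is intact.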

Additional details

References

  • Mohammed J. Zaki and Ching-Jui Hsiao. Efficient algorithms for mining closed itemsets and their lattice structure. IEEE Transactions on Knowledge and Data Engineering, 17(4):462–478, April 2005. doi:10.1109/69.846291.
  • Karam Gouda and Mohammed J. Zaki. GenMax: an efficient algorithm for mining maximal frequent itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223–242, November 2005. doi:10.1007/s10618-005-0002-x.
  • Mohammed J. Zaki. Efficiently mining frequent trees in a forest: algorithms and applications. IEEE Transactions on Knowledge and Data Engineering, 17(8):1021–1035, August 2005. Special issue on Mining Biological Data. doi:10.1109/TKDE.2005.125.
  • Mohammed J. Zaki, Neal Lesh, and Mitsunori Ogihara. PLANMINE: predicting plan failures using sequence mining. Artificial Intelligence Review, 14(6):421–446, December 2000. Special issue on Applications of Data Mining. doi:10.1023/A:1006612804250.