Datasets for Itemset, Sequence and Tree Mining
Description
Three datasets are included, suitable for itemset, sequence and tree mining methods, respectively.
dense_db.zip:
Contains various real and synthetic itemset datasets, such as chess, connect, mushroom, pumsb, T10I4D100K and T40I10D100K, used in papers on frequent, closed and maximal itemset mining, for example Zaki and Hsiao (2005) and Gouda and Zaki (2005).
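To make the difference between frequent, closed and maximal itemsets concrete, here is a minimal brute-force sketch in Python. It assumes each transaction is one line of whitespace-separated integer item ids (check the actual files in dense_db.zip before relying on this), and all helper names are hypothetical. It is plain exhaustive counting, not the algorithms of the cited papers. An itemset is closed if no proper superset has the same support, and maximal if no proper superset is frequent.

```python
from collections import Counter
from itertools import combinations, count


def parse_transactions(lines):
    """Hypothetical parser: one transaction per line of whitespace-separated integer ids (assumed format)."""
    return [frozenset(int(tok) for tok in line.split()) for line in lines if line.strip()]


def frequent_itemsets(transactions, min_support, max_size=None):
    """Brute-force level-wise counting; returns {sorted item tuple: absolute support}."""
    result = {}
    for k in count(1):
        if max_size is not None and k > max_size:
            break
        # Count every k-subset of every transaction (exponential on long transactions).
        counts = Counter(
            itemset for t in transactions for itemset in combinations(sorted(t), k)
        )
        level = {s: c for s, c in counts.items() if c >= min_support}
        if not level:
            # Support is anti-monotone: no frequent k-itemset means no frequent larger ones.
            break
        result.update(level)
    return result


def closed_and_maximal(frequent):
    """Split a complete frequent-itemset table into closed and maximal itemsets."""
    closed, maximal = set(), set()
    for s, sup in frequent.items():
        supersets = [t for t in frequent if set(s) < set(t)]
        if not supersets:
            maximal.add(s)  # no frequent proper superset
        if all(frequent[t] < sup for t in supersets):
            closed.add(s)  # no proper superset with equal support
    return closed, maximal


txns = parse_transactions(["1 2 3", "1 2", "1 3", "2 3", "1 2 3"])
freq = frequent_itemsets(txns, min_support=3)
closed, maximal = closed_and_maximal(freq)
print(sorted(maximal))  # [(1, 2), (1, 3), (2, 3)]
```

Here all six frequent itemsets are closed but only the three pairs are maximal. Enumerating every k-subset of every transaction is only practical for toy inputs; on dense datasets such as chess or connect the number of candidates explodes, which is the problem the cited closed- and maximal-itemset algorithms address.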
plandata.zip:
Planning dataset for sequence mining, used in the PLANMINE paper by Zaki, Lesh and Ogihara (2000).
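Sequence mining rests on counting how many sequences contain a given pattern of events in order. The Python sketch below shows only that support computation on in-memory data; the on-disk layout of plandata.zip is not described here, so the list-of-sets representation, the function names and the event labels are all assumptions for illustration, not the dataset's contents or the PLANMINE method itself.

```python
def contains(sequence, pattern):
    """True if pattern (a list of itemsets) occurs in sequence (a list of itemsets) in order.

    Each pattern element must be a subset of a distinct, strictly later event.
    Greedy earliest matching suffices for this containment test.
    """
    i = 0
    for event in sequence:
        if i < len(pattern) and pattern[i] <= event:
            i += 1
    return i == len(pattern)


def support(database, pattern):
    """Number of sequences in the database that contain the pattern."""
    return sum(contains(seq, pattern) for seq in database)


# Hypothetical plan traces; the event labels are invented for illustration.
db = [
    [{"move"}, {"load", "refuel"}, {"fail"}],
    [{"load"}, {"move"}, {"fail"}],
    [{"move"}, {"unload"}],
]
print(support(db, [{"move"}, {"fail"}]))  # 2
```

Matching each pattern element to the earliest possible event is safe because it leaves the most remaining events for the rest of the pattern.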
cslogs.zip:
The CSLOGS data was used for tree mining, e.g., in Zaki (2005).
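Assuming the CSLOGS trees use the preorder string encoding from Zaki's tree-mining work (node labels listed in depth-first preorder, with -1 written whenever the traversal backtracks from a child to its parent; verify this against the files in cslogs.zip), a minimal Python sketch that recovers parent pointers could look like this. The function name and the sample encoding are hypothetical.

```python
def decode_tree(tokens):
    """Decode a preorder label string with -1 backtrack markers into (labels, parents).

    parents[i] is the index of node i's parent, or -1 for the root.
    """
    labels, parents, stack = [], [], []
    for tok in tokens:
        if tok == -1:
            if not stack:
                raise ValueError("backtrack marker with no open node")
            stack.pop()  # return from the current node to its parent
        else:
            node = len(labels)
            labels.append(tok)
            parents.append(stack[-1] if stack else -1)
            stack.append(node)
    return labels, parents


encoding = "0 1 3 1 -1 2 -1 -1 2 -1"
labels, parents = decode_tree([int(t) for t in encoding.split()])
print(labels)   # [0, 1, 3, 1, 2, 2]
print(parents)  # [-1, 0, 1, 2, 2, 1]
```

A stack of open nodes is all the decoder needs: the top of the stack is always the parent of the next label, and each -1 closes the most recently opened node.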
Files
cslogs.zip
Additional details
References
- Mohammed J. Zaki and Ching-Jui Hsiao. Efficient algorithms for mining closed itemsets and their lattice structure. IEEE Transactions on Knowledge and Data Engineering, 17(4):462–478, April 2005. doi:10.1109/69.846291.
- Karam Gouda and Mohammed J. Zaki. GenMax: an efficient algorithm for mining maximal frequent itemsets. Data Mining and Knowledge Discovery: An International Journal, 11(3):223–242, November 2005. doi:10.1007/s10618-005-0002-x.
- Mohammed J. Zaki. Efficiently mining frequent trees in a forest: algorithms and applications. IEEE Transactions on Knowledge and Data Engineering, 17(8):1021–1035, August 2005. Special issue on Mining Biological Data. doi:10.1109/TKDE.2005.125.
- Mohammed J. Zaki, Neal Lesh, and Mitsunori Ogihara. PLANMINE: predicting plan failures using sequence mining. Artificial Intelligence Review, 14(6):421–446, December 2000. Special issue on Applications of Data Mining. doi:10.1023/A:1006612804250.