Published March 12, 2017 | Version v1
Journal article Open

EVOLUTIONARY FEATURE SELECTION FOR BIG DATA PROCESSING USING MAPREDUCE AND APSO

  • 1. PG Scholar, Shree Venkateshwara Hi-Tech Engineering College, Gobi, Tamilnadu
  • 2. Associate Professor, Department of Computer Science and Engineering, Shree Venkateshwara Hi-Tech Engineering College, Gobi, Tamilnadu
  • 3. Professor & Head, Department of Computer Science and Engineering, Shree Venkateshwara Hi-Tech Engineering College, Gobi, Tamilnadu

Description

Big Data-A is an acceleration framework that optimizes Big Data processing with plug-in components for fast data movement, overcoming existing limitations. A novel network-levitated merge algorithm is introduced to merge data without repetition and without disk access. In addition, a full pipeline is designed to overlap the shuffle, merge, and reduce phases. Our experimental results show that Big Data-A significantly speeds up data movement in MapReduce and doubles the throughput of Big Data. Big Data-A also significantly reduces the disk accesses caused by intermediate data. In this paper, we propose APSO, a distributed frequent subgraph mining method over MapReduce. Given a graph database and a minimum support threshold, APSO generates the complete set of frequent subgraphs. To overcome the dependency among the states of the mining process, APSO runs iteratively: the output of the reducers in iteration i−1 is used as the input of the mappers in iteration i. The mappers of iteration i generate candidate subgraphs of size i (number of edges) and compute the local support of each candidate pattern. The reducers of iteration i then find the true frequent subgraphs of size i by aggregating these local supports. The reducers also write to disk the data that are processed in subsequent iterations.
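
The iterative mapper/reducer scheme described above can be sketched in a single process as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: no Hadoop or MapReduce runtime is used, partitions are plain Python lists, and the names mapper, reducer, touches, and mine_frequent_subgraphs are hypothetical. It also assumes that every vertex label is unique across the database, so a subgraph is identified by its edge set and containment reduces to a subset test; a real miner would need subgraph isomorphism checks instead.

```python
from collections import defaultdict
from itertools import chain


def touches(pattern, edge):
    """True if the edge shares at least one vertex with the pattern."""
    vertices = {v for e in pattern for v in e}
    return edge[0] in vertices or edge[1] in vertices


def mapper(partition, frequent_prev, size):
    """Emit (candidate, local_support) pairs for one partition.

    Each graph is a frozenset of edges; each edge is a sorted (u, v) tuple.
    ASSUMPTION: vertex labels are unique, so subset test == containment.
    """
    local = defaultdict(int)
    for graph in partition:
        if size == 1:
            candidates = {frozenset([e]) for e in graph}
        else:
            candidates = set()
            for pattern in frequent_prev:
                if pattern <= graph:
                    # Grow the pattern by one adjacent edge of this graph.
                    for e in graph - pattern:
                        if touches(pattern, e):
                            candidates.add(pattern | {e})
        # A graph contributes at most 1 to the support of each candidate.
        for c in candidates:
            local[c] += 1
    return list(local.items())


def reducer(pairs, min_support):
    """Aggregate local supports and keep the truly frequent candidates."""
    total = defaultdict(int)
    for candidate, support in pairs:
        total[candidate] += support
    return {c for c, s in total.items() if s >= min_support}


def mine_frequent_subgraphs(partitions, min_support):
    """Run iterations until no frequent subgraph of the current size exists.

    Returns a list whose entry i-1 holds the frequent subgraphs with i edges.
    """
    levels = []
    frequent = None
    size = 1
    while True:
        emitted = chain.from_iterable(
            mapper(p, frequent, size) for p in partitions
        )
        frequent = reducer(emitted, min_support)
        if not frequent:
            break
        levels.append(frequent)
        size += 1
    return levels


if __name__ == "__main__":
    g1 = frozenset({("a", "b"), ("b", "c")})
    g2 = frozenset({("a", "b"), ("b", "c"), ("c", "d")})
    g3 = frozenset({("a", "b"), ("c", "d")})
    for i, level in enumerate(mine_frequent_subgraphs([[g1, g2], [g3]], 2), 1):
        print(i, sorted(sorted(p) for p in level))
```

In this sketch the set returned by reducer in one pass is handed directly to the mappers of the next pass, which stands in for the reducer output that the described method writes to disk between iterations.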

Files

GLCMC-007.pdf (425.4 kB)
md5:2c16cd71f8fffed6fc7523becfd25ed8
