An Explainable Hybrid Deep Learning and Gradient Boosting Framework for Ad Click Fraud Detection
Authors/Creators
- 1. M.Tech Student, Department of Computer Science & Engineering, Impact College of Engineering and Applied Sciences, Bangalore, India. (Affiliated to VTU, Belagavi, India)
- 2. Assistant Professor, Department of Computer Science & Engineering, Impact College of Engineering and Applied Sciences, Bangalore, India. (Affiliated to VTU, Belagavi, India)
Description
Abstract
Click fraud is one of the most serious problems in digital advertising. It is costly and makes online marketing statistics less dependable. Traditional fraud detection methods often miss new fraud patterns and sophisticated user behavior. This research uses the Ad Click Fraud Detection Dataset from Kaggle, which records how users interacted with both genuine and fraudulent ad clicks. During preprocessing, unnecessary features were discarded and class imbalance was corrected with Random Under-Sampling (RUS) and the Synthetic Minority Over-sampling Technique (SMOTE) to ensure data quality and fairness. The ML and DL models developed and evaluated include CNN, DNN, RNN, LR, DT, RF, KNN, ANN, Gradient Boosting, LightGBM, XGBoost, NB, and SVM. To improve prediction accuracy further, we employed LSTM, GRU, LSTM + GRU, and a Voting Classifier (XGBoost with Bagging DT). Explainable AI (XAI) techniques, namely LIME and SHAP, were used to make the results interpretable. The trained models were also deployed in a Flask-based web interface to predict ad click fraud in real time. In testing, the Voting Classifier achieved 100% accuracy, precision, recall, and F1-score, indicating that it can serve as an effective method of detecting fraudulent ad clicks.
Keywords: Click fraud, machine learning, deep learning, online advertising, bot detection, pay-per-click, fraud
1. Introduction
Online advertising is a significant part of business marketing in the contemporary digital economy, as it enables enterprises to reach the right audience on both computers and phones. Current online marketing systems are built on the pay-per-click (PPC) model, in which advertisers pay each time a person clicks on one of their ads [1]. The more companies invest in digital advertising, the more they rely on click statistics to judge the success of their campaigns and their return on investment. However, this reliance has also opened major security gaps, particularly click fraud, in which PPC systems are deceived in order to generate money [2].
Click fraud, in which an advertisement is clicked without authorization or with malicious intent, is one of the major issues affecting the digital advertising business [3]. Such activity is normally carried out by software bots, click farms, or competitors, and it results in distorted engagement metrics and wasted ad spend [4]. Common fraud detection methods rely on heuristic or statistical techniques and cannot always keep up with emerging forms of fraud [5]. VPNs, proxy servers, and distributed botnets allow fraudsters to conceal their identities and pose as real users, which makes them harder to catch [6]. Consequently, a significant research gap remains in designing adaptive, interpretable models that can identify fraudulent clicks in a changing real-world context [7].
This paper proposes an intelligent analytical approach to detecting ad click fraud using data on user interactions with advertisements. The aim is to distinguish legitimate from fraudulent activity correctly while keeping the system interpretable and usable in real time [8]. By drawing on behavioural characteristics and interaction patterns, the proposed model is expected to yield a scalable, transparent, and data-driven detection system. XAI is added so that the model's predictions are easier for marketers to understand and trust [9].
By reducing fraud, safeguarding advertisers' funds, and promoting healthy competition, this research contributes to the improvement of online advertising. Ultimately, it helps establish trust, reliability, and sustainability in the online marketing space [10].
2. Related Work
Over the last several years, considerable research has applied ML and AI methods to identifying fraudulent clicks on online advertisements. Shaik and Kakulapati [11] developed a feature-based ML strategy for detecting fraudulent ad clicks, demonstrating that trained models can classify user click behavior accurately. Their study, however, was limited by a small dataset and did not keep pace with changing fraud trends. Aljabri and Mohammad [12] also applied traditional classifiers to online click fraud detection, showing that data-driven approaches outperform rule-based ones. Although they achieved good results, their work emphasized accuracy over interpretability and immediate scalability, both of which matter for real-world use in advertising networks.
Sisodia and Sisodia [13] developed a stacked generalization architecture for predicting publisher behavior from highly skewed datasets. The framework combines more than one base learner to achieve better classification. Nevertheless, the algorithm consumed significant computational resources, and it was not clear which features were most significant. Sisodia, Sisodia, and Singh [14] extended this work, focusing on feature importance in identifying fraudulent publishers and highlighting the usefulness of feature selection in fraud detection. However, their research examined only fixed datasets and did not address click fraud mechanisms that continually evolve. Alzahrani and Aljabri [15] conducted a comprehensive review of AI-based approaches to detecting ad click fraud. They highlighted XAI and ensemble learning as fields that require further research, and they identified a recurring difficulty in balancing highly accurate models against models that are easy to understand and apply in real-life settings.
DL techniques have made fraud detection systems more powerful. Batool and Byun [16] proposed an ensemble DL system for detecting click fraud in PPC campaigns. This architecture was more accurate and less susceptible to being misled by malicious user actions; however, its complexity made it harder to run in real time. Sisodia and Sisodia [17] developed a K-nearest neighbor model that uses quad division prototype selection to classify a highly skewed dataset. The model reduces the computational burden, yet it is sensitive to noise and to fraud patterns that are not apparent. Similarly, Chari et al. [18] evaluated several ML algorithms using behavioural and feature engineering methods. They achieved good results, but scalability and overfitting problems arose with large datasets.
Beyond ad-specific research, broader studies of fraud detection have provided valuable insights. Dekou et al. [19] examined ML approaches to identifying fraud in online marketplaces, addressing domain adaptation and generalizability in order to detect the various forms of fraud across platforms. Sisodia and Sisodia [20] applied gradient boosting to identify fraudulent publishers; this was more precise than conventional classifiers but had issues with temporal data and model interpretability.
3. Materials And Methods
The proposed system is intended to be a scalable, intelligent system for detecting fraudulent ad clicks that is both interpretable and precise. The approach begins by pre-processing the Ad Click Fraud Detection Dataset from Kaggle, which contains annotated records of user interaction; feature optimization, feature encoding, and normalization ensure the quality of the inputs used to train the models. Class imbalance is corrected with RUS and SMOTE to produce results that generalize better. Several ML and DL models are applied, and to make the predictions even more precise, ensemble learning is applied with a Voting Classifier that includes XGBoost, Bagging, and DT. LSTM, GRU, and hybrid LSTM + GRU networks are applied to capture changes in user behavior over time. XAI techniques such as LIME and SHAP make the model more understandable, and a Flask-based deployment interface allows fraud to be detected in real time. This comprehensive approach is intended to make ad click fraud detection more scalable, reliable, and powerful.
Fig. 1. System Architecture
Figure 1 shows the system design, which covers the full ad click fraud detection workflow. The first step is to obtain the Ad Click Fraud data. The second is data pre-processing, which involves cleaning, removal of nulls, removal of duplicates, and label encoding. Correlation visualization and data analysis help reveal trends, and techniques such as SMOTE ensure an even class distribution. The data is then divided into training and test sets, and models are built using ML and DL. The models are tested, scored with evaluation metrics, saved, and deployed with Flask. XAI with LIME and SHAP then explains the real-time fraud predictions.
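The SMOTE step mentioned in the workflow can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function name `smote`, the toy feature values, and the choice of `k` are assumptions made for illustration. The core idea is that each synthetic minority sample is placed at a random position on the line segment between a real minority sample and one of its nearest minority-class neighbours.

```python
import math
import random

def smote(minority, n_synthetic, k=3, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and one of their k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

# Toy minority class: (click_duration, clicks_per_minute). Values are made up.
fraud_clicks = [(0.2, 30.0), (0.3, 28.0), (0.25, 35.0), (0.4, 31.0)]
new_points = smote(fraud_clicks, n_synthetic=6)
```

Because every synthetic point is a convex combination of two real minority samples, it always stays inside the region spanned by the minority class. In practice a maintained implementation such as `SMOTE` from the imbalanced-learn package would normally be used instead of hand-written code.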
- Dataset Collection
The dataset used in this research is the Ad Click Fraud Detection Dataset from Kaggle. It consists of 5,000 user interaction records collected from various forms of online ads. The dataset contains 21 features, including categorical, numerical, and time-related attributes such as device type, browser, click duration, and behavioral indicators. The "is fraudulent" column indicates whether each record is fraudulent (true or false). The data includes diverse, realistic user actions and a skewed class distribution that reflects the occurrence of fraud in the real world. Because it covers a wide range of user-machine interactions and time series, it can be used to test sophisticated ML and DL models for detecting fraudulent ad clicks.
Fig. 2. Ad Click Fraud Dataset
- Pre-Processing
The data preparation phase consists of cleaning, encoding, balancing, correlation analysis, and splitting, carried out in a planned sequence to uphold data quality and model reliability. This enables proper and fair training of the fraud detection model.
- Data Cleaning: This step identified and removed null values and duplicate records systematically so that the data was complete and reliable. Only valid and unique data instances were retained for study. Cleaning improves overall data quality, reduces the likelihood of spurious trends, and prevents weaknesses in model learning. It preserves the integrity of the data, so that later preprocessing is reliable and model evaluation is accurate.
- Categorical Data Encoding: Categorical features representing textual or qualitative data were converted to numbers so that they could be used to train an ML model. This conversion ensures that algorithms can interpret and consume categorical variables correctly without being biased by non-numeric data. Encoding maps the various attributes into a single feature space. This not only speeds up computation but also helps models discover useful patterns across the nominal dimensions of the data.
- Class Balancing (Random Under-Sampling): Random Under-Sampling was employed to correct the class imbalance between genuine and fraudulent clicks. This approach reduces the majority class to the size of the minority class, so that the models are not biased toward the dominant category. Balancing the data ensures that both classes are learned equally and makes predictions more stable. It contributes significantly to the accuracy, recall, and generalization of fraud detection models, particularly when skewed real-world data is involved.
- Feature Correlation Analysis: Feature correlation analysis was used to examine the relationships among the numerical features and to identify potential multicollinearity. This step helps determine which variables matter most to the prediction task and which features depend on each other. Based on the correlation patterns, features that are unnecessary or strongly correlated with others can be eliminated, which results in a simpler model that is easier to understand. Such analysis improves the reliability of the model, reduces the chance of overfitting, and ensures that the most useful attributes are emphasized during training.
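The first three preprocessing steps above can be sketched in plain Python as follows. This is an illustrative sketch only, not the authors' code: the column names `device_type`, `click_duration`, and `is_fraudulent`, the helper names, and the toy rows are all assumptions, and a real pipeline would more likely rely on pandas and imbalanced-learn.

```python
import random
from collections import defaultdict

def clean(rows):
    """Drop rows with a missing (None) value and exact duplicates, keeping first occurrences."""
    seen, kept = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if None in row.values() or key in seen:
            continue
        seen.add(key)
        kept.append(row)
    return kept

def label_encode(rows, column):
    """Replace each category in `column` with an integer code assigned in sorted order."""
    codes = {cat: i for i, cat in enumerate(sorted({r[column] for r in rows}))}
    return [{**r, column: codes[r[column]]} for r in rows], codes

def random_under_sample(rows, target, seed=0):
    """Randomly drop rows until every class has as many rows as the smallest class."""
    by_class = defaultdict(list)
    for r in rows:
        by_class[r[target]].append(r)
    n_min = min(len(group) for group in by_class.values())
    rng = random.Random(seed)
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, n_min))
    rng.shuffle(balanced)
    return balanced

# Hypothetical records; column names and values are assumed for illustration.
raw = [
    {"device_type": "mobile", "click_duration": 0.3, "is_fraudulent": 1},
    {"device_type": "desktop", "click_duration": 4.1, "is_fraudulent": 0},
    {"device_type": "desktop", "click_duration": 4.1, "is_fraudulent": 0},  # duplicate
    {"device_type": "tablet", "click_duration": None, "is_fraudulent": 0},  # null value
    {"device_type": "mobile", "click_duration": 2.7, "is_fraudulent": 0},
    {"device_type": "tablet", "click_duration": 3.9, "is_fraudulent": 0},
]
cleaned = clean(raw)
encoded, device_codes = label_encode(cleaned, "device_type")
balanced = random_under_sample(encoded, "is_fraudulent")
```

Under-sampling discards majority-class rows, so on a dataset of a few thousand records it trades information for balance; that trade-off is one reason it is often paired with an over-sampling method such as SMOTE.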
- Training and Testing
To evaluate the models, the balanced dataset was divided into training and testing sets. The testing set shows how well a model that learned from the training set performs on new data. Separating the data in this manner helps detect overfitting and ensures that the evaluation measures properly represent the model's ability to predict in the real world. Keeping a suitable train-test ratio preserves the validity of the results and enables fair comparison of the various detection models.
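A stratified split is one common way to hold out test data while keeping class proportions equal in both sets. The sketch below is a hedged illustration: the paper does not state its split ratio, so the 80/20 ratio, the seed, and the function name `stratified_split` are assumptions.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx) so that each class is split in the same proportion."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

labels = [0] * 50 + [1] * 50  # balanced toy labels: 0 = legitimate, 1 = fraudulent
train_idx, test_idx = stratified_split(labels)
```

The split is performed on row indices so that the same partition can be reused for every model being compared, which keeps the comparison fair.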
- Algorithms
LR uses the logistic equation to estimate the probability of an event. It is fast to compute, easy to interpret, and statistically sound. These properties make it suitable for benchmarking and for determining how features influence fraud detection [22].
DT is a supervised learning method that recursively splits the data on feature values to create comprehensible decision rules. Its structure is transparent, so decision paths and the most significant features are easy to examine. This supports interpretable classification over both numeric and categorical attributes [24].
RF is a majority-voting ensemble learning algorithm that combines multiple DTs to make predictions more stable. It limits overfitting, tolerates noise, captures nonlinear associations, and provides feature importance measures that are easy to interpret [27].
KNN classifies an instance by the majority class of the most similar instances in the feature space. Because it learns from distances, it can handle complex data distributions and detect subtle variations in fraud patterns [21].
ANN has multiple layers of connected neurons and learns nonlinear feature relations through weight optimization. Since it can represent complex relationships and extract latent representations, it classifies more precisely and is useful for large, diverse datasets.
Naïve Bayes is a probabilistic classifier founded on Bayes' theorem that assumes features are conditionally independent. It is simple, scales well to large amounts of data, classifies quickly and reliably, and gives reasonable baseline performance [23].
Gradient Boosting builds a strong learner from weak models added one after another, each correcting the errors made by the preceding models. This stepwise improvement captures complex feature interactions, improves generalization, and limits overfitting through controlled learning updates [25].
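The sequential error-correction idea can be made concrete with a minimal sketch of gradient boosting for squared loss, using one-feature threshold stumps as the weak learners. It is an assumption-laden illustration (toy data, hypothetical function names, a fixed learning rate) and does not reproduce the configuration used in the experiments.

```python
def fit_stump(xs, residuals):
    """Find the single-threshold split on one feature that best fits the residuals."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        error = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, lmean, rmean)
    _, threshold, lmean, rmean = best
    return lambda x: lmean if x <= threshold else rmean

def gradient_boost(xs, ys, n_rounds=20, learning_rate=0.3):
    """Each round fits a stump to the current residuals and adds a damped copy of it."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Toy data: fraud (1) at very short and very long click durations, so one stump cannot fit it.
xs = [0.1, 0.2, 0.3, 1.0, 2.0, 3.0, 4.0, 9.0, 9.5]
ys = [1, 1, 1, 0, 0, 0, 0, 1, 1]
one_round = gradient_boost(xs, ys, n_rounds=1)
boosted = gradient_boost(xs, ys, n_rounds=20)
```

Comparing `mse(one_round, xs, ys)` with `mse(boosted, xs, ys)` shows the training error shrinking as more stumps are added. The learning rate is the "controlled update": smaller values need more rounds but make each correction more conservative.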
LightGBM is a fast, lightweight gradient boosting implementation designed to scale. It processes large datasets quickly and accurately, and it reduces training time through leaf-wise tree growth and histogram-based computation.
XGBoost is a gradient boosting algorithm that employs regularization, parallel processing, and shrinkage to improve speed and robustness. It is very effective on large classification tasks because it is powerful, scalable, and able to make more accurate predictions [28].
SVM finds the hyperplane that maximizes the margin between the classes. It handles nonlinear boundaries through kernel functions and generalizes well. This makes it suitable for binary classification tasks with limited or skewed data.
CNN automatically extracts hierarchical and spatial information from structured data through convolution and pooling layers. It can learn powerful representations, which increases the reliability of classification, reduces sensitivity to noise, and improves anomaly detection performance [29].
DNN adds further layers to ANN designs in order to learn higher-order feature abstractions. Its deep hierarchical representations allow it to model complex relationships with a high degree of accuracy. This improves prediction precision and generalization on large datasets.
RNN uses feedback loops that allow it to learn from sequential and temporal data. By maintaining contextual information across time steps, it is good at modelling dynamic trends and at predicting behavior in time-dependent behaviour analysis tasks.
LSTM is an advanced form of RNN that employs memory cells and gating mechanisms to address the vanishing gradient problem. It captures long-term dependencies in time series effectively, which makes sequential and time-series prediction more precise [26].
GRU is a streamlined recurrent design that regulates the flow of temporal information using update and reset gates. It offers a good trade-off between learning capacity and efficiency, with faster convergence and successful sequence modeling at reduced computational complexity.
The hybrid LSTM-GRU model combines long-term memory retention with computational speed. It captures both short-term and long-term temporal correlations through complementary gating mechanisms. This results in better sequence learning performance and improved generalization.
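To show how the update and reset gates regulate the hidden state, here is a single-unit GRU step written with scalar weights. It is a didactic sketch: the weight values are arbitrary assumptions, and the sign convention for the update gate (here `z` weights the candidate state) differs between papers and libraries.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gru_step(x, h_prev, w):
    """One GRU time step for a single hidden unit with scalar weights in dict w."""
    z = sigmoid(w["wz"] * x + w["uz"] * h_prev + w["bz"])  # update gate
    r = sigmoid(w["wr"] * x + w["ur"] * h_prev + w["br"])  # reset gate
    # The reset gate scales how much of the previous state feeds the candidate.
    h_cand = math.tanh(w["wh"] * x + w["uh"] * (r * h_prev) + w["bh"])
    # The update gate blends the old state with the candidate state.
    return (1.0 - z) * h_prev + z * h_cand

# Arbitrary illustrative weights (assumed, not learned from data).
weights = {"wz": 0.5, "uz": 0.1, "bz": 0.0,
           "wr": 0.4, "ur": -0.2, "br": 0.0,
           "wh": 1.0, "uh": 0.8, "bh": 0.0}

h = 0.0
for x in [0.2, 0.9, 0.1, 0.8]:  # a short sequence of normalised click features
    h = gru_step(x, h, weights)
```

When the update gate is close to zero the previous state passes through almost unchanged, which is how the cell retains information across many time steps with fewer parameters than an LSTM.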
The Voting Classifier combines XGBoost and Bagging with DT to exploit the strengths of both. Majority voting reduces bias and variance, which provides consistent, reliable, and comprehensive fraud detection.
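Hard (majority) voting itself is simple to express. The sketch below combines per-sample labels from several base models; the three prediction lists are hypothetical placeholders for trained models, and the tie-breaking rule (prefer the smallest label) is an assumption chosen to mirror the behaviour documented for scikit-learn's hard-voting `VotingClassifier`.

```python
from collections import Counter

def majority_vote(model_predictions):
    """Combine per-model label lists (all the same length) by hard voting.
    Ties are broken in favour of the smallest label."""
    combined = []
    for votes in zip(*model_predictions):
        counts = Counter(votes)
        top = max(counts.values())
        combined.append(min(label for label, c in counts.items() if c == top))
    return combined

# Hypothetical predictions from three base models (1 = fraud, 0 = legitimate).
boosted_model = [1, 0, 1, 1, 0]
bagged_trees = [1, 0, 0, 1, 0]
single_tree = [0, 0, 1, 1, 1]
final = majority_vote([boosted_model, bagged_trees, single_tree])
# final == [1, 0, 1, 1, 0]
```

An odd number of voters avoids ties in binary classification; with an even number, the tie-breaking rule quietly decides borderline cases, so it is worth stating explicitly.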
- Integration of XAI and Flask Framework
The ad click fraud detection system is made explainable by adding XAI techniques such as LIME and SHAP, which let users see the model's predictions more clearly. Using a waterfall chart, LIME provides local explanations of how individual features influence a particular prediction. SHAP, on the other hand, provides a global perspective by showing the importance of features to both the fraudulent and the legitimate class. Together, these approaches help users and analysts understand the reasoning behind each decision, which increases confidence in and accountability for the model's results.
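The quantity that SHAP estimates can be shown exactly on a small example. The sketch below computes exact Shapley values for one prediction by enumerating feature subsets, with absent features replaced by a baseline value. The scoring function, its coefficients, and the baseline are assumptions for illustration; the SHAP library uses faster approximations and model-specific algorithms rather than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, instance, baseline):
    """Exact Shapley value of each feature for one prediction.
    Features outside the coalition are set to their baseline value."""
    n = len(instance)

    def value(subset):
        mixed = [instance[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                total += weight * (value(members | {i}) - value(members))
        phi.append(total)
    return phi

def toy_score(features):
    """Hypothetical fraud score; the coefficients are made up for illustration."""
    clicks_per_minute, click_duration, uses_proxy = features
    return 0.02 * clicks_per_minute - 0.1 * click_duration + 0.3 * uses_proxy

instance = [30.0, 0.5, 1.0]  # the click being explained
baseline = [5.0, 3.0, 0.0]   # a typical legitimate click (assumed)
phi = shapley_values(toy_score, instance, baseline)
```

The attributions sum to the difference between the score of the explained click and the score of the baseline, which is the additivity property that makes waterfall-style plots possible. Enumeration costs grow exponentially with the number of features, so it is only practical for very small models.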
The Flask framework is used for real-time deployment. It provides an interactive web interface that accepts new data and immediately returns the fraud detection result together with the explanations produced by XAI. This combination makes the system scalable, transparent, and readily accessible, so that it can be applied in real settings where online advertising keeps changing.
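A prediction endpoint of this kind might look like the following minimal Flask sketch. The route name `/predict`, the JSON field names, and the rule-based `predict_fraud` stand-in are all assumptions; the paper's actual interface and trained model are not reproduced here. The two output labels are the ones shown in the result screens.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def predict_fraud(features):
    """Hypothetical stand-in for the trained model: scores very fast, very frequent clicks."""
    score = 0.0
    if features.get("click_duration", 1.0) < 0.5:
        score += 0.5
    if features.get("clicks_per_minute", 0) > 20:
        score += 0.5
    return score

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(silent=True)
    if not isinstance(payload, dict):
        return jsonify({"error": "expected a JSON object"}), 400
    score = predict_fraud(payload)
    label = "Fraud Detected" if score >= 0.5 else "Legitimate Activity"
    return jsonify({"label": label, "score": score})

# To serve locally, call app.run() from a separate entry point.
```

The endpoint can be exercised without starting a server through `app.test_client()`, for example `app.test_client().post("/predict", json={"click_duration": 0.2, "clicks_per_minute": 40})`. In a real deployment the stand-in function would be replaced by a call to the saved model, and the response could carry the per-feature explanation alongside the label.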
4. Experimental Results
- Accuracy: Accuracy is the degree to which a model can distinguish between fraudulent and legitimate clicks. It is the proportion of all cases that are true positives or true negatives. With TP, TN, FP, and FN denoting true positives, true negatives, false positives, and false negatives, this can be expressed as: Accuracy = (TP + TN) / (TP + TN + FP + FN)
- Precision: Precision is the ratio of the number of samples correctly classified as positive to the number of all samples classified as positive. It is determined as follows: Precision = TP / (TP + FP)
- Recall: Recall measures the ability of a model to find all the relevant examples of a particular class. It shows how completely the model captures the examples of that class. It is calculated by dividing the number of correctly predicted positive observations by the number of actual positives: Recall = TP / (TP + FN)
- F1-Score: The F1-score combines the precision and recall of a model into a single value by taking their harmonic mean: F1 = 2 × Precision × Recall / (Precision + Recall). Accuracy, by contrast, counts how often the model made a correct prediction across the entire dataset.
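The four metrics follow directly from the confusion-matrix counts. The sketch below computes them for the positive (fraud) class on made-up labels; the function name and the toy data are assumptions, and it does not reproduce the averaging scheme used to produce the reported results.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Made-up labels: 1 = fraudulent click, 0 = legitimate click.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
# TP = 3, TN = 3, FP = 1, FN = 1, so all four metrics equal 0.75 here.
```

The guards against zero denominators matter in fraud detection: a model that never predicts the fraud class has undefined precision, and returning 0.0 makes that failure visible instead of raising an error.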
Table 1: Performance Evaluation
| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| CNN | 0.9839 | 0.9837 | 0.9846 | 0.9839 |
| DNN | 0.9799 | 0.9797 | 0.9805 | 0.9799 |
| RNN | 0.9678 | 0.9685 | 0.9691 | 0.9678 |
| Logistic Regression | 0.9738 | 0.9741 | 0.9749 | 0.9738 |
| DT | 0.9638 | 0.9637 | 0.9637 | 0.9637 |
| RF | 0.6781 | 0.6875 | 0.6821 | 0.6766 |
| KNN | 0.5272 | 0.5269 | 0.5269 | 0.5268 |
| ANN | 0.9336 | 0.9335 | 0.9342 | 0.9336 |
| Gradient Boosting | 0.8632 | 0.8889 | 0.8687 | 0.8619 |
| LightGBM | 0.9598 | 0.9612 | 0.9614 | 0.9598 |
| XGBoost | 0.8652 | 0.8902 | 0.8707 | 0.8640 |
| NB | 0.9718 | 0.9722 | 0.9730 | 0.9718 |
| SVM | 0.4809 | 0.4877 | 0.4900 | 0.4603 |
| LSTM | 0.9953 | 0.9952 | 0.9955 | 0.9953 |
| GRU | 0.9927 | 0.9928 | 0.9926 | 0.9927 |
| LSTM + GRU | 0.9940 | 0.9941 | 0.9939 | 0.9940 |
| Voting Classifier | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
Table 1 compares the models on accuracy, precision, recall, and F1-score. It shows that the Voting Classifier achieved the highest scores, outperforming all the other ML and DL models.
Fig. 3. Comparison Graph
As the comparison graph in Fig. 3 shows, the Voting Classifier is superior to all other models in accuracy, precision, recall, and F1-score. The LSTM and LSTM + GRU architectures rank second and third, respectively.
Fig. 4. Click on Ad
Fig. 4 shows the input interface, a live fraud detection demo that allows users to select and test various real-time fraud detection scenarios.
Fig. 5. Predicted Result
Fig. 5 shows the prediction output "Fraud Detected", indicating that behavior analysis identified high-risk suspicious activity.
Fig. 6. Click on Ad
Fig. 6 shows the input screen, where the user can choose among the available options, which include fitness, tech startups, and automotive analysis.
Fig. 7. Predicted Result
Fig. 7 shows the prediction output "Legitimate Activity" on the output interface. This means that the user is behaving normally and shows no indications of fraud.
5. Conclusion
This paper applied advanced DL and ML techniques to develop an intelligent and trustworthy platform for detecting ad click fraud in online advertising. The system relied on the Ad Click Fraud Detection Dataset from Kaggle, which consists of user interaction data covering both genuine and fraudulent ad clicks. Thorough preprocessing, including RUS and SMOTE, ensured that the data fed into the models was of high quality and balanced. A wide range of algorithms was tested with standard classification metrics. These included traditional classifiers such as LR, DT, and RF as well as DL models such as CNN, DNN, and RNN. To improve prediction further, sequence and ensemble models were designed: LSTM, GRU, LSTM + GRU, and a Voting Classifier that integrates XGBoost, Bagging, and Decision Tree. The Voting Classifier outperformed all other models, reaching 100 percent accuracy, precision, recall, and F1-score. XAI methods (LIME and SHAP) and a Flask-based web deployment also made the system easier to understand and use. Overall, the proposed framework demonstrates a powerful, transparent, and implementable method of reducing ad click fraud in real-life online advertising environments.
The system can be improved further by incorporating real-time data streams from various ad platforms, which would make it more flexible and make shifting fraud trends easier to identify. More advanced feature engineering and reinforcement learning might improve decisions further and help the system scale. Adding cross-platform behavioral analytics and unsupervised anomaly detection would help the model generalize across a wider range of datasets. Deploying the framework in the cloud or on the edge might also support large-scale fraud detection and reduce latency. This would enable round-the-clock monitoring and automated response mechanisms that streamline the operation of digital advertising ecosystems.
References
1. Abbas, Z. A., Hilal, Z. M., & Jabbar, H. G. (2025). Click Fraud Detection in Online Advertising: A Comparative Study of Machine Learning Models. International Journal of Safety & Security Engineering, 15(3).
2. Singh, B., Dutta, P. K., & Kaunert, C. (2025). Deep Diving into Financial Frauds via Ad Click, Credit Card Management and Document Dispensation in E-Commerce Transactions. Generative Artificial Intelligence in Finance: Large Language Models, Interfaces, and Industry Use Cases to Transform Accounting and Finance Processes, 99-123.
3. Fernando, C., Walgampaya, C., & Alawatugoda, J. (2025). C2IDTL: Novel Click to Image Conversion Approach for Deep Transfer Learning in Click Fraud Detection on Digital Platforms. IEEE Access.
4. Ma, D., & Wan, F. (2025). Research on Intelligent Recognition of Ad Click Fraud Based on Deep FM Heterogeneous Integration Model. International Journal of High Speed Electronics and Systems, 2540726.
5. Subburayan, B., Winster, D., Dhanalakshmi, K., & Rajkumar, R. (2025). Combating Evolving Threats: A Systematic Review of Online Ad Fraud Detection. Available at SSRN 5263103.
6. Juniper Research, “Quantifying the Cost of Ad Fraud: 2023–2028,” Hampshire, U.K., 2024. [Online]. Available: https://fraudblocker.com/wp-content/uploads/2023/09/Ad-Fraud- Whitepaper_Juniper-Research.pdf
7. Wasted Ad Spend Report 2024, 2024. [Online]. Available: https://lp.lunio.ai/wp-content/uploads/2023/09/Lunio_Wasted_Ad_Spend_Report_2024_V 2.pdf
8. B. Kirkwood, M. Vanamala, and N. Seliya, “Click fraud detection of online advertising using machine learning algorithms,” Proc. IEEE Int. Conf. Electro Inf. Technol. (eIT), May 2024, pp. 586–590.
9. A. Purwar, A. K. Jain, I. Chawla, I. Gupta, M. Raj, and D. Jain, “Click fraud detection using ensemble classifier,” Proc. Int. Conf. Artif.-Bus. Anal., Quantum Mach. Learn., Jan. 2024, pp. 15–23.
10. L. Singh, D. Sisodia, K. Shashvat, A. Kaur, and P. C. Sharma, “A reliable click-fraud detection system for the investigation of fraudulent publishers in online advertising,” in Applied Intelligence in Human-Computer Interaction, Boca Raton, FL, USA: CRC Press, Jul. 2023.
11. S. Shaik and V. Kakulapati, “Fraud detection of AD clicks using machine learning techniques,” J. Sci. Res. Rep., vol. 29, no. 7, pp. 84–89, Jun. 2023.
12. M. Aljabri and R. M. A. Mohammad, “Click fraud detection for online advertising using machine learning,” Egyptian Informat. J., vol. 24, no. 2, pp. 341–350, Jul. 2023, doi: 10.1016/j.eij.2023.05.006.
13. D. Sisodia and D. S. Sisodia, “Stacked generalization architecture for predicting publisher behaviour from highly imbalanced user-click data set for click fraud detection,” New Gener. Comput., vol. 41, no. 3, pp. 581–606, Sep. 2023, doi: 10.1007/s00354-023-00218-1.
14. D. Sisodia, D. S. Sisodia, and D. Singh, “Evaluating feature importance to investigate publishers conduct for detecting click fraud,” in Machine Intelligence Techniques for Data Analysis and Signal Processing (Lecture Notes in Electrical Engineering), vol. 997. Berlin, Germany: Springer, 2023, pp. 515–524, doi: 10.1007/978-981-99-0085-5_42.
15. R. A. Alzahrani and M. Aljabri, “AI-based techniques for ad click fraud detection and prevention: Review and research directions,” J. Sensor Actuator Netw., vol. 12, no. 1, p. 4, Dec. 2022, doi: 10.3390/jsan12010004.
16. A. Batool and Y.-C. Byun, “An ensemble architecture based on deep learning model for click fraud detection in Pay-Per-Click advertisement campaign,” IEEE Access, vol. 10, pp. 113410–113426, 2022, doi: 10.1109/ACCESS.2022.3211528.
17. D. Sisodia and D. S. Sisodia, “Quad division prototype selection-based K-nearest neighbor classifier for click fraud detection from highly skewed user click dataset,” Eng. Sci. Technol., Int. J., vol. 28, Apr. 2022, Art. no. 101011, doi: 10.1016/j.jestch.2021.05.015.
18. H. Chari, S. Aswale, V. N. Pawar, P. Shetgaonkar, and K. M. C. Kumar, “Advertisement click fraud detection using machine learning techniques,” in Proc. Int. Conf. Technol. Advancements Innov. (ICTAI), Nov. 2021, pp. 109–114.
19. R. Dekou, S. Savo, S. Kufeld, D. Francesca, and R. Kawase, “Machine learning methods for detecting fraud in online marketplaces,” Proc. CEUR Workshop, vol. 3052, Jan. 2021, pp. 3–7.
20. D. Sisodia and D. S. Sisodia, “Gradient boosting learning for fraudulent publisher detection in online advertising,” Data Technol. Appl., vol. 55, no. 2, pp. 216–232, Apr. 2021, doi: 10.1108/dta-04-2020-0093.
21. G. S. Thejas, S. Dheeshjith, S. S. Iyengar, N. R. Sunitha, and P. Badrinath, “A hybrid and effective learning approach for click fraud detection,” Mach. Learn. With Appl., vol. 3, Mar. 2021, Art. no. 100016, doi: 10.1016/j.mlwa.2020.100016.
22. B. Viruthika, S. S. Das, E. Manishkumar, and D. Prabhu, “Detection of advertisement click fraud using machine learning,” Int. J. Adv. Sci. Technol., vol. 29, no. 5, pp. 3238–3245, 2020, doi: 10.13140/RG.2.2.23528.90881.
23. A Click Fraud Detection Scheme Based on Cost-Sensitive CNN and Feature Matrix, Google, Mountain View, CA, USA, 2020.
24. E.-A. Minastireanu and G. Mesnita, “Light GBM machine learning algorithm to online click fraud detection,” J. Inf. Assurance Cybersecur., vol. 2019, pp. 1–12, Apr. 2019, doi: 10.5171/2019.263928.
25. R. Mouawi, M. Awad, A. Chehab, I. H. E. Hajj, and A. Kayssi, “Towards a machine learning approach for detecting click fraud in mobile advertizing,” in Proc. Int. Conf. Innov. Inf. Technol. (IIT), Nov. 2018, pp. 88–92, doi: 10.1109/INNOVATIONS.2018.8605973.
26. K. S. Perera, B. Neupane, M. A. Faisal, Z. Aung, and W. L. Woon, “A novel ensemble learning-based approach for click fraud detection in mobile advertising,” in Mining Intelligence and Knowledge Exploration (Lecture Notes in Computer Science), vol. 8284. Berlin, Germany: Springer, 2013, pp. 370–382, doi: 10.1007/978-3-319-03844-5_38.
27. D. Berrar, “Random forests for the detection of click fraud in online mobile advertising,” in Proc. Int. Work. Fraud Detect. Mob. Advert. (FDMA), Singapore, 2012, pp. 1–10. [Online]. Available: http://berrar.com/resources/Berrar_FDMA2012.pdf.
28. C. Phua, E.-Y. Cheu, G.-E. Yap, K. Sim, and M.-N. Nguyen, “Feature engineering for click fraud detection,” in Proc. Work. Fraud Detect. Mob. Advert., 2012, pp. 1–10.
29. Y. Lecun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015, doi: 10.1038/nature14539.
30. G. Haixiang, L. Yijing, J. Shang, G. Mingyun, H. Yuanyue, and G. Bing, “Learning from class-imbalanced data: Review of methods and applications,” Expert Syst. Appl., vol. 73, pp. 220–239, May 2017, doi: 10.1016/j.eswa.2016.12.035.