AI-generated Deepfakes: Detection and Bias Analysis
Description
Executive Summary
DeepFakes, synthetic face manipulations produced with generative artificial intelligence, threaten the authenticity of digital content. They also confront detectors with highly variable material: diverse content, different compression levels, and a growing number of generation pipelines. Against this backdrop, this doctoral thesis investigates how errors made by DeepFake detectors relate to high-level facial attributes, and how this knowledge can be used to design more robust and interpretable systems. The work is positioned in computer vision for face forensics and addresses a central challenge in the field, namely the lack of generalization across manipulation families, compression levels, and capture conditions. The thesis contributes a scalable semi-supervised pipeline for attribute labeling, an empirical analysis that links false positives and false negatives to specific appearance factors, and a set of attribute-aware training strategies that improve reliability without sacrificing transparency. The manuscript also surveys the state of the art and reports several educational and industrial activities, but its scientific core is concentrated in Chapter 4, Analysis of DF Detection through Attribute Labeling, and Chapter 5, Attribute-Aware Training Strategies. This summary emphasizes those two chapters while outlining the end-to-end approach and the resulting guidelines.
Introduction: Problem framing and research questions.
Rapid progress in DeepFake generation has reduced the saliency of traditional visual artifacts and stresses detectors that were tuned on early datasets. As new pipelines and editing chains proliferate, model failures concentrate on specific slices of the data, for example under certain occlusions or appearance conditions. The thesis therefore asks three research questions. RQ1 concerns interpretability: can facial attribute labeling make model behavior more transparent? RQ2 concerns reliability: which attributes correlate most strongly with false positives and false negatives at video level? RQ3 concerns design: can detector training and evaluation be reorganized around attribute information to improve robustness?
Datasets, preprocessing, and detection baselines.
Experiments are carried out on the FaceForensics++ corpus, focusing on the pristine YouTube videos and their DeepFakes counterparts at the c40 compression level. Videos are decoded, frames are uniformly sampled with a fixed skip factor to reduce redundancy, and faces are detected with a classical cascade classifier. Tightly cropped 224×224 face patches feed an image-level classifier. The reference detector is intentionally simple in order to isolate the effect of attributes: a VGG16 backbone with ImageNet initialization serves as a frozen feature extractor, followed by a compact classification head. Training, validation, and test splits are balanced by class and kept disjoint at video level. Video decisions are obtained by aggregating frame scores and, when multiple faces are present, by selecting the principal face track. With this configuration the system reaches high video-level accuracy, with precision and recall balanced across classes and stable behavior under the adopted compression and aggregation protocol. These baselines establish a transparent testbed in which attribute-conditioned effects can be measured.
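The sampling, principal-track and aggregation steps can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the thesis code. Per-crop FAKE probabilities are taken as already computed by the VGG16-based classifier. The principal face track is approximated as the track observed in the most sampled frames, and frame scores are averaged. The 0.5 threshold, the skip value and all helper names (`sample_frame_indices`, `principal_track`, `video_decision`, `split_by_video`) are hypothetical. Class balancing of the splits is not shown.

```python
import random
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Sequence, Set, Tuple

# Assumed format: (frame_index, track_id, fake_score) for every detected face crop.
Detection = Tuple[int, int, float]


def sample_frame_indices(n_frames: int, skip: int) -> List[int]:
    """Keep every `skip`-th frame to reduce redundancy between neighbouring frames."""
    if skip < 1:
        raise ValueError("skip must be >= 1")
    return list(range(0, n_frames, skip))


def principal_track(detections: Sequence[Detection]) -> List[float]:
    """Return the scores of the track seen in the most frames (proxy for the main subject)."""
    frames: Dict[int, Set[int]] = defaultdict(set)
    scores: Dict[int, List[float]] = defaultdict(list)
    for frame, track, score in detections:
        frames[track].add(frame)
        scores[track].append(score)
    main = max(frames, key=lambda t: len(frames[t]))
    return scores[main]


def video_decision(frame_scores: Sequence[float], threshold: float = 0.5) -> Tuple[float, str]:
    """Average per-frame FAKE probabilities and threshold the mean (hypothetical rule)."""
    if not frame_scores:
        raise ValueError("no frame scores to aggregate")
    score = mean(frame_scores)
    return score, ("FAKE" if score >= threshold else "REAL")


def split_by_video(video_ids: Sequence[str], test_fraction: float = 0.2,
                   seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle unique video IDs so that no video contributes frames to both splits."""
    ids = sorted(set(video_ids))
    random.Random(seed).shuffle(ids)
    n_test = max(1, round(len(ids) * test_fraction))
    return ids[n_test:], ids[:n_test]


if __name__ == "__main__":
    print(sample_frame_indices(10, 3))  # [0, 3, 6, 9]
    # Track 0 appears in three sampled frames, track 1 (a bystander) in one.
    dets = [(0, 0, 0.5), (3, 0, 1.0), (6, 0, 0.75), (0, 1, 0.1)]
    print(video_decision(principal_track(dets)))  # (0.75, 'FAKE')
```

Mean pooling is only one option; max or median pooling over frames would plug into `video_decision` without changing the rest of the sketch.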
State of the Art.
The thesis is grounded in a structured literature review that systematizes methods, datasets, and evaluation protocols in face deepfake detection. The review follows a documented search-and-screen protocol across Scopus, IEEE Xplore, and ScienceDirect, complemented by targeted snowballing. Its inclusion criteria privilege reproducible evaluations on public datasets such as FaceForensics++, Celeb-DF, and the DeepFake Detection Challenge (DFDC) dataset. The surveyed methods cover frame-level artifact detectors, face-centric physiological and geometric cues, and spatio–temporal pipelines, complemented by multimodal audio–visual approaches and fairness analyses. Surveys and benchmarks are used to map strengths and failure modes under compression, editing chains, and cross-dataset shift, which motivates the thesis focus on interpretability and robustness. This evidence base exposes two persistent gaps that directly shape Chapters 4 and 5: limited understanding of how high-level attributes relate to misclassifications, and a lack of training workflows that explicitly control attribute exposure. The review therefore provides both the methodological landscape for the baseline detector and the conceptual rationale for a semi-supervised attribute labeling pipeline and attribute-aware training strategies.
Industry activities in PwC.
The industrial period at PricewaterhouseCoopers Business Services Italia S.r.l. translated the research agenda into production-minded prototypes and evaluation practices. A staged sequence of projects consolidated the tooling and informed design choices later adopted in the thesis: a transfer-learning binary image classifier established a lightweight baseline for data hygiene and augmentation; a compact MNIST workflow standardized end-to-end training and reporting; a classical Olivetti face identification exercise validated reproducible splits and error inspection; finally, a video-level deepfake detector combined face cropping, pre-trained convolutional backbones, and simple temporal aggregation. These projects were developed with Python and common DL frameworks, presented in internal seminars, and used to stress operational constraints such as storage, compute budgets, compression tolerance, and threshold calibration. The lessons learned informed Chapter 4 by fixing a robust preprocessing and evaluation protocol for the attribute-enriched analysis, and guided Chapter 5 by prioritizing bias-aware curation, targeted augmentations, and attribute-conditioned training and reporting that can be adopted in enterprise settings without excessive complexity.
Analysis of DF Detection through Attribute Labeling.
Chapter 4 introduces a semi-supervised pipeline that scales facial attribute annotation with limited human effort. A seed set of fifty real videos is manually labeled for gender, hair color, hair length, ear visibility, and ethnicity, together with a set of boolean context flags. The manipulated counterparts inherit the same labels, since the identity swap preserves the stable appearance traits of the original footage. A per-attribute classifier based on a compact convolutional backbone is then trained on the seed, used to pseudo-label the remaining videos, and iteratively refined by adding high-confidence predictions to its training set. This procedure yields a labeled testbed for downstream analysis without prohibitive annotation costs. The chapter documents the safeguards adopted to limit error propagation: confidence thresholds for acceptance, manual spot checks on the tail of the confidence distribution, and the exclusion of attributes that collapse to a single value in the seed. The labeled corpus enables a systematic link between detector outputs and appearance factors. The analysis proceeds in three layers. First, descriptive statistics quantify error rates per attribute value, for example the share of false positives for short hair versus long hair. Second, dependence measures are computed, including group-wise differences in precision and recall, effect sizes on confusion-matrix entries, and non-parametric association coefficients for categorical variables. Third, simple interpretable models are fitted on the test predictions, for example logistic regressions that use attribute dummies to explain the probability of a misclassification, together with partial-dependence visualizations. Across these views, two factors emerge consistently. Ear visibility correlates with detector behavior, with elevated error rates when ears are occluded by hair or accessories.
Hair length also influences outcomes, particularly in profiles and semi-profiles, where the face contour and side hair interact with compression. These patterns persist after controlling for class balance, video aggregation, and the presence of multiple faces, which indicates that they are not artifacts of sampling alone. The chapter further reports calibration slices by attribute, showing that score distributions shift across groups. In particular, the same confidence threshold yields different precision–recall trade-offs when ears are not visible, which suggests group-aware thresholding or improved calibration as practical remedies. The multi-face setting is addressed explicitly. Since the dataset includes crowd scenes and interviews with more than one person in frame, the analysis compares three aggregation rules: majority voting across faces, selection of the largest face, and selection of the most persistent track over time. The last option, which approximates the main subject, reduces spurious errors introduced by bystanders and indicates that the attribute effects highlighted above are not driven by incidental faces. Taken together, the chapter provides an empirical map of where the baseline detector breaks, expressed in terms that a human analyst can verify on representative clips.
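The semi-supervised labeling loop summarized in this chapter can be sketched as confidence-thresholded self-training, run separately for each attribute. This is a minimal sketch under stated assumptions, not the thesis implementation. Each video is assumed to be represented by a precomputed feature vector. A toy nearest-centroid classifier (`fit_centroids`, `centroid_proba`, both hypothetical) stands in for the ResNet18 attribute classifier mentioned in the conclusions. The 0.9 acceptance threshold and the round limit are illustrative values.

```python
import math
from collections import Counter
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Features = Sequence[float]
Model = Dict[Hashable, List[float]]


def fit_centroids(X: List[Features], y: List[Hashable]) -> Model:
    """Toy stand-in for the attribute classifier: one mean feature vector per label."""
    counts = Counter(y)
    sums: Dict[Hashable, List[float]] = {}
    for feats, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, v in enumerate(feats):
            acc[i] += v
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def centroid_proba(model: Model, feats: Features) -> Dict[Hashable, float]:
    """Softmax over negative Euclidean distances to each centroid."""
    scores = {label: -math.dist(feats, c) for label, c in model.items()}
    top = max(scores.values())
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    z = sum(exps.values())
    return {label: e / z for label, e in exps.items()}


def self_train(seed_X: Sequence[Features], seed_y: Sequence[Hashable],
               unlabeled: Dict[str, Features],
               fit: Callable = fit_centroids, predict_proba: Callable = centroid_proba,
               threshold: float = 0.9, max_rounds: int = 5):
    """Confidence-thresholded self-training for a single attribute.

    Returns (accepted, pending): accepted maps video_id -> (label, confidence);
    pending keeps the low-confidence videos, e.g. for manual spot checks.
    """
    if len(set(seed_y)) < 2:
        # Safeguard: attributes that collapse to one value in the seed are skipped.
        raise ValueError("attribute has a single value in the seed")
    X, y = list(seed_X), list(seed_y)
    pending = dict(unlabeled)
    accepted: Dict[str, Tuple[Hashable, float]] = {}
    for _ in range(max_rounds):
        model = fit(X, y)
        new: Dict[str, Tuple[Hashable, float]] = {}
        for vid, feats in pending.items():
            label, conf = max(predict_proba(model, feats).items(), key=lambda kv: kv[1])
            if conf >= threshold:
                new[vid] = (label, conf)
        if not new:
            break  # nothing passes the threshold: stop refining
        for vid, (label, conf) in new.items():
            X.append(pending.pop(vid))  # pseudo-labeled video joins the training set
            y.append(label)
            accepted[vid] = (label, conf)
    return accepted, pending


if __name__ == "__main__":
    seed_X = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 4.9]]
    seed_y = ["short", "short", "long", "long"]
    unlabeled = {"v1": [0.2, 0.1], "v2": [4.8, 5.2], "v3": [2.5, 2.5]}
    accepted, pending = self_train(seed_X, seed_y, unlabeled)
    print({v: lab for v, (lab, _) in accepted.items()})  # {'v1': 'short', 'v2': 'long'}
    print(list(pending))  # ['v3']
```

The ambiguous video (`v3`, equidistant from both toy centroids) never crosses the threshold and stays in `pending`. This is where manual spot checks on the tail of the confidence distribution would apply.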
Attribute-Aware Training Strategies.
Building on the evidence gathered in Chapter 4, Chapter 5 investigates how facial attributes can be used not only for post hoc analysis but also to probe the behaviour of a DeepFake detector under controlled distribution shifts. The chapter introduces an attribute-aware pipeline in which a VGG16-based classifier is trained multiple times on FaceForensics++, each time excluding one attribute value from the training data, and all models are then evaluated on a common, complete test set. This controlled exclusion design, combined with subgroup metrics and minimum support thresholds, provides an exploratory view of how the absence of specific attribute configurations affects accuracy, AUC, and the balance between true positive rate (TPR) and true negative rate (TNR). The results suggest a moderate but recurrent effect for hair_length and a more pronounced sensitivity to is_ears_visible, with the removal of visible-ear cases from training associated with a higher rate of FAKE → REAL errors. The chapter concludes with cautious operational recommendations: curating training data to ensure explicit coverage of critical attributes, monitoring performance by subgroup, and considering attribute-aware calibration or thresholding in deployment. These findings do not provide definitive bias guarantees, but they indicate that simple attribute-aware training and evaluation strategies may help to organise robustness checks and to support more transparent assessment of DeepFake detectors.
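The controlled-exclusion protocol and the subgroup metrics can be sketched as follows. The sketch makes several assumptions. It uses a simple record schema (`video_id`, `label`, and attribute keys such as `is_ears_visible` or `hair_length`). It treats FAKE as the positive class. Its `min_support` default of 20 is hypothetical and not the threshold used in the thesis. Model training is omitted: each reduced training set would be passed to the same VGG16-based training routine, and every resulting model would be scored on the complete test set.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Optional, Sequence, Tuple

# Assumed record schema, e.g. {"video_id": "v1", "label": "FAKE", "is_ears_visible": True}
Record = Dict[str, object]


def exclude_value(train: Sequence[Record], attribute: str, value: object) -> List[Record]:
    """Training subset with every video where `attribute == value` removed."""
    return [r for r in train if r[attribute] != value]


def exclusion_plan(train: Sequence[Record],
                   attributes: Sequence[str]) -> Iterator[Tuple[str, object, List[Record]]]:
    """Yield one reduced training set per (attribute, excluded value)."""
    for attribute in attributes:
        for value in sorted({r[attribute] for r in train}, key=repr):
            yield attribute, value, exclude_value(train, attribute, value)


def _rate(hits: int, total: int, min_support: int) -> Optional[float]:
    # Groups below the minimum support are reported as None instead of a rate.
    return hits / total if total >= max(1, min_support) else None


def subgroup_rates(test: Sequence[Record], predictions: Dict[str, str], attribute: str,
                   min_support: int = 20) -> Dict[object, Dict[str, Optional[float]]]:
    """TPR (recall on FAKE) and TNR (recall on REAL) per attribute value."""
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "tn": 0, "fp": 0})
    for r in test:
        c = counts[r[attribute]]
        predicted = predictions[r["video_id"]]
        if r["label"] == "FAKE":
            c["tp" if predicted == "FAKE" else "fn"] += 1
        else:
            c["tn" if predicted == "REAL" else "fp"] += 1
    return {
        value: {
            "TPR": _rate(c["tp"], c["tp"] + c["fn"], min_support),
            "TNR": _rate(c["tn"], c["tn"] + c["fp"], min_support),
        }
        for value, c in counts.items()
    }


if __name__ == "__main__":
    test_set = [
        {"video_id": "t1", "label": "FAKE", "is_ears_visible": True},
        {"video_id": "t2", "label": "FAKE", "is_ears_visible": True},
        {"video_id": "t3", "label": "REAL", "is_ears_visible": True},
        {"video_id": "t4", "label": "FAKE", "is_ears_visible": False},
    ]
    preds = {"t1": "FAKE", "t2": "REAL", "t3": "REAL", "t4": "REAL"}
    print(subgroup_rates(test_set, preds, "is_ears_visible", min_support=2))
    # {True: {'TPR': 0.5, 'TNR': None}, False: {'TPR': None, 'TNR': None}}
```

Returning `None` for under-populated groups keeps unstable estimates out of the comparison across the excluded-value models, which is the role the minimum support threshold plays in the protocol described above.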
Conclusions and Future Work.
The concluding chapter revisits the three research questions through the lens of the empirical results and frames the contribution of the thesis as primarily methodological and exploratory. The work combines a standard VGG16-based DeepFake detector on FaceForensics++ with a semi-supervised ResNet18 attribute labeller, and uses the resulting annotations to study misclassification patterns and simple attribute-aware training variants. The analysis suggests that certain appearance factors, in particular hair length and ear visibility, are associated with variations in error rates and may therefore offer useful context for post hoc monitoring and bias-aware inspection, although no causal conclusions are drawn. The chapter also highlights the main limitations of the study, including the focus on a single dataset and compression level, a limited and imbalanced attribute set, reliance on univariate statistics, and the use of a deliberately simple architecture. Within these constraints, the thesis contributions can be seen as: (i) a reproducible semi-supervised pipeline for facial attribute labeling on DeepFake benchmarks; (ii) an empirical indication that attribute-conditioned error analysis may help organise the evaluation of DeepFake detectors; and (iii) a set of preliminary ideas for attribute-aware training and calibration. Possible extensions include applying the same protocol to more diverse datasets and richer attribute taxonomies, adopting multivariate analytical tools, and exploring the integration of attribute information into data curation, sampling and threshold selection to support more robust and fairness-aware detection in future work.
Abstract (English)
DeepFakes, synthetic face manipulations produced with generative artificial intelligence, threaten the authenticity of digital content. They also confront detectors with highly variable material: diverse content, different compression levels, and a growing number of generation pipelines. Against this backdrop, this doctoral thesis investigates how misclassifications in DeepFake detection relate to high-level facial attributes, and how this knowledge can guide more robust and interpretable detectors. The work proceeds in two stages. In the first stage, a frame-level classifier distinguishes manipulated from authentic content and its errors are examined post hoc. Faces are detected and cropped from the dataset videos with a cascade classifier. The dataset is then enriched through a facial-attribute labeling pipeline. This pipeline starts from a small manually annotated seed and extends to the whole dataset with per-attribute semi-supervised classifiers, deriving labels such as gender, hair color, hair length, ear visibility, and ethnicity. A DeepFake classifier is then trained that achieves good results on the primary subject of each video. Attribute-wise error analysis (including label-level metrics and statistical dependence measures) reveals systematic patterns: in particular, ear visibility and hair length emerge as influential contextual factors that can affect decisions. In the second stage, these insights are stress-tested through controlled exclusion experiments that remove one or more values of a given attribute during training, and the resulting models are evaluated on the complete test set. The results show that some attributes affect model performance and decision behavior; for example, removing training exposure to certain visibility conditions degrades detection performance at test time.
These findings motivate balancing key attribute conditions during data curation, applying targeted augmentations, and assessing the influence of attributes on the final outcome. Overall, the thesis contributes a scalable semi-supervised pipeline for attribute labeling and practical guidelines for bias-aware training. The study advances interpretability and addresses the field's central generalization problem by showing that explicit attribute information can guide data curation and training so that models become more robust to real-world variability.
Files
STILE_Vittorio_PhD_THESIS.pdf (3.9 MB), md5:afd008adc78a2fd4bfb8faf3a58d9aca
Additional details
Software
- Repository URL: https://github.com/vstile?tab=repositories
- Programming language: Python
- Development Status: Active
References
- Afchar, D., Nozick, V., Yamagishi, J., & Echizen, I. (2018). MesoNet: A Compact Facial Video Forgery Detection Network. 2018 IEEE International Workshop on Information Forensics and Security (WIFS), 1–7. https://doi.org/10.1109/WIFS.2018.8630761
- Amerini, I., Galteri, L., Caldelli, R., & Del Bimbo, A. (2019). Deepfake Video Detection through Optical Flow Based CNN. 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), 1205–1207. https://doi.org/10.1109/ICCVW.2019.00152
- Anshul, A., Gopal, S., Rajan, D., & Chng, E. S. (2025). Intra-modal and Cross-modal Synchronization for Audio-visual Deepfake Detection and Temporal Localization. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 13826–13836.
- Beckmann, A., Hilsmann, A., & Eisert, P. (2023). Fooling State-of-the-Art Deepfake Detection with High-Quality Deepfakes (arXiv:2305.05282). arXiv. https://doi.org/10.48550/arXiv.2305.05282
- Chan, T.-H., Jia, K., Gao, S., Lu, J., Zeng, Z., & Ma, Y. (2015). PCANet: A Simple Deep Learning Baseline for Image Classification? IEEE Transactions on Image Processing, 24(12), 5017–5032. https://doi.org/10.1109/TIP.2015.2475625
- Chandrasegaran, S. R., Xu, M., & Mandal, B. (2021). Cross-modal deepfake detection using inconsistent audio-visual cues. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 1–9.
- Dang, H., Liu, F., Stehouwer, H., Liu, X., & Jain, A. K. (2020). Detection of deepfake videos using multi-attentional convolutional neural networks. European Conference on Computer Vision, 660–676.
- Dolhansky, B., Bitton, J., Pflaum, B., Lu, J., Howes, R., Wang, M., & Ferrer, C. C. (2020). The DeepFake Detection Challenge (DFDC) Dataset. arXiv. https://doi.org/10.48550/ARXIV.2006.07397
- Durall, R., Keuper, M., & Keuper, J. (2020). Watch Your Up-Convolution: CNN Based Generative Deep Neural Networks Are Failing to Reproduce Spectral Distributions. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 7887–7896. https://doi.org/10.1109/CVPR42600.2020.00791
- European Data Protection Board. (2020). Guidelines 4/2019 on Article 25 Data Protection by Design and by Default. https://www.edpb.europa.eu/our-work-tools/our-documents/guidelines/guidelines-42019-article-25-data-protection-design-and_en
- European Data Protection Supervisor. (2023, December). TechSonar Report 2023–2024: Emerging Technologies (including Deepfake Detection). https://www.edps.europa.eu/system/files/2023-12/23-12-04_techsonar_23-24_en.pdf
- European Union. (2016). Regulation (EU) 2016/679 (General Data Protection Regulation): Articles 25 (Data Protection by Design and by Default) and 32 (Security of Processing). https://eur-lex.europa.eu/eli/reg/2016/679/oj
- European Union. (2024). Regulation (EU) 2024/1689 (Artificial Intelligence Act), including Annex III: High-Risk AI Systems Categories. https://eur-lex.europa.eu/eli/reg/2024/1689/oj
- Fisher, R. A. (1925). Statistical Methods for Research Workers. Oliver and Boyd.
- Floridi, L. (2022). Etica dell'intelligenza artificiale: Sviluppi, opportunità, sfide (M. Durante, Ed.; Prima edizione). Raffaello Cortina Editore.
- Gong, Y., & Zhang, P. (2021). Research on Mnist Handwritten Numbers Recognition based on CNN. Journal of Physics: Conference Series, 2138(1), 012002. https://doi.org/10.1088/1742-6596/2138/1/012002
- Guarnera, L., Giudice, O., Guarnera, F., Ortis, A., Puglisi, G., Paratore, A., Bui, L. M. Q., Fontani, M., Coccomini, D. A., Caldelli, R., Falchi, F., Gennaro, C., Messina, N., Amato, G., Perelli, G., Concas, S., Cuccu, C., Orrù, G., Marcialis, G. L., & Battiato, S. (2022). The Face Deepfake Detection Challenge. Journal of Imaging, 8(10), 263. https://doi.org/10.3390/jimaging8100263
- Güera, D., & Delp, E. J. (2018). Deepfake Video Detection Using Recurrent Neural Networks. 2018 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 1–6. https://doi.org/10.1109/AVSS.2018.8639163
- Guerrero-Contreras, G., Balderas-Díaz, S., García-Pascual, A., & Muñoz, A. (2024). Self-Learning Systems for Enhanced Traffic Management in Urban Settings. https://doi.org/10.5281/ZENODO.11917270
- Guerrero-Contreras, G., Balderas-Díaz, S., García-Pascual, A., & Muñoz, A. (2025). Adaptive Vehicle Detection in Urban Environments: A Self-learning Approach. In P. Novais, P. B. D., I. Satoh, V. J. Inglada, S. R. González, E. Jove Pérez, J. Parra Domínguez, P. Chamoso, & R. S. Alonso (Eds), Ambient Intelligence – Software and Applications – 15th International Symposium on Ambient Intelligence (Vol. 1279, pp. 25–34). Springer Nature Switzerland. https://doi.org/10.1007/978-3-031-83117-1_3
- Hazirbas, C., Bitton, J., Dolhansky, B., Pan, J., Gordo, A., & Ferrer, C. C. (2022). Towards Measuring Fairness in AI: The Casual Conversations Dataset. IEEE Transactions on Biometrics, Behavior, and Identity Science, 4(3), 324–332. https://doi.org/10.1109/TBIOM.2021.3132237
- Katamneni, V. S., & Rattani, A. (2024). Contextual Cross-Modal Attention for Audio-Visual Deepfake Detection and Localization (arXiv:2408.01532). arXiv. https://doi.org/10.48550/arXiv.2408.01532
- Korshunov, P., & Marcel, S. (2018). DeepFakes: A New Threat to Face Recognition? Assessment and Detection (arXiv:1812.08685). arXiv. https://doi.org/10.48550/arXiv.1812.08685
- Laudonia, A., Avolio, F., Cosmo, N., Giannetti, I., Liberanome, P., Maciariello, F., & Stile, V. (2026). AI-Driven Financial Risk Prevention: The Role of HR Analytics in Corporate Crisis Management Under Industry 5.0. Procedia Computer Science. https://www.sciencedirect.com/science/article/pii/S1877050926005053
- Levi, G., & Hassner, T. (2015). Age and gender classification using convolutional neural networks. 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 34–42. https://doi.org/10.1109/CVPRW.2015.7301352
- Li, Y., Chang, M.-C., & Lyu, S. (2018). In Ictu Oculi: Exposing AI Created Fake Videos by Detecting Eye Blinking. 2018 IEEE International Workshop on Information Forensics and Security (WIFS), 1–7. https://doi.org/10.1109/WIFS.2018.8630787
- Li, Y., Chang, M.-C., & Lyu, S. (2020). Celeb-DF (v2): A new dataset for deepfake forensics [Data set]. https://cse.buffalo.edu/~siweilyu/celeb-deepfakeforensics.html
- Li, Y., Yang, X., Sun, P., Qi, H., & Lyu, S. (2020). Celeb-DF: A Large-Scale Challenging Dataset for DeepFake Forensics. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 3204–3213. https://doi.org/10.1109/CVPR42600.2020.00327
- Medical Device Coordination Group. (2025, June). MDCG 2019-11 Rev.1: Guidance on Qualification and Classification of Software under Regulation (EU) 2017/745 – MDR and Regulation (EU) 2017/746 – IVDR. https://health.ec.europa.eu/document/download/b45335c5-1679-4c71-a91c-fc7a4d37f12b_en
- Neekhara, P., Dolhansky, B., Bitton, J., & Ferrer, C. C. (2021). Adversarial Threats to DeepFake Detection: A Practical Perspective. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 923–932. https://doi.org/10.1109/CVPRW53098.2021.00103
- Nguyen, T., Nguyen, C., Nguyen, D. T., Nguyen, H. Q., & Nahavandi, S. (2019). Deep learning for deepfakes creation and detection: A survey. arXiv Preprint arXiv:1909.11573.
- Rana, M. S., Nobi, M. N., Murali, B., & Sung, A. H. (2022). Deepfake Detection: A Systematic Literature Review. IEEE Access, 10, 25494–25513. https://doi.org/10.1109/ACCESS.2022.3154404
- Rössler, A., Cozzolino, D., Verdoliva, L., Riess, C., Thies, J., & Nießner, M. (2019). FaceForensics++: Learning to Detect Manipulated Facial Images. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 1–11. https://doi.org/10.1109/ICCV.2019.00009
- Sabir, E., Cheng, W., Jaiswal, A., AbdAlmageed, W., Masi, I., & Natarajan, P. (2019). Recurrent convolutional strategies for face manipulation detection in videos. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 1–9.
- Schäfer, J., & Boubeta-Puig, J. (n.d.). Applied Computer Science: 11th Spanish German Symposium, SGSOACS 2025, Vienna, Austria, June 30–July 3, 2025, Proceedings (1st edn). Springer Cham. Retrieved 14 December 2025, from https://link.springer.com/book/9783032148155
- Stile, V. (2024, October 10). Recognition of Deepfakes Generated Through AI. https://zenodo.org/doi/10.5281/zenodo.17929426
- Stile, V., Bonino, V., & Cosmo, N. (2024). The Impact of BI and AI on Traditional Structures with Legal and Philosophical Insights. Proceedings of the 21st Conference of the Italian Chapter of AIS (itAIS 2024) on AISeL.
- Stile, V., Caldelli, R., Guerrero-Contreras, G., Balderas-Díaz, S., & Medina-Bulo, I. (2025). Analysis of DeepFake Detection through Semi-Supervised Facial Attribute Labeling. Proceedings of the 11th Spanish-German Symposium on Applied Computer Science (SGSOACS 2025), Communications in Computer and Information Science (CCIS), 2831, XX, 138. https://link.springer.com/book/9783032148155
- Tapo, A. A., Traore, A., Danioko, S., & Tembine, H. (2024). Machine Intelligence in Africa: A survey. arXiv. https://doi.org/10.48550/arXiv.2402.02218
- Tolosana, R., Vera-Rodriguez, R., Fierrez, J., Morales, A., & Ortega-Garcia, J. (2020). Deepfakes and beyond: A survey of face manipulation and fake detection. Information Fusion, 64, 131–148.
- Verdoliva, L. (2020). Media Forensics and DeepFakes: An Overview. IEEE Journal of Selected Topics in Signal Processing, 14(5), 910–932. https://doi.org/10.1109/JSTSP.2020.3002101
- World Health Organization. (2021). Ethics and Governance of Artificial Intelligence for Health: WHO Guidance. World Health Organization. https://www.who.int/publications/i/item/9789240029200
- World Health Organization. (2024). Ethics and Governance of Artificial Intelligence for Health: Guidance on Large Multimodal Models (LMMs). World Health Organization. https://www.who.int/publications/i/item/9789240084759
- Yang, X., Li, Y., & Lyu, S. (2019). Exposing Deep Fakes Using Inconsistent Head Poses. ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 8261–8265. https://doi.org/10.1109/ICASSP.2019.8683164
- Yu, B., Yin, H., & Zhu, Z. (2018). Spatio-Temporal Graph Convolutional Networks: A Deep Learning Framework for Traffic Forecasting. Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 3634–3640. https://doi.org/10.24963/ijcai.2018/505
- Zhang, N., Paluri, M., Ranzato, M., Darrell, T., & Bourdev, L. (2014). PANDA: Pose aligned networks for deep attribute modeling. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1637–1644.
- Zhou, P., Han, X., Morariu, V. I., & Davis, L. S. (2017). Two-stream neural networks for tampered face detection. 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 1831–1839.