Published October 30, 2024 | License: CC BY-NC-ND 4.0
Journal article | Open Access

ULMFiT: Universal Language Model Fine-Tuning for Text Classification

  • 1. Department of Analytics & Decision Support, Great River Health Systems, Burlington, Iowa, United States of America (USA).

Contributors

Researcher:

  • 1. Department of Analytics & Decision Support, Great River Health Systems, Burlington, Iowa, United States of America (USA).
  • 2. Department of Computer Engineering, University of North Dakota, Houston, Texas, United States of America (USA).

Description

Abstract: While inductive transfer learning has revolutionized computer vision, current approaches in natural language processing still require training from scratch and task-specific modifications. We present Universal Language Model Fine-tuning (ULMFiT), an effective transfer learning method that can be applied to any task in NLP, and outline techniques that are key for fine-tuning a language model. Our method significantly outperforms the state of the art on six text classification tasks, reducing error by 18–24% on the majority of datasets. Furthermore, with only 100 labeled examples, it matches the performance of training from scratch on 100 times more data. We have made our pretrained models and code publicly available.
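Two of the fine-tuning techniques the paper introduces are slanted triangular learning rates (the rate rises linearly for a short warm-up fraction of training, then decays linearly) and discriminative fine-tuning (each earlier layer is tuned with a smaller learning rate than the layer above it). A minimal Python sketch of both schedules, using the default hyperparameters reported in the paper (cut_frac = 0.1, ratio = 32, per-layer decay factor 2.6); the function names here are illustrative, not from the released code:

```python
import math

def slanted_triangular_lr(t, T, lr_max=0.01, cut_frac=0.1, ratio=32):
    """Slanted triangular learning rate at iteration t of T total iterations.

    Rises linearly from lr_max/ratio to lr_max over the first cut_frac of
    training, then decays linearly back to lr_max/ratio.
    """
    cut = math.floor(T * cut_frac)
    if t < cut:
        p = t / cut                                    # warm-up phase
    else:
        p = 1 - (t - cut) / (cut * (1 / cut_frac - 1)) # decay phase
    return lr_max * (1 + p * (ratio - 1)) / ratio

def discriminative_lrs(top_lr, n_layers, decay=2.6):
    """Per-layer learning rates: the top layer gets top_lr, and each
    lower layer gets the rate of the layer above divided by `decay`."""
    return [top_lr / decay ** (n_layers - 1 - k) for k in range(n_layers)]
```

For example, with T = 1000 iterations the rate peaks at lr_max exactly at iteration 100 (the end of the warm-up) and returns to lr_max/32 at iteration 1000; `discriminative_lrs(0.01, 3)` assigns 0.01 to the top layer and successively smaller rates to the two layers below it.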

Files

Files (563.0 kB)

  • E304904061024.pdf — 563.0 kB (md5:0c1384447355c0418317a04e2d4771ff)

Additional details

Dates

Manuscript received: 02 October 2024 | Revised manuscript received: 11 October 2024 | Accepted: 15 October 2024 | Published: 30 October 2024.
