Published June 24, 2022 | Version 1

Deep Learning: Theoretical and Practical Approach

Description

A deep learning course book in Persian.
This book comprises three parts. The first part provides the prerequisites for deep learning, including linear algebra, statistics and probability, information theory, data mining, signal processing, and machine learning. The second part covers the core topics of deep learning: artificial neural networks, evaluation criteria, optimization methods, representation learning, recurrent neural networks, convolutional neural networks, and generative networks. The third part is dedicated to advanced topics in the field, such as natural language models, attention mechanisms, transfer learning, domain adaptation, and neural architecture search.

Files

DeepLearningBook.pdf (15.5 MB)
md5:7958f973da8b1f251a980978aa4c3fcf

Additional details

References

  • Bishop, Christopher M. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer, 2006.
  • Goodfellow, Ian, et al. Deep Learning (Adaptive Computation and Machine Learning Series). Illustrated, The MIT Press, 2016.
  • Yuille, A. L., and Anand Rangarajan. "The Concave-Convex Procedure." Neural Computation, vol. 15, no. 4, 2003, pp. 915–36. Crossref, doi:10.1162/08997660360581958.
  • Schuster, M., and K. K. Paliwal. "Bidirectional Recurrent Neural Networks." IEEE Transactions on Signal Processing, vol. 45, no. 11, 1997, pp. 2673–81. Crossref, doi:10.1109/78.650093.
  • Vaswani, Ashish, et al. "Attention is all you need." Advances in neural information processing systems. 2017.
  • Chu, Jielei, et al. "Restricted boltzmann machines with gaussian visible units guided by pairwise constraints." IEEE transactions on cybernetics 49.12 (2018): 4321-4334.
  • Hu, Hengyuan, Lisheng Gao, and Quanbin Ma. "Deep restricted boltzmann networks." arXiv preprint arXiv:1611.07917 (2016).
  • Tolstikhin, Ilya, et al. "Wasserstein auto-encoders." arXiv preprint arXiv:1711.01558 (2017).
  • Park, Saerom, and Jaewook Lee. "Stability Analysis of Denoising Autoencoders Based on Dynamical Projection System." IEEE Transactions on Knowledge and Data Engineering (2020).
  • Vincent, Pascal, et al. "Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion." Journal of machine learning research 11.12 (2010).
  • Hinton, Geoffrey E. "Boltzmann machine." Scholarpedia 2.5 (2007): 1668.
  • Alain, Guillaume, and Yoshua Bengio. "What regularized auto-encoders learn from the data-generating distribution." The Journal of Machine Learning Research 15.1 (2014): 3563-3593.
  • Ng, Andrew. "Sparse autoencoder." CS294A Lecture notes 72.2011 (2011): 1-19.
  • Hinton, Geoffrey E. "A practical guide to training restricted Boltzmann machines." Neural networks: Tricks of the trade. Springer, Berlin, Heidelberg, 2012. 599-619.
  • Chen, Fu-qiang, et al. "Contractive de-noising auto-encoder." International Conference on Intelligent Computing. Springer, Cham, 2014.
  • Hwang, Juno, Wonseok Hwang, and Junghyo Jo. "Tractable loss function and color image generation of multinary restricted Boltzmann machine." arXiv preprint arXiv:2011.13509 (2020).
  • Carlson, David, Volkan Cevher, and Lawrence Carin. "Stochastic spectral descent for restricted Boltzmann machines." Artificial Intelligence and Statistics. PMLR, 2015.
  • Thickstun, John. "Kantorovich-Rubinstein Duality." (2019).
  • Hoffman, Matthew D., et al. "Stochastic variational inference." Journal of Machine Learning Research 14.5 (2013).
  • Cremer, Chris, Xuechen Li, and David Duvenaud. "Inference suboptimality in variational autoencoders." International Conference on Machine Learning. PMLR, 2018.
  • Rezende, Danilo Jimenez, Shakir Mohamed, and Daan Wierstra. "Stochastic backpropagation and approximate inference in deep generative models." International conference on machine learning. PMLR, 2014.
  • Kingma, Diederik P., and Max Welling. "Auto-encoding variational bayes." arXiv preprint arXiv:1312.6114 (2013).
  • Zhang, Cheng, et al. "Advances in variational inference." IEEE transactions on pattern analysis and machine intelligence 41.8 (2018): 2008-2026.
  • Wainwright, Martin J., and Michael Irwin Jordan. Graphical models, exponential families, and variational inference. Now Publishers Inc, 2008.
  • Johnson, Matthew J., et al. "Composing graphical models with neural networks for structured representations and fast inference." Advances in neural information processing systems 29 (2016): 2946-2954.
  • Yoshizawa, Shuji, Masahiko Morita, and Shun-Ichi Amari. "Capacity of associative memory using a nonmonotonic neuron model." Neural Networks 6.2 (1993): 167-176.
  • Torres, Joaquín J., Lovorka Pantic, and Hilbert J. Kappen. "Storage capacity of attractor neural networks with depressing synapses." Physical Review E 66.6 (2002): 061910.
  • Demircigil, Mete, et al. "On a model of associative memory with huge storage capacity." Journal of Statistical Physics 168.2 (2017): 288-299.
  • Amit, Daniel J., Hanoch Gutfreund, and Haim Sompolinsky. "Statistical mechanics of neural networks near saturation." Annals of physics 173.1 (1987): 30-67.
  • Krotov, Dmitry, and John J. Hopfield. "Dense associative memory for pattern recognition." Advances in neural information processing systems 29 (2016): 1172-1180.
  • Cho, Youngmin. Kernel methods for deep learning. University of California, San Diego, 2012.
  • Hopfield, John J. "Neural networks and physical systems with emergent collective computational abilities." Proceedings of the national academy of sciences 79.8 (1982): 2554-2558.
  • Gretton, Arthur, et al. "A kernel two-sample test." The Journal of Machine Learning Research 13.1 (2012): 723-773.
  • Yuan, Ao, et al. "U-statistic with side information." Journal of multivariate analysis 111 (2012): 20-38.
  • Ramsauer, Hubert, et al. "Hopfield networks is all you need." arXiv preprint arXiv:2008.02217 (2020).
  • Widrich, Michael, et al. "Modern hopfield networks and attention for immune repertoire classification." arXiv preprint arXiv:2007.13505 (2020).
  • Elman, Jeffrey L. "Finding structure in time." Cognitive science 14.2 (1990): 179-211.
  • Krotov, Dmitry, and John Hopfield. "Large associative memory problem in neurobiology and machine learning." arXiv preprint arXiv:2008.06996 (2020).
  • Kim, Do-Hyun, Jinha Park, and Byungnam Kahng. "Enhanced storage capacity with errors in scale-free Hopfield neural networks: An analytical study." PloS one 12.10 (2017): e0184683.
  • Jordan, M. I. Serial order: a parallel distributed processing approach. Technical report, June 1985-March 1986. No. AD-A-173989/5/XAB; ICS-8604. California Univ., San Diego, La Jolla (USA). Inst. for Cognitive Science, 1986.
  • Lipton, Zachary C., John Berkowitz, and Charles Elkan. "A critical review of recurrent neural networks for sequence learning." arXiv preprint arXiv:1506.00019 (2015).
  • Ratcliff, Roger. "A theory of memory retrieval." Psychological review 85.2 (1978): 59.
  • Montúfar, Guido. "Restricted boltzmann machines: Introduction and review." Information Geometry and Its Applications IV. Springer, Cham, 2016.
  • Aarts, Emile HL, and Jan HM Korst. "Boltzmann machines and their applications." International Conference on Parallel Architectures and Languages Europe. Springer, Berlin, Heidelberg, 1987.
  • Jaeger, Herbert, and Harald Haas. "Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication." science 304.5667 (2004): 78-80.
  • Robbins, Herbert, and Sutton Monro. "A stochastic approximation method." The annals of mathematical statistics (1951): 400-407.
  • Dauphin, Yann, et al. "Identifying and attacking the saddle point problem in high-dimensional non-convex optimization." arXiv preprint arXiv:1406.2572 (2014).
  • Darken, Christian, Joseph Chang, and John Moody. "Learning rate schedules for faster stochastic gradient search." Neural networks for signal processing. Vol. 2. 1992.
  • Sutton, Richard. "Two problems with back propagation and other steepest descent learning procedures for networks." Proceedings of the Eighth Annual Conference of the Cognitive Science Society, 1986. 1986.
  • Qian, Ning. "On the momentum term in gradient descent learning algorithms." Neural networks 12.1 (1999): 145-151.
  • Nesterov, Yurii E. "A method for solving the convex programming problem with convergence rate O(1/k^2)." Dokl. Akad. Nauk SSSR. Vol. 269. 1983.
  • Bengio, Yoshua, Nicolas Boulanger-Lewandowski, and Razvan Pascanu. "Advances in optimizing recurrent networks." 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2013.
  • Duchi, John, Elad Hazan, and Yoram Singer. "Adaptive subgradient methods for online learning and stochastic optimization." Journal of machine learning research 12.7 (2011).
  • Sutskever, Ilya. Training recurrent neural networks. Toronto, Canada: University of Toronto, 2013.
  • Dean, Jeffrey, et al. "Large scale distributed deep networks." Advances in neural information processing systems 25 (2012): 1223-1231.
  • Pennington, Jeffrey, Richard Socher, and Christopher D. Manning. "Glove: Global vectors for word representation." Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP). 2014.
  • Zeiler, Matthew D. "Adadelta: an adaptive learning rate method." arXiv preprint arXiv:1212.5701 (2012).
  • Heusel, Martin, et al. "Gans trained by a two time-scale update rule converge to a local nash equilibrium." Advances in neural information processing systems 30 (2017).
  • Dozat, Timothy. "Incorporating nesterov momentum into adam." (2016).
  • Huang, Gao, et al. "Densely connected convolutional networks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
  • Johnson, Melvin, et al. "Google's multilingual neural machine translation system: Enabling zero-shot translation." Transactions of the Association for Computational Linguistics 5 (2017): 339-351.
  • Reddi, Sashank J., Satyen Kale, and Sanjiv Kumar. "On the convergence of adam and beyond." arXiv preprint arXiv:1904.09237 (2019).
  • Loshchilov, Ilya, and Frank Hutter. "Decoupled weight decay regularization." arXiv preprint arXiv:1711.05101 (2019).
  • Ma, Jerry, and Denis Yarats. "Quasi-hyperbolic momentum and adam for deep learning." arXiv preprint arXiv:1810.06801 (2018).
  • Lucas, James, et al. "Aggregated momentum: Stability through passive damping." arXiv preprint arXiv:1804.00325 (2018).
  • Niu, Feng, et al. "Hogwild!: A lock-free approach to parallelizing stochastic gradient descent." arXiv preprint arXiv:1106.5730 (2011).
  • McMahan, Brendan, and Matthew Streeter. "Delay-tolerant algorithms for asynchronous distributed online learning." Advances in Neural Information Processing Systems 27 (2014): 2915-2923.
  • Abadi, Martín, et al. "Tensorflow: Large-scale machine learning on heterogeneous distributed systems." arXiv preprint arXiv:1603.04467 (2016).
  • Zhang, Sixin, Anna Choromanska, and Yann LeCun. "Deep learning with elastic averaging SGD." arXiv preprint arXiv:1412.6651 (2014).
  • LeCun, Yann, Léon Bottou, Genevieve B. Orr, and Klaus-Robert Müller. "Efficient BackProp." Neural Networks: Tricks of the Trade. Vol. 1524. Springer, 1998. 9-50.
  • Bengio, Yoshua, et al. "Curriculum learning." Proceedings of the 26th annual international conference on machine learning. 2009.
  • Zaremba, Wojciech, and Ilya Sutskever. "Learning to execute." arXiv preprint arXiv:1410.4615 (2014).
  • Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network training by reducing internal covariate shift." International conference on machine learning. PMLR, 2015.
  • Neelakantan, Arvind, et al. "Adding gradient noise improves learning for very deep networks." arXiv preprint arXiv:1511.06807 (2015).
  • Bradbury, James, et al. "Quasi-recurrent neural networks." arXiv preprint arXiv:1611.01576 (2016).
  • Balduzzi, David, and Muhammad Ghifary. "Strongly-typed recurrent neural networks." International Conference on Machine Learning. PMLR, 2016.
  • Jaeger, Herbert. "Echo state network." Scholarpedia 2.9 (2007): 2330.
  • Gallicchio, Claudio, and Alessio Micheli. "Deep echo state network (deepesn): A brief survey." arXiv preprint arXiv:1712.04323 (2017).
  • Vlachas, Pantelis R., et al. "Backpropagation algorithms and reservoir computing in recurrent neural networks for the forecasting of complex spatiotemporal dynamics." Neural Networks 126 (2020): 191-217.
  • Ba, Jimmy Lei, Jamie Ryan Kiros, and Geoffrey E. Hinton. "Layer normalization." arXiv preprint arXiv:1607.06450 (2016).
  • Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems 25 (2012): 1097-1105.
  • Khan, Asifullah, et al. "A survey of the recent architectures of deep convolutional neural networks." Artificial Intelligence Review 53.8 (2020): 5455-5516.
  • Szegedy, Christian, et al. "Inception-v4, inception-resnet and the impact of residual connections on learning." Thirty-first AAAI conference on artificial intelligence. 2017.
  • Dang, Lanxue, Peidong Pang, and Jay Lee. "Depth-wise separable convolution neural network with residual connection for hyperspectral image classification." Remote Sensing 12.20 (2020): 3408.
  • Chollet, François. "Xception: Deep learning with depthwise separable convolutions." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
  • Maas, Andrew L., Awni Y. Hannun, and Andrew Y. Ng. "Rectifier nonlinearities improve neural network acoustic models." Proc. icml. Vol. 30. No. 1. 2013.
  • Shang, Wenling, et al. "Understanding and improving convolutional neural networks via concatenated rectified linear units." international conference on machine learning. PMLR, 2016.
  • Clevert, Djork-Arné, Thomas Unterthiner, and Sepp Hochreiter. "Fast and accurate deep network learning by exponential linear units (elus)." arXiv preprint arXiv:1511.07289 (2015).
  • Agarap, Abien Fred. "Deep learning using rectified linear units (relu)." arXiv preprint arXiv:1803.08375 (2018).
  • Dumoulin, Vincent, and Francesco Visin. "A guide to convolution arithmetic for deep learning." arXiv preprint arXiv:1603.07285 (2016).
  • Woo, Sanghyun, et al. "Cbam: Convolutional block attention module." Proceedings of the European conference on computer vision (ECCV). 2018.
  • Hu, Jie, Li Shen, and Gang Sun. "Squeeze-and-excitation networks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
  • Kumar, Siddharth Krishna. "On weight initialization in deep neural networks." arXiv preprint arXiv:1704.08863 (2017).
  • Elfwing, Stefan, Eiji Uchibe, and Kenji Doya. "Sigmoid-weighted linear units for neural network function approximation in reinforcement learning." Neural Networks 107 (2018): 3-11.
  • Lu, Zhilong, et al. "LSTM variants meet graph neural networks for road speed prediction." Neurocomputing 400 (2020): 34-45.
  • Krause, Ben, et al. "Multiplicative LSTM for sequence modelling." arXiv preprint arXiv:1609.07959 (2016).
  • Wu, Yuhuai, et al. "On multiplicative integration with recurrent neural networks." arXiv preprint arXiv:1606.06630 (2016).
  • He, Kaiming, et al. "Mask r-cnn." Proceedings of the IEEE international conference on computer vision. 2017.
  • Redmon, Joseph, et al. "You only look once: Unified, real-time object detection." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
  • Tan, Mingxing, and Quoc Le. "Efficientnet: Rethinking model scaling for convolutional neural networks." International Conference on Machine Learning. PMLR, 2019.
  • Kolesnikov, Alexander, et al. "Big transfer (bit): General visual representation learning." Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16. Springer International Publishing, 2020.
  • Lin, Tsung-Yi, et al. "Focal loss for dense object detection." Proceedings of the IEEE international conference on computer vision. 2017.
  • Szegedy, Christian, et al. "Going deeper with convolutions." Proceedings of the IEEE conference on computer vision and pattern recognition. 2015.
  • Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. "U-net: Convolutional networks for biomedical image segmentation." International Conference on Medical image computing and computer-assisted intervention. Springer, Cham, 2015.
  • Bansal, Aayush, et al. "Pixelnet: Representation of the pixels, by the pixels, and for the pixels." arXiv preprint arXiv:1702.06506 (2017).
  • Gulrajani, Ishaan, et al. "Pixelvae: A latent variable model for natural images." arXiv preprint arXiv:1611.05013 (2016).
  • Stuner, Bruno, Clément Chatelain, and Thierry Paquet. "Handwriting recognition using cohort of LSTM and lexicon verification with extremely large lexicon." Multimedia Tools and Applications 79.45 (2020): 34407-34427.
  • Wigington, Curtis, et al. "Start, follow, read: End-to-end full-page handwriting recognition." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
  • Martínek, Jiří, Ladislav Lenc, and Pavel Král. "Training strategies for OCR systems for historical documents." IFIP International Conference on Artificial Intelligence Applications and Innovations. Springer, Cham, 2019.
  • Sabour, Sara, Nicholas Frosst, and Geoffrey E. Hinton. "Dynamic routing between capsules." arXiv preprint arXiv:1710.09829 (2017).
  • Patrick, Mensah Kwabena, et al. "Capsule networks–a survey." Journal of King Saud University-computer and information sciences (2019).
  • Choi, Jaewoong, et al. "Attention routing between capsules." Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops. 2019.
  • Fu, Jun, et al. "Dual attention network for scene segmentation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.
  • Maity, Rajib. "Basic Concepts of Probability and Statistics." Statistical Methods in Hydrology and Hydroclimatology. Springer, Singapore, 2018. 7-51.
  • Liu, Linfeng, and Liping Liu. "Localizing and Amortizing: Efficient Inference for Gaussian Processes." Asian Conference on Machine Learning. PMLR, 2020.
  • Roy, Jean-Francis, Mario Marchand, and François Laviolette. "A column generation bound minimization approach with PAC-Bayesian generalization guarantees." Artificial Intelligence and Statistics. PMLR, 2016.
  • Ganin, Yaroslav, et al. "Domain-adversarial training of neural networks." The journal of machine learning research 17.1 (2016): 2096-2030.
  • Martens, James. "New insights and perspectives on the natural gradient method." arXiv preprint arXiv:1412.1193 (2014).
  • Pascanu, Razvan, and Yoshua Bengio. "Revisiting natural gradient for deep networks." arXiv preprint arXiv:1301.3584 (2013).
  • Ly, Alexander, et al. "A tutorial on Fisher information." Journal of Mathematical Psychology 80 (2017): 40-55.
  • Kingma, Diederik P., and Jimmy Ba. "Adam: A method for stochastic optimization." arXiv preprint arXiv:1412.6980 (2014).
  • Nguyen, XuanLong, Martin J. Wainwright, and Michael I. Jordan. "On surrogate loss functions and f-divergences." The Annals of Statistics 37.2 (2009): 876-904.
  • Glorot, Xavier, and Yoshua Bengio. "Understanding the difficulty of training deep feedforward neural networks." Proceedings of the thirteenth international conference on artificial intelligence and statistics. JMLR Workshop and Conference Proceedings, 2010.
  • He, Kaiming, et al. "Delving deep into rectifiers: Surpassing human-level performance on imagenet classification." Proceedings of the IEEE international conference on computer vision. 2015.
  • Mishkin, Dmytro, and Jiri Matas. "All you need is a good init." arXiv preprint arXiv:1511.06422 (2015).
  • Mitrović, Dalibor, Matthias Zeppelzauer, and Christian Breiteneder. "Features for content-based audio retrieval." Advances in computers. Vol. 78. Elsevier, 2010. 71-150.
  • Sheikhpour, Razieh, et al. "A survey on semi-supervised feature selection methods." Pattern Recognition 64 (2017): 141-158.
  • Kuhn, Max, and Kjell Johnson. Applied predictive modeling. Vol. 26. New York: Springer, 2013.
  • Liu, Chun-Lin. "A tutorial of the wavelet transform." NTUEE, Taiwan (2010).
  • Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. "Neural machine translation by jointly learning to align and translate." arXiv preprint arXiv:1409.0473 (2014).
  • Gao, Junyu, Qi Wang, and Yuan Yuan. "SCAR: Spatial-/channel-wise attention regression networks for crowd counting." Neurocomputing 363 (2019): 1-8.
  • van de Geijn, Robert, and Margaret Myers. "Advanced Linear Algebra: Foundations to Frontiers." Creative Commons NonCommercial (CC BY-NC) (2020).
  • Kouemou, Guy Leonard. "History and theoretical basics of hidden Markov models." Hidden Markov Models, Theory and Applications. Ed. Przemyslaw Dymarski. InTech, 2011.
  • Chen, Long, et al. "Sca-cnn: Spatial and channel-wise attention in convolutional networks for image captioning." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
  • Wang, Qilong, et al. "ECA-Net: Efficient channel attention for deep convolutional neural networks." 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2020.
  • Liu, Yang, et al. "An attention-gated convolutional neural network for sentence classification." Intelligent Data Analysis 23.5 (2019): 1091-1107.
  • Popescu, Gabriel. Quantitative phase imaging of cells and tissues. McGraw-Hill Education, 2011.
  • Hemanth, D. Jude, Deepak Gupta, and Valentina Emilia Balas, eds. Intelligent Data Analysis for Biomedical Applications: Challenges and Solutions. Academic Press, 2019.
  • Kehtarnavaz, Nasser. "Frequency Domain Processing." Digital Signal Processing System Design, 2008.
  • Cerna, Michael, and Audrey F. Harvey. The fundamentals of FFT-based signal analysis and measurement. Application Note 041, National Instruments, 2000.
  • Martignon, L. International Encyclopedia of the Social & Behavioral Sciences. Pergamon, 2001.
  • Sinha, Ankur, Pekka Malo, and Kalyanmoy Deb. "A review on bilevel optimization: from classical to evolutionary approaches and applications." IEEE Transactions on Evolutionary Computation 22.2 (2017): 276-295.
  • Torrey, Lisa, and Jude Shavlik. "Transfer learning." Handbook of research on machine learning applications and trends: algorithms, methods, and techniques. IGI Global, 2010. 242-264.
  • Hospedales, Timothy, et al. "Meta-learning in neural networks: A survey." arXiv preprint arXiv:2004.05439 (2020).
  • Wang, Xiaolong, et al. "Non-local neural networks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
  • Ullah, Amin, et al. "Action recognition in video sequences using deep bi-directional LSTM with CNN features." IEEE access 6 (2017): 1155-1166.
  • Zhang, Han, et al. "Self-attention generative adversarial networks." International conference on machine learning. PMLR, 2019.
  • Wick, Christoph, Christian Reul, and Frank Puppe. "Comparison of OCR Accuracy on Early Printed Books using the Open Source Engines Calamari and OCRopus." J. Lang. Technol. Comput. Linguistics 33.1 (2018): 79-96.
  • Yin, Wenpeng, et al. "Abcnn: Attention-based convolutional neural network for modeling sentence pairs." Transactions of the Association for Computational Linguistics 4 (2016): 259-272.
  • Liu, Hanxiao, Karen Simonyan, and Yiming Yang. "Darts: Differentiable architecture search." arXiv preprint arXiv:1806.09055 (2018).
  • Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. "Sequence to sequence learning with neural networks." Advances in neural information processing systems. 2014.
  • Cheng, Jianpeng, Li Dong, and Mirella Lapata. "Long short-term memory-networks for machine reading." arXiv preprint arXiv:1601.06733 (2016).
  • Luong, Minh-Thang, Hieu Pham, and Christopher D. Manning. "Effective approaches to attention-based neural machine translation." arXiv preprint arXiv:1508.04025 (2015).
  • Snell, Jake, Kevin Swersky, and Richard S. Zemel. "Prototypical networks for few-shot learning." arXiv preprint arXiv:1703.05175 (2017).
  • Chen, Ting, et al. "Self-supervised gans via auxiliary rotation loss." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.
  • LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86.11 (1998): 2278-2324.
  • Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).
  • He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
  • Huang, Gao, et al. "Densely connected convolutional networks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
  • Sandler, Mark, et al. "Mobilenetv2: Inverted residuals and linear bottlenecks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
  • Howard, Andrew G., et al. "Mobilenets: Efficient convolutional neural networks for mobile vision applications." arXiv preprint arXiv:1704.04861 (2017).
  • Howard, Andrew, et al. "Searching for mobilenetv3." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019.
  • Girshick, Ross, et al. "Rich feature hierarchies for accurate object detection and semantic segmentation." Proceedings of the IEEE conference on computer vision and pattern recognition. 2014.
  • Girshick, Ross. "Fast r-cnn." Proceedings of the IEEE international conference on computer vision. 2015.
  • Ren, Shaoqing, et al. "Faster r-cnn: Towards real-time object detection with region proposal networks." Advances in neural information processing systems 28 (2015): 91-99.
  • Uijlings, Jasper RR, et al. "Selective search for object recognition." International journal of computer vision 104.2 (2013): 154-171.
  • Felzenszwalb, Pedro F., and Daniel P. Huttenlocher. "Efficient graph-based image segmentation." International journal of computer vision 59.2 (2004): 167-181.
  • Liu, Wei, et al. "Ssd: Single shot multibox detector." European conference on computer vision. Springer, Cham, 2016.
  • Lin, Tsung-Yi, et al. "Feature pyramid networks for object detection." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
  • Mirza, Mehdi, and Simon Osindero. "Conditional generative adversarial nets." arXiv preprint arXiv:1411.1784 (2014).
  • Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation learning with deep convolutional generative adversarial networks." arXiv preprint arXiv:1511.06434 (2015).
  • Arjovsky, Martin, Soumith Chintala, and Léon Bottou. "Wasserstein generative adversarial networks." International conference on machine learning. PMLR, 2017.
  • Zhu, Jun-Yan, et al. "Unpaired image-to-image translation using cycle-consistent adversarial networks." Proceedings of the IEEE international conference on computer vision. 2017.
  • Liu, Steven, et al. "Diverse image generation via self-conditioned gans." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.
  • Liu, Shichen, et al. "Generalized zero-shot learning with deep calibration network." Advances in Neural Information Processing Systems. 2018.
  • Wang, Mei, and Weihong Deng. "Deep visual domain adaptation: A survey." Neurocomputing 312 (2018): 135-153.
  • Xian, Yongqin, et al. "Zero-shot learning—a comprehensive evaluation of the good, the bad and the ugly." IEEE transactions on pattern analysis and machine intelligence 41.9 (2018): 2251-2265.
  • Brown, Tom B., et al. "Language models are few-shot learners." arXiv preprint arXiv:2005.14165 (2020).
  • Devlin, Jacob, et al. "Bert: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805 (2018).
  • Zhou, Bolei, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. "Learning deep features for discriminative localization." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2921-2929. 2016.