A Leap among Quantum Computing and Quantum Neural Networks: A Survey

In recent years, Quantum Computing has witnessed massive improvements in terms of available resources and algorithm development. The ability to harness quantum phenomena to solve computational problems is a long-standing dream that has drawn the scientific community’s interest since the late ’80s. In such a context, we propose our contribution. First, we introduce basic concepts related to quantum computations, and then we explain the core functionalities of technologies that implement the Gate Model and Adiabatic Quantum Computing paradigms. Finally, we gather, compare, and analyze the current state-of-the-art concerning Quantum Perceptrons and Quantum Neural Networks implementations.


INTRODUCTION
Artificial Intelligence (AI) has intrigued and puzzled many generations of scientists and largely fueled novelists' imaginations. The modern definition of AI, as the ensemble of computer systems empowered with the ability to learn from data through statistical techniques, dates back to 1959. Machine Learning (ML), a subclass of AI, is a discipline that aims to study algorithms that can learn from data to perform tasks without following explicit instructions.
Often, these algorithms are based on a computational model that belongs to differentiable programming techniques, called Neural Networks (NNs). The success of such algorithms resides in their ability to learn to achieve a specific goal [93,116], i.e., they learn to discover hidden patterns and relations among data to fulfill the task at hand [87,115].
Mathematically, NNs are made of a sequence of transformations, called layers, composed of linear operators and elementwise nonlinearities. Then, the goal of learning is to modify the transformations' parameters to fulfill a task successfully. Whenever a model accounts for more than a couple of such layers, it is called a Deep Learning (DL) model or a Deep Neural Network (DNN). Thanks to their representation power and the development of new technologies and training algorithms, DL models obtained astonishing results in the last two decades, achieving superhuman performance on certain tasks [174]. However, higher performances require more complex models and larger datasets to train them, thus constantly increasing the hunger for resources and power to learn to solve a given problem.
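The layered structure described above can be sketched in a few lines of NumPy. This is only an illustrative sketch: the layer sizes, the tanh nonlinearity, and the names `layer` and `forward` are assumptions made for the example and are not taken from any of the cited models.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, W, b):
    """One layer: a linear operator followed by an elementwise nonlinearity."""
    return np.tanh(W @ x + b)

def forward(x, params):
    """Apply the layers in sequence; params is a list of (W, b) pairs."""
    for W, b in params:
        x = layer(x, W, b)
    return x

# Hypothetical sizes: 4 inputs -> 3 hidden units -> 2 outputs.
params = [
    (rng.normal(size=(3, 4)), np.zeros(3)),
    (rng.normal(size=(2, 3)), np.zeros(2)),
]
y = forward(np.ones(4), params)
```

Learning then amounts to adjusting the entries of each `W` and `b` so that `forward` produces the desired outputs.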
In this regard, quantum computers might offer new solutions that exploit quantum phenomena such as interference, superposition, and entanglement. Exploiting these phenomena is expected to reduce both the computation time and the need for extensive resources, yielding the concepts of quantum advantage and quantum supremacy [134,145].
The mentioned quantum phenomena are described within the framework of quantum mechanics [61,91,159], a "young" physics theory formalized at the beginning of the 20th century. Such a theory unveils the intrinsic statistical characteristic of nature, a behavior that unfortunately is hidden from us, at least in the macroscopic world.
The quest for a quantum computer started with the ideas of Benioff [23] and Feynman [73,121], in the 1980s, pointing out that quantum computers should be more efficient in solving specific problems. For example, quantum devices might help in studying very complex, e.g., entangled, systems by emulating or simulating [66,74] their behavior in chips that are quantum by their nature [150,193].
From a computer science point of view, a quantum computer represents a computational model based on the principles and postulates of quantum mechanics. Such a model aims at embedding the power of quantum theory into a device that can be conveniently "programmed" to fulfill a given task. Moreover, the result of the computation itself might represent a quantum object encoding different answers, each solving a specific problem [59]. It is only recently that researchers have successfully realized a quantum processor capable of performing controllable computations based on quantum phenomena. Several industrial applications have already benefited from such a technology, such as chemistry [104], optimization problems [132], finance [67], quantum sensing [56], and quantum imaging [48], among others. Besides the mentioned applications, quantum computers might offer advantages in terms of energy management compared to classical ones. Indeed, quantum computers are expected to be more energy-efficient than supercomputers [10]. For that reason, hybrid approaches might offer exciting solutions to lower the energy consumption for a given computation by moving the high-energy portion of the computation to the Quantum Processing Unit (QPU) while leaving the low-energy one to the cloud [182]. A fundamental result of quantum information theory is the observation that, although quantum phenomena allow solving some classical problems more efficiently than classical computations [27], quantum computers cannot compute any function which is not Turing-computable [58].
Concerning the context of Quantum Machine Learning (QML) [30], a relevant contribution comes from the ability of quantum devices to speed up linear algebra-related tasks. For example, it has been shown that using the Harrow-Hassidim-Lloyd (HHL) quantum algorithm [85,198] to sample solutions for a system of linear equations offers an exponential speedup over its classical counterpart. Furthermore, Shor's algorithm [171,172] for integer factorization and Grover's algorithm [80] for searching unstructured databases are additional examples of procedures that benefit from a quantum formulation. Although general QML is of great interest, it is not the central topic of our work as it has already been covered extensively in the literature. Therefore, we focus on the QML sub-area concerning recent Quantum Neural Networks (QNNs) approaches. Despite the fact that their name recalls the neural network structure mentioned earlier, they are characterized by a completely different design.
Concerning quantum computations, the most commonly adopted paradigms are the Gate Model (GM) [134] and the Adiabatic Quantum Computation (AQC) [11]. In spite of being equivalent up to a polynomial overhead, they represent two profoundly different computation approaches. The first one is based on the concept of "gate": a unitary transformation applied to one or more quantum bits (qubits), i.e., the basic units of quantum information (see section 3).
Instead, in the AQC, one typically encodes the desired objective function into a quantum system and then lets it evolve.
However, in both paradigms, the QPU state at the end of the evolution embodies the answer to the given problem.
Manuscript submitted to ACM
Thus, we can highlight the main differences between the two approaches as follows. The GM allows users to directly control the transformations applied to the qubits and is discrete in time, while AQC does not allow direct control of single qubits and treats time as a continuous variable.
Unfortunately, quantum technologies are still at their dawn, having minimal computational capacity. Furthermore, significant technological challenges arise from the requirement for quantum systems to be isolated from the environment in order to avoid decoherence, which causes the loss of the information stored in the quantum device. Therefore, researchers typically rely on quantum simulators to test their ideas while waiting for the next generation of quantum computers.
Whether quantum supremacy is real or not is still an open question. For example, we do not expect quantum computers to solve efficiently worst-case NP-hard problems like combinatorial ones. Instead, we do expect that we will be able to find a better approximation to the solution or find it faster than a classical computer [109,146].
Stemming from those considerations, we conceived this survey to offer both the neophyte and the more experienced reader insights into several fundamental topics in the quantum computation field. Moreover, noticing that the literature lacks a detailed discussion about the latest achievements concerning Quantum Perceptrons (QPs) and QNNs, we gather, analyze and discuss state-of-the-art approaches related to those topics. Notably, we can summarize our work as follows:
• we review the current state-of-the-art regarding QPs and QNNs by discerning among theoretical formulations, simulations, and implementations on real quantum devices;
• we report the main achievements by different research groups concerning the topic of quantum supremacy;
• we provide a gentle introduction to several basic notions about quantum mechanics, quantum information, and quantum computational models;
• we collect and organize the most relevant papers to this survey on a GitHub¹ page allowing the interested readers to easily and quickly browse through the literature.
Moreover, to ease the understanding of the paper's content, in Appendix B we summarize basic notions about the Dirac notation, postulates of quantum mechanics, the physical realization of qubits, the Bloch Sphere representation, the Variational Principle, and the Adiabatic Theorem. We suggest that readers unfamiliar with these topics peruse Appendix B before reading any further.
The remainder of the paper is organized as follows. In section 2, we report on other surveys on the topic at hand. In section 3, we introduce the fundamental concept of a qubit, and we give the reader a brief overview of the currently most widespread quantum computational models. Then, in section 4, we tackle the concept of quantum speedup. Subsequently, in section 5, we move to the core topic of this survey, i.e., QNN approaches. Finally, the conclusions are drawn in section 6. In Table 1, we report a summary of the notations used throughout the manuscript, while a summary of the used abbreviations is available in Table 5 in Appendix A.

OTHER SURVEYS
In the literature, many review papers have been devoted to the realm of quantum computation and information. Several surveys address quantum ML and its applications. However, very little has been said about QNNs, and reviews that include this topic often cover it marginally or lack a detailed discussion on the most recent achievements. For this reason, we devote our work mainly to reviewing Variational Learning approaches which encompass Quantum Approximate Optimization Algorithms, Variational Quantum Eigensolver, Quantum Perceptrons, and Quantum Neural Networks. linear systems [107], PCA [122], and variational generation for adversarial learning [151], just to cite some. Biamonte et al. [30], and Schuld et al. [164] gave an overview of the achievements and challenges of quantum-enhanced machine learning algorithms, considering aspects of both gate and adiabatic computational models. Basic algorithms for QML, such as Grover's algorithm [80], quantum state "similarity" estimation, and the HHL algorithm [85], are reviewed in [197]. The authors also illustrate how these algorithms (based on the quantum gate model) have been used in the literature to speed up standard ML algorithms, like Support Vector Machines, k-means clustering, PCA, and Linear Discriminant Analysis. Furthermore, in [2,3] the authors review quantum solutions for binary classification and k-Nearest Neighbour problems. Ciliberto et al. [50] covered quantum machine learning approaches with emphasis on theoretical aspects, such as computational complexity analysis, discussion on limitations of QML, and challenging aspects like learning under noise. Theoretical aspects of ML using quantum computers are also discussed in [15]. Many other survey papers overview quantum algorithms for ML and their applications, see for example [9,29,64,65,125,148,162] and references therein.
Benedetti et al. [22] reviewed parametrized quantum circuits, which are arguably the quantum analogue of NN models. The Variational Quantum Algorithms, which employ parametrized quantum circuits, are also discussed in [44].
Kamruzzaman et al. [103] reviewed the status of (theoretical) Quantum Neural Networks and their limitations. Allcock and Zhang [13] focused on quantum generalizations and applications of some popular neural-network concepts, like Boltzmann machines, generative adversarial networks, and autoencoders. Recently, Mangini et al. [124] gave a concise overview of the main directions that have been taken to develop quantum artificial neural networks.

QUBIT AND QUANTUM COMPUTATIONAL MODELS
In this section, we first introduce the concept of qubit and then give a brief overview of the two main paradigms of quantum computation: the Gate Model (GM), or Universal Quantum Computing, and the Adiabatic Quantum Computation (AQC). A comprehensive analysis of the mentioned approaches is out of scope for our work. However, in what follows, we try to empower the reader with a basic grasp of those two types of computations.

Qubit
The qubit, short for "quantum bit", represents the fundamental unit of information for quantum devices. It is the quantum correspondent of the bit. However, apart from the last part of its name, the qubit does not share much with its classical cousin. Similar to the bit, which assumes two values only (0 or 1), the qubit can be "observed" (i.e., measured) in two possible states, typically indicated as |0⟩ and |1⟩. However, differently from the bit, a qubit can also be in a so-called superposition of states before the measurement. Intuitively, that means that the qubit can be in either the state |0⟩ or |1⟩ with a certain probability. Notwithstanding, when measured, the qubit state "collapses" into one of them. The qubit is a rather abstract concept since such an object does not exist in the real world. Instead, we relate it with artificial atoms, i.e., physical systems able to emulate the behavior of an atom. Specifically, we are typically interested in emulating the behavior of two-state systems that satisfy a certain number of hard constraints such as small dissipation and isolation from the environment.
From a mathematical point of view, a qubit state $|\psi\rangle$ is a unit vector in $\mathbb{C}^2$. The symbol $|\cdot\rangle$ we just used is inherited from the Dirac [60] formalism² employed in quantum mechanics. In a nutshell, the object $|\psi\rangle$, called ket, represents a vector in a Hilbert space. Similarly, $\langle\psi|$, called bra, is defined as the adjoint (or dual) of such a ket. We remind the reader unfamiliar with the Dirac formalism that the symbol within a ket or bra, e.g., the "$\psi$" used above, can be an arbitrary label (e.g., a letter or a number). Note that often binary labels are used, especially to indicate the vectors of a space basis.
For example, the basis $\left\{ \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \end{pmatrix} \right\}$ of $\mathbb{C}^2$, referred to as the computational basis, is typically indicated as $\{|0\rangle, |1\rangle\}$, or sometimes also as $\{|\uparrow\rangle, |\downarrow\rangle\}$. Using this formalism, a qubit state can be represented as $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$, where $\alpha$ and $\beta$, called amplitudes, are complex numbers such that $|\alpha|^2 + |\beta|^2 = 1$.³ Therefore, a qubit state is a coherent superposition of the computational basis states $|0\rangle$ and $|1\rangle$. However, this does not mean that a qubit has a value between $|0\rangle$ and $|1\rangle$ but rather that it is not possible to say whether the qubit is definitively in the state $|0\rangle$, or definitively in the state $|1\rangle$.
In fact, when we measure a qubit we observe either $|0\rangle$ with probability $|\alpha|^2$, or $|1\rangle$ with probability $|\beta|^2$. After the measurement, the qubit state collapses to whatever value ($|0\rangle$ or $|1\rangle$) was observed, irreversibly losing memory of the former $\alpha$ and $\beta$ amplitudes. Please note that different kinds of measurements exist; the one we referred to above is called measurement with respect to the computational basis, which is among the most widely used.⁴
We now turn our attention from the theoretical definition to a more physical one. Any physical system whose state space can be described by $\mathbb{C}^2$ can serve as a qubit's physical realization.⁵ These systems are referred to as quantum two-level systems, as their state can be described by a vector in a 2-dimensional Hilbert space. The state of an isolated quantum mechanical system composed of $n$ two-level systems, called quantum $n$-register, is described by a vector in a $2^n$-dimensional Hilbert space ($\mathcal{H}^{2^n}$). This means that a state of an $n$-register can represent the superposition of $2^n$ basis states, which is the cornerstone of quantum parallelism. Formally, an $n$-register state can be described by the linear combination of the basis vectors $\{|0\rangle, \ldots, |2^n - 1\rangle\}$:
$$|\psi\rangle = \sum_{i=0}^{2^n - 1} \alpha_i |i\rangle, \qquad \sum_{i=0}^{2^n - 1} |\alpha_i|^2 = 1. \tag{1}$$
If we measure the state $|\psi\rangle$ with respect to the computational basis, we get one of the basis state labels $i$ with probability $p(i) = |\alpha_i|^2$ and the corresponding post-measurement state is $|i\rangle$. Usually, binary labeling is used to denote the $2^n$ basis states. For example, the state of a two-qubit system can be expressed as $|\psi\rangle = \alpha_{00}|00\rangle + \alpha_{01}|01\rangle + \alpha_{10}|10\rangle + \alpha_{11}|11\rangle$ with $\sum_{i,j} |\alpha_{ij}|^2 = 1$. The composition of two or more quantum systems is represented by the tensor product. For example, if two systems A and B have the states $|\psi_A\rangle = \alpha|0\rangle + \beta|1\rangle$ and $|\psi_B\rangle = \gamma|0\rangle + \delta|1\rangle$, respectively, then the system C composed of A and B has the state $|\psi_C\rangle = |\psi_A\rangle \otimes |\psi_B\rangle = \alpha\gamma|00\rangle + \alpha\delta|01\rangle + \beta\gamma|10\rangle + \beta\delta|11\rangle$.
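As a minimal numerical sketch of the measurement rule and of the tensor-product composition discussed above, the following NumPy snippet builds two single-qubit states, composes them, and simulates one measurement in the computational basis. The amplitudes and the helper name `measurement_probabilities` are assumptions chosen for illustration.

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)

def measurement_probabilities(state):
    """Born rule in the computational basis: p(i) = |alpha_i|^2."""
    return np.abs(state) ** 2

# Single qubit with hypothetical amplitudes alpha = 0.6 and beta = 0.8i.
psi_a = 0.6 * ket0 + 0.8j * ket1
# Second qubit in an equal superposition of |0> and |1>.
psi_b = (ket0 + ket1) / np.sqrt(2)

# Composite system: Kronecker product, ordered as |00>, |01>, |10>, |11>.
psi_c = np.kron(psi_a, psi_b)

p_a = measurement_probabilities(psi_a)
p_c = measurement_probabilities(psi_c)

# Simulate one measurement: sample a basis label, then collapse onto it.
rng = np.random.default_rng(0)
outcome = rng.choice(len(psi_c), p=p_c / p_c.sum())
post_state = np.zeros_like(psi_c)
post_state[outcome] = 1.0
```

After the collapse, `post_state` carries no trace of the original amplitudes, which mirrors the irreversibility of the measurement.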
Finally, note that a state $|\psi\rangle \in \mathcal{H}^{2^n}$ of an $n$-register is called entangled if it cannot be decomposed as the tensor product of single-qubit states.
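For two qubits, this definition can be checked numerically: arranging the four amplitudes in a 2 × 2 matrix, the state is a tensor product exactly when that matrix has rank one. The sketch below uses the singular value decomposition for this test; the function names and the tolerance are illustrative assumptions, and the approach as written covers only the two-qubit case.

```python
import numpy as np

def schmidt_rank(state, tol=1e-12):
    """Number of nonzero singular values of the 2x2 amplitude matrix."""
    singular_values = np.linalg.svd(state.reshape(2, 2), compute_uv=False)
    return int(np.sum(singular_values > tol))

def is_entangled(state):
    """A two-qubit pure state is entangled iff its Schmidt rank exceeds 1."""
    return schmidt_rank(state) > 1

# Product state (0.6|0> + 0.8|1>) tensor |0>.
product = np.kron([0.6, 0.8], [1, 0]).astype(complex)
# Bell state (|00> + |11>) / sqrt(2).
bell = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
```

Here `is_entangled(product)` is false while `is_entangled(bell)` is true.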

Gate Model
At the heart of the GM, there is the concept of circuit model, i.e., a sequence of building blocks that realize elementary operations. Such building blocks are called gates. Thus, a gate encodes a well-controlled operation acting on a single qubit or a subset of qubits [134,146] in a given system. When acting on more than one qubit, these gates can give rise to the phenomenon of entanglement establishing a strong correlation among qubits.
³ Since a qubit has more states available than simply two levels, it is often useful to visualize it as a point on the so-called Bloch sphere, which is a unit sphere in a three-dimensional Euclidean space with the north and south pole corresponding to the computational basis states |0⟩ and |1⟩, respectively (see subsection B.4 in the Appendix).
⁴ Other measurements are discussed in the Appendix (subsubsection B.2.3).
⁵ Some physical realizations of qubits are discussed in the Appendix (subsection B.3).
As a direct consequence of the postulates of quantum mechanics,⁶ quantum gates are represented by unitary operators $\hat{U}$, i.e., $\hat{U}^\dagger \hat{U} = \hat{I}$ [18,134]. This property implies that all quantum gates must be reversible, unlike classical logic gates. However, there are exceptions to such a rule. For example, measurements are transformations on qubits that are not required to be reversible.
Being unitary operators, gates can be represented in different ways. Although it might be easier to formalize them as matrices, typically the Dirac notation, i.e., the outer product among state vectors, is leveraged:
$$\hat{U} = \sum_{i,j=0}^{2^n - 1} u_{ij}\, |i\rangle\langle j|, \tag{2}$$
where $\{|i\rangle\}_{i=0,\ldots,2^n-1}$ is the currently used basis. Hence, in the simplest case of a single-qubit gate, it is represented by a $2 \times 2$ unitary matrix whose entries are completely specified by its action on the basis states. More formally, single-qubit gates describe transformations that belong to the Lie group of $2 \times 2$ unitary matrices with determinant 1, called the special unitary group of degree 2 ($SU(2)$) [18].
A particularly useful set of one-qubit gates are the Pauli gates⁷ that, together with the identity operator, span the vector space formed by all one-qubit unitary operators. In other words, any one-qubit gate can be expressed as a linear combination of the identity and the Pauli gates. It is worth mentioning that the Pauli gates $\hat{X}$, $\hat{Y}$, and $\hat{Z}$ correspond to rotations by $\pi$ around the $x$-, $y$-, and $z$-axes of the Bloch sphere, respectively. As an example, the $\hat{X}$ matrix represents an operation called the "bit flip", or NOT gate, which maps $|0\rangle$ to $|1\rangle$ and vice versa, in the computational basis:
$$\hat{X} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \hat{X}|0\rangle = |1\rangle, \quad \hat{X}|1\rangle = |0\rangle. \tag{3}$$
Interestingly, we can notice that the operation in Equation 3 resembles the NOT gate in classical computations.
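The properties of the Pauli gates mentioned above can be verified numerically. The following sketch checks unitarity, the bit-flip action, and the expansion of a one-qubit gate in the identity-plus-Pauli basis. The helper names `is_unitary` and `pauli_coefficients` are hypothetical, and the Hadamard gate is used only as a convenient test case.

```python
import numpy as np

I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def is_unitary(U):
    """Check U^dagger U = I up to floating-point tolerance."""
    return np.allclose(U.conj().T @ U, np.eye(U.shape[0]))

def pauli_coefficients(A):
    """Coefficients (c_I, c_X, c_Y, c_Z) with A = sum_P c_P P, using c_P = Tr(P A) / 2."""
    return np.array([np.trace(P @ A) / 2 for P in (I, X, Y, Z)])

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)

# Hadamard gate as a test case: it equals (X + Z) / sqrt(2).
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
coeffs = pauli_coefficients(H)
```

The trace formula works because the four matrices are mutually orthogonal under the inner product Tr(A†B), each with squared norm 2.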
Apart from the single-qubit gates, there are unitaries that involve two or more qubits. Perhaps, the most famous ones are the Controlled-NOT (CNOT) and Toffoli gates. Note that many multi-qubit gates are designed to perform "controlled" operations, meaning that an operation is executed on a qubit, named the target qubit, if another qubit (called control qubit) is in a specific state. For example, the CNOT gate applies the NOT gate operation to the target qubit only when the control one is in the state $|1\rangle$. It is formalized as:
$$\mathrm{CNOT}\,|c\rangle|t\rangle = |c\rangle|t \oplus c\rangle, \tag{4}$$
where $\oplus$ denotes addition modulo 2, and $|c\rangle$ and $|t\rangle$ are the control and target qubit, respectively.
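To make the controlled operation concrete, the sketch below writes the CNOT gate as a 4 × 4 matrix (with the first qubit as control, an ordering convention assumed here), tabulates its action on the computational basis, and shows that a control qubit in superposition yields an entangled Bell state. The helper `basis_state` is a hypothetical name.

```python
import numpy as np

# CNOT in the basis |00>, |01>, |10>, |11>, with the first qubit as control.
CNOT = np.array(
    [[1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0]],
    dtype=complex,
)

def basis_state(bits):
    """Computational basis vector for a bit string such as '10'."""
    state = np.zeros(2 ** len(bits), dtype=complex)
    state[int(bits, 2)] = 1.0
    return state

# Action on the four basis states: the target flips only when the control is 1.
truth_table = {}
for bits in ("00", "01", "10", "11"):
    out = CNOT @ basis_state(bits)
    truth_table[bits] = format(int(np.argmax(np.abs(out))), "02b")

# Control in the superposition (|0> + |1>) / sqrt(2), target in |0>.
plus = (basis_state("0") + basis_state("1")) / np.sqrt(2)
bell = CNOT @ np.kron(plus, basis_state("0"))
```

The resulting `bell` vector is (|00⟩ + |11⟩)/√2, which cannot be written as a tensor product of single-qubit states.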
An exciting result coming from the GM paradigm is that not all gates are "fundamental" for computations. Indeed, as it happens in the classical case, there is a set of quantum gates, named universal, that can be used to approximate to arbitrary accuracy any quantum circuit [134]. The Hadamard (H), Controlled-NOT (CNOT), phase (S), and $\pi/8$ (T) gates constitute a universal quantum computation set of gates. Although the S-gate can be constructed from the T-gate, it is typically considered an element of the universal set due to its extensive usage in fault-tolerant circuit construction.
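The remark that the S-gate can be constructed from the T-gate is easy to check numerically. The sketch below uses the standard matrix forms of the H, S, and T gates and verifies a few identities among them; the variable names are chosen for illustration.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
S = np.diag([1, 1j])                       # phase gate
T = np.diag([1, np.exp(1j * np.pi / 4)])   # pi/8 gate
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.diag([1, -1]).astype(complex)

s_from_t = T @ T        # S = T^2
z_from_s = S @ S        # Z = S^2
x_from_hz = H @ Z @ H   # X = H Z H
```

These identities show how other common gates arise as short products of elements of the universal set.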
As mentioned above, a circuit is made of a sequence of gates acting on a given set of qubits. To characterize a circuit, two metrics are typically reported, namely, the width and the depth. The first one refers to the number of qubits involved in the calculations, while the second one represents the longest path in the circuit, resembling the largest number of operations applied to a given qubit from the beginning to the end of the computations. In Figure 1, we report an example of a circuit in which we applied single- and two-qubit gates. Specifically, the circuit represents one of the most famous algorithms in the scientific community, called Deutsch's algorithm [59], which determines whether a given Boolean function is constant or balanced.
As we can see from Figure 1, a quantum circuit is depicted as a set of qubits, represented by lines, to which the gates (squares) are applied, and where the order from left to right represents the flow of time. However, it is worth noting that there is not an actual "flow" of the qubits in a quantum device. Indeed, the state of qubits at each given time represents the state of the QPU, while gates are the transformations applied to them to change the device's status.
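A small state-vector simulation can illustrate how such a circuit is evaluated gate by gate. The sketch below follows the textbook form of Deutsch's algorithm on two qubits (width 2, three layers of gates), which may differ in details from the circuit drawn in Figure 1; the function names `oracle` and `deutsch` are hypothetical.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I2 = np.eye(2)

def oracle(f):
    """Unitary U_f |x>|y> = |x>|y XOR f(x)> as a 4x4 permutation matrix."""
    U = np.zeros((4, 4))
    for x in (0, 1):
        for y in (0, 1):
            U[2 * x + (y ^ f(x)), 2 * x + y] = 1.0
    return U

def deutsch(f):
    """Classify f: {0,1} -> {0,1} with a single application of the oracle."""
    state = np.kron([1.0, 0.0], [0.0, 1.0])   # initial state |0>|1>
    state = np.kron(H, H) @ state             # layer 1: Hadamard on both qubits
    state = oracle(f) @ state                 # layer 2: two-qubit oracle
    state = np.kron(H, I2) @ state            # layer 3: Hadamard on the first qubit
    # Probability of measuring the first qubit in |1> (amplitudes of |10>, |11>).
    p_first_is_one = np.sum(np.abs(state[2:]) ** 2)
    return "balanced" if p_first_is_one > 0.5 else "constant"
```

For instance, `deutsch(lambda x: x)` returns "balanced" and `deutsch(lambda x: 0)` returns "constant", whereas a classical procedure needs two evaluations of the function to decide.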

Adiabatic Quantum Computation
In the previous section, we introduced the GM in which the qubits' state is evolved by applying a series of gates. In such a design, the time evolution is discretized since the different operations are performed at subsequent time instants. Quite differently, the Adiabatic approach to quantum computation [71] leverages a time-continuous evolution of the qubits' state according to the Schrödinger equation [159]. Moreover, it neither requires nor allows applying transformations, such as gates, directly on qubits. Instead, everything is encoded into a quantum operator, called Hamiltonian ($\hat{H}$), that describes the forces to apply to a given system of qubits to move it into the desired state over time. However, it is possible to show that these two approaches are polynomially equivalent [7,71] by using the technique introduced in [121]. Such a goal is reached by discretizing the evolution time interval and applying the Trotter formula [71] to each time segment.
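The idea behind the Trotter formula can be illustrated numerically: the evolution under a sum of two non-commuting Hamiltonians is approximated by alternating short evolutions under each term, and the error shrinks as the number of segments grows. The sketch below is a first-order example with units chosen so that ħ = 1; the choice of the Pauli X and Z matrices as the two terms and the function names are assumptions for illustration.

```python
import numpy as np

def expm_hermitian(H, t):
    """exp(-i H t) for a Hermitian matrix H via its eigendecomposition."""
    evals, evecs = np.linalg.eigh(H)
    return (evecs * np.exp(-1j * evals * t)) @ evecs.conj().T

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def trotter(A, B, t, r):
    """First-order Trotter approximation of exp(-i (A + B) t) with r segments."""
    step = expm_hermitian(A, t / r) @ expm_hermitian(B, t / r)
    return np.linalg.matrix_power(step, r)

exact = expm_hermitian(X + Z, 1.0)
# Spectral-norm error of the approximation for an increasing number of segments.
errors = [np.linalg.norm(trotter(X, Z, 1.0, r) - exact, 2) for r in (1, 10, 100)]
```

Each factor in `step` is a unitary that can be implemented as a gate, which is how a continuous evolution is mapped onto a discrete circuit.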
For a deep dive into the AQC, we refer the reader to the comprehensive and fascinating review by [11]. In what follows, we assume that the reader knows what a Hamiltonian is or at least its definition in classical mechanics. For our purposes, we need to consider that $\hat{H}$ can be interpreted as the energy operator, in the sense that its eigenvectors are the energy eigenstates, and its eigenvalues are the energies of the corresponding eigenvectors. Hence, as it will become obvious from the following discussion, in the AQC, any problem is recast as an energy minimization one. The key ingredient of the AQC is that $\hat{H}$ is not constant, rather it varies with time: $\hat{H} \to \hat{H}(t)$. The AQC derives its name from the Adiabatic Theorem [131], which describes how a system evolves when $\hat{H}(t)$ varies slowly enough over time⁸,⁹ [6, 35, 70].
We now briefly report what a "slowly enough varying" Hamiltonian means. As previously mentioned, AQC exploits the continuous time-evolution of a system of qubits within a time interval $[0, T]$, where $T$ represents the end of the adiabatic evolution. Commonly, the varying Hamiltonian is expressed as a one-parameter function, given by $\hat{H}(s)$ with $s = t/T \in [0, 1]$. Note that the eigenvalues of $\hat{H}(s)$, indicated as $E_k(s)$, represent different energy levels of the system ordered with increasing values of $k \geq 0$ (e.g., the ground state energy is $E_0(s)$). To ease the understanding of the following concepts, we indicate the eigenstate of $\hat{H}(s)$ related to the energy eigenvalue $E_k(s)$ as $|\psi_k(s)\rangle$, where we stressed the time dependence through the parameter $s$. Therefore, the instantaneous eigenstates of $\hat{H}(s)$ are the states satisfying $\hat{H}(s)|\psi_k(s)\rangle = E_k(s)|\psi_k(s)\rangle$. For example, the ground state of $\hat{H}(s)$, $|\psi_0(s)\rangle$, is characterized by $\hat{H}(s)|\psi_0(s)\rangle = E_0(s)|\psi_0(s)\rangle$. Given such definitions and restricting our interest to the ground state of $\hat{H}(s)$, the adiabatic theorem ensures that if the difference between the ground and the first excited state energies, $\Delta = \min_{0 \leq s \leq 1} \left[ E_1(s) - E_0(s) \right]$, remains large enough throughout the entire adiabatic evolution, then the system is likely to lie in the ground state of the instantaneous Hamiltonian. Specifically, such a requirement directly affects the length of the adiabatic evolution since $T \propto \Delta^{-2}$. As mentioned above, $\hat{H}(s)$ is interpolated between an initial ($\hat{H}_i$) and a final ($\hat{H}_f$) Hamiltonian, the latter also called the problem Hamiltonian. Specifically, the "problem" operator encodes the solution to a given problem in such a way that its ground state configuration, $|\psi_0(s = 1)\rangle$, expresses the solution to the optimization problem at hand.
⁸ For the formal definition and proof of the Adiabatic Theorem, we refer the reader to subsection B.7 in the Appendix.
⁹ The Adiabatic Theorem states that if a system is initially in the $k$-th state of well-defined energy, it will stay in this state when the Hamiltonian is changed sufficiently slowly [31]. However, we focus our attention on the ground state because, in adiabatic quantum computing, a problem is typically encoded as a Hamiltonian whose ground state is the problem solution.
Concerning the AQC, the strategy is first to prepare the qubits in the ground state of $\hat{H}_i$, and then the operator is evolved towards $\hat{H}_f$ by following the so-called adiabatic path $\hat{H}(s) = (1 - s)\hat{H}_i + s\hat{H}_f$. From a mathematical perspective, $\hat{H}_i$ and $\hat{H}_f$ can be defined in terms of the Pauli matrices. Concerning a system made of $n$ qubits, the initial Hamiltonian can be defined as:
$$\hat{H}_i = -\sum_{j=1}^{n} \hat{X}_j,$$
where $\hat{X}_j$ denotes the Pauli $\hat{X}$ operator acting on the $j$-th qubit. To obtain such a Hamiltonian, one couples a magnetic field, called the transverse or disordering field, in the $x$-direction to each quantum bit. The choice of such a shape for $\hat{H}_i$ relates to specific symmetries that the operator must satisfy to allow for the sought computations. For example, such a choice is mandatory to avoid the so-called "level crossing" [71]. Moreover, $\hat{H}_f$ is typically expressed in terms of the $\hat{Z}$ operator so that the final configuration of the qubit system is represented by eigenvectors of $\hat{Z}$, i.e., each qubit can be found either in the $|0\rangle$ or $|1\rangle$ state. Such a configuration is nothing more than a classical sequence of bits. Thus, by starting from an initial quantum superposition, the adiabatic process evolves the state into a classical configuration which can then be read to return the solution to the problem at hand. One of the first "proofs of concept" showing the effectiveness of such a computation model was realized in [70] for solving instances of the satisfiability problem. Apart from such a first example, many optimization problems can be recast as an energy minimization problem, such as the Grover search algorithm [71,72,80]. We report in Figure 2 a schematic representation of an adiabatic path representing Deutsch's algorithm [59]. The reader might compare such an image with Figure 1 to notice the difference between the two designs.
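The adiabatic strategy can be illustrated with a toy two-qubit simulation. The sketch below interpolates linearly between a transverse-field initial Hamiltonian and a diagonal problem Hamiltonian, computes the minimum spectral gap along the path, and integrates the Schrödinger equation for a given total time. Everything specific here is an assumption made for illustration: the sign convention of the initial Hamiltonian, the cost values on the diagonal, the units (ħ = 1), and the function names.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=float)
I2 = np.eye(2)

# Initial Hamiltonian: transverse field on each of the n = 2 qubits.
H_i = -(np.kron(X, I2) + np.kron(I2, X))
# Problem Hamiltonian: hypothetical costs of |00>, |01>, |10>, |11>.
# The lowest cost (0.0) marks the solution, here the bit string 10.
H_f = np.diag([3.0, 2.0, 0.0, 1.0])

def H(s):
    """Linear adiabatic path between the initial and the problem Hamiltonian."""
    return (1 - s) * H_i + s * H_f

def min_gap(num=201):
    """Smallest gap E_1(s) - E_0(s) on a grid of s values in [0, 1]."""
    gaps = []
    for s in np.linspace(0, 1, num):
        E = np.linalg.eigvalsh(H(s))
        gaps.append(E[1] - E[0])
    return min(gaps)

def evolve(T, steps=2000):
    """Piecewise-constant integration of the Schrodinger equation over [0, T]."""
    state = np.full(4, 0.5, dtype=complex)   # ground state of H_i
    dt = T / steps
    for k in range(steps):
        evals, evecs = np.linalg.eigh(H((k + 0.5) / steps))
        state = evecs @ (np.exp(-1j * evals * dt) * (evecs.conj().T @ state))
    return state

def success_probability(T):
    """Probability of reading out the solution bit string 10 after time T."""
    return float(np.abs(evolve(T)[2]) ** 2)
```

Comparing `success_probability(0.1)` with `success_probability(100.0)` shows the expected behavior: a fast sweep leaves the state close to the initial superposition, while a slow one ends near the ground state of the problem Hamiltonian.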

THE QUEST FOR QUANTUM SUPREMACY
The first claim about an attainable advantage of a hypothetical quantum computer over a classical one is attributed to Feynman [73]. However, at that time, such an idea concerned quantum simulations of physical systems rather than universal computations. Indeed, considering the simulation of a dynamical system characterized by $N = 2^n$ degrees of freedom, one can immediately see that a classical simulation living in a $2^n$-dimensional Hilbert space requires resources that grow exponentially with the size of the problem, while only polynomially if using a quantum device [121].
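A back-of-the-envelope calculation makes the exponential growth tangible. The sketch below estimates the memory needed to store a dense state vector classically, assuming 16 bytes per amplitude (double-precision complex numbers); the function name is hypothetical, and practical simulators may use more compact representations.

```python
def statevector_bytes(n_qubits, bytes_per_amplitude=16):
    """Memory needed for a dense state vector of 2**n complex amplitudes."""
    return (2 ** n_qubits) * bytes_per_amplitude

# Memory in GiB for a few register sizes.
memory_gib = {n: statevector_bytes(n) / 2 ** 30 for n in (10, 20, 30, 40, 50)}
```

Under these assumptions, 30 qubits already require 16 GiB, and every additional qubit doubles the requirement.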
However, soon after, researchers started to think about gaining an advantage over generic classical computations by leveraging quantum effects. Thus, the concepts of "quantum advantage" (or "quantum speedup") and "quantum supremacy" [86,145] appeared in the literature. Despite the fact that both terms refer to the same principle, loosely speaking, one can distinguish them considering what follows. The first one typically refers to the ability of a quantum computer to perform a given computation faster than a classical one. Differently, the second one assesses the ability of a quantum computer to find solutions to a problem not resolvable by a classical computer (or at least not in a reasonable amount of time). Thus, even though both terms express the same principle, sometimes they are used with slightly different purposes.
One of the most important phenomena for quantum computations is entanglement, which allows propagating the action on a qubit to others and compressing the resource requirements to describe a given state. Highly entangled states cannot be simulated efficiently by classical systems, therefore proving the advantage of quantum devices over classical ones. Interestingly, if there exist computational tasks beyond any classical computer's capability that can be solved with a universal quantum computer, it would be possible to refute the "extended Church-Turing thesis". As we mentioned previously, from an algorithmic point of view, the resources needed to represent the Hilbert space grow exponentially for a classical algorithm while only polynomially in the quantum case. Moreover, quantum phenomena such as entanglement, superposition, and tunnelling help to navigate such a vast space and should allow for a quantum speedup [4,85,121]. Concerning the GM, Deutsch [58] was among the first to show that a quantum circuit can be built of reversible quantum logic gates to compute any classical function defined on bits. In 1992, the first example of quantum advantage in computation was formalized in [59], in which the authors showed that for a specific class of problems, a quantum device required exponentially less time to solve them than any deterministic classical approach.
When talking about quantum advantage or speedup, two algorithms are typically cited: Grover's search [80] and Shor's factoring [171] algorithms. However, there is a fundamental difference in how they obtain the advantage. Indeed, while for the first one the "modest" quadratic speedup over the classical formulation is mathematically demonstrated, concerning the second one this is not the case. Specifically, all that can be said about Shor's algorithm is that it reaches an exponential speedup over the most efficient classical analogue available today. However, nothing forbids that a new classical formulation might reduce or eliminate such a gap.
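The quadratic speedup of Grover's search can be observed in a small state-vector simulation. The sketch below applies the standard oracle and diffusion steps roughly (π/4)√N times to a uniform superposition over N = 2^n items; the function name and the choice of marked index are illustrative assumptions.

```python
import numpy as np

def grover_success_probability(n_qubits, marked):
    """Run Grover iterations on a state vector and return (iterations, success probability)."""
    N = 2 ** n_qubits
    state = np.full(N, 1 / np.sqrt(N))             # uniform superposition
    iterations = int(np.floor(np.pi / 4 * np.sqrt(N)))
    for _ in range(iterations):
        state[marked] *= -1                        # oracle: phase flip on the marked item
        state = 2 * state.mean() - state           # diffusion: inversion about the mean
    return iterations, float(state[marked] ** 2)

iterations, probability = grover_success_probability(10, marked=123)
```

With N = 1024 items, about 25 oracle calls suffice to find the marked item with high probability, whereas a classical search needs on the order of N/2 = 512 queries on average.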
Generally speaking, there are several definitions of quantum speedup. "Provable quantum speedup" stems from the existence of a mathematical proof that there is not any classical algorithm that can perform better than the quantum one [25,80]. A different concept is expressed by the "strong quantum speedup" [139] which considers the performance of the best classical algorithm. Shor's algorithm [171] is an instance of such a definition. Indeed, although classical algorithms require super-polynomial cost in the number of digits, the proof of an exponential lower bound for classical factorization has not been found yet. Thus, one typically adopts the concept of quantum speedup by referring to the comparison among the best known classical algorithm, which might not be the best possible one, and its quantum counterpart.
Another class of problems probed to prove and harness a quantum advantage is the sampling problems' class [86], such as: constant-depth circuits [181], boson sampling [1], and random quantum circuits containing commuting and non-commuting gates [33,38,169]. Such examples are somehow in between the factoring algorithm [171] and analog quantum simulators [49,51,74].
In [153], the authors performed experiments on a "D-Wave Two" quantum annealer. As a benchmark, they tasked the QPU with finding the ground state of a 2D planar graph of an Ising spin glass model, which is known to be an NP-hard problem [17]. The authors compared the results from simulated annealing [112], simulated quantum annealing [127,157], and the actual quantum device from D-Wave [26,84,98,99]. However, they did not observe any evidence for a genuine quantum speedup from the experiments. Recently, Google claimed to have reached quantum supremacy [16]. In their work, the authors proposed an ad-hoc experiment involving a QPU containing 53 qubits and a circuit depth of 20 and claimed that a classical computer would have required thousands of years to simulate the obtained results. However, several research groups immediately replied to such a claim by showing that it was possible to simulate such a quantum computer in just slightly more than two days [140] or even less [199]. Such profoundly different results witness the difficulty that researchers typically face when trying to estimate the real power of Noisy Intermediate-Scale Quantum (NISQ) devices.
We conclude this brief overview of the concept of quantum supremacy and the various attempts to harness it by observing that, despite the tremendous efforts that scientists are devoting to embracing quantum phenomena, such a goal is still far from being reached. Moreover, it is clear that any claim about quantum supremacy must undergo a detailed analysis based on the most recent achievements in classical simulation algorithms [39,102,147].

QUANTUM LEARNING
The research on Quantum Neural Networks is a quest started more than twenty years ago [14,68,163,165]. Studies from Behrman et al. [21] and Toth et al. [184] are examples of seminal works that introduced the concept of QNNs.
Behrman et al. [21] described a mathematical model based on quantum dot molecules and showed by simulations that such a model was able to realize any classical logic gate. Instead, Toth et al. [184] proposed a more biologically inspired architecture, where quantum dots were coupled to form a cellular structure in which near-neighbor connectivity allowed the information to flow. The realization of a quantum analogue of an Artificial Neural Network (ANN), as an algorithm able to combine features from both the classical and quantum worlds, is a long-standing dream for the scientific community [163]. In the classical realm, a feedforward ANN is a universal approximator of continuous functions [95]. Thus, a QNN should at least satisfy such a requirement.
Classically, an ANN is composed of a sequence of layers. Each of them applies a specific mathematical transformation to its input and produces an output taken as input by the subsequent layer in the network. Moreover, the various layers are typically interleaved with non-linear functions to enhance the representation power of the ANN. Indeed, if only linear operations are considered, one could reduce the entire network to a single affine mapping. ANNs owe their name to their structure, loosely inspired by the human brain. Moreover, still based on a biological analogy, the basic building block of ANNs is the artificial neuron, called perceptron, a computational model proposed initially by Rosenblatt [154].
These algorithms typically deal with large amounts of data, aiming at finding patterns that describe them by exploring the parameters' space. Thus, they are natural candidates to leverage quantum phenomena that allow high-dimensional spaces to be navigated more efficiently.
It is worth mentioning that classical deep ANNs have been exploited to solve quantum problems in disciplines such as chemistry [118,144,167] and physics [40,47,135], among others. In such contexts, a DL model can be trained, for example, to predict the quantum mechanical wavefunction for a given many-body system [42], to predict the interatomic potential energy surfaces [176], or to identify distinct phases of matter and the transitions between them [43], to mention a few. Although this topic is fascinating, a detailed discussion about these techniques is out of scope for our work. Thus, in what follows, we will focus on approaches based on quantum models only.
Concerning the quantum realm, in 1994, Lewenstein [117] was among the first to propose a quantum-mechanical perceptron as an ideal basic element that processes input quantum states through unitary transformations. Since then, several studies have been conducted to formulate QNNs either as made up of basic cells such as QPs or as entirely new computing architectures. Indeed, the design of QNNs can be much different from that of classical ANNs, meaning that the former need not be made of several layers of QPs. For example, they can be realized as variational circuits (see subsection 5.4) without relying on the direct implementation of a basic cell resembling the quantum analogue of a perceptron.
In the following sections, we first introduce the concept of quantum embedding (subsection 5.1) and then we describe variational algorithms (subsection 5.2). Subsequently, we report several designs for QPs (subsection 5.3) and QNNs (subsection 5.4). Finally, we discuss trainability issues, e.g., barren plateaus (subsection 5.5). We focus primarily on the GM computational paradigm since it is currently the most widely adopted approach in such a context, and it offers a closer analogy to classical approaches, especially from a computer scientist's point of view. We also report some interesting formulations based on AQC. We leave a more detailed analysis of the results obtained with the AQC paradigm for future work.

Encodings of data
A quantum machine learning algorithm 10 , such as a QNN, accepts quantum states as input by its nature. Hence, classical data must be translated into quantum states and transferred from a classical memory to the quantum device, a process called "state preparation".
The basis and the quantum amplitudes encodings [160] are two fundamental examples of quantum data embeddings.
The basis encoding (also called bit encoding) associates each data input with a computational basis state of a qubit system. This is equivalent to saying that each data item $x \in \mathbb{R}$ is first transformed to a binary string $b$ of length $n$ and that each binary string is uniquely associated with a computational basis state of an $n$-qubit system, $|b\rangle$. For example, a data item whose binary string is $b = 0010010110$ is represented by the quantum state $|0010010110\rangle$ of 10 qubits. Amplitude encoding, instead, associates real vector components with quantum amplitudes, with the only caveat that the original real vectors must be normalized. Formally, given a vector $\mathbf{x} \in \mathbb{R}^{N}$, where $N = 2^{n}$, we have the following mapping:

$$\mathbf{x} \mapsto |\psi_{\mathbf{x}}\rangle = \frac{1}{\lVert \mathbf{x} \rVert} \sum_{i=1}^{N} x_i\, |i\rangle, \qquad (4)$$

where $|i\rangle$ represents the $i$-th computational basis state for an $n$-qubit system. Note that the amplitude encoding requires exponentially fewer qubits than the basis encoding to embed a vector into a quantum state. An entire dataset can be represented in the computational basis by considering the amplitude encoding of the concatenation of all the input data. Hence, the main advantage of this encoding is that it only requires $\log_2(MN)$ qubits to encode a dataset of size $M$ and dimensionality $N$ [160].
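The two embeddings can be sketched classically as follows. This is a minimal illustration under our own assumptions: the function names are hypothetical, the basis encoding is restricted to non-negative integers, and the sample value 150 is simply one number whose 10-bit string matches the state quoted above. The code only computes the labels and amplitudes that a state-preparation routine would have to load; it does not address how such a routine is compiled on hardware.

```python
import math

def basis_encode(value, n_qubits):
    """Return the bit string b labelling the computational basis state |b>."""
    if not 0 <= value < 2 ** n_qubits:
        raise ValueError("value does not fit in n_qubits bits")
    return format(value, "0{}b".format(n_qubits))

def amplitude_encode(vector):
    """Return the normalized amplitudes and the number of qubits they occupy."""
    n_qubits = max(1, math.ceil(math.log2(len(vector))))
    padded = list(vector) + [0.0] * (2 ** n_qubits - len(vector))  # pad to 2^n entries
    norm = math.sqrt(sum(v * v for v in padded))
    if norm == 0:
        raise ValueError("cannot encode the zero vector")
    return [v / norm for v in padded], n_qubits

bits = basis_encode(150, 10)                     # one basis state on 10 qubits
amps, n_qubits = amplitude_encode([3.0, 4.0, 0.0, 0.0])

# A hypothetical dataset with M = 4 items of dimension N = 2 is encoded by
# concatenating all items, which takes log2(M * N) = 3 qubits.
dataset = [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]]
_, dataset_qubits = amplitude_encode([v for row in dataset for v in row])
```

The contrast in qubit count is the point of the example: the basis encoding spends one qubit per bit of a single value, while the amplitude encoding stores a whole vector in the amplitudes of a logarithmic number of qubits.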

Variational Algorithms
Currently, the leading approach to the optimization of quantum circuits [22] exploits Variational Quantum Algorithms (VQA) [44,81,129,142], hybrid quantum-classical algorithms that leverage both quantum as well as classical components.
Their name is inspired by the well known result from physics called the Variational Principle, which provides an upper bound of the ground state's energy of a quantum system 11 . Generally speaking, the Variational Principle gives a recipe to construct a trial state (i.e., a quantum state parameterized by learnable parameters) for optimization purposes.
From a computer science point of view, one can formulate a VQA as a hybrid quantum-classical architecture comprising a set of quantum operations, the ansatz, that are applied to the initial qubits' state to produce a final state, called the trial state. The ansatz is characterized by a set of gates controlled by some classical parameters whose values are modified with the help of a classical procedure, e.g., gradient descent. Therefore, the design of such algorithms typically relies on three main components. The first one is the ansatz, also referred to as "parametrized quantum circuit" or "variational circuit", which sometimes includes the embedding of classical data into a quantum state. Subsequently, one must specify an objective function $\mathcal{L}_{\boldsymbol{\theta}}$, whose value is used to drive the optimization procedure. Finally, a classical optimization procedure is used to modify the ansatz's parameters that define the transformations applied to the qubits. Hence, it is clear that VQAs represent the practical embodiment of the idea of training quantum computers as we train classical neural networks. We report in Figure 3 a schematic view of a VQA.

Fig. 3. Schematic view of a VQA. The quantum device implements a parametrized quantum circuit $\hat{U}(\boldsymbol{\theta})$ that, given an initial state $|\Psi_0\rangle$, prepares the trial state $|\Psi(\boldsymbol{\theta})\rangle = \hat{U}(\boldsymbol{\theta})|\Psi_0\rangle$. Observable measurements on the trial state are used to return estimates of expectation values (e.g., the state of a certain qubit), which are used to evaluate an objective function $\mathcal{L}_{\boldsymbol{\theta}}$. A classical optimization algorithm (e.g., gradient descent) is employed to update the circuit parameters.
The first application of this class of methods is the Variational Quantum Eigensolver (VQE) [104,142,191], initially proposed to find the lowest energy state (the ground state) of a quantum system. Given the Hamiltonian $\hat{H}$ of a system, the Variational Principle tells us that the ground state $|\psi^{*}\rangle$ corresponds to the normalized state $|\psi\rangle$ that minimizes the quantity $\langle\psi|\hat{H}|\psi\rangle$ expressing the energy of the system in the state $|\psi\rangle$. That is, $|\psi^{*}\rangle = \operatorname{argmin}_{|\psi\rangle} \langle\psi|\hat{H}|\psi\rangle$, and $E_0 = \langle\psi^{*}|\hat{H}|\psi^{*}\rangle$. Therefore, the ground state can be approximated by optimizing the parameters of an ansatz $\hat{U}(\boldsymbol{\theta})$ in order to minimize the expectation value $\mathcal{L}_{\boldsymbol{\theta}} = \langle\Psi(\boldsymbol{\theta})|\hat{H}|\Psi(\boldsymbol{\theta})\rangle$, where $|\Psi(\boldsymbol{\theta})\rangle = \hat{U}(\boldsymbol{\theta})|\Psi_0\rangle$ for a given initial state $|\Psi_0\rangle$. In other words, given $\boldsymbol{\theta}^{*} = \operatorname{argmin}_{\boldsymbol{\theta}} \mathcal{L}_{\boldsymbol{\theta}}$, the state $|\Psi(\boldsymbol{\theta}^{*})\rangle$ approximates $|\psi^{*}\rangle$ and $\mathcal{L}_{\boldsymbol{\theta}^{*}}$ provides an upper bound for $E_0$. VQE can solve general optimization problems as long as the problem is formulated using a Hamiltonian such that its ground state corresponds to the solution of the original problem. For example, to find the binary solution $z$ that minimizes a cost function $C(z)$, one may first find a Hamiltonian $\hat{H}_C$ that encodes the cost function $C(z)$, i.e., $\hat{H}_C|z\rangle = C(z)|z\rangle$, and then use the VQE to approximate the ground state of this Hamiltonian.

11 We refer the interested reader to subsection B.6 in the Appendix, where we report the formal definition and proof of the Variational Principle.
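The hybrid loop behind a VQE can be illustrated with a single-qubit toy problem. The sketch below is only an assumption-laden example and not the procedure of any cited work: the Hamiltonian $A\,Z + B\,X$, the one-parameter ansatz $R_y(\theta)|0\rangle$, the learning rate, and all names are our own choices, and the "quantum device" is replaced by an exact classical evaluation of the expectation value. The gradient uses the parameter-shift rule, which holds for rotation gates generated by a Pauli operator.

```python
import math

A, B = 0.5, 1.2   # hypothetical Hamiltonian H = A*Z + B*X

def trial_state(theta):
    """Real amplitudes of the ansatz R_y(theta)|0>."""
    return (math.cos(theta / 2), math.sin(theta / 2))

def energy(theta):
    """Objective <Psi(theta)|H|Psi(theta)>, evaluated exactly."""
    c0, c1 = trial_state(theta)
    # <Z> = c0^2 - c1^2 and <X> = 2*c0*c1 for real amplitudes
    return A * (c0 * c0 - c1 * c1) + B * (2 * c0 * c1)

def parameter_shift_gradient(theta):
    """dE/dtheta from two shifted evaluations of the same circuit."""
    return 0.5 * (energy(theta + math.pi / 2) - energy(theta - math.pi / 2))

theta = 0.1
for _ in range(200):                      # classical gradient-descent loop
    theta -= 0.2 * parameter_shift_gradient(theta)

exact_ground_energy = -math.sqrt(A * A + B * B)   # smallest eigenvalue of A*Z + B*X
vqe_energy = energy(theta)
```

In line with the Variational Principle, `vqe_energy` never drops below `exact_ground_energy`; for this toy ansatz it converges to it because the ground state lies within the family of trial states.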
The Quantum Approximate Optimization Algorithm (QAOA) [69] is a famous variational algorithm that defines an ansatz to generate approximate solutions to a given problem. It was initially proposed to solve combinatorial optimization problems, and then it was generalized as a standard ansatz [44,83]. QAOA finds the trial state by repeatedly applying unitary evolutions according to a two-term Hamiltonian [69], namely, $\hat{H}_C$ and $\hat{H}_M$. The first one encodes the classical cost function $C(z)$ to be minimized, and it is typically diagonal in the computational basis. In contrast, the second one is the mixing Hamiltonian, which coherently moves the system through different configurations in the Hilbert space seeking the ground state of $\hat{H}_C$. Typically, $\hat{H}_M$ is a global transverse field, i.e., $\hat{H}_M = -\sum_i \hat{\sigma}^{x}_i$. Thus, given the problem Hamiltonian $\hat{H}_C$, one can use QAOA to find its ground state by applying the evolution operator $\hat{U}$:

$$\hat{U}(\boldsymbol{\gamma}, \boldsymbol{\beta}) = \prod_{k=1}^{p} e^{-i \beta_k \hat{H}_M}\, e^{-i \gamma_k \hat{H}_C}, \qquad (5)$$

where the set of angles $\{\gamma_k, \beta_k\}$ are the parameters to be optimized. Starting from an initial state $|\psi_0\rangle$, e.g., the uniform superposition of the basis states $H^{\otimes n}|0\rangle^{\otimes n}$ (with $H$ the Hadamard gate), one can evolve it by using Equation 5 and obtain $|\psi(\boldsymbol{\gamma}, \boldsymbol{\beta})\rangle$. The parameters are then adjusted so as to minimize the objective function expressed as:

$$\mathcal{L}(\boldsymbol{\gamma}, \boldsymbol{\beta}) = \langle\psi(\boldsymbol{\gamma}, \boldsymbol{\beta})|\hat{H}_C|\psi(\boldsymbol{\gamma}, \boldsymbol{\beta})\rangle. \qquad (6)$$

The parametrized evolution operator $\hat{U}(\boldsymbol{\gamma}, \boldsymbol{\beta})$ can be viewed as a layered variational circuit:

$$\hat{U}(\boldsymbol{\gamma}, \boldsymbol{\beta}) = \prod_{k=1}^{p} \hat{U}_M(\beta_k)\, \hat{U}_C(\gamma_k), \qquad \hat{U}_M(\beta_k) = e^{-i \beta_k \hat{H}_M}, \quad \hat{U}_C(\gamma_k) = e^{-i \gamma_k \hat{H}_C}. \qquad (7)$$

A fundamental ingredient for QAOA is the parameter $p$ in Equation 5, since it regulates the precision of the approximate solution. Increasing $p$ means reducing the "distance" from the optimal solution. However, for large $p$, decoherence starts to play a significant role in the computation since the ansatz will contain more gates. We refer the reader to the original paper [69] for a complete discussion about the impact of the parameter $p$ and about QAOA in general.
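A single QAOA layer ($p = 1$) can be simulated classically on a very small instance. The sketch below is a hedged illustration under our own assumptions: the problem is MaxCut on a hypothetical 4-node ring, the cost Hamiltonian is taken as minus the cut size so that minimizing the energy maximizes the cut, the mixer is $-\sum_i X_i$, and the classical optimizer is replaced by a plain grid search over the two angles. All names are illustrative.

```python
import cmath
import math

EDGES = [(0, 1), (1, 2), (2, 3), (3, 0)]   # hypothetical 4-node ring
N = 4

def cut_value(z):
    """Number of edges cut by the bit assignment encoded in the integer z."""
    return sum(((z >> i) & 1) != ((z >> j) & 1) for i, j in EDGES)

COST = [-cut_value(z) for z in range(2 ** N)]   # diagonal of H_C (to be minimized)

def apply_mixer(state, beta):
    """Apply exp(-i*beta*H_M) with H_M = -sum_i X_i, one qubit at a time."""
    c, s = math.cos(beta), 1j * math.sin(beta)  # exp(+i*beta*X) = c*I + s*X
    for q in range(N):
        state = [c * state[z] + s * state[z ^ (1 << q)] for z in range(2 ** N)]
    return state

def qaoa_energy(gamma, beta):
    """Expectation of H_C after one QAOA layer applied to the uniform superposition."""
    amp = 1.0 / math.sqrt(2 ** N)
    state = [amp * cmath.exp(-1j * gamma * COST[z]) for z in range(2 ** N)]
    state = apply_mixer(state, beta)
    return sum(abs(a) ** 2 * COST[z] for z, a in enumerate(state))

# Grid search stands in for the classical optimizer of the hybrid loop.
grid = [k * math.pi / 16 for k in range(17)]
best_energy, best_gamma, best_beta = min(
    (qaoa_energy(g, b), g, b) for g in grid for b in grid
)
best_cut = -best_energy
random_guess_cut = len(EDGES) / 2               # expected cut of a uniformly random assignment
```

On this instance one layer improves on random guessing without reaching the optimal cut of 4, which illustrates the role of $p$: deeper circuits can get closer to the optimum at the price of more gates.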
In the following two sections, we move to the topics of QPs and QNNs. Unlike classical learning approaches, quantum ansatzes are typically specialized either to build perceptrons or to represent an entire network. Following this observation, we first delve into QPs and then into full QNN proposals.

Quantum Perceptron
From the classical perspective, a perceptron is a computational model characterized by a non-linear response to its input and is parametrized as:

$$y = f(\mathbf{w}^{T}\mathbf{x} + b), \qquad (8)$$

where $\mathbf{x} \in \mathbb{R}^{n}$ is the input, $\mathbf{w} \in \mathbb{R}^{n}$ and $b \in \mathbb{R}$ are learnable parameters (weights and bias), and $f$ is a non-linear activation function. Depending on the value of $y$, the perceptron is said to be either active or at rest. One of the simplest forms of an artificial neural network is the Multilayer Perceptron (MLP), i.e., a network composed of multiple layers of perceptrons (called inner-product or fully-connected layers). Each layer is composed of many perceptrons whose action is a linear projection of the input followed by a non-linear element-wise activation function, i.e.,

$$\mathbf{y} = f(\mathbf{W}\mathbf{x} + \mathbf{b}), \qquad (9)$$

where $\mathbf{W} \in \mathbb{R}^{m \times n}$ and $\mathbf{b} \in \mathbb{R}^{m}$ are the weight matrix and the bias vector, respectively. The fully-connected layer is one of the basic building blocks for Deep Neural Networks, so the perceptron can be considered a "fundamental unit" of classical NNs. Therefore, the research in the field of QNNs initially focused on defining quantum algorithms capable of reproducing the functionalities of classical perceptrons, e.g., for classification tasks.
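For reference, the classical perceptron and the fully-connected layer described above can be written in a few lines. This is a minimal sketch in which the sigmoid is an arbitrary choice of activation and the inputs and weights are made-up values for illustration.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def perceptron(x, w, b, f=sigmoid):
    """y = f(w^T x + b) for a single neuron."""
    return f(sum(wi * xi for wi, xi in zip(w, x)) + b)

def dense_layer(x, W, b, f=sigmoid):
    """y = f(W x + b): one perceptron per row of the weight matrix W."""
    return [perceptron(x, row, bias, f) for row, bias in zip(W, b)]

x = [1.0, -2.0]
y_single = perceptron(x, w=[0.5, 0.25], b=0.0)
y_layer = dense_layer(x, W=[[0.5, 0.25], [1.0, 0.0]], b=[0.0, -1.0])
```

The non-linearity `f` is the ingredient that has no direct unitary counterpart, which is why the quantum proposals discussed next devote most of their effort to reproducing it.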
The implementation of the Quantum Perceptron is still an active research topic. We know that classically there is one reference implementation of such objects (Equation 8). However, due to the intrinsic differences among the available quantum hardware platforms, different formulations report advantages and disadvantages depending on the context in which they are used, thus allowing for distinct implementations equally valid in the respective fields of application. To ease the reader's understanding of the current state-of-the-art QPs, we report a summary in Table 2. It should be noted that some approaches have been implemented on real QPUs (e.g., IBM Q 5), while other methods have only been tested using numerical simulations so far.
Due to the linear, unitary, and non-dissipative nature of quantum transformations, one of the hardest challenges in designing a QP lies in implementing non-linear activation functions for quantum computations. Schuld et al. [165] proposed one of the first designs for a quantum perceptron model imitating the step-activation function of a classical binary perceptron. Their main idea was to encode a normalized version $\varphi \in [0, 1)$ of the input signal $\mathbf{w}^{T}\mathbf{x}$ into the phase of a quantum state of $n$ qubits and then use a phase estimation algorithm with a precision of $\tau$, implemented using an inverse quantum Fourier transform. They proved that measurements on the first qubit of the output quantum state could be used to estimate whether $\varphi$ was larger than $1/2$, hence reproducing the behavior of a step activation function.
In 2017, Cao et al. [41] proposed an approach based on the Repeat-Until-Success (RUS) [32,138] technique to synthesize quantum gates. As we mentioned previously, the classical perceptron evaluates the non-linear transformation $y = f(\mathbf{w}^{T}\mathbf{x} + b)$ that, once thresholded, defines the activation status of the perceptron. In such a context, $\theta = \mathbf{w}^{T}\mathbf{x} + b$ is the input signal to the neuron and $f$ represents the non-linear operation whose output lies in $[y_{\min}, y_{\max}]$, where typically $y_{\min} = -1$ or $0$ and $y_{\max} = 1$. In [41], such an operation is mapped to the quantum realm by defining the perceptron as a qubit whose state is defined as: $\hat{R}_y\!\left(\frac{y\pi}{2} + \frac{\pi}{2}\right)|0\rangle = \cos\!\left(\frac{y\pi}{4} + \frac{\pi}{4}\right)|0\rangle + \sin\!\left(\frac{y\pi}{4} + \frac{\pi}{4}\right)|1\rangle$, where $y \in [-1, 1]$ and $\hat{R}_y$ represents a rotation around the $y$-axis (the subscript labels the axis, not the activation) generated by the Pauli operator $\hat{\sigma}_y$, i.e., $\hat{R}_y(t) = e^{-i t \hat{\sigma}_y / 2}$. Note that, for the extreme values of $y$, one recovers a classical behavior, the system being in the state $|0\rangle$ or $|1\rangle$. Instead, in all other cases it behaves quantum-mechanically.
The quantity $y = f(\theta) = f(\mathbf{w}^{T}\mathbf{x} + b)$ is computed using a quantum circuit where the input data is encoded into a state vector $|x\rangle = |x_1 \cdots x_n\rangle$ acting as the control register over an ancilla qubit. Initially, for each $i$, a rotation $\hat{R}_y(2w_i)$ conditioned on $|x_i\rangle$, followed by a rotation $\hat{R}_y(2b)$, is applied on the ancilla qubit, which is equivalent to saying that $\hat{R}_y(2\theta)$ is applied on the ancilla qubit conditioned on the state $|x\rangle$. Then, the non-linearity is realized by using the rotation $\hat{R}_y(2f(\theta))$, which is approximated by a class of RUS [138] circuits, each implementing a rotation with an angle equal to $\arctan(\tan^{2}(\theta))$.
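The angle map realized by one RUS round, and the effect of iterating it, can be checked numerically. The sketch below is purely classical and illustrative: it does not simulate the RUS circuit itself, only the map $\theta \mapsto \arctan(\tan^{2}\theta)$ and the qubit state associated with an activation value; the function names and sample angles are our own assumptions.

```python
import math

def neuron_state(activation):
    """Amplitudes of R_y(activation*pi/2 + pi/2)|0> for an activation in [-1, 1]."""
    half_angle = activation * math.pi / 4 + math.pi / 4
    return (math.cos(half_angle), math.sin(half_angle))

def rus_map(theta):
    """Angle produced by one RUS round: arctan(tan^2(theta))."""
    return math.atan(math.tan(theta) ** 2)

def iterate_rus(theta, k):
    """Compose the RUS map k times."""
    for _ in range(k):
        theta = rus_map(theta)
    return theta

theta = 0.9
iterated = iterate_rus(theta, 3)
closed_form = math.atan(math.tan(theta) ** (2 ** 3))   # arctan(tan^(2^k)(theta)) with k = 3

# Angles below pi/4 are pushed towards 0 and angles above pi/4 towards pi/2,
# which is what makes the iterated map behave like a sigmoid-like threshold.
below = iterate_rus(0.7, 6)
above = iterate_rus(0.9, 6)
```

Composing the map $k$ times raises the tangent to the power $2^{k}$, so the transition around $\pi/4$ sharpens quickly with the number of rounds.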
Repeating the RUS circuit $k$ times gives a rotation with an angle $\arctan(\tan^{2^{k}}(\theta))$, which is a sigmoid-like, tangent-based, non-linear function. Stemming from the same approach, Hu [96] implemented a different RUS circuit to represent a sigmoid-based activation function, $f(\theta) = \arcsin(\sqrt{\mathrm{sigmoid}(\theta)})$, thus avoiding the drawbacks of using periodic functions, such as the tangent.

Table 2. Summary comparison among various quantum perceptron proposals. The symbol "nr" stands for "not reported", meaning that the authors did not explicitly report that specific information. In the column "Impl." we report whether the method was implemented on an actual QPU or tested using numerical simulations only (reporting in brackets the library used when the information was available).

Work | Formulation | Impl. | Code
Using the Gate Model
Schuld et al. [165] | Quantum circuit using inverse quantum Fourier transform to reproduce a step activation function. The input is encoded into an $n$-qubit state. | Simulation | ✗
Cao et al. [41] | Circuit based on RUS (using a tangent-based non-linear function). The $n$-dimensional input data is encoded into an $n$-qubit state. | Simulation | ✗
Hu [96] | Circuit based on RUS (using a sigmoid-based non-linear function). The $n$-dimensional input data is encoded into an $n$-qubit state. | IBM Q 5 | ✗
Du et al. [62] | Variational quantum circuit based on a parametrized Grover oracle learned with a classical optimization approach. | |
Wiebe et al. [194] | Quantum search problem, based on the version space interpretation of the perceptron and on Grover's algorithm. | nr | ✗
Liu et al. [120] | Given a supervised dataset, construct the weights' matrix as the tensor product of input and target and then apply SVD to it. | nr | ✗
Wiersema et al. [195] | Qubit represented as a density matrix that describes the classifier as a Boltzmann-like finite-temperature distribution. | |
Tacchino et al. [180] | Quantum variational circuit in which the $m$-dimensional input datum is encoded into the coefficients of an $N$-qubit state ($m = 2^{N}$). | |
Using the Adiabatic Model
Soloviev et al. [178] | Perceptrons and synapses are implemented as JJ-based superconducting circuits. | |

Wiebe et al. [194] adopted a very interesting and completely different perspective.
Indeed, the authors argued that the only way to unveil a quantum advantage is to formalize approaches that do not try to emulate classical algorithms in the quantum world; instead, a new point of view is needed. Thus, they based their approach on the Version Space interpretation of the perceptron and on Grover's search algorithm [80]. Specifically, they exploited a general method referred to as amplitude amplification [37], of which Grover's algorithm is a particular case. Using the Version Space, the authors pose the problem of training a perceptron as a search problem (i.e., they search for the optimal parameters rather than "learn" them), hence fully exploiting the quantum advantage offered by the amplitude amplification technique.

Manuscript submitted to ACM

Wiersema et al. [195] exploited the formalism of density matrices to cast the perceptron learning problem in terms of a quantum cross-entropy:

$$\mathcal{L}_{q} = -\sum_{\mathbf{x}} q(\mathbf{x})\, \mathrm{Tr}\{\eta_{\mathbf{x}} \ln \rho_{\mathbf{x}}\},$$

where $q(\mathbf{x})$ is the empirical distribution of observing the datum $\mathbf{x} \in \{1, -1\}^{d}$, $\eta_{\mathbf{x}} \equiv |\Psi\rangle\langle\Psi|$, with $|\Psi\rangle = \sqrt{q(y|\mathbf{x})}\,|0\rangle + \sqrt{q(-y|\mathbf{x})}\,|1\rangle$, is the density matrix associated with the distribution $q(y|\mathbf{x})$ of observing the label $y \in \{1, -1\}$ given $\mathbf{x}$, while $\rho_{\mathbf{x}} \equiv \rho(\mathbf{x}, \mathbf{w}; y)$ is the density matrix associated with the model conditional distribution $p(y|\mathbf{x}; \mathbf{w})$. Note that such a formulation resembles the classical cross-entropy objective:

$$\mathcal{L}_{cl} = -\sum_{\mathbf{x}} q(\mathbf{x}) \sum_{y} q(y|\mathbf{x}) \ln p(y|\mathbf{x}; \mathbf{w}).$$

The density matrix $\rho_{\mathbf{x}}$, which embodies the perceptron's description, represents a finite-temperature description of a qubit formulated as:

$$\rho_{\mathbf{x}} = \frac{1}{Z} \exp\!\Big(-\beta \sum_{k} h^{k} \hat{\sigma}^{k}\Big),$$

where the inverse temperature $\beta$ is set equal to $-1$, $\hat{\sigma}^{k}$ are the Pauli matrices, the term $h^{k} \in \mathbb{R}$ is evaluated as $\mathbf{w}_{k}^{T}\mathbf{x}$ (i.e., for each Pauli matrix there is one set of weights), and $Z$ is such that $\mathrm{Tr}\{\rho_{\mathbf{x}}\} = 1$. Thus, the perceptron state depends directly on the classical input data. When the dataset $\{\mathbf{x}\}$ is linearly separable, the convergence point is similar to the classical case. Instead, in the presence of label noise, there is a contribution also from Pauli operators other than $\hat{\sigma}^{z}$, which allows the quantum perceptron to perform better than its classical counterpart.
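The behavior described above can be explored with a small closed-form computation for a single datum. The sketch below rests on assumptions of our own rather than on the exact formulation of [195]: the perceptron state is taken as $\rho = e^{\mathbf{h}\cdot\boldsymbol{\sigma}}/Z$ (inverse temperature equal to $-1$), the data state is $|\Psi\rangle = \sqrt{q}\,|0\rangle + \sqrt{1-q}\,|1\rangle$, and the three fields are passed directly instead of being computed from weights and inputs. Under these assumptions $\ln\rho = \mathbf{h}\cdot\boldsymbol{\sigma} - \ln Z$ with $Z = 2\cosh\lVert\mathbf{h}\rVert$, so no matrix logarithm is needed.

```python
import math

def quantum_cross_entropy(q_plus, h):
    """-Tr(eta ln rho) for one datum, with rho = exp(h . sigma) / Z.

    q_plus is the empirical probability of the label +1; h = (hx, hy, hz)
    holds the three fields (hypothetical inputs of this sketch).
    """
    hx, hy, hz = h
    norm = math.sqrt(hx * hx + hy * hy + hz * hz)
    log_z = math.log(2.0 * math.cosh(norm))          # Z = Tr exp(h . sigma)
    c0, c1 = math.sqrt(q_plus), math.sqrt(1.0 - q_plus)
    # <Psi| h . sigma |Psi> for real amplitudes (the sigma_y term vanishes)
    expectation = hz * (c0 * c0 - c1 * c1) + hx * (2.0 * c0 * c1)
    return log_z - expectation

def classical_cross_entropy(q_plus, hz):
    """Cross-entropy of the classical model p(+1) = exp(hz) / (2*cosh(hz))."""
    p_plus = math.exp(hz) / (2.0 * math.cosh(hz))
    return -(q_plus * math.log(p_plus) + (1.0 - q_plus) * math.log(1.0 - p_plus))

q_plus = 0.8                                          # noisy label: +1 with probability 0.8
z_only = quantum_cross_entropy(q_plus, (0.0, 0.0, 0.5))
classical = classical_cross_entropy(q_plus, 0.5)
best_classical = classical_cross_entropy(q_plus, math.atanh(2 * q_plus - 1))
with_x_field = quantum_cross_entropy(q_plus, (2.4, 0.0, 1.8))
```

When only the $z$ field is used the two objectives coincide, whereas switching on an additional Pauli component lets the quantum loss fall below the best value reachable classically for a noisy label, in line with the qualitative statement above.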
Torrontegui et al. [183] based their approach on the similarity between the classical perceptron and a single qubit interpreted as a two-level system. Indeed, they implemented a perceptron with a sigmoid non-linear activation as a qubit that undergoes a unitary transformation, parametrized by an external input field $\hat{x}_j$, where $j$ is the label of the $j$-th perceptron of a layer, $\hat{x}_j$ is the quantum field generated by neurons in earlier layers, the transformation is expressed through Pauli matrices, and $f(\cdot)$ encodes the non-linear activation function. The authors formalized the implementation of such a model using an adiabatic process. Specifically, they proposed implementing the system's evolution as a single adiabatic path for an Ising model of interacting spins. The authors proved that, with such a definition, the perceptron acts as a universal approximator. Moreover, they reported how it is possible to treat such objects as the basic building block for large neural networks.
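In Liu et al. [120], the authors proposed a quantum algorithm based on unitary weights, suitable to implement a one-iteration learning approach. Given a supervised dataset, the weights' operator is constructed as the tensor product of the target and the input states, $\hat{W} = |y\rangle \otimes \langle x|$, to which a singular value decomposition is then applied.
Within the adiabatic model, Soloviev et al. [178] implemented perceptrons and synapses as Josephson-junction (JJ) based superconducting circuits. The current-phase relation is used to implement the non-linear response in such a design. Instead, the system's input is supplied through magnetic fluxes, to which the circuits, operating at a few kelvins, couple.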
Finally, in 2019, Tacchino et al. [180] proposed a binary perceptron model (where both input and weight vectors are limited to binary values) that can be implemented on NISQ [146] devices. To reduce the resource demand of the algorithm, the authors formalized a new procedure to generate multipartite entangled states based on the so-called hypergraph states [75,155]. First, the input vector $\mathbf{x} \in \{-1, 1\}^{m}$ and the weight vector $\mathbf{w} \in \{-1, 1\}^{m}$ are encoded on the quantum states $|\psi_{\mathbf{x}}\rangle$ and $|\psi_{\mathbf{w}}\rangle$, respectively, using $N = \log_2 m$ qubits (see Equation 4). Then, given the preparation state $|0\rangle^{\otimes N}$, the authors evolved the state employing two unitaries, $\hat{U}_{\mathbf{x}}$ and $\hat{U}_{\mathbf{w}}$, such that $\hat{U}_{\mathbf{x}}|0\rangle^{\otimes N} = |\psi_{\mathbf{x}}\rangle$ and $\hat{U}_{\mathbf{w}}|\psi_{\mathbf{w}}\rangle = |1\rangle^{\otimes N}$. The authors proved that the final $N$-qubit state $|\phi_{\mathbf{x},\mathbf{w}}\rangle = \hat{U}_{\mathbf{w}}|\psi_{\mathbf{x}}\rangle$, obtained by applying $\hat{U}_{\mathbf{w}}$ after $\hat{U}_{\mathbf{x}}$, contains the scalar product between the input and the weight vectors, up to a normalization factor, in its last amplitude coefficient, i.e., $\langle 1 \cdots 1|\phi_{\mathbf{x},\mathbf{w}}\rangle = \langle\psi_{\mathbf{w}}|\psi_{\mathbf{x}}\rangle = \frac{1}{m}\mathbf{w}^{T}\mathbf{x}$. Finally, the non-linearity was implemented as a measurement whose output represented the probability for the system of being activated by the given input pattern. The authors showed that, for any given weight vector $\mathbf{w}$, the perceptron was able to single out from the 16 possible input patterns only $\mathbf{x} = \mathbf{w}$ and its negative (with output probability of 1, i.e., the perfect activation of the neuron), while all other inputs gave outputs of at most 0.25.
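The activation pattern of such a binary perceptron can be reproduced with an ideal, noiseless calculation of the overlap between the two encoded states. The sketch below is only an illustration under our own assumptions: it evaluates the squared overlap directly instead of building the unitaries or the hypergraph-state circuits, it fixes $m = 4$ (two qubits), and the weight vector is an arbitrary example.

```python
import itertools
import math

M = 4                       # input dimension m = 2^N with N = 2 qubits

def encode(vector):
    """Amplitude-encode a vector of +/-1 entries with equal magnitudes."""
    return [v / math.sqrt(len(vector)) for v in vector]

def activation_probability(x, w):
    """Squared overlap |<psi_w|psi_x>|^2 = (w . x / m)^2 in the ideal case."""
    overlap = sum(a * b for a, b in zip(encode(w), encode(x)))
    return overlap ** 2

w = (1, -1, -1, 1)          # hypothetical weight vector
probabilities = {
    x: activation_probability(x, w)
    for x in itertools.product((-1, 1), repeat=M)
}
perfect = sorted(x for x, p in probabilities.items() if abs(p - 1.0) < 1e-12)
others = [p for x, p in probabilities.items() if x not in perfect]
```

Because the measurement probability depends on the square of the scalar product, a pattern and its global sign flip are indistinguishable, while every other pattern differs from the weights in at least one entry and is therefore strongly suppressed.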

Quantum Neural Networks
Before delving into the most recent achievements in the field of QNNs, we briefly mention a few seminal works that, although published several years ago, played a relevant role in advancing the discipline at hand. One of the first proposals concerning a hybrid algorithm for QNNs dates back to 2000 with the work of Narayanan and Menneer [133].
Using simulations, the authors compared the performance of a classical artificial network in which, from one experiment to the next, different components were substituted with their quantum analogues. Interestingly, the authors found that a fully quantum network did not report any tangible advantage compared to a "partially" quantum one. Ventura and Martinez [185] proposed a quantum version of associative memories to harness the exponential increase in storage capacity, with respect to classical algorithms, that originates from the superposition effects allowed by quantum mechanics. To this end, the authors exploited a generalization of Grover's algorithm [80] able to fulfill the search and completion tasks.
Parametrized Quantum Circuits (PQCs), which include variational quantum algorithms, are the most commonly adopted choice to design quantum neural networks. In such a context, the qubits' states are manipulated by unitaries characterized by a set of parameters that are classically learned at training time. Finally, the value of a given observable is accessed through a measurement that outputs a discrete quantity. Even if we focus on those algorithms in what follows, it is worth mentioning that another class of quantum algorithms exploits continuous variables, namely Continuous-Variable (CV) [34,110] models. These models do not use qubits to store quantum information. Instead, they exploit continuous degrees of freedom such as the amplitude of the electromagnetic field, thus making them suited for photonic hardware.
In what follows, we review the state-of-the-art QNN models. To ease the comparison among the analyzed approaches, we report in Table 3 a summary of the most relevant properties of the models.

5.4.1 The expressivity of PQC models. Classical feedforward networks have a layered structure where each layer processes the previous layer's output using functions depending on some trainable parameters. Since a layered structure with learnable parameters also characterizes PQC-based algorithms, they have been widely used as quantum versions of neural networks. Typically, each layer of a PQC accounts for three types of blocks: data-encoding circuit blocks $\hat{S}(\mathbf{x})$ (depending on the input variable), parametrized circuit blocks $\hat{W}(\boldsymbol{\theta})$ (depending on a set of trainable parameters), and non-parametrized circuit blocks $\hat{V}$ (e.g., entangling operators). Given these blocks, a PQC with $L$ layers is usually described by a unitary $\hat{U}(\mathbf{x}; \Theta) = \prod_{l=1}^{L} \hat{V}_l\, \hat{W}_l(\boldsymbol{\theta}_l)\, \hat{S}_l(\mathbf{x})$. The prediction from such a model is then evaluated as the expectation value of an observable $\hat{O}$, estimated over multiple runs of the circuit, with respect to the final state of the quantum circuit, e.g., $y_{\mathrm{out}}(\mathbf{x}; \Theta) = \langle\psi_0|\hat{U}^{\dagger}(\mathbf{x}; \Theta)\, \hat{O}\, \hat{U}(\mathbf{x}; \Theta)|\psi_0\rangle$, where $|\psi_0\rangle$ is some initial state of the quantum system. Different choices for the implementation of the data encoding, parametric and non-parametric circuit blocks, and their mutual interaction correspond to different QNN architectures. Hereafter, to ease the notation, we merge the non-parametrized blocks with the parametrized ones, i.e., we will use $\hat{W}(\boldsymbol{\theta})$ to denote the composition $\hat{V}\hat{W}(\boldsymbol{\theta})$, referred to as a trainable block.

Table 3. Comparison among the analyzed quantum neural network designs. The symbol "nr" stands for "not reported", meaning that the authors did not explicitly report that specific information or that the specific attribute cannot be applied to the referred work. Test Datasets: we reported only real-world and publicly available datasets (synthetic datasets are omitted in the table); * a subset of the dataset is used; + we only considered the Quantum Graph Recurrent Neural Networks.
Inf. Theory | Q | Romero et al. [152] | QAE for quantum data compression
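The layered structure of a PQC model (data encoding, trainable block, fixed entangling block, followed by an expectation value) can be simulated exactly for two qubits. The sketch below is a minimal illustration under our own assumptions: angle encoding with $R_y$ rotations, one trainable $R_y$ rotation per qubit, a CNOT as the non-parametrized block, and the Pauli $Z$ on qubit 0 as the observable. These choices and all names are hypothetical and do not correspond to a specific architecture in the literature.

```python
import math

def apply_ry(state, qubit, angle):
    """Apply R_y(angle) to one qubit of a real 2-qubit state vector."""
    c, s = math.cos(angle / 2), math.sin(angle / 2)
    new = list(state)
    for z in range(4):
        if not (z >> qubit) & 1:            # z has the target bit equal to 0
            z1 = z | (1 << qubit)           # partner index with the target bit equal to 1
            new[z] = c * state[z] - s * state[z1]
            new[z1] = s * state[z] + c * state[z1]
    return new

def apply_cnot(state):
    """CNOT with qubit 0 (lowest bit) as control and qubit 1 as target."""
    new = list(state)
    new[1], new[3] = state[3], state[1]     # swap the two states whose control bit is 1
    return new

def pqc_output(x, thetas):
    """<Z_0> after the layers V W(theta_l) S(x) are applied to |00>."""
    state = [1.0, 0.0, 0.0, 0.0]
    for theta in thetas:                    # one pair of trainable angles per layer
        for q in range(2):
            state = apply_ry(state, q, x[q])        # S(x): data-encoding block
        for q in range(2):
            state = apply_ry(state, q, theta[q])    # W(theta): trainable block
        state = apply_cnot(state)                   # V: fixed entangling block
    # Z on qubit 0 contributes +1 when bit 0 is 0 and -1 otherwise
    return sum((1 - 2 * (z & 1)) * a * a for z, a in enumerate(state))

y_out = pqc_output(x=[0.3, -1.1], thetas=[[0.1, 0.2], [0.4, -0.5]])
```

On hardware, the final sum would be replaced by an average over repeated measurements, and the angles in `thetas` would be updated by a classical optimizer.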
An important difference between this type of QNNs and classical NNs is that the information does not flow only from one layer to the next. Instead, the input data $\mathbf{x}$ is uploaded at each input layer $\hat{S}(\mathbf{x})$ (see, e.g., Figure 4a). Such a design directly impacts the class of functions the PQC can approximate [76,141,166]. In particular, the frequency spectrum of the functions that the model can represent grows linearly with the number of data encoding repetitions: the more times the original data is loaded, the richer the frequency spectrum will be. Finally, note that there are mainly two ways of repeating the data encoding: in series (by uploading the inputs several times while quantum computations proceed, as discussed above) or in parallel (by repeating the encoding on different subsystems at the same layer, so that the circuit is shallower but uses more qubits) [166]. Connections between Fourier series and PQC models with repeated data encoding have also been established by Gil Vidal and Theis [76].
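The link between encoding repetitions and accessible frequencies can be verified numerically on a single qubit. In the sketch below, which relies on our own illustrative choices, each layer applies the encoding $R_x(x)$ followed by a trainable $R_y(\theta_l)$, the output is $\langle Z\rangle$, and the spectrum is obtained with a plain discrete Fourier transform over one period. The function names and the sample angles are assumptions.

```python
import cmath
import math

def rx(angle):
    c, s = math.cos(angle / 2), math.sin(angle / 2)
    return [[c, -1j * s], [-1j * s, c]]

def ry(angle):
    c, s = math.cos(angle / 2), math.sin(angle / 2)
    return [[c, -s], [s, c]]

def apply(gate, state):
    return [gate[0][0] * state[0] + gate[0][1] * state[1],
            gate[1][0] * state[0] + gate[1][1] * state[1]]

def model(x, thetas):
    """<Z> of one qubit after len(thetas) layers R_y(theta_l) R_x(x) applied to |0>."""
    state = [1.0 + 0j, 0j]
    for theta in thetas:
        state = apply(ry(theta), apply(rx(x), state))
    return abs(state[0]) ** 2 - abs(state[1]) ** 2

def fourier_coefficients(thetas, samples=16):
    """Discrete Fourier coefficients c_n of x -> model(x, thetas) on [0, 2*pi)."""
    xs = [2 * math.pi * k / samples for k in range(samples)]
    values = [model(x, thetas) for x in xs]
    return [sum(v * cmath.exp(-1j * n * x) for v, x in zip(values, xs)) / samples
            for n in range(samples // 2)]

def max_frequency(thetas, tol=1e-9):
    """Largest integer frequency with a non-negligible coefficient."""
    coeffs = fourier_coefficients(thetas)
    return max(n for n, c in enumerate(coeffs) if abs(c) > tol)

one_upload = max_frequency([0.0])
two_uploads = max_frequency([0.0, 0.0])
three_uploads = max_frequency([0.7, -0.4, 1.3])
```

With all trainable angles set to zero the model reduces to $\cos(rx)$ for $r$ repetitions, which makes the linear growth of the highest frequency explicit; the trainable blocks only reshape the coefficients of the frequencies that the encoding makes available.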
Similar results on the expressivity of PQCs were reported by Perez-Salinas et al. [141], who showed that it is possible to use a single-qubit quantum circuit to implement a universal quantum classifier. The Universal Approximation Theorem (UAT) [95] represents a milestone in the context of machine learning. Practically speaking, it ensures that a neural network comprising a single hidden layer with enough units can approximate any continuous function.
Interestingly, Perez-Salinas et al. [141] presented an equivalence between UAT and the data re-uploading strategy. In their framework, data are uploaded by single-qubit rotations followed by another set of unitaries representing the evolution of the quantum state. These two operations can be compressed into one by defining quantum layers with tunable parameters as $L(i) = \hat{U}(\boldsymbol{\theta}_i + \mathbf{w}_i \circ \mathbf{x})$, where $\mathbf{x}$ represents the input data, $\mathbf{w}_i$ are weights that play a similar role as in artificial neural networks, while $\boldsymbol{\theta}_i$ are a set of variational parameters. Although, at first sight, it seems that there is no non-linearity in such a construction, the authors reported that the non-linearities come directly from the structure of the gates used.

In 2018, Schuld et al. [161] proposed a low-depth parametrized quantum circuit acting as a binary classifier. Their approach is particularly relevant since it was among the first to apply a quantum ansatz to solve machine learning problems using variational circuits. The proposed circuit uses a data-encoding block $\hat{S}(\mathbf{x})$, based on the amplitude encoding technique, to embed classical input data into the state of a quantum system, which is then evolved by a trainable ansatz $\hat{W}(\boldsymbol{\theta})$. The circuit ansatz accounted for a set of single-qubit and singly-controlled single-qubit gates divided into the so-called "cyclic codes". Once the initial state is evolved, the first qubit is measured, and the probability of finding it in the state $|1\rangle$ is used to infer the input label. Specifically, such a probability is given by:

$$p(q_0 = 1 \mid \mathbf{x}; \boldsymbol{\theta}) = \sum_{k \in K_1} \left| \langle k|\hat{W}(\boldsymbol{\theta})|\psi_{\mathbf{x}}\rangle \right|^{2},$$

where $|\psi_{\mathbf{x}}\rangle = \hat{S}(\mathbf{x})|0\rangle^{\otimes n}$ and $K_1$ is the set of computational basis states in which the first qubit is $|1\rangle$.

Instead, Beer et al. [20] proposed a quantum MLP, also referred to as quantum dissipative neural network. In their model, each quantum perceptron is formulated as a unitary operator acting on input and output qubits. The idea was to stack this kind of perceptrons to form "layers" which emulated the behavior of a classical deep model. Each layer interacts with the following one before being dissipated (discarded); the interaction is implemented by coupling the qubits in the $l$-th layer to some ancillary qubits in the $(l+1)$-th layer (Figure 4c). The overall QNN can then be described as $\hat{U} = \prod_{l} \hat{U}^{l}$, where $\hat{U}^{l} = \prod_{j} \hat{U}^{l}_{j}$ represents the set of unitaries of layer $l$, i.e., the QPs of that specific layer. The QNN is applied to the input state, $\rho_{\mathrm{in}}$, and produces an output state, $\rho_{\mathrm{out}}$. To assess the performance of their approach, the authors used numerical simulations with a QNN having input and output spaces of 2 and 3 qubits. To train their model, they employed a gradient descent technique, while as a loss function they employed the fidelity between the output state and the desired (ground-truth) state, averaged over the training data.
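The two ingredients of a dissipative layer, namely coupling the input to fresh output qubits and then discarding the input, together with the fidelity-based cost, can be sketched for the smallest possible case. The example below is an illustration under our own assumptions: one input qubit, one output qubit initialized in $|0\rangle$, a CNOT standing in for a trained perceptron unitary, and two made-up training pairs.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def dagger(a):
    return [[complex(a[j][i]).conjugate() for j in range(len(a))] for i in range(len(a[0]))]

def layer_output(rho_in, unitary):
    """Tr_in[ U (rho_in (x) |0><0|) U^dagger ] for one input and one output qubit."""
    joint = [[0j] * 4 for _ in range(4)]
    for i in range(2):
        for j in range(2):
            joint[2 * i][2 * j] = rho_in[i][j]        # basis order |in, out>, output in |0>
    evolved = matmul(matmul(unitary, joint), dagger(unitary))
    # discard ("dissipate") the input qubit with a partial trace
    return [[evolved[a][b] + evolved[2 + a][2 + b] for b in range(2)] for a in range(2)]

def fidelity(target, rho):
    """<phi|rho|phi> for a pure target state phi and a density matrix rho."""
    dim = len(target)
    return sum((complex(target[i]).conjugate() * rho[i][j] * target[j]).real
               for i in range(dim) for j in range(dim))

CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]   # hypothetical perceptron unitary
plus = [2 ** -0.5, 2 ** -0.5]
training_pairs = [
    ([[0, 0], [0, 1]], [0.0, 1.0]),                   # input |1><1|, desired output |1>
    ([[0.5, 0.5], [0.5, 0.5]], plus),                 # input |+><+|, desired output |+>
]
cost = sum(fidelity(target, layer_output(rho_in, CNOT))
           for rho_in, target in training_pairs) / len(training_pairs)
```

The second pair shows the effect of dissipation: the CNOT entangles the two qubits, so tracing out the input leaves a maximally mixed output and the fidelity drops to one half, which is the kind of signal a training procedure would use to adjust the unitary.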
Graph Neural Networks [179] were introduced to exploit the geometrical structure in the input data and accounted for the application of neural networks to acyclic graphs. Based on a similar principle, a general PQC was proposed by Verdon et al. [187]. The authors train the model parameters by using the Adam [111] optimizer to minimize the average infidelity between $|\psi_T\rangle$ and $\hat{U}(\Delta, \theta)|\psi_0\rangle$, where $|\psi_T\rangle = e^{-iT\hat{H}_{\text{target}}}|\psi_0\rangle$ is the state evolved from $|\psi_0\rangle$ for a time $T$, $\hat{U}(\Delta, \theta)|\psi_0\rangle$ is the state obtained by evolving $|\psi_0\rangle$ according to the QGRNN for $\sim T/\Delta$ iterations, with $\Delta$ being a hyperparameter determining the Trotter step size, and the average is evaluated over batches of different times $T$.

CNN-inspired Quantum Neural Networks. Convolutional Neural Networks (CNNs) are feedforward networks
particularly suitable for data that have a grid-like topology such as images or time-series data. This class of models owes its name to the use of convolutional operations in at least one of their layers. A convolutional layer takes a tensor as input and processes it using a set of trainable kernels (filters) that slide along the input tensor, computing the inner product between the filter and the input entries, producing as output the so-called feature map. Each element of the output is then transformed by applying a non-linear activation function. Finally, a pooling operation is typically used to replace portions of the feature map with a summary statistic (e.g., max, average, sum) -which makes it more manageable in the successive layers (since the dimension is reduced), the network more resilient to small input changes, and helps to avoid overfitting issues. Convolutional layers can then be followed by fully-connected layers where each perceptron receives information from all the units of the previous layer. This latter typology of a layer is also used to produce the output of the network, e.g. a vector containing the probabilities that the input data belong to certain classes for classification tasks or real values for regression tasks.
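The classical pipeline just described (sliding kernel, non-linear activation, pooling) can be sketched in a few lines of NumPy. This is a minimal single-channel illustration with no padding and unit stride; the function names and the toy image and kernel are assumptions made here for the example.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide the kernel over the image and take the inner product at each position."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            feature_map[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return feature_map


def relu(x):
    """Elementwise non-linear activation."""
    return np.maximum(x, 0.0)


def max_pool(feature_map, size=2):
    """Replace each non-overlapping size x size block with its maximum."""
    h = feature_map.shape[0] // size * size
    w = feature_map.shape[1] // size * size
    trimmed = feature_map[:h, :w]
    return trimmed.reshape(h // size, size, w // size, size).max(axis=(1, 3))


image = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 "image"
kernel = np.array([[-1.0, 0.0], [0.0, 1.0]])       # toy diagonal-difference filter

feature_map = relu(conv2d_valid(image, kernel))     # shape (4, 4)
pooled = max_pool(feature_map)                      # shape (2, 2)
print(feature_map.shape, pooled.shape)
```

The pooling step halves each spatial dimension, which is the dimensionality reduction referred to in the text.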
Given the success of deep CNNs on the task of pattern recognition [113,114], many researchers proposed quantum algorithms inspired by these models. In [106,119], the authors proposed quantum models, running on actual quantum hardware, capable of performing all the classical CNN operations. The main limitation of these approaches is that they usually require a QRAM [77] to interface classical data and quantum states and may suffer from the classical data encoding bottleneck. Moreover, the overall algorithm may require subroutines that are not easy to implement with many qubits (e.g., quantum phase estimation). For example, Li et al. [119] proposed the Quantum Deep Convolutional Neural Network (QDCNN), based on a PQC, whose architecture consists of several successive quantum convolutional layers and a quantum classifier layer (Figure 4b). Their model does not implement a quantum version of the pooling layer; however, the dimension of the generated feature maps is reduced by preserving some qubits in the location register and disentangling the other ones. Quantum measurement is performed to output a category prediction, and a hybrid quantum-classical training algorithm is used to optimize the parameters of the circuit. The classical input (e.g., an image) and the convolutional kernels are prepared using a QRAM. Similarly, Kerenidis et al. [106] proposed to vectorize the convolution operations in a CNN so that linear-algebra subroutines exploiting quantum phenomena can carry out the calculation. To this end, the authors proposed algorithms for both the forward and backward passes used while training a QNN. Data and convolutional kernels are stored in QRAM registers and retrieved when needed.
Henderson et al. [92] proposed a hybrid classical-quantum approach to empower a classical CNN with layers containing filters implemented as quantum circuits, called quanvolutional layers, that can be stacked on top of any layer of a traditional CNN (see Figure 4d). The resulting architecture was named Quanvolutional Neural Network (QVNN).
Such a proposal stemmed from the observation that, on the one side, machine learning algorithms benefit from random non-linear features in terms of accuracy and training time, while on the other, quantum circuits can model complex functional relationships. Each quanvolutional layer is characterized by a certain number of filters, designed as a sequence of single-qubit or single-controlled qubit gates, and produces feature maps by locally transforming input data. Specifically, each circuit is built by considering each qubit as a node in a graph and assigning what the authors named a "connection probability" between each pair of qubits. A key difference between this formulation of the QVNN and others is the lack of variational tuning of the circuits' parameters (the quanvolutional filters use random quantum circuits).
Interestingly, such an architecture does not require a QRAM, thus reducing the overhead due to its usage, and it is compatible and directly integrable with existing classical architectures. QVNNs were tested against the MNIST dataset using the QxBranch Quantum Computer Simulation System, and, unfortunately, the authors did not find any advantage over the classical counterpart. Indeed, the performance of the QVNN was indistinguishable from that of a model using a classical random non-linear transformation instead of the quanvolutional layer.
In Grant et al. [78], the authors exploited the hierarchical structure of tensor networks to represent quantum circuits. Specifically, they employed tree tensor networks [170] and the multi-scale entanglement renormalization ansatz (MERA) [188] to implement a quantum classifier subsequently tested against the Iris, MNIST, and synthetic quantum datasets. The classifiers were implemented on the ibmqx4 quantum computer and trained using the mean squared error loss evaluated between the circuit's output and the expected one. Specifically, the circuit's output was the expected outcome of a measurement on the target qubit: $\langle \hat{M}(|\psi\rangle) \rangle = \langle\psi|\hat{U}^{\dagger}\hat{M}\hat{U}|\psi\rangle$. To minimize the objective function, the authors employed the Adam [111] optimizer. As an example of hierarchical structure, the circuit inspired by tree tensor networks begins by applying a set of two-qubit unitaries between nearest-neighbor qubits. After each operation, one qubit is discarded, thus halving the total number of qubits passed to the next layer. After a sequence of such unitaries, the state of the final remaining qubit is measured.
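To make the tree-like structure concrete, the following NumPy sketch simulates a four-qubit circuit of this kind on a state vector. It is an illustrative toy under several assumptions that are not taken from [78]: the two-qubit unitaries are drawn at random rather than trained, the measured observable is assumed to be Pauli-Z on the last qubit, and "discarding" a qubit is modeled by simply never acting on it again. All function names are hypothetical.

```python
import numpy as np

N_QUBITS = 4  # toy register size, chosen here for illustration


def random_unitary(dim, rng):
    """Random unitary from the QR decomposition of a complex Gaussian matrix."""
    z = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, _ = np.linalg.qr(z)
    return q


def apply_two_qubit(state, unitary, q0, q1):
    """Apply a 4x4 unitary to qubits (q0, q1) of an N_QUBITS state vector."""
    psi = np.moveaxis(state.reshape([2] * N_QUBITS), [q0, q1], [0, 1])
    shape = psi.shape
    psi = (unitary @ psi.reshape(4, -1)).reshape(shape)
    return np.moveaxis(psi, [0, 1], [q0, q1]).reshape(-1)


def expectation_z(state, qubit):
    """Expected value of Pauli-Z on one qubit (all other qubits are traced out)."""
    probs = np.abs(state.reshape([2] * N_QUBITS)) ** 2
    other_axes = tuple(a for a in range(N_QUBITS) if a != qubit)
    marginal = probs.sum(axis=other_axes)
    return float(marginal[0] - marginal[1])


def ttn_output(state, unitaries):
    """Layer 1 acts on pairs (0,1) and (2,3); layer 2 acts on the survivors (1,3)."""
    state = apply_two_qubit(state, unitaries[0], 0, 1)
    state = apply_two_qubit(state, unitaries[1], 2, 3)
    state = apply_two_qubit(state, unitaries[2], 1, 3)
    return expectation_z(state, 3)


def mse_loss(predictions, targets):
    """Mean squared error between circuit outputs and expected labels."""
    return float(np.mean((np.asarray(predictions) - np.asarray(targets)) ** 2))


rng = np.random.default_rng(0)
unitaries = [random_unitary(4, rng) for _ in range(3)]
ket0000 = np.zeros(2 ** N_QUBITS, dtype=complex)
ket0000[0] = 1.0

prediction = ttn_output(ket0000, unitaries)
loss = mse_loss([prediction], [1.0])
print(prediction, loss)
```

In an actual training loop the three unitaries would be parametrized and updated by a classical optimizer such as Adam; here they are fixed so that only the forward pass and the loss evaluation are shown.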

5.4.3 Hybrid approaches for different learning paradigms. Jaderberg et al. [97] proposed a VQA to solve a classification task by employing a self-supervised training approach [55]. They formulated the ansatz to be part of a hybrid encoder network tested against the classification task on five classes of the CIFAR10 dataset. To train the model, the authors exploited the contrastive learning procedure [89,136] applied to the representations generated by their models. As the objective function, they used the normalized temperature-scaled cross-entropy loss [177]. As mentioned, the QNN was part of a hybrid network whose first part was classical. Specifically, the overall architecture began with a ResNet-18 [90] followed by a single-layer convolutional network that compressed the feature representation down to an eight-dimensional vector, v, which was then encoded in a state $|\psi_v\rangle$ comprising $n = 8$ qubits. The quantum data encoding is achieved by applying a single-qubit rotation $\hat{R}$ to each qubit of the register, i.e., $|\psi_v\rangle = \bigotimes_{i=1}^{n} \hat{R}(v_i)|0\rangle$, where $v_i$ is the $i$-th element of the $n$-dimensional feature vector v. The state is evolved using an ansatz made of parametrized controlled qubit rotations. Finally, to get the output from the QNN, the authors measured the qubits' state in the computational basis. The authors tested their approach on quantum state vector simulations and real quantum devices. In both cases, they observed that the hybrid architecture reported performance similar to or higher than that of the fully classical implementation, thus giving an interesting hint about the capability of such hybrid models applied to computer vision-related tasks.
Transfer learning is a widely used technique in deep learning that exploits neural models pre-trained on a different task. Interestingly, such an approach can be "hybridized" by combining classical and quantum algorithms. Indeed, Mari et al. [126] considered three different scenarios for transfer learning, namely, classical-to-quantum, quantum-to-classical, and quantum-to-quantum. In what follows, we focus on the classical-to-quantum configuration. In such a case, a ResNet-18 [90] network trained on the ImageNet dataset [57] was used as a feature extractor for the quantum circuit.
The parametrized quantum circuit was interpreted as the composition of three components, $\mathcal{M} \circ \mathcal{Q} \circ \mathcal{E}$. The first component, $\mathcal{E}$, represents the quantum embedding of the classical feature vector x into the Hilbert space. The second one, $\mathcal{Q}$, is a variational circuit of a given depth composed of single- and two-qubit gates, while the $\mathcal{M}$ term represents the measurement on the circuit's output that maps a quantum state to a classical vector. The authors trained such a circuit to classify images of bees and ants using the Adam [111] optimizer and the cross-entropy loss. The authors assessed the performance of their models through numerical simulations by using the PennyLane framework and then implemented the quantum circuit on the ibmqx4 and the Aspen-4-4Q-A quantum processors from IBM and Rigetti, respectively. The authors' results demonstrated that, concerning the classical-to-quantum transfer learning scheme, it is already possible in the NISQ era [146] to investigate efficient algorithms for processing high-resolution images.
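The three-stage decomposition into embedding, variational circuit, and measurement can be sketched with plain NumPy on two qubits. The concrete choices below (RY rotations for the embedding, CNOT plus RY layers for the variational part, Pauli-Z expectation values as the classical output) are assumptions made for illustration and are not claimed to reproduce the circuit of [126]; all function names are hypothetical.

```python
import numpy as np

I2 = np.eye(2)
Z = np.diag([1.0, -1.0])
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)   # control: qubit 0, target: qubit 1


def ry(angle):
    """Single-qubit rotation about the y axis (real-valued matrix)."""
    c, s = np.cos(angle / 2), np.sin(angle / 2)
    return np.array([[c, -s], [s, c]])


def embed(x):
    """E: map a 2-dimensional classical feature vector to a 2-qubit state."""
    ket0 = np.array([1.0, 0.0])
    return np.kron(ry(x[0]) @ ket0, ry(x[1]) @ ket0)


def variational(state, weights):
    """Q: repeated layers made of one CNOT followed by trainable RY rotations."""
    for w in weights:
        state = CNOT @ state
        state = np.kron(ry(w[0]), ry(w[1])) @ state
    return state


def measure(state):
    """M: map the quantum state to the classical vector (<Z_0>, <Z_1>)."""
    z0 = state @ np.kron(Z, I2) @ state
    z1 = state @ np.kron(I2, Z) @ state
    return np.array([z0, z1])


def circuit(x, weights):
    """Full composition M o Q o E."""
    return measure(variational(embed(x), weights))


weights = np.array([[0.3, -0.2], [0.1, 0.5]])   # two layers of toy parameters
print(circuit(np.array([0.4, 1.1]), weights))
```

Because every gate used here is real-valued, the state vector stays real and the expectation values reduce to ordinary quadratic forms; with complex gates the left factor would need to be conjugated.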

Other approaches.
Differently from the variational approach, a quantum generalization of the classical feedforward neural networks was proposed by Wan et al. [190]. The quantum neural network accounted for a reformulation of the perceptron model in which each transformation is made reversible and unitary. Although different from VQAs, such networks have been shown, through numerical simulation, to be trainable using gradient descent for a given objective function. In such a case, the goal was to minimize the difference between the quantum circuit's output and the expected one. A viable way to physically realize such an architecture was to employ quantum photonics. For example, to prove the effectiveness of their approach, the authors proposed a quantum autoencoder to compress two quantum states with high accuracy.
Quantum State Discrimination (QSD) [19] requires the design of measurements to optimally identify the $\rho_i$ that represents the unknown state of a quantum device which is believed to belong to a non-orthogonal set of states $\{\rho_i\}$.
Such a capability has a broad range of applications. For example, quantum state discrimination by itself plays a crucial role in quantum information processing protocols and is used in quantum cryptography [24], quantum cloning [63], quantum state separation, and entanglement concentration [19]. Typically, distinguishing among non-orthogonal states is challenging, and specialized measurements are required. An exciting approach for solving such a problem was proposed by Chen et al. [46], who formalized a framework to learn to simulate the unknown structure of the generalized quantum measurement, or Positive-Operator-Valued Measure (POVM) [134]. The circuit designed by the authors contained only single-qubit and CNOT gates, and the proper non-linearities were introduced by measuring the qubits' states. Specifically, they assessed the performance of their circuits by employing numerical simulation on classical machines and by using the Adam [111] optimizer to learn the circuit's parameters. However, differently from the previously cited variational approaches, in the current case, the authors aimed at maximizing their circuit's generalization performance given a specific range of the parameters rather than looking through all the available space.
Classical AutoEncoders are tasked with data reduction, i.e., given an $(n + k)$-bit representation, the encoder generates a compressed $n$-bit datum with the same discrimination power as the original input. Romero et al. [152] proposed a Quantum AutoEncoder (QAE) as a solution to the compression problem concerning quantum states represented by $(n + k)$ qubits. Specifically, QAEs were meant to facilitate quantum data compression, especially concerning quantum simulations.
Such models should recognize patterns due to quantum phenomena, such as superposition and entanglement, that are not accessible to classical algorithms. The authors tested their architecture to compress ground states of the Hubbard model and molecular Hamiltonians. The task of a QAE was then to get rid of a certain number of qubits, say $k$, while maintaining the same amount of quantum information in the remaining ones. Thus, the QAE must entangle qubits, and for such an aim, the authors exploited unitary gates to build the variational circuit (Figure 4e). As a cost function, the QAE leverages the expected fidelity [134], $F(|\psi\rangle, \rho^{\text{out}})$, which quantifies the deviation of the output state $\rho^{\text{out}}$ from the initial state $|\psi\rangle$. The authors used a hybrid approach to train the model in which the quantum device took care of the state preparation and the measurement while the optimization procedure was carried out through classical techniques.
The authors exploited two different optimizers to train the circuit, namely, L-BFGS-B and Basin-Hopping. After training, the authors showed that the QAE could compress the ground state of the two-site Hubbard model from 4 qubits down to 1 with an error below $10^{-4}$. It is relevant to notice that such a compression allows moving from a space of size $2^4$ to one of size 2.
Finally, it is worth mentioning that there exist approaches based on an optimization procedure different from the variational approaches we have seen so far. For example, Silva et al. [53,173] proposed a learning algorithm in which all the input patterns were presented concurrently in superposition to the neural networks. Their model was based on a quantum version of the RAM-based neural networks [12], termed quantum-RAM (q-RAM) node [54], a weightless network in which the learning procedure consisted of simply writing the proper output value in the corresponding look-up table entries of the q-RAM node. The model was designed as a quantum circuit based on the gate model paradigm. Moreover, the authors commented on the possibility of stacking more nodes to form a deeper network. A fundamental step in the training procedure accounted for Grover's algorithm [80] applied to the input until the desired output was returned.

Trainability Issues and Barren Plateau
As discussed in the previous section, the most common approach for implementing a QNN is using a PQC characterized by a set of parameters $\theta$ that are typically learned by optimizing a cost function $C(\theta)$ using a classical optimization algorithm. One of the main issues with the trainability of a QNN is that the gradient of the cost function with respect to $\theta$ may vanish exponentially during training as a consequence of the progressive flattening of the cost function's landscape, a phenomenon referred to as barren plateaus. This phenomenon is particularly present when using a random initialization for "deep" circuits, or more precisely, for circuits characterized by a number of layers that depends polynomially on the number of qubits [128]. Indeed, McClean et al. [128] stated that for a wide class of PQCs with a sufficient depth and number of qubits, the probability that the gradient along any reasonable direction is non-zero to some fixed precision is exponentially small as a function of the number of qubits. In other words, the expected value of the gradient of $C(\theta)$ is zero and its variance decreases exponentially with the number of qubits $n$. Unfortunately, the region where the gradient is zero does not correspond to local minima of interest. Instead, it corresponds to a large plateau of states for which no interesting search direction can be pursued to exit the barren plateau, or more precisely, an exponential precision is required to determine a minimizing direction in order to navigate the cost landscape. The use of a shallow circuit does not solve the problem a priori because, as long as a global cost function is used, i.e., a measure obtained using an observable that involves all the qubits of the circuit, the barren plateau can occur in any layer of the circuit [128].
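The statement above can be summarized compactly in formulas. The notation below is an assumption introduced here for illustration: $\partial_k C(\theta)$ denotes the partial derivative of the cost with respect to the $k$-th parameter, $n$ is the number of qubits, and $b > 1$ is an unspecified constant. The second line is simply Chebyshev's inequality applied to a zero-mean gradient component, which makes explicit why an exponentially small variance implies that a non-negligible gradient is exponentially unlikely.

```latex
\begin{align}
  \mathbb{E}_{\theta}\!\left[\partial_k C(\theta)\right] &= 0,
  \qquad
  \mathrm{Var}_{\theta}\!\left[\partial_k C(\theta)\right] \in O\!\left(b^{-n}\right),
  \quad b > 1, \\
  \Pr\!\left(\left|\partial_k C(\theta)\right| \ge \epsilon\right)
  &\le \frac{\mathrm{Var}_{\theta}\!\left[\partial_k C(\theta)\right]}{\epsilon^{2}}
  \quad \text{for any fixed } \epsilon > 0 .
\end{align}
```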
This is one of the main differences between the gradient vanishing problem for classical DNNs versus QNNs: in the case of classical DNNs, the gradient can vanish exponentially in the number of layers, while for QNNs, the gradient can vanish exponentially in the number of qubits [128].
Cerezo et al. [45] showed that a promising approach to overcome the problem of exponentially vanishing gradients is to use shallow circuits with local cost functions, which are constructed by observing only a limited set of qubits at a time. Indeed, they demonstrated that local cost functions lead, at worst, to a polynomially vanishing gradient if the circuit depth is $O(\log n)$, with $n$ being the number of qubits, and therefore the circuit will be trainable with a polynomial scaling with the system size (i.e., a polynomial number of shots per optimization step is needed to estimate the gradient). However, note that using local cost functions alone is not sufficient to avoid the problem: deep networks with local observables can still give rise to barren plateaus.
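The difference between global and local cost functions can be illustrated with a deliberately simple toy model, sketched below under explicit assumptions: the circuit is a single layer of independent single-qubit rotations (so there is no entanglement), the target is the all-zeros state, the global cost is one minus the probability of measuring all qubits in 0, and the local cost is one minus the average single-qubit probability of measuring 0. The function names are hypothetical, and the example is not a reproduction of the analysis in [45].

```python
import numpy as np


def zero_probabilities(angles):
    """Probability of measuring 0 on each qubit after a rotation by the given angle."""
    return np.cos(np.asarray(angles, dtype=float) / 2) ** 2


def global_cost(angles):
    """1 - P(all qubits are 0): the observable involves every qubit at once."""
    return 1.0 - np.prod(zero_probabilities(angles))


def local_cost(angles):
    """1 - average over qubits of P(qubit is 0): each term involves one qubit."""
    return 1.0 - np.mean(zero_probabilities(angles))


def grad_first_angle_global(angles):
    """Analytic derivative of the global cost with respect to the first angle."""
    angles = np.asarray(angles, dtype=float)
    dp0 = -0.5 * np.sin(angles[0])
    return -dp0 * np.prod(zero_probabilities(angles[1:]))


def grad_first_angle_local(angles):
    """Analytic derivative of the local cost with respect to the first angle."""
    angles = np.asarray(angles, dtype=float)
    dp0 = -0.5 * np.sin(angles[0])
    return -dp0 / angles.size


for n in (2, 8, 32):
    angles = np.full(n, np.pi / 2)
    print(n, grad_first_angle_global(angles), grad_first_angle_local(angles))
```

With all angles set to pi/2, each qubit is measured in 0 with probability 1/2, so the global gradient component shrinks as $2^{-n}$ while the local one shrinks only as $1/(2n)$, mirroring the exponential versus polynomial behavior discussed in the text.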
Interestingly, some QNNs have not exhibited barren plateau in the cost function landscape, such as dissipative quantum neural networks with shallow local perceptrons [20,168] and Quantum Convolutional Neural Networks [52,78,143]. Moreover, several strategies have been proposed for mitigating barren plateaus, including the use of local cost function [45], specific parameter initialization/pre-training [79,186], clever problem-inspired ansatz [28,83], layerwise learning [175], and parameter correlation [94,189], just to name a few.
All the methods mentioned so far concern the problem of noise-free barren plateaus since they do not consider quantum hardware noise. However, a different kind of barren plateau can occur in the presence of noise. Wang et al. [192] provided an analytical study on noise-induced barren plateaus for a generic variational ansatz. They proved that in the presence of local noise, the magnitude of the gradient decays exponentially with the depth of the circuit. Therefore, while noise-free barren plateaus are mainly connected to the structure of the ansatz (number of qubits, parameter initialization, locality of the cost function), the noise-induced barren plateaus are mainly affected by the ansatz depth.
Moreover, for noise-induced barren plateaus, the cost function's minimum itself also flattens, in addition to its gradient. Due to the different nature of noise-induced barren plateaus, strategies to mitigate noise-free barren plateaus, e.g., [45,79,175,189], do not solve the problem, and the strategies to adopt should rather rely on simplifying the circuit by reducing its depth or on reducing the hardware noise level.
Table notes: * multilabel classification problems cast as a set of "one-versus-all" binary classification subtasks; fivefold cross-validation with one repetition was carried out. • approximate accuracy estimated from the graph of test accuracy versus training iterations available in the original paper. ★ four distinct binary classification tasks were considered in the paper.

Final Remarks
As we have seen, PQCs currently represent the most adopted approach to QNNs. Perhaps two of the reasons for the success of variational algorithms are that they allow learning circuits to be implemented on NISQ devices and that the ansatz resembles the architecture of classical neural networks. However, other research lines try to exploit intrinsic properties of quantum systems without the necessity of emulating a classical procedure, such as in the work of [194]. Hence, as we have seen so far, the search for QNNs that will one day be applied to real-world problems represents a very exciting and active field.
Unlike the literature on classical neural networks, identifying state-of-the-art QNNs for solving a given task is not easy. This issue is mainly due to the lack of benchmark procedures for comparing these algorithms. Many approaches have only been tested on synthetic data generated by the authors, and only a few have publicly released their code. Furthermore, the use of different implementations (simulations or different QPUs) makes a direct comparison among the various approaches very difficult. Even when the same benchmark dataset is used (see, for example, the MNIST dataset for handwritten digit recognition in Table 4), it is not easy to compare the performance of the various methods because of the variety in the number of samples used, the classes, or the specific tasks considered.
Due to the limitations of current quantum platforms, several works have remained only theoretical, waiting for QPUs to become powerful enough and less noisy to allow the exploitation of a large number of entangled units. Indeed, the dragon of decoherence is always on the hunt and, unfortunately, qubits easily fall prey, thus strongly limiting the number of computations that can be performed. Perhaps, future quantum devices will open the frontiers to a new generation of neural networks based on quantum phenomena.

CONCLUSIONS
Artificial intelligence has played a central role in academic and industrial debates in the last couple of decades, especially concerning machine learning and neural network techniques. Although these algorithms have shown an incredible generalization capability when adequately trained to solve a given problem, how and why these algorithms behave as they do is a topic that still dazzles scientists worldwide. Nevertheless, notwithstanding astonishing results, these algorithms will always and irremediably be tied to the classical interpretation of the real world.
Differently, quantum computations might overcome such a limit and exploit phenomena such as superposition and entanglement that belong to the quantum realm. Although still at their dawn, quantum technologies represent a promising and fascinating alternative to classical computing techniques, perhaps realizing practical quantum supremacy soon.
Stemming from these considerations, we conceived this survey to offer both the neophyte and the more experienced reader insights into several fundamental topics in the quantum computation field such as the qubit, the Gate Model, and the Adiabatic Quantum Computation paradigms, to mention some. Moreover, noticing that the literature lacks a detailed discussion about the latest achievements concerning Quantum Perceptrons and Quantum Neural Networks, we gather, analyze and discuss state-of-the-art approaches related to these topics. From our work, it is clear that quantum neural networks and algorithms, in general, are still far from proving decisive supremacy over the classical ones. Too often, such a goal has been claimed. Nevertheless, it is still an open quest. It is not clear if quantum devices will replace classical chips, becoming the core of a new generation of personal computers. However, recent improvements for quantum hardware and algorithms have opened the gates towards a new world, the quantum world, characterized by phenomena that were completely unknown until the last century.

ACKNOWLEDGMENTS
The work was partially supported by H2020 project AI4EU under GA 825619, by H2020 project AI4Media under GA 951911, by WAC@Lucca funded by Fondazione Cassa di Risparmio di Lucca.
The Dirac or bra-ket notation [60] is a convenient and concise formalism introduced by the physicist Paul Dirac in quantum mechanics to aid algebraic manipulations on Hilbert spaces and their dual space. Before describing this notation, we recall the inner product and Hilbert space concepts.

A APPENDIX: TABLE OF ABBREVIATIONS USED
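An inner product on a complex vector space V is a binary operation $(\cdot, \cdot) : V \times V \to \mathbb{C}$ that satisfies the following properties (see footnote 13):
• linearity in the second argument: $(v, \alpha w + \beta u) = \alpha (v, w) + \beta (v, u)$;
• conjugate symmetry: $(v, w) = (w, v)^{*}$;
• positive definiteness: $(v, v) \ge 0$, with equality if and only if $v = 0$.
For example, the standard inner product on $\mathbb{C}^n$ is defined as $(v, w) = \sum_{i=1}^{n} v_i^{*} w_i$, which is equivalent to the matrix multiplication between the conjugate transpose of the vector v and the vector w.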
An inner product naturally induces a norm $\|v\| = \sqrt{(v, v)}$ and a metric $d(v, w) = \|v - w\|$ on the vector space on which it is defined, which is referred to as an inner product space. A Hilbert space H is an inner product space which, as a metric space, is complete. In the notation introduced by Dirac:
• a particular Hilbert-space vector, specified by a label $\psi$, is denoted by $|\psi\rangle$ (instead of using arrows $\vec{v}$ or boldface letters v);
• the dual vector to a Hilbert-space vector specified by the label $\psi$ is denoted by $\langle\psi|$ (instead of using notation like $v^{\dagger}$);
• the inner product between a pair of vectors, labeled as $\psi$ and $\phi$, is compactly denoted by $\langle\psi|\phi\rangle$ (rather than $(|\psi\rangle, |\phi\rangle)$ or other common notations like (v, w) used above).
Originally, Dirac proposed the words "bra" and "ket" as the names for the new symbols $\langle$ and $\rangle$. However, we follow the common convention of using the terminology ket for the vector $|\psi\rangle$ and bra for the dual vector $\langle\psi|$. Two general rules in connection with the Dirac notation should be noted: "any quantity in brackets $\langle\ \rangle$ is a number, and any expression containing an unclosed bracket symbol $\langle$ or $\rangle$ is a vector in Hilbert space" [60]. Therefore $|\psi\rangle$ is simply a vector in a Hilbert space, where $\psi$ is the label of the vector, and $\langle\psi|$ is the complex conjugate transpose of $|\psi\rangle$, which is a vector of a different Hilbert space (see footnote 14). For example, for finite vector spaces, the ket $|\psi\rangle$ can be identified with a column vector and the bra $\langle\psi|$ with a row vector. Therefore, expressions like $|\psi\rangle + \langle\phi|$ are not valid as they have no meaning.
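For finite-dimensional spaces, the identification of kets with column vectors and bras with row vectors can be checked directly in NumPy. The sketch below uses two arbitrarily chosen single-qubit states; the variable names are introduced here for illustration only.

```python
import numpy as np

# Kets as column vectors of shape (2, 1).
ket_psi = np.array([[1.0], [1.0j]]) / np.sqrt(2)
ket_phi = np.array([[1.0], [1.0]], dtype=complex) / np.sqrt(2)

# Bras are the conjugate transposes of the kets: row vectors of shape (1, 2).
bra_psi = ket_psi.conj().T
bra_phi = ket_phi.conj().T

# "Joining" a bra and a ket gives a complex number, the inner product.
inner_psi_phi = (bra_psi @ ket_phi).item()   # <psi|phi>
inner_phi_psi = (bra_phi @ ket_psi).item()   # <phi|psi>

# The induced norm of a normalized ket is 1.
norm_psi = np.sqrt((bra_psi @ ket_psi).item().real)

print(inner_psi_phi, inner_phi_psi, norm_psi)

# Caution: NumPy would silently broadcast ket_psi + bra_psi into a 2x2 array,
# although the sum of a ket and a bra has no meaning in the formalism.
```

The two inner products are complex conjugates of each other, as required by conjugate symmetry, and the inner product is linear in the ket and conjugate-linear in the bra, in line with the convention adopted in this appendix.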
Note that the $\psi$ inside the symbols $|\psi\rangle$ and $\langle\psi|$ is just a label and that any label is valid. Moreover, using Dirac notation, the inner product between two vectors with labels $\psi$ and $\phi$ can be easily obtained by graphically joining $\langle\psi|$ and $|\phi\rangle$.
13 We follow the convention of defining an inner product as linear in the second component and conjugate-linear in the first component, as typically done in physics. However, please note that in mathematics, an inner product is often defined as linear in the first coordinate and conjugate-linear in the second coordinate.
14 Actually, the symbol $\langle\psi|$ is more generally used to denote a functional $f_{\psi} : H \to \mathbb{C}$ specified by the label $\psi$. We recall that for a complex vector space V all the continuous linear functionals $f : V \to \mathbb{C}$ form the so-called dual space $V^{*}$. By the Riesz representation theorem, any element of the dual space of a Hilbert space can be represented as an inner product with some fixed vector. Therefore, when working on a Hilbert space H, we can identify H with its dual $H^{*}$ by using the one-to-one antilinear correspondence between vectors and continuous linear functionals $|\psi\rangle \mapsto f_{\psi}$, where $f_{\psi}$ is the functional $f_{\psi}(|\phi\rangle) = (|\psi\rangle, |\phi\rangle)$. In other words, $\langle\psi|$ is the functional (named $f_{\psi}$ above) associated to the vector $|\psi\rangle$.
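If $\hat{A} : V \to W$ is a linear operator represented, with respect to orthonormal bases $\{|v_j\rangle\}_{j=1}^{n}$ of V and $\{|w_i\rangle\}_{i=1}^{m}$ of W, by the $m \times n$ matrix $A = (a_{ij})$, i.e., $\hat{A}(|v_j\rangle) = \sum_{i=1}^{m} a_{ij} |w_i\rangle$, then for any ket $|\psi\rangle = \sum_{j=1}^{n} c_j |v_j\rangle$ we have $\hat{A}(|\psi\rangle) = \sum_{j=1}^{n} \sum_{i=1}^{m} c_j a_{ij} |w_i\rangle$, which is compactly denoted by $\hat{A}|\psi\rangle$. Note that any matrix can be expressed as a linear combination of outer products between the basis vectors: $A = \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij} |w_i\rangle\langle v_j|$.
Any normal operator (for instance, any Hermitian or unitary operator) acting on a finite-dimensional Hilbert space V is diagonalizable, and its eigenvectors can be chosen to form an orthonormal basis of V (spectral theorem). When discussing the Hamiltonian of a system, observables, and measurements, these properties should be kept in mind. For more in-depth coverage of the basic concepts of linear algebra on Hilbert spaces, the reader is referred to [134].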
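The outer-product expansion of a matrix can be verified numerically. The sketch below assumes the computational (standard) bases for both spaces and an arbitrarily chosen 3 x 2 complex matrix; names such as `basis_w` and `rebuilt` are introduced here for illustration.

```python
import numpy as np

# A toy operator from a 2-dimensional space V to a 3-dimensional space W.
A = np.array([[1.0, 2.0j],
              [3.0, 4.0],
              [0.0, -1.0j]])
m, n = A.shape

basis_w = np.eye(m)   # rows are the basis kets |w_i> of W
basis_v = np.eye(n)   # rows are the basis kets |v_j> of V

# A = sum_ij a_ij |w_i><v_j|, where np.outer(|w_i>, conj(|v_j>)) is the outer product.
rebuilt = sum(A[i, j] * np.outer(basis_w[i], basis_v[j].conj())
              for i in range(m) for j in range(n))

# Action on a ket |psi> = sum_j c_j |v_j>: the result is sum_j sum_i c_j a_ij |w_i>.
c = np.array([0.5, -2.0j])
expanded = sum(c[j] * A[i, j] * basis_w[i] for i in range(m) for j in range(n))

print(np.allclose(rebuilt, A), np.allclose(expanded, A @ c))
```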
Another operation often used in quantum mechanics is the tensor product, which is crucial to describe quantum states of multiparticle systems. The tensor product of V and W, denoted by $V \otimes W$, is a Hilbert space of dimension $\dim(V)\,\dim(W)$.
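In finite dimensions the tensor product of state vectors corresponds to the Kronecker product, available in NumPy as `np.kron`. The following minimal sketch builds two two-qubit states and uses the rank of the reshaped amplitude matrix (the Schmidt rank) to distinguish a product state from an entangled one; the variable names are chosen here for illustration.

```python
import numpy as np

ket0 = np.array([1.0, 0.0], dtype=complex)
ket1 = np.array([0.0, 1.0], dtype=complex)

# Product state |0> (x) |1> in the 4-dimensional space C^2 (x) C^2.
ket01 = np.kron(ket0, ket1)

# Bell state (|00> + |11>) / sqrt(2): a sum of tensor products.
bell = (np.kron(ket0, ket0) + np.kron(ket1, ket1)) / np.sqrt(2)

# Reshaping the amplitudes into a 2x2 matrix exposes the Schmidt rank:
# rank 1 means the state factorizes, rank 2 means it is entangled.
rank_product = np.linalg.matrix_rank(ket01.reshape(2, 2))
rank_bell = np.linalg.matrix_rank(bell.reshape(2, 2))

print(ket01.shape, rank_product, rank_bell)
```

The dimension of the composite space is the product of the dimensions of its factors, which is why the state vector of a register doubles in length with every added qubit.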

B.2 Postulates of Quantum Mechanics
The four postulates of quantum mechanics describe the behavior of an isolated system, which is an ideal physical system that does not interact with its environment. In particular, they describe the state space of an isolated system, its evolution with time, how information is extracted from the isolated system by the interaction with an external system, and the state of a composite system in terms of its parts.

B.2.1 State Space.
Postulate 1. At each instant, the state of any isolated physical system is completely described by a unit vector of a Hilbert space, also known as the state space.
Quantum mechanics does not specify the state space of a particular physical system but rather assures us that there exists a Hilbert space whose vectors can be used to describe each state of the system. Interestingly, as the state space comes equipped with an inner product, we have a way to associate a complex number with any two states, and thus, as we see later, a way to extract information from an isolated system. In theory, the state space may be infinite-dimensional. However, realistic models of quantum computation usually use states described by vectors in a finite-dimensional Hilbert space. In subsection 3.1, we introduce the concept of a qubit, the basic unit of quantum information. Here, we observe that any physical system whose state space can be described by $\mathbb{C}^2$ can serve as a physical realization of a qubit. We refer to these kinds of systems as quantum two-level systems, as their state can be described by a vector in a 2-dimensional Hilbert space. A further discussion on the physical realization of the qubit is provided in subsection B.3.

B.2.2 Time Evolution.
Since a physical system changes in time, the state vector of the system can be viewed as a function of time, i.e., $|\psi(t)\rangle$. The second postulate of quantum mechanics asserts that the evolution of the state vector of an isolated system is linear, and it can be described using unitary operators:
Postulate 2. The time evolution of an isolated system is described by a unitary operator. That is, the state $|\psi(t_1)\rangle$ of the system at time $t_1$ is related to the state $|\psi(t_2)\rangle$ of the system at time $t_2$ by a unitary operator $\hat{U}$ which depends on the times $t_1$ and $t_2$: $|\psi(t_2)\rangle = \hat{U}(t_1, t_2)|\psi(t_1)\rangle$.
Note that the postulate does not specify how to compute $\hat{U}$. Instead, it asserts that for any discrete-time evolution of the isolated system, there exists such a unitary operator that describes the dynamics of the system state. Some models, like Adiabatic Quantum Computing (discussed in subsection 3.3), allow for continuous-time evolution of an isolated system described by the Schrödinger equation $i\hbar \frac{d}{dt}|\psi(t)\rangle = \hat{H}|\psi(t)\rangle$, where $\hat{H}$ is the Hamiltonian of the system, which is a Hermitian operator representing the total energy function of the system, and $\hbar$ is the reduced Planck constant. Considering a time-independent Hamiltonian, we have $|\psi(t)\rangle = \hat{U}(t)|\psi(0)\rangle$, where $\hat{U}(t) = \exp(-i\hat{H}t/\hbar)$ and $|\psi(0)\rangle$ is the state at $t = 0$. Thus, for those cases, the Evolution Postulate follows from the Schrödinger equation. Also, note that for the case of a two-level system, the Evolution Postulate assures us that a specific discrete-time evolution of the system is described by a unitary operator on $\mathbb{C}^2$ acting on a qubit, referred to as a one-qubit gate.
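For a time-independent Hamiltonian, the evolution operator can be computed from the spectral decomposition of the Hamiltonian. The sketch below is a minimal illustration under stated assumptions: natural units with hbar = 1 by default, the Pauli-X matrix as a toy Hamiltonian, and a hypothetical helper named `evolution_operator`.

```python
import numpy as np


def evolution_operator(hamiltonian, t, hbar=1.0):
    """U(t) = exp(-i H t / hbar), built from the eigendecomposition of a Hermitian H."""
    energies, vectors = np.linalg.eigh(hamiltonian)
    phases = np.exp(-1j * energies * t / hbar)
    # V diag(phases) V^dagger: scale each eigenvector column by its phase factor.
    return (vectors * phases) @ vectors.conj().T


pauli_x = np.array([[0, 1], [1, 0]], dtype=complex)   # toy Hamiltonian
ket0 = np.array([1.0, 0.0], dtype=complex)

U = evolution_operator(pauli_x, np.pi / 2)
psi_t = U @ ket0

# Unitarity check: U^dagger U should be the identity.
print(np.allclose(U.conj().T @ U, np.eye(2)))
print(np.abs(psi_t) ** 2)
```

For this choice of Hamiltonian and time, the operator equals the Pauli-X gate up to a global phase, so the qubit is flipped from $|0\rangle$ to $|1\rangle$, which gives a concrete instance of a one-qubit gate arising from continuous-time evolution.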

B.2.3 Measurements.
When we perform any measurement/observation on an isolated system, we interact with the system itself, which is no longer isolated, and thus the Evolution Postulate is no longer appropriate for describing its evolution. The following postulate explains the effects of measurements on quantum systems:

Postulate 3 (Quantum measurement). A measurement with outcome set $\mathcal{M}$ is a collection of $|\mathcal{M}|$ measurement operators $\{\hat{M}_m\}_{m \in \mathcal{M}}$, each acting on the state space of the system being measured, that satisfy the completeness equation
$$\sum_{m \in \mathcal{M}} \hat{M}_m^{\dagger}\hat{M}_m = \hat{I},$$
where $\hat{I}$ is the identity operator and each index $m$ refers to an outcome that may occur in the measurement process. If the quantum system is in the state $|\psi\rangle$ immediately before the measurement, then the probability of outcome $m \in \mathcal{M}$ is given by
$$p(m) = \langle\psi|\hat{M}_m^{\dagger}\hat{M}_m|\psi\rangle = \|\hat{M}_m|\psi\rangle\|^2 \tag{19}$$
and the corresponding post-measurement state $|\psi_m\rangle$ is
$$|\psi_m\rangle = \frac{\hat{M}_m|\psi\rangle}{\|\hat{M}_m|\psi\rangle\|}.$$

Note that the completeness equation expresses that, for any state $|\psi\rangle$, the probabilities of all the outcomes sum to 1: $\sum_{m \in \mathcal{M}} p(m) = \sum_{m \in \mathcal{M}} \langle\psi|\hat{M}_m^{\dagger}\hat{M}_m|\psi\rangle = 1$. Moreover, two states that differ only by a global phase, e.g., $|\psi\rangle$ and $e^{i\theta}|\psi\rangle$, are equivalent, as the statistics of any measurement we could perform on the state $e^{i\theta}|\psi\rangle$ are the same as those for the state $|\psi\rangle$, since $\|\hat{M}_m e^{i\theta}|\psi\rangle\|^2 = \|\hat{M}_m|\psi\rangle\|^2$ for every $m$. The set of operators $\{\hat{M}_m^{\dagger}\hat{M}_m\}_{m \in \mathcal{M}}$ is called a Positive-Operator-Valued Measure (POVM).
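The measurement postulate can be illustrated with a short numerical sketch. Given a list of measurement operators $\hat{M}_m$, the snippet below checks the completeness equation, computes the outcome probabilities $p(m) = \|\hat{M}_m|\psi\rangle\|^2$ and the normalized post-measurement states, and verifies that a global phase does not change the statistics. The helper name `measure`, the two example operators, and the example state are assumptions chosen only for illustration.

```python
import numpy as np

def measure(operators, psi):
    """Return outcome probabilities and normalized post-measurement states."""
    dim = len(psi)
    # Completeness equation: sum_m M_m^dagger M_m must equal the identity.
    total = sum(M.conj().T @ M for M in operators)
    assert np.allclose(total, np.eye(dim)), "operators are not complete"
    probs, posts = [], []
    for M in operators:
        phi = M @ psi
        p = np.vdot(phi, phi).real  # ||M_m |psi>||^2
        probs.append(p)
        # The post-measurement state is undefined for zero-probability outcomes.
        posts.append(phi / np.sqrt(p) if p > 1e-12 else None)
    return np.array(probs), posts

# Illustrative (assumed) operators: projectors onto |0> and |1>.
M0 = np.array([[1, 0], [0, 0]], dtype=complex)
M1 = np.array([[0, 0], [0, 1]], dtype=complex)
psi = np.array([np.sqrt(0.3), 1j * np.sqrt(0.7)])

probs, posts = measure([M0, M1], psi)
probs_phase, _ = measure([M0, M1], np.exp(0.4j) * psi)  # same state up to a global phase

print("probabilities:", probs)
print("unchanged by global phase:", np.allclose(probs, probs_phase))
```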
An important class of measurements consists of projective measurements based on orthogonal projections. We recall that a projection is an operator $\hat{P}$ such that $\hat{P}^2 = \hat{P}$ and $\hat{P}^{\dagger} = \hat{P}$. A projective measurement is a collection of orthogonal projections $\{\hat{P}_m\}$ that decompose the identity operator as $\hat{I} = \sum_m \hat{P}_m$. Such a measurement outputs $m$ with probability $p(m) = \langle\psi|\hat{P}_m|\psi\rangle$ and leaves the system in the normalized state $\hat{P}_m|\psi\rangle / \|\hat{P}_m|\psi\rangle\|$. Projective measurements are often described in terms of an observable, which is a Hermitian operator on the state space of the system being observed. From the Spectral Theorem, we know that any observable $\hat{O}$ has a spectral decomposition $\hat{O} = \sum_m \lambda_m \hat{P}_m$, where $\hat{P}_m$ is the orthogonal projection onto the eigenspace of $\hat{O}$ with real eigenvalue $\lambda_m$. The possible outcomes of the measurement are the eigenvalues of $\hat{O}$, where the probability of getting result $\lambda_m$ is given by $p(\lambda_m) = \langle\psi|\hat{P}_m|\psi\rangle = \|\hat{P}_m|\psi\rangle\|^2$. The expected value of a projective measurement of a state $|\psi\rangle$, i.e., the expected value of the observable $\hat{O}$, can be easily calculated as
$$\langle\hat{O}\rangle = \sum_m \lambda_m\, p(\lambda_m) = \sum_m \lambda_m \langle\psi|\hat{P}_m|\psi\rangle = \langle\psi|\hat{O}|\psi\rangle.$$
Equivalently, for a system in a pure state represented by the density operator $\rho = |\psi\rangle\langle\psi|$, the expected value of the observable $\hat{O}$ can be expressed as $\langle\hat{O}\rangle = \mathrm{Tr}\{\hat{O}\rho\}$, where $\mathrm{Tr}$ is the trace operator.
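The three equivalent expressions for the expected value of an observable, namely the eigenvalue-weighted sum of outcome probabilities, the bracket $\langle\psi|\hat{O}|\psi\rangle$, and the trace $\mathrm{Tr}\{\hat{O}\rho\}$, can be compared numerically. The sketch below does so for an assumed example (the Pauli-X observable and a real-amplitude qubit state). It builds rank-one projectors from the eigenvectors, which is only valid when the spectrum is non-degenerate, as it is here.

```python
import numpy as np

# Illustrative (assumed) observable and state.
O = np.array([[0, 1], [1, 0]], dtype=complex)  # Pauli-X
psi = np.array([np.cos(0.3), np.sin(0.3)], dtype=complex)

# Spectral decomposition O = sum_m lambda_m P_m.
# Assumption: non-degenerate spectrum, so each P_m is a rank-one projector.
eigvals, eigvecs = np.linalg.eigh(O)
projectors = [np.outer(eigvecs[:, k], eigvecs[:, k].conj())
              for k in range(len(eigvals))]

# p(lambda_m) = <psi| P_m |psi>
probs = np.array([np.vdot(psi, P @ psi).real for P in projectors])

exp_spectral = float(np.sum(eigvals * probs))  # sum_m lambda_m p(lambda_m)
exp_braket = np.vdot(psi, O @ psi).real        # <psi| O |psi>
rho = np.outer(psi, psi.conj())                # density operator |psi><psi|
exp_trace = np.trace(O @ rho).real             # Tr{O rho}

print(exp_spectral, exp_braket, exp_trace)
```

For this state the three values agree with the analytic result $2\cos(0.3)\sin(0.3) = \sin(0.6)$.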
We also mention that the term orthogonal measurement is used to refer to a projective measurement whose operators are of the type $\hat{P}_i = |\phi_i\rangle\langle\phi_i|$, where $\{|\phi_1\rangle, \ldots, |\phi_n\rangle\}$ is an orthonormal basis of the state space. When measuring the state $|\psi\rangle$, the probability of outcome $i \in \{1, \ldots, n\}$ is
$$p(i) = |\langle\phi_i|\psi\rangle|^2 \tag{23}$$
and the corresponding post-measurement state is $|\psi_i\rangle = |\phi_i\rangle$. For example, if the state $|\psi\rangle = \sum_i \alpha_i |\phi_i\rangle$ is provided as input to the orthogonal measurement, it will output label $i$ with probability $|\alpha_i|^2$ and leave the system in the state $\frac{\alpha_i}{|\alpha_i|}|\phi_i\rangle$, which is equivalent to the state $|\phi_i\rangle$ since $\frac{\alpha_i}{|\alpha_i|}$ has modulus one.
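A minimal sketch of an orthogonal measurement follows: the outcome probabilities are the squared moduli of the overlaps between the state and the basis vectors. The helper name `outcome_probabilities` and the choice of the computational and Hadamard bases as examples are assumptions made for illustration. The basis vectors are stored as the columns of a matrix.

```python
import numpy as np

def outcome_probabilities(basis, psi):
    """p(i) = |<phi_i|psi>|^2, where phi_i are the columns of `basis`."""
    # The columns must form an orthonormal basis.
    assert np.allclose(basis.conj().T @ basis, np.eye(basis.shape[1]))
    amplitudes = basis.conj().T @ psi  # <phi_i|psi> for every i
    return np.abs(amplitudes) ** 2

computational = np.eye(2, dtype=complex)  # columns |0>, |1>
hadamard = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)  # columns |+>, |->

plus = hadamard[:, 0]  # the state |+>

p_comp = outcome_probabilities(computational, plus)
p_had = outcome_probabilities(hadamard, plus)

print("computational basis:", p_comp)
print("Hadamard basis:", p_had)
```

The same state gives a uniformly random outcome in one basis and a deterministic outcome in the other, which shows how much the choice of measurement basis affects the information gained.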
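Note that the state-space basis used in the measurement can play a crucial role in gaining measurement information. For example, the Hadamard basis states $|+\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$ and $|-\rangle = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$ both yield either outcome with probability 1/2 when measured in the computational basis, so they cannot be told apart by such a measurement, whereas a measurement in the Hadamard basis distinguishes them with certainty.

Postulate 4. The state space of a composite physical system is the tensor product of the state spaces of the individual component physical systems. Moreover, if $n$ physical systems are treated as one combined system and $|\psi_i\rangle$ is the state of the $i$-th system, then the state of the composite system is
$$|\psi_1\rangle \otimes |\psi_2\rangle \otimes \cdots \otimes |\psi_n\rangle.$$
The joint state is often compactly denoted as $|\psi_1, \ldots, \psi_n\rangle$, where the symbol $\otimes$ is omitted.

The postulate asserts that, given the states of the component systems, we can compute the state of the combined system by using the tensor product. However, not all states of a combined system can be separated into a tensor product of states of the individual components: a state of a combined system is called entangled if it cannot be expressed as the tensor product of states of the component systems.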
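For two subsystems, the tensor product of state vectors corresponds to the Kronecker product of their amplitude vectors. Whether a pure joint state is a product state can be tested through its Schmidt rank, i.e., the number of nonzero singular values of the amplitude matrix: the state is a product state exactly when this rank equals one. The sketch below illustrates this for a product state and a Bell state. The helper name `schmidt_rank`, the tolerance, and the example states are assumptions for illustration.

```python
import numpy as np

def schmidt_rank(state, dim_a, dim_b, tol=1e-10):
    """Number of nonzero singular values of the dim_a x dim_b amplitude matrix."""
    # np.kron orders amplitudes as index_a * dim_b + index_b, so a row-major
    # reshape recovers the matrix of coefficients c[index_a, index_b].
    singular_values = np.linalg.svd(state.reshape(dim_a, dim_b), compute_uv=False)
    return int(np.sum(singular_values > tol))

zero = np.array([1, 0], dtype=complex)
one = np.array([0, 1], dtype=complex)
plus = (zero + one) / np.sqrt(2)

# Product state |+> (x) |0>, built with the Kronecker product.
product = np.kron(plus, zero)

# Bell state (|00> + |11>) / sqrt(2), which cannot be factorized.
bell = (np.kron(zero, zero) + np.kron(one, one)) / np.sqrt(2)

rank_product = schmidt_rank(product, 2, 2)
rank_bell = schmidt_rank(bell, 2, 2)

print("product state entangled:", rank_product > 1)
print("Bell state entangled:", rank_bell > 1)
```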

B.3 Physical Realization of Qubits
Since the qubit is the fundamental information carrier of a quantum computer, its physical design and realization represented a milestone for the spread of such technologies. Although the topic is fascinating, a comprehensive discussion of all the available technologies and implementations of physical qubits is out of scope for this survey, and we refer the curious reader to the specialized reviews available in the literature. However, it is fundamental to state that the number of qubits alone is not a proper metric to compare the computational capability of quantum processors. Different metrics aiming at such a goal are available in the literature.
For example, IBM proposed the so-called quantum volume [? ? ], a quantity that can be used to quantify the largest random circuit of equal width and depth that the computer successfully implements [? ].

B.4 The Bloch Sphere representation
A qubit state is a unit vector in $\mathbb{C}^2$ that, using the Dirac notation, can be expressed as $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$, where the amplitudes $\alpha$ and $\beta$ are complex numbers such that $|\alpha|^2 + |\beta|^2 = 1$. Since a qubit has more states available than simply two levels, it is often useful to visualize it as a point of a unit sphere in a Euclidean space. As shown in the following, quantum states can be put in correspondence with the points of the so-called Bloch sphere, which is a unit sphere in a three-dimensional Euclidean space with the north and south poles corresponding to the computational basis states $|0\rangle$ and $|1\rangle$, respectively.

15 https://quantum-computing.ibm.com/
16 https://cloud.dwavesys.com/leap

An immediate geometrical interpretation of a state as a vector of a unit sphere can be obtained by the map $[\alpha, \beta] \in \mathbb{C}^2 \mapsto [\mathrm{Re}(\alpha), \mathrm{Im}(\alpha), \mathrm{Re}(\beta), \mathrm{Im}(\beta)] \in \mathbb{R}^4$, but this interpretation is not helpful for visualization purposes as it relies on a four-dimensional Euclidean space.
However, since we are interested only in unit-norm two-dimensional complex vectors, we can exploit the condition $|\alpha|^2 + |\beta|^2 = 1$ to reduce the parameters needed to describe the quantum state, obtaining a three-dimensional representation that can be easily visualized. Specifically, we could use the following transformation:
$$x = 2\,\mathrm{Re}(\alpha^{*}\beta), \qquad y = 2\,\mathrm{Im}(\alpha^{*}\beta), \qquad z = |\alpha|^2 - |\beta|^2.$$
It can be easily proved that if $|\alpha|^2 + |\beta|^2 = 1$ then $x^2 + y^2 + z^2 = 4|\alpha|^2|\beta|^2 + (|\alpha|^2 - |\beta|^2)^2 = (|\alpha|^2 + |\beta|^2)^2 = 1$. Thus, the above transformation maps a state $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$ onto a point of a unit sphere in $\mathbb{R}^3$, which is referred to as the Bloch sphere (Figure 5). Note that any point of a unit sphere in $\mathbb{R}^3$ can be described using the spherical coordinate system, which further reduces the number of parameters needed to describe the quantum state. In fact, given the quantum state $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$, where $\alpha = |\alpha|e^{i\phi_\alpha}$, $\beta = |\beta|e^{i\phi_\beta}$, and $|\alpha|^2 + |\beta|^2 = 1$, the spherical coordinates $(\theta, \phi)$ can be computed