Mixed-state entanglement from local randomized measurements

We propose a method for detecting bipartite entanglement in a many-body mixed state based on estimating moments of the partially transposed density matrix. The estimates are obtained by performing local random measurements on the state, followed by post-processing using the classical shadows framework. Our method can be applied to any quantum system with single-qubit control. We provide a detailed analysis of the required number of experimental runs, and demonstrate the protocol using existing experimental data [Brydges et al, Science 364, 260 (2019)].

We propose a method for detecting bipartite entanglement in a many-body mixed state based on estimating moments of the partially transposed density matrix. The estimates are obtained by performing local random measurements on the state, followed by post-processing using the classical shadows framework. Our method can be applied to any quantum system with single-qubit control. We provide a detailed analysis of the required number of experimental runs, and demonstrate the protocol using existing experimental data [Brydges et  Engineered quantum many-body systems exist in today's laboratories as Noisy Intermediate Scale Quantum Devices (NISQ) [1]. This provides us with novel opportunities to study and quantify entanglement -a fundamental concept in both quantum information theory [2] and many-body quantum physics [3,4]. For pure (or nearly-pure) states, entanglement has been detected by measuring the second Rényi entropy [5][6][7][8][9][10]. This has been achieved via, for instance, many-body quantum interference [7-9, 11, 12] (see also [13,14]) and randomized measurements [10,[15][16][17][18]. However, many states of interest are actually highly mixed -either because of decoherence, or because they describe interesting subregions of a larger, globally entangled, system. Developing protocols which detect and quantify mixed-state entanglement on intermediate scale quantum devices is thus an outstanding challenge.
Below we propose and experimentally demonstrate conditions for mixed-state entanglement and measurement protocols based on the positive partial transpose (PPT) condition [2,5,19]. Consider two partitions A and B described by a (reduced) density matrix ρ AB . The well-known PPT condition checks if the partially transposed (PT) density matrix ρ T A AB [20] is positive semidefinite, i.e. all eigenvalues are non-negative. If the PPT condition is violated -i.e. ρ T A AB does have negative eigenvalues -A and B must be entangled. It is possible to turn the PPT condition into a quantitative entanglement measure. The negativity N (ρ AB ) = λ<0 |λ|, with λ the spectrum of ρ T A AB , is positive if and only if the underlying * These authors contributed equally.
state ρ AB violates the PPT condition [21]. While applicable to mixed states, computing the negativity requires accurately estimating the full spectrum of ρ T A AB . We bypass this challenge by considering moments of the partially transposed density matrix (PT-moments) instead: p n = Tr[(ρ T A AB ) n ] for n = 1, 2, 3, . . ..
These have been first studied in quantum field theory to quantify correlations in many-body systems [22]. Clearly, p 1 = tr(ρ AB ) = 1, while p 2 is equal to the purity tr[ρ 2 AB ] (see Table I in the Supplemental Material [23] (SM) for a visual derivation). Hence, p 3 is the lowest PT-moment that captures meaningful information about the partial transpose.
In this letter, we first show that the first three PTmoments can be used to define a simple yet powerful test for bipartite entanglement: The p 3 -PPT condition is the contrapositive of this assertion: if p 3 < p 2 2 , then ρ AB violates the PPT condition [see Fig. 1a)] and must therefore be entangled. Similar to the PPT condition, the p 3 -PPT condition applies to mixed states and is completely independent of the state in question. This is a key distinction from entanglement witnesses [24,25], which can be more powerful, but which usually require detailed prior information about the state. While weaker than the full PPT condition, the p 3 -PPT condition relies on comparing two comparatively simple functionals and outperforms other state-independent entanglement detection protocols, like comparing purities of various nested subsystems [5,[7][8][9][10]23].  [1,2,3], [4,5,6] [1,2,3], [6,7,8] [1,2,3], [8,9,10] [1], [2] [ 1,2], [3,4] [1, 2, 3], [4,5,6]  can be used to demonstrate mixed-state bipartite entanglement with PT-moments. Separable states are PPT states and also fulfill the p3-PPT condition. Thus, quantum states which violate the p3-PPT condition must be bipartite entangled [see also Eq. (2)]. b) In our protocol, PT-moments are measured by applying local random unitaries followed by computational basis measurements. c-d) Violation of the p3-PPT condition, i.e. p 2 2 > p3, is experimentally observed for connected c) and disconnected (separated by d = 0, 2, 4 spins) d) partitions A and B at various times t after a quantum quench [10]. Dots: experimental results. Error bars: Jackknife estimates of statistical errors. Lines: numerical simulations including the decoherence model presented in Ref. [10].
The second main contribution of this letter is a measurement protocol to determine PT-moments in NISQ devices. Crucially, we employ randomized measurements implemented with local (single-qubit) random unitaries, see Fig. 1b) which are readily available in NISQ devices and have been already successfully applied to measure entanglement entropies, many-body state-fidelities, and out-of-time ordered correlators [10,[26][27][28]. In contrast to previous proposals for measuring PT-moments, our protocol does not rely on many-body interference between identical state copies [6,29,30], or on using global entangling random unitaries [31] built from interacting Hamiltonians [16,[32][33][34]. Instead, it only requires single-qubit control, and allows for the estimation of many distinct PT-moments from the same data. In particular, arbitrary orders n ≥ 2 and arbitrary (connected, as well as disconnected) partitions A, B can be measured.
While the experimental setup for our measurement protocol is reminiscent of quantum state tomography [35][36][37][38], there are fundamental differences regarding the required number of measurements (as independent state copies), and the way the measured data is processed. Without strong assumptions on the state [36,37], performing tomography to infer an -approximation of an unknown density matrix ρ AB (e.g. in order to subsequently compute -approximations of p n ) requires (at least) order 2 |AB| rank(ρ AB )/ 2 measurements [39,40]. In the high accuracy regime ( 1), our direct estimation protocol instead only requires order 2 |AB| / 2 measurements. For highly mixed states -the central topic of this work -this discrepancy heralds a significant reduction in measurement resources. Furthermore, we predict PT-moments through a 'direct' and (multi-)linear post-processing of the measurement data represented as 'classical shadows' [18]. Thus, data processing is cheap -both in memory and runtime -and can be massively parallelized. Similar to previous protocols based on randomized measurements [10,15,16,18,27,28,[41][42][43][44], this is another distinct advantage over tomography which typically requires expensive data-processing algorithms [35] or training a neural network [37].
In the last part of this letter, we demonstrate our measurement protocol and the p 3 -PPT condition experimentally in the context of the quantum simulation of manybody systems. Here, PT-moments have been shown to reveal universal properties of quantum phases of matter [22,[45][46][47][48] and their transitions [22,45,49,50]. Out of equilibrium, PT-moments allow to understand the dynamical process of thermalization [51][52][53][54], and the fate of (many-body) localization in presence of decoherence [55].
In this work, we analyze the data of Ref. [10] corresponding to the out-of-equilibrium dynamics in a spin model with long-range interactions, which was implemented in a 10-qubit trapped ion quantum simulator. In particular, we certify the presence of mixed-state entanglement via the p 3 -PPT condition [see Fig. 1(c-d), and for details below]. Furthermore, we monitor the time-evolution of p 3 and observe dynamical signatures of entanglement spreading and thermalization, which can be directly understood in the quasi-particle picture developed in quantum field theory [51,52].
Protocol-The experimental ingredients to measure PT-moments build on resources similar to the ones presented in Ref. [16] and realized in Ref. [10] to measure Rényi entropies. The key new element is the postprocessing of the experimental data [18]. As shown in Fig. 1, the quantum state of interest is realized in a system of N qubits. In the partitions A and B, consisting of |A| and |B| spins, respectively, a randomized measurement is performed by applying random local unitaries u = u 1 ⊗ · · · ⊗ u |AB| , with u i independent single qubit rotation sampled from a unitary 3-design [32,56], and a subsequent projective measurement in the computational basis with outcome k = (k 1 , . . . , k |AB| ). This is subsequently repeated with M different random unitaries such that a data set of M bitstrings k (r) with r = 1, . . . , M is collected.
From this data set, the PT-moments p n can be estimated without having to reconstruct the full density matrix ρ AB on the classical post-processing device, and with an exponentially smaller number of experimental runs M than required for full quantum state tomography. To obtain such estimates, we rely on two observations. First, each outcome k (r) can be used to define an unbiased es-timatorρ (r) of the density matrix ρ AB , i.e. E[ρ (r) AB ] = ρ AB with the expectation value taken over the unitary ensemble and projective measurements [17,18,57]. Second, the PTmoments p n can be viewed as an expectation value of a n-copy observable − → Π A ← − Π B evaluated on n-copies of the original density matrix ρ AB , Here, A , . . . , k that act on the partitions A and B, respectively.
Estimators of the PT-moments p n can now be derived from Eqs. (3) and (4) using U-statistics [58]. Replacing ρ ⊗n withρ (r1) ⊗ · · · ⊗ρ (rn) where r 1 = r 2 = · · · = r n , corresponding to independently sampled random unitaries u (r1) , . . . , u (rn) , we define the U-statistiĉ It follows from the defining properties of U -statistics thatp n is an unbiased estimator of p n , i.e. E[p n ] = p n with the expectation value taken over the unitary ensemble and projective measurements [58]. Its variance governs the statistical errors arising from finite M and is analyzed in detail below. Furthermore, a quick inspection of Eqs. (3) and (4) reveals that the summands in Eq. (5) completely factorize into contractions of single qubit matrices, Tr as in Eq. (3). Thus, given M observed bitstrings k r , one can determinep n with classical data processing scaling as M n |AB|, without storing exponentially large matrices on the classical postprocessing device.
Statistical errors-As demonstrated in Fig. 1 (c,d) and explained in detail below (see also Figs. 3 and 4), PTmoments can be inferred in practise using a finite number of experimental runs M . Here, we investigate in detail the statistical errors arising from finite M which determine the statistical uncertainty of the experimental results. To this end, we use analytical calculations and numerical simulations.
Analytically, we bound statistical errors based on the variance of the multi-copy observable in question. For p 2 = Tr[(ρ T A AB ) 2 ], our analysis reveals that the error decay rate depends on number of measurements M . In the large M regime, the error is proportional to 2 |AB| p 2 / √ M . This error bound is multiplicative -i.e. the size of the error is proportional to the size of the target p 2 -and 1/ √ M captures the expected (and unavoidable) decay rate for an estimation procedure that relies on empirical averaging. For small and intermediate values of M , the estimation error is instead bounded by 8 × 2 1.5|AB| /M . While this is worse in terms of constants, the error decays at a much faster rate proportional to 1/M in this regime. Qualitatively similar results apply for estimating p 3 = Tr[(ρ T A AB ) 3 ], but there can be three decay regimes. For large M , the estimation error is bounded by 2 |AB| p 2 2 / √ M . This again captures the asymptotically optimal rate 1/ √ M associated with empirical averaging, but the constant is suppressed by p 2 2 , not p 3 itself. For intermediate M , the error decay rate is proportional to 1/M , while an even faster rate ∝ 1/M 3/2 governs the error decay for small M . We refer to the SM for detailed statements and proofs.
To test these predictions numerically, we consider a (sub)system of N = |AB| qubits where a pure state ρ = |φ φ| is prepared. Here, A corresponds to the first N/2 qubits, and B is the complement. We numerically calculate the statistical errors for both a GHZ state, and the ground state of the transverse Ising model In contrast, the early regime decay rate is not as pronounced. This is likely due to limited system sizes -1/M 3/2 does appropriately capture the decay of red dots (largest system size con-  [10]. Different colors correspond to different times after the quantum quench with purple [0 ms] corresponding to the initial product state. For each time, M = 500 unitaries and P = 150 measurements per unitary were used. Lines: theory simulation including decoherence [10]. The ratio p 2 2 /p3, detecting entanglement according to the p3-PPT condition, is shown in Fig. 1 c) and d).
sidered) in the top left corner, but seems to be largely absent in decay rates for smaller system sizes.
What is more, Fig. 2 highlights that the number of measurement repetitions necessary to achieve a desired accuracy scales with subsystem size |AB| as 2 |AB| . This compares favorably with other approaches -like full quantum state tomography -and enables the estimation of PT-moments in state of the art platforms with high repetition rates. p 3 -PPT condition-In the following, we will apply the p 3 -PPT condition (2) on experimental data. This condition is proven in the SM based on inequalities between Schatten-p norms. Note that this condition is in particular invariant under local unitary operations (as the negativity), and therefore does not rely on a reference frame. Also, as shown in the SM, the p 3 -PPT condition and the PPT condition are equivalent for Werner states (here, they are necessary and sufficient conditions for bipartite entanglement [59]).
PT-moments and entanglement spreading in a trappedion quantum simulator-Below, we discuss the experimental demonstration of the measurement of PTmoments in a trapped ion quantum simulator. To this end, we evaluate data taken in the context of Ref. [10]. Here, the Rényi entropy growth in quench dynamics of interacting spin models was investigated. The system, consisting in total of N = 10 qubits, was initialized in the Néel state |↑↓↑↓ . . . , and time-evolved with with σ z i the third spin-1/2 Pauli operator, σ + i (σ − i ) the spin-raising (lowering) operators acting on spin i, and J ij ≈ J 0 /|i − j| α the coupling matrix with an approximate power-law decay α ≈ 1.24 and J 0 = 420s −1 . After time evolution, randomized measurements were performed, using M = 500 random unitaries and P = 150 projective measurements per random unitary.
From this data, PT-moments can be inferred [60], with results presented in Fig. 3. For the purity p 2 a) b), we observe good agreement with the theory model for up to N = 8 qubits partitions, in particular the raise of p 2 for partition sizes approaching the total system size which is expected for such nearly pure states. For 9, 10-qubit partitions, the data is not shown since the relative statistical error of the estimated data points approaches unity [61]. We however note that the measuredp 2 is slightly underestimated. This is due to miscalibration effects, imperfect realizations of the random unitaries, which tend to reduce the estimation of the overlap Tr(ρ r1 ρ r2 ). This effect is also present when measuring cross-platform fidelities via randomized measurements [27]. For the third PT-moment p 3 c) d), we observe the same kind of agreement between theory value and experimental measurements. In particular, at large partition sizes, the protocol is able to measure with high precision small values of p 3 . These small values are indeed fundamental to detect entanglement in this system: a PPT violating state has a negative eigenvalue which reduces the value of p 3 , in comparison with the purity p 2 . This effect is mathematically captured by the p 3 -PPT condition and allowed us to detect PPT violation and thus entanglement for manybody mixed states [see Fig. 1c)]. In Appendix C, we present additional simulations showing that the power of the p 3 -PPT condition, in comparison with the negativity and the condition based on purities of nested subsystems [23].
The third PT-moment p 3 does not only allow to detect mixed-state entanglement. It can also be used to study the dynamics of entanglement in various manybody quantum systems [22,[49][50][51]55]. Here, we analyze the behavior of the dimensionless ratio , which, as shown in quantum field theory, follows the same universal behavior as the negativity during time evolution with a local many-body Hamiltonian [51]. We remark that R 3 is however only well-defined for states with p 3 > 0 (see Appendix B of [23] for a counter-example). Furthermore, R 3 is not an entanglement monotone [55]. It vanishes for all product states, but can still be strictly positive for certain separable states [2,55]. Fig. 4 illustrates the time evolution of R 3 for (a) connected and (b) disconnected subsystems AB, respec- [3,4] [1, 2], [4,5] [1, 2], [5,6] [1, 2], [6,7]  tively. The appearing peaks of R 3 have been predicted and analyzed for various one-dimensional quantum systems subject to local interactions [51,52] (and have also been studied in the context of Rényi mutual information [62]). They can be understood in terms of ballistically propagating quasi-particles which describe collective excitations in the system [51,52]. In this picture, entanglement between two partitions A and B is induced by the presence of entangled pairs of quasi-particles shared between A and B. For each pair, the individual quasiparticles propagate in opposite directions and start to entangle, in the course of the time evolution, partitions that are more and more separated [51,52]. In particular, for two adjacent partitions (a), R 3 increases at early times, which is consistent with the picture of shared pairs of entangled quasi-particles entering the two partitions immediately. After a certain time R 3 reaches a maximum and starts to decrease, which can be understood as the time when the quasi-particles start to escape the region AB. For separated partitions with finite distance (b), the peaks are delayed in time as a consequence of finite speed of propagation of the quasi particles. In addition, their maximum value is lowered because of the finite life-times of quasi-particles. The latter feature is characteristic to chaotic (non-integrable) thermalizing systems [62] and is in our case further enhanced by decoherence (compare simulations with/without decoherence in the figures).
Conclusion-Our protocol extends the paradigm of randomized measurements, yielding the first direct measurement of PT-moments in a many-body system. Ustatistics provides the key ingredient there and enables us to harness a remarkable advantage over quantum state tomography in terms of statistical errors. At a fundamental level, it is therefore natural to investigate how to access new important physical quantities based on random measurement data, and with significant savings in terms of measurement and classical postprocessing over existing methods. This approach can be used to derive protocols to directly infer entanglement measures (including nonpolynomial functions of the density matrix), such as the von-Neumann entropy and the negativity.
Acknowledgments-We are grateful to Alireza Seif who pointed out interesting error scaling effects for classical shadows in a Scirate comment addressing Ref. [18]. We Ref. [10] into unitaries M and projective measurements per unitary P has been optimized for the purity estimator p2 presented in Ref. [10] which differs fromp2 defined in Eq. (5). Thus, for the present data set [10] with M = 500 and P = 150, the statistical uncertainty of thep2 is smaller than forp2 which performs best for P = O(1) and a correspondingly larger number of unitaries M .
Appendix A: The p3-PPT condition In this section we present, prove and discuss the p 3 -PPT condition. The p 3 -PPT condition is the contrapositive of the following statement about moments of positive semidefinite matrices with unit trace. Proposition 1. For every positive semidefinite matrix X with unit trace (Tr(X) = 1) it holds that tr(X 2 ) 2 ≤ tr(X 3 ). (A1) Note that Eq. (A1) resembles the following well-known monotonicity relation among Rényi entropies (see e.g., Ref. [64]): 1−n log 2 tr(ρ n ) . (A2) However, this relation only applies to density matrices, i.e. positive semidefinite matrices with unit trace. The p 3 -PPT condition, in contrast, is designed to test the absence of positive semidefiniteness. Hence, it is crucial to have a condition that does not break down if the matrix in question has negative eigenvalues. Rel. (A1) (and its direct proof provided in the next subsection) do achieve this goal, while an argument based on monotonicity relations between Rényi entropies can break down, because the logarithm of non-positive numbers is not properly defined.

Proof of the p3-PPT condition
Let X be a Hermitian d × d matrix with eigenvalue decomposition X = d i=1 λ i |x i x i |. For p ≥ 1, we introduce the Schatten-p norms denotes the (matrix-valued) absolute value. The Schatten-p norms encompass most widely used matrix norms in quantum information. Concrete examples are the trace norm (p = 1), the Hilbert-Schmidt/Frobenius norm (p = 2) and the operator/spectral norm (p = ∞). Each Schatten-p norm corresponds to the usual vector p -norm of the vector of eigenvalues λ = (λ 1 , . . . , λ d ) T ∈ R d : Hence, Schatten-p norms inherit many desirable properties from their vector-norm counterparts. Here, we shall use vector norm relations to derive a relation among Schatten-p norms. It is based on Hoelder's inequality that relates the inner product to a combination of p norms.
Fact 1 (Hoelder's inequality for vector norms). Fix p, q ≥ 1 such that 1/p + 1/q = 1. Then, The well-known Cauchy-Schwarz inequality is a special case of this fact. Set p = q = 1/2 to conclude At the heart of our proof for the p 3 -PPT condition is a simple relation between Schatten-p norms of orders p = 1, 2, 3.

Inserting this relation into Eq. (A7) reveals
which is equivalent to the claim (take the 3rd power and divide by X 2 2 ).
Proposition 1 is an immediate consequence of Lemma 1 and elementary properties of positive semidefinite matrices. Recall that a Hermitian d × d matrix is positive semidefinite (psd) if every eigenvalue is nonnegative. This in turn ensures |X| = X and, by extension, X p = Tr(X p ) 1/p for all p ≥ 1.

Discussion and potential generalizations
The p 3 -PPT condition tests the absence of positive semidefiniteness based on moments Tr(X p ) of order p = 1, 2, 3. It is natural to wonder whether higher order moments allow the construction of more refined tests. It is possible to show that every positive semidefinite matrix X with unit trace must obey tr(X p−1 ) p−1 ≤ tr(X p ) p−2 for all p > 2. (A8) As this is a direct extension of the p 3 -PPT condition (p = 3), we omit the proof. Unfortunately, we found numerically that these direct extensions actually produce weaker tests for the absence of positive semidefiniteness, i.e. there exist matrices X that violate the p 3 -PPT condition but satisfy Rel. (A8) for higher moments p ≥ 4. This is not completely surprising, since Rel. (A8) compares (powers of) neighboring matrix moments with order (p − 1) and p. As p increases, these matrix moments suppress contributions of small eigenvalues ever more strongly. In the case of partially transposed quantum states, the eigenvalues of are required to sum up to one and must be contained in the interval [−1/2, 1] [65]. Thus, the negative eigenvalues can never dominate the spectrum and high matrix moment tests for the existence of negative eigenvalues suffer from suppression effects. This observation suggests that powerful tests for negative eigenvalues should involve all matrix moments tr(X p ) up to a certain order p max . It is useful to change perspective in order to reason about potential improvments. The p 3 -PPT condition checks whether the following inequality is true: For matrices X with unit trace, we can reinterpret the matrix-valued function F 3 (X) as a sum of (identical) degree-3 polynomials applied to all eigenvalues λ 1 , . . . , λ d of X. Set p 2 = tr(X 2 ) and use tr(X) = d i=1 λ i = 1 to conclude Note that the polynomial depends on p 2 and, by extension, also on the matrix X. We will come back to this aspect later. For now, we point out that -regardless of the actual value of p 2this polynomial has three interesting properties: These properties reflect the behavior of another wellknown function -the (negated) rectifier function (ReLU): See Figure 5 for a visual comparison. Applying the (negated) rectifier function to the eigenvalues of X would recover the negativity: Hence, it is instructive to interpret F 3 (X) as a polynomial approximation to the (non-analytic) negativity function.
On the level of polynomials, the condition f 3 (x) ≤ 0 whenever x > 0 is most important. It implies that positive eigenvalues of X can never increase the value of F 3 (X) = d i=1 f 3 (λ i ). In particular, F 3 (X) ≤ 0 whenever X is positive semidefinite -as stated in Proposition 1. The p 3 -PPT condition is sound, i.e. it has no false positives.
Conversely, f 3 (x) > 0 for x < 0 implies that F 3 (X) can become positive if X has negative eigenvalues. Hence, the p 3 -PPT condition is not vacuous. It is capable of detecting negative eigenvalues in many, but not all, unittrace matrices X.
Let us now return to the (matrix-dependent) parameter choice in Eq. (A11). In principle, every polynomial of the form f We can optimize this expression over the parameter a ∈ R to make the test as strong as possible. The optimal choice is a = tr(X 2 )/tr(X) and produces a matrix 3 (X) for X fixed. If X has also unit trace, the optimal parameter becomes a = p 2 and produces the p 3 -PPT condition (A9).
This construction of PPT conditions readily extends to higher order polynomials f p (x) = a p x p + · · · a 1 x + a 0 . Increasing the degree p produces more expressive ansatz functions that can approximate the (negated) rectifier function -and its core properties -ever more accurately. Viewed from this angle, it becomes apparent that measuring more matrix moments can produce stronger tests for detecting negative eigenvalues. However, it is not so obvious how to choose the parameters a p , . . . , a 0 "optimally", or what "optimally" actually means in this context. Some well-known polynomial approximations of the rectifier function r(−x) -like Taylor expansions of s(−x) = ln(1 + e −x ) (the "softplus" function) -are not well-suited for this task, because s(−x) > 0 even for x > 0. This in turn would imply that the associated test condition may not be sound. We believe that a thorough analysis of these questions is timely and interesting, but would go beyond the scope of this work. We intend to address it in future research.

Appendix B: p3-PPT condition for Werner States
Werner states are bipartite quantum states in a Hilbert space with parameter α ∈ [0, 1] and Π ± = 1 2 (I ± Π 12 ) projectors onto symmetric H + and anti-symmetric H − subspaces of H = H + ⊕ H − , respectively [59]. Here, Π 12 = d i,j=1 |i j| ⊗ |j i| is the swap operator. We note that the eigenvalues of ρ W are thus given as λ The reduced state ρ A of qudit A is given by Using furthermore that Π T A ± = 1/2(∆ 1 ± (d ± 1)∆ 0 ) with ∆ 0 = |φ + φ + | being a projector onto the maximally entangled state and ∆ 1 = I − ∆ 0 [59], we find with eigenvalues λ 0 = (2α − 1)/d with multiplicity 1 and We note that, for any d, λ 0 < 0 for 0 ≤ α < 1/2. Thus, using the PPT condition, we find that ρ W is entangled for 0 ≤ α < 1/2. Using the explicit expression of the eigenvalues, we can furthermore determine Tr (ρ T A ) n for any n. We find for all local dimensions d Thus, for Werner states the p 3 -PPT condition is equivalent to the full PPT condition. It can be furthermore shown that Werner states are separable for α ≥ 1/2 [59]. Thus, for Werner states, the p 3 -PPT condition is a necessary and sufficient condition for bipartite entanglement. This also holds true for "isotropic" states of the form ρ = α1/d 2 + (1 − α)|φ + φ + |, which are closely related. We note that Werner states can have non-positive PTmoments. For local dimension d > 3 there exists a parameter interval [0, α * ) such that the associated Werner state (B1) obeys p 3 = Tr (ρ T A W ) 3 < 0 for all α ∈ [0, α * ). This highlights that the logarithm of PT-moments, appearing also in the ratio R 3 = − log 2 (p 3 /Tr[ρ 3 ]), need not be properly defined, justifying a claim from the previous subsection. It is difficult to use entropic arguments for reasoning about relations between (logarithmic) PTmoments.

Appendix C: Comparison of entanglement conditions for quench dynamics
In this section, we compare the diagnostic power of the full PPT-condition, the p 3 -PPT condition and a condition based on purities of nested subsystems to detect bipartite entanglement of mixed states. Specifically, given a reduced density matrix ρ AB in a bipartite system AB, we consider: 1. the PPT-condition detecting bipartite entanglement between A and B for a strictly positive negativity N (ρ AB ) = λ<0 |λ| > 0, with λ the spectrum of ρ The latter 'purity' condition was used in previous experimental works measuring the second Rényi entropy [7][8][9][10] to reveal bipartite entanglement of weakly mixed states.
To test these conditions, we consider here, as an example, quantum states generated via quench dynamics in interacting spin models. Specifically, we study quenches in the XY -model with long-range interactions, as defined in Eq. (6) in a total system with N = 10 spins. The initial separable product state is a Néel state |↑↓↑↓ . . . . As shown in Fig. 6, the negativity (red lines) detects bipartite entanglement for all partitions size and all times after the quench. The p 3 -PPT condition (blue lines) performs similar for the partitions considered in panel (b) and (c) and is thus able to detect bipartite entanglement for highly mixed states ρ AB whose purity decreases to 0.3 for the panel (b) at late times. The p 3 -PPT conditions fails however to detect the entanglement for the close-to completely mixed states of small partitions |AB| = 4 at late times, displayed in panel (a). This can be attributed to the fact that the p 3 -PPT condition only relies on low order PT-moments. The purity condition (green lines) is only useful for the detection of entanglement for large partitions AB with |AB| = 8 (panel (c)). These remain weakly mixed during the entire time evolution, since the total system of N = 10 spins is described here by a pure state.

Appendix D: Error bars for PT moment predictions
Let us first review the data acquisition procedure. To obtain meaningful information about an N -qubit state ρ, we first perform a collection of random single qubit rotations: ρ → uρu † , where u = u 1 ⊗ · · · ⊗ u N and each u i is chosen from a unitary 3-design. Subsequently, we perform computational basis measurements and store the outcome: Here, k 1 , . . . , k N ∈ {0, 1} denote the measurement outcomes on qubits 1, . . . , N . As shown in [17,18,57], the outcome of this measurement provides a (single-shot) estimate for the unknown state: This tensor product is a random matrix -the unitaries u (i) , as well as the observed outcomes k i are randomthat produces the true underlying state in expectation: Thus, the result of a (randomly selected) single-shot measurement provides a classical snapshot (D2) that reproduces the true underlying state in expectation. This desirable feature extends to density matrices of subsystems. Let AB ⊂ {1, . . . , N } be a subset of |AB| ≤ N qubits and let ρ AB = Tr ¬AB (ρ) the associated reduced density matrix. Then, This feature can be used to estimate linear properties of the subsystem in question: o = Tr (Oρ AB ). Perform M independent repetitions of the data acquisition procedure and use them to create a collection of (independent) snapshotsρ The remaining (single-shot) variance obeys the following useful relation. The formalism introduced above readily extends to predictions of higher order polynomials. The special case of quadratic functions has already been addressed in Refs. [10,[15][16][17], and Ref. [18] (for the present formalism). The key idea is to represent a quadratic function in ρ as a linear function on the tensor product ρ ⊗ ρ: This function can be approximated by replacing ρ ⊗ ρ with a symmetric tensor product of two distinct snapshotsρ (i) ,ρ (j) (i = j): There are M 2 different ways of combining a collection of M snapshotsρ (1) , . . . ,ρ (M ) in this fashion. We can predict o = Tr(Oρ AB ⊗ ρ AB ) by forming the empirical average over all of them: Here, Π AB denotes the swap operator that permutes the entire subsystems AB within two copies of the global system. We refer to Table I below for a visual derivation of this well-known relation. The swap operator is symmetric under permuting tensor factors, Hermitian (Π † AB = Π AB ) and orthogonal (Π 2 AB = I AB ). These properties ensure that the associated general estimator (D11) can be simplified considerably: By construction, E [p 2 ] = p 2 = Tr(ρ 2 ) and the speed of convergence is controlled by the variance. This variance decomposes into a linear and a quadratic part. We expand the definition of the variance: The size and nature of each contribution depends on the relation between the indices i, j, k, l [58]: 1. all indices are distinct: distinct indices label independent snapshots. In this case the expectation value factorizes completely and produces AB ) = Tr(ρ 2 AB ) 2 . This is completely offset by the subtraction of the expectation value squared. Hence, terms where all indices are distinct do not contribute to the variance.

exactly two indices coincide:
In this case, the expectation value partly factorizes, e.g. We conclude that the variance ofp 2 decomposes into linear and quadratic terms. These can be controlled via Here,ρ 1 ,ρ 2 ,ρ 3 denote independent, random realizations of the snapshotρ and we have introduced place-holders for linear (L), quadratic (Q) and cubic (C) contributions, respectively. For the task at hand, these contributions can be bounded individually and depend on the subsystem size AB: 1. linear contribution: set O = (ρ T A AB ) 2 for notational brevity. We can use Tr(Oρ T A ) = Tr(O T Aρ ) to absorb the partial transpose in the linear function. Rel. (D7) then ensures where we have also used Tr(

quadratic contribution:
We can bring We refer to Table I  The final estimate follows from exploiting Π 2 A = Π 2 B = I AB , as well as Tr ρ 2 ⊗ I 2 AB = 2 |AB| Tr(ρ 2 ). 3. cubic contribution: We can bring the cubic function 1 2 Tr 3 ) into the canonical form Tr Oρ 1 ⊗ρ 2 ⊗ρ 3 by introducing see Table I below. Here, − → Π A is a cyclic permutation that exchanges all A-systems in a "forward" fashion (A 1 → A 2 , A 2 → A 3 , A 3 → A 1 ), while ← − Π B is another cyclic permutation that exchanges all B-systems in a "backwards" fashion (B 3 → B 2 , B 2 → B 1 , B 1 → B 3 ). A staightforward extension of Rel. (D12) to cubic functions implies Two operators X and Y can be multiplied to produce another operator. This corresponds to an index contraction and is represented in the following fashion: Transposition exchanges outgoing (contravariant) and incoming (covariant) indices while the trace pairs up both indices and sums over them: We abbreviate this loop (contraction of leftmost and rightmost indices) by putting two circles at the end points of lines that should be contracted. This notation is not standard, but will considerably increase the readability of more complex contraction networks.
This basic formalism readily extends to tensor products if we arrange tensor product factors in parallel. For instance, a bipartite operator features two parallel lines on the left and on the right: The upper lines represent the system A, while the lower lines represent system B. Two important bipartite operators are the identity I (do nothing) and the swap operator Π that exchanges the systems: Rules for multiplying and contracting operators readily extend to the tensor setting. This allows us to reformulate well-known expressions pictorially. For instance, The wiring formalism is also exceptionally well-suited to capture partial operations, like the partial transpose: These elementary rules can be used to visually represent more complicated expressions -like a trace of multiple partial transposes. The wiring formalism provides a pictorial representation for such objects and a visual framework for modifying them. In particular, it is possible to bend, as well as unentangle, index lines and rearrange tensor factors at will.   ). Subsequently, the rules of wiring calculus are used to re-arrange the diagrams (center right). Translating them into formulas (very right) produces equivalent expressions that respect the desired structure.