Counter-examples to an infinitesimal version of the Furstenberg conjecture

In this paper we observe that one of our main results in ‘Optimal transport and dynamics of circle expanding maps acting on measures’ [Ergod. Th. & Dynam. Sys. 33(2) (2013), 529–548] has an interesting consequence: an infinitesimal version of the Furstenberg conjecture is false in a very strong way. More precisely, we find deformations of the Lebesgue measure on the circle which are first-order invariant simultaneously for all integer multiplications modulo 1. We also correct an error in a lemma of the mentioned article. Both the proof and the statement must be corrected, but the main results of the article are not affected.

1. An infinitesimal Furstenberg conjecture
1.1. Furstenberg conjecture. While circle expanding maps have in many respects become toy models in the category of hyperbolic dynamical systems, a prominent question concerning them has remained open for over half a century.
CONJECTURE 1.1. (Furstenberg) If an atomless probability measure $\mu$ on the circle $S^1 = \mathbb{R}/\mathbb{Z}$ is invariant under both $\times_2 : x \mapsto 2x \bmod 1$ and $\times_3 : x \mapsto 3x \bmod 1$, then $\mu$ is equal to the Lebesgue measure $\lambda$.
In the above conjecture, one can replace 2 and 3 by any two multiplicatively independent integers. Even the above case is wide open in general, though a theorem of Rudolph asserts that Furstenberg's conjecture holds for measures $\mu$ having positive entropy for one of the maps $\times_2$ or $\times_3$ [Rud90] (see also [Joh92]). Many other results related to this question can be found in the literature, among which are [HS12, BLMV09]; the interested reader can, for example, use the answers to the MathOverflow question [Math14] as pointers.
1.2. A differential calculus perspective. In [Klo13], we started a study of the action of circle expanding maps on the set of probability measures $\mathscr{P}(S^1)$ in a differentiable setting relying on optimal transport. This setting does not truly give $\mathscr{P}(S^1)$ the structure of an (infinite-dimensional) manifold, but it makes it possible to define differentiable maps and their derivatives. The main result of [Klo13] is that the push-forward action $\Phi_\#$ on measures of a $C^2$ expanding circle map $\Phi$ is differentiable at its absolutely continuous invariant measure, with an explicit computation of its derivative. To see how relevant this result can be in the context of the Furstenberg conjecture, let us consider how one can approach this kind of problem in a differential geometric setting.
Furstenberg's conjecture is a strong rigidity statement; in differential geometry, a common strategy to attack such questions is to aim at weaker rigidity statements. A first weakening would be to ask whether the point known to have a given property of interest (here: $\lambda$) is, rather than unique, at least isolated among points with this property. If this stays out of reach, then can we prove that it is not possible to deform this point, i.e. to find a non-constant continuous path starting at this point inside the set defined by the given property? A further weakening is to ask for first-order rigidity, i.e. to ask whether we can use the tangent space and derivatives to prove that no $C^1$ deformation can exist in the considered set.
In the case of the Furstenberg conjecture, we have a space $\mathscr{P}(S^1)$ and two rather rich subspaces, the sets of atomless invariant measures for $\times_2$ and $\times_3$. Let us denote these sets of fixed measures by $I_2$ and $I_3$; then the conjecture is that $I_2 \cap I_3 = \{\lambda\}$. Imagine for a moment that $I_2$ and $I_3$ are some sort of differentiable submanifolds of $\mathscr{P}(S^1)$; then the various above weakenings of Furstenberg's conjecture would take the form of the following questions:
(1) is $\lambda$ isolated in $I_2 \cap I_3$?
(2) is $\lambda$ the sole point in its path-connected component inside $I_2 \cap I_3$?
(3) must a $C^1$ curve starting at $\lambda$ and lying inside $I_2 \cap I_3$ be constant?
Finally, to provide a positive answer to this third weakening, the most common approach would be to prove that the intersection $I_2 \cap I_3$ is 'first-order rigid' at $\lambda$, in the sense that the tangent spaces $T_\lambda I_2$ and $T_\lambda I_3$ intersect trivially.
1.3. The infinitesimal Furstenberg conjecture. Since $I_2$ and $I_3$ are defined (if we forget momentarily the atomless condition) as sets of fixed points for $\times_{2\#}$ and $\times_{3\#}$, the first-order rigidity question would reduce to asking whether the spaces $E_2, E_3 \subset T_\lambda\mathscr{P}(S^1)$ of invariant vectors for the derivatives $D_\lambda\times_{2\#}$ and $D_\lambda\times_{3\#}$ intersect trivially. Even if all the above speculation turns out to be wrong (e.g. $I_2$ and $I_3$ might be nothing like submanifolds), this last question is perfectly well defined in the differential setting alluded to above, and can be considered an infinitesimal version of Furstenberg's conjecture. The main purpose of this paper is to prove that this question has a negative answer.
THEOREM 1.2. The vector space $E_2 \cap E_3 \subset T_\lambda\mathscr{P}(S^1)$ of tangent vectors at $\lambda$ that are simultaneously invariant under both $D_\lambda\times_{2\#}$ and $D_\lambda\times_{3\#}$ is infinite-dimensional.
The vector space $\bigcap_{d=2}^{\infty} E_d$ of tangent vectors at $\lambda$ that are simultaneously invariant under all the $D_\lambda\times_{d\#}$ is two-dimensional.
In order to keep this paper short, and to avoid repetition, we refer to [Klo13] for the definition of derivatives and tangent spaces in the set of measures. Such a result could feel very abstract and potentially artificial, so let us give a direct corollary that contains no reference to optimal transport or to an abstract differential geometric setting. The idea behind this corollary goes back to an insight of Otto [Ott01] related to the point of view of Benamou and Brenier [BB00] and developed in [AGS08]: by integration, smooth test functions can serve as a kind of (weak) coordinates on $\mathscr{P}(S^1)$. For simplicity, this corollary is phrased in a restricted way, using only that $\bigcap_{d=2}^{\infty} E_d$ is not reduced to 0.
COROLLARY 1.3. There exists a path of probability measures $(\mu_t)_{t\in(-\varepsilon,\varepsilon)}$ with $\mu_0 = \lambda$, continuous in the weak topology, with $\mu_t$ atomless for almost all $t$, such that
$$\frac{d}{dt}\Big|_{t=0} \int \psi_0 \, d\mu_t \neq 0 \quad \text{for some smooth function } \psi_0 : S^1 \to \mathbb{R},$$
and
$$\int \psi \, d(\times_{d\#}\mu_t) - \int \psi \, d\mu_t = o(t)$$
for all smooth functions $\psi : S^1 \to \mathbb{R}$ and all integers $d \ge 2$.
(1) The first condition ensures that $\mu_t$ depends significantly on $t$ (in particular, it avoids the degenerate and obvious choice $\mu_t \equiv \lambda$), while the second condition expresses that for small $t$, $\mu_t$ is 'almost invariant' under all the push-forward maps $\times_{d\#}$. Of course, by definition of the push-forward, this condition can be rewritten $\int \psi \circ \times_d \, d\mu_t - \int \psi \, d\mu_t = o(t)$.
(2) This corollary is intrinsically much weaker than the theorem, as differentiability in the sense of Wasserstein distance implies differentiability of the integrals of test functions, but the converse implication does not hold. For example, a curve of the form $(t\mu + (1-t)\nu)_t$ is usually not differentiable (or even rectifiable) in the differential structure induced by $W_2$, while the integral of any test function depends affinely on $t$. Nevertheless, we do not know a simpler way to get Corollary 1.3, even when restricting $d$ to $\{2, 3\}$. Even if the Furstenberg conjecture were false and there were an atomless probability measure $\mu \neq \lambda$ invariant under $\times_2$ and $\times_3$, the curve $(t\lambda + (1-t)\mu)_t$ would not work, as these measures are not positive for negative $t$.
(3) One could try to extend this infinitesimal argument to the construction of families of counter-examples to the Furstenberg conjecture: if one of the invariant vectors we found could be extended to a vector field preserved by both $\times_{2\#}$ and $\times_{3\#}$, then the integral curve issued from $\lambda$ would be entirely made of invariant measures for both of N; see [EF10]). One can still dream of making this approach work for finitely generated multiplicative sub-semigroups, as this case is very different from larger sub-semigroups: in the former case, the remainder in the first-order Taylor formula for $\times_{d\#}$ at $\lambda$ can be made uniform over the generators (for a fixed simultaneously invariant tangent vector).
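The contrast drawn in remark (2) can be seen on a toy case. The following worked computation is only a sketch under assumptions that are not in the text: $\mu = \delta_y$ and $\nu = \delta_x$ are Dirac masses at two distinct points of $S^1$ at distance $\delta > 0$, and $\mu_t = t\,\delta_y + (1-t)\,\delta_x$ for $t \in [0,1]$. Test-function integrals are affine in $t$, while the $W_2$-length of the curve is infinite.

```latex
% Assumption (illustration only): \mu_t = t\,\delta_y + (1-t)\,\delta_x, with d(x,y) = \delta > 0.
% Integrals of test functions are affine in t:
\[
  \int \psi \, d\mu_t = t\,\psi(y) + (1-t)\,\psi(x).
\]
% For s < t, any transport plan from \mu_s to \mu_t must move at least mass t - s
% from x to y, and the plan leaving all other mass in place achieves this, so
\[
  W_2(\mu_s,\mu_t)^2 = (t-s)\,\delta^2,
  \qquad
  W_2(\mu_s,\mu_t) = \delta\sqrt{t-s}.
\]
% Length of the curve along the uniform partition t_i = i/N of [0,1]:
\[
  \sum_{i=0}^{N-1} W_2(\mu_{t_i},\mu_{t_{i+1}})
  = N \cdot \frac{\delta}{\sqrt{N}}
  = \delta\sqrt{N} \xrightarrow[N\to\infty]{} \infty .
\]
```

The square-root dependence on $t - s$ is what rules out rectifiability, and hence differentiability, for $W_2$ in this example.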

Proofs
Given the results of [Klo13], the only merit of Theorem 1.2 is its statement: its proof is completely straightforward. where $j$ is any positive integer coprime to 6. In particular, $E_2 \cap E_3$ is infinite-dimensional. We also get that $\bigcap_{d\ge 2} E_d$ is generated by $\sum_{p\ge 1} p^{-1} c_p$ and $\sum_{p\ge 1} p^{-1} s_p$ and is thus two-dimensional. Note that these functions are indeed in $L^2(\lambda)$, and therefore represent tangent vectors at $\lambda$.
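The Fourier computation behind these generators can be sketched as follows. Two points are assumptions, since the surviving text does not state them: that $c_p(x) = \cos(2\pi p x)$ and $s_p(x) = \sin(2\pi p x)$, and that the derivative $L_d := D_\lambda\times_{d\#}$ computed in [Klo13] acts on vector fields by summing over the $d$ preimages of a point. Under these assumptions the invariance condition becomes a recursion on Fourier coefficients.

```latex
% Assumptions (not restated in this paper): c_p(x) = \cos(2\pi p x), s_p(x) = \sin(2\pi p x),
% and L_d := D_\lambda \times_{d\#} acts by summing over preimages:
\[
  (L_d v)(x) = \sum_{k=0}^{d-1} v\!\left(\frac{x+k}{d}\right).
\]
% Action on Fourier modes, using \sum_{k=0}^{d-1} e^{2\pi i q k/d} = d if d | q, and 0 otherwise:
\[
  L_d\, c_q =
  \begin{cases}
    d\, c_{q/d} & \text{if } d \mid q,\\
    0 & \text{otherwise,}
  \end{cases}
  \qquad\text{and similarly for } s_q .
\]
% For v = \sum_{q \ge 1} a_q c_q with (a_q) square-summable:
\[
  L_d v = \sum_{p\ge 1} d\, a_{dp}\, c_p ,
  \qquad\text{so}\qquad
  L_d v = v \iff a_{dp} = \frac{a_p}{d} \ \text{ for all } p \ge 1 .
\]
% Invariance for all d \ge 2 forces a_p = a_1 / p, which is square-summable:
\[
  v = a_1 \sum_{p\ge 1} p^{-1} c_p ,
  \qquad
  \sum_{p\ge 1} p^{-2} < \infty .
\]
% Invariance for d = 2, 3 only: a_{2^a 3^b j} = 2^{-a} 3^{-b} a_j for j coprime to 6,
% with one free coefficient a_j for each such j.
```

The free choice of one coefficient for each $j$ coprime to 6 is consistent with the infinite-dimensionality of $E_2 \cap E_3$, and the single free coefficient per family ($c_p$ or $s_p$) with the two-dimensionality of $\bigcap_{d\ge 2} E_d$.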
The proof of the corollary is also easy; it relies mostly on the following pointwise version of the continuity equation. We do not claim any novelty in the lemma below, but still provide a simple proof of the simple case we need.
LEMMA 2.1. Assume that $(\mu_t)_t$ is a curve of probability measures on $S^1$ which is differentiable at 0 with tangent vector $v \in T_{\mu_0}\mathscr{P}(S^1)$, in the sense that $W_2(\mu_t, \mu_0 + tv) = o(t)$ (recall that $\mu_0 + tv := (\mathrm{Id} + tv)_\#\mu_0$ is the image of $tv$ by the exponential map at $\mu_0$). Then, for all smooth functions $\psi$, we have
$$\frac{d}{dt}\Big|_{t=0} \int \psi \, d\mu_t = \int \psi' v \, d\mu_0 .$$
Note that this lemma is mostly relevant in the case when $\mu_0$ is regular in the sense of Gigli (i.e. in dimension one, atomless), since then all 'tangent vectors' at $\mu_0$ are indeed represented by a vector field $v \in L^2(\mu_0)$. In a more general manifold, the same result would hold with $\psi$ a compactly supported function, and $\nabla\psi$ instead of $\psi'$.
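Proof. First, we observe that, denoting by $\Pi_t$ an optimal transport plan from $\mu_t$ to $\mu_0 + tv$, we have
$$\Big| \int \psi \, d\mu_t - \int \psi \, d(\mu_0 + tv) \Big| \le \int |\psi(x) - \psi(y)| \, d\Pi_t(x,y) \le \|\psi'\|_\infty \, W_2(\mu_t, \mu_0 + tv) = o(t),$$
so that we can use $\mu_0 + tv$ to estimate the derivative of the integral of $\psi$. Next, we have
$$\int \psi \, d(\mu_0 + tv) = \int \psi(x + tv(x)) \, d\mu_0(x) = \int \psi \, d\mu_0 + t \int \psi' v \, d\mu_0 + O\big(t^2 \|\psi''\|_\infty \|v\|^2_{L^2(\mu_0)}\big).$$
Note that the finiteness of $\|v\|_{L^2(\mu_0)}$ is part of the definition of the tangent space at $\mu_0$. Now the claimed equality follows readily from these two estimates.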
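Proof of Corollary 1.3. Let $v \in \bigcap_{d\ge 2} E_d$ be a non-zero tangent vector at $\lambda$ invariant under all the $D_\lambda\times_{d\#}$, and define $\mu_t = \lambda + tv$. By definition of the tangent space at $\lambda$, $\int v \, d\lambda = 0$, so that $v$ has a well-defined antiderivative. Let $\psi_0$ be a smooth approximation of one of its antiderivatives, so that $\int \psi_0' v \, d\lambda \simeq \int v^2 \, d\lambda$ is non-zero. Then the pointwise continuity equation implies that
$$\frac{d}{dt}\Big|_{t=0} \int \psi_0 \, d\mu_t = \int \psi_0' v \, d\lambda \neq 0 .$$
Moreover, the invariance of $v$ means that the curve $(\times_{d\#}\mu_t)$ is also differentiable at $t = 0$, with tangent vector $v$. In consequence, we get for all smooth functions $\psi$:
$$\frac{d}{dt}\Big|_{t=0} \int \psi \, d(\times_{d\#}\mu_t) = \int \psi' v \, d\lambda = \frac{d}{dt}\Big|_{t=0} \int \psi \, d\mu_t ,$$
so that the two integrals differ by $o(t)$. The weak continuity of $(\mu_t)$ is obvious, and the fact that $\mu_t$ is atomless for almost all $t$ was proved in [Klo13].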
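The choice of $\psi_0$ can be made explicit for one of the generators. This is only an illustration, assuming (as is not restated in the surviving text) that $c_p(x) = \cos(2\pi p x)$ and taking $v = \sum_{p\ge 1} p^{-1} c_p$; the truncation level $N$ is a hypothetical parameter.

```latex
% Assumptions (illustration only): c_p(x) = \cos(2\pi p x), v = \sum_{p \ge 1} p^{-1} c_p,
% and N \ge 1 is an arbitrary truncation level.
% A smooth approximation of an antiderivative of v:
\[
  \psi_0(x) = \sum_{p=1}^{N} \frac{\sin(2\pi p x)}{2\pi p^2},
  \qquad
  \psi_0'(x) = \sum_{p=1}^{N} \frac{\cos(2\pi p x)}{p}.
\]
% Orthogonality: \int_0^1 \cos(2\pi p x)\cos(2\pi q x)\,dx = \tfrac12 \delta_{pq} for p, q \ge 1, hence
\[
  \int \psi_0' \, v \, d\lambda = \sum_{p=1}^{N} \frac{1}{2p^2} \;\ge\; \frac12 \;>\; 0,
  \qquad
  \int v^2 \, d\lambda = \sum_{p\ge 1} \frac{1}{2p^2} = \frac{\pi^2}{12}.
\]
```

Already $N = 1$, i.e. $\psi_0(x) = \sin(2\pi x)/(2\pi)$, gives a non-zero derivative in this example; larger $N$ brings $\int \psi_0' v \, d\lambda$ closer to $\int v^2 \, d\lambda$.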

Corrigendum
The proof of [Klo13, Lemma 4.2] is flawed: the Wasserstein distance between $\rho\lambda$ and $\bar\rho\lambda$ is independent of $t$, so that the right-hand side of the first inline equation should be $\varepsilon t + 2^{-3/2}\varepsilon$ rather than $(1 + 2^{-3/2})\varepsilon t$. This mistake can be corrected by estimating how well a piecewise-constant density with $k$ pieces of equal length can approximate the given density. The issue is then moved to the main argument: in order to ultimately get an $o(t)$ remainder, we need to take advantage of the presence of many overlaps (as in [Klo13, Figure 2]), which only exist if $k$ does not increase too fast with respect to $t$. This can be ensured by adding a regularity hypothesis.
The positivity assumption may not be necessary, but at the very least simplifies the proof. Note that in [Klo13], we only used Lemma 4.2 for positive C 1 densities, so all main results stand without modification as soon as we prove the corrected lemma.
Proof. We prove the case n = 2, since the general case can then be deduced by induction.
Let $\varepsilon$ be any positive number, and consider vector fields $\bar v_i$ ($i = 1, 2$) that are constant on the intervals $[j/k_1, (j+1)/k_1)$ for some $k_1$ and all $j < k_1$, and such that $\|v_i - \bar v_i\|_{L^2(\rho\lambda)} \le \varepsilon$. Note that $k_1$ and the $\bar v_i$ are chosen to depend only on $\varepsilon$, not on $t$; in particular, $\|\bar v_1 - \bar v_2\|_\infty$ is finite and independent of $t$. Now consider any value of $t$, to be taken small enough a few times below. Let $k = k(t)$ be a multiple of $k_1$ having the magnitude of $(1/t)^{1/(1+\beta/2)}$, where $\beta$ is the Hölder exponent of $\rho$, say $k\,t^{1/(1+\beta/2)} \in [1, 2]$. We define $\bar\rho$ as the density that is constant on each $I_j = [j/k, (j+1)/k)$, of value $\bar\rho_j := k \int_{I_j} \rho \, d\lambda$. Denoting by $C$ the Hölder constant of $\rho$, we get $|\rho(x) - \bar\rho_j| \le C k^{-\beta}$ for all $x \in I_j$. We denote by $\bar v_i(j)$ the value of $\bar v_i$ on $I_j$; observe that when $t$ is small, these values are the same on many successive intervals, since $k$ is much larger than $k_1$.
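Let us first bound $W(\rho\lambda, \bar\rho\lambda)$ from above. We consider the monotone rearrangement fixing 0 as transport plan; by definition of $\bar\rho$, it preserves each $I_j$. To simplify notation, let us bound the cost due to the mass located in $I_0$, the other intervals behaving in exactly the same way. The cumulative distribution functions of $\rho\lambda$ and $\bar\rho\lambda$ are given on $I_0$ by
$$F(x) = \int_0^x \rho \, d\lambda \quad\text{and}\quad G(x) = \bar\rho_0\, x .$$
The monotone rearrangement is given on $I_0$ by the map $T = G^{-1} \circ F$, so that the contribution of $I_0$ to its cost is
$$\int_{I_0} |T(x) - x|^2 \rho(x) \, d\lambda(x) \le \frac{C^2}{\bar\rho_0\, k^{3+2\beta}} .$$
Since the mass lying in $I_0$ is $\bar\rho_0/k$ (for both densities), the ratio of cost per mass is bounded above by
$$\frac{C^2}{\bar\rho_0^2\, k^{2+2\beta}} \le \frac{C^2}{(\min\rho)^2\, k^{2+2\beta}} .$$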
Since this holds in all intervals $I_j$ and the total mass is 1, the overall cost is bounded by the same value, so that
$$W(\rho\lambda, \bar\rho\lambda) \le \frac{C}{\min\rho}\,\frac{1}{k^{1+\beta}} .$$
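The displacement estimate behind this bound can be spelled out as follows. This is a sketch using the notation of the proof on $I_0 = [0, 1/k)$, with $F(x) = \int_0^x \rho\,d\lambda$, $G(x) = \bar\rho_0 x$ and $T = G^{-1}\circ F$; the only inputs are that $\rho$ is $\beta$-Hölder with constant $C$ and that $\bar\rho_0$ is the average of $\rho$ over $I_0$.

```latex
% Notation as in the proof; I_0 = [0, 1/k), \bar\rho_0 = k \int_{I_0} \rho \, d\lambda.
% Displacement of the monotone rearrangement on I_0:
\[
  T(x) - x = \frac{F(x)}{\bar\rho_0} - x
           = \frac{1}{\bar\rho_0} \int_0^x (\rho - \bar\rho_0)\, d\lambda .
\]
% \bar\rho_0 is an average of values of \rho over an interval of length 1/k, so the
% Hölder condition gives |\rho - \bar\rho_0| \le C k^{-\beta} on I_0; with 0 \le x < 1/k:
\[
  |T(x) - x| \le \frac{C k^{-\beta}}{\bar\rho_0} \cdot \frac{1}{k}
             = \frac{C}{\bar\rho_0\, k^{1+\beta}} .
\]
% Integrating against \rho\lambda over I_0, whose mass is \bar\rho_0 / k:
\[
  \int_{I_0} |T(x) - x|^2 \rho \, d\lambda
  \le \frac{C^2}{\bar\rho_0^2\, k^{2+2\beta}} \cdot \frac{\bar\rho_0}{k},
  \qquad
  \frac{C^2}{\bar\rho_0^2\, k^{2+2\beta}} \le \frac{C^2}{(\min\rho)^2\, k^{2+2\beta}}
  \ \text{ since } \bar\rho_0 \ge \min\rho .
\]
```

Taking the square root of the cost per unit mass gives the stated bound on $W(\rho\lambda, \bar\rho\lambda)$; this is where the positivity of $\rho$ enters.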
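The same argument also yields
$$W(\rho\lambda + v, \bar\rho\lambda + v) \le \frac{C}{\min\rho}\,\frac{1}{k^{1+\beta}}$$
for any vector field $v$ which is constant on each $I_j$: indeed, if $\Pi$ is a transport plan from $\rho\lambda$ to $\bar\rho\lambda$, then $(\mathrm{Id} + v, \mathrm{Id} + v)_\#\Pi$ is a transport plan from $\rho\lambda + v$ to $\bar\rho\lambda + v$ whose cost is not greater than the cost of $\Pi$ (for each bit of mass moved from $x$ to $T(x)$ by $\Pi$, this new plan moves the same amount of mass from $x + v(x)$ to $T(x) + v(T(x))$; the hypothesis that $v$ is constant on each $I_j$ then ensures that $v(T(x)) = v(x)$). Applying this to $v = t\sum_i \alpha_i \bar v_i$, we get
$$W\Big(\rho\lambda + t\sum_i \alpha_i \bar v_i,\ \bar\rho\lambda + t\sum_i \alpha_i \bar v_i\Big) \le \frac{C}{\min\rho}\,\frac{1}{k^{1+\beta}} .$$
Applying the same reasoning to each $v = t\bar v_i$ separately and concatenating the corresponding transport plans also yields
$$W\Big(\sum_i \alpha_i(\rho\lambda + t\bar v_i),\ \sum_i \alpha_i(\bar\rho\lambda + t\bar v_i)\Big) \le \frac{C}{\min\rho}\,\frac{1}{k^{1+\beta}} .$$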
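With the choice of $k = k(t)$ made at the start of the proof, these bounds are negligible compared with $t$. The short exponent computation below is a sketch using only the normalisation $k\,t^{1/(1+\beta/2)} \in [1, 2]$ and $\beta > 0$.

```latex
% From k\, t^{1/(1+\beta/2)} \ge 1 we get k \ge t^{-1/(1+\beta/2)}, hence
\[
  \frac{1}{k^{1+\beta}} \le t^{\frac{1+\beta}{1+\beta/2}},
  \qquad
  \frac{1+\beta}{1+\beta/2} = 1 + \frac{\beta/2}{1+\beta/2} > 1 \quad (\beta > 0).
\]
% Therefore each of the three Wasserstein bounds above is o(t):
\[
  \frac{C}{\min\rho}\,\frac{1}{k^{1+\beta}}
  \le \frac{C}{\min\rho}\; t^{\,1 + \frac{\beta/2}{1+\beta/2}} = o(t).
\]
% In the other direction, k\, t^{1/(1+\beta/2)} \le 2 gives
\[
  k\,t \le 2\, t^{\frac{\beta/2}{1+\beta/2}} \xrightarrow[t\to 0]{} 0 ,
\]
% so displacements of size t stay small compared with the interval length 1/k.
```

The Hölder exponent thus enters twice: it makes the approximation error $o(t)$, and it keeps $k$ from growing too fast with respect to $t$.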