Newton and Bouligand derivatives of the scalar play and stop operator

We prove that the play and the stop operator possess Newton and Bouligand derivatives, and exhibit formulas for those derivatives.


Introduction.
The aim of this paper is to show that the play and the stop operator possess Newton as well as Bouligand derivatives, and to compute those derivatives. Newton derivatives are needed when one wants to solve equations F(u) = 0 for nonsmooth operators F by Newton's method with a better than linear convergence rate. Bouligand derivatives are closely related to Newton derivatives, and may be used to provide sensitivity results as well as optimality conditions for problems involving nonsmooth operators.
The scalar play operator and its twin, the scalar stop operator, act on functions u : [a, b] → R and yield functions w = P_r[u; z_0] and z = S_r[u; z_0] from [a, b] to R. The number z_0 plays the role of an initial condition. Their definition is given below in Section 6; alternatively, they arise as solution operators of the evolution variational inequality
ẇ(t) · (ζ − z(t)) ≤ 0 , for all ζ ∈ [−r, r], (1a)
z(t) ∈ [−r, r] , (1b)
w(t) + z(t) = u(t) . (1c)
The play and the stop operator are rate-independent; in fact, they constitute the simplest nontrivial examples of rate-independent operators [Vis, BS, K, MR] if one disregards relays, whose nature is inherently discontinuous. Due to (1c), their mathematical properties are closely related.
A lot is known about the play and the stop. Viewed as operators between function spaces, their typical regularity is Lipschitz (or less). In particular, they are not continuously differentiable. The question whether weaker derivatives (e.g., directional derivatives) exist was addressed, to the author's knowledge, for the first time in [BK] where it was shown that the play and the stop are directionally differentiable from C[a, b] to L p (a, b) for p < ∞. (This is not to be confused with the existence and form of derivatives of functions like t → P r [u; z 0 ](t), for which there are many results available.) The results below serve to narrow the gap between differentiability and non-differentiability of rate-independent operators. They are based on the same idea as in [BK], namely, to locally represent the play as a composition of operators whose main ingredient is the accumulated maximum.
The proofs of Newton and of Bouligand differentiability are rather similar. Nevertheless, we have chosen to elaborate both of them, because the details are somewhat cumbersome and should not be left as a burden to the reader.
The main results are given in Theorem 7.6 for the Newton derivative and in equation (109) for the Bouligand derivative. In Section 9, a slight strengthening of those results is proved, having in mind applications to partial differential equations in which the play or stop operator appear.

Notions of derivatives.
We collect some established notions of derivatives for mappings F : U → Y, where X and Y are normed spaces and U is an open subset of X. These notions are classical, but the terminology is not uniform in the literature.
Definition 2.1. (i) The limit
F′(u; h) = lim_{λ↓0} (F(u + λh) − F(u)) / λ ,
if it exists, is called the directional derivative of F at u in the direction h. It is an element of Y.
(ii) If the directional derivative satisfies
F′(u; h) = lim_{λ↓0} (F(u + λh + r(λ)) − F(u)) / λ
for all functions r : (0, λ_0) → X with r(λ)/λ → 0 as λ → 0, it is called the Hadamard derivative of F at u in the direction h.
(iii) If the directional derivative exists for all h ∈ X and satisfies
‖F(u + h) − F(u) − F′(u; h)‖_Y = o(‖h‖_X) as ‖h‖_X → 0 ,
it is called the Bouligand derivative of F at u in the direction h.
(iv) If the Bouligand derivative has the form F′(u; h) = Lh for some linear continuous mapping L : X → Y, then L is called the Fréchet derivative of F at u and denoted as DF(u).
(v) The mapping F is called directionally (Hadamard, Bouligand, Fréchet, resp.) differentiable at u (in U, resp.), if the corresponding derivative exists at u (for all u ∈ U, resp.) for all directions h ∈ X. ✷

In the definition above, it is tacitly understood that the limits are taken in the sense "≠ 0".
The following well-known fact is an elementary consequence of the above definitions.
Lemma 2.2. If F is directionally differentiable and locally Lipschitz continuous at u ∈ U, then F is Hadamard differentiable at u. ✷

The notion of a Newton derivative is more recent. A mapping G : U → L(X, Y), the space of all linear and continuous mappings from X to Y, is called a Newton derivative of F in U, if
lim_{‖h‖_X ↓ 0} ‖F(u + h) − F(u) − G(u + h)h‖_Y / ‖h‖_X = 0 (5)
holds for all u ∈ U. It is never unique; for example, modifying G at a single point does not affect the validity of (5) in U. It turns out to be natural to allow Newton derivatives to be set-valued; for set-valued mappings we write f : X ⇒ Y. The following result (Lemma 8.11 in [IK]) shows that Bouligand and Newton derivatives are closely related.
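To illustrate the defining limit (5) and its use in a semismooth Newton iteration, consider the following toy example. The scalar equation, the function names and the starting point are our own illustration, not taken from the text; note that the derivative is evaluated at the perturbed point, exactly as in (5).

```python
def F(x):
    """Toy nonsmooth equation F(x) = max(x, 0) + x/2 - 1; root at x = 2/3."""
    return max(x, 0.0) + 0.5 * x - 1.0

def G(x):
    """A Newton derivative of F: Heaviside term for max(., 0), plus 1/2."""
    return (1.0 if x > 0.0 else 0.0) + 0.5

x = -5.0                      # start on the other side of the kink at 0
for _ in range(20):
    x = x - F(x) / G(x)       # semismooth Newton step

assert abs(x - 2.0 / 3.0) < 1e-12
```

For this piecewise linear equation the iteration even terminates after finitely many steps; in general, (5) guarantees superlinear convergence locally.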
Proposition 2.4. Let F : U → Y possess the single-valued Newton derivative D_N F : U → L(X, Y). Then F is Bouligand differentiable at u ∈ U if and only if the limit lim_{λ↓0} D_N F(u + λh)h exists uniformly w.r.t. h ∈ X with ‖h‖_X = 1. In this case,
F′(u; h) = lim_{λ↓0} D_N F(u + λh)h . ✷

The chain rule.
For the differentiability notions given in Definition 2.1 we present the corresponding versions of the chain rule. They are well known.
Lemma 3.1. If F_1 and F_2 are Hadamard differentiable at u resp. F_1(u), then F_2 ∘ F_1 is Hadamard differentiable at u, and the chain rule
(F_2 ∘ F_1)′(u; h) = F_2′(F_1(u); F_1′(u; h)) (8)
holds for all h ∈ X. ✷

Proof. See e.g. [BoS], Proposition 2.47.
Proposition 3.2. Let X, Y, Z be normed spaces, let U ⊂ X and V ⊂ Y be open. Let F_1 : U → Y with F_1(U) ⊂ V and F_2 : V → Z be locally Lipschitz and Bouligand differentiable at u ∈ U resp. F_1(u). Then F_2 ∘ F_1 is Bouligand differentiable at u, and the chain rule (8) holds for all h ∈ X.
Proof. The proposition is a special case of Proposition 9.3 given below, setting there X̃ = X and Ỹ = Y.
Proposition 3.3. Let X, Y, Z be normed spaces, let U ⊂ X and V ⊂ Y be open. Let F_1 : U → Y with F_1(U) ⊂ V and F_2 : V → Z be locally Lipschitz and possess Newton derivatives G_1 resp. G_2, and let G_2 be locally bounded. Then F_2 ∘ F_1 possesses the Newton derivative G given by G(u) = {L_2 ∘ L_1 : L_1 ∈ G_1(u), L_2 ∈ G_2(F_1(u))}.

Proof. See Proposition 3.8 in [Ulb11] for the set-valued and Proposition A.1 in [HK09] for the single-valued case.
Note that the assumption "G 2 locally bounded" already implies that F 2 is locally Lipschitz.

The maximum functional
We consider the maximum functional ϕ : C[a, b] → R,
ϕ(u) = max_{s∈[a,b]} u(s) .
It is well known (see e.g. [Gir]) that ϕ is directionally differentiable on C[a, b] equipped with the maximum norm, and that
ϕ′(u; h) = max_{s∈M(u)} h(s) , where M(u) = {s ∈ [a, b] : u(s) = ϕ(u)}
denotes the set of maximizers of u. We define
Φ(u) = { µ : µ is a probability measure on [a, b] with supp(µ) ⊂ M(u) } . (13)
In particular, if u has a unique maximum at r ∈ [a, b], that is, M(u) = {r}, then Φ(u) = {δ_r}, where δ_r denotes the Dirac delta at r. Actually, this set Φ(u) coincides with ∂ϕ(u), the subdifferential of ϕ : C[a, b] → R at u in the sense of convex analysis. However, it turns out that Φ is not a Newton derivative of ϕ on C[a, b]. In fact, on X = W^{1,1}(a, b) (and hence, on C[a, b] as well), neither is Φ a Newton derivative of ϕ, nor is ϕ Bouligand differentiable. This can be seen from the following example.
Here and in the sequel, we use the standard norm ‖·‖_{1,1} on W^{1,1}. In the example, the function u + h_λ attains its maximum at s = λ, and ‖h_λ‖_{1,1} → 0 as λ → 0, but the remainder quotient does not tend to zero. Thus, ϕ : W^{1,1}(0, 1) → R is not Bouligand differentiable. (This argument is actually identical to the one given in [BK].) Likewise, Φ is not a Newton derivative of ϕ on W^{1,1}(0, 1). ✷

We will show that Φ is a Newton derivative of ϕ on C^{0,α}[a, b] for every α > 0, endowed with the norm
‖v‖_{0,α} = ‖v‖_∞ + sup_{s≠t} |v(s) − v(t)| / |s − t|^α .
For this purpose, we define
M_ε(u) = { s ∈ [a, b] : u(s) ≥ ϕ(u) − ε } .
We set B_ε = (−ε, ε).
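The classical directional derivative formula for the maximum functional, ϕ′(u; h) = max over the maximizer set M(u) of h, can be checked numerically. The following sketch is our own illustration (grid, sample function and tolerances are assumptions, not part of the text); the chosen u has the two maximizers s = 0.3 and s = 0.7, and with h(s) = s the formula predicts the value 0.7.

```python
# phi(u) = max of u on [0, 1]; u below has maximizers s = 0.3 and s = 0.7
N = 20000
s = [i / N for i in range(N + 1)]
u = [-((x - 0.3) ** 2) * ((x - 0.7) ** 2) for x in s]

phi = max(u)
# directional derivative formula: phi'(u; h) = max of h over M(u)
M = [i for i, v in enumerate(u) if v >= phi - 1e-12]
d_formula = max(s[i] for i in M)          # direction h(s) = s

# compare with the difference quotient (phi(u + lam*h) - phi(u)) / lam
lam = 1e-3
d_quotient = (max(v + lam * x for v, x in zip(u, s)) - phi) / lam

assert abs(d_formula - 0.7) < 1e-6
assert abs(d_quotient - d_formula) < 5e-3
```

The difference quotient approaches the larger of the two candidate values, in accordance with the formula.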
Lemma 4.2. Let u ∈ C[a, b]. For all ε > 0 there exists δ > 0 such that
M_δ(u) ⊂ M(u) + B_ε .
Proof. By contradiction: assume that there is an ε > 0 such that no δ > 0 has this property. For every n ∈ N, we choose t_n ∈ M_{1/n}(u) such that t_n ∉ M(u) + B_ε. Then
u(t_n) ≥ ϕ(u) − 1/n , d(t_n, M(u)) ≥ ε .
Here, d(t_n, M(u)) denotes the distance of t_n from M(u). Let {t_{n_k}} be a convergent subsequence, say t_{n_k} → t. Then by continuity we have u(t) = ϕ(u) and d(t, M(u)) ≥ ε, a contradiction.
Lemma 4.3. Let u, h ∈ C[a, b]. Then M(u + h) ⊂ M_{2‖h‖_∞}(u).
Proof. If t ∈ M(u + h), we have for all s ∈ [a, b]
u(t) + h(t) ≥ u(s) + h(s) , hence u(t) ≥ u(s) − 2‖h‖_∞ ,
and therefore u(t) ≥ ϕ(u) − 2‖h‖_∞. ✷
Lemma 4.4. Let u ∈ C[a, b]. For all ε > 0 there exists δ > 0 such that M(u + h) ⊂ M(u) + B_ε whenever ‖h‖_∞ ≤ δ.
Proof. This is a consequence of Lemma 4.2 and Lemma 4.3.
For a function f : I → R, I being an interval, we denote the modulus of continuity of f on I by
ω_I(f; δ) = sup{ |f(s) − f(t)| : s, t ∈ I, |s − t| ≤ δ } .

Proof. The first inequality in (22) holds since ϕ is convex. In order to prove the second inequality, let r ∈ supp(µ) ⊂ M(u + h) be arbitrary. Then u(r) + h(r) = ϕ(u + h). Since µ ≥ 0 and ⟨µ, 1⟩ = 1, integrating the resulting pointwise inequality with respect to µ yields (24).
Then for every ε > 0 there exists δ > 0 such that the following holds: For every if h belongs to the corresponding spaces.
Proof. The estimate (25) immediately follows from Lemma 4.5 and Lemma 4.4. Since |h(r) − h(s)| ≤ |r − s|^α ‖h‖_{0,α} resp. |h(r) − h(s)| ≤ |r − s|^{1−1/p} ‖ḣ‖_{L^p} for all r, s ∈ [a, b], the remaining assertions follow from (25).
Proposition 4.7. Let X = C^{0,α}[a, b] with 0 < α ≤ 1, or X = W^{1,p}(a, b) with 1 < p ≤ ∞. Then the mapping Φ defined in (13) is a globally bounded Newton derivative of the maximum functional ϕ on X.
In particular, for every u ∈ X there exists a nondecreasing and bounded function ρ_u : R_+ → R_+ with ρ_u(δ) ↓ 0 as δ ↓ 0 such that
|ϕ(u + h) − ϕ(u) − ⟨µ, h⟩| ≤ ρ_u(‖h‖_∞) ‖h‖_X (28)
holds for every h ∈ X and every µ ∈ Φ(u + h). Moreover, ϕ is Bouligand differentiable on X, and for every u ∈ X
|ϕ(u + h) − ϕ(u) − ϕ′(u; h)| ≤ ρ_u(‖h‖_∞) ‖h‖_X (29)
holds for every h ∈ X.
Proof. Let u ∈ X be given. According to (22), (26) and (27), for every ε > 0 there exists a δ > 0 such that for all h ∈ X and all µ ∈ Φ(u + h) This proves (28). That Φ is globally bounded follows immediately from its definition in (13) and the continuity of the embeddings from X to C[a, b].
Note that the estimates (28) and (29) are slightly stronger than required for Newton and Bouligand differentiability (the factor ρ_u(‖h‖_X) instead of ρ_u(‖h‖_∞) would suffice). They are motivated by applications to partial differential equations.
Since W^{1,p}(a, b) is continuously embedded into C^{0,α}[a, b] for α ≤ 1 − 1/p by Morrey's theorem, the assertions concerning differentiability on C^{0,α}[a, b] actually imply those concerning W^{1,p}(a, b). But, as the above exposition shows, a reference to that theorem would not shorten the argument, and it would moreover introduce an additional constant in (27).

The accumulated maximum
We define the accumulated (or "gliding") maximum of a function u ∈ C[a, b] by
F(u)(t) = ϕ_t(u) = max_{s∈[a,t]} u(s) , t ∈ [a, b] .
There arises the question whether the function F_PD(u; h), obtained by taking pointwise in time the directional derivative of ϕ_t, is a derivative of F according to the notions in Definition 2.1. This depends on the choice of the function spaces X and Y for F : X → Y. It has been shown in [BK] that this is the case, in the sense of directional differentiability, for X = C[a, b] and Y = L^p(a, b), p < ∞. But it turns out that, on the basis of the considerations of the previous section, we can show that F is Newton and Bouligand differentiable when we choose X as a Hölder or a Sobolev space.
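In discrete time, the accumulated maximum is simply a running maximum; the following minimal Python sketch (the sample data are our own illustration) also records the two elementary properties that F(u) is nondecreasing and dominates u.

```python
from itertools import accumulate

# accumulated ("gliding") maximum F(u)(t) = max over [0, t] of u,
# on a grid: a running maximum
u = [0.0, 2.0, 1.0, 3.0, -1.0, 3.0, 0.5]
F_u = list(accumulate(u, max))

assert F_u == [0.0, 2.0, 2.0, 3.0, 3.0, 3.0, 3.0]
assert all(a <= b for a, b in zip(F_u, F_u[1:]))   # F(u) is nondecreasing
assert all(w >= x for w, x in zip(F_u, u))         # F(u) dominates u
```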
Let us construct a Newton derivative of F; as before, we do this pointwise in time. For u ∈ C[a, b], let us define
G(u) = { L : (Lh)(t) = ⟨µ_t, h⟩, where t ↦ µ_t is weakly measurable and µ_t is a probability measure with supp(µ_t) ⊂ M_t(u) } , (39)
where M_t(u) denotes the set of maximizers of u on [a, t]. We write µ_t instead of µ(t); "weakly measurable" means that t ↦ ⟨µ_t, h⟩ is measurable for every h ∈ C[a, b].
Lemma 5.1. The set G(u) is nonempty for every u ∈ C[a, b].
Proof. Let m(t) = max M_t(u). If t < s we have either ϕ_t(u) = ϕ_s(u) and M_t(u) ⊂ M_s(u), or ϕ_t(u) < ϕ_s(u) and m(t) ≤ t < m(s). Thus, m is nondecreasing and hence measurable, and setting µ_t = δ_{m(t)} yields an element of G(u).
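The selection used in the proof, the rightmost maximizer m(t), can be observed in discrete time; a small Python sketch (the sample data are our own illustration) computes m and checks that it is nondecreasing.

```python
u = [0.0, 2.0, 1.0, 2.0, -1.0, 3.0, 3.0, 0.5]
m = []                       # m[t] = largest maximizer of u on [0, t]
best, idx = float("-inf"), 0
for t, x in enumerate(u):
    if x >= best:            # ">=" keeps the rightmost maximizer
        best, idx = x, t
    m.append(idx)

assert m == [0, 1, 1, 3, 3, 5, 6, 6]
assert all(m[i] <= m[i + 1] for i in range(len(m) - 1))   # m is nondecreasing
```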
In analogy to (17) we define M_{t,δ}(u) = { s ∈ [a, t] : u(s) ≥ ϕ_t(u) − δ }. Unfortunately, the "uniform" statement typically does not hold when the maximizer of u is not unique; a simple example shows that (41) fails in that case. It turns out, however, that the complement of the corresponding exceptional set becomes small as δ becomes small; together with (44), the assertion follows.
Lemma 5.3. Let u ∈ C[a, b], δ > 0 and ε > 0 be given. Then for all γ ∈ (a, b], all h ∈ C[a, b] with ‖h‖_{∞,γ} ≤ δ and all µ ∈ G(u + h) we have (47) as well as (48).

Proof. We immediately obtain (47) from the corresponding statement (22) in Lemma 4.5. To prove (48), we may apply the second part of Lemma 4.5 to conclude that (48) holds.
According to Lemma 5.2 we choose δ > 0 appropriately. Then the first inequality in (49) follows from (52) and from the fact that x^q + y^q ≤ (x + y)^q whenever x, y ≥ 0 and q ≥ 1. To get the second inequality in (49), we replace in the above proof the expression F(u + h) − F(u) − Lh by F(u + h) − F(u) − F_PD(u; h).

We now prove that the accumulated maximum is Newton and Bouligand differentiable. For the convenience of the reader, we repeat the definition of the set-valued mapping G: an element L ∈ G(u) has the form
(Lh)(t) = ⟨µ_t, h⟩ , (53)
where t ↦ µ_t is weakly measurable and µ_t is a probability measure with supp(µ_t) ⊂ M_t(u).

Proposition 5.5. The accumulated maximum F : X → L^q(a, b), 1 ≤ q < ∞, is Newton as well as Bouligand differentiable, where X = C^{0,α}[a, b] or X = W^{1,p}(a, b). A Newton derivative G : X → L(X; L^q(a, b)) is given by (53); it is globally bounded.
Thus, (55) holds for X = C^{0,α}[a, b]. In order to prove it for X = W^{1,p}(a, b), we first recall that ‖h‖_{∞,γ} ≤ c_p ‖h‖_{W^{1,p}(a,γ)} for some constant c_p. We therefore replace ε^α + 2ε with ε^{1/p′} + 2c_p ε in (58) and define ρ_u by
ρ_u(δ) = (b − a)^{1/q} ( ε_u(δ)^{1/p′} + 2 c_p ε_u(δ) ) .
With these modifications, the proof proceeds as before. (Alternatively, we might replace the second part of the proof by a reference to Morrey's embedding theorem.) To prove (56), we just have to replace F(u + h) − F(u) − Lh in (58) by F(u + h) − F(u) − F_PD(u; h) and use the second inequality in (49) instead of the first.
That G is globally bounded follows from the estimate
‖Lh‖_{L^q(a,b)} ≤ (b − a)^{1/q} ‖h‖_∞ ≤ c ‖h‖_X ,
valid for all u ∈ X and all L ∈ G(u), for some suitable constant c.

The scalar play and stop operators
The original construction of the play and the stop operators in [KDE70] is based on piecewise monotone input functions. A continuous function u : [a, b] → R is called piecewise monotone, if the restriction of u to each interval [t i , t i+1 ] of a suitably chosen partition ∆ = {t i }, a = t 0 < t 1 < · · · < t N = b, called a monotonicity partition of u, is either nondecreasing or nonincreasing. By C pm [a, b] we denote the space of all such functions.
For arbitrary r ≥ 0, the play operator P_r and the stop operator S_r are constructed as follows. (For more details, we refer to Section 2.3 of [BS].) Given a function u ∈ C_pm[a, b] and an initial value z_0 ∈ [−r, r], we define functions w, z : [a, b] → R successively on the intervals [t_i, t_{i+1}], 0 ≤ i < N, of a monotonicity partition ∆ of u by
z(a) = π_r(z_0) := max{−r, min{r, z_0}} , w(a) = u(a) − z(a) ,
and, for t ∈ (t_i, t_{i+1}],
w(t) = max{u(t) − r, w(t_i)} if u is nondecreasing on [t_i, t_{i+1}] ,
w(t) = min{u(t) + r, w(t_i)} if u is nonincreasing on [t_i, t_{i+1}] ,
z(t) = u(t) − w(t) .
In this manner, we obtain operators
w = P_r[u; z_0] , z = S_r[u; z_0] , P_r, S_r : C_pm[a, b] × R → C_pm[a, b] .
By construction, u = w + z and |z| ≤ r. The play operator satisfies the Lipschitz estimate (64) for all u, v ∈ C_pm[a, b] and all z_0, y_0 ∈ R. Therefore, P_r and S_r can be uniquely extended to Lipschitz continuous operators on C[a, b] × R which satisfy (64) for all u, v ∈ C[a, b] and all z_0, y_0 ∈ R. The trajectories {(u(t), w(t)) : t ∈ [a, b]} lie within the subset A = {|u − w| ≤ r} of the plane R² whose boundary consists of the straight lines u − w = ±r.
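On sampled inputs, the two monotone branches above combine into the single update w_n = max(u_n − r, min(u_n + r, w_{n−1})), well known in the hysteresis literature. The following minimal Python sketch of the resulting discrete play and stop is a discrete-time illustration of our own, not the operators on C[a, b]; the last assertion illustrates rate independence (refining the input path does not change the attained values).

```python
def play(u, r, z0):
    """Discrete scalar play: w_n = max(u_n - r, min(u_n + r, w_{n-1}))."""
    z = max(-r, min(r, z0))          # z(a) = pi_r(z0)
    w = [u[0] - z]                   # w(a) = u(a) - z(a)
    for x in u[1:]:
        w.append(max(x - r, min(x + r, w[-1])))
    return w

def stop(u, r, z0):
    """Discrete scalar stop: z = u - P_r[u; z0]."""
    return [x - y for x, y in zip(u, play(u, r, z0))]

u = [0.0, 2.0, 0.0]
assert play(u, 1.0, 0.0) == [0.0, 1.0, 1.0]
assert stop(u, 1.0, 0.0) == [0.0, 1.0, -1.0]      # |z| <= r throughout
# rate independence: a refined sampling of the same path agrees
assert play([0.0, 1.0, 2.0, 1.0, 0.0], 1.0, 0.0) == [0.0, 0.0, 1.0, 1.0, 1.0]
```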
Let (u, z_0) ∈ C[a, b] × R be given, let w = P_r[u; z_0], r > 0. We define the sets of times
I_0 = {t : |u(t) − w(t)| < r} , I_+ = {t : u(t) − w(t) = r} , I_− = {t : u(t) − w(t) = −r} ,
where the trajectory of (u, w) lies in the interior of A, or on the right or the left part of ∂A, respectively.
An interval [t_*, t^*] ⊂ [a, b] is called a plus interval for (u, z_0) if [t_*, t^*] ∩ I_− = ∅, and a minus interval for (u, z_0) if [t_*, t^*] ∩ I_+ = ∅. It has been proved in [BK], Lemma 5.1, that on such intervals the play operator behaves like an accumulated maximum; more precisely, on a plus interval [t_*, t^*] we have
w(t) = max{ w(t_*) , max_{t_*≤s≤t} (u(s) − r) } , t ∈ [t_*, t^*] . (66)
Correspondingly, on a minus interval we have
w(t) = min{ w(t_*) , min_{t_*≤s≤t} (u(s) + r) } , t ∈ [t_*, t^*] . (68)
The set I_0 is an open subset of [a, b]. Nothing more can be said in general, so I_+ ∪ I_− can be an arbitrary compact subset of [a, b]. Nevertheless, I_+ and I_− are separated in the sense that
dist(I_+, I_−) > 0 , (70)
because u, w and z are continuous functions. Based on (70), it has been proved in [BK], Lemma 5.2, that, in a local sense, the play and the stop operator can be represented by a finite composition of operators arising from the accumulated maximum. More precisely, the following result holds.
Proposition 6.1. Let r > 0 and (u, z_0) ∈ C[a, b] × R. Then there exist a partition ∆ = {t_k}, a = t_0 < t_1 < · · · < t_N = b, and a δ > 0 such that every partition interval [t_{k−1}, t_k] of ∆ is a plus interval for all (v, y_0) ∈ U_δ × R, or it is a minus interval for all (v, y_0) ∈ U_δ × R. Here,
U_δ = { v ∈ C[a, b] : ‖v − u‖_∞ < δ }
is the δ-neighbourhood of u w.r.t. the maximum norm. ✷

Newton derivative of the play and the stop
We obtain a Newton derivative for the play operator P r from the Newton derivative of the accumulated maximum on the spaces C 0,α [a, b] and W 1,p (a, b), where 0 < α ≤ 1 and 1 < p ≤ ∞. Within this section, X denotes any one of those spaces. We fix r > 0 and (u, z 0 ) ∈ X × R, and choose ∆ and δ according to Proposition 6.1.
We define ψ_k : (X ∩ U_δ) × R → R (see (61) for the definition of π_r) by formulas which differ depending on whether [t_{k−1}, t_k] is a plus or a minus interval. Moreover, we define w_k : (X ∩ U_δ) × R → R; the mapping w_k represents the memory update of the scalar play. According to Proposition 6.1 and (66), (68), these mappings reproduce the values of the play at the partition points. We compute a Newton derivative for ψ_k, k ≥ 1. On a plus interval, we have the representation (75). By Proposition 4.7, a Newton derivative for the inner part is given by linear and continuous mappings in which µ ranges over all probability measures whose support is contained in
M_k(v) = { t ∈ [t_{k−1}, t_k] : v(t) = max_{s∈[t_{k−1},t_k]} v(s) } . (76)
The outer maximum in (75) is just the positive part function x ↦ x⁺ on R. It is well known that the Heaviside function, given by H(x) = 1 for x > 0 and H(x) = 0 otherwise, is a Newton derivative of the positive part function. Analogously, the characteristic function of (−r, r) is a Newton derivative of the function π_r. Since the assumptions of the chain rule (Proposition 3.3) are satisfied for the mappings involved here, the considerations above yield Newton derivatives of the functions ψ_k as follows.
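Both one-dimensional Newton derivatives can be checked directly: the Newton remainder, with the derivative evaluated at the perturbed point, vanishes identically near the kinks. The following sketch is our own illustration; the step sizes are chosen binary-exact so that the identities hold exactly in floating point.

```python
def pos(x):      return max(x, 0.0)               # positive part x+
def H(x):        return 1.0 if x > 0.0 else 0.0   # Heaviside function
def pi_r(x, r):  return max(-r, min(r, x))        # truncation to [-r, r]
def chi(x, r):   return 1.0 if -r < x < r else 0.0  # characteristic of (-r, r)

r = 2.0
for h in (0.5, -0.5, 2.0 ** -30, -(2.0 ** -30)):
    # Newton remainders at the kinks x = 0 resp. x = r vanish identically
    assert pos(0.0 + h) - pos(0.0) - H(0.0 + h) * h == 0.0
    assert pi_r(r + h, r) - pi_r(r, r) - chi(r + h, r) * h == 0.0

# evaluating the derivative at the base point instead of at x + h fails:
# at x = 0, h > 0 the remainder pos(h) - pos(0) - H(0)*h equals h itself
assert pos(0.5) - pos(0.0) - H(0.0) * 0.5 == 0.5
```

The last line shows why (5) evaluates the Newton derivative at u + h: the subdifferential-style choice at the base point does not yield a superlinear remainder at the kink.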
✷

Lemma 7.2. On a plus interval, a Newton derivative of ψ_k, 1 ≤ k ≤ N, is given by a set-valued mapping Ψ_k whose elements L ∈ Ψ_k(v, p) are built from the Heaviside function and from probability measures supported in M_k(v) from (76). This Newton derivative is bounded on (X ∩ U_δ) × R. ✷

Corresponding considerations for ψ⁻ instead of ψ⁺ lead to the Newton derivative of ψ_k on minus intervals.

Lemma 7.3. On a minus interval, a Newton derivative of ψ_k, 1 ≤ k ≤ N, is given analogously, with the minimum taking the place of the maximum. This Newton derivative is bounded on (X ∩ U_δ) × R. ✷

We obtain a Newton derivative of w_k by applying the chain rule in (73).
Lemma 7.4. A Newton derivative of w_k, 1 ≤ k ≤ N, is given by a set-valued mapping whose elements L_{W_k} are compositions in which L ∈ Ψ_k(v, y_0) has the form given in Lemma 7.2 or Lemma 7.3, respectively. This Newton derivative is bounded on (X ∩ U_δ) × R. ✷

In order to obtain a Newton derivative of the play, we define F_k : (X ∩ U_δ) × R → L^q(t_{k−1}, t_k) by formulas which differ depending on whether [t_{k−1}, t_k] is a plus or a minus interval. A Newton derivative for F_k is given in the following lemma.
Lemma 7.5. On a plus interval, a Newton derivative of F_k : (X ∩ U_δ) × R → L^q(t_{k−1}, t_k), 1 ≤ k ≤ N, is given by a set-valued mapping Z_k, where the elements L ∈ Z_k(v, p) have the form (90) with a weakly measurable family t ↦ µ_t of probability measures as in (91). On a minus interval, the same is true, with (90) and (91) replaced by their counterparts in which the minimum takes the place of the maximum. These Newton derivatives are bounded on (X ∩ U_δ) × R. ✷

Proof. On a plus interval, we have the representation (95). By Proposition 5.5, a Newton derivative for the inner part is given by linear and continuous mappings in which µ ranges over the set of measures having the asserted properties. The outer maximum in (95) corresponds to the positive part operator f ↦ f⁺ for real-valued functions. Considered as an operator from L^q̃(t_{k−1}, t_k) to L^q(t_{k−1}, t_k) with q̃ > q, it has as a single-valued Newton derivative the mapping defined by L_f(h)(t) = H(f(t)) h(t) for f ∈ L^q̃, H being the Heaviside function; see Theorem 3.49 in [Ulb11], Example 8.14 in [IK], or the original papers [HIK03, Ulb03]. Thus, applying Proposition 5.5 with range space L^q̃ instead of L^q, we may use the chain rule for the composition of the two max operations in (95). This yields the assertions for a plus interval. For a minus interval, the same modifications as used to prove Lemma 7.3 apply.
According to Proposition 6.1 we have, for all (v, y_0) ∈ (X ∩ U_δ) × R, a representation of P_r[v; y_0] on each partition interval [t_{k−1}, t_k] through the mappings F_k. From this formula we obtain a Newton derivative of the play operator.

Theorem 7.6. Let X denote any one of the spaces C^{0,α}[a, b], 0 < α ≤ 1, or W^{1,p}(a, b), 1 < p ≤ ∞. Let u ∈ X, z_0 ∈ R, and let δ > 0 as well as the partition ∆ be chosen according to Proposition 6.1. Then for every r > 0 the play operator has a Newton derivative. A particular Newton derivative is given by a set-valued mapping G_r, with elements L_{P_r} of G_r(v, y_0) assembled piecewise from the derivatives of the F_k. Here, L has the form given in Lemma 7.5, and L_{W_k} has the form given in Lemma 7.4. This Newton derivative is bounded on (X ∩ U_δ) × R.
Proof. This is a consequence of Lemma 7.4 and Lemma 7.5. Since there are only finitely many intervals [t k−1 , t k ], G r is bounded.
Since the stop operator is related to the play operator by the formula S_r[u; z_0] = u − P_r[u; z_0], it also has a Newton derivative. We use the notations of Theorem 7.6.
Corollary 7.7. The stop operator has a bounded Newton derivative with elements of the form Id − L_{P_r}, where L_{P_r} ranges over G_r(v, y_0). ✷

Bouligand derivative of the play and the stop
The Bouligand derivative of the play and the stop is obtained in an analogous manner, now based on the chain rule from Proposition 3.2. We list the resulting formulas, using the notation from the previous section.

Refined Newton and Bouligand differentiability
As a result of the above investigation of the maximum and the accumulated maximum, we have seen that these operators satisfy a slightly stronger version of Newton and Bouligand differentiability. With regard to the former, for F : U → Y, U ⊂ X open, we have constructed a Newton derivative G : U ⇒ Y with a remainder estimate of the type
sup_{L∈G(u+h)} ‖F(u + h) − F(u) − Lh‖_Y ≤ ρ_u(‖h‖_X̃) ‖h‖_X , (110)
where X̃ is a normed space such that X ⊂ X̃ with continuous embedding. Having in mind applications to partial differential equations, we want to obtain explicitly a remainder estimate like (110) for the play operator. For this purpose, a corresponding generalization of the chain rule from Proposition 3.3 is needed.
Definition 9.1. Let X, Y and X̃ be normed spaces with continuous embedding X → X̃, let F : U → Y, U ⊂ X open. A mapping G : U ⇒ L(X, Y) is called an (X, X̃)-Newton derivative of F in U if for every u ∈ U there exists a nondecreasing and bounded function ρ_u : R_+ → R_+ with ρ_u(δ) ↓ 0 as δ ↓ 0 such that
sup_{L∈G(u+h)} ‖F(u + h) − F(u) − Lh‖_Y ≤ ρ_u(‖h‖_X̃) ‖h‖_X
for every h ∈ X with u + h ∈ U. ✷

Since ρ_u(‖h‖_X̃) ≤ ρ_u(c ‖h‖_X) for some constant c, every (X, X̃)-Newton derivative of F is also a Newton derivative of F as defined in 2.3.
The following proposition shows that the chain rule remains valid for the notion of an (X, X̃)-Newton derivative. It reduces to the standard one, Proposition 3.3, whenever X̃ = X and Ỹ = Y.
Proposition 9.2. Let X, Y, Z and X̃, Ỹ be normed spaces with continuous embeddings X ⊂ X̃ and Y ⊂ Ỹ. Let U ⊂ X and V ⊂ Y be open. Let F_1 : U → Y with F_1(U) ⊂ V and F_2 : V → Z be given. Assume that F_1 and F_2 are locally Lipschitz and possess (X, X̃)- resp. (Y, Ỹ)-Newton derivatives G_1 : U ⇒ Y resp. G_2 : V ⇒ Z, and let G_2 be locally bounded. Assume moreover that F_1 : (U, ‖·‖_X̃) → (Y, ‖·‖_Ỹ) is continuous. Then the compositions L_2 ∘ L_1 with L_1 ∈ G_1(u), L_2 ∈ G_2(F_1(u)) form an (X, X̃)-Newton derivative of F_2 ∘ F_1.

Proof. Since G_2 is locally bounded, there exists a C > 0 such that for sufficiently small ‖h‖_X we have ‖L_2‖ ≤ C for all L_2 ∈ G_2(F_1(u + h)). Consequently, for all such h and L_2, and for all L_1 ∈ G_1(u + h), we have by assumption on F_1 a remainder estimate for F_1. Moreover, by assumption on F_2 a corresponding remainder estimate holds with k = F_1(u + h) − F_1(u). Since F_1 is locally Lipschitz, ‖k‖_Y ≤ C_1 ‖h‖_X for small enough ‖h‖_X. Now let us define ρ̃ accordingly; by assumption on F_1, ρ̃(λ) → 0 as λ → 0. Putting together the estimates obtained so far, we arrive at (117), independent of the choice of L_1 and L_2, as long as ‖h‖_X is sufficiently small. Setting ρ(λ) equal to the resulting majorant, we have ρ(λ) → 0 as λ → 0. Thus, the assertion follows from (117). ✷

The chain rule for the (X, X̃)-Bouligand derivative follows.
Proposition 9.3. Let X, Y, Z and X̃, Ỹ be normed spaces with continuous embeddings X ⊂ X̃ and Y ⊂ Ỹ. Let U ⊂ X and V ⊂ Y be open. Let F_1 : U → Y with F_1(U) ⊂ V and F_2 : V → Z be given. Assume that F_1 and F_2 are locally Lipschitz and possess (X, X̃)- resp. (Y, Ỹ)-Bouligand derivatives at u ∈ U resp. F_1(u). Assume moreover that F_1 : (U, ‖·‖_X̃) → (Y, ‖·‖_Ỹ) is continuous. Then F_2 ∘ F_1 is Bouligand differentiable at u, and the chain rule (8) holds for all h ∈ X.
Proposition 9.4. (i) The Newton derivative G_r of the play operator P_r given in Theorem 7.6 has the following additional property. For every u ∈ X there exists ρ_u : R_+ → R_+, nondecreasing and bounded, with ρ_u(δ) ↓ 0 as δ ↓ 0, such that the estimate (124) holds for all h ∈ X, L_{P_r} ∈ G_r[u + h; z_0 + q] and γ ∈ (a, b]. Here we have set X_γ = C^{0,α}[a, γ] resp. X_γ = W^{1,p}(a, γ). (ii) The Bouligand derivative of the play operator P_r given in (109) has the following additional property.
The estimate (125) holds for all h ∈ X and all γ ∈ (a, b].
Proof. It suffices to prove the result for the case γ = b. Indeed, due to the Volterra property of P_r and L_{P_r}, the expressions on the left side of (124) and (125), respectively, remain unchanged if we modify h on (γ, b]; therefore we obtain the assertions on [a, γ] from those on [a, b] by applying the latter to the variations h · 1_{[a,γ]} instead of h. So, let γ = b. In the previous section, we have seen how the play operator can be represented locally as a composition of mappings which are linear or given by the maximum on an interval, the accumulated maximum on an interval, or the positive part on an interval. Thus, one has to check successively that one can apply the chain rule from Proposition 9.2 to the constructions in Lemma 7.2, Lemma 7.3, Lemma 7.4 and Lemma 7.5. The mapping ψ_k, considered in Lemma 7.2 and Lemma 7.3, arises according to (75) from a composition of mappings between pairs of spaces over [a, b]. (A pair refers to the pairings in Definition 9.1; a single space stands for a pairing of this space with itself.) Analogously, according to (73) the mapping w_k arises from the composition
(X × (X × R), X̃ × (X̃ × R)) → (X × R, X̃ × R) → R ,
and F_k arises due to (95) from the composition
(X × R, X̃ × R) → (X × R, X̃ × R) → L^q̃(a, b) × R → L^q(a, b) .
In all those cases, one checks that the assumptions of Propositions 9.2 and 9.3 are satisfied.