Contact rate epidemic control of COVID-19: an equilibrium view

We consider the control of the COVID-19 pandemic through a standard SIR compartmental model. This control is induced by the aggregation of individuals' decisions to limit their social interactions: when the epidemic is ongoing, an individual can diminish his/her contact rate in order to avoid getting infected, but this effort comes at a social cost. If each individual lowers his/her contact rate, the epidemic vanishes faster, but the effort cost may be high. A Mean Field Nash equilibrium forms at the population level, resulting in a lower effective transmission rate of the virus. We prove that an equilibrium exists and compute it numerically. However, this equilibrium selects a sub-optimal solution in comparison to the societal optimum (a centralized decision respected fully by all individuals), meaning that the cost of anarchy is strictly positive. We provide numerical examples and a sensitivity analysis, as well as an extension to a SEIR compartmental model to account for the relatively long latent phase of the COVID-19 disease. In all the scenarios considered, the divergence between the individual and societal strategies happens both before the peak of the epidemic, due to individuals' fears, and after, when a significant propagation is still underway.


Introduction
As of April 10th 2020, almost half of the world population is under strong restrictions (lockdown) enforced by local governments to limit the ongoing COVID-19 pandemic caused by SARS-CoV-2. These restrictions mainly concern the reduction of social interactions. They are instantiated at different geographical scales (households, towns, states, countries, ...) and have already had a significant worldwide economic impact. All individuals affected by epidemic control measures pay a cost in terms of health (reports show an under-reporting of severe illnesses such as heart attacks and ischemic strokes), time, money, social interactions, psychological pressure (domestic violence has increased), etc. However, until a sufficient proportion of the population becomes immune (by infection or vaccination), the choice of control measures, together with an adequate tracking of the virus spread, is key to limiting the epidemic as well as the social and economic impacts of the reduced contact rate. In this context, each individual needs to balance the individual and collective outcomes of his/her behavior when choosing his/her level of social interaction with the population. For instance, he/she may be tempted by the so-called "free lunch" strategy, i.e., take advantage of the low epidemic activity (due to others' efforts) while not contributing to the effort himself/herself. However, the outcome for the whole population is at stake, because the epidemic dynamics is induced by the aggregation of individual decisions. In practice, an equilibrium forms among the individuals, and it is this equilibrium that we analyze in this work.
From a technical point of view, we study a "Mean Field Game". Introduced by Lasry and Lions in [38,39,40] and independently by Huang, Caines, and Malhamé [33,30,31,32], Mean Field Games focus on the derivation of a Nash equilibrium within a population containing an infinite number of individuals; see [16] for a complete mathematical description. Such an asymptotic viewpoint simplifies the game-theoretic analysis of the interactions within the population, as the impact of each individual on the entire population can typically be neglected. It is worth pointing out that the optimal strategy in a Mean Field game provides an approximate Nash equilibrium for similar games involving a finite number of players (see [16]). Fields of application of Mean Field games, mentioned in [28], include energy [24], finance [17,25], crowd modeling [3], and also epidemic dynamics [10,36,37,34,50]. These latter works focus on the impact of individual vaccination decisions on the dynamics of the epidemic. In this paper, we follow a similar approach but study the impact of individual decisions concerning distancing and isolation, in an epidemic where no vaccine is (yet) available.
Optimal control of epidemics is often modeled through a combination of isolation and vaccination strategies. Early work [1,2] focuses on the optimal timing of vaccination, combined with a cost-free instantaneous isolation strategy. More realistic impacts and constraints on quarantine and isolation strategies have been considered in [43,58] or [11,29]. The main objective of such policies is to control the reproduction number of the epidemic (see [45]) while limiting their social and economic impact. Another part of the literature (see [26,49,48,23,22,13,57] for example) models the individual behavioral response to isolation policies in this context. Such modeling of individual response is of course greatly influenced by cultural habits, together with the societal, economic or religious need for social interactions. Including this individual feedback effect on governmental quarantine policy is key when modeling the optimal control of epidemic dynamics. The main contribution of this paper is an easily tractable modeling approach for this purpose.
In this work, we address the question of epidemic control using an individual-based modeling approach, and ask how the cost of a social distancing strategy is distributed among individuals. Each individual optimally chooses his/her contact rate with others, striking a balance between the cost of being infected by the virus and the cost induced by reducing social interactions. On the one hand, the epidemic dynamics is induced by the aggregation of all individual contact rate decisions. On the other hand, the epidemic level influences the contamination probability of each individual, and thereby his/her own contact rate. We prove that a Mean Field Nash equilibrium is reached among the population and we quantify the impact of the individual trade-off between epidemic risk and other unfavorable outcomes. Numerical experiments show that the self-isolation equilibrium strategy is characterized by a quick contact rate reduction followed by a slower return to the usual level of social interaction. It significantly reduces the proportion of infected individuals required in order to achieve herd immunity.
We also compare the induced epidemic dynamics to a situation where a global planner (typically a strongly empowered government) is able to control all the interaction rates among individuals. We observe numerically that the Mean Field Nash equilibrium provides a sub-optimal contact rate strategy in comparison to the societal optimum, and measure the corresponding so-called cost of anarchy. We observe in particular that the divergence between the individual and societal strategies happens after the epidemic peak but while significant propagation is still underway. By taking into account the individual responses to different cost structures, our modeling approach can provide insights on the impact of political decisions concerning the costs induced by contact rate reduction.
The paper is organized as follows. We first present in Section 1 the SIR model used to describe the 2020 COVID-19 epidemic dynamics, together with the control problem faced by each individual. In Section 2, we prove the existence of a Mean Field Nash equilibrium in our setting and present a numerical algorithm for its approximation. Section 3 is devoted to the numerical experiments: it presents the computing strategy and the results in terms of cost of anarchy and sensitivity of the optimal contact rate strategy to the key modeling parameters, and in particular provides a comparison with a global planner setting. To account for the relatively long latent phase of the COVID-19 disease, we extend our study in Section 4 to an SEIR epidemic model and present some numerical results. Section 5 contains the technical parts of the mathematical analysis. Finally, Section 6 summarizes the main outputs of the paper and discusses their limits and potential extensions.
1 Individual based modeling of the epidemic dynamics

1.1 The model
The dynamics of the epidemic is modeled by the standard SIR (Susceptible-Infected-Recovered) compartmental model represented in Figure 1. We refer to [7,14,15] for additional details on this model, as well as for the description of many other mathematical epidemic propagation models (and to [44,55,19,56] for specific coronavirus models). During the epidemic propagation, each individual can be either "Susceptible", "Infected" or "Recovered", and $(S_t, I_t, R_t)$ denotes the proportion of each category at time $t \ge 0$. The group of "Recovered" represents the collection of individuals whose behavior no longer impacts the transmission of the virus. This includes the individuals who did not survive the epidemic, as well as the individuals who tested positive and were isolated in a perfect quarantine. Besides, we implicitly assume that the immunity acquired by the "Recovered" lasts forever. As a first step, we restrict our analysis to a simple SIR model rather than more complex (and realistic) models, in order to focus on the role of the aggregation of individual decisions in the epidemic dynamics. Nevertheless, our study can be extended to more complicated epidemic models. In particular, an extension to the SEIR compartmental model is presented in Section 4.

Figure 1: SIR compartmental model. Susceptible → Infected (flow $\bar\beta S I\,dt$) → Recovered (flow $\gamma I\,dt$).

More precisely, the (continuous-time) evolution of the disease is described by the following equations:

$$
\begin{cases}
dS_t = -\bar\beta_t S_t I_t\,dt,\\
dI_t = \bar\beta_t S_t I_t\,dt - \gamma I_t\,dt,\\
dR_t = \gamma I_t\,dt,
\end{cases}
\qquad (1)
$$

with a given initial compartmental distribution of individuals at time 0, denoted $(S_0, I_0, R_0)$, which is supposed to be known. We assume that initially $S_0 + I_0 + R_0 = 1$, and observe that the system (1) possesses a conservation law, i.e., $S_t + I_t + R_t = 1$ for all $t \ge 0$. As the dynamics of the epidemic is not impacted by the size of the class R, we can restrict our focus to the evolution of the proportions (S, I) of susceptible and infected individuals. The model described by (1) involves two parameters, γ and $\bar\beta$. The constant parameter γ > 0 is exogenous and identifies the recovery rate, i.e., it represents the inverse of the length (in days) of the contagious period (or time before strict isolation). In our framework, the key parameter is the transmission rate of the disease, denoted $(\bar\beta_t)_{t\ge0}$, which, contrary to more classical SIR models, is considered to be endogenous and time-dependent. To focus our study on this transmission rate, we assume that the parameter γ is fixed and known.
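A minimal sketch of how system (1) can be integrated numerically follows. It assumes a forward Euler scheme on a uniform grid, with $\bar\beta$ held constant on each step. The scheme, the grid and the function name `simulate_sir` are illustrative choices, not the discretization behind the paper's figures. Because each Euler update only moves mass between compartments, it preserves the conservation law $S_t + I_t + R_t = 1$ exactly.

```python
import numpy as np


def simulate_sir(beta_bar, gamma, s0, i0, r0, T):
    """Forward-Euler integration of the SIR system (1).

    beta_bar: population transmission rate, an array of length N assumed
              constant on each step [t_k, t_{k+1}) of the grid t_k = k * T / N.
    Returns the arrays S, I, R (each of length N + 1) on that grid.
    """
    beta_bar = np.asarray(beta_bar, dtype=float)
    n = beta_bar.size
    dt = T / n
    S, I, R = np.empty(n + 1), np.empty(n + 1), np.empty(n + 1)
    S[0], I[0], R[0] = s0, i0, r0
    for k in range(n):
        new_infections = beta_bar[k] * S[k] * I[k] * dt  # flow S -> I
        new_recoveries = gamma * I[k] * dt               # flow I -> R
        S[k + 1] = S[k] - new_infections
        I[k + 1] = I[k] + new_infections - new_recoveries
        R[k + 1] = R[k] + new_recoveries
    return S, I, R


# No-effort benchmark with the values of Table 1 (beta_bar = beta_0 = 0.2, gamma = 1/10).
S, I, R = simulate_sir(np.full(7200, 0.20), 0.1, 0.99, 0.01, 0.0, 360.0)
print(f"S_T + I_T + R_T = {S[-1] + I[-1] + R[-1]:.6f}, epidemic size R_T = {R[-1]:.3f}")
```

With these no-effort values, the final epidemic size $R_T$ comes out close to 80%, in line with the zero-effort benchmark discussed in the numerical experiments.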
The transmission rate $\bar\beta$ depends essentially on two factors: the disease characteristics and the contact rate within the population. We denote by $\beta_0$ the constant initial transmission rate of the disease, i.e., without any control measures or effort from the population. Although society cannot modify the disease characteristics, it can create (possibly strong) incentives for each individual to reduce his/her contact rate with other individuals in the population. Many countries chose this mitigation strategy (lockdown) during the COVID-19 epidemic propagation, some starting strict measures in January (China) and others in February, March and later, in order to slow down the epidemic propagation. However, for each individual, reducing the contact rate with others comes at a cost, as described in the introduction (health hazards, psychological pressure, loss of social relationships, income uncertainty, ...), and it may even expose the individual to yet unknown risks. The latter is especially true for long epidemics that require substantial efforts from individuals. On the other hand, if everybody else lowers his/her own contact rate, the epidemic fades out. A given individual who knows this can then act as a free-rider and increase his/her contact rate without much risk.
Finally, the global epidemic propagation rate $\bar\beta$ of the society follows from the aggregation of all individuals' contact rates, so that an equilibrium is reached between the contact rates β chosen by individuals and the overall epidemic transmission rate $\bar\beta$ in the society.

Individual viewpoint: contact rate optimization
We focus in this section on the individual perspective on epidemic control. Consistent with the literature, we suppose that the population can be partitioned into several collections of identical individuals (here the "Susceptible", "Infected and infectious" and "Removed" classes), so that each individual in a given collection takes the same decisions as the other individuals in the same class. This implies that we can consider a "representative" individual, i.e., an arbitrary individual in a given class. As we are interested in the unfolding of the COVID-19 pandemic (as opposed to an endemic disease such as measles, mumps, rubella, pertussis, ...), we consider a finite time horizon T > 0, large enough so that the epidemic is over at time T. The probability of infection of the representative individual depends both on his/her own contact rate β with the population and on the proportion $I_t$ of infectious individuals in the population. More precisely, let τ denote the random infection time of the representative individual. His/her probability $P^\beta_t = \mathbb{P}(\tau \le t)$ of being infected before time t, when using contact rate β, has the following dynamics (see [36,37]):

$$
\frac{dP^\beta_t}{dt} = \beta_t I_t \left(1 - P^\beta_t\right), \qquad P^\beta_0 = 0, \qquad (2)
$$

where $I_t$ is the proportion of (contagious) infected at time t, whose dynamics is driven by the population contact rate $\bar\beta$ (as described in (1)). We assume that, while in the Susceptible class, i.e., before the (possible) infection time τ, the representative individual can choose his/her contact rate $\beta_t \in [\beta_{\min}, \beta_0]$ (for all t ≤ τ). Recall that $\beta_0$ represents the transmission rate of the disease without any control measures, while $\beta_{\min} > 0$ is the lowest contact rate that the representative individual can attain. The efforts of an individual to decrease social interactions, in order to lower the transmission rate of the virus from $\beta_0$ to some $\beta \in [\beta_{\min}, \beta_0]$, induce an instantaneous cost, represented by a decreasing function $c : [\beta_{\min}, \beta_0] \to \mathbb{R}_+$.
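Equation (2) is linear in $1 - P^\beta_t$, so it integrates to $P^\beta_t = 1 - \exp\left(-\int_0^t \beta_s I_s\,ds\right)$. The sketch below evaluates this closed form on a uniform grid with a left-point rule. The function name and the discretization are assumptions made for illustration.

```python
import numpy as np


def infection_probability(beta, I, dt):
    """Grid values of P^beta_t solving dP/dt = beta_t I_t (1 - P_t), P_0 = 0.

    Uses the exact solution P_t = 1 - exp(-int_0^t beta_s I_s ds), with the
    integral approximated by a left-point rule (beta has N entries, I has N + 1).
    """
    beta = np.asarray(beta, dtype=float)
    I = np.asarray(I, dtype=float)
    hazard = beta * I[:-1] * dt                      # integrated infection hazard per step
    cumulative = np.concatenate(([0.0], np.cumsum(hazard)))
    return 1.0 - np.exp(-cumulative)


# Constant prevalence I = 5% and contact rate beta = 0.2 over 50 days.
P = infection_probability(np.full(100, 0.2), np.full(101, 0.05), dt=0.5)
print(P[-1], 1.0 - np.exp(-0.2 * 0.05 * 50))
```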
Following [36], we assume that the global cost (seen from time t = 0) of an individual who is infected at time τ and chooses a dynamic contact rate $(\beta_t)_t$ is defined by:

$$
\int_0^{\tau\wedge T} c(\beta_t)\,dt + r_I\,\mathbf{1}_{\tau\le T},
$$

where τ ∧ T denotes the minimum of τ and T. More precisely, when reducing his/her social interactions on the time interval [0, T], the representative individual faces a global cost given by the sum of:
(i) a cost for reducing social interactions until being infected, represented by

$$
\int_0^{\tau\wedge T} c(\beta_t)\,dt,
$$

meaning that, if the individual is not infected during the period [0, T], his/her efforts are costly until T. On the contrary, if the individual is infected before T, his/her efforts are only costly before τ.
(ii) a cost incurred at the infection time τ, defined by $r_I\,\mathbf{1}_{\tau\le T}$, where $r_I$ denotes the unitary cost of infection as quantified by the representative individual, including any other costs that presence in the "Infected" class implies. Note that $r_I$ is taken here as a constant that depends neither on the pandemic evolution nor on the individual control measures specific to the "Infected" class.
We do not specify here the nature of these costs; both have health components but may also include other types of costs. We refer the reader to the literature on the QALY/DALY scales for further details [59,5,51]. Hence, the expected cost of the representative individual, seen from time t = 0, when using individual contact rate β while the population contact rate is $\bar\beta$, is given by

$$
C(\beta,\bar\beta) := \mathbb{E}\left[\int_0^{\tau\wedge T} c(\beta_t)\,dt + r_I\,\mathbf{1}_{\tau\le T}\right],
$$

where the distribution of τ is given by (2) and is thus indirectly driven by $\bar\beta$.
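Since $\mathbb{P}(\tau > t) = 1 - P^\beta_t$, exchanging the expectation and the time integral gives the deterministic form $C(\beta,\bar\beta) = \int_0^T c(\beta_t)\,(1 - P^\beta_t)\,dt + r_I\,P^\beta_T$, which is convenient numerically. The sketch below evaluates this form on a grid. The effort cost `c` is a hypothetical choice used only for illustration: it is decreasing, vanishes at $\beta_0$ and grows near zero. The other parameter values follow Table 1.

```python
import numpy as np

BETA0, GAMMA, R_I = 0.20, 0.1, 300.0   # beta_0, gamma, r_I from Table 1


def c(beta):
    """Hypothetical effort cost: decreasing, zero at BETA0, large near zero."""
    return BETA0 / beta - 1.0


def sir_infected(beta_bar, T, s0=0.99, i0=0.01):
    """Forward-Euler SIR (1); returns I on the grid t_k = k * T / N (length N + 1)."""
    n = beta_bar.size
    dt = T / n
    S, I = np.empty(n + 1), np.empty(n + 1)
    S[0], I[0] = s0, i0
    for k in range(n):
        flow = beta_bar[k] * S[k] * I[k] * dt
        S[k + 1] = S[k] - flow
        I[k + 1] = I[k] + flow - GAMMA * I[k] * dt
    return I


def expected_cost(beta, beta_bar, T=360.0):
    """C(beta, beta_bar) = int_0^T c(beta_t) (1 - P_t) dt + r_I P_T on a uniform grid."""
    beta = np.asarray(beta, dtype=float)
    beta_bar = np.asarray(beta_bar, dtype=float)
    dt = T / beta.size
    I = sir_infected(beta_bar, T)
    # Q_t = 1 - P_t, the probability of still being susceptible at time t.
    Q = np.exp(-np.concatenate(([0.0], np.cumsum(beta * I[:-1] * dt))))
    return dt * np.sum(c(beta) * Q[:-1]) + R_I * (1.0 - Q[-1])


n = 7200
no_effort = np.full(n, BETA0)
print(f"C(beta_0, beta_0) = {expected_cost(no_effort, no_effort):.1f}")
```

With no effort ($\beta = \bar\beta = \beta_0$), the effort term vanishes and the cost reduces to $r_I P^{\beta_0}_T \approx 300 \times 0.8$, i.e. about 240. This is consistent with the 12% relative gain reported in the numerical experiments for the equilibrium cost of 211.33.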
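In order to determine his/her optimal contact rate $\beta^*$ while the population rate driving $(I_t)_t$ is $\bar\beta$, the representative individual faces the following minimization problem:

$$
\beta^* \in \arg\min_{\beta\in\mathcal{B}} C(\beta,\bar\beta),
$$

where the set $\mathcal{B}$ of admissible contact rate strategies is defined by:

$$
\mathcal{B} := \left\{ \beta : [0,T] \to [\beta_{\min}, \beta_0] \ \text{measurable} \right\}.
$$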

Societal viewpoint: induced epidemic dynamics
In the previous subsection, we described the situation from the point of view of a representative individual. We assumed that each individual chooses his/her own contact rate β in order to minimize the associated cost $C(\beta,\bar\beta)$, where $\bar\beta$ represents the population's contact rate. In our (Mean Field games) framework, one individual taken alone has no influence on the pandemic dynamics when choosing his/her contact rate. However, the global epidemic propagation rate $\bar\beta$ of the society is induced by the aggregation of all individuals' contact rates. Moreover, the behavior of the representative individual described in the previous section is the one adopted by all individuals, who are assumed to be identical. Hence, following the prescriptions of the Mean Field games framework, we seek an equilibrium between the contact rate β chosen by each individual and the overall epidemic transmission rate $\bar\beta$ observed at the society level. More precisely, we look for an equilibrium among all individuals in the population, in the sense of the following definition.

Definition 1.1 (Mean Field Nash equilibrium). A contact rate $\hat\beta \in \mathcal{B}$, together with the epidemic dynamics $(\hat S, \hat I, \hat R)$, is a Mean Field Nash equilibrium if:
(i) Individual rationality: it is optimal for the representative individual to choose the contact rate $\beta = \hat\beta$ when the epidemic dynamics is $(\hat S, \hat I, \hat R)$;
(ii) Population consistency: whenever the population contact rate is $\bar\beta = \hat\beta$, the induced epidemic dynamics is given by $(\hat S, \hat I, \hat R)$.
Identifying such an equilibrium boils down to a fixed-point property of the best response function of the representative individual, as described in the following section.

2 Mean Field Nash equilibrium and Numerical approximation

Mean Field Nash equilibrium
As emphasized in the previous section, the contact rate chosen by each individual is impacted by the epidemic dynamics through the proportion of infected $(I_t)_t$, while the number of infected is a direct consequence of the aggregation of all individuals' behavior. We thus look for an equilibrium in this context, in the sense of Definition 1.1. In fact, finding a Mean Field Nash equilibrium is equivalent to identifying a fixed point of the so-called best response function. This function provides the optimal individual contact rate $\beta^* \in \mathcal{B}$ (or the set of optimal strategies, if the optimum is not unique) in response to a population contact rate $\bar\beta$.
We therefore introduce the best response function $\mathcal{T}$, which associates to any societal transmission rate $\bar\beta \in \mathcal{B}$ the optimal individual contact rate(s) β:

$$
\mathcal{T}(\bar\beta) := \arg\min_{\beta\in\mathcal{B}} C(\beta,\bar\beta).
$$

As mentioned earlier, although we expect to find at least one fixed point of this mapping, in general $\mathcal{T}$ is a multi-valued mapping. We make the following assumption:

Assumption 2.1. The cost function c is decreasing, twice differentiable with continuous second derivative (i.e., of class $C^2$), and the following holds:

Most of the results that follow also hold under much less demanding assumptions, but at the price of increased technicalities. We first state the theoretical result concerning the existence of a unique best response strategy in $\mathcal{T}(\bar\beta)$ for a given $\bar\beta \in \mathcal{B}$.
We are now in a position to state the main mathematical result of the paper, concerning the existence of a Mean Field Nash equilibrium.
Theorem 2.2. Under Assumption 2.1, the model (1) admits a Mean Field Nash equilibrium $\hat\beta \in \mathcal{B}$, together with $(S^{\hat\beta}, I^{\hat\beta}, R^{\hat\beta})$, in the sense of Definition 1.1, i.e., for any $\beta \in \mathcal{B}$,

$$
C(\hat\beta, \hat\beta) \le C(\beta, \hat\beta).
$$

Note that Theorem 2.2 only establishes the existence of an equilibrium, not its uniqueness. The uniqueness of a Mean Field equilibrium has been recognized as a difficult problem from the very beginning of MFG theory (see counter-examples in [38, remark after Theorem 2.2]), and very little theoretical guidance exists to support such a claim. Among the methods that can yield uniqueness, one can list monotonicity assumptions (see [40,38,39]) or the evolution dynamics (see [54]), but none seems to apply directly here. Moreover, with few exceptions, epidemic Mean Field equilibria have rarely been proved to be unique. For the sake of clarity, the proofs of both Lemma 2.1 and Theorem 2.2 are postponed to Section 5.4.
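To make the best response $\mathcal{T}(\bar\beta)$ concrete, the sketch below approximates it by projected gradient descent on $C(\cdot,\bar\beta)$, with $\bar\beta$ (hence $I$) frozen. Write $Q_t := 1 - P^\beta_t$, so that $C = \int_0^T c(\beta_t)Q_t\,dt + r_I(1 - Q_T)$. A direct computation then gives

$$
\nabla_1 C(\beta,\bar\beta)(t) = c'(\beta_t)\,Q_t - I_t\left(\int_t^T c(\beta_s)\,Q_s\,ds - r_I\,Q_T\right).
$$

The first term is the marginal saving in effort. The second is the change in infection risk: raising $\beta_t$ brings infection forward, which increases the expected infection cost but shortens the period over which effort is paid. The code uses the exact gradient of the discretized cost. Several settings are illustrative assumptions, not those of the paper: the effort cost `c`, the step `h`, the iteration count, the reading $\beta_{\min} = 0.05$ from Table 1, and the clipping onto $[\beta_{\min}, \beta_0]$.

```python
import numpy as np

BETA0, BETA_MIN, GAMMA, R_I, T = 0.20, 0.05, 0.1, 300.0, 360.0  # read from Table 1


def c(beta):
    """Hypothetical effort cost: decreasing, c(BETA0) = 0, large near 0."""
    return BETA0 / beta - 1.0


def dc(beta):
    """Derivative c'(beta) of the hypothetical cost above."""
    return -BETA0 / beta**2


def sir_infected(beta_bar, s0=0.99, i0=0.01):
    """Forward-Euler SIR (1); returns I on the grid (length N + 1)."""
    n = len(beta_bar)
    dt = T / n
    S, I = np.empty(n + 1), np.empty(n + 1)
    S[0], I[0] = s0, i0
    for k in range(n):
        flow = beta_bar[k] * S[k] * I[k] * dt
        S[k + 1] = S[k] - flow
        I[k + 1] = I[k] + flow - GAMMA * I[k] * dt
    return I


def cost_and_gradient(beta, I):
    """Discretized C(beta, beta_bar) and its L2 gradient in beta (I = I^{beta_bar} frozen)."""
    dt = T / beta.size
    Q = np.exp(-np.concatenate(([0.0], np.cumsum(beta * I[:-1] * dt))))  # Q = 1 - P
    w = dt * c(beta) * Q[:-1]                   # expected effort cost paid on each step
    cost = w.sum() + R_I * (1.0 - Q[-1])
    tail = np.cumsum(w[::-1])[::-1] - w         # effort cost paid strictly after step k
    grad = dc(beta) * Q[:-1] - I[:-1] * (tail - R_I * Q[-1])
    return cost, grad


def best_response(beta_bar, h=2e-4, iters=200):
    """Approximate T(beta_bar) by projected gradient descent, starting from no effort."""
    I = sir_infected(beta_bar)
    beta = np.full(len(beta_bar), BETA0)
    for _ in range(iters):
        _, g = cost_and_gradient(beta, I)
        beta = np.clip(beta - h * g, BETA_MIN, BETA0)  # stay in [beta_min, beta_0]
    return beta


beta_bar = np.full(360, BETA0)                  # the population makes no effort
I = sir_infected(beta_bar)
br = best_response(beta_bar)
print(cost_and_gradient(beta_bar, I)[0], cost_and_gradient(br, I)[0])
```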

Numerical approach
We describe below the methodology used in the numerical experiments to approximate the Mean Field Nash equilibrium $\hat\beta$. Recall that the cost for the representative individual when choosing a contact rate β while the societal transmission rate is $\bar\beta$ is given by $C(\beta,\bar\beta)$.
The metric equilibrium flow approach introduced in [54] (to which we refer the reader for a rigorous mathematical transcription of the objects below; see also [50] for an application) prescribes the following iterative procedure to reach an equilibrium. Choose a pseudo-time step h > 0 and define iteratively $\beta^{n+1}$ as a minimizer of the functional

$$
\beta \mapsto C(\beta, \beta^n) + \frac{d(\beta, \beta^n)^2}{2h}. \qquad (12)
$$

Here, $d(\cdot,\cdot)$ is a geodesic distance on the space $\mathcal{B}$ of all possible individual choices β. Note that when $\mathcal{B}$ is a Hilbert space, $C(\cdot,\cdot)$ is smooth enough, and $d(\cdot,\cdot)$ is the distance induced by the canonical norm, the fact that $\beta^{n+1}$ is a minimizer of the functional in (12) implies:

$$
\frac{\beta^{n+1} - \beta^n}{h} = -\nabla_1 C(\beta^{n+1}, \beta^n), \qquad (13)
$$

where $\nabla_1$ denotes the derivative with respect to the first argument. Relation (13) is similar to the JKO scheme used for (one-variable) gradient flows, see [4]. The problem with (13) is that it is implicit, and thus not directly usable for numerical computation in this form. In practice, when the vector space structure on $\mathcal{B}$ is compatible with its topological structure, one can find $\beta^{n+1}$ solving (13) iteratively: start with $\beta^{n+1,0} = \beta^n$ and iterate, for $\ell \ge 0$,

$$
\beta^{n+1,\ell+1} = \beta^n - h\,\nabla_1 C(\beta^{n+1,\ell}, \beta^n). \qquad (14)
$$

It is standard that, for h small enough and under (e.g., Lipschitz) regularity of $\nabla_1 C$ with respect to its first argument, these iterations converge, by a Picard fixed-point argument, to the (unique) solution of (13). However, for numerical convenience, only L iterations of (14) are performed in practice, and for our numerical simulations L = 1 worked well. We therefore implement the following proxy for the minimization in (12):

$$
\beta^{n+1} = \beta^n - h\,\nabla_1 C(\beta^n, \beta^n). \qquad (15)
$$

When $\mathcal{B}$ is a Hilbert space and ignoring possible regularity issues, this is an explicit Euler discretization of the equilibrium flow defined in [54]. Thus, we expect $\lim_{n\to\infty} \beta^n$ to be the equilibrium we want to compute.
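A minimal sketch of the explicit proxy $\beta^{n+1} = \beta^n - h\,\nabla_1 C(\beta^n,\beta^n)$ on a discretized problem follows. Its assumptions are not taken from the paper:
- forward Euler for (1);
- $\nabla_1 C$ computed as the exact gradient of the discretized cost, i.e. a discrete version of $c'(\beta_t)Q_t - I_t\left(\int_t^T c(\beta_s)Q_s\,ds - r_I Q_T\right)$ with $Q = 1 - P^\beta$;
- a clipping step onto $[\beta_{\min},\beta_0]$ to stay admissible;
- a hypothetical effort cost `c`;
- illustrative values of `h` and of the number of iterations.

Both arguments of $C$ move together at each step. This is what distinguishes the equilibrium flow from a plain gradient descent on a fixed functional.

```python
import numpy as np

BETA0, BETA_MIN, GAMMA, R_I, T = 0.20, 0.05, 0.1, 300.0, 360.0  # read from Table 1


def c(beta):                       # hypothetical effort cost, c(BETA0) = 0
    return BETA0 / beta - 1.0


def dc(beta):                      # derivative c'(beta)
    return -BETA0 / beta**2


def sir_infected(beta_bar, s0=0.99, i0=0.01):
    """Forward-Euler SIR (1); returns the proportion of infected on the grid."""
    n = len(beta_bar)
    dt = T / n
    S, I = np.empty(n + 1), np.empty(n + 1)
    S[0], I[0] = s0, i0
    for k in range(n):
        flow = beta_bar[k] * S[k] * I[k] * dt
        S[k + 1] = S[k] - flow
        I[k + 1] = I[k] + flow - GAMMA * I[k] * dt
    return I


def cost_and_gradient(beta, beta_bar):
    """Discretized C(beta, beta_bar) and its L2 gradient with respect to beta."""
    dt = T / beta.size
    I = sir_infected(beta_bar)
    Q = np.exp(-np.concatenate(([0.0], np.cumsum(beta * I[:-1] * dt))))  # Q = 1 - P
    w = dt * c(beta) * Q[:-1]                   # expected effort cost per step
    tail = np.cumsum(w[::-1])[::-1] - w         # effort cost paid strictly after step k
    grad = dc(beta) * Q[:-1] - I[:-1] * (tail - R_I * Q[-1])
    return w.sum() + R_I * (1.0 - Q[-1]), grad


def equilibrium_flow(n=360, h=3e-4, iters=500):
    """Explicit equilibrium-flow iteration started at beta^0 = beta_0."""
    beta = np.full(n, BETA0)
    costs = []
    for _ in range(iters):
        cost, g = cost_and_gradient(beta, beta)       # both arguments equal beta^n
        costs.append(cost)
        beta = np.clip(beta - h * g, BETA_MIN, BETA0)  # assumed projection onto [beta_min, beta_0]
    return beta, costs


beta_hat, costs = equilibrium_flow()
print(f"C at start: {costs[0]:.1f}, after {len(costs)} steps: {costs[-1]:.1f}")
```

The recorded costs allow one to monitor the kind of convergence shown in Figure 2. With the hypothetical `c`, the values are not expected to reproduce those reported in the paper.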

Choice of parameters
The following numerical experiments are performed using a contact rate reduction cost function defined by: Recall that the parameter $\beta_{\min}$ represents the minimal contact rate achievable by a representative individual, while $\beta_0$ denotes the usual contact rate before the beginning of the lockdown measures. In other words, $\beta_0$ represents the transmission rate of the disease without any isolation effort of the population. The shape of the cost function reflects the increasing difficulty of bringing the contact rate closer to zero. Conversely, without effort of the individual at time t, meaning that $\beta_t = \beta_0$, the associated cost $c(\beta_t)$ is equal to zero. In addition, note that this function c satisfies Assumption 2.1.
The parameters used in the experiments are provided in Table 1. The associated reproduction number $R_0$ without isolation measures, commonly defined by $R_0 := \beta_0/\gamma$ in the literature on epidemic models, is equal to 2.0 in our framework, and thus lies in the confidence interval of available data [42]. The parameter γ corresponds to the inverse of the contagious period of the virus (see [6]). We assume that the initial proportion of infected $I_0$ in the overall population is 1% at time 0, when the contact rate optimization starts. We set the cost incurred by an infected individual to $r_I = 300$. We recall that this cost is not necessarily expressed in terms of money, but can also account for medical side effects or general morbidity (see [59,5,51] for an introduction to QALY/DALY), and is relative to the definition of c(·).
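Table 1: Parameters used in the numerical experiments.
Initial distribution $(S_0, I_0, R_0)$: (0.99, 0.01, 0.00)
Recovery rate γ: 1/10
Transmission rate without effort $\beta_0$: 0.20
Minimal contact rate $\beta_{\min}$: 0.05
Lockdown contact rate for piece-wise constant strategies: 0.14
Time horizon T: 360 days
Cost of infection $r_I$: 300

Finally, recall that there is considerable uncertainty in the medical literature on the choice of all the parameters described above, so the sensitivity of the results to the chosen values is tested in Section 3.5.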

Mean Field Nash equilibrium
The numerical approximation of the Mean Field Nash equilibrium is obtained using the algorithm described in Subsection 2.2. The initial guess $\beta^0$ is taken constant: $\beta^0(t) = \beta_0$ for all $t \ge 0$.

Figure 2: Parameters of Table 1. Left: global cost C as a function of the learning time. Right: Mean Field Nash equilibrium contact rate $\hat\beta$ (magenta line), identical to the contact rate at the penultimate step of the algorithm (green dashed line), compared to $\beta_0$ (red line).
We plot in Figure 2 the convergence towards the minimal cost; the cost decays very fast in terms of the learning time. The equilibrium contact rate is represented in the right plot, where the empirical convergence of the algorithm is confirmed by its superposition with the approximate contact rate computed at the previous step. At this numerical Mean Field Nash equilibrium, the cost $C(\hat\beta,\hat\beta)$ is 211.33, which gives each individual a relative gain of 12% compared with the zero-effort strategy $\beta_0$. In addition, only 85% of this cost comes from the infection cost (instead of 100% for the strategy $\beta_0$), with the remaining 15% coming from the cost of social effort. The probability of being infected over the interval [0, T] decreases from 80% in the zero-effort benchmark scenario to 62%, see Figure 3. Note that in this case the minimum attainable infection probability is 50% (as calculated from [35, Lemma A.1] or from equation (1)).
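The two benchmark probabilities quoted above can be cross-checked from (1) alone. With a constant rate $\beta_0$, dividing the first and third equations of (1) gives $d\ln S_t / dR_t = -\beta_0/\gamma$. With $R = 0$ initially and $I_\infty = 0$, this yields the classical final-size relation $S_\infty = S_0 \exp\left(-(\beta_0/\gamma)(1 - S_\infty)\right)$. For the lower bound, assume the contact rate eventually returns to $\beta_0$: the epidemic can then only be over once $S \le \gamma/\beta_0$, so an initially susceptible individual is infected with probability at least $1 - \gamma/(\beta_0 S_0)$. The sketch solves the final-size relation by bisection. The function name is illustrative.

```python
import math

S0, I0 = 0.99, 0.01            # initial proportions from Table 1 (R = 0 initially)
BETA0, GAMMA = 0.20, 0.1
R0 = BETA0 / GAMMA             # basic reproduction number, 2.0


def final_susceptible(s0, r0, tol=1e-12):
    """Solve S_inf = s0 * exp(-r0 * (1 - S_inf)) on (0, s0) by bisection.

    Valid for a constant transmission rate (no effort) and R = 0 at time 0.
    """
    def f(s):
        return s - s0 * math.exp(-r0 * (1.0 - s))

    lo, hi = 0.0, s0           # f(lo) < 0 < f(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


s_inf = final_susceptible(S0, R0)
p_no_effort = 1.0 - s_inf / S0           # infection probability without effort
p_min = 1.0 - 1.0 / (R0 * S0)            # herd-immunity lower bound
print(f"R_T = {1 - s_inf:.3f}, P(no effort) = {p_no_effort:.3f}, P(min) = {p_min:.3f}")
```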
The equilibrium contact rate $\hat\beta$ is characterized by three major phases, in response to the Mean Field Nash equilibrium epidemic dynamics presented in Figure 3. First, at the beginning of the epidemic, the proportion I of infected people is relatively low. Individuals therefore make no effort to reduce their social interactions, and the virus is transmitted at the normal rate $\beta_0$. This leads to a sharp increase in the proportion of Infected, implying a significant increase in each individual's probability of contracting the virus. In response, individuals begin to reduce their social interactions significantly, which strongly reduces the transmission rate of the disease. Finally, after the epidemic peak, all individuals slowly relax their effort until the proportion of infected people is close to 0.

Figure 3: SIR dynamics with the parameters of Table 1. Solid lines represent the evolution at the Mean Field Nash equilibrium, i.e., with transmission rate $\hat\beta$. The evolution at equilibrium is compared to the epidemic dynamics with constant transmission rate $\beta_0$ (dashed lines).

Figure 3 provides a comparison between the SIR dynamics of the Mean Field Nash equilibrium (solid lines) and the one generated by the no-effort strategy $\beta_0$ (dashed lines). Even if driven by self-interest, individual efforts do reduce social interactions population-wise, with two visible effects. First, the proportion $R_T$ of recovered at the terminal date drops from 80% to 60% when the population applies the Mean Field Nash equilibrium strategy. This means that the proportion of the population spared by the virus goes from 20% without effort to 40% at equilibrium. Second, the infection peak, occurring around t = 50 days, is half as high at equilibrium; as a counterpart, the epidemic lasts longer, since the number of infected decreases more slowly after the peak.
As the infection peak is lower, it limits, and may even prevent, the saturation of healthcare facilities. Although this is not represented in our model, it should translate into a lower mortality rate of the virus.

Impulse control equilibrium
We now focus on the situation where the set of admissible strategies is restricted to a subset of $\mathcal{B}$, denoted by $\bar{\mathcal{B}}$, of particular piece-wise constant strategies:

$$
\bar{\mathcal{B}} := \left\{ \beta \in \mathcal{B} : \exists\, 0 \le t_1 \le t_2 \le T,\ \beta_t = 0.14 \text{ for } t \in [t_1, t_2) \text{ and } \beta_t = \beta_0 \text{ otherwise} \right\},
$$

where the lockdown contact rate 0.14 is taken from Table 1. This framework describes the realistic situation where the instantaneous contact rate of each individual cannot, in practice, be chosen freely within the whole set $\mathcal{B}$ and has to be restricted to a single control period. The representative individual optimally selects the times $t_1$ and $t_2$, representing respectively the beginning and the end of his/her lockdown period.
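A minimal sketch of how an element of $\bar{\mathcal{B}}$ can be represented on a uniform time grid follows. The function name `impulse_strategy`, the grid size and the left-endpoint convention are assumptions made for illustration. Each strategy is described by the two numbers $(t_1, t_2)$, so exploring $\bar{\mathcal{B}}$ amounts to a two-dimensional search.

```python
import numpy as np


def impulse_strategy(t1, t2, beta_lock, beta0=0.20, T=360.0, n=720):
    """Element of the restricted set: beta_lock on [t1, t2), beta0 elsewhere.

    Values are taken at the left end of each step of the grid t_k = k * T / n.
    """
    if not 0.0 <= t1 <= t2 <= T:
        raise ValueError("need 0 <= t1 <= t2 <= T")
    t = np.arange(n) * (T / n)
    return np.where((t >= t1) & (t < t2), beta_lock, beta0)


# Lockdown starting at t1 = 20 and lasting 81 days, at the contact rate 0.14 of Table 1.
beta = impulse_strategy(20.0, 101.0, 0.14)
print(beta.min(), beta.max(), np.count_nonzero(beta < 0.20) * 0.5)
```

The usage line encodes the equilibrium lockdown over $\bar{\mathcal{B}}$ that is reported with Figure 4.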

Figure 4: Comparison between Mean Field Nash equilibria in $\mathcal{B}$ (plain lines) and $\bar{\mathcal{B}}$ (dashed lines), using the parameters described in Table 1.

Figure 4 compares the Mean Field Nash equilibria obtained over both sets $\bar{\mathcal{B}}$ and $\mathcal{B}$. The equilibrium strategy over $\bar{\mathcal{B}}$ starts isolation measures at time $t_1 = 20$ by decreasing the contact rate from $\beta_0 = 0.20$ to 0.14. The lockdown lasts 81 days, after which the contact rate immediately returns to normal, i.e., to $\beta_0$. The cost for each individual is around 210.78, which is lower than the cost of around 211.33 associated with the equilibrium over $\mathcal{B}$. Moreover, the induced SIR dynamics exhibits a lower epidemic size $R_T$ together with a lower proportion of infected at the epidemic peak, thereby reducing the potential mortality induced by the virus. This observation also shows that the Mean Field Nash equilibrium $\hat\beta$ in $\mathcal{B}$ is not optimal for the society as a whole, as it can be improved by restricting the set of admissible strategies to $\bar{\mathcal{B}}$. This leads us to look for the optimal societal contact rate for the population as a whole.

Cost of anarchy
In our previous equilibrium analysis, each individual is considered too small to impact the epidemic dynamics of the society, and can hence act selfishly: each individual minimizes his/her own cost $C(\cdot,\bar\beta)$ in response to the transmission rate $\bar\beta$ of the epidemic. On the other hand, a global planner, e.g., a fully empowered government, optimizes the global cost of the entire society with respect to the choice of the transmission rate in the society. Namely, the global planner solves:

$$
\min_{\bar\beta\in\mathcal{B}} C(\bar\beta, \bar\beta).
$$

This leads to a different optimization problem, which is well documented in the literature, see for example [52,11,29,20], and more broadly on the topic of vaccination (see, e.g., [2,43,46,35]).
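When $\beta = \bar\beta$, the individual's probability $Q_t = 1 - P^{\bar\beta}_t$ of still being susceptible solves the same linear equation as $S_t$, namely $dX_t = -\bar\beta_t I_t X_t\,dt$. Hence $Q_t = S_t/S_0$, and the planner's objective becomes $C(\bar\beta,\bar\beta) = \int_0^T c(\bar\beta_t)\,\frac{S_t}{S_0}\,dt + r_I\left(1 - \frac{S_T}{S_0}\right)$. The sketch below only evaluates this objective for a given $\bar\beta$; it does not solve the planner's problem. The effort cost `c` is a hypothetical choice. The lockdown window in the usage example (start at day 23, 111 days at 0.14) is borrowed from the restricted-set comparison of Figure 6 purely for illustration.

```python
import numpy as np

BETA0, GAMMA, R_I, T, S0, I0 = 0.20, 0.1, 300.0, 360.0, 0.99, 0.01  # Table 1


def c(beta):                       # hypothetical effort cost, c(BETA0) = 0
    return BETA0 / beta - 1.0


def societal_cost(beta_bar):
    """J(beta_bar) = C(beta_bar, beta_bar), using Q_t = S_t / S_0 when beta = beta_bar."""
    beta_bar = np.asarray(beta_bar, dtype=float)
    n = beta_bar.size
    dt = T / n
    S, I = np.empty(n + 1), np.empty(n + 1)
    S[0], I[0] = S0, I0
    for k in range(n):             # forward-Euler SIR (1)
        flow = beta_bar[k] * S[k] * I[k] * dt
        S[k + 1] = S[k] - flow
        I[k + 1] = I[k] + flow - GAMMA * I[k] * dt
    Q = S / S0                     # probability of still being susceptible
    return dt * np.sum(c(beta_bar) * Q[:-1]) + R_I * (1.0 - Q[-1])


n = 720
t = np.arange(n) * (T / n)
no_effort = np.full(n, BETA0)
lockdown = np.where((t >= 23.0) & (t < 134.0), 0.14, BETA0)  # one lockdown period
print(f"J(no effort) = {societal_cost(no_effort):.1f}, J(lockdown) = {societal_cost(lockdown):.1f}")
```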

Figure 5: Comparison between two strategies: the Mean Field Nash equilibrium contact rate $\hat\beta \in \mathcal{B}$ (solid lines) and the optimal transmission rate for the society (dashed lines), using the parameters described in Table 1.
In our framework, the optimal control of the transmission rate from the society's point of view can be computed through classical optimization procedures (e.g., the Pontryagin principle). We refer to the literature cited above for details on the techniques used to find this optimal control, and only present the numerical results here. Note that several other classical procedures are available to compute the optimal control in this context (see for instance the forward-backward sweep method in [41, Chapter 4]). Nevertheless, in our framework, a slight modification of the gradient descent detailed in Subsection 2.2 works well.
First, observe that the societal optimal transmission rate imposes a larger effort at the beginning of the control period, and relaxes these constraints much more slowly. The total control period lasts 151 days for the societal optimum, compared with 96 days for the Mean Field Nash equilibrium, although societal control begins later than in the Nash case. Second, the optimal transmission rate chosen by the global planner amplifies the already encouraging effect of the Mean Field Nash equilibrium on the epidemic dynamics: the epidemic size $R_T$ represents only 55% of the total population.
The societal optimum yields an individual cost of about 200.25, while the Mean Field Nash equilibrium yields a cost of about 211.33. This gap mathematically quantifies the so-called "cost of anarchy", incurred when each individual decides on his/her own instead of a global planner taking decisions for the population as a whole. Of course, the societal optimal strategy is not a Mean Field Nash equilibrium: given this optimal transmission rate for the society, each individual is tempted to make less effort in reducing his/her own contact rate, which drives the global rate away towards the Mean Field equilibrium.
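The cost of anarchy can be read directly from the two reported costs. The short computation below uses only the two numbers quoted in this paragraph and expresses the gap both as an absolute difference and as a ratio; the ratio form (often called the "price of anarchy") is our choice of presentation, not a quantity reported in this section.

```python
# Reported individual costs (Figure 5, parameters of Table 1)
cost_nash = 211.33      # Mean Field Nash equilibrium
cost_societal = 200.25  # societal optimum

absolute_gap = cost_nash - cost_societal        # about 11.08
relative_gap = absolute_gap / cost_societal     # about 0.0553, i.e. roughly 5.5%
price_of_anarchy = cost_nash / cost_societal    # ratio form, about 1.055

print(f"gap = {absolute_gap:.2f}, relative = {relative_gap:.2%}, ratio = {price_of_anarchy:.4f}")
```

In other words, letting individuals decide alone costs each of them roughly 5.5% more than the centralized decision in this scenario.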

Figure 6: Comparison between two strategies: the societal optimum contact rate in $\mathcal{B}$ (solid lines) and the societal optimum contact rate in $\widetilde{\mathcal{B}}$, using the parameters described in Table 1. [Panels include the proportion of recovered.]
We also compare in Figure 6 the societal optimal transmission rate in $\mathcal{B}$ with the optimal one in $\widetilde{\mathcal{B}}$, where only one strong effort period is allowed. The societal optimum control period in $\widetilde{\mathcal{B}}$ starts at time $t = 23$ and lasts 111 days. The induced cost is around 2% higher than the one induced by the societal optimal transmission rate in $\mathcal{B}$. This quantifies the benefit of a progressive lockout strategy.
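To make the restricted planner problem concrete, here is a minimal brute-force sketch, not the authors' implementation, of a search for the societal optimum among strategies with a single effort period. It assumes an explicit Euler discretization of the SIR system, a hypothetical effort cost $c(\beta) = \beta_0/\beta - 1$ (convex, decreasing, zero at $\beta_0$), and illustrative parameter values that are not those of Table 1; `societal_cost` and `single_lockdown` are hypothetical names. Each individual pays the effort cost while still susceptible and $r_I$ if infected before $T$, and everyone follows $\bar\beta$, so the cost evaluated is $C(\bar\beta,\bar\beta)$.

```python
import math

# Hypothetical, illustrative parameters (NOT the values of Table 1)
BETA0, GAMMA = 0.25, 0.1          # so that R0 = BETA0 / GAMMA = 2.5
T, DT = 300.0, 0.2
N_STEPS = int(round(T / DT))
I0, R_I = 0.01, 300.0


def effort_cost(beta):
    # Hypothetical effort cost: convex, decreasing, equal to 0 at BETA0
    return BETA0 / beta - 1.0


def societal_cost(beta_bar):
    """Approximate C(beta_bar, beta_bar) when everyone follows beta_bar (explicit Euler)."""
    s, i = 1.0 - I0, I0
    cum_force = 0.0               # running integral of beta_bar_t * I_t
    cost = 0.0
    for b in beta_bar:
        escape = math.exp(-cum_force)          # probability of still being susceptible
        cost += effort_cost(b) * escape * DT   # expected effort, paid only while susceptible
        cum_force += b * i * DT
        new_inf = b * s * i * DT
        s, i = s - new_inf, i + new_inf - GAMMA * i * DT
    return cost + R_I * (1.0 - math.exp(-cum_force))


def single_lockdown(start, length, level):
    """Hypothetical single-effort strategy: `level` on [start, start + length), BETA0 elsewhere."""
    return [level if start <= k * DT < start + length else BETA0 for k in range(N_STEPS)]


baseline = societal_cost([BETA0] * N_STEPS)    # no effort at all
candidates = [(baseline, None, 0, BETA0)]
for start in range(0, 100, 10):
    for length in range(10, 200, 20):
        for level in (0.10, 0.15, 0.20):
            strategy = single_lockdown(start, length, level)
            candidates.append((societal_cost(strategy), start, length, level))

best = min(candidates, key=lambda cand: cand[0])
print(f"no-effort cost: {baseline:.2f}")
print(f"best single lockdown (cost, start, length, level): {best}")
```

A coarse grid keeps the sketch readable; a gradient-based method such as the one of Subsection 2.2 would be the natural choice for finer searches.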
Finally, Figure 7 provides a closer look at the four aforementioned strategies on the time interval [10,150], in order to highlight the control (lockdown) and control-less (lockout) periods. One can observe that for both Mean Field Nash equilibrium strategies, the control period starts and ends earlier than for the societal optimum. Indeed, on the one hand, individuals engage in preventive measures by decreasing their interactions earlier than a global planner would recommend, due to fear of the infection spreading. On the other hand, they release their efforts just after the peak of infection, whereas a global planner would recommend maintaining a relatively high level of effort in order to avoid further spreading of the virus. Therefore, while the Nash equilibrium of individuals decreases the peak of infection through premature efforts, the socially optimal strategy yields a more rapid decrease in the proportion of infected after the peak, by maintaining intense efforts. Nevertheless, we recall that a global planner should also take into account the possible saturation of the healthcare system, which would lead to a lower epidemic peak under the optimal societal transmission rate.

Figure 7: Comparison between the transmission rate and the dynamic proportion of infected on the time interval [10,150], for the four aforementioned strategies: the Mean Field Nash equilibrium in $\mathcal{B}$ (solid magenta lines) and $\widetilde{\mathcal{B}}$ (dashed red lines), and the societal optimum in $\mathcal{B}$ (dotted green lines) and $\widetilde{\mathcal{B}}$ (dash-dot blue lines), using the parameters described in Table 1.

Sensitivity to parameters
The parameters used in the previous numerical experiments are provided in Table 1 above. Nevertheless, as the current medical literature still shows considerable uncertainty concerning the numerical values for those parameters, we tested the sensitivity of our findings with respect to the choice of the main parameters of the model. The corresponding figures are provided in Appendix A for better readability.
(i) Figure 9 provides the Nash equilibrium and the epidemic dynamics for three different values of $r_I$, which describes the relative weight of the two parts of the cost. As expected, the more costly the infection is for an individual, the more effort he/she makes to limit his/her social interactions, in order to decrease his/her probability of infection. In particular, reducing the sanitary cost $r_I$ from 350 to 250 implies a 20% increase in the level of the epidemic peak.
(ii) Figure 11 provides a similar study for three different values of the basic reproduction number $R_0$. For higher $R_0$, individuals make more effort to reduce their social interactions. Bringing $R_0$ from 2.5 to 1.8 reduces the epidemic peak by 40% and reduces the size $R_T$ of the epidemic from 70% to around 60%. This result is understandable: the higher $R_0$, the higher the probability of being infected without effort. Each individual limits his/her social interactions in order to decrease the probability of being infected, and thus diminishes the health impact of the epidemic.
(iii) Finally, Figure 10 studies the sensitivity of our findings with respect to the initial proportion $I_0$ of infected at time 0. A higher $I_0$ induces an earlier start of the control period, together with a more intense control. At the terminal date, the total proportion of susceptible individuals remains similar. This implies that the long-term effects of a late detection of the epidemic can be compensated by a stronger isolation equilibrium policy. Once again, we omit here the possible negative outcomes induced by the saturation of the health care system.

The SEIR model and application to COVID-19
It should be noted that COVID-19 is characterized by a relatively long latency phase (as well as many other complex dynamics). To account for this latent phase, during which infected individuals are not yet contagious, we can extend our reasoning to an SEIR model, where the class E represents the individuals who are infected but not yet infectious. The dynamics of the SEIR model are described by the following system:

$$
\begin{cases}
\dfrac{dS_t}{dt} = -\bar\beta_t S_t I_t,\\[4pt]
\dfrac{dE_t}{dt} = \bar\beta_t S_t I_t - \alpha E_t,\\[4pt]
\dfrac{dI_t}{dt} = \alpha E_t - \gamma I_t,\\[4pt]
\dfrac{dR_t}{dt} = \gamma I_t,
\end{cases}
$$

where $\bar\beta$ still represents the societal transmission rate, and α > 0 is a parameter specific to the SEIR model, representing the rate at which an exposed person becomes infectious. The average incubation period is therefore given by 1/α. The computations provided in Section 5 still hold for the SEIR model. Therefore, the application of the numerical scheme described in Section 2.2 is straightforward, and only the numerical results are presented here. The parameters used for the numerical simulations are those provided in [8,21] and are listed in Table 2; the other parameters are those given in Table 1.

The numerical results are provided in Figure 8 and have the same features as for the SIR model. More precisely, the proportion of the population spared by the virus goes from less than 20% in the case without effort to 40% at equilibrium. Moreover, the infection peak, occurring around t = 50 days, is three times lower at equilibrium. This would limit, and may even prevent, the saturation of healthcare facilities, and thus reduce the mortality rate of the virus. However, as a counterpart, the epidemic lasts longer, as the proportions of exposed and infected individuals decrease more slowly after the epidemic peak. Finally, as for the SIR model, there is a cost of anarchy. In particular, the societal optimal transmission rate requires a greater and longer-lasting effort, even if it starts a little later. The optimal transmission rate at the societal level thus reduces the final proportion of recovered individuals (the epidemic size) from 60% to 55%. However, since the effort begins later, the infection peak is higher than for the Nash equilibrium.
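The SEIR dynamics described above can be integrated with a simple explicit Euler scheme. The sketch below is illustrative only: the parameter values are hypothetical placeholders (they are not the values of Table 2), the time-dependent rate $\bar\beta$ is passed as a Python function, and `simulate_seir` is a hypothetical helper name.

```python
def simulate_seir(beta_bar, alpha, gamma, e0, i0, t_final, dt):
    """Explicit Euler scheme for the SEIR system; beta_bar is a function of time.

    Returns a list of (t, S, E, I, R) tuples; the proportions sum to 1 up to rounding.
    """
    s, e, i, r = 1.0 - e0 - i0, e0, i0, 0.0
    history = [(0.0, s, e, i, r)]
    for k in range(int(round(t_final / dt))):
        t = k * dt
        new_exposed = beta_bar(t) * s * i * dt   # S -> E (contact with infectious)
        new_infectious = alpha * e * dt          # E -> I (end of latency, mean 1/alpha)
        new_recovered = gamma * i * dt           # I -> R
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        r += new_recovered
        history.append(((k + 1) * dt, s, e, i, r))
    return history


# Hypothetical parameters for illustration only (NOT the values of Table 2)
BETA0, ALPHA, GAMMA = 0.3, 0.2, 0.1

no_effort = simulate_seir(lambda t: BETA0, ALPHA, GAMMA, e0=0.01, i0=0.0, t_final=365.0, dt=0.1)
# A hypothetical contact-rate reduction between days 20 and 120
with_effort = simulate_seir(lambda t: 0.5 * BETA0 if 20.0 <= t < 120.0 else BETA0,
                            ALPHA, GAMMA, e0=0.01, i0=0.0, t_final=365.0, dt=0.1)

for name, hist in (("no effort", no_effort), ("with effort", with_effort)):
    peak_t, _, _, peak_i, _ = max(hist, key=lambda row: row[3])
    print(f"{name}: infected peak {peak_i:.3f} at t = {peak_t:.1f}, final R = {hist[-1][4]:.3f}")
```

Because every outflow of one compartment is the inflow of the next, the scheme conserves $S+E+I+R$ exactly (up to floating-point rounding), which is a cheap consistency check.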

Figure 8: Transmission rate and induced evolution of the Susceptible, Exposed, Infected and Recovered classes, using the parameters described in Table 2. Solid lines represent the transmission rate and the epidemic evolution at the Mean Field Nash equilibrium, while dashed lines represent the societal optimum. These two strategies are compared with the epidemic dynamics under the constant transmission rate $\beta_0$ (dotted lines).

In this section, we detail the computations and proofs when the epidemic is modeled by a SIR model. Nevertheless, these computations still hold if we consider an SEIR model instead, as mentioned in Section 4. More precisely, since an individual is considered infected as soon as he/she enters class E, the probability of being infected in the considered time interval [0, T] remains unchanged. In particular, the cost satisfies the same formula (27) as for the SIR model, as well as the gradient formula (30).

Probability of infection
We denote by $(S^{\bar\beta}_t, I^{\bar\beta}_t, R^{\bar\beta}_t)$ the solution at time t ≥ 0 of system (1) with contact rate $\bar\beta$. For all t ∈ [0, +∞), we denote by $\varphi^{\bar\beta}_t(\beta)$ the probability that infection occurs before time t for an individual choosing his/her own contact rate β, while the epidemic evolves according to the population's transmission rate $\bar\beta$. Note that $\varphi^{\bar\beta}(\beta)$ is redundant with the notation $\mathbb{P}^\beta$, but emphasizes the dependence on $\bar\beta$ of the distribution function of the random infection time τ.
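Lemma 5.1. The probability of being infected before time t ≥ 0, for an individual choosing a contact rate β ∈ B, when the proportion of infected is $I^{\bar\beta}$, is equal to:

$$\varphi^{\bar\beta}_t(\beta) = 1 - \exp\Big(-\int_0^t \beta_s I^{\bar\beta}_s\, ds\Big).$$

To compute this probability, we follow [36,37]. The Markov chain of an individual who chooses a contact rate β ∈ B with infected individuals, and whose state at time t ∈ [0, T] is denoted by $M_t$, is described in terms of the following passage probabilities:

$$\mathbb{P}\big(M_{t+\Delta t} = I \,\big|\, M_t = S\big) = \beta_t I^{\bar\beta}_t \Delta t + o(\Delta t), \qquad \mathbb{P}\big(M_{t+\Delta t} = R \,\big|\, M_t = I\big) = \gamma \Delta t + o(\Delta t).$$

The probability of being infected before time t + Δt can be written as follows:

$$\varphi^{\bar\beta}_{t+\Delta t}(\beta) = \varphi^{\bar\beta}_t(\beta) + \big(1 - \varphi^{\bar\beta}_t(\beta)\big)\,\beta_t I^{\bar\beta}_t \Delta t + o(\Delta t),$$

so that $\frac{d}{dt}\varphi^{\bar\beta}_t(\beta) = \beta_t I^{\bar\beta}_t\big(1 - \varphi^{\bar\beta}_t(\beta)\big)$ with $\varphi^{\bar\beta}_0(\beta) = 0$, whose unique solution is the formula above.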
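As a numerical sanity check of the lemma's exponential formula, the sketch below compares $1 - \exp(-\int_0^t \beta_s I^{\bar\beta}_s ds)$ with a direct Euler integration of $\frac{d}{dt}\varphi_t = \beta_t I^{\bar\beta}_t (1-\varphi_t)$, $\varphi_0 = 0$. All numerical values (population rate 0.25, γ = 0.1, $I_0$ = 0.01, individual rate 0.15) are hypothetical, and `sir_infected` and `infection_probability` are hypothetical helper names.

```python
import math


def sir_infected(beta_bar, gamma, i0, t_final, dt):
    """Infected proportion I_t on a uniform grid for the SIR system (explicit Euler)."""
    s, i = 1.0 - i0, i0
    path = []
    for k in range(int(round(t_final / dt))):
        path.append(i)
        new_inf = beta_bar(k * dt) * s * i * dt
        s, i = s - new_inf, i + new_inf - gamma * i * dt
    return path


def infection_probability(beta, infected, dt):
    """phi_t = 1 - exp(-int_0^t beta_s I_s ds), with a left Riemann sum on the grid."""
    cum, phi = 0.0, []
    for k, i in enumerate(infected):
        phi.append(1.0 - math.exp(-cum))
        cum += beta(k * dt) * i * dt
    return phi


def own_beta(t):
    # The individual's own (hypothetical) reduced contact rate
    return 0.15


DT = 0.01
# Population follows a constant (hypothetical) rate 0.25; gamma = 0.1, I_0 = 0.01
I_path = sir_infected(lambda t: 0.25, gamma=0.1, i0=0.01, t_final=200.0, dt=DT)
phi_closed = infection_probability(own_beta, I_path, DT)

# Cross-check with an Euler scheme for d(phi)/dt = beta_t I_t (1 - phi_t), phi_0 = 0
phi_ode, p = [], 0.0
for k, i in enumerate(I_path):
    phi_ode.append(p)
    p += own_beta(k * DT) * i * (1.0 - p) * DT

print(f"P(infected before T) = {phi_closed[-1]:.4f}")
print(f"max |closed form - ODE| = {max(abs(a - b) for a, b in zip(phi_closed, phi_ode)):.2e}")
```

The two curves differ only by the $O(\Delta t)$ discretization error of the Euler scheme.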

Computation of the cost
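Recall that, given a finite time horizon T, the expected cost of an individual is defined by (5). In other words, we have:

$$C(\beta,\bar\beta) = \mathbb{E}\Big[\int_0^{\tau\wedge T} c(\beta_t)\,dt + r_I\,\mathbf{1}_{\tau \le T}\Big].$$

Since the cumulative distribution function of the random variable τ at time t corresponds to the individual's probability of being infected before time t, which is denoted by $\varphi^{\bar\beta}_t(\beta)$, we obtain:

$$C(\beta,\bar\beta) = \int_0^T c(\beta_t)\big(1 - \varphi^{\bar\beta}_t(\beta)\big)\,dt + r_I\,\varphi^{\bar\beta}_T(\beta).$$

Moreover, using Equation (20), we can also write:

$$C(\beta,\bar\beta) = \int_0^T c(\beta_t)\,e^{-\int_0^t \beta_s I^{\bar\beta}_s ds}\,dt + r_I\Big(1 - e^{-\int_0^T \beta_s I^{\bar\beta}_s ds}\Big).$$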
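The equivalence between the expectation definition of the cost and its integral representation can be checked by simulation. The sketch below assumes a hypothetical effort cost $c(\beta) = \beta_0/\beta - 1$, illustrative parameters (not those of Table 1), a constant population rate $\bar\beta = \beta_0$, and a left-rectangle time discretization. The infection time τ is sampled by inverse transform: infection occurs once the cumulative force of infection $\int_0^t \beta_s I^{\bar\beta}_s ds$ exceeds an Exp(1) threshold.

```python
import bisect
import math
import random

# Hypothetical, illustrative setting (NOT Table 1)
BETA0, GAMMA, I0, R_I = 0.25, 0.1, 0.01, 300.0
T, DT = 200.0, 0.05
N = int(round(T / DT))


def effort_cost(beta):
    # Hypothetical effort cost: convex, decreasing, with c(BETA0) = 0
    return BETA0 / beta - 1.0


# Infected proportion I_t when the population uses beta_bar = BETA0 (explicit Euler)
s, i, infected = 1.0 - I0, I0, []
for _ in range(N):
    infected.append(i)
    new_inf = BETA0 * s * i * DT
    s, i = s - new_inf, i + new_inf - GAMMA * i * DT

# The individual's own (hypothetical) strategy: reduced contacts on [20, 80)
beta = [0.15 if 20.0 <= k * DT < 80.0 else BETA0 for k in range(N)]

# cum_force[k] ~ int_0^{t_k} beta_s I_s ds and cum_effort[k] ~ int_0^{t_k} c(beta_s) ds
cum_force, cum_effort = [0.0], [0.0]
for k in range(N):
    cum_force.append(cum_force[-1] + beta[k] * infected[k] * DT)
    cum_effort.append(cum_effort[-1] + effort_cost(beta[k]) * DT)

# Integral representation: int_0^T c(beta_t)(1 - phi_t) dt + r_I phi_T, 1 - phi_t = exp(-cum_force)
cost_formula = sum(effort_cost(beta[k]) * math.exp(-cum_force[k]) * DT for k in range(N))
cost_formula += R_I * (1.0 - math.exp(-cum_force[N]))

# Monte Carlo on the definition E[int_0^{tau ^ T} c(beta_t) dt + r_I 1{tau <= T}]
random.seed(0)
samples = []
for _ in range(20000):
    threshold = random.expovariate(1.0)
    j = bisect.bisect_left(cum_force, threshold)  # first grid index with cum_force >= threshold
    if j <= N:
        samples.append(cum_effort[j] + R_I)       # infected during step j - 1
    else:
        samples.append(cum_effort[N])             # still susceptible at T
cost_mc = sum(samples) / len(samples)

print(f"integral formula: {cost_formula:.2f}, Monte Carlo: {cost_mc:.2f}")
```

With this discretization the Monte Carlo estimator has exactly the integral formula as its mean, so the two numbers differ only by sampling noise.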

Gradient of the cost
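In order to obtain the gradient of this cost, we have to compute the Gateaux derivative $D_h C$ of C with respect to the first variable β in the direction h. Using Equation (26), we obtain:

$$D_h C(\beta,\bar\beta) = \int_0^T \Big[c'(\beta_t)\,h_t\,e^{-\int_0^t \beta_s I^{\bar\beta}_s ds} + c(\beta_t)\,D_h\Big(e^{-\int_0^t \beta_s I^{\bar\beta}_s ds}\Big)\Big]dt - r_I\,D_h\Big(e^{-\int_0^T \beta_s I^{\bar\beta}_s ds}\Big).$$

Some preliminary computations of the Gateaux derivatives allow us to write:

$$D_h\Big(e^{-\int_0^t \beta_s I^{\bar\beta}_s ds}\Big) = -e^{-\int_0^t \beta_s I^{\bar\beta}_s ds}\int_0^t h_s I^{\bar\beta}_s\,ds.$$

Replacing in (28) and exchanging the order of integration, we therefore obtain the relation $D_h C(\beta,\bar\beta) = \int_0^T \nabla C(\beta,\bar\beta)_s\,h_s\,ds$, where

$$\nabla C(\beta,\bar\beta)_s = c'(\beta_s)\,E_s + I^{\bar\beta}_s\Big(r_I E_T - \int_s^T c(\beta_t)\,E_t\,dt\Big), \qquad E_t := e^{-\int_0^t \beta_u I^{\bar\beta}_u du}.$$

Equation (30) is used for numerical simulation, in particular in the equilibrium flow descent described in Subsection 2.2.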
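A gradient formula of this kind is easy to get wrong, so it is worth checking against a central finite difference of the cost. The sketch below assumes a hypothetical effort cost $c(\beta) = \beta_0/\beta - 1$, illustrative parameters, and left Riemann sums. It integrates the grid values of $c'(\beta_s)E_s + I^{\bar\beta}_s\big(r_I E_T - \int_s^T c(\beta_t)E_t\,dt\big)$, with $E_t = e^{-\int_0^t \beta_u I^{\bar\beta}_u du}$, against a direction h and compares the result with $(C(\beta+\varepsilon h) - C(\beta-\varepsilon h))/(2\varepsilon)$. The expression is our own derivation from the integral form of the cost and is restated here so that the block is self-contained.

```python
import math

# Hypothetical, illustrative setting (NOT Table 1)
BETA0, GAMMA, I0, R_I = 0.25, 0.1, 0.01, 300.0
T, DT = 150.0, 0.1
N = int(round(T / DT))


def c(beta):
    # Hypothetical effort cost with c(BETA0) = 0
    return BETA0 / beta - 1.0


def c_prime(beta):
    # Derivative of the hypothetical effort cost
    return -BETA0 / beta ** 2


# Infected path under a fixed population rate beta_bar = BETA0 (explicit Euler)
s, i, infected = 1.0 - I0, I0, []
for _ in range(N):
    infected.append(i)
    new_inf = BETA0 * s * i * DT
    s, i = s - new_inf, i + new_inf - GAMMA * i * DT


def cost(beta):
    """Discretized C(beta, beta_bar) = sum_k c(beta_k) E_k dt + r_I (1 - E_N)."""
    cum, total = 0.0, 0.0
    for b, i_k in zip(beta, infected):
        total += c(b) * math.exp(-cum) * DT
        cum += b * i_k * DT
    return total + R_I * (1.0 - math.exp(-cum))


def gradient(beta):
    """Grid values of c'(beta_s) E_s + I_s (r_I E_T - int_s^T c(beta_t) E_t dt)."""
    escape, cum = [], 0.0
    for b, i_k in zip(beta, infected):
        escape.append(math.exp(-cum))     # E_k = exp(-sum_{j<k} beta_j I_j dt)
        cum += b * i_k * DT
    e_final = math.exp(-cum)
    grad, tail = [0.0] * N, 0.0           # tail = sum over later steps of c(beta_t) E_t dt
    for k in reversed(range(N)):
        grad[k] = c_prime(beta[k]) * escape[k] + infected[k] * (R_I * e_final - tail)
        tail += c(beta[k]) * escape[k] * DT
    return grad


base = [0.15 if 30.0 <= k * DT < 90.0 else BETA0 for k in range(N)]
direction = [1.0 if 40.0 <= k * DT < 60.0 else 0.0 for k in range(N)]
eps = 1e-5
plus = cost([b + eps * h for b, h in zip(base, direction)])
minus = cost([b - eps * h for b, h in zip(base, direction)])
fd = (plus - minus) / (2.0 * eps)                                      # central difference
analytic = sum(g * h * DT for g, h in zip(gradient(base), direction))  # <grad C, h>
print(f"finite difference: {fd:.6f}, gradient formula: {analytic:.6f}")
```

The backward accumulation of `tail` evaluates all the inner integrals $\int_s^T$ in a single pass, which is what makes such a gradient cheap enough for an iterative descent.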

Proof of the existence of an equilibrium
First, we recall that Equation (1) has a unique solution for any $\bar\beta \in \mathcal{B}$ (see for instance [12,18,53]). In particular, one can prove that if $\bar\beta_n$ is a sequence of functions in $\mathcal{B}$ converging in $L^1$ (thus also in $L^2$) to some $\bar\beta_\infty$, then the corresponding solution $(S^{\bar\beta_n}_t, I^{\bar\beta_n}_t, R^{\bar\beta_n}_t)$ of (1) converges (for any given t) to $(S^{\bar\beta_\infty}_t, I^{\bar\beta_\infty}_t, R^{\bar\beta_\infty}_t)$. Moreover, for any $\bar\beta \in \mathcal{B}$, the solution $(S^{\bar\beta}_t, I^{\bar\beta}_t, R^{\bar\beta}_t)$ is a Lipschitz function of time with Lipschitz constant $L_S \le \beta_0 + \gamma$.
We start with the proof of Lemma 2.1.
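Proof. To this end, we consider that $\bar\beta$ is fixed. Define the value function at time t (compare with formula (27)):

$$\Pi_t := \inf_{\beta \in \mathcal{B}} \mathbb{E}\Big[\int_t^{\tau\wedge T} c(\beta_s)\,ds + r_I\,\mathbf{1}_{\tau \le T} \,\Big|\, \tau > t\Big].$$

This corresponds to the optimal cost of an individual starting at time t in the susceptible class. By invoking standard arguments (see [9]), one can show that $\Pi_t$ is the (unique) solution of the following Hamilton-Jacobi-Bellman equation:

$$\frac{d\Pi_t}{dt} + \min_{y}\Big\{c(y) + y\,I^{\bar\beta}_t\,(r_I - \Pi_t)\Big\} = 0, \qquad \Pi_T = 0,$$

where the minimum is taken over the admissible values y of the contact rate. Under Assumption 2.1, the minimization in (34) is straightforward to analyze (and is in general related to the Fenchel transform of c(·)). Moreover, the value $y^*$ realizing the minimum is unique, and the mapping

$$x \mapsto \beta^{opt}\big(x, I^{\bar\beta}_t\big) = y^* = \arg\min_{y}\Big\{c(y) + y\,I^{\bar\beta}_t\,(r_I - x)\Big\}$$

is Lipschitz in both arguments, with a constant $L_H$ valid for all $x \in [0, r_I]$ and $I^{\bar\beta}_t \in [0, 1]$. Since $I^{\bar\beta}_t$ is Lipschitz (in particular continuous), we obtain that $\beta^{opt}(\Pi_t, I^{\bar\beta}_t)$ is continuous with respect to time, and thus $\Pi_t$ is a classical solution of (34). It is a Lipschitz function of time with Lipschitz constant $L_\Pi \le c(\beta_0) + r_I\beta_0$.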
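The HJB characterization also gives a direct way to compute an individual's best response to a fixed population rate $\bar\beta$: solve the equation backward in time from $\Pi_T = 0$, then read off $\beta^{opt}(\Pi_t, I^{\bar\beta}_t)$. The sketch below is not the equilibrium flow descent of Subsection 2.2. It assumes the hypothetical cost $c(y) = \beta_0/y - 1$ and admissible values $0 < y \le \beta_0$; for this choice, the minimizer of $c(y) + y\,i\,(r_I - x)$ is $\min\big(\beta_0, \sqrt{\beta_0/(i(r_I - x))}\big)$, or $\beta_0$ when $i(r_I - x) = 0$. It also assumes an explicit scheme stepping backward in time and illustrative parameters (not Table 1).

```python
import math

# Hypothetical, illustrative setting (NOT Table 1)
BETA0, GAMMA, I0, R_I = 0.25, 0.1, 0.01, 300.0
T, DT = 150.0, 0.1
N = int(round(T / DT))


def c(y):
    # Hypothetical effort cost: convex, decreasing, c(BETA0) = 0
    return BETA0 / y - 1.0


def beta_opt(x, i):
    """argmin over 0 < y <= BETA0 of c(y) + y * i * (r_I - x); closed form for this c."""
    a = i * (R_I - x)
    if a <= 0.0:
        return BETA0                      # no incentive to reduce contacts
    return min(BETA0, math.sqrt(BETA0 / a))


def hamiltonian(x, i):
    y = beta_opt(x, i)
    return c(y) + y * i * (R_I - x)


# Infected path under a given population rate beta_bar (here: constant BETA0)
s, i, infected = 1.0 - I0, I0, []
for _ in range(N):
    infected.append(i)
    new_inf = BETA0 * s * i * DT
    s, i = s - new_inf, i + new_inf - GAMMA * i * DT

# Explicit scheme, stepping backward from Pi_T = 0, for d(Pi)/dt + H(Pi_t, I_t) = 0
pi = [0.0] * (N + 1)
for k in reversed(range(N)):
    pi[k] = pi[k + 1] + DT * hamiltonian(pi[k + 1], infected[k])

# Best response to beta_bar: the feedback control evaluated along the value function
best_response = [beta_opt(pi[k + 1], infected[k]) for k in range(N)]

# Same scheme for the no-effort strategy y = BETA0, for comparison
v = 0.0
for k in reversed(range(N)):
    v = v + DT * (c(BETA0) + BETA0 * infected[k] * (R_I - v))
print(f"Pi_0 = {pi[0]:.2f}, no-effort cost = {v:.2f}")
```

Feeding `best_response` back as the new population rate (possibly with damping) gives a simple fixed-point iteration in the spirit of the mapping T used in the existence proof; this sketch does not guarantee that such an iteration converges.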
We now prove Theorem 2.2.
Proof. The proof applies Schauder's fixed-point theorem (see for example [60, Theorem 1.C] or [47, Theorem 18.20]) to the mapping T in order to show that it has a fixed point. To this end, we define the subset D ⊂ $\mathcal{B}$ consisting of the Lipschitz functions with constant $L_H(L_\Pi + \beta_0 + \gamma)$. The set D is convex and, by the Arzelà–Ascoli theorem, compact in $L^2(0, T)$. Previous considerations show that, for any $\bar\beta$, the corresponding optimal individual choice $\beta^{opt}(\Pi_t, I^{\bar\beta}_t)$ belongs to D.
For any sequence $\bar\beta_n$ in $\mathcal{B}$ converging in $L^2$ to some $\bar\beta_\infty$, we have that, on the one hand, $I^{\bar\beta_n}$ […]

Conclusion

In this work, we consider, within both SIR and SEIR models, the question of COVID-19 control measures as seen from the point of view of the individuals. We assume that each individual can choose to decrease his/her social interactions in order to slow down the spread of the virus. This means that the transmission rate, usually constant and exogenous in standard SIR models, is endogenous and time-dependent. Individuals can choose to lower their contact rate, which reduces the likelihood of being infected, but this comes at a cost. The impact of a single individual on the overall course of the epidemic is negligible, but the aggregate behavior of all individuals determines the epidemic evolution. This is formalized through a Mean Field approach.
We prove the existence of a Mean Field Nash equilibrium, i.e., a contact rate such that no individual has an interest in choosing a contact rate different from the overall societal contact rate. We then perform numerical simulations in order to find this equilibrium, which appears to be unique in the tested cases. The transmission rate induced by the equilibrium yields a clear improvement in the evolution of the epidemic, compared with the evolution under the initial transmission rate $\beta_0$, which corresponds to the transmission rate of the disease without any effort from the population. In particular, despite the selfishness of individuals (who only seek to minimize their own cost), their efforts make it possible to reduce the number of people affected by the disease by 25%. In addition, the infection peak is lower, which limits, and may even prevent, the saturation of the health care system, with a corresponding decrease in the mortality rate of the virus.
However, the Nash equilibrium is not the best that can be achieved when compared with a situation where individuals are 100% altruistic and only seek the good of society as a whole. To quantify this difference, we compute the societal optimum and compare the two strategies. We observe that the divergence between the two strategies occurs both before and after the peak of the epidemic. More precisely, the Nash equilibrium decreases the peak of infected through premature efforts, but the centralized epidemic control implies a more rapid decrease in the proportion of infected after the peak, by maintaining intense efforts. As a consequence, there is a cost of anarchy, meaning that the Nash equilibrium induces a higher cost than the societal optimum. While stopping early, at a point where the epidemic decreases but is not yet over, may be intuitively explained by the selfish nature of individuals, the fact that they collectively decide to begin efforts earlier than the societal optimum is a more intriguing feature. The latter fact is consistent with the experimental results documented in the remarkable work [27].
Of course, any model is but an imperfect description of reality, and our model has limitations too. First of all, there is no consensus in the medical literature on the parameters of the epidemic, and this uncertainty carries over to our model. Indeed, the recent literature on the COVID-19 epidemic is abundant but discordant on the dynamics of the epidemic, in particular on the parameters γ and $R_0 := \beta_0/\gamma$. Moreover, the appropriate estimation of the cost of effort, denoted by the function c, and of its counterpart $r_I$ is still the object of research, as these are not necessarily monetary or economic costs, but often costs related to health and social interactions. The modeling of these costs needs to be related to the concept of QALY/DALY, see [59,5,51].
To account for the long latency phase of COVID-19, we extend our reasoning from a SIR model to a SEIR model. This choice can also be discussed, since COVID-19 exhibits many other complex dynamics (e.g., a large number of asymptomatic carriers; see [44,55,19] for alternatives already used for coronavirus epidemics). Our model could also be extended to take into account other ways to control an epidemic, such as vaccination, even if no vaccine is known to date. Moreover, we assume that all individuals are rational and identical, and that they have perfect knowledge of the epidemic dynamics. The model could also be extended to add some heterogeneity between individuals, since the epidemic does not affect individuals in the same way and is more costly for those at risk. All these extensions are left for future work.
The study of the cost of anarchy raises the question of incentives: what levers can healthcare authorities use to bring the Nash equilibrium closer to the societal optimum? Indeed, many countries have introduced fines, or even prison sentences, for failure to comply with lockdown. Our model can provide insight in this direction, and invites a finer assessment of the costs related to the epidemic, but also to the lockdown. A model with centralized control also opens up the question of modeling the interaction (or lack thereof) between countries with different levels of epidemic dynamics.

A Additional numerical results
The figures listed and described in Section 3.5 are grouped together in this appendix. The parameters are those described in Table 1.