INFERRING KEY EPIDEMIOLOGICAL PARAMETERS AND TRANSMISSION DYNAMICS OF COVID-19 BASED ON A MODIFIED SEIR MODEL

This study aims to establish a model-based framework for inferring key transmission characteristics of the newly emerging outbreak of the coronavirus disease 2019 (COVID-19), especially the epidemic dynamics under quarantine conditions. Inspired by the shifting therapeutic levels and capacity at different stages of the COVID-19 pandemic, we propose a modified SEIR model with a two-phase removal rate of quarantined hosts undergoing continuously tunable transition. We employ the Markov Chain Monte Carlo (MCMC) approach for inferring and forecasting the epidemiological dynamics from the publicly available surveillance reports. The effectiveness of a short-term prediction is illustrated by adopting the data sets from 10 demographic regions including Chinese mainland and South Korea. In the surveillance period, the average R0 ranges from 1.74 to 3.28, and the median of the mean latent period does not exceed 10 days across the surveillance regions. Mathematics Subject Classification. 92D30. Received May 6, 2020. Accepted November 23, 2020.

to more than 16 million confirmed cases and 0.6 million deaths worldwide, which constitutes an unprecedented challenge to the global public health community. Under the increasingly severe situations of epidemic prevention and control in many countries over the world, the epidemiological modeling using surveillance data provides useful mathematical tools for identifying and predicting the disease outbreaks. Particularly important, among others, are the "Susceptible-Exposed-Infectious-Recovered" (SEIR) type models with consideration of the presymptomatic period of COVID-19. Wu et al. [15] employed the SEIR model to estimate the basic reproduction number of the outbreak in Wuhan, China. Prem et al. [8] considered a deterministic age-structured SEIR model with quarantine control included as a prevention strategy of contagion. Walker et al. [12] used an age-structured stochastic SEIR model to fit the key parameters of the observed transmission dynamics of COVID-19. Wan et al. [13] analyzed the spread dynamics and trend of COVID-19 in Wuhan using the data since the city's closure to February 12, 2020. A series of papers [9,10,14] developed the quarantined SEIR-type models by imposing restrictions on mobility of patients, with model parameters fitted to the data reported by China CDC using the Markov Chain Monte Carlo (MCMC) method. Under the SEIR framework, Lin et al. [4] adopted a similar compartmental model to capture individual behavioural reaction and governmental actions against COVID-19. Li et al. [3] introduced a networked SEIR-type model with migration data within China, and employed Bayesian inference to infer critical epidemiological characteristics.
Recent months have witnessed the role and importance of mathematical modeling in the decision-making of governments in response to the current pandemic. However, detecting and forecasting the spread of infectious diseases at early stages remain a challenging task due to the lack of model selection criteria or the high computational cost for parameter tuning. Here, we aim at providing a modified SEIR model with asymptomatic and quarantined classes for adaption to the newly emerged COVID-19, and propose a MCMC based method for inferring the key parameters and related transmission characteristics of the COVID-19 pandemic.

Methods
The data used to fit parameters in this paper are information for confirmed and removed cases in Chinese mainland (23 January to 12 February 2020), France (7 March to 12 April 2020), Germany (7 March to 1 April 2020), Iran (7 March to 31 March 2020), Italy (7 March to 12 April 2020), Japan (20 March to 12 April 2020), South Korea (16 February to 9 March 2020), Spain (7 March to 12 April 2020), United Kingdom (7 March to 12 April 2020), and the United States (7 March to 16 June 2020). Only the Chinese mainland and South Korea datasets are used in the main section to illustrate the results, but the other results are available in Table 2.
We first illustrate the SEIR-type dynamic model under quarantined measures that is based on the previous works [5,9,10,14], as shown in Figure 1. In particular, the infectious population is divided into four compartments: the unquarantined infectious individuals with symptoms (I), the undetected asymptomatic carriers (A), the quarantined infectious individuals with symptoms (I q ), and the detected asymptomatic carriers (A q ). Given the evidence showing that COVID-19 is contagious even during the incubation period of infection, we modify the previous SEIR models [5,9,10,14] as follows: where the notation is summarized in Table 1. Note that we exclude, without loss of generality, the quarantined susceptible individuals (S q ) from our SEIR model by assuming that they constitute a subpopulation isolated from the reaction mixture during the epidemic spread. Therefore we divide the total population into two parts: the quarantined susceptible population and an effective population. Here, we regard N e as the effective population size, which can be obtained using the fraction p of the total population during the epidemic outbreak, N e = pN total , where N total is the total population size. In our numerical simulations, the effective population size N e appears as a model parameter to be determined.
Specifically, we adopted a two-phase removal rate γ Iq (t) for the quarantined symptomatic population as follows: where z 1 and z 2 are parameters controlling both low and high levels of the removal rate, and a and b are parameters controlling the position and steepness of the transition between the low and high rates. An intuition behind this choice is as follows. The cure rate could be generally low at the early stage due to the novelty of the COVID-19 disease, gradually improved after a buffering period, and eventually saturated as the pandemic crisis unfolds. Our numerical results show that this heuristic removal rate significantly improves the reproduction of observed surveillance data and enhances the prediction of the COVID-19 outbreaks. We note that the prediction accuracy does not sensibly depend on the specific form of γ Iq (t), and other conceivable protocols (e.g., sigmoid function) also generate similar results.
To infer the key transmission characteristics of COVID-19 including the effective reproduction number, the asymptomatic ratio, the mean latent period, and the inflection points, we fit the parameters of model (2.2) using the surveillance data collected from China CDC (http://weekly.chinacdc.cn/news/TrackingtheEpidemic.htm) and Johns Hopkins University (https://github.com/CSSEGISandData/COVID-19). Specifically, denote byĨ q (t) andR 1 (t) the number of confirmed and removed cases of COVID-19 on the day t in the surveillance period, and the least squares loss function for optimization of our SEIR model is then written as follows: where T is the surveillance period available for inference of our model parameters,Î q (t; Θ),Â q (t; Θ) and R 1 (t; Θ) denotes the corresponding numerical solution to equation (2.2) during the surveillance period using the fourth-order Runge-Kutta method with the epidemic parameters and initial condition collectively denoted by Θ = [α, β, ε, η, γ I , γ A , z 1 , z 2 , a, b, µ, ρ, θ, ϕ, γ Aq , N e , E 0 , A 0 , I 0 , E q 0 , A q 0 , R 20 ]. See Table 1 for more detailed description. To solve the high-dimensional optimization problem in an effective manner, we perform the MCMC sampling [9,10] of the parameter space using the MATLAB toolbox MCMCSTAT [2], as shown in Algorithm 1. In our optimization procedure, we randomly generate 20 sets of initial values in a reasonable range around the parameters, and the proposed algorithm selects the best initial value leading to the least sum of squared deviations from the surveillance data. We further numerically examine the initial-value sensitivity (IVS) of our model predictions, finding that a considerable fraction of parameters share almost the same optimal initial values (which we called "generic initial estimation", see Table 4) across different demographic regions, possibly and partially due to the common epidemiological properties of the COVID-19 dynamics. This robust IVS of our model significantly alleviates the difficulty of the inference problem and renders the estimation process of (non-generic) parameters computationally tractable. It is noteworthy that our results derived from local optima of a loss function subject to specific initial values of the (non-generic) parameters only provide suboptimal, rather than an optimal, predictions, but numerical experiments demonstrate the potential of our method in forecasting of the COVID-19 outbreaks in practice.

Results
We first apply our method to predict the inflection time with the largest single-day number of confirmed cases during the outbreaks of COVID-19 in Chinese mainland and South Korea, respectively. Considering that the asymptomatic ratio of confirmed cases is very small in the surveillance period [1], we set χ = 0 when fitting with data reported by China CDC. But in other regions, we set χ = 1. Figures 2 and 3 plot the simulated populations of quarantined infectious and quarantined removed individuals using the inferred parameters from the surveillance data (also see Tab. 2). Here we set k max = 1000.
With the estimated parametersΘ and simulated time course of epidemics, one can readily obtain several key transmission characteristics of COVID-19, such as the inflection time τ infl = arg max t {I q (t;Θ) + χA q (t;Θ)}, the mean latent periodτ inc = 1/α, the asymptomatic ratio p A (t) = Aq(t)+A(t) Iq(t)+I(t)+Aq(t)+A(t) and the effective reproduction number defined as follows (see Appendix for detailed derivation): (3.1) In the surveillance period, the average R 0 ranges from 1.74 to 3.28, and the mean latent period is relatively short in the Chinese mainland, Japan, and South Korea, while the average does not exceed ten days across the surveillance regions. Moreover, the real values of the inflection point and the number of existing confirmed cases at the inflection point fall within the confidence intervals. See Tables 2 for the detailed estimation results using the surveillance data across different demographic regions. To explore the proportion of cases with asymptomatic infections in the surveillance period, we create the box plots for the data of asymptomatic ratio p A (t) at t = 7, see Figure 4. These plots show that the asymptomatic ratio of the United States exhibits a more obvious increment to the other ones, and the standard deviation of the prediction to the asymptomatic ratio of Japan is relatively high.

Discussion
In this paper, we proposed a modified SEIR model with both quarantined and asymptotic populations as well as infection by exposed individuals for fitting the observed transmission dynamics of COVID-19, thus enabling an accurate short-term prediction of the infectious disease and providing implications for outbreak control. Several important epidemiological parameters represented by the effective reproduction numbers R 0 across different demographic regions were also calculated, which are close to the result obtained in [16]. A recent review [6] reported that an average estimation for R 0 is around 3.28, which is consistent with our findings (see Tab. 2).
Our prediction of the asymptomatic ratio of COVID-19 also agrees basically with the medical evidence [7] based estimates of the pandemic. From Table 3, we obtain the mean value ofβ, that is, the transmission epidemic under control. In contrast to these two regions, the detection rate of asymptomatic carriersμ was much low, and the epidemic situation of the United States seems to have been more severe.
On the other hand, since the early recovery rate in these high-prevalence areas is generally low because of the long incubation period and high incidence of COVID-19 within aged population, it is therefore imminent to continuously strengthen prevention and control measures and improve medical standards. According to the current result, strengthening the prevention and control measures can slow the development of the epidemic.
To generalize this work, we need more time-dependent parameters instead of constant ones. In the present situation, possibly unsteady infection prevention and control measures may induce that our prediction lacks precision. Besides, due to the local convergence of the algorithm, we need to find the appropriate initial estimation to accelerate the process of fitting the reported data. In future studies, it would be important to design a more effective algorithm for this kind of high-dimensional optimization problem.
Appendix A. Derivation of equation (3.1) In this appendix we provide detailed derivation of the basic production number R 0 using the next-generation matrix approach [11]. At the disease-free steady state S = N e , we consider the following closed linearized infection subsystem of the model (2.1): If we set x = (E, I, A) , where the symbol denotes transpose, the subsystem can be rewritten in a vector form as follows: Then from the next-generation matrix given by we have the basic reproduction number R 0 as follows: where · denotes the matrix spectral norm.
Appendix B. Detailed description of Algorithm 1