ANALYSIS OF THE AGE-STRUCTURED EPIDEMIOLOGICAL CHARACTERISTICS OF SARS-COV-2 TRANSMISSION IN MAINLAND CHINA: AN AGGREGATED APPROACH

The novel coronavirus (SARS-COV-2) has raged in mainland China for nearly three months, posing a huge threat to people's health and economic development. Based on the cumulative numbers of confirmed cases and deaths from SARS-COV-2 infection announced by the National Health Commission of China, we divided the human population into four subgroups, namely the adolescent group (0–19 yr old), the youth group (20–49 yr old), the middle-aged group (50–74 yr old) and the elderly group (75 yr old and over), and proposed a discrete age-structured SEIHRQ model of SARS-COV-2 transmission. We used contact matrices to describe the contact heterogeneities and correlations among different age groups. Using the Markov chain Monte Carlo (MCMC) algorithm, we identified the parameters of the model and fitted the confirmed cases from January 24th to March 31st. Further analysis showed that before January 28th (95% CI [Jan. 25th, Jan. 31st]) the effective reproduction number was greater than 1, and after that day it was less than 1. Moreover, we estimated that the peak numbers of infections were 66 (95% CI [65,67]) for adolescents, 3996 (95% CI [3957,4036]) for the young group, 14714 (95% CI [14692,14735]) for the middle-aged group and 297 (95% CI [295,300]) for elderly people; the proportions of the final sizes of SARS-COV-2 infection accounted for less than 90% for each group. We found that under the current restrictive control strategies, the most severe and high-risk group was middle-aged people aged 50–74 yr; without any prevention, the most severe and high-risk group would have been young adults aged 20–49 yr.

Mathematics Subject Classification. 92D30, 93C15.
Received May 6, 2020. Accepted July 31, 2020.

1. Background
The novel coronavirus pneumonia (SARS-COV-2) is caused by a coronavirus with a linear single-stranded RNA genome, which can lead to the Severe Acute Respiratory Syndrome (SARS), the


Data description and collection
All daily cumulative documented cases of SARS-COV-2 infection from the National Health Commission of the People's Republic of China were collected for the modelling study and data fitting [20]. Because the diagnostic criteria changed and clinically diagnosed cases were added to the cumulative documented cases on February 12th, 2020, the data used for epidemic curve fitting were collected from January 24th to March 31st [17]. Additionally, to obtain relatively reliable data, the sudden increment on February 12th was allocated to the days of the preceding week in proportion to the original daily increments of documented cases on those days.
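The proportional reallocation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `redistribute_jump` and `cumulative` are hypothetical, and we assume the quantity being spread is the excess reported on February 12th, supplied as an input. The jump day's own increment should be reduced by the same amount so that the cumulative total is unchanged.

```python
def redistribute_jump(prior_increments, jump):
    """Spread `jump` over the preceding days in proportion to their daily increments.

    prior_increments: new documented cases on each day before the jump day
    jump: the number of cases to reallocate (assumed: the excess reported on Feb 12)
    """
    total = sum(prior_increments)
    if total <= 0:
        raise ValueError("preceding increments must have a positive sum")
    # Each day receives the share inc / total of the reallocated cases
    return [inc + jump * inc / total for inc in prior_increments]


def cumulative(start, increments):
    """Rebuild a cumulative series from a starting count and daily increments."""
    series, running = [], start
    for inc in increments:
        running += inc
        series.append(running)
    return series


# Hypothetical numbers, for illustration only
adjusted = redistribute_jump([1, 2, 3, 4], 10)   # [2.0, 4.0, 6.0, 8.0]
print(cumulative(100, adjusted))                 # [102.0, 106.0, 112.0, 120.0]
```

Because every day receives the same fraction of its own increment, the relative shape of the preceding week's curve is preserved while the artificial spike disappears.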

Age-structured model
To construct the SARS-COV-2 transmission model, we divided the population into four age groups: 0-19 yr old, 20-49 yr old, 50-74 yr old, and 75 yr old and over, which characterize the heterogeneity of SARS-COV-2 spread among adolescents, young adults, middle-aged adults and elderly individuals. Let $N_k$ ($k = 1, 2, 3, 4$) represent the total populations aged 0-19 yr, 20-49 yr, 50-74 yr, and 75 yr and over, respectively. Following a generalized SEIR modeling approach, we divided the total population $N_k(t)$ of age group $k$ into six states: susceptible, latent, infectious, hospitalized, recovered, and unsusceptible (self-quarantined), represented by $S_k(t)$, $E_k(t)$, $I_k(t)$, $H_k(t)$, $R_k(t)$, and $S^Q_k(t)$, respectively. A susceptible individual in age group $k$ makes $c_{kj}$ contacts with individuals in age group $j$, a fraction $I_j/N_j$ of whom are infectious, so that the number of contacts with infectious individuals in group $j$ is $c_{kj} I_j / N_j$. Susceptible individuals in age group $k$ infected by the infectious individuals of all age groups $j$ become latent individuals $E_k$ with transmission probability $\beta_k$ per contact. A proportion of susceptible individuals may choose to self-quarantine (at rate $\alpha$) to reduce the risk of infection. Unsusceptible individuals may lose their awareness of self-protection and become susceptible again at rate $\alpha_1$. Latent individuals become infectious after an incubation period of $1/q$ days. Infectious individuals, comprising symptomatic and asymptomatic individuals, either recover and move to $R_k$ after $1/\gamma$ days of treatment or are quarantined in hospitals at rate k. Hospitalized individuals either recover or die at rates $\gamma_1$ and $d$, respectively. Hospitalized individuals are completely isolated, which cuts off all transmission routes to susceptibles.
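The differential equations of model (2.1) are not preserved in this text, so the following is only a sketch of one reading of the verbal description above, not the authors' system. Two pieces are our assumptions: `delta` stands in for the hospitalization rate whose symbol is lost in the text, and a death compartment `D` is added so that cumulative deaths can be tracked. The function name `seihrq_rhs` is hypothetical.

```python
import numpy as np


def seihrq_rhs(state, beta, contact, alpha, alpha1, q, gamma, delta, gamma1, d):
    """Right-hand side of an assumed age-structured SEIHRQ system.

    state: array of shape (7, m) with rows S, SQ, E, I, H, R, D for m age groups
    beta: per-contact transmission probabilities beta_k, shape (m,)
    contact: contact matrix c_kj, shape (m, m)
    delta: hypothetical hospitalization rate (symbol lost in the source text), shape (m,)
    """
    S, SQ, E, I, H, R, D = state
    N = S + SQ + E + I + H + R                  # living population of each group
    force = beta * (contact @ (I / N))          # beta_k * sum_j c_kj * I_j / N_j
    new_infections = force * S
    dS = -new_infections - alpha * S + alpha1 * SQ
    dSQ = alpha * S - alpha1 * SQ               # self-quarantine and its relaxation
    dE = new_infections - q * E                 # mean latent period 1/q
    dI = q * E - (gamma + delta) * I            # recovery or hospitalization
    dH = delta * I - (gamma1 + d) * H           # hospitalized: recover or die
    dR = gamma * I + gamma1 * H
    dD = d * H                                  # cumulative deaths (added compartment)
    return np.array([dS, dSQ, dE, dI, dH, dR, dD])
```

The columns of the returned array sum to zero, since every outflow appears as an inflow elsewhere. To integrate, the state can be flattened and passed to an ODE solver such as SciPy's `solve_ivp`, reshaping it back to `(7, m)` inside a wrapper `fun(t, y)`.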
The transitions among the above six states of SARS-COV-2 infection are described by a system of differential equations, model (2.1). The biological meanings and values of the parameters of model (2.1) are given in Table 1, and Table 2 gives the initial values of model (2.1).
Table 3. Total population of each age group (unit: 1000).
The basic reproduction number, defined as the average number of secondary cases produced by one infected individual during its infectious period in a completely susceptible population, is an important quantity for determining whether an emerging disease will spread. Adopting the approach in [4,13], we calculate the basic reproduction number of SARS-COV-2 with age structure as
$$R_0 = \rho(F V^{-1}), \qquad (2.2)$$
where $F$ and $V$ are the new-infection and transition matrices of the infected compartments, $\rho$ is the spectral radius of the matrix $F V^{-1}$, $I_d$ represents a $4 \times 4$ identity matrix, and $0$ represents a $4 \times 4$ zero matrix. The values of $N_k$ were taken from Table 3. Generally, $R_0$ is used to estimate the severity of the onset of an outbreak. As time evolves, the proportions of susceptible and infectious individuals decrease while those of removed and quarantined individuals increase. The effective reproduction number $R_e$, by contrast, is the average number of secondary infections produced by a newly infected individual at time $t$. For model (2.1), $R_e$ evolves over time and has the following property: if $R_e < 1$, the SARS-COV-2 epidemic is self-limiting and is said to be under control owing to containment strategies. It is therefore an important quantity for quantifying the effects of control measures.
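The spectral-radius formula $R_0 = \rho(F V^{-1})$ can be evaluated numerically as below. This is a sketch under stated assumptions, not the authors' matrices, since $F$ and $V$ are not preserved here: we take the infected compartments as $(E_1, \dots, E_4, I_1, \dots, I_4)$, the transmission term $\beta_k S_k \sum_j c_{kj} I_j / N_j$, a latent exit rate $q$, and an infectious exit rate $\gamma + \delta_k$ with a hypothetical hospitalization rate $\delta_k$. The 4 × 4 identity and zero blocks then appear exactly as in the definitions above. The function names are hypothetical.

```python
import numpy as np


def spectral_radius(m):
    """Largest absolute eigenvalue of a square matrix."""
    return float(np.max(np.abs(np.linalg.eigvals(m))))


def next_generation_r0(beta, contact, s0, n, q, gamma, delta):
    """rho(F V^{-1}) with infected compartments ordered (E_1..E_m, I_1..I_m).

    Assumes the transmission term beta_k * S_k * sum_j c_kj * I_j / N_j,
    latent exit rate q and infectious exit rate gamma + delta_k (delta has shape (m,)).
    """
    beta, s0, n = (np.asarray(x, dtype=float) for x in (beta, s0, n))
    contact = np.asarray(contact, dtype=float)
    delta = np.asarray(delta, dtype=float)
    m = len(n)
    eye, zero = np.eye(m), np.zeros((m, m))
    # F: new infections entering E_k, generated by infectious I_j
    f_block = (beta * s0)[:, None] * contact / n[None, :]
    F = np.block([[zero, f_block], [zero, zero]])
    # V: outflow from E_k at rate q (into I_k) and from I_k at rate gamma + delta_k
    V = np.block([[q * eye, zero], [-q * eye, np.diag(gamma + delta)]])
    return spectral_radius(F @ np.linalg.inv(V))
```

With this block structure, $F V^{-1}$ has a single non-zero block row, so $R_0$ reduces to the spectral radius of the 4 × 4 matrix $\big(\beta_k S_k^0 c_{kj} / N_j\big)\,\mathrm{diag}\big(1/(\gamma + \delta_j)\big)$. Passing the current susceptibles $S_k(t)$ instead of the disease-free values $S_k^0$ gives a time-varying analogue that is often used as $R_e(t)$; whether this matches the paper's $R_e$ formula is an assumption, since that formula is not preserved.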

Parameter estimation
According to the Chinese Population Statistic Yearbook [18], the population of each age group is listed in Table 3. Based on published descriptions of SARS-COV-2 infection, the incubation period varies from 4 to 7 days; in this work, we chose an average incubation period of 5.2 days [5]. Moreover, the duration of SARS-COV-2 infection varies from 5 to 10 days, so we assumed an average infectious period of 7 days.
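The text does not state the corresponding rates explicitly. Assuming, as is usual for compartmental models, exponentially distributed sojourn times, with the 5.2-day incubation period equal to $1/q$ and the 7-day infectious period equal to $1/\gamma$, the daily rates would be:

```latex
q = \frac{1}{5.2\ \text{days}} \approx 0.192\ \text{day}^{-1},
\qquad
\gamma = \frac{1}{7\ \text{days}} \approx 0.143\ \text{day}^{-1}.
```

Under the same assumption, a fraction $e^{-1} \approx 0.37$ of latent individuals would still be latent 5.2 days after infection, so the mean, not the maximum, incubation time is what $1/q$ encodes.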
For age-structured epidemic models, the contact matrix plays a vital role in describing age heterogeneities. However, obtaining such matrices is rather complicated. Fortunately, contact matrices in the household mode and in the general mode for 152 countries are available in [9]. From [9], we acquired a high-dimensional 16 × 16 contact matrix with five-year age bands. In this paper, we mainly focus on the epidemiological characteristics of SARS-COV-2 transmission in the adolescent, young adult, middle-aged and elderly age groups. Hence, we needed to reduce the 16 × 16 contact matrix to a 4 × 4 contact matrix by an appropriate approach, which can be achieved by the method in the appendix of [8]. The calculation details are given in Appendix B. Using this method, we obtained the contact matrices between different age groups in the household mode and the general mode in mainland China (see Fig. 1). There are significant differences between the two exposure patterns. In the household mode, parents were more likely to interact with their children than with colleagues. In the general mode, people had more contacts with peers of their own age than with other age groups. Since a series of strict control measures such as city closure, household quarantine and traffic restrictions came into effect on January 23 in mainland China, the contact pattern after that date was approximated by the household-mode contact pattern.

Simulation methods
Let $\hat{C}_j(1) = C_j(1)$, and let the errors $\varepsilon_j$ be assumed to follow the normal distribution $N(0, \sigma_j^2)$ ($j = I, D$), with $\sigma_I = 1000$ and $\sigma_D = 100$, which keep the identified parameters within a feasible domain. Denoting the parameter vector by $Q$, we defined the combined likelihood function associated with the independent cumulative documented data sets $C = \{C_I(t), C_D(t)\}_{t=0}^{67}$ as
$$L(C \mid Q) = \prod_{j \in \{I, D\}} \prod_{t=0}^{67} \frac{1}{\sqrt{2\pi}\,\sigma_j} \exp\left(-\frac{\big(C_j(t) - \hat{C}_j(t; Q)\big)^2}{2\sigma_j^2}\right). \qquad (2.10)$$
Then the posterior distribution of the parameters satisfies $P(Q \mid C) \propto L(C \mid Q) P(Q)$, which reduces to $P(Q \mid C) \propto L(C \mid Q)$ under a uniform prior. Because we selected a symmetric proposal distribution in the Metropolis-Hastings algorithm [2], the proposal densities cancel, and a candidate $\tilde{Q}$ generated by a random walk is accepted with probability
$$\min\left\{1, \frac{P(\tilde{Q} \mid C)}{P(Q \mid C)}\right\} \qquad (2.11)$$
to update the identified parameters $Q$. We estimated the model parameters by maximizing the likelihood function (2.10) and comparing the model quantities defined in (2.6) and (2.7) with the data. The MCMC method with the M-H algorithm was run for 50 000 iterations, with the first 20 000 iterations discarded as burn-in. The 95% confidence intervals (95% CI) of the estimates were obtained by drawing 1000 simulation samples and computing their median and 95% CI.
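The estimation loop described above (Gaussian likelihood, symmetric random-walk proposal, burn-in, then median and 95% interval of 1000 draws) can be sketched as follows. This is a self-contained illustration, not the authors' implementation: the function names, the flat box prior and the toy straight-line data are hypothetical. In the paper's setting, `log_post` would sum two Gaussian terms, for cases ($\sigma_I = 1000$) and deaths ($\sigma_D = 100$), with predictions produced by solving model (2.1), and the chain would run for 50 000 iterations with 20 000 discarded.

```python
import math
import random
import statistics


def gaussian_log_likelihood(observed, predicted, sigma):
    """Log-likelihood of observations under independent N(0, sigma^2) errors."""
    norm = -math.log(sigma * math.sqrt(2.0 * math.pi))
    return sum(norm - (o - p) ** 2 / (2.0 * sigma ** 2)
               for o, p in zip(observed, predicted))


def random_walk_metropolis(log_post, theta0, step, n_iter, burn_in, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    theta = list(theta0)
    current = log_post(theta)
    chain = []
    for it in range(n_iter):
        cand = [t + rng.gauss(0.0, s) for t, s in zip(theta, step)]
        cand_lp = log_post(cand)
        # Accept with probability min(1, P(cand|C) / P(theta|C)); the symmetric
        # proposal densities cancel, so only the posterior ratio is needed.
        if cand_lp >= current or rng.random() < math.exp(cand_lp - current):
            theta, current = cand, cand_lp
        if it >= burn_in:                 # keep only post-burn-in states
            chain.append(list(theta))
    return chain


def summarize(values, n_draws=1000):
    """Median and 2.5%/97.5% percentiles of up to n_draws evenly spaced draws."""
    stride = max(1, len(values) // n_draws)
    draws = values[::stride][:n_draws]
    cuts = statistics.quantiles(draws, n=40)   # cut points at 2.5%, 5%, ..., 97.5%
    return statistics.median(draws), cuts[0], cuts[-1]


# Toy check with hypothetical data: recover the slope a in y = a * x.
xs = list(range(20))
ys = [2.0 * x for x in xs]


def log_post(theta):
    a = theta[0]
    if not 0.0 <= a <= 10.0:              # flat prior on a box (stand-in for P(Q))
        return -math.inf
    return gaussian_log_likelihood(ys, [a * x for x in xs], sigma=1.0)


chain = random_walk_metropolis(log_post, [1.0], [0.05], n_iter=5000, burn_in=1000)
median, lo, hi = summarize([c[0] for c in chain])
```

Working in log space avoids underflow of the product in (2.10), and a proposal outside the prior box simply gets log-posterior $-\infty$ and is always rejected.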

Simulation results
Since the Chinese government enforced a series of containment strategies, the contact patterns changed substantially. If no strategy had been implemented, people could work, stay at home, go to school or go out for leisure activities; this case is defined as the general mode, which consists of the work, household, school and leisure modes. If a series of containment measures is enacted, almost everyone is quarantined at home; this case is defined as the household mode.
We fitted the model to the cumulative numbers of confirmed cases reported by the National Health Commission of China from January 24th to March 31st [20] using the MCMC method. With the parameters estimated by the MCMC method with the M-H algorithm, the basic reproduction number was estimated to be 1.6811 (95% CI [1.6782,1.6840]). Figure 2 shows that the model fits the cumulative numbers of documented SARS-COV-2 cases and confirmed deaths well, with 95% confidence intervals. To check the validity of model (2.1), we compared its predictions with the cumulative reported cases and cumulative deaths from April 1st to 13th (Fig. 3). Finally, as shown in Figure 4, in the household mode, patients aged 50-74 yr accounted for 67.44% of confirmed SARS-COV-2 cases, patients aged 0-19 yr for less than 1%, and elderly patients aged 75 yr and over for nearly 3%.

Sensitivity analysis
The basic reproduction number plays a key role in determining the prevalence of a disease. In general, if the basic reproduction number is less than 1, the disease dies out; otherwise, an outbreak occurs. To calculate the basic reproduction number of SARS-COV-2 infection in mainland China, we used the parameters identified in the previous section (see Tabs. 1 and 3) and formula (2.2). As shown in Figure 5a, the effective reproduction number was a monotonically decreasing function of time $t$, and the effective reproduction number of SARS-COV-2 infection in mainland China on April 13th was about 0.8298. This indicates that the number of people infected with SARS-COV-2 was gradually decreasing and that the situation had taken a favorable turn. In addition, Table 4 shows that the final size of each age group in the household mode was significantly lower than in the general mode, which indicates that the current measures such as city closure and home isolation are very effective. Figure 3 and Table 4 show that when control measures are in place, the peak of the epidemic arrives earlier, and the peak value with control is much smaller than that without control. Without strict control measures, the peak value may increase by a factor of about 50,000 among people aged 0-19, about 25,000 among people aged 20-49, about 2,000 among people aged 50-74, and about 12,000 among people aged 75 or older. The control measures also advanced the peaks of infection: the peak was 60 days earlier for the 0-19 group, 50 days earlier for the 20-49 group, 64 days earlier for the 50-74 group, and 62 days earlier for the group aged 75 or older. In the absence of control measures, people in the 20-49 and 50-74 age groups were the most vulnerable. Under the current control strategies, individuals in the 50-74 age group had the greatest risk of infection compared with individuals in other age groups.
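The peak comparisons above (days by which the controlled peak comes earlier, and the factor between peak values) can be computed from two simulated daily series as sketched below. The helper names and the example series are hypothetical; in practice the inputs would be the infectious trajectories of one age group from model (2.1) under the general and household modes.

```python
def peak(series):
    """Return (day index, value) of the first maximum of a daily series."""
    day = max(range(len(series)), key=series.__getitem__)
    return day, series[day]


def compare_peaks(no_control, with_control):
    """Days the controlled peak comes earlier, and the ratio of peak values."""
    day_nc, value_nc = peak(no_control)
    day_c, value_c = peak(with_control)
    return day_nc - day_c, value_nc / value_c


# Hypothetical daily infectious counts for one age group
advance, factor = compare_peaks([0, 1, 4, 9, 20, 12], [0, 3, 2, 1, 0, 0])
print(advance)   # 3
```

A positive `advance` means the peak arrives earlier under control, which is the sense in which the text says the measures "advanced" the peaks.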

Discussion
In this paper, we established an age-structured SARS-COV-2 infection model based on compartmental theory. Age-structured heterogeneity was addressed through a contact matrix obtained from [9]. This approach not only reduces the number of parameters to be identified but also describes the correlations between different age groups. Although age-specific data on SARS-COV-2 infection in China were scarce, we successfully reproduced the cumulative confirmed cases with the solution of model (2.1). The basic reproduction number was estimated to be $R_0 = 1.6811$ with a 95% confidence interval from 1.6782 to 1.6840. This means that, on average, each infectious individual infected 1.6811 susceptible individuals. Generally, the disease grows exponentially if $R_0$ is greater than one, whereas it can be eradicated under certain control measures if $R_0$ is less than one. Indeed, although $R_0$ of SARS was estimated to be around 3 [1], the SARS outbreaks were successfully controlled by isolating infectious individuals and carefully implementing other control strategies. Facing SARS-COV-2 infection, the Chinese government adopted control strategies similar to those used against SARS in 2003. The effects of household isolation and traffic restrictions became increasingly evident over time. On January 28th, the effective reproduction number $R_e$ was about 1 and then decreased, while the value of $R_e$ tended to 1 until March 16th, which implies that such policies are effective in slowing down SARS-COV-2 infection in mainland China. It is worth noting that our estimated reproduction number was smaller than the values based on earlier cumulative data, but it is consistent with the estimate of 2.2 (95% CI [1.4,3.9]) from statistical analysis [3,5], whose interval contains our estimate.
Table 4 and Figure 4 show that under the current strategies, the duration of SARS-COV-2 infection has been shortened by almost four months in adolescents, by almost three months in the young group, by 100 days in the middle-aged group and by 120 days in elderly people. Meanwhile, under the current measures, the final sizes of SARS-COV-2 infection have been reduced by 90% in each group compared with those without any control strategy. Our comparative results suggest that these control measures were more effective for people aged 0-19 yr because their risk of contact with infectious individuals was reduced significantly more than in other age groups.
Although social distancing and the age-structured epidemiological characteristics of SARS-COV-2 have been addressed in this study, more precise estimates for each age group and for regional heterogeneity could not be fitted because time series of diagnosed cases by age group were lacking. The accuracy of the estimated outcomes should be validated against detailed epidemiological data in the future. Some evidence suggests a link between SARS-COV-2 infection and climate; how climate affects SARS-COV-2 transmission has not been addressed here. Additionally, other factors, including improvements in medical resources, detection methods and antiviral treatments, will be introduced in future work to study the potential dynamics of SARS-COV-2 infection. Individual-level heterogeneities, such as nosocomial infection and cluster infection, should also be addressed with individual-based or agent-based models in future studies.

Conclusion
By analyzing and fitting the cumulative documented data of SARS-COV-2 infection in mainland China from January 24th to April 13th, 2020, we constructed an age-structured SEIHRQ model by dividing the population into four groups: the adolescent group (0-19 yr old), the young adult group (20-49 yr old), the middle-aged group (50-74 yr old) and the elderly group (75 yr old and over). The parameters and initial values of the model were identified by the MCMC algorithm, the basic and effective reproduction numbers of the model were calculated, and the age-related characteristics of SARS-COV-2 infection under different policies were analyzed. We found that the young group and the middle-aged group are at high risk of infection. Under household isolation and traffic restrictions, middle-aged individuals have a higher risk of infection than other age groups. Without any measures, the young group would have the highest risk of being infected with SARS-COV-2.
The Chinese government implemented a series of strict control measures on January 23rd. These policies, aiming to ensure early detection, early quarantine and early treatment, have been updated over time, which accounts for the time-varying value of the effective reproduction number $R_e$. Figure 5a shows that the effective reproduction number of SARS-COV-2 infection in mainland China reached 1 on January 28th (95% CI [Jan. 25th, Jan. 31st]) and decreased continuously thereafter; that is, the effective reproduction number was less than 1 after January 28th, 2020. The above analysis shows that non-pharmaceutical interventions helped reduce the magnitude of the epidemic peaks, delay the arrival time of the peak, and thus combat SARS-COV-2 prevalence in mainland China. Consequently, lowering the peak size and delaying the arrival time of the peak greatly alleviate the acute pressure on the health-care system.

Appendix A. Original contact matrices
For the discrete age-structured SARS-COV-2 transmission model (2.1), the contact matrix plays an important role in describing age heterogeneities and correlations. As described in Section 2.3, we directly acquired the two 16 × 16 contact matrices for the household and general modes from [9], listed in Tables A.1 and A.2. Due to the lack of detailed information on SARS-COV-2 infection in mainland China, we divided the population into only four age groups (0-19, 20-49, 50-74, ≥ 75 yr old). We were primarily concerned with the risk of contact and possible onward SARS-COV-2 spread among adolescents, the young group, the middle-aged group and elderly people. Thus, we need an appropriate approach to aggregate the matrices in Tables A.1 and A.2 into two 4 × 4 matrices. Following the steps proposed by Meltzer et al. [8], we give the detailed calculation as follows.

Appendix B. Aggregate approach
Let $a_{ij}$, $i, j = 1, \dots, m$, be the elements of a known matrix, where $i$ and $j$ index rows and columns, respectively, and $m$ is the number of age groups in the known matrices A.1 and A.2 (here $m = 16$). We aggregate the known matrices into the required matrices as introduced in [8]. Denote the aggregated matrix by $D = (d_{fg})$ with $f, g = 1, \dots, n$ (here $n = 4$). The lower and upper indices of the fine age groups that make up aggregated group $g$ are $j_{\inf} = l(g)$ and $j_{\sup} = u(g)$.
Step 1. Let the contact rate between an individual in fine age group $i$ and individuals in aggregated group $g$ be
$$d_{ig} = \sum_{j=l(g)}^{u(g)} a_{ij}.$$
For convenience, we use matrix A.1 to illustrate this calculation. To calculate the average number of contacts between an individual in the 0-4 age group (the first row) and individuals in the 0-19 age group (the first four columns), we define
$$d_{11} = \sum_{j=1}^{4} a_{1j} = a_{11} + a_{12} + a_{13} + a_{14}.$$
Repeating the above process, we aggregated Table A.1 into the transpose of Table B.1, in which the 0-4, 5-9, 10-14 and 15-19 age groups were aggregated into a single age group (0-19).
Step 2. Next, we aggregated the 16 rows of the resulting matrix into 4 rows. Assuming that the Chinese population in age group $i$ is $N_i$, we calculated the population-weighted contacts $N_i d_{ig}$ and then obtained the total contacts between groups $f$ and $g$:
$$Y_{fg} = \sum_{i=l(f)}^{u(f)} N_i d_{ig}. \qquad (B.4)$$
Using equation (B.4), we calculated $Y_{11}$ (the total number of contacts made between the 0-19 and 0-19 age groups), $Y_{12}$ (the total number of contacts made between the 0-19 and 20-49 age groups) and $Y_{21}$ (the total number of contacts made between the 20-49 and 0-19 age groups). According to [18], the population distribution ($N_i$, $i = 1, \dots, 16$) is listed in Table B.2.
Repeating the above calculation, we obtained the 4 × 4 matrix of total contacts in the general mode, listed in Table B.3.
Step 3. Step 2 gave the aggregated matrix of total contacts. We now need the matrix giving the number of contacts per person in the relevant age group per day. Thus, we divide the total contacts by the population of age group $f$:
$$e_{fg} = \frac{Y_{fg}}{\sum_{i=l(f)}^{u(f)} N_i},$$
so that $e_{fg}$ denotes the rate at which an individual in age group $f$ contacts anyone in age group $g$ per unit time. For instance,
$$e_{11} = \frac{Y_{11}}{N_1 + N_2 + N_3 + N_4}.$$
Doing similar algebra, we obtained the aggregated matrix in the general mode shown in Table B.4 and Figure 1b.
Redoing steps 1-3, we obtained the aggregated matrix in the household mode shown in Table B.5 and Figure 1a.
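Steps 1-3 can be combined into one function, sketched below with NumPy. This is an illustration, not the authors' code: the name `aggregate_contacts` and the constant `GROUP_BOUNDS` are hypothetical, and we assume the orientation of [9], with rows indexing the respondent's five-year band and columns the contact's band (transpose the input first if Tables A.1 and A.2 use the opposite convention). `GROUP_BOUNDS` gives zero-based inclusive $(l(g), u(g))$ pairs for the groups 0-19, 20-49, 50-74 and 75+.

```python
import numpy as np

# Zero-based inclusive (l(g), u(g)) over the 16 bands 0-4, 5-9, ..., 70-74, 75+:
# 0-19 -> bands 0..3, 20-49 -> 4..9, 50-74 -> 10..14, 75+ -> 15.
GROUP_BOUNDS = [(0, 3), (4, 9), (10, 14), (15, 15)]


def aggregate_contacts(a, pop, bounds=GROUP_BOUNDS):
    """Aggregate a fine contact matrix a (rows: respondent band, columns: contact band).

    a: (16, 16) mean daily contacts; pop: (16,) population N_i of each band.
    Returns d (16 x 4), Y (4 x 4 total contacts) and e (4 x 4 per-person contacts).
    """
    a = np.asarray(a, dtype=float)
    pop = np.asarray(pop, dtype=float)
    # Step 1: d_ig = sum of a_ij over the fine bands j that make up group g
    d = np.column_stack([a[:, lo:hi + 1].sum(axis=1) for lo, hi in bounds])
    # Step 2: Y_fg = sum over fine bands i in group f of N_i * d_ig
    y = np.vstack([(pop[lo:hi + 1, None] * d[lo:hi + 1]).sum(axis=0) for lo, hi in bounds])
    # Step 3: e_fg = Y_fg / (total population of group f)
    group_pop = np.array([pop[lo:hi + 1].sum() for lo, hi in bounds])
    e = y / group_pop[:, None]
    return d, y, e
```

Because each fine band belongs to exactly one aggregated group, the grand total of contacts, $\sum_{f,g} Y_{fg} = \sum_{i,j} N_i a_{ij}$, is preserved, which is a useful sanity check when reproducing Tables B.3 to B.5.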