Statistical Analysis on COVID-19

1. Abstract

1.1. Background: Since patients with unexplained pneumonia were first admitted to Jinyintan Hospital in Wuhan, China, in December 2019, the new coronavirus disease (COVID-19) has spread rapidly within Wuhan, across the whole of China, and to some neighboring countries. We establish an infectious disease dynamics model and time series models to predict the trend of COVID-19 transmission and to make short-term forecasts, which will support intervention and prevention by departments at all levels in mainland China and buy more time for clinical trials.

1.2. Methods: Based on the transmission mechanism of COVID-19 in the population and the prevention and control measures implemented, we establish a six-compartment dynamic model, and we establish time series models based on different mathematical formulas according to the pattern of variation in the original data.

1.3. Findings: The results of the time series analysis and the dynamic model analysis show that the cumulative number of confirmed cases of COVID-19 pneumonia in mainland China could reach 36,343 after one week (February 8, 2020), with a basic reproduction number of 4.01. The cumulative number of confirmed cases is projected to peak at 87,701 on March 15, 2020. In Wuhan, the basic reproduction number reaches 4.3, and the cumulative number of confirmed cases is projected to peak at 76,982 on March 20. In both mainland China and Wuhan, the infection rate and the basic reproduction number of COVID-19 continue to decline. The sensitivity analysis shows that the time it takes for a suspected case to be diagnosed as a confirmed case has a significant impact on the peak size and duration of the cumulative number of confirmed cases. Increased mortality leads to additional cases of pneumonia, while the cumulative number of confirmed cases is not sensitive to increased cure rates.

1.4. Interpretation: Chinese governments at various levels have intervened in many ways to control the epidemic. According to the results of the model analysis, we believe that the emergency intervention measures adopted in the early stage of the epidemic, such as locking down Wuhan, restricting the movement of people in Hubei Province, and increasing support to Wuhan, had a crucial restraining effect on the initial spread of the epidemic. Continuing to increase investment in medical resources, so that suspected patients can be diagnosed and treated in a timely manner, is a very effective approach to prevention and treatment. Based on the results of the sensitivity analysis, we believe that enhanced handling of the bodies of deceased patients can help ensure that neither the bodies nor the handling process causes additional viral infections, and that once patients with COVID-19 pneumonia are cured, the antibodies left in their bodies may protect them from reinfection with COVID-19 for a longer period of time.

2. Key words

New coronavirus; Infection prediction; Infection prevention and control; ARIMAX model; SEIR model

3. Introduction

Since December 2019, many unexplained cases of pneumonia with cough, dyspnea, fatigue, and fever as the main symptoms have occurred in Wuhan, China, within a short period of time [1, 2]. China's health authorities and CDC quickly identified the pathogen of these cases as a new type of coronavirus, which the World Health Organization (WHO) named COVID-19 on January 10, 2020 [3]. On January 22, 2020, the Information Office of the State Council of the People's Republic of China held a press conference to introduce the situation regarding the prevention and control of pneumonia caused by the new coronavirus infection. On the same day, the Chinese CDC released a plan for the prevention and control of pneumonia caused by the new coronavirus infection, covering COVID-19 epidemic research, specimen collection and testing, tracking and management of close contacts, and publicity, education, and risk communication to the public [4]. Wuhan is the origin of COVID-19 and one of the cities most affected by it. The Mayor of Wuhan stated at a press conference on January 31, 2020, that Wuhan was urgently building Vulcan Mountain Hospital and Thunder Mountain Hospital, which would officially admit patients on February 3 and February 6, respectively [5]. By 24:00 on February 6, 2020, a total of 31,161 confirmed cases, including 636 deaths, had been reported in the Chinese mainland; 22,112 confirmed cases, including 618 deaths, in Hubei Province; and 11,618 confirmed cases, including 478 deaths, in Wuhan. The spread of COVID-19 and the various interventions have had an incalculable negative impact on people's daily lives and the normal functioning of society. Cities in Hubei Province have imposed closures and traffic restrictions of varying degrees [6].

In fact, many urgent questions about the spread of COVID-19 remain. How many people will be infected tomorrow? When will the inflection point of the infection rate appear? How many people will be infected during the peak period? Can existing interventions effectively control COVID-19? What mathematical models are available to help us answer these questions? COVID-19 is caused by a novel coronavirus that was only discovered in December 2019, so data on the outbreak are still insufficient, and medical approaches such as clinical trials are still at a difficult exploratory stage [7]. So far, the epidemic data have been difficult to apply directly to existing mathematical models, and questions remain as to how effective the existing emergency response has been and how to invest medical resources more scientifically in the future. This article aims to address these gaps.

4. Methods

4.1. Data: COVID-19 recently struck Wuhan, the seventh largest city of the People's Republic of China. The daily epidemic announcements provide the basic data for epidemiological research. We obtained epidemic data for the Chinese mainland from the National Health Commission of the People's Republic of China for January 10, 2020 to February 9, 2020, including the cumulative number of cases, the cumulative number of suspected cases, the cumulative number of people in recovery, the cumulative number of deaths, and the cumulative number of people in quarantine [8]. We also collected epidemic data for Hubei Province and its capital city, Wuhan, from the Health Commission of Hubei Province for January 20, 2020 to February 2, 2020, including the cumulative number of cases, the cumulative number of recovered people, and the cumulative number of quarantined people in Hubei Province and Wuhan [9].

4.2. The Model: Based on the collected epidemic data, we try to find the transmission pattern of COVID-19, predict the course of the epidemic, and then propose effective prevention and control methods. There are generally three kinds of methods for studying the transmission of infectious diseases. The first is to establish a dynamic model of infectious diseases. The second is statistical modeling based on stochastic processes, time series analysis, and other statistical methods. The third is to use data mining techniques to extract information from the data and find the epidemic pattern of infectious diseases [10]. Because the publicly available data cover only a short time span, this paper relies mainly on the first two kinds of methods. COVID-19 has spread explosively in Wuhan, China, and effective government intervention and prevention and control measures in all sectors depend on the best possible outbreak prediction [11]. This paper builds a dynamic model of COVID-19 transmission and statistical models based on time series analysis, and compares how well these mathematical models predict the spread of the COVID-19 epidemic. Because the existing outbreak data do not form a large sample, at this stage of the spread of COVID-19 the dynamic model we built, which contains parameters to be estimated, is more suitable for predicting the development trend of the epidemic, the peak size, etc., based

4.3. SEIQDR-Based Method for Estimation: After the outbreak of the COVID-19 epidemic, the Chinese government took many effective measures to combat it, such as inspection and detention, isolation treatment, city lockdowns, and stopping traffic on main roads [12-14]. However, the traditional SEIR model cannot fully describe the impact of these measures on different populations. Based on an analysis of the actual situation and the existing data, we divided the population into 6 compartments to match the current spread of COVID-19 in China and established a more effective model of the dynamic spread of the disease. See Table 1 for the specific classification. Because the incubation period of COVID-19 can be as long as 2 to 14 days, by the time the first case is identified there are already infected but undetected people (E) among the susceptible population (S). Some infected people go through an incubation period before suspected symptoms are detected (Q). Chest CT imaging is then used to check for ground-glass opacities in the lungs and determine whether the diagnosis is confirmed (D). Another part of the population has been infected and has fallen ill but, because these people are not isolated, they are highly infectious in the population (I). After a period of quarantine and treatment, these two groups of people are either discharged from hospital (R) or die because of underlying diseases. On this basis, we classify the population as shown in Table 1.
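The compartment flow described above can be sketched as a daily discrete-time simulation. This is only an illustration under stated assumptions: the transition structure (S to E; E to either Q or I; Q and I to D; D to R or death), the function name `simulate_seiqdr`, and every rate value are hypothetical placeholders rather than the equations or estimates of this paper. The initial compartment sizes in the usage lines are the January 10 values reported in Section 5.2, while the population size is an arbitrary illustrative number.

```python
def simulate_seiqdr(days, n_total, init, beta=0.5, sigma=1 / 7, p_detected=0.6,
                    delta_q=1 / 3, delta_i=1 / 5, gamma=1 / 14, mu=0.002):
    """Daily simulation of an ASSUMED S-E-I-Q-D-R flow.

    All rates are hypothetical placeholders, not values fitted in the paper.
    """
    e, i, q, d, r = (float(init[name]) for name in ("E", "I", "Q", "D", "R"))
    s = n_total - (e + i + q + d + r)  # everyone else starts as susceptible
    dead = 0.0
    cumulative = d                     # cumulative confirmed cases so far
    history = []
    for day in range(days + 1):
        history.append({"day": day, "S": s, "E": e, "I": i, "Q": q, "D": d,
                        "R": r, "dead": dead,
                        "cumulative_confirmed": cumulative})
        # Assumption: E and I infect susceptibles with the same contact rate.
        new_exposed = beta * s * (e + i) / n_total
        leaving_e = sigma * e              # incubation period ends
        to_q = p_detected * leaving_e      # detected as suspected cases
        to_i = leaving_e - to_q            # ill but not isolated
        q_to_d = delta_q * q               # suspected -> confirmed
        i_to_d = delta_i * i               # unisolated ill -> confirmed
        recovered = gamma * d              # confirmed -> discharged
        deaths = mu * d                    # confirmed -> deceased
        s -= new_exposed
        e += new_exposed - leaving_e
        i += to_i - i_to_d
        q += to_q - q_to_d
        d += q_to_d + i_to_d - recovered - deaths
        r += recovered
        dead += deaths
        cumulative += q_to_d + i_to_d
    return history


# Initial values from Section 5.2; the population size is illustrative only.
initial = {"E": 2, "I": 0, "Q": 0, "D": 41, "R": 2}
history = simulate_seiqdr(days=30, n_total=10_000_000, init=initial)
print(round(history[-1]["cumulative_confirmed"]))
```

Keeping the transitions as explicit named flows makes it easy to check that the total population is conserved and to swap in a different assumed structure, for example one in which some suspected cases are ruled out and return to S.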

5. Results

5.1. TS Model-Based Estimates: We use sequence plots and autocorrelation functions of the original data to assess the stationarity of the time series, and we transform the series whose mean and variance are not constant to make them stationary. In the exponential smoothing method, we apply a natural logarithmic transformation to the series to complete this process. In the ARIMA and ARIMAX models, we apply first-order or second-order differencing to the original sequence. With this processing, we obtain the summary information of the time series models for the number of confirmed cases in mainland China shown in Table 2. As Table 2 shows, we established multiple time series models for the number of confirmed cases in mainland China. Comparing them, we initially find that the Brown model, applied to the natural logarithm of the original sequence, and the ARIMAX (0,1,0) model, applied with second-order difference processing, seem more suitable. The Brown model has a stationary R-squared of 0.605 and a Ljung-Box Q(18) test statistic value of 0.958; the ARIMAX (0,1,0) model has a stationary R-squared of 0.977 and a Ljung-Box Q(18) test statistic value of 0.987. According to Table 2, the ARIMAX (0,1,0) model is the best of the six time series models in terms of goodness of fit and the Ljung-Box Q(18) test results. We preliminarily conclude that the Brown model and the ARIMAX (0,1,0) model have good statistical significance and should be able to predict the number of
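The transformations named in this subsection can be illustrated with a small sketch. It assumes the standard textbook form of Brown's double exponential smoothing applied to the log-transformed series, and it stands in for ARIMAX (0,1,0) with a plain random walk with drift, that is, ARIMA (0,1,0) with a constant and no exogenous regressors, because the regressors used in the paper are not given here. The function names, the smoothing constant `alpha`, and the sample series are hypothetical.

```python
import math


def difference(series, order=1):
    """Apply first-order differencing `order` times."""
    out = list(series)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def brown_forecast(series, alpha, horizon):
    """Brown's double exponential smoothing on the log-transformed series."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be strictly between 0 and 1")
    logs = [math.log(x) for x in series]
    s1 = s2 = logs[0]                      # initialise both smoothers
    for y in logs[1:]:
        s1 = alpha * y + (1 - alpha) * s1   # single smoothing
        s2 = alpha * s1 + (1 - alpha) * s2  # smoothing of the smoothed series
    level = 2 * s1 - s2
    trend = alpha / (1 - alpha) * (s1 - s2)
    # Forecast on the log scale, then transform back to counts.
    return [math.exp(level + m * trend) for m in range(1, horizon + 1)]


def random_walk_drift_forecast(series, horizon):
    """ARIMA(0,1,0) with drift: last value plus the mean daily increment."""
    steps = difference(series, 1)
    drift = sum(steps) / len(steps)
    return [series[-1] + m * drift for m in range(1, horizon + 1)]


# Made-up cumulative counts, for illustration only.
counts = [100, 120, 150, 190, 240]
print(difference(counts, order=2))
print(brown_forecast(counts, alpha=0.5, horizon=3))
print(random_walk_drift_forecast(counts, horizon=3))
```

Working on the log scale means the Brown forecast extrapolates a constant growth rate, whereas the random walk with drift extrapolates a constant daily increment; comparing the two on held-out days is one simple way to see which assumption fits a given stretch of data better.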

5.2. SEIQDR-Based Estimates: Based on the data released by the National Health Commission of China, we take the data for January 10 as the initial value. On January 10, the transmission of COVID-19 had occurred only in Hubei Province, where 41 cases were confirmed, 0 were suspected, 2 were cured, 2 people were infected with COVID-19 but not yet sick, and 0 people were ill but not isolated, namely: D(0) = 41, Q(0) = 0, R(0) = 2, E(0) = 2, I(0) = 0. Since COVID-19 originated in Wuhan, Hubei, these initial values can also be used as the national initial values. Based on this, we use the least squares method to calculate k and the error sum of squares (SSE) for Hubei Province and for mainland China separately, giving one k value for Hubei Province and one for mainland China. Because Hubei Province is the birthplace of the epidemic, with a large number of patients and limited medical resources, many mild patients self-isolate at home, which increases the transmission time after the incubation period [30]. Outside Hubei Province the number of cases is small and medical resources are sufficient, making the k value small: latent cases can get timely treatment after onset, which reduces the transmission time after onset. The comparison of k values shows that it is a very correct and effective decision
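The least squares step can be sketched generically as a search for the parameter value that minimises the SSE between model output and observed counts. The sketch below is an assumption-laden stand-in: `toy_model` is a hypothetical geometric growth curve, not the SEIQDR model, and k is treated simply as a scalar parameter of that toy curve, since its definition is not given in this section. The "observed" data are synthetic and generated from the toy model itself.

```python
def sse(predicted, observed):
    """Sum of squared errors between two equally long sequences."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


def fit_k(model, observed, candidates):
    """Grid-search least squares: return (best_k, best_sse)."""
    best_k, best_sse = None, float("inf")
    for k in candidates:
        err = sse(model(k, len(observed)), observed)
        if err < best_sse:
            best_k, best_sse = k, err
    return best_k, best_sse


def toy_model(k, n_days, c0=41.0):
    """Hypothetical stand-in model: cumulative count grows by (1 + k) per day."""
    return [c0 * (1 + k) ** t for t in range(n_days)]


# Synthetic observations generated with k = 0.25, for illustration only.
observed = toy_model(0.25, 10)
grid = [step / 100 for step in range(1, 51)]
best_k, best_sse = fit_k(toy_model, observed, grid)
print(best_k, best_sse)
```

A grid search is used here only because it is transparent; with a real compartment model in place of `toy_model`, the same `fit_k` interface would apply, and running it once on Hubei data and once on national data would yield the two k values being compared.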

6. Discussion

There is no doubt that the transmission of COVID-19 in the population is affected by many intricate factors. In the early stage of COVID-19 transmission, it is difficult to establish a dynamic transmission model with parameters to be estimated and obtain fairly accurate simulation results. However, preliminary estimates of parameters such as the average latency and mortality from existing data may help in solving for important parameters such as the infection rate and recovery rate, giving us a more accurate grasp of the transmission trend of COVID-19. On the other hand, statistical modeling of the spread of new coronavirus pneumonia based on time series analysis can be done immediately after the latest data arrive each day, because the time series model is driven by the pattern in the data itself. Although this method usually requires sufficient data, in the early stages of epidemic transmission it can still be used to predict epidemic indicators fairly accurately in the short term, providing departments at all levels with short-term emergency prevention plans for intervention, control, and policy implementation.

7. Limitations

This article inevitably makes some assumptions when building the model. When we build a discrete dynamic model of COVID-19 for a certain period of time, we ignore the impact of factors such as the population birth rate and natural mortality. To simplify the calculations, we also assume that the latent population of COVID-19 and the infected but not yet isolated population have the same range of activities and capabilities; that is, we assume that for COVID-19 the latent population (E) and the infected but not yet isolated population (I) have the same contact rate. On the other hand, this article is based

8. Acknowledgement

This work was supported by the Philosophical and Social Sciences Research Project of Hubei Education Department (19Y049) and the Starting Research Foundation for the Ph.D. of Hubei University of Technology (BSQD2019054), Hubei Province, China.

References

1. Huang C, Wang Y, Li X, et al. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. The Lancet, 2020.

2. Shen M, Peng Z, Xiao Y, et al. Modelling the epidemic trend of the 2019 novel coronavirus outbreak in China. bioRxiv, 2020.

3. World Health Organization (WHO). Coronavirus. 2020.

4. National Health Commission of the People’s Republic of China. 2020.

5. Health Commission of Hubei Province. 2020.

6. Health Commission of Hubei Province. 2020.

7. Chan JFW, Yuan S, Kok KH, et al. A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-to-person transmission: a study of a family cluster. The Lancet, 2020.

8. National Health Commission of the People’s Republic of China. 2020.

9. Health Commission of Hubei Province. 2020.

10. Ma ZE, Zhou YC, Wang WD, et al. Mathematical modeling and research of infectious disease dynamics. 2004.

11. Wu P, Hao X, Lau EHY, et al. Real-time tentative assessment of the epidemiological characteristics of novel coronavirus infections in Wuhan, China, as at 22 January 2020. Eurosurveillance, 2020, 25(3): 2000044.

12. National Health Commission of the People’s Republic of China. 2020.

13. National Health Commission of the People’s Republic of China. 2020.

14. Health Commission of Hubei Province. 2020.

15. Read JM, Bridgen JRE, Cummings DAT, et al. Novel coronavirus COVID-19: early estimation of epidemiological parameters and epidemic predictions. medRxiv, 2020.

16. National Health Commission of the People’s Republic of China. 2020.

17. National Health Commission of the People’s Republic of China. 2020.

18. Dye C, Gay N. Modeling the SARS epidemic. Science, 2003, 300(5627): 1884-1885.

19. Riley S, Fraser C, Donnelly CA, et al. Transmission dynamics of the etiological agent of SARS in Hong Kong: impact of public health interventions. Science, 2003, 300(5627): 1961-1966.

20. de Oliveira EM, Oliveira FLC. Forecasting mid-long term electric energy consumption through bagging ARIMA and exponential smoothing methods. Energy, 2018, 144: 776-788.

21. Chen P, Yuan H, Shu X. Forecasting crime using the ARIMA model. In: 2008 Fifth International Conference on Fuzzy Systems and Knowledge Discovery. IEEE, 2008, 5: 627-630.

22. Li X. Comparison and analysis between Holt exponential smoothing and Brown exponential smoothing used for freight turnover forecasts. In: 2013 Third International Conference on Intelligent System Design and Engineering Applications. IEEE, 2013: 453-456.

23. Hansun S. A New Approach of Brown’s Double Exponential Smoothing Method in Time Series Analysis. Balkan Journal of Electrical and Computer Engineering, 2016, 4(2): 75-78.

Citation:

Zhao B. Statistical Analysis on COVID-19. Annals of Clinical and Medical Case Reports 2020