Published May 5, 2020 | Version v1
Project deliverable Open

DATS 6401 – Final Project – Xi Zhang

Creators

Description

COVID-19 SARS and Ebola Data Analysis
This project aims to raise public awareness of COVID-19 prevention and control. We hope to identify the similarities and differences between COVID-19 and other infectious diseases in order to understand COVID-19 better.
* Published site: https://xzhangfox.github.io/COVID-19-SARS-and-Ebola-Data-Analysis/
 

Motivation
When I saw COVID-19 starting to spread in Washington state in February, I worried every day about the outbreak and about the lack of awareness of protective measures in the United States.
As a Chinese person, I know what happened in China in January and February. China began recovering in March, but it paid a heavy price. I know that China recovered because the whole country was on high alert and everyone took epidemic prevention seriously.
But it was happening again. In March the United States, like China at the start of its outbreak, was not taking the virus seriously. People did not realize how serious it was until the death rate rose. I don’t want people to suffer again. I hope to convey the severity of the new coronavirus and help people live safely during the outbreak.
So I started this data research project on COVID-19. Through data visualization, I hope to raise people's awareness of self-protection and to find, from previous outbreaks, more directions for dealing with COVID-19.
Requirements
* Python 2.7/3.7
* Visual Studio Code
* Tableau
Goals
Visualization
I am going to visualize epidemic data mainly in Tableau. The data will be explored from several angles and communicated to the audience interactively. I will use HTML, CSS, JavaScript, and visual design to capture the audience's attention and to make specialist knowledge more accessible and interesting.

Modeling
For further research, I plan to build a model to analyze the similarities and differences among the COVID-19, SARS, and Ebola data, and to use time series models to predict how the spread develops over time.
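The project does not say which time series model is planned. As one possible starting point, the sketch below uses Holt's linear trend method (double exponential smoothing), written with the Python 3 standard library only. The function name `holt_forecast`, the smoothing parameters `alpha` and `beta`, and the sample counts are assumptions made for illustration, not part of the project.

```python
def holt_forecast(series, horizon, alpha=0.8, beta=0.2):
    """Forecast `horizon` future values with Holt's linear trend method.

    `alpha` smooths the level and `beta` smooths the trend; both are
    illustrative defaults, not values tuned on real epidemic data.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    # Initialise the level with the first value and the trend with the
    # first difference.
    level = float(series[0])
    trend = float(series[1] - series[0])
    for value in series[1:]:
        previous_level = level
        level = alpha * value + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    # Extrapolate the last level along the last trend.
    return [level + step * trend for step in range(1, horizon + 1)]


# Made-up cumulative case counts, for illustration only.
example_counts = [10, 20, 30, 40, 50]
print(holt_forecast(example_counts, horizon=3))
```

A linear trend is a crude fit for an epidemic curve, so this is better read as a baseline to compare richer models against than as a prediction tool.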

Data Preprocessing
The Novel Coronavirus (COVID-19) Cases data is collected by the Center for Systems Science and Engineering (CSSE) at JHU and can be downloaded as CSV files from GitHub.
Because the outbreak is ongoing, JHU CSSE publishes new data every day. I wrote Python code that walks through all the data files in the folder and combines them into one complete data set.
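The consolidation step could look roughly like the sketch below. It is not the code from `Consolidation.ipynb`; it is a minimal Python 3 standard-library version that assumes the daily reports are CSV files in one folder. The alias table, the added `Report_File` column, and the function names are assumptions for illustration, so check the header names against the files actually downloaded.

```python
import csv
from pathlib import Path

# Assumed header variants mapped to one canonical name (hypothetical table;
# verify it against the headers in the downloaded reports).
COLUMN_ALIASES = {
    "Province/State": "Province_State",
    "Country/Region": "Country_Region",
    "Last Update": "Last_Update",
    "Latitude": "Lat",
    "Longitude": "Long_",
}


def consolidate_reports(folder):
    """Read every CSV file in `folder` and return one list of row dicts."""
    rows = []
    for path in sorted(Path(folder).glob("*.csv")):
        # utf-8-sig drops a byte-order mark if the file starts with one.
        with path.open(newline="", encoding="utf-8-sig") as handle:
            for record in csv.DictReader(handle):
                row = {
                    COLUMN_ALIASES.get(key, key): value
                    for key, value in record.items()
                    if key is not None  # skip overflow cells without a header
                }
                row["Report_File"] = path.name  # remember the source file
                rows.append(row)
    return rows


def write_combined(rows, out_path):
    """Write the rows to one CSV whose header is the union of all columns."""
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
```

With a hypothetical folder name, usage would be `write_combined(consolidate_reports("daily_reports"), "combined.csv")`. Columns that only appear in some reports are left empty in the rows that lack them.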
With JHU CSSE updating daily, my total data volume grows day by day. The latitude, longitude, and country/state fields in this data can be used in geographic charts, and timestamps accurate to the hour support trend analysis and forecasting. I found reliable historical data on SARS and Ebola on Kaggle. Since these data sets are no longer updated, I only preprocessed them and made their column types consistent with the COVID-19 data. They were also uploaded to a Google spreadsheet, just like the COVID-19 data.
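As a rough illustration of making column types consistent across the three data sets, the sketch below maps one raw record onto a shared schema. The schema (`disease`, `date`, `country`, `cases`, `deaths`), the accepted date formats, and the source column names in the example map are all hypothetical; the real Kaggle and JHU CSSE headers should be substituted.

```python
from datetime import datetime

# Date formats assumed to occur in the sources; extend as needed.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%m/%d/%y")


def parse_date(text):
    """Return a `date` for the first format in DATE_FORMATS that matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError("unrecognized date: {!r}".format(text))


def to_int(text):
    """Convert a count such as '74' or '74.0' to int; blank becomes None."""
    text = (text or "").strip()
    return int(float(text)) if text else None


def unify(record, column_map, disease):
    """Map one raw record (a dict of strings) onto the shared schema."""
    return {
        "disease": disease,
        "date": parse_date(record[column_map["date"]]),
        "country": record[column_map["country"]].strip(),
        "cases": to_int(record[column_map["cases"]]),
        "deaths": to_int(record[column_map["deaths"]]),
    }


# Hypothetical column map for one source; the names are placeholders.
EXAMPLE_MAP = {
    "date": "Date",
    "country": "Country",
    "cases": "Confirmed",
    "deaths": "Deaths",
}
```

Blank counts are kept as `None` rather than `0` so that missing values are not mistaken for reported zeros.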
The format of the daily reports varies so much that even the types of error are hard to handle uniformly. I tried using pandas to collate the content and an NLP model to extract the data I needed, but this part is still being refined.

Inspiration for Web Design
I want the homepage to be as eye-catching as possible. Then, building on striking visual effects, the site conveys the severity of COVID-19 by embedding relevant literature, news, videos, and charts. For interaction, I want some dynamic effects so that the overall look feels lively.

References
* CDC: https://www.cdc.gov/coronavirus/2019-ncov/index.html
* YouTube: https://www.youtube.com/watch?v=21MIvkk7Imc
* W3School: https://www.w3school.com.cn/index.html
* CEBM: https://www.cebm.net/covid-19/global-covid-19-case-fatality-rates/
* NBCNews: https://www.nbcnews.com/health/coronavirus


 

Files

Consolidation.ipynb
