Drug Database
Description
# drugdatabase
## Introduction
In this project, I scrape data from https://www.webmd.com/, a website with health information on drugs, medicines and doctors, in order to find the most effective drugs based on customer reviews. The website recommends ways to treat an illness, but this project focuses on the drug information for common conditions, because the website lets customers review and rate each drug. I scrape these reviews to see which drugs customers rate highest for effectiveness and satisfaction. I also look at the effectiveness of each drug form and at which forms are most commonly used.
Enjoy the programming.
## Part 1 : Scraping Data
The first step is to scrape the drug information for each condition from the website. The information consists of drug name, indication, type, form, average price, number of reviews, and the effectiveness, ease of use and satisfaction ratings from customer reviews. I use Beautiful Soup to read the HTML content and collect the links for all common conditions, 37 conditions in total. After that, I use a for loop with Selenium to open each condition link, find the links to all drugs for that condition, and then collect the information for each drug. Finally, I remove all null values from the data frame and save it as a CSV file.
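The link-collection step can be sketched as follows. This is a minimal illustration, not the code from the notebook: it uses the standard-library `html.parser` instead of Beautiful Soup so that it runs without extra packages, and the sample HTML, the `/drugs/` path marker and the names `LinkCollector` and `collect_links` are assumptions made for the example. The real page structure on the website may differ.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://www.webmd.com/"


class LinkCollector(HTMLParser):
    """Collect absolute URLs of <a> tags whose href contains a marker."""

    def __init__(self, base_url, marker):
        super().__init__()
        self.base_url = base_url
        self.marker = marker
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and self.marker in href:
            url = urljoin(self.base_url, href)
            # Skip links that were already collected.
            if url not in self.links:
                self.links.append(url)


def collect_links(html, marker, base_url=BASE_URL):
    """Return the de-duplicated links in `html` that contain `marker`."""
    parser = LinkCollector(base_url, marker)
    parser.feed(html)
    return parser.links


# Hypothetical page fragment; the paths are placeholders, not real URLs.
SAMPLE_HTML = """
<ul>
  <li><a href="/drugs/condition-a">Condition A</a></li>
  <li><a href="/drugs/condition-b">Condition B</a></li>
  <li><a href="/about">About</a></li>
  <li><a href="/drugs/condition-a">Condition A (duplicate)</a></li>
</ul>
"""

condition_links = collect_links(SAMPLE_HTML, marker="/drugs/")
print(condition_links)
```

In the project itself, each collected link would then be opened in a loop to gather the drug links for that condition.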
## Part 2 : Cleaning Data
The next step is cleaning and preparing the data for further analysis. The website does not have a dedicated field for the average price and form of a drug, so I can only scrape a paragraph of information and have to extract these values from it. I use a tokenize function to get the sentence that contains the word ‘average’, since that sentence holds both the average price and the form. After that, I split the sentence into words and take the information from them. The last cleaning step is to sum the review ratings and compute the average for drugs that have the same name and the same form for the same condition, because the website sometimes contains duplicate drug information. In the end, 685 drugs are left for 37 conditions.
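The extraction of price and form from the paragraph can be sketched like this. It is only an illustration under stated assumptions: a regular expression stands in for the tokenize function, the sample paragraph is invented, and `KNOWN_FORMS` and the helper names are hypothetical. The wording of the real paragraphs on the website may need different patterns.

```python
import re

# Hypothetical list of drug forms to look for in the sentence.
KNOWN_FORMS = ("tablet", "capsule", "liquid", "cream")


def find_average_sentence(paragraph):
    """Return the first sentence that contains the word 'average', or None."""
    # Split after '.', '!' or '?' when followed by whitespace,
    # so a price such as $12.50 is not split in the middle.
    sentences = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    for sentence in sentences:
        if "average" in sentence.lower():
            return sentence
    return None


def extract_price(sentence):
    """Return the first dollar amount in the sentence as a float, or None."""
    match = re.search(r"\$\s*(\d[\d,]*(?:\.\d+)?)", sentence)
    if match is None:
        return None
    return float(match.group(1).replace(",", ""))


def extract_form(sentence):
    """Return the first known form (singular or plural) in the sentence."""
    for word in re.findall(r"[a-z]+", sentence.lower()):
        for form in KNOWN_FORMS:
            if word in (form, form + "s"):
                return form
    return None


# Invented paragraph, for illustration only.
paragraph = (
    "This drug is used to treat a condition. "
    "The average cash price for 30 tablets is $12.50. "
    "Ask your pharmacist for details."
)

sentence = find_average_sentence(paragraph)
price = extract_price(sentence)
form = extract_form(sentence)
print(sentence, price, form)
```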
## Part 3 : Drug Database
I would like to see the effectiveness of drugs by form, indication, type and condition, so I aggregate the data and compute the average for each group. With these averages in place, I create the drug database. The drug database consists of 7 functions:

1. The first function shows the top drugs for each condition by a chosen criterion.
2. The second function shows all drugs for each condition by a chosen criterion.
3. The third function shows a boxplot of the different drug forms for a specific condition by a chosen criterion.
4. The fourth function is nearly the same as the third, but shows a boxplot of the overall rating of all drugs in the database by form.
5. The fifth function compares the overall ratings of On-Label and Off-Label drugs.
6. The sixth function compares the overall ratings of RX and OTC drugs.
7. The last function shows the most common drug forms in the market as a pie chart.
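The first function and the group averages described above could look roughly like the sketch below. It is not the notebook's implementation: it uses plain Python lists and dictionaries instead of a data frame, and the records, field names and the functions `top_drugs` and `average_by` are assumptions made up for the example.

```python
from collections import defaultdict
from statistics import mean

# Invented records; names and ratings are placeholders, not scraped data.
RECORDS = [
    {"condition": "condition-a", "drug": "drug-1", "form": "tablet",
     "effective": 4.0, "satisfaction": 3.5},
    {"condition": "condition-a", "drug": "drug-2", "form": "liquid",
     "effective": 4.5, "satisfaction": 4.0},
    {"condition": "condition-a", "drug": "drug-3", "form": "tablet",
     "effective": 3.0, "satisfaction": 3.0},
    {"condition": "condition-b", "drug": "drug-4", "form": "capsule",
     "effective": 2.5, "satisfaction": 2.0},
]


def top_drugs(records, condition, criterion, n=3):
    """Return the n best (drug, rating) pairs for one condition."""
    rows = [r for r in records if r["condition"] == condition]
    rows.sort(key=lambda r: r[criterion], reverse=True)
    return [(r["drug"], r[criterion]) for r in rows[:n]]


def average_by(records, group_key, criterion):
    """Return the mean of `criterion` for each value of `group_key`."""
    groups = defaultdict(list)
    for r in records:
        groups[r[group_key]].append(r[criterion])
    return {key: mean(values) for key, values in groups.items()}


best = top_drugs(RECORDS, "condition-a", "effective", n=2)
by_form = average_by(RECORDS, "form", "effective")
print(best)
print(by_form)
```

The same `average_by` helper can be reused with other grouping keys, for example a type or indication field, if the records carry one.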
## Conclusion :
In conclusion, the pie chart shows that tablets are the most commonly sold form in the market for the relief of the 37 common conditions. Looking at the price and effectiveness of each form, the effectiveness ratings are nearly the same, but the liquid (drink) form is the cheapest. So, where possible, you may consider buying a liquid drug to save money.

The boxplot shows that price and effectiveness do not follow the same trend, which means that an expensive drug is not necessarily an effective one. As a clear example, the drugs for Meniere’s are really cheap, but their effectiveness rating is relatively high.

The next graph compares On-Label and Off-Label drugs. On-Label medications are FDA approved for the treatment of the condition. Off-Label medications are occasionally used but are not FDA approved for the treatment of the condition. The average ease of use and effectiveness seem to be the same, but On-Label drugs are cheaper.

For RX and OTC: RX medications require a prescription, while OTC medications may be purchased over the counter. The average price of OTC drugs is relatively low compared to RX drugs, and the effectiveness rating of OTC drugs is higher than that of RX drugs. Moreover, OTC drugs are easier to find in the market, so they are a reasonable choice when you have a common illness.
Files
1.WebMD_Scraping_Data.ipynb