Published December 12, 2019 | Version v1
Journal article Open

Drug Database

Creators

  • 1. Tiabrat

Description

# drugdatabase

## Introduction

In this project, I scrape data from https://www.webmd.com/, a website with health information on drugs, medicines and doctors, to find the most effective drugs based on customer reviews. The website recommends ways to treat an illness, but this project focuses on drug information for common conditions, because the website lets customers review and rate each drug. I scrape these reviews to see which drugs score highest on effectiveness and customer satisfaction. I also look at the effectiveness of each drug form and at which form is most commonly used.
Enjoy the programming.

## Part 1 : Scraping Data

The first step is to scrape the drug information for each condition from the website. The information consists of drug name, indication, type, form, average price, number of reviews, and the effectiveness, ease of use and satisfaction ratings from customer reviews. I use Beautiful Soup to read the HTML content and collect the links for all 37 common conditions. After that, I use a for loop with Selenium to open each condition link, find the links to all drugs for that condition, and collect the information. Finally, I remove all null values from the data frame and save it as a CSV file.
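The link-collection step can be sketched as follows. This is only an illustration: the project itself uses Beautiful Soup and Selenium, while this sketch swaps in the standard library's `html.parser` so that it runs without extra packages and without network access. The sample markup, the paths in it, and the names `LinkCollector` and `collect_links` are assumptions, since the real page structure is not shown here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://www.webmd.com/"

# Made-up sample markup (assumption); the real page layout may differ.
SAMPLE_HTML = """
<ul class="conditions">
  <li><a href="/drugs/condition-a">Condition A</a></li>
  <li><a href="/drugs/condition-b">Condition B</a></li>
  <li><a href="#top">Back to top</a></li>
</ul>
"""


class LinkCollector(HTMLParser):
    """Collect (link text, absolute URL) pairs for every <a href=...> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # URL of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            # Skip in-page anchors such as "#top".
            if href and not href.startswith("#"):
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def collect_links(html, base_url=BASE_URL):
    """Return all (text, absolute URL) pairs found in the given HTML."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


condition_links = collect_links(SAMPLE_HTML)
for name, url in condition_links:
    print(name, url)
```

In the real pipeline, each collected URL would then be opened in turn (here, with Selenium) to gather the drug links for that condition.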


## Part 2 : Cleaning Data

The next step is cleaning and preparing the data for further analysis. The website has no dedicated field for the average price and form of a drug, so I can only take the whole paragraph of information and extract them from it. I use a tokenize function to get the sentence that contains the word ‘average’, which holds both the average price and the form. I then split that sentence into words and pull the information out of it. The last cleaning step is to sum the review ratings and take the average for drugs that have the same name and the same form for the same condition, because the website sometimes contains duplicate drug information. In the end, 685 drugs remain for 37 conditions.
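A minimal sketch of the extraction step is shown below. It is not the notebook's code: the tokenize function used in the project is not named in this description, so a naive regular-expression sentence splitter stands in for it. The sample paragraph, the `KNOWN_FORMS` list and the function names are assumptions made for illustration.

```python
import re

# Hypothetical list of drug forms to look for (assumption).
KNOWN_FORMS = ("tablet", "capsule", "liquid", "cream", "injection")


def split_sentences(text):
    """Naive sentence split: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_price_and_form(paragraph):
    """Return (price, form) from the first sentence containing 'average'.

    Either value is None when it cannot be found.
    """
    for sentence in split_sentences(paragraph):
        if "average" not in sentence.lower():
            continue
        # Price: first dollar amount in the sentence, e.g. "$12.50".
        price_match = re.search(r"\$\s*(\d+(?:\.\d+)?)", sentence)
        price = float(price_match.group(1)) if price_match else None
        # Form: first word that exactly matches a known form.
        words = re.findall(r"[a-z]+", sentence.lower())
        form = next((w for w in words if w in KNOWN_FORMS), None)
        return price, form
    return None, None


# Made-up paragraph (assumption), not text from the website.
SAMPLE = (
    "Drug X is used to treat condition A. "
    "The average price of a tablet of Drug X is $12.50 at most pharmacies. "
    "Ask your doctor before use."
)

price, form = extract_price_and_form(SAMPLE)
print(price, form)
```

The form lookup matches whole words only, so plural forms such as "tablets" would need to be added to the list or normalized first.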

## Part 3 : Drug Database

To see the effectiveness of drugs by form, indication, type and condition, I aggregate the data and compute the average for each group. With this data I build the drug database, which consists of 7 functions:

1. Show the top drugs for each condition by a chosen criterion.
2. Show all drugs for each condition by criterion.
3. Show a boxplot of the different drug forms for a specific condition by criterion.
4. Show a boxplot of the overall rating of all drugs in the database by form (nearly the same as the third function).
5. Compare the overall ratings of On-Label and Off-Label drugs.
6. Compare the overall ratings of RX and OTC drugs.
7. Show the most common drug form on the market as a pie chart.
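To make the first and last functions more concrete, here is a small sketch using plain Python data structures. The rows are invented placeholders, not values from the scraped dataset, and the function names `top_drugs` and `form_shares` are hypothetical. The actual notebook may be organised differently.

```python
from collections import Counter

# Placeholder rows (invented values, not from the dataset).
DRUGS = [
    {"condition": "A", "drug": "d1", "form": "tablet", "effective": 4.2, "satisfaction": 3.9},
    {"condition": "A", "drug": "d2", "form": "liquid", "effective": 3.5, "satisfaction": 4.1},
    {"condition": "A", "drug": "d3", "form": "tablet", "effective": 4.6, "satisfaction": 3.2},
    {"condition": "B", "drug": "d4", "form": "capsule", "effective": 4.9, "satisfaction": 4.8},
]


def top_drugs(rows, condition, criterion, n=3):
    """Names of the n highest-rated drugs for a condition, by the given criterion."""
    matches = [r for r in rows if r["condition"] == condition]
    ranked = sorted(matches, key=lambda r: r[criterion], reverse=True)
    return [r["drug"] for r in ranked[:n]]


def form_shares(rows):
    """Fraction of drugs per form; these are the values a pie chart would show."""
    counts = Counter(r["form"] for r in rows)
    total = sum(counts.values())
    return {form: count / total for form, count in counts.items()}


print(top_drugs(DRUGS, "A", "effective", n=2))
print(form_shares(DRUGS))
```

Passing the criterion as a column name keeps one function usable for effectiveness, ease of use and satisfaction alike.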

## Conclusion :

In conclusion, the pie chart shows that tablets are the most commonly sold form on the market for relieving the 37 common conditions. Looking at the price and effectiveness of each form, the effectiveness ratings are nearly the same, but the liquid (drink) form is the cheapest. So if you can, consider buying the liquid form to save money. The boxplot shows that price and effectiveness do not follow the same trend, which means that an expensive drug is not necessarily an effective one. An obvious example is the drugs for Meniere’s disease, which are really cheap but have a relatively high effectiveness rating.

The next graph compares on-label and off-label drugs. On-label medications are FDA approved for the treatment of the condition; off-label medications are occasionally used but are not FDA approved for the treatment of the condition. The average ease of use and effectiveness seem to be the same, but on-label drugs are cheaper.

For RX and OTC: RX medications require a prescription, while OTC medications may be purchased over the counter. The average price of OTC drugs is relatively low compared to RX drugs, and the effectiveness of OTC drugs is higher. Moreover, OTC drugs are easier to find on the market, so you may prefer to buy this type of drug when you are ill.
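The group comparisons in this conclusion (by form, by label, or by RX and OTC) all come down to averaging columns within each group. A minimal sketch of that calculation follows. The rows are invented placeholders that do not reproduce the project's results, and `group_means` is a hypothetical helper name.

```python
from collections import defaultdict
from statistics import mean

# Placeholder rows (invented values, not results from the dataset).
ROWS = [
    {"type": "RX", "price": 40.0, "effective": 3.5},
    {"type": "RX", "price": 60.0, "effective": 4.0},
    {"type": "OTC", "price": 10.0, "effective": 4.0},
    {"type": "OTC", "price": 20.0, "effective": 4.5},
]


def group_means(rows, key, columns):
    """Average each listed column within every group defined by `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return {
        group: {col: mean(r[col] for r in members) for col in columns}
        for group, members in groups.items()
    }


by_type = group_means(ROWS, "type", ["price", "effective"])
print(by_type)
```

The same helper works for any grouping column, for example `group_means(rows, "form", ["price", "effective"])` when the rows carry a `form` field.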

 

Files

1.WebMD_Scraping_Data.ipynb

Files (10.5 MB)
