Published November 6, 2019 | Version v1
Journal article Open

Most Popular Check-ins Place from Instagram

Description

# Most Popular Check-ins Place in Instagram

## Introduction

This project scrapes posts tagged with the 'ustrip' hashtag from Instagram to find the most popular check-in places in the US. It also uses data from https://www.macrotrends.net/ for the US Index, https://www.ncdc.noaa.gov/ for the average monthly temperature of each state, and https://crime-data-explorer.fr.cloud.gov/ for crime data. These three datasets are used to analyze which factors may affect the number of check-ins at each place. You can also find the most popular check-in places for each state in each month. Enjoy the programming.

## Part 1 : Scraping Data

The first step is to scrape the check-in location and date of each post from Instagram. In this project, I use Selenium to connect to the Instagram website and Beautiful Soup to read the HTML content. Because Instagram is an infinite-scroll website, I use Selenium to open the browser and scroll the page down until the end to collect all the post links. Then I use a for loop to open each link and get the check-in location and date from each post.
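The scroll-and-collect loop described above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the function names are hypothetical, the assumption that post links start with `/p/` is mine, and the standard-library `html.parser` stands in for Beautiful Soup so the sketch runs without extra installs. The `driver` argument is expected to behave like a Selenium WebDriver, which provides `execute_script` and `page_source`.

```python
import time
from html.parser import HTMLParser


class PostLinkParser(HTMLParser):
    """Collects href values of <a> tags that look like post links."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        # Assumption: post URLs start with "/p/"; adjust if the site differs.
        if href.startswith("/p/") and href not in self.links:
            self.links.append(href)


def extract_post_links(html):
    """Return the unique post links found in an HTML string, in page order."""
    parser = PostLinkParser()
    parser.feed(html)
    return parser.links


def scroll_and_collect(driver, pause_seconds=2.0, max_rounds=1000):
    """Scroll until the page height stops growing, collecting links each round."""
    seen = []
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        # Collect before scrolling, since older posts may leave the DOM.
        for link in extract_post_links(driver.page_source):
            if link not in seen:
                seen.append(link)
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause_seconds)  # give the next batch of posts time to load
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, so the end was reached
        last_height = new_height
    return seen
```

Collecting links on every round, rather than once at the end, is a precaution for pages that remove earlier posts from the DOM while scrolling. The `max_rounds` cap keeps the loop from running forever on a feed that never stops growing.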

## Part 2 : Cleaning Data

Not all posts contain a check-in place, so the second step is to remove all rows without check-in data by deleting all None values. I would also like to find factors that might affect the number of check-ins. There are three variables that I think may be related to the amount of travel: the US Index, temperature, and crime rate. I use data from the Macrotrends website for the US Index, the National Centers for Environmental Information for the average temperature of each state, and the Federal Bureau of Investigation for the crime data. The temperature data comes as a separate file for each state, so I use a for loop to read all the files and then concatenate them. Finally, I clean up all the data a bit and rename some columns to make them more readable.
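The two cleaning steps above, dropping rows without a check-in and concatenating the per-state temperature files, might look like the sketch below. It uses plain Python and the `csv` module in place of whatever dataframe library the notebooks use. The column names (`location`, `month`, `avg_temp`) and the function names are hypothetical.

```python
import csv
import io


def drop_missing_checkins(rows, key="location"):
    """Keep only rows whose check-in field is present and non-empty."""
    return [row for row in rows if row.get(key) not in (None, "")]


def concat_state_tables(tables):
    """Merge per-state CSV texts into one list of rows tagged with the state.

    `tables` maps a state name to CSV text; all files are assumed to share
    the same header.
    """
    combined = []
    for state, text in tables.items():
        for row in csv.DictReader(io.StringIO(text)):
            row["state"] = state  # remember which file the row came from
            combined.append(row)
    return combined
```

Tagging each row with its state during the loop matters because, once the files are concatenated, nothing else in the row says which state it belongs to.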

## Part 3 : Finding Latitude and Longitude

Once I have all the location data, the next step is to find the latitude and longitude of each location. I use a geocoder to look up the coordinates from the name of the check-in place, and I drop the places whose location cannot be found.
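A minimal sketch of this lookup-and-drop step is shown below. The source does not say which geocoding library is used, so the geocoder is passed in as a `lookup` callable that returns a `(lat, lon)` pair or `None`. That interface and the function name are assumptions made for illustration.

```python
def geocode_places(place_names, lookup):
    """Attach coordinates to place names, dropping names that cannot be found.

    `lookup` is any callable taking a place name and returning (lat, lon),
    or None when the place is unknown.
    """
    cache = {}  # avoid querying the geocoder twice for the same name
    located = []
    for name in place_names:
        if name not in cache:
            cache[name] = lookup(name)
        coords = cache[name]
        if coords is None:
            continue  # drop places whose location cannot be found
        lat, lon = coords
        located.append({"place": name, "lat": lat, "lon": lon})
    return located
```

Caching by name is useful here because popular check-in places repeat many times, and online geocoding services are usually rate limited.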

## Part 4 : Creating Database

After collecting all the data, I create the Instagram 'ustrip' Check-ins Database. I check the number of posts in each year and observe that some years contain only a small amount of data, so I decide to use the data from 2016 to 2019.
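The per-year check and the 2016 to 2019 filter could be sketched like this, assuming each post date has already been parsed into a `datetime.date`. The function names are hypothetical.

```python
from collections import Counter
from datetime import date


def posts_per_year(post_dates):
    """Count how many posts fall in each calendar year."""
    return Counter(d.year for d in post_dates)


def keep_years(post_dates, first=2016, last=2019):
    """Keep only the posts dated within the inclusive year range."""
    return [d for d in post_dates if first <= d.year <= last]
```

Looking at the output of `posts_per_year` first is what motivates the cutoff: years with only a handful of posts are too sparse to compare with the others.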

These are the functions in the database:

1. Geomap plot of total check-ins for each state by season, month, or year.
2. Plot of the total number of check-ins for each state from January 2016 to October 2019.
3. Top check-in places for each state and their number of check-ins.
4. Map plot of the top check-in places for each state and their number of check-ins.
5. The number of check-ins compared with the US Index.
6. The number of check-ins compared with the temperature of each state.
7. The number of check-ins compared with the crime rate.
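The aggregations behind several of these functions can be sketched in a few lines. The code below is an illustration under stated assumptions, not the database's implementation: the record layout (tuples of state, place, and date), the function names, and the month-to-season mapping (meteorological seasons) are all assumptions.

```python
from collections import Counter, defaultdict
from datetime import date

# Assumption: meteorological seasons; the source does not define its seasons.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def totals_by_state_and_season(checkins):
    """Count check-ins per (state, season) from (state, place, date) tuples."""
    totals = Counter()
    for state, _place, day in checkins:
        totals[(state, SEASON_BY_MONTH[day.month])] += 1
    return totals


def top_place_per_state(checkins):
    """Return {state: (place, count)} for the most checked-in place per state."""
    per_state = defaultdict(Counter)
    for state, place, _day in checkins:
        per_state[state][place] += 1
    return {state: counts.most_common(1)[0] for state, counts in per_state.items()}
```

The same counting pattern extends to month or year totals by swapping the season lookup for `day.month` or `day.year`.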

## Conclusion

From the database, you can find the popular places or states to visit in each season. It also shows that there are most likely two peak check-in periods each year: March to April and August to September. The US Index graph suggests that the US Index does not seem to affect the number of check-ins. I think this is because, even when the US Index is high, people still travel in the US anyway if it is the high season. The temperature graph shows that the number of check-ins follows the same trend as temperature. I think this may be because international tourists would like to experience cold weather or see snow, which their own countries do not have, so a new experience is what people want. Finally, the crime graph shows that the crime rate has dropped continuously since 2016 while the number of check-ins has risen. Therefore, the safety of a place may also affect the number of tourists.
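The conclusions above are read from graphs. One way to put a number on such a visual trend is the Pearson correlation coefficient between two monthly series, for example check-ins and average temperature. The function below is a generic sketch of that calculation and is not part of the original analysis. A value near 1 or -1 only indicates that two series move together, not that one causes the other.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)
```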
 

Files

1.IG_'ustrip'_Scrape_Data.ipynb
