DATS 6401: Visualization of Complex Data

Final Project

Data Breaches Between 2009 and 2018

Tejaswini Vuppu Aug 17, 2019



Data breaches have become a common problem, and not in any one country: they happen all over the world. I used visualizations to understand the problem by looking at the type of breach, the companies impacted, and which kinds of organization are attacked most. I focused on the largest breaches that happened from 2009 to 2018, based on the attributes given in the dataset. A breach affects not only the company whose data is exposed but also its customers and vendors. The data for this topic comes from the Privacy Rights Clearinghouse. The dataset includes fields such as Year, Company, State, Total Records, Latitude, Longitude, and Description of Incident.



Chart represents the total number of breaches by year



In this histogram, I plotted the year of the breach on the x-axis and the number of breaches on the y-axis, and titled it the distribution of breaches by year. 2017 is the year in which hackers were most active: it has the highest breach count. Until 2013 there were relatively few data breaches, and the count started increasing from 2014. The count rises and falls from one year to the next, but the overall level is higher from 2014 onward.
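The counting behind such a histogram can be sketched in a few lines. This is only a minimal illustration: the rows below are hypothetical stand-ins and not values from the Privacy Rights Clearinghouse dataset, and the function name `breaches_per_year` is an assumption made for this example.

```python
from collections import Counter

# Hypothetical rows of (year, breach type); illustrative only,
# not taken from the real dataset.
rows = [
    (2013, "HACK"),
    (2014, "HACK"),
    (2014, "DISC"),
    (2017, "HACK"),
    (2017, "PHYS"),
    (2017, "HACK"),
]

def breaches_per_year(records):
    """Return a dict mapping year -> number of breaches, ordered by year."""
    counts = Counter(year for year, _ in records)
    return dict(sorted(counts.items()))

per_year = breaches_per_year(rows)

# The year with the tallest bar in the histogram.
peak_year = max(per_year, key=per_year.get)

# A crude text histogram: one '#' per breach.
for year, n in per_year.items():
    print(year, "#" * n)
```

The same counting pattern applies to any categorical column, for example counting rows per breach type instead of per year.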



Chart represents the type of breach and the number of occurrences



In this bar plot, I plotted the breach type on the x-axis and its count on the y-axis, and titled it the count of breach types. HACK is the most common breach type, the one hackers use most. DISC and PHYS take second and third place. HACK has a count of 250, while all the remaining types have counts below 100. If companies targeted by this type of breach take care, the problem may be eradicated. Other companies should also learn about this type of breach and about ways to avoid being attacked by it.



Chart represents the type of breach by state and the total number of records exposed



For these geo maps, I combined the State column with the total number of records. The maps show which state has the highest number of exposed records, that is, which state is most heavily affected by data breaches, and therefore which states most need to protect their data from hackers. They also show how many times each state was affected.

This graph cannot show proper metrics because the total number of records exceeds the range of the data type.
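The sketch below shows one way such a limit can arise and one way to work around it. The source does not say which data type was exceeded, so the signed 32-bit limit used here is an assumption. The rows are hypothetical and the name `records_per_state` is made up for this example.

```python
from collections import defaultdict

# Assumption: the limit hit was that of a signed 32-bit integer.
INT32_MAX = 2**31 - 1

# Hypothetical rows of (state, total records); illustrative only,
# not taken from the real dataset.
rows = [
    ("California", 1_500_000_000),
    ("California", 1_000_000_000),
    ("Texas", 40_000_000),
]

def records_per_state(records):
    """Sum total records per state. Python integers do not overflow."""
    totals = defaultdict(int)
    for state, n in records:
        totals[state] += n
    return dict(totals)

totals = records_per_state(rows)

# States whose totals would not fit in a signed 32-bit integer.
too_big = {state: t for state, t in totals.items() if t > INT32_MAX}

# Rescaling to millions of records keeps every value small enough to plot.
in_millions = {state: t / 1_000_000 for state, t in totals.items()}

print(too_big)
print(in_millions)
```

Rescaling the measure (to millions here) or storing it in a wider numeric type are both common ways to keep a map's color scale meaningful when totals are very large.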



Chart represents the type of breach by state and the total number of occurrences





Chart represents the type of breach by year





Chart represents the records count by longitude and latitude







In this project, I examined different types of breaches in order to inform people who are worried about them. The charts point to areas that need improvement and let vendors see which type of breach is the most significant and happens most frequently. The 'Description of Incident' column could be used to study how past attacks happened and so help prevent future ones. I conclude that two factors that protect a company from data breaches are limiting access to secured databases, which are the Information Source in this dataset, and using long passwords for these types of databases.