Protected health information breaches on GitHub
Description
Medical scientists are encouraged to use GitHub for software development, but without training they may leak protected health information (PHI) by inadvertently committing data to what should be software-only repositories. In the fall of 2016, as part of an ongoing interest in patient privacy, we attempted to identify obvious PHI breaches on GitHub. Searching GitHub for the keywords "patient", "dob", and "ssn" uncovered hundreds of repositories, which we then scanned for sensitive information (names, organizations, phone numbers, street addresses, credit card numbers, IP addresses, SSNs, and email addresses) using Python's common regex module and the Stanford Natural Language Toolkit. Manual investigation of the results uncovered three repositories that exposed patient information.

A popular healthcare provider exposed approximately 4,000 patient names. On Dec 1, we identified the provider from both the repository name and the doctor names in the repository files. We contacted the organization, and the repositories were taken down the same day. A health collection agency’s repository exposed the social security numbers, dates of birth, home addresses, email addresses, and insurance and billing information of roughly 30,000 patients. After we contacted the repository owner, the data and the repository were removed from GitHub, some six months after the data had first been exposed. A crisis center’s long-term PHI breach was discovered in August 2016; judging from the repository dates, it had been public for at least three years. The repository held the medical records application developed for the crisis center. We contacted the organization, and the repository was taken down within a few days. A contractor for a health insurance wellness program leaked patient data that included names, social security numbers, addresses, and health measures such as blood pressure. Our hospital compliance office contacted this organization, and the GitHub repository was removed.

Our talk will cover the discovery of these PHI breaches and how we handled them with GitHub, our compliance office, and the organizations involved.
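The pattern-scanning step described above can be illustrated with a minimal sketch. It is not the pipeline used in this work: it relies only on Python's standard `re` module, the patterns and the `scan_text` helper are hypothetical names chosen for illustration, and it covers just four of the listed categories (SSNs, email addresses, phone numbers, and IPv4 addresses). Names, organizations, and street addresses would need a named-entity recognizer, which is omitted here.

```python
import re

# Illustrative patterns only (assumptions, not the authors' rules).
# They favor readability over coverage and will miss many real formats.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "phone": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def scan_text(text):
    """Return a dict mapping each pattern name to the matches found in text.

    Pattern names with no matches are left out of the result.
    """
    hits = {}
    for name, pattern in PATTERNS.items():
        found = pattern.findall(text)
        if found:
            hits[name] = found
    return hits


# Entirely synthetic sample line; no real patient data.
sample = (
    "Patient John Doe, ssn 123-45-6789, phone 555-867-5309, "
    "email jdoe@example.org, host 10.0.0.12"
)

for category, matches in scan_text(sample).items():
    print(category, matches)
```

Matches from a scan like this are only candidates: a nine-digit string in SSN format may be a test fixture or an unrelated identifier, which is why the results above were still investigated manually.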