Data Breach Chronology

Data Breach Chronology Archive (2005-2018) Available Here

Frequently Asked Questions

Q. Can I download the Data Breach Chronology Database?  

A. Yes! You may purchase the Data Breach Chronology Database here.

 

We are committed to maintaining and improving this project over the long term. By purchasing this data, you are funding continued development of this resource. We currently offer temporary download passes (subject to our Terms of Service) for our database on a sliding scale.

 

If you are working on unfunded research (including class projects) or for a nonprofit or media outlet on a limited budget, you can request a fee waiver. To request complimentary access for a limited time, please contact us at databreachchronology@privacyrights.org and briefly describe your proposed use and affiliation. Our team will respond to you as soon as possible; however, we cannot guarantee that we will be able to respond in time for any deadline. 

Q. What is the Data Breach Chronology?

A. The Data Breach Chronology is a tool designed to help advocates, policymakers, journalists and researchers better understand reported data breaches in the United States. We launched the project in 2005 in response to the widely publicized ChoicePoint incident, and it has evolved over time from a list of manually entered breaches to a robust database and visual dashboard.

 

Q. How is this project funded?  

A. This project was funded in large part by The Rose Foundation for Communities and the Environment Consumer Products Fund. We have also received funds for this project from cy pres awards and the Consumer Federation of America.

If you are interested in supporting this project, please reach out to us at support@privacyrights.org.

 

Q.  Is this a complete record of every data breach in the United States? 

A. No. The data consists of publicly available information on reported breaches and should not be considered a complete and accurate representation of every data breach in the United States. It reflects breaches reported in the United States that are made publicly available by government entities.

 

Q. What are the next steps for this project? 

A. In 2023, we are seeking additional funding to:

  • Develop an updated taxonomy of breach types and business types; 

  • Further enrich the database with unique business identifiers where possible;  

  • Better identify duplicate breach entries;  

  • Regularly update the database; and  

  • Develop additional visualizations and tools to help those working to advance data privacy and security for people. 

If you are interested in getting updates on this project, join our email list here

 

Q. How can I get involved with this project? 

A. Thank you for your interest – there is no shortage of work that can be done to continue to improve this project, and there are many ways to join us in that endeavor! 

  • Donate your time and expertise as a data science or Tableau volunteer to help us collect, clean, process, maintain, and present this resource. Contact us at databreachchronology@privacyrights.org with the subject line “VOLUNTEER”.
     
  • Apply for a legal internship to help us stay up to date on changing data security and breach notification laws.  
     
  • Apply to join our Data Breach Chronology advisory committee to help drive future project decisions and new features. Contact us at databreachchronology@privacyrights.org with the subject line “ADVISORY COMMITTEE”.
     
  • Donate to sustain the project.  

Data Breach Chronology Data 

Q. What data makes up the Data Breach Chronology? 

A. We collect data from publicly available, government-maintained data sources.  This includes the U.S. Department of Health and Human Services and various state Attorneys General who publish data breach notices they receive under their states’ data breach notification laws. 

 

Q. How far back does the data go? 

A. Our historical database has been cleaned and normalized going back to 2005. 

 

Q. What happens to the raw data after it is collected? 

A. We begin the challenging task of cleaning and normalizing the raw data so that it can be entered into a single, usable database. Our data through 2021 has been cleaned and normalized thanks to the Coleman Research Lab.

 

Q. How is the data processed to extract relevant information? 

A. We use a combination of human and AI resources to process the raw data, extract relevant information, and apply classifications including type of breach, type of organization, number of records exposed in the breach, and relevant date information.  

 

Q. How is AI involved in processing the data? 

A. We use a combination of manual data entry and OpenAI’s GPT-3.5 and GPT-4 language models to enrich our database with categories and impact information based on the information provided in the breach notification, the available training data and our own taxonomies. These inferences may be incorrect.

If you see something that looks incorrect, please let us know by emailing databreachcorrections@privacyrights.org and include “CORRECTION” in the subject line, followed by the name of the breached business whose entry you believe needs to be corrected. In the body of your email, please include a link to the specific reported breach and the proposed correction so we may review.

 

Q. What are the labels for breach type and business type? 

A.  

Type of Breach:

  • CARD - Fraud Involving Debit and Credit Cards Not Via Hacking (skimming devices at point-of-service terminals, etc.)  

  • HACK - Hacked by an Outside Party or Infected by Malware

  • INSD - Insider (employee, contractor or customer)

  • PHYS - Physical (paper documents that are lost, discarded or stolen)

  • PORT - Portable Device (lost, discarded or stolen laptop, PDA, smartphone, memory stick, CDs, hard drive, data tape, etc.)

  • STAT - Stationary Computer Loss (lost, inappropriately accessed, discarded or stolen computer or server not designed for mobility)

  • DISC - Unintended Disclosure Not Involving Hacking, Intentional Breach or Physical Loss (sensitive information posted publicly, mishandled or sent to the wrong party via publishing online, sending in an email, sending in a mailing or sending via fax)

  • UNKN - Unknown (not enough information about breach to know how exactly the information was exposed)

 

Type of Organization:  

  • BSF - Businesses (Financial Services, Banking, Insurance Services)  

  • BSO - Businesses (Manufacturing, Technology, Communications, Other)  

  • BSR - Businesses (Retail/Merchant including Grocery Stores, Online Retailers, Restaurants)  

  • EDU - Educational Institutions (Schools, Colleges, Universities)  

  • GOV - Government & Military (State & Local Governments, Federal Agencies)  

  • MED - Healthcare and Medical Providers (Hospitals, Medical Insurance Services)  

  • NGO - Nonprofits (Charities and Religious Organizations)  

  • UNKN - Unknown
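
If you are working with a copy of the data, the labels above can be kept in a small lookup table. The Python sketch below is illustrative only. It assumes that records carry the short codes listed above; the names BREACH_TYPES, ORGANIZATION_TYPES and describe are hypothetical and are not part of the database or any Privacy Rights Clearinghouse tool, and the descriptions are shortened from the lists above.

```python
# Hypothetical lookup tables built from the labels listed above.
# Descriptions are abbreviated; the codes themselves come from the FAQ.
BREACH_TYPES = {
    "CARD": "Fraud involving debit and credit cards, not via hacking",
    "HACK": "Hacked by an outside party or infected by malware",
    "INSD": "Insider",
    "PHYS": "Physical",
    "PORT": "Portable device",
    "STAT": "Stationary computer loss",
    "DISC": "Unintended disclosure",
    "UNKN": "Unknown",
}

ORGANIZATION_TYPES = {
    "BSF": "Businesses (financial services, banking, insurance services)",
    "BSO": "Businesses (manufacturing, technology, communications, other)",
    "BSR": "Businesses (retail/merchant)",
    "EDU": "Educational institutions",
    "GOV": "Government and military",
    "MED": "Healthcare and medical providers",
    "NGO": "Nonprofits",
    "UNKN": "Unknown",
}


def describe(breach_code, org_code):
    """Return readable labels for a pair of codes.

    Codes are matched case-insensitively; anything unrecognized
    falls back to the "Unknown" label.
    """
    breach = BREACH_TYPES.get(breach_code.strip().upper(), BREACH_TYPES["UNKN"])
    org = ORGANIZATION_TYPES.get(org_code.strip().upper(), ORGANIZATION_TYPES["UNKN"])
    return breach, org


if __name__ == "__main__":
    print(describe("hack", "MED"))
```

Falling back to "Unknown" for unrecognized codes is a design choice in this sketch; you may prefer to raise an error so that unexpected labels are noticed.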

 

Q. Have you considered revising the labels for breach type and business type?  

A. Yes.  We plan to address this when we secure resources for this project. 

 

Data Breach Chronology Dashboard 

Q. I have an idea for a new visualization. Can I provide input for this project?

A. Yes! Please let us know at databreachchronology@privacyrights.org and include “SUGGESTION” in the subject line.  

 

Q. I believe a breach has incorrect information associated with it. How can I alert you?

A. Please email us at databreachcorrections@privacyrights.org and include “CORRECTION” in the subject line, followed by the name of the breached business whose entry you believe needs to be corrected. In the body of your email, please include the specific reported breach and the proposed correction so we may review.

 

Limitations and Disclaimers 

The Data Breach Chronology is based on publicly available information and should not be considered a complete and accurate representation of every data breach in the United States.  Rather, it reflects the data breaches that have been reported and made publicly available in the United States.

   

Pay careful attention to duplicate reporting when using this data or making assertions based on it. This version of the Data Breach Chronology does not identify when a single breach has been reported to multiple state Attorneys General. As of February 2023, the database contains duplicated breaches, and they are reflected in the dashboard.
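
One rough way to flag possible duplicates before counting breaches is to group records by a normalized organization name and breach date. The Python sketch below is only an illustration under stated assumptions: the field names org_name, breach_date and source are hypothetical and may not match the actual database columns, and this is not a description of how the Chronology itself is processed. Exact matching on a normalized name will miss some duplicates and may group unrelated entries, so flagged groups should be reviewed by hand.

```python
from collections import defaultdict


def normalize_name(name):
    """Lowercase a name, replace punctuation with spaces, collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(cleaned.split())


def group_possible_duplicates(records):
    """Group records that share a normalized name and breach date.

    `records` is a list of dicts with the hypothetical keys
    "org_name" and "breach_date". Only groups with more than one
    record are returned.
    """
    groups = defaultdict(list)
    for record in records:
        key = (normalize_name(record["org_name"]), record["breach_date"])
        groups[key].append(record)
    return {key: recs for key, recs in groups.items() if len(recs) > 1}


if __name__ == "__main__":
    # Made-up sample rows for illustration; not taken from the database.
    sample = [
        {"org_name": "Acme, Inc.", "breach_date": "2022-03-01", "source": "State A"},
        {"org_name": "ACME Inc", "breach_date": "2022-03-01", "source": "State B"},
        {"org_name": "Example Health", "breach_date": "2022-05-10", "source": "HHS"},
    ]
    for key, recs in group_possible_duplicates(sample).items():
        print(key, [r["source"] for r in recs])
```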

 

Additionally, although we have, where possible, scraped the contents of breach notification letters and included a link to the original PDF, we do not host these letters locally, and links may no longer be active.

   

Privacy Rights Clearinghouse makes no representations as to the accuracy of the information included in the Data Breach Chronology.