I have been slowly reading through the Verizon Data Breach Report (which is awesome!) and one thing kept niggling at me. As I read through it, this popped into my mind: Are the numbers maybe skewed by just one or a couple huge cases?
My Hypothesis: Only one or a couple of breach cases are responsible for a huge majority of the records breached.
So I started taking notes and went back into the report a bit. Satisfied with my findings, I read on not one more page before the authors outright stated on page 32: “The top five breaches account for 93 percent of total records compromised.” Way to deflate my balloon!
Nonetheless, it does diminish the value of the graphs dealing with number or percent of records, which I think the authors acknowledged by keying more on the breaches and less on the records disclosed. So that’s good!
Following are the notes I had taken to investigate my hypothesis. They’re here mostly just to hear myself talk, and don’t necessarily have much actual use otherwise. But feel free to read if you want.
90 breaches in the study (pg 6)
285,000,000 records involved (pg 6)
financial services account for 30% of the breaches (~30) (pg 6)
financial services account for 93% of the records (265,000,000) (pg 7)
external sources account for about 93% of the records (266,788,000) (pg 11)
median of external records per breach is 37,847 (pg 11)
I’m going to guess that all of the meaningful financial services breaches occurred with external sources, considering the numbers above. This means that out of 30 breaches with a total record disclosure of 265,000,000, the average breach should be about 8.8 million records. If this were a roughly symmetric distribution, such as a normal one, the average and median should be similar, but they’re not even close: 8.8 million against a median of 37,847. To me, this indicates just a couple of very large breaches, while many of the others were quite small.
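For anyone who wants to check my arithmetic, here is a rough sketch of the comparison. The three figures are the ones from my notes above; the variable names are my own, and pairing the financial services average with the external-source median rests on my guess that those two groups mostly overlap, so treat the result as a ballpark and not something the report itself states.

```python
# Figures taken from the notes above (report pages 6, 7 and 11).
financial_breaches = 30              # roughly 30% of the 90 breaches
financial_records = 265_000_000      # records attributed to financial services
external_median = 37_847             # median records per external-source breach

# Mean records per financial services breach.
mean_records = financial_records / financial_breaches

# How many times larger the mean is than the median.
# ASSUMPTION: the financial services breaches are mostly external-source
# breaches, so the two figures describe roughly the same population.
ratio = mean_records / external_median

print(f"mean records per breach: {mean_records:,.0f}")
print(f"mean is about {ratio:,.0f} times the median")
```

A mean that sits a couple of hundred times above the median is what a heavily right-skewed set of numbers looks like: a handful of enormous breaches pulling the average up while the typical breach stays comparatively small.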
95% of records were breached by an attack of high difficulty (pg 28)
There are some other numbers which indicate that there was really not just one single large incident, but at least a few. If there was just one large incident, these numbers would also be nearly 90%, but they’re not:
Financial services almost certainly were targeted by just the larger % types of hacking from the graph on page 17: SQL injection, improperly constrained or misconfigured ACLs, and unauthorized access via default or shared credentials. The attack was through a web application (79%) and remote access & mgmt (27%) and/or End-User Systems (26%). (pg 19) This could certainly indicate at least 2 major incidents that account for this huge number of records breached in 2008. In fact, I wouldn’t be surprised if one large incident was due to a web app, and a second was a combination of remote access and end-user systems, with those two attacks being the huge majority of the records.
I’m actually surprised that no graph was presented which shows that a huge percentage of the records fell to targeted breaches, as that is what I suspect, at least with highly difficult breaches, anyway.
Certainly these couple of financial services breaches involved online data, as 99.9% of all records were online data (pg 30), i.e. payment card records, which were 98% of all records (pg 32).
Hell, page 32 confirms my suspicions: the top 5 breaches contribute 93% of all records. Doh!