My Road to Digital Forensics Excellence

The Art of Incident Response

Posted by Paul Bobby on November 4, 2008

If there’s one thing that corporate investigators have over the independent consultant or law enforcement, it is Incident Response experience. The compromise of one or many machines can be enough to initiate an Incident Response that may involve a few to many individuals and large numbers of machines, and may even lead to governance change or other Enterprise-wide mitigations.

Much has been written on the topic, and you can spend thousands of dollars taking course after course, so I won’t presume to write as if I am the expert. However, I have been involved in numerous large-scale Incident Response scenarios, and there are one or two aspects of the process that continue to ‘overcook my grits’.

Over-Zealous Intel and Early Indicators

So how does this all start anyway? After all, some sort of early intelligence has to occur before we declare an incident. That’s not where my beef lies. It is unreasonable to assume that the intel stops once an incident is declared; it is never that clear-cut, nor should it be. You may learn different indicators, analyze malware more thoroughly, and employ mitigations that put a stop to the propagation of the malware or the exfiltration of data.

While this goes on, you’ve already started the hard drive pulls, the servlet installs, and the triage of computers. All of this requires time and effort, and you run the risk of wasting significant dollars and man-hours in pursuit of something that does not exist. Where does this wasted effort come from?

I’ve found that in the rush to get a ‘handle’ on the situation, a lot of the early information is left in a partly analyzed form. Entire series of computers may be flagged as needing remote triage based on this partial analysis. Furthermore, the group of individuals performing the forensic triage is different from the group performing the intelligence gathering, and communication between these groups is not as good as it should be. There’s no animosity between the groups, nor are geographical differences to blame; the problem I find is complacency.

The ‘meh’ downtime between significant incidents is filled with boring and routine ethics investigations, smaller incidents, and other run-of-the-mill casework. Each one requires the same level of effort regardless of the individual involved; that, coupled with work experience, friendships, and all that ‘human’ stuff, leaves people not questioning the data in front of them when it counts.

Here’s an example from a recent incident I was involved in. If I tell you that a machine tried to connect to another machine on, say, port 80, but that machine wasn’t listening on port 80, would you ‘release the hounds’? It looks like a port sweep to me, and the target machine isn’t running a web server. It doesn’t sound like it needs a servlet, a triage, and five people to deal with it, does it? And yet it’s these very simple indicators that can get elevated to a truly cosmic level when associated with a larger incident.
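The sanity check I’m describing can be sketched in a few lines: before escalating a connection-attempt indicator, compare it against what the target is actually known to be listening on. This is only an illustrative sketch, not any tool we used; the event schema and helper name here are hypothetical.

```python
# Hedged sketch: triage a connection-attempt indicator before escalating.
# The event fields and the listening-port data are hypothetical examples.

def should_escalate(event, listening_ports):
    """Return True only if the target was actually listening on the port.

    event           -- dict with 'dst_host' and 'dst_port' (hypothetical schema)
    listening_ports -- dict mapping host -> set of ports known to be open
    """
    open_ports = listening_ports.get(event["dst_host"], set())
    # An attempt against a closed port looks like a port sweep, not a
    # compromise of the target -- don't release the hounds just yet.
    return event["dst_port"] in open_ports

# Example: a machine tried port 80, but the target runs no web server.
ports = {"10.0.0.5": {22, 443}}
event = {"dst_host": "10.0.0.5", "dst_port": 80}
print(should_escalate(event, ports))  # False -- note it, but no five-person triage
```

The point isn’t the code; it’s that a ten-second cross-check like this, done before the indicator gets rolled into the incident tracker, is exactly the partial-analysis gap I keep seeing.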

Another unfortunate trait of human nature rears its ugly head during these incidents. Regardless of maturity levels during periods of calm, when things get toasty I sometimes get the distinct impression that individuals are scrambling and fighting each other as they climb a mountain, so they can be the first to get to the top and yell ‘I’ve found it!’. Yes, the egos really do come into play, and ego is the biggest cause of shallow analysis and hastily drawn conclusions.

The Cluster that is Status Reporting

The main challenge to the incident handler is data management. What is everyone working on? Are they updating their results in a central location? Are they all investigating using the same approach? What about communications? What about status reporting?

The problem I see is a focus on the ways and means of solving this problem instead of actually getting it done. Roll-up spreadsheets, SharePoint, Microsoft OneNote: they are all great tools, but they are only as effective as a piece of paper unless the human beings involved actually take the time to update the data.

When is it over?

Intel still coming in? Machines still being triaged? Deep dives still underway? That’s not the problem. The problem is when all the sexiness has left the incident and actions just linger and linger, never getting completed: no lessons learned, no root-cause analysis, no damage assessment.

Yeah that’s what frosts my cookies, the incident never truly ends.

What, if anything, frustrates you about the response process?


One Response to “The Art of Incident Response”

  1. Brian said

    Paul, what bothers me the most about Incident Response, after the circus has left with my list of recommendations, is the condescending attitude from the other groups whose responsibility it was to make sure this never happened in the first place. They go ballistic to get you involved to locate the issue, but then realize, “Oh crap, my shortcomings created this mess,” and they try to backpedal and cover up their mess. Will they be forthcoming to show they have fixed the problems after the fact? You will never hear from them again. Until next time, that is.
