WRITTEN: 2004
Example
Introduction
SANS Six Step Incident Response Methodology
Incident Response Tools
Example Corporation – Worm Incident Revisited
Common Mistakes of Incident Response
Conclusion
Glossary
DISCLAIMER:
Example
Meanwhile, the worm, which caused the above laptop problems, continues to
spread throughout Example’s network. The malicious software made its way into
Example after being brought in by one of the sales people who often plugs his
laptop into untrusted networks, such as hotels and customer environments,
outside the company. With most of Example’s security monitoring capabilities
deployed in a DMZ and on the network perimeter, the remainder of Example’s
vulnerable corporate assets are largely unguarded and unwatched. Thus, as the
worm wends its way around Example’s enterprise, the company security team is
not even aware of a developing disaster.
Soon, network traffic generated by the worm has increased dramatically, as more
machines become infected and start spewing copies of the same worm. When
the infection reaches critical levels and starts to affect the performance of
monitored servers, the security team is notified by a flood of pager alerts… chaos
ensues. While some try installing anti-virus updates, others apply firewall blocks
(preventing not only worm scanning, but also the download of updates), and yet
others scan for vulnerable machines, which only adds to the network-level
denial of service.
The financial and technological damage is easy to see. And yet, the recurring
security incident described above shows what happens when companies lack a
central point from which to manage security incidents.
Introduction
In light of this, being prepared for incident response is likely to be one of the most
cost-effective security measures an organization can take. Timely and effective
incident response directly reduces the incident-induced loss to the
organization. It can also help to prevent expensive and hard-to-repair
reputation damage, which often follows a security incident. Several
industry surveys have found that a public company’s stock price may plunge
several percent as a result of a publicly disclosed incident
(http://www.securityfocus.com/news/11197). Incidents known to have
catastrophic results for organizations include malicious hacking, virus
outbreaks, economic espionage, intellectual property theft, network access
abuse, theft of IT resources and other policy violations.
Most of us in the security industry are already familiar with the traditional
challenges we face every day: too much security data to sift through, too many
false alarms to deal with, and not enough budget or resources to handle an
ever-growing number of security incidents. One additional and often overlooked
challenge involves the security management process itself. Although largely ignored in
many of today’s IT enterprises, a clearly defined, documented, and repeatable
incident management process, laid out in an incident response plan, is
fundamental to ensuring fast and accurate handling of security incidents.
Even if an explicit incident response plan is lacking, company management is
likely to ask questions such as these after an incident occurs:
• What do we do now?
• How do we put it back the way it was?
• How do we prevent a recurrence?
• How should we have prepared?
• Should we try to figure out who is responsible?
To build an initial incident resolution management framework, one can use the SANS
Six Step incident response methodology. This approach was originally developed
for the US Department of Energy, adopted elsewhere in the US government and
then popularized by the SANS Institute
(http://www.sans.org/rr/whitepapers/incident/). Its six steps are:
1. Preparation
2. Identification
3. Containment
4. Eradication
5. Recovery
6. Follow-Up
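As a rough illustration, the six steps can be modeled as an ordered sequence that an incident case moves through. The sketch below is hypothetical: the names Stage, IncidentCase, add_note and advance are assumptions made for this example and do not come from the SANS material.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Stage(IntEnum):
    """The six SANS steps, in the order listed above."""
    PREPARATION = 1
    IDENTIFICATION = 2
    CONTAINMENT = 3
    ERADICATION = 4
    RECOVERY = 5
    FOLLOW_UP = 6


@dataclass
class IncidentCase:
    """Hypothetical case record: a title, a current stage, and notes per stage."""
    title: str
    # Preparation happens before any case exists, so a new case starts here.
    stage: Stage = Stage.IDENTIFICATION
    notes: dict = field(default_factory=dict)

    def add_note(self, text: str) -> None:
        # File the note under whichever stage the case is currently in.
        self.notes.setdefault(self.stage, []).append(text)

    def advance(self) -> Stage:
        # Move to the next step; a case already in Follow-Up stays there.
        if self.stage < Stage.FOLLOW_UP:
            self.stage = Stage(self.stage + 1)
        return self.stage


case = IncidentCase("Worm outbreak on the sales subnet")
case.add_note("Helpdesk reports slow network and failing laptops.")
case.advance()
case.add_note("Infected segment isolated at the switch.")
print(case.stage.name)  # CONTAINMENT
```

Keeping the notes keyed by stage means the record of what was done, and when in the process it was done, is available for the follow-up review.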
SANS Six Step Incident Response Methodology
Let’s spend just a moment reviewing a few key features of the SANS Six Step
Incident Response methodology:
The Preparation stage covers everything one should do before handling the first
incident. It involves both technology issues, such as preparing response and
forensics tools, learning the environment, configuring systems for optimal
response and monitoring, as well as business issues -- such as assigning
responsibility, forming a team and establishing escalation procedures.
Additionally, this stage covers the steps necessary to strengthen a company’s
security posture and thus decrease the likelihood of, and damage from, future
incidents. Security audits, patch management, employee security awareness
programs and other security tasks all serve to prepare the organization for incident
action. Building a culture of security and a secure computing environment also
serves as incident preparation.
defined above) is of crucial importance. Careful record keeping is very important,
since such documentation will be heavily used at later stages of the response
process. One should record everything that was observed in relation to the
incident, whether online or in the physical environment. During this stage, it is
important that people responsible for incident handling maintain the proper chain
of custody (explained here http://en.wikipedia.org/wiki/Chain_of_custody as
“document or paper trail showing the seizure, custody, control, transfer, analysis,
and disposition of physical and electronic evidence.”). Contrary to popular
opinion, this is important even when the case is never destined to end up in
court. Following established and approved procedures will help the investigation
that is internal to the company.
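One way to picture such a paper trail is an append-only log in which every handling step records who touched the evidence, what they did, and a digest of the evidence at that moment. This is only a minimal sketch under stated assumptions: the CustodyEntry and EvidenceItem names, their fields, and the sample handlers are hypothetical, and a real procedure would follow whatever the organization's approved policy requires.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class CustodyEntry:
    """One step in the paper trail: who did what to the evidence, and when."""
    timestamp: str
    handler: str
    action: str   # e.g. "seizure", "transfer", "analysis", "disposition"
    sha256: str   # digest of the evidence at the time of the action


@dataclass
class EvidenceItem:
    """Hypothetical evidence record holding the data and its custody log."""
    description: str
    content: bytes
    log: list = field(default_factory=list)

    def record(self, handler: str, action: str) -> CustodyEntry:
        # Append a new entry; existing entries are never modified.
        entry = CustodyEntry(
            timestamp=datetime.now(timezone.utc).isoformat(),
            handler=handler,
            action=action,
            sha256=hashlib.sha256(self.content).hexdigest(),
        )
        self.log.append(entry)
        return entry

    def is_unchanged(self) -> bool:
        # The evidence is treated as intact if every logged digest is identical.
        return len({entry.sha256 for entry in self.log}) <= 1


item = EvidenceItem("Web server access log", b"GET /index.html HTTP/1.0")
item.record("first responder", "seizure")
item.record("forensic analyst", "analysis")
print(item.is_unchanged())  # True
```

If the content were altered between two entries, the digests would differ and is_unchanged() would return False, which is the kind of discrepancy an internal investigation needs to be able to rule out.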
Containment is what keeps the incident from spreading and thus incurring
higher financial or other loss. During this stage, the incident responders will
intervene and attempt to limit the damage, such as by tightening network or host
access controls, changing system passwords, disabling accounts, etc. While
completing the above steps, one should make every effort to keep all the
potential evidence intact, balancing the needs of system owners and incident
investigators. The backup of affected systems is also essential at this step. This
is done to preserve the system for further investigation as well as remediation.
The important decision on whether to continue operating the affected assets
should be made by the appropriate authorities during this stage.
Eradication is the only stage when the factors leading to the incident are
eliminated or mitigated. Such factors often include system vulnerabilities, unsafe
system configurations, out-of-date protection software or even imperfect physical
access control. Also, the non-technology controls such as building access
policies or key card privileges might be adjusted at this stage. In the case of a
hacker-related incident, the affected systems are likely to be restored from the
last clean backup or rebuilt from the operating system vendor media with all
applications reinstalled.
Time is most critical during the eradication stage. The first response should
satisfy several often conflicting criteria, such as accommodating the system
owners’ requests, preserving evidence and stopping the spread of damage, while
complying with all the appropriate organizational policies.
Follow-up steps often need to be distributed to a wider audience than the rest of
the investigation process. An enterprise-wide security knowledge base helps to
address this challenge. It ensures that IT resource owners will be better
prepared to combat future threats. To optimize the distribution of incident
information, one can use various forms and templates, prepared in advance for
different types of incidents. Properly sanitized past incident cases should also be
added to the organization-wide security knowledge base, in addition to
industry security resources and vulnerability knowledge. Such materials can later
be used for training new incident responders as well as a broader IT audience. A
summary of suggested actions might also be sent to senior management.
Incident Response Tools
While people and processes are important, tools are what complete the security
triangle. When an incident is suspected, the response team will need tools to
verify its status, assess the damage already incurred as well as the damage that may still occur,
and then proceed to contain and recover from the incident. This involves a wide
range of tools, from intrusion detection to forensics and vulnerability
management. Backup tools should also not be overlooked. Tools helpful for
incident management can be organized as such:
Some tools are helpful in more than one of the above categories. For example, a
Security Information Management (SIM) solution often holds most of the
evidence from the scene of the information security incident. Incident handling is
a natural SIM product functionality aimed at gathering and organizing security
event data around incidents and also enforcing proper response workflow in
order to facilitate effective and prompt response to security incidents.
Specifically, a SIM:
• Facilitates the effective handling process
• Integrates evidence storage and analysis
• Enforces proper access control to evidence
• Enables team collaboration
• Simplifies resolution monitoring and reporting
• Makes security measurable
Other tools that an incident team needs to be very familiar with include disk
image forensics tools, covering the whole lifecycle from making a forensics copy
of the suspect’s workstation to final evidence presentation to an internal authority
or law enforcement. Those tools do require significant training, especially if used
for cases where a court trial is likely.
Example Corporation – Worm Incident Revisited
A network helpdesk operator receives calls from several users – all reporting
computer failures and slow network response. Using a newly established
process, a trained team and the right tools, an incident case is opened according to
the plan and user complaints from that department are summarized and
presented to all relevant parties, including the security team contact. The affected
machines together with the information on their owners are also added to
corresponding case fields. The operator then assigns the case to the security
event monitoring team, as mandated by his instructions, derived from the incident
plan.
By logging into the SIM and running a report, the analyst found that the
triggered rule aims to detect high-severity attacks against the web server that
are preceded by reconnaissance activity, such as a server version query. The
web server was first probed for its type and version and later attacked with a
known exploit, which was detected by the network intrusion detection system. The company
security monitoring procedure mandated that such events be investigated.
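The logic of such a rule can be sketched as follows: an exploit event raises an alert only if the same source probed the same target shortly beforehand. This is an illustrative sketch, not the rule format of any particular SIM product; the Event fields, the correlate function, the one-hour window and the sample addresses (taken from documentation-only ranges) are all assumptions.

```python
from typing import NamedTuple


class Event(NamedTuple):
    time: int   # seconds since an arbitrary reference point
    src: str    # source IP address
    dst: str    # target IP address
    kind: str   # "probe" or "exploit"


def correlate(events, window=3600):
    """Return exploit events preceded by a probe from the same source
    against the same target within `window` seconds."""
    last_probe = {}   # (src, dst) -> time of the most recent probe
    alerts = []
    for ev in sorted(events, key=lambda e: e.time):
        key = (ev.src, ev.dst)
        if ev.kind == "probe":
            last_probe[key] = ev.time
        elif ev.kind == "exploit":
            probed_at = last_probe.get(key)
            if probed_at is not None and ev.time - probed_at <= window:
                alerts.append(ev)
    return alerts


events = [
    Event(100, "203.0.113.7", "192.0.2.10", "probe"),
    Event(400, "203.0.113.7", "192.0.2.10", "exploit"),
    Event(500, "198.51.100.4", "192.0.2.10", "exploit"),  # no prior probe
]
alerts = correlate(events)
print(len(alerts))  # 1
```

Requiring the earlier probe is what separates a targeted attack from background exploit noise, at the cost of missing attackers who skip reconnaissance.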
Thus, the analyst clicked on the correlated event in the corresponding report and
chose to add it to a new incident case. He then added a note saying that he
received an email notification and started the investigation in accordance with the
security procedure.
After the case was registered by the system, the analyst proceeded to investigate
the related events. He opened the report to view the raw security events that
triggered the correlation. Such events included probes against multiple servers
followed by an attack. He looked at the attack details and found out that the IDS
signature for the exploit matched the server type and the operating system. He
added all the related events to the incident case as well.
Further, he ran a query to look for more traces of the same attacker’s IP
address (the source) in the event database. Multiple entries indicative of
scanning, denied connections on the firewall and TCP port 80 attempts across
the enterprise were discovered. The report results were also added to the
incident case.
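A query of this kind might look like the sketch below, which counts events per type for a single source address. It uses an in-memory SQLite database purely for illustration; the events table, its columns, the traces_of function and the sample rows are hypothetical and do not reflect the schema of any real SIM.

```python
import sqlite3

# Hypothetical event store with a handful of illustrative rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (src TEXT, dst TEXT, dport INTEGER, kind TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        ("203.0.113.7", "192.0.2.10", 80, "ids_exploit"),
        ("203.0.113.7", "192.0.2.11", 80, "firewall_deny"),
        ("203.0.113.7", "192.0.2.12", 80, "firewall_deny"),
        ("203.0.113.7", "192.0.2.13", 22, "port_scan"),
        ("198.51.100.4", "192.0.2.10", 443, "web_access"),
    ],
)


def traces_of(conn, attacker_ip):
    """Count events per type where the given address is the source."""
    rows = conn.execute(
        "SELECT kind, COUNT(*) FROM events WHERE src = ? "
        "GROUP BY kind ORDER BY kind",
        (attacker_ip,),  # parameterized to avoid SQL injection
    )
    return dict(rows.fetchall())


print(traces_of(conn, "203.0.113.7"))
# {'firewall_deny': 2, 'ids_exploit': 1, 'port_scan': 1}
```

A summary like this gives a quick view of how widely the source has been active before the analyst drills into individual events.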
At that stage it was obvious that a consistent attack was in progress. A note
was added to the case Identification section saying that the incident was confirmed
and several servers might have been impacted.
The analyst then searched all events involving the attacked web server. No
suspicious activity had originated from it. However, since the server was not a
business critical asset, it was possible to take it offline for investigation. This
decision was recorded in the Containment section of the incident case and the
server was taken offline.
The detailed server investigation that followed did not reveal any signs of a
successful compromise. However, the server logs contained evidence of
multiple failed exploit attempts. The server was also found to be missing several critical
patches. Their absence was apparently not detected by the attacker. It was decided
to patch the server before the regular maintenance window and to return it
online. It was also decided to increase the logging level on the server. The
respective note was made in the Mitigation section of the incident case and the
above steps were performed.
After the server was returned to operation, the analyst assigned the case to
the incident manager, who had the authority to review the performed steps and to
close the case. The manager added several notes to the follow-up section, which
suggested that servers in that subnet be scanned for vulnerabilities more often.
The case was then closed.
Common Mistakes of Incident Response
While many organizations are on the path towards organizing their incident
response, many pitfalls lie in wait for them on the way to incident management
nirvana. This section summarizes several mistakes that companies make in their
security incident response.
The first mistake is simply not creating an incident response plan before incidents
start happening. Having a plan in place (even a plan that is not well thought out)
makes a world of difference! Such a plan should cover all the stages of the incident
response process, from preparing the infrastructure to first response all the way to
learning the lessons of a successfully resolved incident.
If you have a plan, then after the initial panic phase, ('Oh, my, we are being
hacked!!!') you can quickly move into a set of planned activities, including a
chance to contain the damage and curb the incident losses. Having a checklist to
follow and a roster of people to call is of paramount importance in a stressful
post-incident environment.
To jump-start the planning activity, one can use a ready-made methodology, such
as the SANS Institute six-step incident response process covered above. With a plan
and a methodology, your team will soon be battle-hardened and ready to respond
to the next virus faster and more efficiently. As a result, you might manage to
contain the damage to your organization.
The second mistake is not deploying increased monitoring and surveillance after
an incident has occurred. This is akin to shooting yourself in the foot during the
incident response. Even though some companies cannot afford 24/7 security
monitoring, there is no excuse for not increasing monitoring after an incident has
occurred.
At the very least, one of the first things to do after an incident is to crank up all
the logging, auditing and monitoring capabilities in the affected network and
systems. This simple act has the potential to make or break the investigation by
providing crucial evidence for identifying the cause of the incident and resolving
it. It often happens that later in the response process, the investigators discover
that some critical piece of log file was rotated away or an existing monitoring
feature was forgotten in an 'off' state. Having plenty of data on what was going
on in your IT environment right after the incident will not just make the
investigation easier, it will likely make it successful.
Another side benefit is that increased logging and monitoring will allow the
investigators to confirm that they have indeed followed the established chain of
custody.
The third mistake is often talked about, but rarely avoided. Some experts have
proclaimed that every security incident needs to be investigated as if it will end
up in court. In other words, maintaining forensic quality and following the
established chain of custody needs to be assured during the investigation.
Even if the case looks as if it will not go beyond the suspect's manager or the
human resources department (in the case of an internal offense) or even the
security team itself (in many external hacking and virus incidents), there is
always a chance that it will end up in court. Cases have gone to court after new
evidence was discovered during an investigation, and what was thought to be a
simple issue of inappropriate Web access became a criminal child pornography
case.
Moreover, while you might not be expecting a legal challenge, the suspect might
sue in retaliation for a disciplinary action against him or her. A seasoned incident
investigator should always consider this possibility.
The fourth mistake is reducing your incident response to "putting it back the way
it was". This often happens if the company is under a deadline to restore
functionality. While this motive is understandable, there is a distinct possibility
that failing to find out why the incident occurred will lead to repeat incidents, on
the same or different systems.
but it feels much worse to be hit twice by the same threat and have your defenses
fail in both cases.
The final mistake sounds simple, but it is all too common. It is simply not learning
from mistakes! Creating a great plan for incident response and following it will
take the organization a long way toward securing the company, but what is
equally important is refining your plan after each incident, since the team and the
tools might have changed over time.
Conclusion
While the above cases are simplistic in nature, they readily show the need for any
security management system to have not only an incident response plan but also
an integrated incident handling system to ensure complete and effective
response planning deployment. Having a highly efficient plan helps organizations
save money by limiting the impact on core business from security incidents and
increasing the efficiency of existing security infrastructure investments. Overall,
the SANS process allows one to give structure to the otherwise chaotic incident
response workflow. It defines the steps that will then be followed under incident-
induced stress with high precision.
In fact, many of the above steps may be built from the pre-defined procedures.
Following the steps will then be as easy as selecting and sometimes customizing
the procedures for each case at hand. The incident handling workflow will become
more streamlined, and the crucial steps will not be missed and will be documented
properly. Using pre-defined procedures also helps train the incident response
staff on proper actions for each process step. The automated system may be
built to keep track of the response workflow, to suggest proper procedures for
various steps and to securely handle incident evidence. Additionally, such a
system will facilitate collaboration between various response team members,
who can share the workload for increased operational efficiency.
What is even more important, monitoring incident resolution activities allows the
organization to implement effective security metrics. It is one thing to count
the number of alerts or events flowing from various sensors, but to take security
assessment to the next level one needs to measure the performance of the
whole security process, involving both people (such as security team members
working on the incident cases) and technologies.
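As one example of a process-level metric, the average time from opening a case to closing it can be computed directly from case records. The sketch below is an assumption-laden illustration: the record layout, the mean_hours_to_close function and the sample timestamps are made up for the example and are not measurements from any real environment.

```python
from datetime import datetime
from statistics import mean

# Illustrative case records; "closed" is None while a case is still open.
cases = [
    {"opened": datetime(2004, 3, 1, 9, 0), "closed": datetime(2004, 3, 1, 17, 0)},
    {"opened": datetime(2004, 3, 5, 10, 0), "closed": datetime(2004, 3, 6, 10, 0)},
    {"opened": datetime(2004, 3, 9, 8, 0), "closed": None},
]


def mean_hours_to_close(cases):
    """Average open-to-close time in hours, ignoring cases still open."""
    durations = [
        (c["closed"] - c["opened"]).total_seconds() / 3600
        for c in cases
        if c["closed"] is not None
    ]
    return mean(durations) if durations else None


print(mean_hours_to_close(cases))  # 16.0
```

Tracked over time, a figure like this reflects how the people and the tools perform together, which a raw alert count cannot show.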
Glossary
problems. Thus, that limits our discussion to information security incidents, which
cover computer and network security, intellectual property theft and many other
issues.
It is worthwhile to note that the term evidence, as used throughout the chapter,
indicates any data discovered in the process of incident response.