Checklists, checklists, checklists
When the worst happens, any incident response or business continuity plan should be kept as simple as possible; simplicity makes it easier to implement when needed. This matters because security incidents usually cause a high level of stress, which can compromise the IT staff's ability to stay calm and level-headed. On top of that, the person who has to put the plan into action may not be the person who wrote it.
Under such circumstances, interpreting and implementing even a carefully formulated half-page of text becomes very error-prone. There is a best-practice example for this from another industry: aviation. Passenger aircraft come with a folder called the "QRH", or Quick Reference Handbook. It contains checklists for every eventuality, from powering up the systems to engine failure. The lists are quick and easy to execute, which makes sense: during an incident, time is a commodity an organization does not always have, and it needs to act fast. There cannot be any ambiguity in the plan, and certainly no room for interpretation. Compiling those checklists is costly, but the cost is many times lower than the cost of being unprepared when an incident does happen. And if things go seriously wrong, they need to be done right the first time. IR contractors can try to work without any of the above information, but they will need more personnel to do so, and even for them a lack of procedures can compromise a successful response.
Another factor that is sometimes overlooked: if security guards are present around the perimeter, make sure they are briefed about the arrival of your IR team and that the team has all the access it needs by the time it arrives on the premises. This is especially important outside regular office hours. Having an incident response delayed because a security guard was not briefed properly, or because someone forgot to grant physical access to a server room, is not acceptable.
Lines of communication
Establishing clear lines of communication is a vital part of any strategy. This includes communication during an emergency as well as the way an emergency is reported. Many users are reluctant to report "strange things" because they are concerned they will be reprimanded or ridiculed. The best policy here is "If you see something, say something", as promoted by the DHS. It is better to raise the security alarm a few times too often than to remain silent in the one case where it would have mattered. Still, an organization must make sure that "alert fatigue" does not set in.
During an incident, there should be designated points of contact where all information comes together. CC'ing 25 people on the initial report is not a healthy approach, for reasons that come down to the simple realities of "spray-and-pray emailing": as soon as a dozen people receive an email, there is a good chance that nothing will happen, because every recipient assumes that "one of the others is surely going to do something about it". Let's make this real: when you are buried up to the neck in work, when was the last time you sat down and read a long-winded email that you were copied on along with 25 other people? Flat hierarchies where everyone talks to everyone else may be good for everyday business, but they are not necessarily desirable during an incident; the danger of miscommunication or of missing important details is too great. Regarding outside communication: the manner of communication depends on the nature of the incident. If outward-facing services are unaffected or suffer only a negligible impact, outside communication should not immediately "blurt out" an incident that would otherwise have gone unnoticed.
Useful things to have around when the IR team arrives
Some incident responders have their own customized "jump bag" for assignments, containing the most important tools they need for the job. However, they may not always have had time to restock their kit between two assignments. It is therefore useful to have some of the following on hand when they arrive, in order to prevent delays:
- some SD card readers and a stack of new, factory-sealed SD cards
- blank CDs
- a small network hub
- flashlights and batteries
- assorted network cables in a color that is not normally used in your cabling
- some way of providing catering for the IR team; this is not mandatory, but it is a nice gesture that can also save time
- a solid supply of caffeinated beverages (hot and cold)
- scratchpads and pens
It is also recommended to have some information ready in case questions come up. If there is a chance that the case will receive attention in the press, it is better to have something ready to send to press representatives than to scramble for information first, during which time the press makes up its own mind and spreads inaccurate or incomplete information. In an all-out scenario where public services are affected, communication should be concise and updated frequently. Waiting for "perfect" or "complete" information is not always desirable: depending on the individual case, each hour that passes with no information for customers or subcontractors is an hour during which unfounded rumors can start circulating. On that note, keeping your own staff informed is just as important. They should also be asked not to post anything about the current events on social media, because details taken out of context can give outsiders an incorrect picture and compromise accurate communication. Should an outage potentially affect outward-facing services, it is wise to involve other departments such as PR in the incident response strategy so they can handle communication.
Log wisely
From our experience in responding to incidents with G DATA Advanced Analytics, one of the most powerful tools in this context is a good set of log files. While firewall logs often exist, sufficiently detailed logs should also be available from all servers, including domain controllers. Recovery can be significantly more efficient when client logs are available, too. The important question to answer is which events should be logged and how long they should be retained. As a rule of thumb, we recommend retaining log files for at least one year. A good logging strategy needs to address the trade-off between the necessary degree of detail and retention costs.
G DATA Advanced Analytics regularly works with customers to support the decision process and develop an adequate strategy for each environment. Good logs make it easier for responders to search for specific indicators of compromise, whether file-based or network-based, such as communications with a known malware command & control server. Even though it might seem counter-intuitive at first, one thing that should be logged is failed logon attempts. Logs should be stored logically separate from the production environment and also be processed separately: a logging server inside the production environment can be compromised as well, and once logs may have been tampered with, they are essentially worthless. Given that malicious intruders can move around a network for several months without being detected, it is also advisable to keep sufficient historical data. This information helps clarify the course of an incident as well as formulate future mitigation strategies.
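To illustrate why searchable logs matter: with structured log exports in place, checking for a given network indicator can be a matter of a few lines of scripting. The following Python sketch is purely illustrative; the CSV column layout, the file name, and the indicator list are assumptions, and real log formats and IoC feeds will differ per environment.

```python
# Minimal sketch: scan an exported firewall log for contacts with known
# command & control addresses. Expects a CSV with the (assumed) columns
# timestamp, src_ip, dst_ip.
import csv

KNOWN_C2_IPS = {"198.51.100.23", "203.0.113.77"}  # hypothetical indicators

def find_c2_contacts(log_path: str) -> list[dict]:
    """Return all log entries whose destination matches a known C2 address."""
    hits = []
    with open(log_path, newline="") as f:
        for row in csv.DictReader(f):
            if row["dst_ip"] in KNOWN_C2_IPS:
                hits.append(row)
    return hits

if __name__ == "__main__":
    for entry in find_c2_contacts("firewall_export.csv"):  # placeholder file
        print(f"{entry['timestamp']}: {entry['src_ip']} -> {entry['dst_ip']}")
```

In practice such queries usually run against a SIEM or log management platform rather than flat files, but the principle is the same: the more structured and complete the logs, the faster responders can answer the question "has this indicator been seen on our network?".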
Body of Evidence
Depending on the nature of an incident, it is wise to resist resolving it "the quick and dirty way" just to resume normal operations. This sounds counter-intuitive (again), but there are cases where evidence needs to be preserved and examined in order to clear up the root cause of an incident. Taking a machine that has been infected with malware and simply reinstalling it can destroy vital evidence that could be used to develop future strategies. The machine might be back in working order, but you are still none the wiser when it comes to answering the question of how something could occur in the first place. Not only that: it is not sufficient to limit the focus to individual computers. Incident response always addresses the infrastructure in its entirety. Without a sufficient understanding of the root cause of an infection, the risk of re-infection increases dramatically, even in non-targeted scenarios such as the many cases where certain industries or sectors have been hit by ransomware.
The character of an attack cannot always be determined by looking at the tools an attacker uses. Their modus operandi is at least equally important.
If an attacker is found to be moving laterally through a network, the first impulse is to show them the door and get them out as fast as possible. If evidence of criminal activity is detected, however, it must be secured in a way that does not tip the attacker off. You do not know what they might do once they realize they have been detected; in the worst case, they might pull a "tableflip" and destroy as much as possible on the way out. In a sufficiently valuable target, it is entirely possible that an attacker actively tries to counteract mitigation attempts, which makes remediation strategies more expensive. Corralling the attacker is often the better approach. Incident responders are trained in procedures for collecting and preserving a body of evidence. Properly secured evidence is especially important if an incident response is followed or accompanied by a criminal investigation with a good chance of identifying the perpetrator(s).
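A basic element of evidence preservation is documenting that an acquired artifact, such as a forensic disk image, has not changed since acquisition. The Python sketch below is a simplified illustration with an assumed file path; actual evidence handling should follow the responders' chain-of-custody procedures.

```python
# Minimal sketch: record a timestamped SHA-256 hash of a forensic image so
# its integrity can be verified at every later hand-over.
import hashlib
from datetime import datetime, timezone

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so large disk images fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

if __name__ == "__main__":
    image = "evidence/disk001.img"  # placeholder path
    timestamp = datetime.now(timezone.utc).isoformat()
    print(f"{timestamp} SHA-256({image}) = {sha256_of(image)}")
```

Re-computing and comparing the hash at each hand-over demonstrates that the evidence has remained intact, which is exactly what matters if the case ends up before a court.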
Time Equals Money
Having recovery plans, failover plans, and well-documented assets and networks will dramatically reduce the time and resources an external IR contractor needs. With all the required documentation in place, they might only need to send in one or two people for a day to get things back on track. The bad news is: no emergency plan is ever really complete in the sense of "will not require any more action". The plan is always a work in progress, part of a life cycle that also includes the execution of incident response, lessons learned, and then a revised plan. This is hard given the resource constraints in most SMEs, but there is no alternative if the plan is to be effective.
Gathering information once the incident has already occurred or is still in progress is possible to some extent, but that information might be incomplete and will consume time that could be used more efficiently. This in turn incurs massive costs, typically several thousand dollars or euros per day per person; money which does not need to be spent if the organization is "incident-ready" ahead of time. Outsourcing IR is an option that makes a lot of sense for smaller organizations that do not have the capacity to train and maintain a permanent IR team. Negotiating a contract with an external incident response contractor is a good idea in such cases, as it keeps per-incident costs in check. G DATA offers vendor-neutral IR services via G DATA Advanced Analytics, no matter which security solutions are currently in place.
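To make the arithmetic concrete with purely illustrative figures: at an assumed rate of 3,000 euros per person per day, two external responders who need ten days to reconstruct an undocumented environment cost 60,000 euros. The same two responders working from complete, current documentation might finish in two days, for 12,000 euros. The exact numbers will differ from case to case, but the ratio is what preparation buys.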
Practice makes perfect
Any plan that is formulated should also be practiced. Practice drills lay bare potential weak points or bottlenecks in a plan and bolster confidence in it. Make sure that the organization profits from the experience and amends its documentation accordingly. Establishing where you stand might initially involve only tabletop simulations. These can then be transferred to a production environment, without anyone actually attacking. Usually, many people are informed that such a drill is scheduled; everybody involved is therefore primed, prepared and "ready to roll". Drills often take place after hours, when nobody is getting in the way and any unforeseen elements can be taken care of without interfering with daily business. The final challenge is to commission a red-teaming exercise, in which an actual attack is performed. This hones an organization's skills in dealing with a real scenario and permanently improves its security posture.
It is important to bear the following in mind, though: very few people should be aware that what is happening is actually a red-team drill. And even though it might feel very real, a red-teaming exercise is still not fully representative of real-world conditions. For legal reasons, even members of a red team cannot pull out all the stops and operate just as a real attacker would. They need to follow clearly defined rules and operate within a tight legal framework, whereas a real attacker has to follow none of those rules.
Everybody is familiar with Murphy's law, which dictates that a real emergency usually strikes at the least convenient time. This might be a time when key people are not available on short notice: the primary contact is on holiday, the second-in-command just called in sick, and the third point of contact is busy performing maintenance work that cannot be interrupted or postponed. A good contingency plan that takes those eventualities into account is therefore absolutely essential. The plan should also include ways to override the usual "chain of command" in case any link in it is missing for any reason. Triggering the emergency plan is something that any member of the IT department should have the authority to do at any time without having to fear reprimands (except in cases of intentional misuse, obviously), especially if the alternative is not to act at all.
Post-incident Debriefing
Once a security incident has occurred and been addressed, it is important to talk about it with all involved parties after the dust has settled. This should always be done, even if everything worked to perfection. The opportunity can be used to pat people on the back for a job well done and to point out what went particularly well. It can and should also be used to evaluate what could be handled differently in the future. If any weak points in the plan have surfaced, whether in a practice run or a real incident, this is the time to address them and make changes to the plan that avoid future shortcomings. The key is to learn from the experience. If everyone just goes back to business as soon as the incident is over, there is a high risk that nothing is learned and that the same mistakes are made again at some point in the future.
8 tips to strengthen your incident response strategy
- Make your plan as easy to implement as possible
- Give the authority to initiate emergency procedures to all of your IT staff
- Have contingency plans
- Implement a good logging strategy
- Always keep "the big picture" in mind: even a seemingly minor incident might hint at something bigger
- Streamline emergency communication channels
- Practice any plan you make
- Learn lessons from every incident and improve your strategy