Chief Security Architect
Simply stated, firewall rules allow computers to send traffic to, or receive traffic from, programs, system services, computers, or users securely. Whether you have five or 500 firewalls, you need to understand the risk in real time if you want to stay ahead of the game. But with complex rule configurations, routers and more to monitor and maintain continuously, it can be hard to identify which are running smoothly, which are smouldering and which are seconds away from meltdown. By focusing your efforts on the right firewall at the right time, you can mitigate risks before they become problems. So, how do you know which one that is? Intelligent network security metrics hold the secret.
What Are Intelligent Network Security Metrics?
Ideal metrics don’t just make you look good; they indicate what is wrong or out of control as soon as, or even before, problems arise, so that you can correct them and improve your operational efficiency. By combining the results of many metrics you can generate a ‘risk score’ that provides instant visibility into the security and compliance posture of all your firewalls, along with enhanced workflow automation and up-to-date PCI-DSS reporting. Sounds good, doesn’t it? So where do you start?
The first step is to examine your rules, routers and firewalls to identify which are most susceptible to risk, which you rely on the most, and which you need the most from. This will help you identify and prioritise where to focus your resources.
A standard firewall metric that will probably spring to mind is ‘availability’. It tells you about the performance of the box, for example 99.9% up. However, in my opinion, this is a relatively useless measurement. Although it looks as if everything is fine with only 0.1% downtime, it doesn’t tell you what went wrong, how to fix it, or how to improve performance and stop it happening again. It simply states the obvious: that valuable uptime was lost.
That’s not to say that basic metrics have no value. Some standard baseline performance metrics deliver exceptionally useful data, and every firewall team should be tracking them: CPU utilisation, memory utilisation, traffic passed, traffic dropped, and simultaneous connections. These dimensions matter when you examine your firewall’s current performance and compare it with how it behaved yesterday, last week or last month, to determine whether there has been a significant change warranting further investigation. They are also key inputs to a capacity planning exercise to pinpoint whether a firewall is overloaded. However, before upgrading your hardware, it is worth checking whether the firewall configuration can be optimised, as there may be underutilised capacity elsewhere.
A more sophisticated way to track firewall performance is to use an external testing product that streams traffic through the firewall to a collector and records the speed and jitter that the firewall and network impose on this packet stream. This live bandwidth monitoring can be an important part of understanding whether a firewall is cleanly passing VoIP and video conferencing traffic.
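The comparison against earlier behaviour can be sketched in a few lines. This is only an illustration, assuming you already collect samples of a single metric (CPU utilisation here); the function name, the sample values and the three-standard-deviation threshold are my own assumptions, not a recommendation from any particular product.

```python
from statistics import mean, stdev

def deviates_from_baseline(history, current, threshold=3.0):
    """Return True if `current` lies more than `threshold` standard
    deviations from the mean of `history` (earlier samples of one metric)."""
    if len(history) < 2:
        raise ValueError("need at least two historical samples")
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Perfectly flat history: any different value counts as a change.
        return current != mu
    return abs(current - mu) / sigma > threshold

# Hypothetical CPU utilisation (%) at the same hour on the previous seven days.
cpu_last_week = [41.0, 39.5, 43.2, 40.1, 42.7, 38.9, 41.6]

print(deviates_from_baseline(cpu_last_week, 44.0))  # False: within normal variation
print(deviates_from_baseline(cpu_last_week, 78.0))  # True: worth investigating
```

The same check can be run per metric (memory, traffic dropped, simultaneous connections), each against its own history.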
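To make the jitter half of that measurement concrete, here is a minimal sketch that assumes the test product exposes a send timestamp and a receive timestamp for every packet. It uses the running interarrival jitter estimate defined for RTP in RFC 3550; commercial testing products may calculate jitter differently, and the function name and sample timestamps are hypothetical.

```python
def interarrival_jitter(send_times, recv_times):
    """Running jitter estimate in the style of RFC 3550 (section 6.4.1).

    Both lists hold one timestamp per packet, in the same unit
    (milliseconds in the example below)."""
    if len(send_times) != len(recv_times):
        raise ValueError("need one receive time per send time")
    jitter = 0.0
    for i in range(1, len(send_times)):
        transit_prev = recv_times[i - 1] - send_times[i - 1]
        transit_cur = recv_times[i] - send_times[i]
        # Change in transit time between consecutive packets.
        d = abs(transit_cur - transit_prev)
        # Smooth with gain 1/16, as the RFC specifies.
        jitter += (d - jitter) / 16.0
    return jitter

# Packets sent every 20 ms through the firewall (hypothetical values).
sent = [0, 20, 40, 60]
steady = [5, 25, 45, 65]   # constant 5 ms transit time
uneven = [5, 27, 45, 69]   # transit time varies from packet to packet

print(interarrival_jitter(sent, steady))  # 0.0
print(interarrival_jitter(sent, uneven))
```

A steady stream yields zero jitter however long the transit time is, which is why jitter and speed need to be tracked separately.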
Change Management Monitoring
Nothing stays the same for long and, as your IT environment alters, so too does your firewall: you need to change, create, disable, or even delete rules. Change can affect availability, either positively or negatively, and as availability is one of the main things a firewall must provide, metrics that yield meaningful, actionable data are invaluable.
Configuration updates happen in a number of ways:
- unplanned or emergency (out of cycle) changes
- changes with no authorisation, sometimes referred to as ‘cowboy changes’: someone logs in, makes a change and leaves no documentation, either before or afterwards, of what was done
Firewalls do not have a change management process built into them, so documenting changes has never become a best (or even a standard) practice for many organisations. If a firewall administrator makes a change because of an emergency or some other form of business disruption, chances are they are under pressure to make it happen as quickly as possible, and process goes out the window. But what if this change cancels out a prior policy change, resulting in downtime? By monitoring the number of planned versus unplanned changes you can determine how well the team are pre-empting users’ requirements and proactively managing the firewalls, rather than making ‘seat of the pants’ updates. Another great metric is the percentage of changes resulting in outages, as this provides feedback on how well the operational team understand the changes they’re making and their impact, and on whether they’re using some method or tool to verify changes before they’re made.
Another really useful metric, although rarely tracked, is MTTR (mean time to recovery): in other words, how fast the team restored service after each outage. It is a good gauge of your team’s familiarity with, and understanding of, the firewalls’ configuration, and of whether that understanding is improving or diminishing. A rising MTTR could also indicate that the configuration is becoming complex or unruly. Numerous industry watchers have estimated that 80% of all outages are caused by configuration changes and that 80% of the MTTR is spent identifying what changed. It therefore stands to reason that if the team understands exactly what happened, they should be able to isolate the failure point within a minute and restore service in less than five. Ultimately the goal is to eliminate downtime in the first place.
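Both change metrics are simple ratios once every change is logged. The sketch below assumes a change log in which each entry records whether the change was planned and whether it was followed by an outage; the `Change` record and its field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Change:
    planned: bool        # went through the normal change cycle
    caused_outage: bool  # was followed by a service outage

def change_metrics(changes):
    """Return the percentage of unplanned changes and the percentage
    of changes that resulted in an outage."""
    total = len(changes)
    if total == 0:
        return {"unplanned_pct": 0.0, "outage_pct": 0.0}
    unplanned = sum(1 for c in changes if not c.planned)
    outages = sum(1 for c in changes if c.caused_outage)
    return {
        "unplanned_pct": 100.0 * unplanned / total,
        "outage_pct": 100.0 * outages / total,
    }

# Hypothetical month: six planned changes, two emergency ones, one outage.
log = [Change(True, False)] * 6 + [Change(False, False), Change(False, True)]
print(change_metrics(log))  # {'unplanned_pct': 25.0, 'outage_pct': 12.5}
```

Tracked month on month, both percentages should trend downwards if the team is getting ahead of its users' requests.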
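MTTR itself is just the average outage duration. A minimal sketch, assuming each outage is recorded as a pair of timestamps (when service was lost and when it was restored); the dates below are made up for illustration.

```python
from datetime import datetime, timedelta

def mean_time_to_recovery(outages):
    """Average duration of outages given as (began, restored) datetime pairs."""
    if not outages:
        raise ValueError("no outages recorded")
    total = sum((restored - began for began, restored in outages), timedelta())
    return total / len(outages)

# Hypothetical outage records: 12, 30 and 18 minutes long.
outages = [
    (datetime(2024, 3, 4, 9, 0), datetime(2024, 3, 4, 9, 12)),
    (datetime(2024, 3, 11, 14, 30), datetime(2024, 3, 11, 15, 0)),
    (datetime(2024, 3, 19, 22, 5), datetime(2024, 3, 19, 22, 23)),
]
print(mean_time_to_recovery(outages))  # 0:20:00
```

Comparing the figure from one quarter to the next shows whether recovery is getting faster or slower.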
Risk and Compliance Monitoring
As corporate policies evolve and compliance standards change, you need to review how you are enforcing traffic on the firewalls and make changes. Hackers like the fact that firewall teams rarely remove rules, and this is how many compromises occur. Metrics can be used to check how often each rule is applied so that you can clean up those that are redundant: rules that have been replaced by new ones, rules for services that are no longer used but that nobody told you about, and all those temporary exceptions that were added to get projects, acquisitions, mergers and so on finished.
Other useful metrics for risk and compliance monitoring are the ones that can easily be seen trending towards zero or 100%. Examples include the number of shadow rules, that is, ones that are blocked by another rule; the percentage of unused and therefore wasted rules; and the percentage of rules that actually violate company policy. By continuously monitoring these metrics over time you can see how effective the team is, which goes back to the old adage that a good metric is one that tells you something meaningful. If your score is getting better, then you know you’re doing something right.
Good metrics are transferable across industries and companies, and they enable you to make changes that make a difference. Combining these various test results gives each firewall gateway a security score: a comprehensive, cross-vendor, organisational grade. This provides a clear understanding of the nature and level of overall network security risk, and the granular, actionable data needed to manage it accordingly. Be warned, though: I have seen teams who, having discovered the ‘satisfaction’ of numbers moving in the right direction, became fixated on the wrong goal. If you’re concentrating your efforts on making the metrics look good, then you’re not focusing on making strong rules.
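Finding clean-up candidates from rule usage can be sketched as follows, assuming the firewall can export a per-rule hit count for the review period. The rule names and counts are hypothetical, and the export format will differ from vendor to vendor.

```python
def unused_rules(hit_counts, min_hits=1):
    """Names of rules with fewer than `min_hits` hits in the review period."""
    return sorted(name for name, hits in hit_counts.items() if hits < min_hits)

def unused_pct(hit_counts, min_hits=1):
    """Percentage of rules that count as unused."""
    if not hit_counts:
        return 0.0
    return 100.0 * len(unused_rules(hit_counts, min_hits)) / len(hit_counts)

# Hypothetical hit counts exported for the last 90 days.
hits = {
    "allow-web": 15230,
    "allow-dns": 8841,
    "temp-migration": 0,
    "legacy-ftp": 0,
}
print(unused_rules(hits))  # ['legacy-ftp', 'temp-migration']
print(unused_pct(hits))    # 50.0
```

A zero count only makes a rule a candidate for removal. A rule that serves a quarterly or year-end job may show no hits in a short review window, so it is worth confirming with the rule's owner before deleting anything.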
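Counting shadow rules can be illustrated with a deliberately simplified rule model. The sketch assumes first-match evaluation and rules described only by source network, destination network and a single port; real rule bases also carry protocols, port ranges, zones and object groups, so treat the `Rule` class and the sample rules as hypothetical.

```python
from dataclasses import dataclass
from ipaddress import ip_network
from typing import Optional

@dataclass
class Rule:
    name: str
    src: str              # source network in CIDR notation
    dst: str              # destination network in CIDR notation
    port: Optional[int]   # None means any port
    action: str           # "allow" or "deny"

def covers(earlier, later):
    """True if `earlier` matches every packet that `later` would match."""
    src_ok = ip_network(later.src).subnet_of(ip_network(earlier.src))
    dst_ok = ip_network(later.dst).subnet_of(ip_network(earlier.dst))
    port_ok = earlier.port is None or earlier.port == later.port
    return src_ok and dst_ok and port_ok

def shadowed_rules(rules):
    """Names of rules that can never match because an earlier rule covers them."""
    return [
        later.name
        for i, later in enumerate(rules)
        if any(covers(earlier, later) for earlier in rules[:i])
    ]

rules = [
    Rule("deny-guest", "10.20.0.0/16", "0.0.0.0/0", None, "deny"),
    Rule("allow-guest-web", "10.20.5.0/24", "0.0.0.0/0", 443, "allow"),
    Rule("allow-dns", "10.0.0.0/8", "192.0.2.53/32", 53, "allow"),
]
print(shadowed_rules(rules))  # ['allow-guest-web']
```

Here the earlier deny for the whole guest range means the later allow for one guest subnet is never reached, which is exactly the kind of rule the metric should count.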
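One way such a combined score could be calculated is as a weighted average. This is a sketch under my own assumptions: every input is a percentage where lower is better, and the weights and firewall names are invented for illustration rather than taken from any product's scoring model.

```python
def security_score(metrics, weights):
    """Combine 'percentage bad' metrics (0 is best, 100 is worst) into a
    single 0-100 score where higher is better."""
    total_weight = sum(weights[name] for name in metrics)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_bad = sum(metrics[name] * weights[name] for name in metrics)
    return 100.0 - weighted_bad / total_weight

# Hypothetical weights: policy violations matter most.
WEIGHTS = {"policy_violations": 4, "shadowed": 2, "unused": 1, "unplanned": 1}

firewalls = {
    "fw-edge": {"policy_violations": 5.0, "shadowed": 10.0, "unused": 20.0, "unplanned": 20.0},
    "fw-dc": {"policy_violations": 20.0, "shadowed": 10.0, "unused": 40.0, "unplanned": 20.0},
}

scores = {name: security_score(m, WEIGHTS) for name, m in firewalls.items()}
print(scores)                           # {'fw-edge': 90.0, 'fw-dc': 80.0}
print(sorted(scores, key=scores.get))   # lowest score first: ['fw-dc', 'fw-edge']
```

Sorting by score puts the firewall that most needs attention at the top of the list, but the score is only as good as the metrics and weights behind it.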
As Chief Security Architect, Hamelin identifies and champions the security standards and processes for Tufin. Bringing more than 16 years of security domain expertise to Tufin, Hamelin has deep hands-on technical knowledge in security architecture, penetration testing, intrusion detection, and anomalous detection of rogue traffic. He has authored numerous courses in information security and worked as a consultant, security analyst, forensics lead, and security practice manager. He is also a featured security speaker around the world and is widely regarded as a leading technical thinker in information security.
Hamelin previously held technical leadership positions at VeriSign, Cox Communications, and Resilience. Prior to joining Tufin he was the Principal Network and Security Architect for ChoicePoint, a LexisNexis Company. Hamelin received Bachelor of Science degrees in Chemistry and Physics from Norwich University, and did his graduate work at Texas A&M University.