
Towards High Performance Network Defense
Zhichun Li
EECS Department
Northwestern University

Motivation
Professional attackers exploit networks for profit ($$$)
[Figure: attackers, botnets, worms]
2

Network Level Defense
Network gateways/routers are the vantage points for detecting large-scale attacks
Host-based detection/prevention alone is not enough
Some users do not apply host-based schemes because of reliability, overhead, and conflicts
Many users do not update or patch their systems on time
E.g., the Conficker worm infected 9 to 15 million hosts at the end of 2008
Cannot rely only on end users for security protection
3

Challenges
Scalable to high speed networks with a
large number of users
Highly accurate
Adapt fast to the emerging threats
Have good attack coverage

Network-based Intrusion Detection, Prevention, and Forensics System
Framework
Input: packet streams
(I) Sketch-based monitoring & detection
(II) Polymorphic worm signature generation
(III) Signature matching engines
(IV) Network situational awareness
[Figure: framework diagram; the components are annotated with the goals Scalability; Accuracy & Scalability & Coverage; Accuracy & adapt fast; Accuracy & adapt fast]

High-speed Network Monitoring and Anomaly Detection
Online traffic monitoring and recording
[SIGCOMM IMC 2004, INFOCOM 2006, ToN 2007] [INFOCOM 2008]
Reversible sketch for data streaming computation
Record millions of flows (GB traffic) in a few hundred KB
Small # of memory accesses per packet
Scalable to large key space size (2^32 or 2^64)
Online sketch-based flow-level anomaly detection
[IEEE ICDCS 2006] [Journal of Computer Networks 2010] [IEEE CG&A, Security Visualization 2006]
Online stealthy botnet scan detection
[IEEE IWQoS 2007]
[Figure: sketch made of H hash tables with K buckets each (indexed 0 to K-1); a key k is hashed by h1(k), ..., hj(k), ..., hH(k)]
6
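The sketch structure on this slide can be illustrated with a minimal example. This assumes a plain count-min style sketch, not the reversible sketch from the cited papers: H hash tables of K counters each, with one counter update per table per packet. The class name, the SHA-256-based hashing, and the default sizes are illustrative assumptions.

```python
import hashlib


class CountMinSketch:
    """H hash tables of K counters each (illustrative; not the reversible variant)."""

    def __init__(self, num_tables=5, num_buckets=1024):
        self.num_tables = num_tables      # H on the slide
        self.num_buckets = num_buckets    # K on the slide
        self.tables = [[0] * num_buckets for _ in range(num_tables)]

    def _bucket(self, table_index, key):
        # Derive an independent-looking hash per table by salting with the table index.
        digest = hashlib.sha256(f"{table_index}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.num_buckets

    def update(self, key, value=1):
        # One memory access per table for every packet.
        for j in range(self.num_tables):
            self.tables[j][self._bucket(j, key)] += value

    def estimate(self, key):
        # With non-negative updates, the minimum counter never underestimates.
        return min(self.tables[j][self._bucket(j, key)]
                   for j in range(self.num_tables))
```

Memory is fixed at H x K counters no matter how many distinct flows are recorded, which is the property the slide relies on.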

Network and Distributed System Diagnosis
Overlay network monitoring and diagnosis
[SIGCOMM IMC 2003, SIGCOMM 2004, ToN 2007] [SIGCOMM 2006]
End-user network diagnosis [INFOCOM 2007 (2)]
Internet-scale Virtual Private Network (VPN) and backbone monitoring and diagnosis [INFOCOM 2009]
Internet-scale data center and distributed system profiling and diagnosis [NSDI 2010]
7

Polymorphic Worm Signature Generation
Exploit-invariant signature generation [IEEE Symposium on Security and Privacy 2006] (cited by ~100; code and test cases released to Columbia U., UT Austin, Purdue, Georgia Tech, UC Davis, etc.)
Vulnerability signature generation [IEEE ICNP 2007, ToN 2010]
[NSF CyberTrust 06 Award]
[Figure: traffic (1010101, 10111101, 11111100, 00010111) flowing from the Internet through the network gateway to our network]

Online Protocol Parsing and Signature Matching
NetShield vulnerability-signature-based NIDS/NIPS [NSF CyberTrust 08 Award] [under submission] [patent filed]
Cisco has expressed interest (IPS ruleset & site visit)
Code release has been used by researchers at the University of Toronto
Using failure information to detect enterprise zombies [SecureCom09]
Spamming botnet detection [NSDI09]
9

Network Situational Awareness
Large-scale botnet and P2P misconfiguration event situation-aware forensics
Botnet attack target/strategy inference [ASIACCS09]
Root cause analysis of the P2P misconfiguration/poisoning traffic [INFOCOM10]
Analysis of 2TB of data collected over 4 years across five /8 IP address blocks

10

Current Work
Data center management and
configuration
Internet emergency response
AS topology study [CoNEXT09]
Recovery via IXP [Infocom10]

Network-based web dynamic vulnerability defense
Social network security
11

NetShield: Matching a Large Vulnerability Signature Ruleset for High Performance Network Defense

12

Outline

Motivation
High Speed Matching for Large Rulesets
High Speed Parsing
Evaluation
Research Contributions

13

NetShield Overview
NIDS/NIPS (Network Intrusion Detection/Prevention System) operation
[Figure: packets and a signature DB feed the NIDS/NIPS, which outputs security alerts]
Accuracy
Speed
Attack Coverage

14

State Of The Art


Regular expression (regex) based approaches
Used by: Cisco IPS, Juniper IPS, open source Bro
Example: .*Abc.*\x90+de[^\r\n]{30}

Pros
Can efficiently match multiple sigs simultaneously,
through DFA
Can describe the syntactic context

15
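To make the regex approach concrete, here is a minimal sketch that applies the example signature from this slide to raw payload bytes with Python's `re` module. The payloads and the function name are made up for illustration; production engines compile many such signatures into a single DFA rather than calling a backtracking matcher per rule.

```python
import re

# The example signature from the slide, compiled as a bytes pattern.
# DOTALL lets ".*" span line breaks inside the payload.
SIGNATURE = re.compile(rb".*Abc.*\x90+de[^\r\n]{30}", re.DOTALL)


def regex_alert(payload: bytes) -> bool:
    """Return True when the payload matches the signature."""
    return SIGNATURE.match(payload) is not None


# Hypothetical payloads: one carrying the byte pattern, one ordinary request.
attack = b"GET /Abc?x=" + b"\x90" * 8 + b"de" + b"A" * 30 + b"\r\n"
benign = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"

print(regex_alert(attack), regex_alert(benign))
```

The matcher sees only a byte stream: it cannot tell which protocol field the bytes came from. That is the syntactic-only context the next slide criticizes.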

Cons of Regex
Limited expressive power: cannot describe semantic context, thus inaccurate
Theoretical perspective
[Figure: language classes regex, context-free, and context-sensitive, with the protocol grammar shown relative to them]
Practical perspective
HTTP chunk encoding
DNS label pointers
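HTTP chunk encoding shows the practical problem: each chunk starts with a hexadecimal length that says how many bytes follow, so finding the real body means reading a number and then skipping that many bytes. The sketch below is a simplified decoder (no trailers, no error handling); the function name and sample bytes are illustrative assumptions. It shows a keyword that is invisible in the raw bytes but visible after parsing.

```python
def decode_chunked(body: bytes) -> bytes:
    """Reassemble an HTTP chunked body (simplified: ignores trailers)."""
    out = bytearray()
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        # The size line is hexadecimal; anything after ';' is a chunk extension.
        size_field = body[pos:line_end].split(b";", 1)[0]
        size = int(size_field.decode("ascii"), 16)
        pos = line_end + 2
        if size == 0:
            break                      # the zero-length chunk ends the body
        out += body[pos:pos + size]
        pos += size + 2                # skip the chunk data and its trailing CRLF
    return bytes(out)


# The filename is split across two chunks, so a raw byte search misses it.
raw = b"5\r\nfp40r\r\n6\r\neg.dll\r\n0\r\n\r\n"
print(b"fp40reg.dll" in raw, decode_chunked(raw))
```

A pattern written against the raw stream would have to anticipate every possible split point, whereas a parser recovers the field once and matches on its value.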

State Of The Art
Vulnerability Signature [Wang et al. 04]
Vulnerability: design flaws enable bad inputs to lead the program to a bad state
[Figure: state diagram with labels good state, bad input, bad state, vulnerability signature]
Blaster Worm (WINRPC) Example:
BIND:
rpc_vers==5 && rpc_vers_minor==1 && packed_drep==\x10\x00\x00\x00
&& context[0].abstract_syntax.uuid=UUID_RemoteActivation
BIND-ACK:
rpc_vers==5 && rpc_vers_minor==1
CALL:
rpc_vers==5 && rpc_vers_minor==1 && packed_drep==\x10\x00\x00\x00
&& opnum==0x00 && stub.RemoteActivationBody.actual_length>=40
&& matchRE(stub.buffer, /^\x5c\x00\x5c\x00/)

Pros
Directly describes semantic context
Very expressive, can express the vulnerability condition exactly
Accurate

Cons
Slow!
Existing approaches all use sequential matching
Require protocol parsing

17
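The sequential matching listed under Cons can be sketched as follows, assuming each PDU has already been parsed into a nested dictionary. The dictionary layout, the function names, and the placeholder UUID string are assumptions for illustration. The sketch also assumes the connection's PDUs line up one-to-one with the signature's steps, which a real engine cannot assume.

```python
import re

# Placeholder: the actual interface UUID value is not given on the slide.
UUID_REMOTE_ACTIVATION = "UUID_RemoteActivation"
DREP = b"\x10\x00\x00\x00"


def version_ok(pdu):
    return pdu["rpc_vers"] == 5 and pdu["rpc_vers_minor"] == 1


def bind_predicate(pdu):
    return (version_ok(pdu)
            and pdu["packed_drep"] == DREP
            and pdu["context"][0]["abstract_syntax"]["uuid"] == UUID_REMOTE_ACTIVATION)


def call_predicate(pdu):
    stub = pdu["stub"]
    return (version_ok(pdu)
            and pdu["packed_drep"] == DREP
            and pdu["opnum"] == 0x00
            and stub["RemoteActivationBody"]["actual_length"] >= 40
            and re.match(rb"^\x5c\x00\x5c\x00", stub["buffer"]) is not None)


# One predicate per PDU, in the order the PDUs must appear.
BLASTER_SIGNATURE = [("BIND", bind_predicate),
                     ("BIND-ACK", version_ok),
                     ("CALL", call_predicate)]


def matches_sequentially(signature, pdus):
    """Check one signature against one connection, one PDU at a time."""
    if len(pdus) != len(signature):
        return False
    for (pdu_type, predicate), pdu in zip(signature, pdus):
        if pdu["type"] != pdu_type or not predicate(pdu):
            return False
    return True


def example_session(actual_length):
    """Hypothetical parsed connection used only to exercise the signature."""
    return [
        {"type": "BIND", "rpc_vers": 5, "rpc_vers_minor": 1, "packed_drep": DREP,
         "context": [{"abstract_syntax": {"uuid": UUID_REMOTE_ACTIVATION}}]},
        {"type": "BIND-ACK", "rpc_vers": 5, "rpc_vers_minor": 1},
        {"type": "CALL", "rpc_vers": 5, "rpc_vers_minor": 1, "packed_drep": DREP,
         "opnum": 0,
         "stub": {"RemoteActivationBody": {"actual_length": actual_length},
                  "buffer": b"\x5c\x00\x5c\x00" + b"A" * 60}},
    ]
```

Running this check once per signature costs time proportional to the ruleset size for every connection, which is why sequential matching does not scale to thousands of signatures.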

Motivation of NetShield

18

Motivation
Desired Features for Signature-based NIDS/NIPS
Accuracy (especially for IPS)
Speed
Coverage: Large ruleset

            Regular Expression    Vulnerability (Shield [sigcomm04])
Accuracy    Relatively poor       Much better
Speed       Good                  ??
Memory      OK                    ??
Coverage    Good                  ??

Regular expressions cannot capture the vulnerability condition well!
The ?? entries are the focus of this work
19

Research Challenges and Solutions
Challenges
Matching thousands of vulnerability signatures simultaneously
Move from sequential matching to matching multiple sigs simultaneously
High speed protocol parsing

Solutions
An efficient algorithm which matches multiple sigs simultaneously
A tailored parsing design for high-speed signature matching
20

Background
Vulnerability signature basics
Use protocol semantics to express vulnerabilities
Defined on a sequence of PDUs & one predicate for each PDU
Example: ver==1 && method==put && len(buf)>300

Data representations
For all the vulnerability signatures we studied, we only need numbers and strings
Number operators: ==, >, <, >=, <=
String operators: ==, match_re(.,.), len(.)

Blaster Worm (WINRPC) Example:
BIND:
rpc_vers==5 && rpc_vers_minor==1 && packed_drep==\x10\x00\x00\x00
&& context[0].abstract_syntax.uuid=UUID_RemoteActivation
BIND-ACK:
rpc_vers==5 && rpc_vers_minor==1
CALL:
rpc_vers==5 && rpc_vers_minor==1 && packed_drep==\x10\x00\x00\x00
&& opnum==0x00 && stub.RemoteActivationBody.actual_length>=40
&& matchRE(stub.buffer, /^\x5c\x00\x5c\x00/)
21

Outline

Motivation
High Speed Matching for Large Rulesets
High Speed Parsing
Evaluation
Research Contributions

22

Matching Problem Formulation
Suppose we have n signatures, defined on k matching dimensions (matchers)
A matcher is a two-tuple (field, operation), or a four-tuple for associative array elements
Translate the n signatures to an n by k table
This translation unlocks the potential of matching multiple signatures simultaneously
Example: Rule 4: URI.Filename=fp40reg.dll && len(Headers[host])>300

RuleID  Method ==  Filename ==   Header == LEN
1       DELETE     *             *
2       POST       Header.php    *
3       *          awstats.pl    *
4       *          fp40reg.dll   name==host; len(value)>300
5       *          *             name==User-Agent; len(value)>544
(* = don't care)
23

Matching Problem Formulation


Challenges for Single PDU matching problem (SPM)
Large number of signatures n
Large number of matchers k
Large number of don't cares
Cannot reorder matchers arbitrarily: buffering constraint
Field dependency
Arrays, associative arrays
Mutually exclusive fields

24

Difficulty of the SPM
Bad News
A well-known computational geometry problem can be reduced to this problem,
and that problem has a bad worst-case bound:
O((log N)^(K-1)) time or O(N^K) space (worst-case ruleset)

Good News
Measurement study on Snort and Cisco rulesets
The real-world rulesets are good: the matchers are selective
With our design: O(K)
25

Matching Algorithms
Candidate Selection Algorithm
1. Pre-computation: decides the rule order and matcher order
2. Decomposition: match each matcher separately and iteratively combine the results efficiently

Per-matcher data structures
Integer range checking: balanced binary search tree
String exact matching: Trie
Regex: DFA (XFA)
26
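Two of the per-matcher structures named on this slide can be sketched in a few lines. The class names are illustrative assumptions. A sorted list searched with `bisect` stands in for the balanced binary search tree, since the Python standard library has no tree type. Each lookup returns the set of rule IDs whose requirement on that one matcher is satisfied.

```python
from bisect import bisect_left


class Trie:
    """Exact string matcher: maps a field value to the rules that require it."""

    def __init__(self):
        self.children = {}
        self.rule_ids = set()

    def insert(self, word, rule_id):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.rule_ids.add(rule_id)

    def lookup(self, word):
        node = self
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return set()
        return set(node.rule_ids)


class GreaterThanIndex:
    """Integer matcher for conditions of the form value > threshold.

    A sorted list plus binary search stands in for a balanced search tree.
    """

    def __init__(self, thresholds):
        # thresholds: iterable of (threshold, rule_id) pairs
        self.entries = sorted(thresholds)
        self.keys = [threshold for threshold, _ in self.entries]

    def lookup(self, value):
        # Every threshold strictly below the value is satisfied.
        end = bisect_left(self.keys, value)
        return {rule_id for _, rule_id in self.entries[:end]}
```

For example, after inserting `fp40reg.dll` for rule 4, `lookup("fp40reg.dll")` returns `{4}`. An index built from thresholds 300 (rule 4) and 544 (rule 5) returns `{4}` for a value of 450.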

Step 1: Pre-Computation
Optimize the matcher order based on buffering constraint & field arrival order
Rule reorder:
[Figure: rules grouped in order: require matcher 1; don't care matcher 1 but require matcher 2; don't care matchers 1 & 2]
27

Step 2: Iterative Matching
PDU = {Method=POST, Filename=fp40reg.dll, Header: name=host, len(value)=450}

RuleID  Method ==  Filename ==   Header == LEN
1       DELETE     *             *
2       POST       Header.php    *
3       *          awstats.pl    *
4       *          fp40reg.dll   name==host; len(value)>300
5       *          *             name==User-Agent; len(value)>544
(* = don't care)

S1 = {2}: candidates after matching column 1 (Method ==)
S2 = merge(S1, A2) + B2 = merge({2}, {}) + {4} = {} + {4} = {4}
S3 = merge(S2, A3) + B3 = merge({4}, {4}) + {} = {4} + {} = {4}

merge(Si, Ai+1), for each candidate in Si:
don't care matcher i+1: keep the candidate in Si+1
require matcher i+1: keep the candidate in Si+1 only if it is in Ai+1
28
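The iterative matching on this slide can be reproduced with a small sketch. It rests on assumptions. The five-row rule table is reconstructed from the slide's example. A_i is taken to be the rules that match column i and also require an earlier matcher, and B_i the rules that match column i with all earlier columns set to don't care. Those readings are consistent with the slide's worked numbers but are not spelled out there. The linear scan in `matcher_hits` replaces the per-matcher tries and search trees purely for brevity, and the header column is simplified to a single (name, length) pair.

```python
DONT_CARE = None

# Columns: (method, filename, (header name, minimum value length)).
RULES = {
    1: ("DELETE", DONT_CARE, DONT_CARE),
    2: ("POST", "Header.php", DONT_CARE),
    3: (DONT_CARE, "awstats.pl", DONT_CARE),
    4: (DONT_CARE, "fp40reg.dll", ("host", 300)),
    5: (DONT_CARE, DONT_CARE, ("User-Agent", 544)),
}
HEADER_COLUMN = 2


def matcher_hits(column, pdu_value):
    """Rules that require this column and are satisfied by the PDU field."""
    hits = set()
    for rule_id, row in RULES.items():
        requirement = row[column]
        if requirement is DONT_CARE:
            continue
        if column == HEADER_COLUMN:
            name, min_len = requirement
            pdu_name, pdu_len = pdu_value
            satisfied = name == pdu_name and pdu_len > min_len
        else:
            satisfied = requirement == pdu_value
        if satisfied:
            hits.add(rule_id)
    return hits


def candidate_selection(pdu_fields):
    """Return the final candidate set and the S_i after each column."""
    candidates = set()
    history = []
    for column, value in enumerate(pdu_fields):
        hits = matcher_hits(column, value)
        # B: new candidates, i.e. hits whose earlier columns are all don't care.
        b_set = {r for r in hits
                 if all(RULES[r][c] is DONT_CARE for c in range(column))}
        a_set = hits - b_set
        # Merge: keep a candidate if it does not care about this column
        # or if this column's matcher accepted it.
        survivors = {r for r in candidates
                     if RULES[r][column] is DONT_CARE or r in a_set}
        candidates = survivors | b_set
        history.append(sorted(candidates))
    return candidates, history


final, steps = candidate_selection(("POST", "fp40reg.dll", ("host", 450)))
print(final, steps)
```

The candidate sets after each column come out as {2}, {4}, {4}, matching S1, S2, and S3 on the slide. Each merge touches only the current candidates, which is why small candidate sets keep the per-column cost low.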

Complexity Analysis
Merging complexity
Need k-1 merging iterations
For each iteration:
Merge complexity is O(n) in the worst case, since Si can have O(n) candidates in worst-case rulesets
For real-world rulesets, # of candidates is a small constant. Therefore, O(1)
Three HTTP traces: avg(|Si|)<0.04
Two WINRPC traces: avg(|Si|)<1.5

For real-world rulesets: O(k), which is the optimal we can get
29

Refinement and Extension


SPM improvement
Allow negative conditions
Handle array cases
Handle associative array cases
Handle mutually exclusive cases

Extend to Multiple PDU Matching (MPM)


Allow checkpoints.

30

Outline

Motivation
High Speed Matching for Large Rulesets.
High Speed Parsing
Evaluation
Research Contribution

31

High Speed Parsing
General vs. Special Purpose
Keep the whole parse tree in memory vs. parsing and matching on the fly
Parse all the nodes in the tree vs. only signature-related fields (leaf nodes)

Design a parsing state machine
Build an automated parsing state machine generator
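As an illustration of parsing on the fly, the sketch below is a hand-written byte-at-a-time state machine, not one of the generated state machines this slide refers to. It extracts only two fields a signature might need from an HTTP request line (the method and the URI filename), keeps no parse tree, and tolerates arbitrary packet boundaries. The class name and the choice of fields are assumptions.

```python
class RequestLineParser:
    """Extract only the method and URI filename from an HTTP request line."""

    METHOD, URI, DONE = range(3)

    def __init__(self):
        self.state = self.METHOD
        self.method = bytearray()
        self.uri = bytearray()

    def feed(self, data: bytes):
        """Consume the next segment of the byte stream."""
        for byte in data:
            if self.state == self.METHOD:
                if byte == 0x20:                  # space ends the method
                    self.state = self.URI
                else:
                    self.method.append(byte)
            elif self.state == self.URI:
                if byte in (0x20, 0x0D, 0x0A):    # space or CR/LF ends the URI
                    self.state = self.DONE
                else:
                    self.uri.append(byte)
            else:
                break   # nothing after the URI is needed by this parser

    def fields(self):
        path = bytes(self.uri).split(b"?", 1)[0]  # drop the query string
        filename = path.rsplit(b"/", 1)[-1]
        return bytes(self.method).decode("ascii"), filename.decode("ascii")


parser = RequestLineParser()
parser.feed(b"POST /_vti_bin/fp4")                # request split across two packets
parser.feed(b"0reg.dll HTTP/1.1\r\nHost: x\r\n")
print(parser.fields())
```

Per-connection state is just the current state plus the fields still being collected, which is the kind of footprint the special-purpose design aims for.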

Outline

Motivation
High Speed Matching for Large Rulesets.
High Speed Parsing
Evaluation
Research Contributions

33

Evaluation Methodology
Fully implemented prototype
12,000 lines of C++ and 3,000 lines of Python
Released at: www.nshield.org
Deployed at a university DC with up to 106Mbps

26GB+ traces from Tsinghua Univ. (TH), Northwestern (NU) and DARPA
Run on a P4 3.8GHz single-core PC w/ 4GB memory
Measured after TCP reassembly, with the PDUs preloaded in memory
For HTTP we have 794 vulnerability signatures which cover 973 Snort rules
For WINRPC we have 45 vulnerability signatures which cover 3,519 Snort rules
34

Parsing Results

Trace                                TH DNS  TH WINRPC  NU WINRPC  TH HTTP  NU HTTP  DARPA HTTP
Avg flow len (B)                     77      879        596        6.6K     55K      2.1K
Throughput (Gbps), Binpac            0.31    1.41       1.11       2.10     14.2     1.69
Throughput (Gbps), our parser        3.43    16.2       12.9       7.46     44.4     6.67
Speed up ratio                       11.2    11.5       11.6       3.6      3.1      3.9
Max. memory per connection (bytes)   15      15         15         14       14       14

35

Matching Results

Trace                                TH WINRPC  NU WINRPC  TH HTTP  NU HTTP  DARPA HTTP
Avg flow length (B)                  879        596        6.6K     55K      2.1K
Throughput (Gbps), sequential        10.68      9.23       0.34     2.37     0.28
Throughput (Gbps), CS matching       14.37      10.61      2.63     17.63    1.85
Matching only time speed up ratio    -          1.8        11.3     11.7     8.8
Avg # of candidates                  1.16       1.48       0.033    0.038    0.0023
Max. memory per connection (bytes)   27         27         20       20       20

Callout: 8-core 11.0
36

Scalability and Accuracy Results
Rule scaling results
[Figure: throughput (Gbps, axis 0 to 3) vs. # of rules used (200 to 800); performance decreases gracefully]

Accuracy
Created two polymorphic WINRPC exploits which bypass the original Snort rules but are detected accurately by our scheme
For a 10-minute clean HTTP trace, Snort reported 42 alerts and NetShield reported 0 alerts; manual verification showed that all 42 alerts are false positives
37

Research Contribution
Make vulnerability signature a practical solution for NIDS/NIPS

            Regular Expression   Existing Vul. IDS   NetShield
Accuracy    Poor                 Good                Good
Speed       Good                 Poor                Good
Memory      Good                 ??                  Good
Coverage    Good                 ??                  Good

Multiple sig. matching: candidate selection algorithm
Parsing: parsing state machine

Build a better Snort alternative!

38

Future work
Client

Server
Network Security
Data Center Security

Web/WebSecurity

WebProphet [NSDI10]
WebShield

Social network security


39

Q&A
Thanks!
40

Observations
PDU parse tree
Leaf nodes are numbers or strings
[Figure: PDU parse tree containing an array node]

General vs. Special Purpose
Keep the whole parse tree in memory vs. parsing and matching on the fly
Parse all the nodes in the tree vs. only signature-related fields (leaf nodes)
41

Efficient Parsing with State Machines


Studied eight protocols: HTTP, FTP, SMTP,
eMule, BitTorrent, WINRPC, SNMP and DNS
as well as their vulnerability signatures
Common relationships among leaf nodes
Automated parsing state machine
generator: UltraPAC

Pre-construct parsing state machines based on parse trees and vulnerability signatures
42

Example for WINRPC


Rectangles are states
Parsing variables: R0 .. R4
0.61 instruction/byte for BIND PDU

43

Experiences
Work in progress
In collaboration with MSR, apply semantic-rich analysis to cloud Web service profiling,
to understand why services are slow and how to improve them

Interdisciplinary research
Student mentoring (three undergraduates, six junior graduate students)

44

Future Work
Near term
Web security (browser security, web server security)
Data center security
High speed network intrusion prevention system with
hardware support

Long term research interests


Combating professional profit-driven attackers will be a continuous arms race
Online applications (including Web 2.0 applications)
become more complex and vulnerable.
Network speed keeps increasing, which demands
highly scalable approaches.
45

Research Contributions
Demonstrate vulnerability signatures can be
applied to NIDS/NIPS, which can significantly
improve the accuracy of current NIDS/NIPS
Propose the candidate selection algorithm for
matching a large number of vulnerability
signatures efficiently
Propose parsing state machine for fast
protocol parsing
Implement the NetShield
46

Comparing With Regex
Memory for 973 Snort rules: DFA 5.29GB (XFA: 863 rules, 1.08MB), NetShield 2.3MB
Per-flow memory: XFA 36 bytes, NetShield 20 bytes
Throughput: XFA 756Mbps, NetShield 1.9+Gbps
(*XFA [SIGCOMM08][Oakland08])
47

Measure Snort Rules


Semi-manually classify the rules.
1. Group by CVE-ID
2. Manually look at each vulnerability

Results
86.7% of rules can be improved by protocol semantic
vulnerability signatures.
Most of remaining rules (9.9%) are web DHTML and
scripts related which are not suitable for signature
based approach.
On average 4.5 Snort rules are reduced to one
vulnerability signature.
For binary protocol the reduction ratio is much higher
than that of text based ones.
For netbios.rules the ratio is 67.6.

48

Matcher order
Si+1 = merge(Si, Ai+1) + Bi+1
The merge with Ai+1 reduces Si+1; adding Bi+1 enlarges Si+1
Merging overhead: |Si| (use a hash table to check membership in Ai+1, O(1))
|Ai+1 + Bi+1| is fixed; putting the matcher later reduces Bi+1


49

Matcher order optimization


Worth buffering only if estmaxB(Mj)<=MaxB
For Mi in AllMatchers
Try to clear all the Mj in the buffer for which estmaxB(Mj)<=MaxB
Buffer Mi if estmaxB(Mi)>MaxB
When len(Buf)>Buflen, remove the Mj with minimum estmaxB(Mj)

50

51

Backup Slides

52

Motivation
Network security has been recognized as the single most important attribute of their networks, according to a survey of 395 senior executives conducted by AT&T
Many newly emerging threats make the situation even worse

53

Candidate merge operation
merge(Si, Ai+1), for each candidate in Si:
don't care matcher i+1: keep the candidate
require matcher i+1: keep the candidate only if it is in Ai+1

54

A Vulnerability Signature Example
Data representations
For all the vulnerability signatures we studied, we only need numbers and strings
Number operators: ==, >, <, >=, <=
String operators: ==, match_re(.,.), len(.)

Example signature for Blaster worm
BIND:
rpc_vers==5 && rpc_vers_minor==1 && packed_drep==\x10\x00\x00\x00
&& context[0].abstract_syntax.uuid=UUID_RemoteActivation
BIND-ACK:
rpc_vers==5 && rpc_vers_minor==1
CALL:
rpc_vers==5 && rpc_vers_minor==1 && packed_drep==\x10\x00\x00\x00
&& stub.RemoteActivationBody.actual_length>=40
&& matchRE(stub.buffer, /^\x5c\x00\x5c\x00/)
55

System Framework
[Figure: system framework, with components annotated Scalability; Accuracy & Scalability & Coverage; Accuracy & adapt fast; Accuracy & adapt fast]
56

Example of Vulnerability Signatures
At least 75% of vulnerabilities are due to buffer overflow
Sample vulnerability signature
Field length corresponding to the vulnerable buffer > certain threshold
Intrinsic to buffer overflow vulnerability and hard to evade
[Figure: a protocol message field overflowing the vulnerable buffer]

57

Old Slides

58

Conclusions
A novel network-based vulnerability signature matching engine
Through a measurement study on the Snort ruleset, show that vulnerability signatures can improve most of the signatures in NIDS/NIPS
Proposed a parsing state machine for fast parsing
Proposed a candidate selection algorithm for matching a large number of vulnerability signatures simultaneously
59

Outline

Motivation
Feasibility Study: a measurement approach
Problem Statement
High Speed Parsing
High Speed Matching for massive
vulnerability Signatures.
Evaluation
Conclusions
61


Limitations of Regular Expression Signatures
Signature: 10.*01
[Figure: traffic filtering between the Internet and our network; 1010101 and 10111101 match the signature and are blocked (X), 11111100 and 00010111 pass through]

Polymorphism!
A polymorphic attack (worm/botnet) might not have an exact regular-expression-based signature
65
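The filtering in this figure can be checked with a few lines, treating the four bit strings on the slide as packet contents. The function name is an illustrative assumption.

```python
import re

SIGNATURE = re.compile(r"10.*01")
PACKETS = ["1010101", "10111101", "11111100", "00010111"]


def split_traffic(packets):
    """Separate packets that match the signature from those that pass."""
    blocked = [p for p in packets if SIGNATURE.search(p)]
    passed = [p for p in packets if not SIGNATURE.search(p)]
    return blocked, passed


blocked, passed = split_traffic(PACKETS)
print(blocked, passed)
```

Only the first two strings contain "10" followed later by "01". Any variant of the attack that avoids that byte pattern passes the filter, which is the polymorphism problem the slide points out.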

What we do?
Build a NIDS/NIPS with much better accuracy and similar speed compared with regular-expression-based approaches
Feasibility: Snort ruleset (6,735 signatures) 86.7%
can be improved by vulnerability signatures.
High speed Parsing: 2.7~12 Gbps
High speed Matching:
Efficient Algorithm for matching massive vulnerability rules
HTTP, 791 vulnerability signatures at ~1Gbps

66

Problem Formulation
Parsing problem formulation
Given a PDU and the protocol specification as input, output the set of fields which are required by matching.

67

Publications

Zhichun Li, Lanjia Wang, Yan Chen and Zhi (Judy) Fu, Network-based and Attack-resilient Length Signature Generation for Zero-day Polymorphic Worms, in Proc. of IEEE ICNP 2007.

Robert Schweller, Zhichun Li, Yan Chen, Yan Gao, Ashish Gupta, Elliot Parsons, Yin Zhang, Peter Dinda, Ming-Yang Kao, and Gokhan Memik, Reversible sketches: Enabling monitoring and analysis over high speed data streams, in IEEE/ACM Transactions on Networking, Volume 15, Issue 5, Oct. 2007.

Zhichun Li, Manan Sanghi, Brian Chavez, Yan Chen and Ming-Yang Kao, Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience, in Proc. of IEEE Symposium on Security and Privacy, 2006.

Zhichun Li, Yan Chen and Aaron Beach, Towards Scalable and Robust Distributed Intrusion Alert Fusion with Good Load Balancing, in Proc. of ACM SIGCOMM LSAD 2006.

Yan Gao, Zhichun Li and Yan Chen, A DoS Resilient Flow-level Intrusion Detection Approach for High-speed Networks, in Proc. of IEEE ICDCS 2006.

Robert Schweller, Zhichun Li, Yan Chen, Yan Gao, Ashish Gupta, Elliot Parsons, Yin Zhang, Peter Dinda, Ming-Yang Kao, and Gokhan Memik, Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluations, and Applications, in Proc. of IEEE INFOCOM 2006.
68

Current Status

Part I: Sketch based monitoring & detection


Robert Schweller, Zhichun Li, Yan Chen, Yan Gao, Ashish Gupta, Elliot Parsons, Yin Zhang, Peter Dinda, Ming-Yang Kao, and Gokhan Memik, Reversible sketches: Enabling monitoring and analysis over high speed data streams, in IEEE/ACM Transactions on Networking, Volume 15, Issue 5, Oct. 2007
Robert Schweller, Zhichun Li, Yan Chen, Yan Gao, Ashish Gupta, Elliot Parsons, Yin Zhang, Peter Dinda, Ming-Yang Kao, and Gokhan Memik, Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluations, and Applications, in Proc. of IEEE INFOCOM 2006 (252/1400=18%)
Yan Gao, Zhichun Li and Yan Chen, A DoS Resilient Flow-level Intrusion Detection Approach for High-speed Networks, in Proc. of IEEE International Conference on Distributed Computing Systems (ICDCS) 2006 (75/536=14%)
(Alphabetical order)

Part II: Polymorphic worm signature generation

TOSG: Zhichun Li, Manan Sanghi, Brian Chavez, Yan Chen and Ming-Yang Kao, Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience, in Proc. of IEEE Symposium on Security and Privacy, 2006 (23/251=9%)
LESG: Zhichun Li, Lanjia Wang, Yan Chen and Zhi (Judy) Fu, Network-based and Attack-resilient Length Signature Generation for Zero-day Polymorphic Worms, in Proc. of IEEE International Conference on Network Protocols (ICNP) 2007 (32/220=14%)
69

Current Status
Part III: Signature matching engines
Work in progress, will be the focus of this talk
Zhichun Li, Gao Xia, Yi Tang, Jian Chen, Ying He, Yan Chen and Bin Liu, NetShield: Towards High Performance Network-based Semantic Signature Matching, in submission

Part IV: Network Situational Awareness
Work in progress
Zhichun Li, Anup Goyal, Yan Chen and Vern Paxson, Towards Situational Awareness of Large-Scale Botnet Events using Honeynets, in preparation
Zhichun Li, Anup Goyal, Yan Chen and Aleksandar Kuzmanovic, P2P Doctor: Measurement and Diagnosis of Misconfigured Peer-to-Peer Traffic, in submission
70

Current Status
Part I: Sketch based monitoring & detection
Result in [Infocom06,ToN,ICDCS06]

Part II: Polymorphic worm signature generation


Result in [Oakland06,ICNP07]

Part III: Signature matching engines


Work in progress, will be the focus of this talk

Part IV: Network Situational Awareness
Work in progress

71

Limitations of Exploit Based Signature
Signature: 10.*01
[Figure: traffic filtering between the Internet and our network; 1010101 and 10111101 match the signature and are blocked (X), 11111100 and 00010111 pass through]

Polymorphism!
A polymorphic worm might not have an exact exploit-based signature
72

Vulnerability Signature
[Figure: vulnerability-signature traffic filtering between the Internet and our network; traffic exploiting the vulnerability is blocked (X)]

Works for polymorphic worms
Works for all the worms which target the same vulnerability
73
