
Inputs, Characters and numbers

More user input


Input Validation (UTF-8 encoding)
 Assume:
 You are asked to write an application that must restrict users
to a specific subdirectory A/B/C/.
 In other words, your application gives users
access only to files in directory A/B/C/.
 The system (application) tries to enforce this
restriction by prepending the path A/B/C/ to every file name.
Input Validation (UTF-8 encoding)
 How it is supposed to work:
1. User enters filename as README
 full file name constructed as A/B/C/README
2. User enters filename as volume/home/docs/one
 full file name constructed as A/B/C/volume/home/docs/one
 How it is not supposed to work:
3. Attacker wants to bypass the restriction to reach /etc/passwd
a. Attacker uses ../ a few times to step up to the root directory first.
b. Attacker enters ../../../etc/passwd
c. full file name constructed as A/B/C/../../../etc/passwd
d. equivalent to /etc/passwd
e. The restriction has been bypassed!
Input Validation (UTF-8 encoding)
 Countermeasure:
– input validation: filter out ../
– dangerous approach! Why? The same input can be written in other ways (UTF-8 encoding),
– does not solve all problems of the same type
 Do not trust your inputs.
Unicode Characters

 Hexadecimal values in C have the prefix 0x;


 Hexadecimal values in a URL have the prefix %;
 UTF-8 encoding [RFC 2279] represents Unicode characters on
systems that were designed for ASCII.
A. ASCII characters (U0000-U007F) are represented by
ASCII bytes (0x00-0x7F)
B. Non-ASCII characters are represented by sequences of
non-ASCII bytes (0x80-0xF7)
Unicode characters UTF-8
 Standard – the table below gives the encoding rules
 the X bits are the least significant bits of the binary
representation of the Unicode character

From      To        UTF-8 byte sequence
U000000   U00007F   0XXXXXXX
U000080   U0007FF   110XXXXX 10XXXXXX
U000800   U00FFFF   1110XXXX 10XXXXXX 10XXXXXX
U010000   U10FFFF   11110XXX 10XXXXXX 10XXXXXX 10XXXXXX

 The UTF-8 encoding of the copyright sign ©:
U00A9 = 1010 1001 (A = 1010, 9 = 1001)
is 11000010 10101001 = 0xC2 0xA9
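The encoding rules in the table can be sketched in C as follows. This is a minimal illustration, not a complete Unicode library: the function name utf8_encode is an assumption made for this example, and the sketch does not reject surrogate code points.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode code point cp in the shortest UTF-8 form.
 * Returns the number of bytes written to out (1 to 4), or 0 if cp is
 * above U10FFFF. utf8_encode is an illustrative name. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp <= 0x7F) {                       /* 0XXXXXXX */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp <= 0x7FF) {                      /* 110XXXXX 10XXXXXX */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp <= 0xFFFF) {                     /* 1110XXXX 10XXXXXX 10XXXXXX */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                   /* 11110XXX 10XXXXXX 10XXXXXX 10XXXXXX */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For the copyright sign, utf8_encode(0xA9, out) writes the two bytes 0xC2 0xA9.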
Unicode Character UTF-8
 Several ways of encoding same character
 Both UTF-8 and UTF-16 data can contain combining characters.
Combining character support allows a resulting character to be
comprised of more than one character. After the first character, up
to 300 different non-spacing accent characters (umlauts, accents,
etc.) can follow in the data string. The resulting character may
already be defined in the character set. In this case, there are
multiple representations for the same character. For example, in
UTF-16, an é can be represented either by X'00E9' (the normalized
representation) or X'00650301' (the non-normalized combining
character representation).
 Only the shortest sequence is valid for any Unicode character,
but many UTF-8 decoders accept longer sequences:
the rule is not always checked.
Unicode Characters
 UTF-8 encoding of Unicode characters [RFC 2279]
 Multi-byte UTF-8 formats: a character has more than one
representation
 Example: “/”
format    pattern      binary       hex
1 byte    0xxx xxxx    0010 1111    2F
2 byte    110x xxxx    1100 0000    C0
          10xx xxxx    1010 1111    AF
3 byte    1110 xxxx    1110 0000    E0
          10xx xxxx    1000 0000    80
          10xx xxxx    1010 1111    AF
Exploit “Unicode bug”
 Vulnerability in Microsoft IIS; URL starting with
{IPaddress}/scripts/..%c0%af../winnt/system32/
 Translated to directory C:\winnt\system32
– The /scripts/ directory is usually C:\inetpub\scripts
– Because %c0%af is the (overlong, hence illegal) 2-byte UTF-8 encoding of /
– ..%c0%af../ becomes ../../
– ../../ steps up two levels in the directory
 IIS did not filter illegal Unicode representations using
multi-byte UTF-8 formats for single byte characters.
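A decoder can enforce the shortest-form rule by comparing the decoded value with the smallest value that legitimately needs that many bytes. The following is a sketch under stated assumptions: the name utf8_decode_strict and the return convention are made up for this example, and surrogates are not rejected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence from s (len bytes available).
 * Returns the sequence length (1 to 4) and stores the code point in *cp,
 * or returns -1 for malformed, truncated or overlong input. */
int utf8_decode_strict(const unsigned char *s, size_t len, uint32_t *cp)
{
    /* Smallest code point that needs n bytes, indexed by n. */
    static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t n, i;
    uint32_t v;

    assert(s != NULL && cp != NULL);
    if (len == 0) return -1;
    if (s[0] < 0x80)                { n = 1; v = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { n = 2; v = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { n = 3; v = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { n = 4; v = s[0] & 0x07; }
    else return -1;                 /* stray continuation or invalid lead byte */

    if (len < n) return -1;
    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80) return -1;   /* must be 10XXXXXX */
        v = (v << 6) | (s[i] & 0x3F);
    }
    if (v < min_cp[n] || v > 0x10FFFF) return -1;   /* overlong or out of range */
    *cp = v;
    return (int)n;
}
```

With this check the byte 2F decodes to "/", while C0 AF and E0 80 AF are rejected instead of being silently mapped to "/".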
Double Decode
 Consider URL starting with
{addr.}/scripts/..%25%32%66../winnt/system32/
 This URL is decoded to {addr.}/scripts/..%2f../winnt/system32/
– Convert %25%32%66 to Unicode: 00100101
00110010 01100110 → %2f ( = /)
 If the URL is decoded a second time,
it gets translated to directory
C:\winnt\system32
Beware of mistranslations (between
levels of abstraction) that change the
meaning of texts.
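The double-decode effect can be reproduced with a small percent-decoder. This is a sketch; percent_decode and hex_value are illustrative names, not functions of any particular web server.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Value of a hexadecimal digit, or -1 if c is not one. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* One pass of URL percent-decoding.
 * out must have room for strlen(in) + 1 bytes. */
void percent_decode(const char *in, char *out)
{
    assert(in != NULL && out != NULL);
    while (*in) {
        int hi, lo;
        if (in[0] == '%'
            && (hi = hex_value((unsigned char)in[1])) >= 0
            && (lo = hex_value((unsigned char)in[2])) >= 0) {
            *out++ = (char)(hi * 16 + lo);   /* %XX becomes one byte */
            in += 3;
        } else {
            *out++ = *in++;                  /* copy everything else */
        }
    }
    *out = '\0';
}
```

Decoding "..%25%32%66.." once gives "..%2f..", which contains no slash and passes a filter for "../". Decoding that result a second time gives "../..". A check is only meaningful if it is applied after the last decoding step.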
Unix rlogin
 Unix login command:
– login [[-p] [-h<host>] [[-f]<user>]
– -f option “forces” log in: user is not asked for password
 Unix rlogin command for remote login:
– rlogin [-l<user>] <machine>
– The rlogin daemon sends a login request for <user> to <machine>
 Attack (some versions of Linux, AIX):
– % rlogin -l -froot <machine>
 Results in forced login as root at the designated machine
– % login -froot <machine>
Unix rlogin
 Problem: Composition of two commands.
 Each command on its own is not vulnerable.
 However, rlogin does not check whether the “username”
has special properties when passed to login.
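One way to close this gap is for the calling program to reject any "username" that could be read as an option before passing it on. The following sketch assumes a simple policy (letters, digits, underscore and hyphen, starting with a letter); is_plain_username is a hypothetical helper, not part of rlogin.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Accept only names that start with a letter and contain nothing but
 * letters, digits, '_' and '-'. A name such as "-froot" is rejected
 * because it would be parsed as an option by the next command. */
bool is_plain_username(const char *name)
{
    assert(name != NULL);
    if (!isalpha((unsigned char)name[0]))
        return false;
    for (const char *p = name; *p; p++) {
        unsigned char c = (unsigned char)*p;
        if (!isalnum(c) && c != '_' && c != '-')
            return false;
    }
    return true;
}
```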
Programming with Integers
 In mathematics integers form an infinite set.
 On a computer system, integers are represented in binary.
 The representation of an integer is a binary string of fixed
length (precision), so there is only a finite number of
“integers”.
 Programming languages:
– signed & unsigned integers,
– short & long (& long long) integers, …
What will happen here? (32 bit
machine)

int i = 1;
while (i > 0)
{
i = i * 2;
}
Results
At the end of the loop the invariant i = 2^(c-1) holds
(and the loop has been executed c-1 times).
For 32-bit integers, after the 31st iteration we
have i = 2^31;
for signed integers this number is negative (on typical two's
complement machines; strictly, signed overflow is undefined
behaviour in C) and the loop terminates.
For unsigned integers, after one more iteration
we have i = 2^32 ≡ 0 (mod 2^32) and the loop also terminates.
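The unsigned case can be checked directly, because unsigned arithmetic in C is defined to wrap modulo 2^32 for a 32-bit type. The sketch below uses uint32_t and counts the iterations; doublings_until_zero is a name chosen for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Double a 32-bit unsigned value until it wraps around to 0 and
 * report how many iterations that took. */
unsigned doublings_until_zero(void)
{
    uint32_t i = 1;
    unsigned count = 0;

    while (i > 0) {
        i = i * 2;      /* wraps modulo 2^32, well defined for unsigned */
        count++;
    }
    return count;
}
```

The function returns 32: after 31 iterations i is 2^31, and one more doubling gives 2^32, which is 0 modulo 2^32.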
Computing with Integers
 Arithmetic on unsigned 8-bit integers is performed modulo 256.
255 + 1 = 0     16 × 17 = 16
0 – 1 = 255
 Signed 8-bit integers are represented as 2’s complement.
 Sign bit = 0: number is positive; sign bit = 1: number is negative.
127 + 1 = -128     -128 / -1 = -128 (the true result, 128, cannot be represented)
In mathematics: a + b ≥ a for b ≥ 0
As you can see, such obvious “facts” are
no longer true.
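The unsigned examples can be reproduced with uint8_t. In C the operands are first promoted to int, so the sketch casts the result back to 8 bits to get the modulo-256 value. The helper names are assumptions for this example.

```c
#include <assert.h>
#include <stdint.h>

/* 8-bit unsigned arithmetic: compute in int, then reduce modulo 256. */
uint8_t add_u8(uint8_t a, uint8_t b) { return (uint8_t)(a + b); }
uint8_t sub_u8(uint8_t a, uint8_t b) { return (uint8_t)(a - b); }
uint8_t mul_u8(uint8_t a, uint8_t b) { return (uint8_t)(a * b); }
```

Here add_u8(255, 1) is 0, mul_u8(16, 17) is 16 and sub_u8(0, 1) is 255. The "fact" a + b ≥ a also fails: add_u8(200, 100) is 44, which is smaller than 200.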
Two’s Complement
 Signed integers are usually represented as 2’s complement
numbers.
 Most significant bit (sign bit) indicates the sign of the integer:
– If sign bit is zero, the number is positive.
– If sign bit is one, the number is negative.
 Positive numbers given in normal binary representation.
 Negative numbers are represented as the binary number that
when added to a positive number of the same magnitude
equals zero.
Two’s complement Rule
To find the 8-bit two’s complement of a
positive integer a that is at most 255:
1. Write the 8-bit binary representation for a.
2. Flip the bits (that is, switch all the 1’s to 0’s
and all the 0’s to 1’s).
3. Add 1 in binary notation.
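Finding the two’s complement of 19
1. Write the 8-bit binary representation for 19: 00010011,
2. Switch all the 0’s to 1’s and all the 1’s to 0’s: 11101100,
3. and add 1: 11101101.
Checking your results
00010011 + 11101101 = 1 00000000 = 2^8, so 2^8 – 19 = 11101101 (237 in decimal),
which is the two’s complement of 19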
Finding a number with a given two’s
complement
To find the decimal representation of the
integer with a given 8-bit two’s complement:
1. Find the two’s complement of the given two’s
complement.
2. Write the decimal equivalent of the result.
Find a number given its two’s
complement
What is the decimal representation of the integer
with two’s complement 10101001?
10101001 (binary)
→ flip the bits: 01010110
→ add 1: 01010111 (binary) = 64 + 16 + 4 + 2 + 1 = 87 (decimal)
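The flip-and-add-one rule is short enough to write down in C. The sketch below also interprets an 8-bit pattern as a signed value by applying the same rule to patterns whose sign bit is set. Both function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* 8-bit two's complement: flip all bits, add 1, keep the low 8 bits. */
uint8_t twos_complement8(uint8_t a)
{
    return (uint8_t)(~a + 1u);
}

/* Value of an 8-bit pattern read as a signed two's complement number. */
int signed_value8(uint8_t bits)
{
    if (bits & 0x80)                          /* sign bit set: negative */
        return -(int)twos_complement8(bits);
    return bits;
}
```

twos_complement8(19) is 11101101 (0xED), and twos_complement8(0xA9) is 87, matching the worked example. Read as a signed 8-bit integer, the pattern 10101001 therefore stands for -87.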
Integer overflows
Conversion between integer representations:
– SECURITY PROBLEMS
 if (size < sizeof(buf)):
– compares a signed variable size with the result of sizeof(buf), which
has data type size_t, i.e. an unsigned integer.
 If size is negative, and if the compiler casts the result of
sizeof(buf) to a signed integer, the check passes and the buffer can overflow.
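The two functions below contrast a signed comparison with a check that handles the sign explicitly. This is a sketch: size_ok_flawed models the case described above by casting the buffer size to int, and both names are assumptions for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flawed: the comparison is done on signed values, so a negative
 * size passes the check. If size is later used as a size_t length,
 * it turns into a very large number. */
bool size_ok_flawed(int size, size_t buf_size)
{
    return size < (int)buf_size;
}

/* Safer: reject negative values first, then compare as unsigned. */
bool size_ok(int size, size_t buf_size)
{
    return size >= 0 && (size_t)size < buf_size;
}
```

For a 128-byte buffer, size_ok_flawed(-1, 128) accepts the input while size_ok(-1, 128) rejects it.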
INTEGER Truncation
Example - old UNIX vulnerability
program receives UID
checks UID not 0
UID of 0 is root (administrator)
UID later truncated to a shorter integer type
 (INPUT) UID of 0x10000 became 0x0000 (root)
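The truncation can be shown with a 32-bit UID that is stored in a 16-bit type after the check. The widths are assumptions chosen to match the value 0x10000 in the example; is_root_after_truncation is an illustrative name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of the flaw: the UID is checked at 32 bits elsewhere, but
 * stored in a 16-bit type here. Conversion keeps only the low 16 bits. */
bool is_root_after_truncation(uint32_t uid)
{
    uint16_t short_uid = (uint16_t)uid;
    return short_uid == 0;
}
```

The input 0x10000 is not 0, so the check "UID is not root" succeeds, yet is_root_after_truncation(0x10000) is true. Checks have to be made on the value in the type in which it is finally used.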
Reminder C/C++
 char * strncat ( char * destination, const char * source, size_t num );
 Append characters from string
 Appends the first num characters of source to destination,
plus a terminating null-character.
 char * strncpy ( char * destination, const char * source, size_t num );
 Copy characters from string
 Copies the first num characters of source to destination.
 If the end of the source C string (which is signaled by a null-
character) is found before num characters have been copied,
 then destination is padded with zeros until a total of num characters
have been written to it.
 No null-character is implicitly appended at the end of destination if
source is longer than num.
 Thus, in this case, destination shall not be considered a null
terminated C string (reading it as such would overflow).
 destination and source shall not overlap (see memmove for a safer
alternative when overlapping).
Code Example 2
 OS kernel system-call handler; checks string lengths to
defend against buffer overruns.
char buf[128];

combine(char *s1, size_t len1,
        char *s2, size_t len2)
{
    if (len1 + len2 + 1 <= sizeof(buf)) {
        strncpy(buf, s1, len1);
        strncat(buf, s2, len2);
    }
}

Attack: len1 < sizeof(buf), len2 = 0xffffffff
len2 + 1 = 2^32 - 1 + 1 = 0 mod 2^32
so the check passes and strncat will be executed

Example from Markus Kuhn’s lecture notes
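A possible repair, sketched below, avoids adding the attacker-controlled lengths at all: each length is compared with the space that remains, so no intermediate result can wrap. This is one way to write the check, not the fix from the lecture notes. The name combine_checked is made up, and the sketch assumes s1 and s2 really point to at least len1 and len2 bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static char buf[128];

/* Concatenate len1 bytes of s1 and len2 bytes of s2 into buf.
 * Returns false, leaving buf untouched, if the result plus the
 * terminating null character would not fit. */
bool combine_checked(const char *s1, size_t len1,
                     const char *s2, size_t len2)
{
    assert(s1 != NULL && s2 != NULL);
    /* sizeof(buf) - len1 cannot wrap once len1 < sizeof(buf) holds. */
    if (len1 >= sizeof(buf) || len2 >= sizeof(buf) - len1)
        return false;
    memcpy(buf, s1, len1);
    memcpy(buf + len1, s2, len2);
    buf[len1 + len2] = '\0';
    return true;
}
```

With len2 = 0xffffffff the second comparison fails and nothing is copied.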


Array
 You are given an array
starting at memory location
0xBBBB (on a 16-bit
machine).
 Array elements are single
words.
 Which index do you write to
so that memory location
0x8000 is overwritten?
 You also must check lower
bounds for array indices.

Figure values:
base     0xBBBB   48059
index    0xC445   -15291
target   0x8000   32768
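The address arithmetic in the figure can be verified with 16-bit types. The sketch assumes, as the slide does, that memory is addressed in words, so the index is added to the base without scaling. Both function names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Address of element `index` on a 16-bit machine: base + index,
 * reduced modulo 2^16. */
uint16_t element_address(uint16_t base, int16_t index)
{
    return (uint16_t)(base + (uint16_t)index);
}

/* A complete bounds check tests the lower bound as well as the upper. */
bool index_in_bounds(int16_t index, uint16_t length)
{
    return index >= 0 && (uint16_t)index < length;
}
```

element_address(0xBBBB, -15291) is 0x8000: the negative index reaches memory below the array, which a check of the upper bound alone does not catch.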
Canonicalization
 Canonicalization: the process that determines how various
equivalent forms of a name are resolved to a single standard
name.
 The single standard name is also known as the canonical
name.
 In general, an issue whenever an object has different but
equivalent representations;
– Example: XML documents
 Canonicalization must be idempotent.
Napster File Filtering
 Napster was ordered by court to block access to certain songs.
 Napster implemented a filter that blocked downloads based
on the name of the song.
 Napster users found a way around by using variations of the
name of songs.
 This is a particularly difficult problem because the users
decide which names are equivalent.
Case-sensitive?
 Security mechanism is case sensitive:
– MYFILE is different from MyFile
 File system is case-insensitive:
– MYFILE is the same as MyFile
 Permissions are defined for one version of the name only:
Myfile.
– Attacker requests access to another version myfile.
– The security mechanism grants the request.
– The file system gives access to the resource that should have been
protected.
 Vulnerability in Apache web server with HFS+
Directory Traversal
 An application may try to keep users in a specific directory.
 Attack: walk out of the directory using ../; attack may try to
hide “..” by using alternative UTF-8 encodings.
 Relative file names: system starts from a list of predefined
directories to look for the file.
 Attack: put malicious code in a directory that is searched
before the directory used by the application being attacked.
 Don’t filter for patterns, filter for results.
CANONICALIZATION
 PERFORM CANONICALIZATION BEFORE MAKING ACCESS
CONTROL DECISIONS.
 Canonicalization = resolving the various equivalent names to a
single standard name.
 Do not rely on the names received as user input.
 Convert them correctly to your standard representation.
 Do not let the system generate the full pathnames
automatically.
 Preferably, do not make decisions based on names at the
application level, but use the operating system access
control.
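"Filter for results" can be illustrated for the A/B/C/ example with a check that follows the path component by component and tracks how deep below the base directory it ends up, instead of searching the input for the pattern "../". This is a purely lexical sketch with a hypothetical name, stays_inside_base. It must run after all decoding steps, and it knows nothing about symbolic links; on POSIX systems realpath() resolves those as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if the relative path rel, appended to a base directory,
 * can never step above that base directory. */
bool stays_inside_base(const char *rel)
{
    int depth = 0;      /* number of directories below the base */

    assert(rel != NULL);
    if (rel[0] == '/')
        return false;   /* absolute paths ignore the base entirely */

    while (*rel) {
        size_t n = strcspn(rel, "/");           /* length of this component */
        if (n == 2 && rel[0] == '.' && rel[1] == '.') {
            if (--depth < 0)
                return false;                   /* stepped above the base */
        } else if (!(n == 1 && rel[0] == '.')) {
            depth++;                            /* ordinary name: one level down */
        }
        rel += n;
        while (*rel == '/')                     /* skip separators */
            rel++;
    }
    return true;
}
```

README and volume/home/docs/one are accepted, ../../../etc/passwd is rejected, and docs/../README is accepted because the result is still inside the base directory.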
Memory configuration
 Stack: contains return address, local
variables and function arguments;
relatively easy to decide in advance
where a particular buffer will be placed
on the stack.
 Heap: dynamically allocated memory;
more difficult but not impossible to
decide in advance where a particular
buffer will be placed on the heap.
[Figure: memory from address FFFF (top) down to 0000: stack, memory, heap, libraries]
Variables
 Buffer: concrete implementation of a variable.
 If the value assigned to a variable exceeds the size of the
allocated buffer, memory locations not allocated to this
variable are overwritten.
 If the memory location overwritten had been allocated to
some other variable, the value of that other variable is
changed.
 Depending on circumstances, an attacker can change the
value of a sensitive variable A by assigning a deliberately
malformed value to some other variable B.
Buffer Overruns
 Unintentional buffer overruns crash software, and have
been a focus for reliability testing.
 Intentional buffer overruns are a concern if an attacker
can modify security relevant data.
 Attractive targets are return addresses (specify the next
piece of code to be executed) and security settings.
 In languages like C or C++ the programmer allocates and
de-allocates memory.
 Type-safe languages like Java guarantee that memory
management is ‘error-free’.
Buffer Overrun (1980s)
 Login in one version of Digital’s VMS operating system: to log
in to a particular machine, enter
username/DEVICE =<machine>
 The length of the argument ‘machine’ was not checked;
 a device name of more than 132 bytes overwrote the
privilege mask of the process started by login;
 users could thus set their own privileges.
System Stack
 Function call: stack frame containing function arguments,
return address, statically allocated buffers pushed on the
stack.
 When the call returns, execution continues at the return
address specified.
 Stack usually starts at the top of memory and grows
downwards.
 Layout of stack frames is reasonably predictable.
Stack Frame – Layout
[Figure: stack frame layout]
argument n
...
argument 1
saved EIP (extended instruction pointer: the return address)
saved EBP (extended base pointer: reference point for relative addressing, a.k.a. frame pointer)
local variables
Stack overrun
Original stack frame      Stack frame after buffer overrun
Input to c                Input to c
Input to b                Input to b
Input to a                Input to a
Return address            Bad return address
Saved frame pointer       (overwritten by the value assigned to y)
Local variable x          (overwritten by the value assigned to y)
Local variable y          Overrun the buffer for y
Stack-based Overflows
 Find a buffer on the runtime stack of a privileged program
that can overflow the return address.
 Overwrite the return address with the start address of the
code you want to execute.
 Your code is now privileged too.

[Figure: the stack holds the buffer for variable A and, above it, the return
address. Writing value1|value2|my_address to A fills the buffer and
overwrites the return address with my_address.]
Code Example
 Declare a local short string variable
char buffer[80];
use the standard C library routine call
gets(buffer);
to read a single text line from standard input and save it into
buffer.
 Works fine for normal-length lines, but corrupts the stack if
the input is longer than 79 characters.
 Attacker loads malicious code into buffer and redirects return
address to start of attack code.
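A bounded replacement for gets() can be sketched with fgets(), which never writes more than the given number of bytes. The wrapper below is an illustration with a made-up name, read_line; it also reports lines that did not fit instead of silently splitting them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Read one line from in into buffer (size bytes), without the newline.
 * Returns true if a complete line was stored. Returns false at end of
 * input, or if the line was too long; in that case the first size-1
 * characters are stored and the rest of the line is discarded. */
bool read_line(FILE *in, char *buffer, size_t size)
{
    int c;
    size_t extra = 0;
    size_t len;

    assert(in != NULL && buffer != NULL);
    if (size == 0 || fgets(buffer, (int)size, in) == NULL)
        return false;

    len = strlen(buffer);
    if (len > 0 && buffer[len - 1] == '\n') {
        buffer[len - 1] = '\0';
        return true;
    }
    /* No newline stored: skip to the end of the line and count
     * how many characters did not fit. */
    while ((c = fgetc(in)) != '\n' && c != EOF)
        extra++;
    return extra == 0;
}
```

With char buffer[80], read_line(stdin, buffer, sizeof buffer) stores at most 79 characters plus the terminating null character, whatever the length of the input.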
Shellcode
 Overwrite return address so that execution jumps to the
attack code (‘shellcode’).
 Where to put the shellcode?
 Shellcode may be put on the stack as part of the malicious
input; a.k.a. argv[]-method.
– To guess the location, guess distance between return address and
address of the input containing the shellcode.
 Details e.g. in Smashing the Stack for Fun and Profit.
 return-to-libc method: the attack changes the control flow so that
a system library function is called; no shellcode is inserted.
Heap Overruns
 More difficult to determine how to overwrite a specific buffer.
 More difficult to determine which other buffers will be
overwritten in the process; if you are an attacker, you may not
want to crash the system before you have taken over.
 Even attacks that do not succeed all the time are a threat.
 Can overwrite filenames and function pointers, and mess up
memory management.
Memory Allocation
Overwriting Pointers
 Modify return address with buffer overrun on stack.
– Attacker can fairly easily guess the location of this pointer relative to a
vulnerable buffer.
– Defender knows which target to protect.
 More powerful attack: overwrite arbitrary pointer with an
arbitrary value.
 More targets, hence more difficult to defend against.
 Attacker does not even have to overwrite the pointer!
 Attacker can lure the operating system into reading
malformed input and then do the job for the attacker.
Managing Memory in C
 Allocating memory:
– void * malloc (size_t size)
– Returns pointer to newly allocated block of size bytes.
– Contents of the block are not initialized.
– Returns null pointer if block cannot be allocated.
 Deallocating memory:
– void free (void *ptr)
– ptr must have been returned by a previous call to malloc(),
calloc() or realloc().
– If ptr is null, no operation is performed.
– Behaviour undefined if free(ptr) has already been called.
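Because free() on a null pointer does nothing, a common defensive habit is to clear a pointer as soon as its block has been released. The helper below is a sketch of that habit with an illustrative name. It only protects the one pointer variable passed in; other copies of the pointer remain dangling.

```c
#include <assert.h>
#include <stdlib.h>

/* Free the block *p points to and set *p to NULL, so that a second
 * call with the same variable is a harmless no-op. */
void free_and_clear(char **p)
{
    assert(p != NULL);
    free(*p);
    *p = NULL;
}
```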
Memory Organization
 Case study: Doug Lea malloc.
 Memory divided into chunks.
 A chunk contains user data and control data.
 Chunks allocated by malloc contain boundary tags.
 Free chunks are placed in bins; a bin is a doubly linked list.
– Several bins, for chunks of different sizes.
 Free chunks contain boundary tags and forward and backward
pointer to their neighbours in the bin.
Allocated and Free Chunks
[Figure: layout of an allocated chunk and a free chunk]
offset   allocated chunk       free chunk
0        prev_size             prev_size
4        size | PREV_INUSE     size
8        user data             fd
12       user data             bk
...      user data             unused (previously used for user data)
prev_size = size of previous chunk; size = size of chunk
Control Flags
 Values for size always given in multiples of 8.
 Three last bits of size may be used for control flags:
– PREV_INUSE 0x1
– IS_MAPPED 0x2
– Some libraries also use the third bit.
 When a chunk is freed it is coalesced into a single chunk with
neighbouring free chunks.
 No adjacent free chunks in memory.
 Technicality: prev_size not used when the previous chunk is
allocated.
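Since sizes are multiples of 8, the size and the flags can share one field and are separated with bit masks. The sketch below uses the flag values from the list above; the function names and the mask name are assumptions for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PREV_INUSE 0x1u
#define IS_MAPPED  0x2u
#define FLAG_MASK  0x7u     /* the three low bits of the size field */

/* Chunk size without the control flags. */
size_t chunk_size(size_t size_field)
{
    return size_field & ~(size_t)FLAG_MASK;
}

/* True if the previous chunk in memory is allocated. */
bool prev_in_use(size_t size_field)
{
    return (size_field & PREV_INUSE) != 0;
}
```

A size field of 0x19 describes a chunk of 0x18 bytes whose predecessor is in use.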
Coalescing Chunks
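[Figure: coalescing chunks]
chunk C (allocated):     user data | size C, PREV_INUSE = 0x0 | prev_size
chunk B (free):          unused | bk | fd | size B, PREV_INUSE = 0x1 | prev_size
chunk A (to be freed):   user data | size A | prev_size
Steps when A is freed:
1. From A, go to the next chunk (B) using size as offset.
2. From B, go to the next chunk (C) using size as offset and check PREV_INUSE.
3. PREV_INUSE = 0 in C, so B is free: merge chunks A and B and remove B from its bin.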
Double free vulnerability

Details
Double free vulnerability
The memory allocator divides the heap memory
at its disposal into contiguous chunks, which vary
in size as the various allocation routines (malloc,
free, realloc, . . . ) are called.
An invariant is that a free chunk never borders
another free chunk when one of these routines
has completed:
 if two free chunks had bordered, they would
have been coalesced into one larger free chunk.
Double free vulnerability
 These free chunks are kept in sorted doubly linked
lists
 of chunks of the same size (for small chunks)
 or of an interval of sizes (for larger chunks),
 which are accessed via an array (whose elements are
called bins).
 When the memory allocator at a later time requests a
chunk of the same size as one of these free chunks,
 the first chunk of appropriate size will be removed
from the list and will be made available for use in the
program (i.e. it will turn into an allocated chunk).
Double free
 Chunk1 is an allocated chunk containing information
about: the size of the chunk stored before it and its own
size.
 The rest of this chunk is available for the program to
write data in.
 Chunk2 is a free chunk: it is stored in the doubly linked
list of chunks of equal size.
 The information for this list is stored over the first 8 bytes
of what was user data when the chunk was allocated.
 This particular chunk is located between chunk3 and
chunk4 in the list.
Double free
Chunk3 is the first chunk in the chain:
• its backward pointer points to chunk2
• and its forward pointer points to a previous chunk in
the list.
Chunk2 is the next chunk,
• with its forward pointer pointing to chunk3
• and its backward pointer pointing to chunk4.
Chunk4 is the last chunk in our example:
• its backward pointer points to a next chunk in the list
• and its forward pointer points to chunk2.
Double free vulnerability
 The size of allocated chunks is always a multiple of
eight, so the three least significant bits of the size field
are used for management information:
– a bit to indicate if the previous chunk is in use or not
– and one to indicate if the memory is mapped or not.
– The last bit is currently unused.
 The "previous chunk in use"-bit can be modified by an
attacker to force coalescing of chunks.
 The representation of chunk2 is not entirely correct:
 if chunk1 is in use, it will be used to store 'user data'
for chunk1 and not the size of chunk1.
Exploitation- what may happen if an array that is
located in chunk1 is overflowed.
Double free
An attacker has overwritten the management
information of chunk2.
The size fields are left unchanged (although
these could be modified if needed).
The forward pointer has been changed to
point to 12 bytes before the return address
and the backward pointer has been changed
to point to code that will jump over the next
few bytes.
Double free
 When chunk1 is subsequently freed, it will be coalesced
together with chunk2 into a larger chunk.
 As chunk2 will no longer be a separate chunk after the
coalescing it must first be removed from the list of free
chunks.
 The unlink macro takes care of this: internally a free chunk is
represented by a structure containing the following unsigned
long integer fields (in this order):
– prev_size,
– size,
– fd
– and bk.
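The effect of unlinking a chunk can be sketched with the structure just described. This is a simplified model in the style of the classic dlmalloc unlink step, not the code of any current allocator; unlink_chunk is written as a function for readability, and size_t stands in for the unsigned long fields.

```c
#include <assert.h>
#include <stddef.h>

struct chunk {
    size_t prev_size;
    size_t size;
    struct chunk *fd;   /* forward pointer  */
    struct chunk *bk;   /* backward pointer */
};

/* Remove p from its doubly linked list. */
void unlink_chunk(struct chunk *p)
{
    struct chunk *FD;
    struct chunk *BK;

    assert(p != NULL);
    FD = p->fd;
    BK = p->bk;
    FD->bk = BK;    /* writes the value BK into the bk field of FD */
    BK->fd = FD;    /* writes the value FD into the fd field of BK */
}
```

Both assignments write through pointers taken from the chunk itself. If an attacker controls fd and bk, the first assignment stores a chosen value (bk) at a chosen address (fd plus the offset of the bk field). With 4-byte fields that offset is 12, which is why the forged forward pointer in the example points to 12 bytes before the return address.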
Double free vulnerability
[Figure: three free chunks, chunk1, chunk2 and chunk3, drawn from lower to
higher addresses. Each chunk has the same layout:]
Size of previous chunk
Size of chunk
Forward ptr
Backward ptr
Old user data

Chunk1 is the biggest free chunk.
Chunk 2 and 3 have equal size and are smaller.
3 chunks of equal size
What if a new chunk of equal size to
chunk 2/chunk3 is freed?
New_chunk will be placed at the beginning of the
list of chunks of the same size.
The backward pointer of chunk 1 will be modified to
point to new_chunk
The forward pointer of chunk_2 will be modified to
point to new_chunk.
The forward pointer of new_chunk will be chunk1
The backward pointer of new_chunk will be chunk2.
A new chunk of equal size arrives
 BK = front of the list of chunks of the same size
 FD = BK->fd
 FD is the forward pointer of the front of the list
 new_chunk->bk = BK;
 the new chunk’s backward ptr is set to BK (chunk2)
 new_chunk->fd = FD;
 the new chunk’s forward ptr is set to chunk1 (chunk2->fd =
chunk1)
 FD->bk = BK->fd = new_chunk
 The backward pointer of FD (chunk1->bk) and the forward pointer
of BK (chunk2->fd) are set to new_chunk.
What if the new chunk is chunk2?
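The question can be answered by running the insertion steps on a small model. The sketch keeps only the two list pointers of a chunk; link_chunk is an illustrative name for the insertion steps, not an allocator function.

```c
#include <assert.h>
#include <stddef.h>

struct chunk {
    struct chunk *fd;   /* forward pointer  */
    struct chunk *bk;   /* backward pointer */
};

/* Insert new_chunk behind the chunk at the front of the list. */
void link_chunk(struct chunk *front, struct chunk *new_chunk)
{
    struct chunk *BK;
    struct chunk *FD;

    assert(front != NULL && new_chunk != NULL);
    BK = front;
    FD = BK->fd;
    new_chunk->bk = BK;
    new_chunk->fd = FD;
    FD->bk = new_chunk;
    BK->fd = new_chunk;
}
```

If chunk2 is at the front of the list and is freed a second time, the call is link_chunk(&chunk2, &chunk2). Afterwards chunk2.fd and chunk2.bk both point to chunk2 itself. The chunk is then handed out by a later allocation while it is still in the free list, so data written by the program lands on the list pointers that a later unlink will follow.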
Data and Code
Scripting
 Scripting languages used to construct commands (scripts)
from predefined code fragments and user input.
 Script is then passed to another software component where it
is executed.
 Attacker may hide additional commands in user input.
 Defender has to check and sanitize user inputs.
 Both have to be aware of certain technical details of the
component executing the script:
– Symbols that terminate command parameters.
– Symbols that terminate commands.
– Dangerous commands, e.g. commands for executing the commands
they receive as input ( eval, exec, system, …).
Scripting
 Scripting languages: Perl, PHP, Python, Tcl, Safe-Tcl, JavaScript,

 Example: A CGI script for a Unix server that sends file to
clientaddress:
cat file | mail clientaddress
 With the “mail address” to@me | rm -rf / as input the server
executes
cat file | mail to@me | rm -rf /
 After mailing the file to@me, all files the script has permission
to delete are deleted.
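A whitelist check applied before the address is placed into the command could look like the sketch below. The set of permitted characters is an assumption made for this example and is stricter than real mail address syntax; is_safe_mail_address is a hypothetical helper.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Accept only addresses built from letters, digits and a few harmless
 * symbols. Shell metacharacters such as | ; & and spaces are rejected,
 * and so is a leading '-', which could be read as an option. */
bool is_safe_mail_address(const char *addr)
{
    assert(addr != NULL);
    if (addr[0] == '\0' || addr[0] == '-')
        return false;
    for (; *addr; addr++) {
        unsigned char c = (unsigned char)*addr;
        if (!isalnum(c) && c != '@' && c != '.' &&
            c != '_' && c != '-' && c != '+')
            return false;
    }
    return true;
}
```

The input to@me passes, while to@me | rm -rf / is rejected because of the spaces, the pipe symbol and the slash.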
SQL Injection
 Strings in SQL commands placed between single quotes.
 Example query from SQL database:
$sql = "SELECT * FROM client WHERE name = '$name'"
 Intention: insert legal user name like 'Bob' into query.
 Attacker enters as user name: Bob' OR 1=1 --
 SQL command becomes
SELECT * FROM client WHERE name = 'Bob' OR 1=1 --'
 Because 1=1 is TRUE, name = 'Bob' OR 1=1 is TRUE, and the
entire client database is selected; -- is a comment erasing
anything that would follow.
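The same string construction can be reproduced in C to see exactly what the database receives. The function below is deliberately vulnerable and is shown only to make the effect visible; build_query is an illustrative name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* VULNERABLE on purpose: pastes the user-supplied name into the query
 * text. Returns the snprintf result (length of the full query). */
int build_query(char *out, size_t size, const char *name)
{
    assert(out != NULL && name != NULL);
    return snprintf(out, size,
                    "SELECT * FROM client WHERE name = '%s'", name);
}
```

For the name Bob the result is a query for one client. For the name Bob' OR 1=1 -- the quote supplied by the attacker closes the string early, and the text after it is interpreted as SQL rather than as data.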
SQL Injection
 Countermeasures against code injection:
– Input validation: make sure that no unsafe input is used in the
construction of a command.
– Change the modus operandi: modify the way commands are
constructed and executed so that unsafe input can do no harm.
 Parametrized queries with bound parameters (DBI
placeholders in Perl) follow the second approach.
– Scripts compiled with placeholders instead of user input.
– Commands called by transmitting the name of the procedure and the
parameter values.
– During execution, placeholders are replaced by the actual input.
 This defence does not work for parametrized procedures
containing eval() statements that accept user inputs as
arguments.
Race Conditions
Race Conditions
 Multiple computations access shared data in a way that their
results depend on the sequence of accesses.
– Multiple processes accessing the same variable.
– Multiple threads in multi-threaded processes (as in Java servlets).
 An attacker can try to change a value after it has been
checked but before it is being used.
 TOCTTOU (time-of-check-to-time-of-use) is a well-known
security issue.
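For files, a common way to narrow the window between check and use is to open the file first and then check the open file descriptor, so that check and use refer to the same object. The sketch below uses the POSIX calls open(), fstat() and close(); open_regular_file is a name chosen for this example, and the policy (regular files only, no symbolic link as the last path component) is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <unistd.h>

/* Open path for reading and verify the opened object.
 * Returns a file descriptor, or -1 if the file cannot be opened or is
 * not a regular file. The check uses fstat() on the descriptor, so the
 * name cannot be redirected between the check and the use. */
int open_regular_file(const char *path)
{
    struct stat st;
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDONLY | O_NOFOLLOW);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0 || !S_ISREG(st.st_mode)) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The racy alternative checks the name with stat() or access() and opens the name afterwards; an attacker who can replace the file in between gets the check applied to one object and the access to another.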
Example – CTSS (1960s)
 Password file shown as message of the day.
 Every user had a unique home directory.
 When a user invoked the editor, a scratch file with fixed
name SCRATCH was created in this directory .
 Innovation: several users may work concurrently as system
manager.
Race Conditions
Step                   M-o-D    Passwd   SCRATCH
User1 edits M-o-D      hello    EsxT9    hello
User2 edits passwd     hello    EsxT9    EsxT9
User1 saves M-o-D      EsxT9    EsxT9    EsxT9

The abstraction ‘atomic transaction’ has been broken.


Defences
Prevention – Hardware
 Hardware features can stop buffer overflow attacks from
overwriting control information.
 For example, a secure return address stack (SRAS) could
protect the return address.
 Separate register for the return address in Intel’s Itanium
processor.
 With protection mechanisms at the hardware layer there is
no need to rewrite or recompile programs; only some
processor instructions have to be modified.
 Drawback: existing software, e.g. code that uses multi-threading,
may no longer work.
Prevention – Non-executable Stack
 Stops attack code from being executed from the stack.
 Memory management unit configured to disable code
execution on the stack.
 Not trivial to implement if existing O/S routines are
executing code on the stack.
 General issue – backwards compatibility: security measures
may break existing code.
 Attackers may find ways of circumventing this protection
mechanism.
Prevention – Safer Functions
 C is infamous for its unsafe string handling functions:
strcpy, sprintf, gets, …
 Example: strcpy
char *strcpy( char *strDest, const char *strSource );
– Exception if source or destination buffer are null.
– Undefined if strings are not null-terminated.
– No check whether the destination buffer is large enough.
Prevention – Safer Functions
 Replace unsafe string functions by functions where the
number of bytes/characters to be handled are specified:
strncpy, _snprintf, fgets, …
 Example: strncpy
char *strncpy( char *strDest, const char
*strSource, size_t count );
 You still have to get the byte count right.
– Easy if data structure used only within a function.
– More difficult for shared data structures.
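Since strncpy does not always terminate the destination, one way to get a bounded copy that is always null-terminated is snprintf from standard C. The wrapper below is a sketch; copy_string is an illustrative name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Copy src into dest (dest_size bytes, at least 1). The result is
 * always null-terminated. Returns false if src had to be truncated. */
bool copy_string(char *dest, size_t dest_size, const char *src)
{
    int n;

    assert(dest != NULL && src != NULL && dest_size > 0);
    n = snprintf(dest, dest_size, "%s", src);
    return n >= 0 && (size_t)n < dest_size;
}
```

Passing sizeof dest as the second argument, where dest is an array in the calling function, keeps the byte count tied to the actual buffer. The caller still has to decide what a truncated string means for the application.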
Prevention – Filtering
 Filtering inputs has been a recommended defence several
times before in this course.
 Whitelisting: Specify legal inputs; accept legal inputs,
block anything else.
– Conservative, but if you forget about some specific legal inputs a
legitimate action might be blocked.
 Blacklisting: Specify forbidden inputs; block forbidden
inputs, accept anything else.
– If you forget about some specific dangerous input, attacks may
still get through.
 Taint analysis: Mark inputs from untrusted sources as
tainted, stop execution if a security critical function gets
tainted input; sanitizing functions produce clean output
from tainted input.
Prevention – Filtering
 White lists work well when valid inputs can be characterized
by clear rules, preferably expressed as regular expressions.
 Filtering rules could also refer to the type of an input; e.g., an
is_numeric() check could be applied when an integer is
expected as input.
 Dangerous characters can be sanitized by using safe
encodings.
– E.g., in HTML <, > and & should be encoded as &lt;, &gt;, and
&amp;.
 Escaping places a special symbol, often backslash, in front of
the dangerous character.
– E.g., escaping single quotes will turn d’Hondt into d\’Hondt.
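The HTML encoding rule can be sketched as a small sanitizing function. It handles only the three characters named above; a real application may need to encode more (for example quotes inside attribute values). html_escape is an illustrative name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Write in to out with <, > and & replaced by &lt;, &gt; and &amp;.
 * Returns false if out (out_size bytes) is too small. */
bool html_escape(const char *in, char *out, size_t out_size)
{
    size_t used = 0;

    assert(in != NULL && out != NULL);
    if (out_size == 0)
        return false;
    for (; *in; in++) {
        char one[2] = { *in, '\0' };
        const char *rep;
        size_t n;

        switch (*in) {
        case '<': rep = "&lt;";  break;
        case '>': rep = "&gt;";  break;
        case '&': rep = "&amp;"; break;
        default:  rep = one;     break;
        }
        n = strlen(rep);
        if (n >= out_size - used)   /* keep room for the final '\0' */
            return false;
        memcpy(out + used, rep, n);
        used += n;
    }
    out[used] = '\0';
    return true;
}
```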
Detection – Canaries
 Detect attempts at overwriting the return address.
 Place a check value (‘canary’) in the memory location just
below the return address.
 Before returning, check that the canary has not been
changed.
 Stackguard: random canaries.
– Alternatives: null canary, terminator canary
 Source code has to be recompiled to insert placing and
checking of the canary.
Canaries

[Figure: a canary (check value) sits between the buffer for variable A and
the return address. Writing value1|value2|my_address to A overwrites the
canary with value2; since value2 ≠ check value, the attack is detected.]
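The detection idea can be simulated without touching a real stack. The sketch below models a frame as a byte array (an 8-byte buffer, then a 4-byte canary, then a 4-byte return address), performs an unchecked copy into the buffer and then compares the canary. The layout and all names are assumptions made for this illustration; real canaries are inserted by the compiler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { BUF_LEN = 8, CANARY_OFF = 8, RET_OFF = 12, FRAME_LEN = 16 };

/* Simulate writing len input bytes to the buffer at the start of a
 * frame and report whether the canary survived. */
bool frame_intact_after_write(const unsigned char *input, size_t len,
                              uint32_t canary)
{
    unsigned char frame[FRAME_LEN] = { 0 };
    uint32_t check;

    assert(input != NULL);
    memcpy(frame + CANARY_OFF, &canary, sizeof canary);   /* place canary */

    if (len > FRAME_LEN)
        len = FRAME_LEN;            /* keep the simulation itself in bounds */
    memcpy(frame, input, len);      /* unchecked copy: may pass BUF_LEN */

    memcpy(&check, frame + CANARY_OFF, sizeof check);     /* check before return */
    return check == canary;
}
```

An input of 8 bytes leaves the canary intact. A longer input has to cross the canary to reach the return address at offset 12, so the overwrite is noticed unless the attacker can guess the canary value, which is the reason for using random canaries.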
Detection – Code Inspection
 Code inspection is tedious: we need automation.
 K. Ashcraft & D. Engler: Using Programmer-Written Compiler
Extensions to Catch Security Holes, IEEE Symposium on
Security & Privacy 2002.
 Meta-compilation for C source code; ‘expert system’
incorporating rules for known issues: untrustworthy sources
→ sanitizing checks → trust sinks; raises alarm if
untrustworthy input gets to sink without proper checks.
 Code analysis to learn new design rules: Where is the sink
that belongs to the check we see?
 Microsoft has internal code inspection tools.
Detection – Testing
 White box testing: tester has access to source code.
 Black-box testing when source code is not available.
 You do not need source code to observe how memory is
used or to test how inputs are checked.
 Example: syntax testing of protocols based on formal
interface specification, valid cases, anomalies.
 Applied to SNMP implementations: vulnerabilities in trap
handling and request handling found
http://www.cert.org/advisories/CA-2002-03.html
– Found by Oulu University Secure Programming Group
http://www.ee.oulu.fi/research/ouspg/
Mitigation – Least Privilege
 Limit privileges required to run code; if code running with few
privileges is compromised, the damage is limited.
 Do not give users more access rights than necessary; do not
activate options not needed.
 Example – debug option in Unix sendmail: when switched on
at the destination, mail messages can contain commands that
will be executed on the destination system.
 Useful for system managers but need not be switched on all
the time; exploited by the Internet Worm of 1988.
Lesson Learned
 In the past, software was shipped in open configurations
(generous access permissions, all features activated); user
had to harden the system by removing features and
restricting access rights.
 Today, software often shipped in locked-down
configurations; users have to activate the features they
want to use.
Reaction – Keeping Up-to-date
 Information sources : CERT advisories, BugTraq at
www.securityfocus.com, security bulletins from software
vendors.
 Hacking tools use attack scripts that automatically search for
and exploit known types of vulnerabilities.
 Analysis tools following the same ideas will cover most real
attacks.
 Patching vulnerable systems is not easy: you have to get the
patches to the users and avoid introducing new vulnerabilities
through the patches.
Summary
 Many of the problems listed may look trivial.
 There is no silver bullet:
– Code-inspection: better at catching known problems, may raise
false alarms.
– Black-box testing: better at catching known problems.
– Type safety: guarantees from an abstract (partial) model need not
carry over to the real system.
 Experience in high-level programming languages may be a
disadvantage when writing low level network routines.
