Module 1, Strings
This material is approved for public release. Distribution is limited by the Software Engineering Institute to attendees.
2011 Carnegie Mellon University. This material is distributed by the SEI only to course attendees for their own individual study. Except for the U.S. government purposes described below, this material SHALL NOT be reproduced or used in any other manner without requesting formal permission from the Software Engineering Institute at permission@sei.cmu.edu. This material was created in the performance of Federal Government Contract Number FA872105-C-0003 with Carnegie Mellon University for the operation of the Software Engineering Institute, a federally funded research and development center. The U.S. Government's rights to use, modify, reproduce, release, perform, display, or disclose this material are restricted by the Rights in Technical Data-Noncommercial Items clauses (DFAR 252-227.7013 and DFAR 252-227.7013 Alternate I) contained in the above identified contract. Any reproduction of this material or portions thereof marked with this legend must also reproduce the disclaimers contained on this slide. Although the rights granted by contract do not require course attendance to use this material for U.S. Government purposes, the SEI recommends attendance to ensure proper understanding. THE MATERIAL IS PROVIDED ON AN AS IS BASIS, AND CARNEGIE MELLON DISCLAIMS ANY AND ALL WARRANTIES, IMPLIED OR OTHERWISE (INCLUDING, BUT NOT LIMITED TO, WARRANTY OF FITNESS FOR A PARTICULAR PURPOSE, RESULTS OBTAINED FROM USE OF THE MATERIAL, MERCHANTABILITY, AND/OR NON-INFRINGEMENT).
String Agenda
Strings
Common errors using NTBS
Common errors using basic_string
String Vulnerabilities
Mitigation Strategies
Summary
Strings
Strings constitute most of the data exchanged between an end user and a software system, including command-line arguments, environment variables, and console input.
[Figure: a null-terminated string; its length is the number of characters preceding the terminating null character \0.]
Null-terminated byte strings (NTBS) consist of a contiguous sequence of characters terminated by and including the first null character. The C programming language supports the following types of null-terminated byte strings: single-byte character strings (type char), multibyte character strings (type char), and wide character strings (type wchar_t).
Null-terminated byte strings are implemented as arrays of characters and are susceptible to the same problems as arrays. As a result, secure coding practices for arrays should also be applied to null-terminated byte strings.
Arrays
One of the problems with arrays is determining the number of elements:
#include <stddef.h>

void func(char s[]) {
  /* s is really a char *, so this yields sizeof(char *): 4 on a 32-bit platform */
  size_t num_elem = sizeof(s) / sizeof(s[0]);
}

int main(void) {
  char str[] = "Bring on the dancing horses";
  /* 28: 27 characters plus the null character */
  size_t num_elem = sizeof(str) / sizeof(str[0]);
  func(str);
}
The strlen() function can be used to determine the length of a (properly) null-terminated byte string but not the space available in an array.
Wide Strings
A wide string is a contiguous sequence of wide characters terminated by and including the first null wide character. A pointer to a wide string points to its initial (lowest addressed) wide character. The length of a wide string is the number of wide characters preceding the null wide character.
plain char
the type of each element of a string literal used for character data (where signedness has little
int
Used for data that could be either EOF (a negative value) or character data interpreted as unsigned char and then converted to int.
[Figure: the bytes 00 00 00 FF of an int holding a character value]
ungetc(). Also accepted by the character-handling functions from <ctype.h>, because they might be passed the result of fgetc() et al.
The type of a character constant. Its value is that of a plain char converted to int. In C++, a character literal that contains only one character has type char.
unsigned char
Used internally for string comparison functions, even though these operate on character data. As a result, the result of a string comparison does not depend on whether plain char is signed. Used for situations where the object being manipulated might be of any type and it is necessary to access all bits of that object, as with fwrite().
Casts are required to eliminate these warnings, but excessive casts can make code difficult to read and hide legitimate warning messages. If this code were compiled using a C++ compiler, conversions from unsigned char[] to const char * and from signed char[] to const char * would be flagged as errors requiring casts.
std::basic_string
Standardization of C++ has promoted the standard template class std::basic_string. The basic_string class represents a sequence of characters.
Supports sequence operations as well as string operations such as
Strings in C++
The basic_string class is less prone to security vulnerabilities than null-terminated byte strings. Null-terminated byte strings are still a common data type in C++ programs. The use of null-terminated byte strings in C++ programs is unavoidable, except in rare circumstances:
no string literals
no interaction with existing libraries that accept null-
C99, although the gets() function has been deprecated in C99 and eliminated from C1X.
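char *gets(char *dest) {
  int c = getchar();
  char *p = dest;
  /* no bounds check: a long input line writes past the end of dest */
  while (c != EOF && c != '\n') {
    *p++ = c;
    c = getchar();
  }
  *p = '\0';
  return dest;
}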
Simple Solution
Set the width field to the maximum input size. The extraction operation can be limited to a specified number of characters.
#include <iostream>
using namespace std;

int main(void) {
  char buf[12];
  cin.width(12);   // extract at most 11 characters plus the null character
  cin >> buf;
  cout << "echo: " << buf << endl;
}
After a call to the extraction operation, the value of the width field is reset to 0.
Simple Solution
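Test the length of the input using strlen() and dynamically allocate the memory.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char *argv[]) {
  if (argc < 2) {
    return EXIT_FAILURE;   /* argv[1] is not available */
  }
  char *buff = malloc(strlen(argv[1]) + 1);   /* + 1 for the null character */
  if (buff != NULL) {
    strcpy(buff, argv[1]);
    printf("argv[1] = %s.\n", buff);
    free(buff);
  } else {
    /* Couldn't get the memory - recover */
  }
  return 0;
}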
Null-Termination Errors
Another common problem with null-terminated byte strings is a failure to properly null terminate.
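#include <string.h>

int main(void) {
  char a[16];
  char b[16];
  char c[32];
  strncpy(a, "0123456789abcdef", sizeof(a));  /* 16 characters: a is not null-terminated */
  strncpy(b, "0123456789abcdef", sizeof(b));  /* b is not null-terminated either */
  strncpy(c, a, sizeof(c));                   /* reads past the end of a */
}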
String Truncation
Functions that restrict the number of bytes are often recommended to mitigate buffer overflow vulnerabilities:
strncpy() instead of strcpy()
fgets() instead of gets()
snprintf() instead of sprintf()
Strings that exceed the specified limits are truncated. Truncation results in a loss of data and in some cases leads to software vulnerabilities.
Off-by-One Errors
Can you find all the off-by-one errors in this program?
int main(void) {
  int i;
  char source[10];
  strcpy(source, "0123456789");
  char *dest = malloc(strlen(source));
  for (i=1; i <= 11; i++) {
    dest[i] = source[i];
  }
  dest[i] = '\0';
  printf("dest = %s", dest);
}
[Viega 03] Viega, J., & Messier, M. Secure Programming Cookbook for C and C++: Recipes for Cryptography, Authentication, Networking, Input Validation & More. Sebastopol, CA: O'Reilly, 2003.
Injection
There are many types of injection:
Command injection Format string injection SQL injection XML/Xpath injection Cross-site scripting (XSS)
Enabled by not properly sanitizing a string that is then interpreted by a complex subsystem (such as an HTML parser)
Black Listing
Replaces dangerous characters in input strings with underscores or other harmless characters
requires the programmer to identify all dangerous characters and character combinations
may be difficult without having a detailed understanding of the program, process, library, or component being called
may be possible to encode or escape dangerous characters after successfully bypassing black list checking
White Listing
Defines a list of acceptable characters and removes any characters that are unacceptable. The list of valid input values is typically a predictable, well-defined set of manageable size. White listing can be used to ensure that a string only contains characters that are considered safe by the programmer.
basic_string class
Size is not an issue:
string str1 = "hello, world.";
size_t size = str1.size();
Concatenation is not an issue:
string str1 = "hello, ";
string str2 = "world";
string str3 = str1 + str2;
basic_string iterators
Iterators can be used to iterate over the contents of a string:
string::iterator i; for (i=str.begin(); i != str.end(); ++i) { cout << *i; }
References, pointers, and iterators referencing string objects are invalidated by operations that modify the string, which can lead to errors.
Invalid Iterator
The following code attempts to sanitize an email address before passing it to a command shell.
char input[];
string email;
string::iterator loc = email.begin();
// copy into string converting ";" to " "
for (size_t i=0; i <= strlen(input); i++) {
  if (input[i] != ';') {
    email.insert(loc++, input[i]);
  }
  else email.insert(loc++, ' ');
}
Iterator loc is invalidated after the first call to insert().
Valid Iterator
char input[];
string email;
string::iterator loc = email.begin();
// copy into string converting ";" to " "
for (size_t i=0; i <= strlen(input); ++i) {
  if (input[i] != ';') {
    loc = email.insert(loc, input[i]);
  }
  else loc = email.insert(loc, ' ');
  ++loc;
}
The value of the iterator loc is updated as a result of each insertion.
The at() method behaves in a similar fashion to the index operator[] but throws an out_of_range exception if pos >= size().
string bs("01234567");
try {
  size_t i = f();
  bs.at(i) = '\0';
}
catch (...) {
  cerr << "Index out of range" << endl;
}
string str = x;
cout << strlen(str.c_str());
The c_str() method returns a const value.
Calling free() or delete on the returned string is an
If you need to modify the string, make a copy first and then modify the copy.
String Agenda
Strings
Common errors using NTBS
Common errors using basic_string
String Vulnerabilities
  Program Stack
  Buffer Overflow
  Code Injection
  Arc Injection
Program Stack
The stack supports nested invocation calls. Information pushed on the stack as a result of a function call is called a frame.
void b(void) {}
void a(void) { b(); }
int main(void) { a(); }
Low memory
  Unallocated
  Stack frame for b()
  Stack frame for a()
  Stack frame for main()
High memory
A stack frame is created for each subroutine and destroyed upon return.
Stack Frames
A program stack is used to keep track of program execution and state by storing
the return address in the calling function, the actual arguments to the function, and local variables of automatic storage duration.
The address of the current frame is stored in a register (for example, EBP on Intel architectures). The frame pointer is used as a fixed point of reference within the stack. The stack is modified during
function calls, function initialization, and return from a function.
Notation
There are two notations for Intel instructions.
Microsoft uses the Intel notation: mov eax, 4
GNU C uses AT&T syntax: mov $4, %eax
Both of these instructions move the immediate value 4 into the EAX register.
Function Calls
void function(int arg1, int arg2);
Push 2nd arg on stack
Function Initialization
Function Return
return();
mov esp, ebp   ; restores the stack pointer
pop ebp        ; restores the frame pointer
ret            ; pops the return address off the stack and transfers control to that location
Sample Program
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

bool IsPasswordOK(void) {
  char Password[12];             // Memory storage for pwd
  gets(Password);                // Get input from keyboard
  return 0 == strcmp(Password, "goodpass");
}

int main(void) {
  bool PwStatus;                 // Password status
  puts("Enter Password:");       // Print
  PwStatus = IsPasswordOK();     // Get and check password
  if (!PwStatus) {
    puts("Access denied");       // Print
    exit(-1);                    // Terminate program
  }
  else puts("Access granted");   // Print
}
EIP
Stack
Storage for PwStatus (4 bytes)  <- ESP
Caller EBP: Frame Ptr OS (4 bytes)
Return Addr of main: OS (4 bytes)
Stack
Storage for Password (12 bytes)  <- ESP
Caller EBP: Frame Ptr main (4 bytes)
Return Addr: Caller main (4 bytes)
Storage for PwStatus (4 bytes)
Caller EBP: Frame Ptr OS (4 bytes)
Return Addr of main: OS (4 bytes)
Note: The stack grows and shrinks as a result of function calls made by IsPasswordOK(void).
puts("Enter Password:");
PwStatus = IsPasswordOK();
if (!PwStatus) {
  puts("Access denied");
  exit(-1);
}
else puts("Access granted");
Stack
Storage for Password (12 bytes)
Caller EBP: Frame Ptr main (4 bytes)
Return Addr: Caller main (4 bytes)
Storage for PwStatus (4 bytes)
Caller EBP: Frame Ptr OS (4 bytes)
Return Addr of main: OS (4 bytes)
Other Memory
Buffer Overflows
Are caused when buffer boundaries are neglected and unchecked. Can occur in any memory segment. Can be exploited to modify a variable, data pointer, function pointer, or return address on the stack.
"Smashing the Stack for Fun and Profit" (Aleph One, Phrack 49, article 14, 1996) provides the classic description of buffer overflows.
The return address and other data on the stack are overwritten because the memory space allocated for the password can hold a maximum of only 11 characters plus the null terminator.
Storage for PwStatus (4 bytes): \0
Caller EBP: Frame Ptr OS (4 bytes)
Return Addr of main: OS (4 bytes)
The Vulnerability
A specially crafted string 1234567890123456j*! produced the following result.
What happened?
What Happened?
1234567890123456j*! overwrites 9 bytes of memory on the stack, changing the caller's return address, skipping lines 3-5, and starting execution at line 6.
Line  Statement
1     puts("Enter Password:");
2     PwStatus=IsPasswordOK();
3     if (!PwStatus)
4       puts("Access denied");
5       exit(-1);
6     else puts("Access granted");
Stack
Storage for Password (12 bytes): 123456789012
Caller EBP: Frame Ptr main (4 bytes): 3456
Return Addr: Caller main (4 bytes): j*! (return to line 6; was line 3)
Storage for PwStatus (4 bytes): \0
Caller EBP: Frame Ptr OS (4 bytes)
Return Addr of main: OS (4 bytes)
Note: This vulnerability also could have been exploited to execute arbitrary code contained in the input string.
Question
Code Injection
Attacker creates a malicious argument: a specially crafted string that contains a pointer to malicious code provided by the attacker. When the function returns, control is transferred to the malicious code.
Injected code runs with the permissions of the vulnerable program when the function returns. Programs running with root or other elevated privileges are normally targeted.
Malicious Argument
Must be accepted by the vulnerable program as legitimate input. The argument, along with other controllable inputs, must result in execution of the vulnerable code path. The argument must not cause the program to terminate abnormally before control is passed to the malicious code.
[Figure: the malicious argument in hexadecimal, 16 bytes per row]
31 32 33 34 35 36 37 38 39 30 31 32 33 34 35 36
37 38 39 30 31 32 33 34 35 36 37 38 E0 F9 FF BF
31 C0 A3 FF F9 FF BF B0 0B BB 03 FA FF BF B9 FB
F9 FF BF 8B 15 FF F9 FF BF CD 80 FF F9 FF BF 31
31 31 31 2F 75 73 72 2F 62 69 6E 2F 63 61 6C 0A
NOTE: The version of the GCC compiler used allocates stack data in multiples of 16 bytes.
The next 12 bytes of binary data fill the storage allocated by the compiler to align the stack on a 16-byte boundary.
This value (E0 F9 FF BF, the address 0xbffff9e0 in little-endian byte order) overwrites the return address on the stack to reference injected code.
Malicious Code
The object of the malicious argument is to transfer control to the malicious code.
The malicious code may be included in the malicious argument (as in this example), may be injected elsewhere during a valid input operation, can perform any function that can otherwise be programmed, and may simply open a remote shell on the compromised machine (as a result, it is often referred to as shellcode).
Create a Zero
Create a zero value. Because the exploit cannot contain null characters until the last byte, the null pointer must be set by the exploit code.
xor %eax,%eax          # set eax to zero
mov %eax,0xbffff9ff    # set to NULL word
Use it to null terminate the argument list. This is necessary because an argument to a system call consists of a list of pointers terminated by a null pointer.
Shell Code
xor %eax,%eax          # set eax to zero
mov %eax,0xbffff9ff    # set to NULL word
mov $0xb,%al           # set code for execve
The system call is set to 0xb, which equates to the execve() system call in Linux.
Shell Code
mov $0xb,%al           # set code for execve
mov $0xbffffa03,%ebx   # arg 1 ptr
mov $0xbffff9fb,%ecx   # arg 2 ptr
mov 0xbffff9ff,%edx    # arg 3 ptr
These instructions set up the three arguments for the execve() call: arg 1 points to the string "/usr/bin/cal\0", arg 2 points to the argument pointer array char * []={0xbffff9ff, "1111"}, and arg 3 points to a NULL byte.
Data for the arguments is also included in the shell code.
Shell Code
mov $0xb,%al           # set code for execve
mov $0xbffffa03,%ebx   # ptr to arg 1
mov $0xbffff9fb,%ecx   # ptr to arg 2
mov 0xbffff9ff,%edx    # ptr to arg 3
int $0x80              # make system call to execve
The execve() system call results in execution of the Linux calendar program.
Null Characters
The gets() function also has an interesting property: it reads characters from the input stream pointed to by stdin until end-of-file is encountered or a new-line character is read. Any new-line character is discarded, and a null character is written immediately after the last character read into the array. As a result, there might be null characters embedded in the string returned by gets() if, for example, input is redirected from a file. Similarly, data read by the fgets() function may also contain null characters. This issue is further documented in The CERT C Secure Coding Standard rule FIO37-C. Do not assume that fgets() returns a nonempty string when successful.
transfer) into the program's control-flow graph as opposed to injecting code; can install the address of an existing function (such as system() or exec()), which can be used to execute programs on the local system; allows for even more sophisticated attacks
Vulnerable Function
#include <string.h>

int get_buff(char *user_input, size_t size) {
  char buff[40];
  memcpy(buff, user_input, size);   /* size is never checked against sizeof(buff) */
  return 0;
}

int main(void) {
  /* ... */
  get_buff(tainted_char_array, tainted_size);
  /* ... */
}
Exploit
Overwrites return address with address of existing function. Creates stack frames to chain function calls. Recreates original frame to return to program and resume execution without detection.
Result of memcpy() in get_buff()
Before overflow (stack frame of main, esp and ebp marked): buff[40], ebp (main), return addr (main)
After overflow (from low to high addresses): buff[40], ebp (frame 2), seteuid() address, (leave/ret) address, 0, ebp (frame 3), system() address, (leave/ret) address, const char * "/bin/sh"
The diagram groups these entries into Frame 1, Frame 2, and the original frame, next to the epilogue of return 0; in get_buff():
mov esp,ebp
pop ebp
ret
[Animation: get_buff() Returns 1-4, seteuid() Returns 1-4, and system() Returns 1-4 step esp, ebp, and eip through this sequence. Captions: "seteuid() return transfers control to leave/return sequence"; "Original esp restored!"; "Original ebp restored!"; "ret instruction returns control to main()".]
Return-oriented Programming
The return-oriented programming exploit technique is similar to arc injection, but instead of returning to functions, the exploit code returns to sequences of instructions followed by a return instruction. Any such useful sequence of instructions is referred to as a gadget. A Turing-complete set of gadgets has been identified for the x86 architecture [Shacham 2007], allowing arbitrary programs to be written in the return-oriented language. A Turing-complete library of code gadgets using snippets of the Solaris libc, a general purpose programming language, and a compiler for constructing return-oriented exploits has also been developed [Buchanan 2008]. Consequently, there is an assumed risk that return-oriented programming exploits could be effective on other architectures as well.
Gadgets
Gadgets form the programming language. Each gadget specifies certain values to be placed on the stack that make use of one or more sequences of instructions in the code segment. Gadgets perform well-defined operations, such as a load, an xor, or a jump. Return-oriented programming consists of putting gadgets together that will perform the desired operations. Gadgets are executed by a return instruction with the stack pointer referring to the address of the gadget.
The gadget movl %eax, 24(%edx); ret can be used to store the contents of %eax into memory. The address to be written is copied into %edx using the immediate constant gadget.
Because of the immediate offset in the movl instruction, the address in %edx must be 24 bytes less than the address we wish to write to.
Ordinary programming:
set eip to new value
Return-oriented equivalent:
set esp to new value
Iteration
An unconditional branch can be used to branch to an earlier gadget on the stack, resulting in an infinite loop. Conditional iteration can be implemented by a conditional branch out of the loop.
Return-oriented Programming
Shacham's paper contains a more complete tutorial on return-oriented programming [Shacham 2007]. While return-oriented programming might seem very complex, this complexity can be abstracted behind a programming language and compiler, making this a viable technique for writing exploits.
Mitigation Strategies
Include strategies designed to
prevent buffer overflows from occurring detect buffer overflows and securely recover without
Input Validation
Buffer overflows are often the result of unbounded string or memory copies. Buffer overflows can be prevented by ensuring that input data does not exceed the size of the smallest buffer in which it is stored.
#include <stdlib.h>
#include <string.h>

int myfunc(const char *arg) {
  char buff[100];
  if (strlen(arg) >= sizeof(buff)) {
    abort();   /* arg plus its null character would not fit */
  }
  strcpy(buff, arg);   /* now known to fit */
  /* ... */
  return 0;
}
String Handling
The CERT C Secure Coding Standard rule STR01-C. Adopt and implement a consistent plan for managing strings recommends selecting a single approach to handling character strings and applying it consistently across a project. Otherwise, the decision is left to individual programmers who are likely to make different, inconsistent choices.
<string.h>
the OpenBSD functions strlcpy() and strlcat(); the C1X Annex K bounds-checking interfaces.
Memory can be statically or dynamically allocated prior to invoking these functions, making this model optimally efficient.
Bounds-checking Interfaces
The bounds-checking interfaces are alternative library functions that promote safer, more secure programming. The alternative functions verify that output buffers are large enough for the intended result and return a failure indicator if they are not. Data is never written past the end of an array. All string results are null terminated.
History
The C1X Annex K functions were created by Microsoft to help retrofit its existing, legacy code base in response to numerous, well-publicized security incidents over the past decade. These functions were subsequently proposed for standardization to ISO/IEC JTC1/SC22/WG14, the international standardization working group for the programming language C. These functions were published as ISO/IEC TR 24731-1 [ISO/IEC TR 24731-1:2007] and later incorporated in C1X as a set of optional extensions specified in a normative annex.
Goals
Mitigate risk of
buffer overrun attacks
default protections associated with program-created files
Do not produce unterminated strings. Do not unexpectedly truncate strings. Preserve the null-terminated string data type. Support compile-time checking. Make failures obvious. Have a uniform function signature.
Runtime-constraints
Most bounds-checked functions, upon detecting an error such as invalid arguments or not enough room in an output buffer, call a special runtime-constraint handler function. This function might print an error message and/or abort the program. The programmer can control which handler function is called via the set_constraint_handler_s() function, and can make the handler simply return if desired.
set_constraint_handler_s()
The set_constraint_handler_s() function sets the function (handler) called when a library function detects a runtime-constraint violation. The behavior of the default handler is implementation-defined, and it may cause the program to exit or abort. There are two predefined handlers (in addition to the default handler):
abort_handler_s() writes a message on the standard error stream and then calls abort().
ignore_handler_s() does not write to any stream. It simply returns to its caller.
Runtime-constraint Handler
If the handler simply returns, the function that invoked the handler indicates a failure to its caller using its return value. Programs that install a handler that returns must check the return value of each call to any of the bounds-checking functions and handle errors appropriately. The CERT C Secure Coding Standard recommendation ERR03-C. Use runtime-constraint handlers when calling functions defined by TR24731-1 recommends installing a runtime-constraint handler to eliminate the implementation-defined behavior.
The ISO/IEC TR 24731-1 functions are not foolproof. Functions are still capable of overflowing a buffer if the maximum length of the destination buffer is incorrectly specified. Because the C1X Annex K functions can often be used as simple replacements for the original library functions in legacy code, The CERT C Secure Coding Standard rule STR07-C. Use TR 24731 for remediation of existing string manipulation code recommends using them for this purpose on implementations that implement the Annex. (Such implementations are expected to define the __STDC_LIB_EXT1__ macro.)
C99
Not all uses of strcpy() are flawed. For example, it is often possible to dynamically allocate the required space:
dest = (char *)malloc(strlen(source) + 1);
if (dest) {
  strcpy(dest, source);
} else {
  /* handle error */
  /* ... */
}
Size Matters
To help prevent buffer overflows, strlcpy() and strlcat() accept the size of the destination buffer as an argument.
For statically allocated destination buffers, this value is easily computed at compile time using the sizeof operator. Dynamic buffer size is not easily computed.
Both functions guarantee the destination string is null terminated for all non-zero-length buffers.
String Truncation
The strlcpy() and strlcat() functions return the total length of the string they tried to create.
For strlcpy(), that is simply the length of the source. For strlcat(), it is the length of the destination (before concatenation) plus the length of the source.
To check for truncation, the programmer must verify that the return value is less than the size argument. If the resulting string is truncated, the programmer knows the number of bytes needed to store the string and may reallocate and recopy.
Bounds-checking Interfaces
Defines less error-prone versions of C standard functions:
strcpy_s() instead of strcpy()
strcat_s() instead of strcat()
strncpy_s() instead of strncpy()
strncat_s() instead of strncat()
strcpy_s() Function
Copies characters from a source string to a destination character array up to and including the terminating null character. Has the signature
errno_t strcpy_s(char * restrict s1, rsize_t s1max, const char * restrict s2);
Similar to strcpy() with an extra argument of type rsize_t that specifies the maximum length of the destination buffer. Only succeeds when the source string can be fully copied to the destination without overflowing the destination buffer.
Runtime-constraints
Neither s1 nor s2 shall be a null pointer. s1max shall not be greater than RSIZE_MAX. s1max shall not equal zero. s1max shall be greater than strnlen_s(s2, s1max). Copying shall not take place between objects that overlap. If there is a runtime-constraint violation, then if s1 is not a null pointer and s1max is greater than zero and not greater than RSIZE_MAX, strcpy_s() sets s1[0] to the null character.
strcpy_s() Example
#define __STDC_WANT_LIB_EXT1__ 1
#include <string.h>

int main(int argc, char* argv[]) {
  char a[16];
  char b[16];
  char c[24];
  strcpy_s(a, sizeof(a), "0123456789abcdef");  /* 16 characters + null do not fit in 16 bytes */
  strcpy_s(b, sizeof(b), "0123456789abcdef");
  strcpy_s(c, sizeof(c), a);
  strcat_s(c, sizeof(c), b);
}
strcpy_s() fails and generates a runtime-constraint error.
strndup() copies at most n plus one bytes into the newly allocated memory; the newly created string is always properly terminated.
std::basic_string
The basic_string class is less prone to security vulnerabilities than null-terminated byte strings. However, some mistakes are still common:
using an invalidated or uninitialized iterator
passing an out-of-bounds index
using an iterator range that really isn't a range
passing an invalid iterator position
using an invalid ordering
Beyond basic_string
std::basic_string is implemented in various ways on different platforms and is consequently subject to different types of problems depending on how it's implemented: threading model, use of reference counting, etc.
Most local character buffers could be more efficiently implemented
String Summary
Buffer overflows occur frequently in C and C++ because these languages use null-terminated byte strings, do not perform implicit bounds checking, and provide standard library calls for strings that do not enforce bounds checking.
The basic_string class is less error prone for C++ programs. String functions defined by ISO/IEC TR 24731-1 are useful for legacy system security remediation. New C language development might consider using dynamic allocation functions or other managed string libraries.
References
[Buchanan 2008] Erik Buchanan, Ryan Roemer, Hovav Shacham, and Stefan Savage. 2008. When good instructions go bad: generalizing return-oriented programming to RISC. In Proceedings of the 15th ACM Conference on Computer and Communications Security (CCS '08). ACM, New York, NY, USA, 27-38. DOI=10.1145/1455770.1455776 http://doi.acm.org/10.1145/1455770.1455776
[Shacham 2007] Hovav Shacham. The Geometry of Innocent Flesh on the Bone: Return-into-libc without Function Calls (on the x86). In Proceedings of CCS 2007, ACM Press.