I am interested in pentesting, so I decided to write a series of articles about the field. In this first article, I will show you how to reverse-engineer a binary built from a simple C program, and we will get familiar with the GNU Debugger (GDB), which is an important tool for this work.
How memory works
On a modern system, when an application is executed, the operating system allocates memory for it, and its variables, functions and other components are stored in main memory.
In the diagram above, two parts of the memory are important: the stack and the heap. The stack contains the local variables declared in functions. Its size is limited, 8 MB (8192 KB) on this system, and you can check the limit with the ulimit command:
$ ulimit -a | grep stack
stack size (kbytes, -s) 8192
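The heap contains the dynamic variables allocated with malloc. The heap has no small fixed limit like the stack (in practice it is bounded by the memory the system can give to the process), but working with it is generally slower than with the stack, because each block has to be allocated and freed explicitly.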
Before we start
For this article, we will use this code:
#include <stdlib.h>
#include <stdio.h>

int sum(int a, int b){
    return a + b;
}

int main(void){
    int a = 0;
    int b = 5;
    int c = 10;
    printf("%d\n", a);
    a = sum(a, b);
    printf("%d\n", a);
    printf("%d\n", c);
    c = sum(b, c);
    printf("%d\n", c);
}
As you can see, we declared the variables a, b and c, which are stored in stack memory. I also wrote a small function, sum, that adds two integers and returns the result.
Now, we need to compile the program with the gcc compiler:
gcc -Wno-all -ggdb -O0 -o main main.c
- The parameter -Wno-all disables the warnings of the -Wall group during the compilation
- The parameter -ggdb produces debugging information for GDB
- The option -O sets the optimization level; to be able to read the variables of the program in the debugger, we disable all optimizations with -O0
Now that we have the output file main, we are going to analyze it with the GNU Debugger, gdb. Run it with the output file of gcc as parameter: gdb -q main. The parameter -q means quiet: GDB starts without its introductory messages.
Assembly register
Now we are in the debugger, where we can do a lot of things to debug the program. You can use the command help to get help.
The command disassemble (disas for short) is very important: it disassembles a function of the program into assembly language:
(gdb) disas main
Dump of assembler code for function main:
0x0000555555555149 <+0>: push %rbp
0x000055555555514a <+1>: mov %rsp,%rbp
0x000055555555514d <+4>: sub $0x10,%rsp
0x0000555555555151 <+8>: movl $0x0,-0x4(%rbp)
0x0000555555555158 <+15>: movl $0x5,-0x8(%rbp)
0x000055555555515f <+22>: movl $0xa,-0xc(%rbp)
0x0000555555555166 <+29>: mov -0x4(%rbp),%eax
0x0000555555555169 <+32>: mov %eax,%esi
0x000055555555516b <+34>: lea 0xe92(%rip),%rdi # 0x555555556004
0x0000555555555172 <+41>: mov $0x0,%eax
0x0000555555555177 <+46>: call 0x555555555030 <printf@plt>
0x000055555555517c <+51>: mov -0x8(%rbp),%edx
0x000055555555517f <+54>: mov -0x4(%rbp),%eax
0x0000555555555182 <+57>: mov %edx,%esi
0x0000555555555184 <+59>: mov %eax,%edi
0x0000555555555186 <+61>: call 0x555555555135 <sum>
0x000055555555518b <+66>: mov %eax,-0x4(%rbp)
0x000055555555518e <+69>: mov -0x4(%rbp),%eax
0x0000555555555191 <+72>: mov %eax,%esi
0x0000555555555193 <+74>: lea 0xe6a(%rip),%rdi # 0x555555556004
0x000055555555519a <+81>: mov $0x0,%eax
0x000055555555519f <+86>: call 0x555555555030 <printf@plt>
0x00005555555551a4 <+91>: mov -0xc(%rbp),%eax
0x00005555555551a7 <+94>: mov %eax,%esi
0x00005555555551a9 <+96>: lea 0xe54(%rip),%rdi # 0x555555556004
0x00005555555551b0 <+103>: mov $0x0,%eax
0x00005555555551b5 <+108>: call 0x555555555030 <printf@plt>
0x00005555555551ba <+113>: mov -0xc(%rbp),%edx
0x00005555555551bd <+116>: mov -0x8(%rbp),%eax
0x00005555555551c0 <+119>: mov %edx,%esi
0x00005555555551c2 <+121>: mov %eax,%edi
0x00005555555551c4 <+123>: call 0x555555555135 <sum>
0x00005555555551c9 <+128>: mov %eax,-0xc(%rbp)
0x00005555555551cc <+131>: mov -0xc(%rbp),%eax
0x00005555555551cf <+134>: mov %eax,%esi
0x00005555555551d1 <+136>: lea 0xe2c(%rip),%rdi # 0x555555556004
0x00005555555551d8 <+143>: mov $0x0,%eax
0x00005555555551dd <+148>: call 0x555555555030 <printf@plt>
0x00005555555551e2 <+153>: mov $0x0,%eax
0x00005555555551e7 <+158>: leave
0x00005555555551e8 <+159>: ret
End of assembler dump.
We can do the same for the sum function: disas sum.
As you can see above, assembly language can be hard to digest. Each line shows the memory address (64-bit), the instruction (call, add, mov, push, etc.) and the registers it works on (eax, rsp, rbp, etc.).
What are assembly registers?
Registers are small storage areas inside the CPU; most of the ones listed here are called general-purpose registers. The table below lists the main registers (it is not exhaustive):
Register | 32 bits | 64 bits | Comment |
---|---|---|---|
Accumulator | EAX | RAX | Used for arithmetic, logical and I/O instructions |
Counter | ECX | RCX | Counter for loops |
Data | EDX | RDX | Also used for arithmetic and I/O instructions |
Base | EBX | RBX | Base register, often used as a base pointer for memory access |
Stack pointer | ESP | RSP | Points to the top of the stack |
Base pointer | EBP | RBP | Points to the base of the current stack frame |
Index | EDI | RDI | Destination index, used as a pointer when manipulating strings |
Instruction pointer | EIP | RIP | Contains the address of the next instruction (not a general-purpose register) |
We will not cover all these registers, since that is not the purpose of this article, but if you want to learn them, you can study how the x86 and x86-64 architectures work.
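With GDB, you can print the registers. The sample output below shows the 32-bit register names; in a 64-bit session, GDB lists rax, rsp, rip and so on instead: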
(gdb) info register
eax 0x0 0
ecx 0xffffd140 -11968
edx 0x1 1
ebx 0x0 0
esp 0xffffd12c 0xffffd12c
ebp 0x0 0x0
esi 0xf7fa6000 -134586368
edi 0xf7fa6000 -134586368
eip 0x565561ec 0x565561ec <main+83>
If you want to print a specific register:
(gdb) info register $esp
esp 0xffffd12c 0xffffd12c
Working with assembly registers
To understand the registers, in this section we will play with two of them, RSP and RBP, because they are the ones that manage the stack.
First, we need to run the application. For that, you have the command run:
(gdb) run
Starting program: /home/geoffrey/Documents/C/overflow/test/main
0
5
10
15
[Inferior 1 (process 8114) exited normally]
In the disassembly above, these lines are very interesting, because they initialize our variables a, b and c:
0x0000555555555151 <+8>: movl $0x0,-0x4(%rbp)
0x0000555555555158 <+15>: movl $0x5,-0x8(%rbp)
0x000055555555515f <+22>: movl $0xa,-0xc(%rbp)
When the compiler generates the program, it allocates the local variables of a function in stack memory.
Here, the first variable gets the highest address and the last variable the lowest. In our case, the variable a is stored at rbp-0x4 (high address) and the variable c at rbp-0xc (low address). The movl instructions write the initial values 0x0, 0x5 and 0xa into these stack slots.
The purpose of a debugger is to help you find the cause of an issue in a program, so every debugger has an essential tool called the breakpoint. A breakpoint stops the program at a specific location so that you can inspect its state. With gdb, you can set breakpoints in your program.
When we want to create a breakpoint with the hex address, we need to put a ‘*’ before the address.
We need to create two breakpoints: one before the sum is computed for the c variable and one after it. For both, I used the address of a call to printf.
(gdb) break *0x00005555555551b5
Breakpoint 1 at 0x5555555551b5
(gdb) break *0x00005555555551dd
Breakpoint 2 at 0x5555555551dd
We run our program:
(gdb) run
Starting program: /home/geoffrey/Documents/C/overflow/test/main
0
5
Breakpoint 1, 0x00005555555551b5 in main ()
Now the program is stopped at the breakpoint we created, and we will print the value of the variable c, stored at the location RSP + 0x4. For that, we have the command x, which means examine.
(gdb) x/x $rsp+0x4
0x7fffffffdf44: 0x0000000a
In the result of the command x, we can see that the value of the variable c is 0xa (10).
And now, we can continue the program until the next breakpoint and display the value of c:
(gdb) continue
Continuing.
10
Breakpoint 2, 0x00005555555551dd in main ()
(gdb) x/x $rsp+0x4
0x7fffffffdf64: 0x0000000f
If we want to print the value in decimal:
(gdb) x/d $rsp+0x4
0x7fffffffdf64: 15
It works: this is our variable c, and we can see that its value changed.
We can also print the address of our variable with the command print:
(gdb) print/x $rsp+0x4
$10 = 0x7fffffffdf64
As you can see, breakpoints are very useful for troubleshooting your program. If you want to remove a breakpoint, you can use the command delete followed by the id of the breakpoint:
(gdb) del 1
You have an alternative to display the value of a variable:
(gdb) print c
$1 = 10
(gdb) print/x c
$2 = 0xa
If you see this error:
(gdb) print c
No symbol "c" in current context.
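This usually means that the binary was built without debugging information or with optimizations enabled. Compile with the parameters -ggdb and -O0 for the gcc program (note the capital O: a lowercase -o sets the name of the output file).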
That is the end of this first article on reverse-engineering binaries. In the next articles, we will use GDB for some pentesting and try to gain access to a Linux system. The section below lists some useful commands you can use in GDB.