Linux - Memory deep dive


Anatomy of the memory

When we execute a program, its data is stored in the system's memory. The program uses this data while it runs: for instance, if your program is a video game, part of the game's data is kept in memory. This data is temporary: when you close the program or shut down the system, it is destroyed. It only exists during the execution of the program.

When a new process starts, the kernel gives it a virtual address space in user space, and the process reads and writes to that virtual memory. The data actually lives in physical memory, which is managed by the kernel and split into parts that are not necessarily contiguous. So the kernel needs to map the virtual memory (VM) to the physical memory. To do that, it uses a page table:

Memory

I will explain memory paging in another article, because it is a big subject.

Virtual memory

When the kernel loads the process into memory, the virtual memory is divided into segments and looks like this:

Memory stack

The stack section contains the local variables, the return addresses and the arguments passed to the functions being executed.

The heap section has a variable size and contains all the variables allocated dynamically with malloc() or calloc(). When you use these functions, the heap can grow, and it is important to call free() to release the memory once you are done with it. The brk() and sbrk() functions are also available for changing the size of the heap directly.

The initialized data section contains all the static and global variables that are initialized, and the uninitialized data section (bss) contains all the static and global variables that are not initialized. We can play with this small program to understand:

$ cat stack.c
#include <stdio.h>
#include <stdlib.h>

int main(void){
    static int a;
    return 0;
}
$ gcc -O0 -ggdb stack.c -o stack && size stack
   text	   data	    bss	    dec	    hex	filename
   1358	    528	      8	   1894	    766	stack

As you can see in the output of the size command, the bss section is 8 bytes long, so the variable is stored in the bss section. If we initialize the variable, the bss section shrinks by 4 bytes (the size of an int) and the data section grows by 4 bytes:

$ cat stack.c
#include <stdio.h>
#include <stdlib.h>

int main(void){
    static int a = 5;
    return 0;
}
$ gcc -O0 -ggdb stack.c -o stack && size stack
   text	   data	    bss	    dec	    hex	filename
   1358	    532	      4	   1894	    766	stack

It is the same for a global variable: if it is not initialized, it is stored in the bss section; otherwise, it is stored in the initialized data section.

The text section contains the compiled instructions of the program. This section is read-only, so that the code cannot be modified once the program is loaded. You can analyse the text section with the objdump command:

$ objdump -d stack
Disassembly of section .text:

0000000000001040 <_start>:
    1040:	31 ed                	xor    %ebp,%ebp
    1042:	49 89 d1             	mov    %rdx,%r9
    1045:	5e                   	pop    %rsi
    1046:	48 89 e2             	mov    %rsp,%rdx
    1049:	48 83 e4 f0          	and    $0xfffffffffffffff0,%rsp
    104d:	50                   	push   %rax
    104e:	54                   	push   %rsp
    104f:	4c 8d 05 3a 01 00 00 	lea    0x13a(%rip),%r8        # 1190 <__libc_csu_fini>
    1056:	48 8d 0d d3 00 00 00 	lea    0xd3(%rip),%rcx        # 1130 <__libc_csu_init>
    105d:	48 8d 3d c1 00 00 00 	lea    0xc1(%rip),%rdi        # 1125 <main>
    1064:	ff 15 76 2f 00 00    	callq  *0x2f76(%rip)        # 3fe0 <__libc_start_main@GLIBC_2.2.5>
    106a:	f4                   	hlt
    106b:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)

The stack section

When a process is loaded into memory, the kernel allocates a stack, which holds the local variables of the functions being executed. Take this small program and compile it:

$ cat main.c
#include <stdio.h>
#include <stdlib.h>

int main(void){
    int a = 5;
    int b = 10;
    char c = 'a';
    printf("%d %p\n", a, &a);
    printf("%d %p\n", b, &b);
    printf("%c %p\n", c, &c);
    return 0;
}
$ gcc -m32 -O0 -ggdb main.c -o main && ./main
5 0x7fff21da268c
10 0x7fff21da2688
a 0x7fff21da2687

In the example above, I create three variables and print the value and the address of each one. As you can see, the variables sit next to each other in memory: the first variable, a, has the highest address and the last one, c, has the lowest. The stack works as a LIFO (Last In, First Out) structure and, on x86, it grows toward lower addresses. The variables are laid out on the stack like this:

Stack EBP and ESP

ESP and EBP are CPU registers that point into the current stack frame: ESP to the top of the stack and EBP to the base of the frame. I explained these registers in my first article about GDB. Notice that the gap between a and b is 4 bytes, which corresponds to the size of an int. You can check this by subtracting the two addresses with the command below; the result is 4 bytes:

$ python3 -c 'print(hex(int("0x7fff21da268c", base=16) - int("0x7fff21da2688", base=16)))'
0x4

The size of a char variable is one byte. The example below prints the size of several variable types:

$ cat size.c
#include <stdio.h>
#include <stdlib.h>

int main(void){
    /* sizeof yields a size_t, which is printed with %zu */
    printf("Char: %zu bytes\n", sizeof(char));
    printf("Unsigned char: %zu bytes\n", sizeof(unsigned char));
    printf("Int %zu bytes\n", sizeof(int));
    printf("Unsigned int: %zu bytes\n", sizeof(unsigned int));
    printf("Long: %zu bytes\n", sizeof(long));
    printf("Unsigned long: %zu bytes\n", sizeof(unsigned long));

    return 0;
}
$ gcc -O0 -ggdb size.c -o size && ./size
Char: 1 bytes
Unsigned char: 1 bytes
Int 4 bytes
Unsigned int: 4 bytes
Long: 8 bytes
Unsigned long: 8 bytes

Let's get down to business. We have three variables a, b and c, located at the addresses 0x7fff21da268c, 0x7fff21da2688 and 0x7fff21da2687 in the run above. We can read their values directly through their addresses. To do that, we will use gdb to debug the program. In the example below, I set a breakpoint at line 11 of my code (the return 0) and run the program to get the memory addresses of my variables. Note that the addresses are not the same as in the previous run:

$ gdb -q ./main
Reading symbols from ./main...
(gdb) break 11
Breakpoint 1 at 0x121a: file main.c, line 11.
(gdb) run
Starting program: /home/geoffrey/C/memory/main
5 0xffffd14c
10 0xffffd148
a 0xffffd147

Breakpoint 1, main () at main.c:11
11	    return 0;

In this run, the variable a is located at the address 0xffffd14c. If we disassemble the program, we can find the location of our variable relative to the EBP register:

(gdb) disas
Dump of assembler code for function main:
   0x56556199 <+0>:	lea    0x4(%esp),%ecx
   0x5655619d <+4>:	and    $0xfffffff0,%esp
   0x565561a0 <+7>:	push   -0x4(%ecx)
   0x565561a3 <+10>:	push   %ebp
   0x565561a4 <+11>:	mov    %esp,%ebp
   0x565561a6 <+13>:	push   %ebx
   0x565561a7 <+14>:	push   %ecx
   0x565561a8 <+15>:	sub    $0x10,%esp
   0x565561ab <+18>:	call   0x565560a0 <__x86.get_pc_thunk.bx>
   0x565561b0 <+23>:	add    $0x2e50,%ebx
   0x565561b6 <+29>:	movl   $0x5,-0xc(%ebp)
   0x565561bd <+36>:	movl   $0xa,-0x10(%ebp)
   0x565561c4 <+43>:	movb   $0x61,-0x11(%ebp)
   0x565561c8 <+47>:	mov    -0xc(%ebp),%eax
   0x565561cb <+50>:	sub    $0x4,%esp
   0x565561ce <+53>:	lea    -0xc(%ebp),%edx
   0x565561d1 <+56>:	push   %edx
   0x565561d2 <+57>:	push   %eax
   0x565561d3 <+58>:	lea    -0x1ff8(%ebx),%eax
   0x565561d9 <+64>:	push   %eax
   0x565561da <+65>:	call   0x56556030 <printf@plt>
   0x565561df <+70>:	add    $0x10,%esp
   0x565561e2 <+73>:	mov    -0x10(%ebp),%eax
   0x565561e5 <+76>:	sub    $0x4,%esp
(gdb) info register esp ebp
esp            0xffffd140          0xffffd140
ebp            0xffffd158          0xffffd158

The instruction movl $0x5,-0xc(%ebp) shows that the variable is located at EBP - 0xc, and EBP holds 0xffffd158. Subtracting 0xc from 0xffffd158 gives 0xffffd14c, which is the address of our variable a. This Python one-liner does the subtraction:

$ python3 -c 'print(hex(int("0xffffd158", base=16) - int("0xc", base=16)))'
0xffffd14c

Now that we know the address of our variable, we can print its value in different ways:

(gdb) x/d $esp+0xc
0xffffd14c:	5
(gdb) x/d $ebp-0xc
0xffffd14c:	5
(gdb) print/d *0xffffd14c
$7 = 5

Memory mapping

Under Linux, every memory region of a process is mapped, and the mappings of each process are listed in /proc/<pid>/maps. Let's take the last C program and examine its maps file. I just added a sleep:

$ cat main.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void){
    int a = 5;
    int b = 10;
    char c = 'a';
    printf("%d %p\n", a, &a);
    printf("%d %p\n", b, &b);
    printf("%c %p\n", c, &c);
    sleep(120);
    return 0;
}
$ gcc -O0 -ggdb main.c -o main && ./main
5 0x7ffd1093afec
10 0x7ffd1093afe8
a 0x7ffd1093afe7

Now, we can get the process ID of the program (for example with pgrep main, from another terminal) and look at its maps file:

$ cat /proc/13181/maps
562774b9a000-562774b9b000 r--p 00000000 fe:05 15207015                   /home/geoffrey/C/memory/main
562774b9b000-562774b9c000 r-xp 00001000 fe:05 15207015                   /home/geoffrey/C/memory/main
562774b9c000-562774b9d000 r--p 00002000 fe:05 15207015                   /home/geoffrey/C/memory/main
562774b9d000-562774b9e000 r--p 00002000 fe:05 15207015                   /home/geoffrey/C/memory/main
562774b9e000-562774b9f000 rw-p 00003000 fe:05 15207015                   /home/geoffrey/C/memory/main
562774be8000-562774c09000 rw-p 00000000 00:00 0                          [heap]
7f79ebaeb000-7f79ebb0d000 r--p 00000000 fe:01 1051348                    /usr/lib/x86_64-linux-gnu/libc-2.31.so
7f79ebb0d000-7f79ebc67000 r-xp 00022000 fe:01 1051348                    /usr/lib/x86_64-linux-gnu/libc-2.31.so
7f79ebc67000-7f79ebcb6000 r--p 0017c000 fe:01 1051348                    /usr/lib/x86_64-linux-gnu/libc-2.31.so
7f79ebcb6000-7f79ebcba000 r--p 001ca000 fe:01 1051348                    /usr/lib/x86_64-linux-gnu/libc-2.31.so
7f79ebcba000-7f79ebcbc000 rw-p 001ce000 fe:01 1051348                    /usr/lib/x86_64-linux-gnu/libc-2.31.so
7f79ebcbc000-7f79ebcc2000 rw-p 00000000 00:00 0
7f79ebce5000-7f79ebce6000 r--p 00000000 fe:01 1051344                    /usr/lib/x86_64-linux-gnu/ld-2.31.so
7f79ebce6000-7f79ebd06000 r-xp 00001000 fe:01 1051344                    /usr/lib/x86_64-linux-gnu/ld-2.31.so
7f79ebd06000-7f79ebd0e000 r--p 00021000 fe:01 1051344                    /usr/lib/x86_64-linux-gnu/ld-2.31.so
7f79ebd0f000-7f79ebd10000 r--p 00029000 fe:01 1051344                    /usr/lib/x86_64-linux-gnu/ld-2.31.so
7f79ebd10000-7f79ebd11000 rw-p 0002a000 fe:01 1051344                    /usr/lib/x86_64-linux-gnu/ld-2.31.so
7f79ebd11000-7f79ebd12000 rw-p 00000000 00:00 0
7ffd1091d000-7ffd1093e000 rw-p 00000000 00:00 0                          [stack]
7ffd1099e000-7ffd109a2000 r--p 00000000 00:00 0                          [vvar]
7ffd109a2000-7ffd109a4000 r-xp 00000000 00:00 0                          [vdso]

Each entry gives the start and end virtual addresses of the region, the permissions, the offset, the device, the inode and the pathname. If you want to know more about the structure of this file, I suggest reading the manual: man 5 proc.

The entry below is interesting: it is the stack, which contains our variables:

7ffd1091d000-7ffd1093e000 rw-p 00000000 00:00 0                          [stack]

And if you compare it with the output of the C program, you will notice that our variables are located in this range:

$ gcc -O0 -ggdb main.c -o main && ./main
5 0x7ffd1093afec
10 0x7ffd1093afe8
a 0x7ffd1093afe7

You can do the same for the heap section:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void){
    int *foo = malloc(10 * sizeof(int));
    int i;

    if (foo == NULL)
        return 1;
    /* foo itself is the address of the allocated block */
    printf("%p\n", (void*)foo);

    for (i = 0; i < 10; i++){
        foo[i] = i;
        printf("%d %p\n", foo[i], (void*)&foo[i]);
    }

    sleep(60);
    free(foo);
    return 0;
}

Memory regions

In the maps file above, each entry corresponds to a struct vm_area_struct managed by the kernel. It contains the start virtual address (field vm_start), the end virtual address (field vm_end), a pointer to the next vm_area_struct (field vm_next), the permissions, the offset, etc. Note that vm_next only exists on older kernels: since Linux 6.1, the regions are stored in a maple tree instead of a linked list.

To keep track of all these vm_area_struct, the kernel uses a data structure called the memory descriptor: the structure mm_struct. The figure below shows the relationships between the structures described in this section:

Memory region

It is possible to allocate a vm_area_struct, but only in kernel space, so we need to write a module. If we want to get information about the memory allocation, we first need to get the running process from inside the kernel. To manage processes, the kernel stores all the information about each process in a data structure called the process descriptor: the structure task_struct. This structure stores all the data about the process, like its name, its state, its process ID and, of course, a pointer to its mm_struct. So, during the execution of the module, we get the current process, which is a pointer to a task_struct. If you want to know more about the current process, I suggest you read this article.

Let's get down to business. After getting the current process, the module loops over all the processes and, for the one whose process ID matches the ID of the current task, calls the function print_mem to display the information stored in the mm_struct and in each vm_area_struct:

$ cat kmemory.c
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/sched.h>
#include <linux/mm.h>

#define BUFFER_SIZE 128

static void printing(char *data){
    printk(KERN_INFO "MEMORY_VA: %s\n", data);
}
static void print_mem(struct task_struct *task){
        struct mm_struct *mm;
        struct vm_area_struct *vma;
        int count = 0;
        char buffer[BUFFER_SIZE];
        mm = task->mm;
        snprintf(buffer, BUFFER_SIZE, "mm_struct infos:\n  map_count: %d.\n", mm->map_count);
        printing(buffer);
        /* For each vm_area_structure, we display the information of it */
        for (vma = mm->mmap ; vma ; vma = vma->vm_next) {
                memset(buffer, 0, BUFFER_SIZE);
                snprintf(buffer, BUFFER_SIZE,
                    "VMA number %d: Start at 0x%lx to 0x%lx\n",
                    ++count, vma->vm_start, vma->vm_end);
                printing(buffer);
        }
        snprintf(buffer, BUFFER_SIZE, "Code Segment start 0x%lx to 0x%lx \n",
                 mm->start_code, mm->end_code);
        printing(buffer);
        memset(buffer, 0, BUFFER_SIZE);

        snprintf(buffer, BUFFER_SIZE, "Data Segment start 0x%lx to 0x%lx \n",
                 mm->start_data, mm->end_data);
        printing(buffer);
        memset(buffer, 0, BUFFER_SIZE);

        snprintf(buffer, BUFFER_SIZE, "Stack Segment start 0x%lx\n",
                 mm->start_stack);
        printing(buffer);
        memset(buffer, 0, BUFFER_SIZE);
}

int init_module(void){
        /* Get the current process, so itself */
        struct task_struct *task = current;
        char buffer[BUFFER_SIZE];

        snprintf(buffer, BUFFER_SIZE, "Analysing the current process %d.\n", current->pid);
        printing(buffer);
        memset(buffer, 0, BUFFER_SIZE);
        for_each_process(task) {
                if (task->pid == current->pid) {
                        // task-> comm -> name of the process
                        snprintf(buffer, BUFFER_SIZE, "Task name: %s[%d]\n", task->comm, task->pid);
                        printing(buffer);
                        memset(buffer, 0, BUFFER_SIZE);
                        print_mem(task);
                }
        }
        return 0;
}

void cleanup_module(void){
        printing("Exiting kmemory.\n");
}

MODULE_AUTHOR ("G. Bucchino");
MODULE_DESCRIPTION ("Testing memory");
MODULE_LICENSE("GPL");

Now, we need to create our Makefile, compile the module and insert it into the kernel:

$ cat Makefile
obj-m += kmemory.o

kmemory.ko:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules
clean:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean
$ make && insmod kmemory.ko

After inserting our module, we can see its output in /var/log/kernel.log:

$ grep 'MEMORY_VA' /var/log/kernel.log
Nov 21 12:40:42 pc-geoffrey kernel: [ 9597.620932] MEMORY_VA: Analysing the current process 17402.
Nov 21 12:40:42 pc-geoffrey kernel: [ 9597.620963] MEMORY_VA: Task: insmod[17402]
Nov 21 12:40:42 pc-geoffrey kernel: [ 9597.620963] MEMORY_VA: mm_struct infos:
Nov 21 12:40:42 pc-geoffrey kernel: [ 9597.620964] MEMORY_VA: VMA number 1: Start at 0x5654d353d000 to 0x5654d3541000
Nov 21 12:40:42 pc-geoffrey kernel: [ 9597.620965] MEMORY_VA: VMA number 2: Start at 0x5654d3541000 to 0x5654d355b000
Nov 21 12:40:42 pc-geoffrey kernel: [ 9597.620966] MEMORY_VA: VMA number 3: Start at 0x5654d355b000 to 0x5654d3565000
Nov 21 12:40:42 pc-geoffrey kernel: [ 9597.620966] MEMORY_VA: VMA number 4: Start at 0x5654d3565000 to 0x5654d3567000
Nov 21 12:40:42 pc-geoffrey kernel: [ 9597.620967] MEMORY_VA: VMA number 5: Start at 0x5654d3567000 to 0x5654d3568000
Nov 21 12:40:42 pc-geoffrey kernel: [ 9597.620967] MEMORY_VA: VMA number 6: Start at 0x5654d4231000 to 0x5654d4252000
Nov 21 12:40:42 pc-geoffrey kernel: [ 9597.620968] MEMORY_VA: VMA number 7: Start at 0x7f6aba6cc000 to 0x7f6aba708000
Nov 21 12:40:42 pc-geoffrey kernel: [ 9597.620969] MEMORY_VA: VMA number 8: Start at 0x7f6aba708000 to 0x7f6aba70a000
Nov 21 12:40:42 pc-geoffrey kernel: [ 9597.620969] MEMORY_VA: VMA number 9: Start at 0x7f6aba70a000 to 0x7f6aba70b000
Nov 21 12:40:42 pc-geoffrey kernel: [ 9597.620970] MEMORY_VA: VMA number 10: Start at 0x7f6aba70b000 to 0x7f6aba70d000
Nov 21 12:40:42 pc-geoffrey kernel: [ 9597.620970] MEMORY_VA: VMA number 11: Start at 0x7f6aba70d000 to 0x7f6aba70e000
Nov 21 12:40:42 pc-geoffrey kernel: [ 9597.620971] MEMORY_VA: VMA number 12: Start at 0x7f6aba70e000 to 0x7f6aba70f000
Nov 21 12:40:42 pc-geoffrey kernel: [ 9597.620972] MEMORY_VA: VMA number 13: Start at 0x7f6aba70f000 to 0x7f6aba710000
Nov 21 12:40:42 pc-geoffrey kernel: [ 9597.620972] MEMORY_VA: VMA number 14: Start at 0x7f6aba710000 to 0x7f6aba716000
Nov 21 12:40:42 pc-geoffrey kernel: [ 9597.620973] MEMORY_VA: VMA number 15: Start at 0x7f6aba716000 to 0x7f6aba726000
Nov 21 12:40:42 pc-geoffrey kernel: [ 9597.620973] MEMORY_VA: VMA number 16: Start at 0x7f6aba726000 to 0x7f6aba72c000
Nov 21 12:40:42 pc-geoffrey kernel: [ 9597.620974] MEMORY_VA: VMA number 17: Start at 0x7f6aba72c000 to 0x7f6aba72d000
Nov 21 12:40:42 pc-geoffrey kernel: [ 9597.620975] MEMORY_VA: VMA number 18: Start at 0x7f6aba72d000 to 0x7f6aba72e000
Nov 21 12:40:42 pc-geoffrey kernel: [ 9597.620975] MEMORY_VA: VMA number 19: Start at 0x7f6aba72e000 to 0x7f6aba732000
Nov 21 12:40:42 pc-geoffrey kernel: [ 9597.620976] MEMORY_VA: VMA number 20: Start at 0x7f6aba732000 to 0x7f6aba754000
Nov 21 12:40:42 pc-geoffrey kernel: [ 9597.620976] MEMORY_VA: VMA number 21: Start at 0x7f6aba754000 to 0x7f6aba8ae000
Nov 21 12:40:42 pc-geoffrey kernel: [ 9597.620977] MEMORY_VA: VMA number 22: Start at 0x7f6aba8ae000 to 0x7f6aba8fd000
Nov 21 12:40:42 pc-geoffrey kernel: [ 9597.620978] MEMORY_VA: VMA number 23: Start at 0x7f6aba8fd000 to 0x7f6aba901000
Nov 21 12:40:42 pc-geoffrey kernel: [ 9597.620978] MEMORY_VA: VMA number 24: Start at 0x7f6aba901000 to 0x7f6aba903000
Nov 21 12:40:42 pc-geoffrey kernel: [ 9597.620979] MEMORY_VA: VMA number 25: Start at 0x7f6aba903000 to 0x7f6aba907000
Nov 21 12:40:42 pc-geoffrey kernel: [ 9597.620979] MEMORY_VA: VMA number 26: Start at 0x7f6aba907000 to 0x7f6aba98d000
Nov 21 12:40:42 pc-geoffrey kernel: [ 9597.620980] MEMORY_VA: VMA number 27: Start at 0x7f6aba98d000 to 0x7f6abab35000
Nov 21 12:40:42 pc-geoffrey kernel: [ 9597.620981] MEMORY_VA: VMA number 28: Start at 0x7f6abab35000 to 0x7f6ababc6000
Nov 21 12:40:42 pc-geoffrey kernel: [ 9597.620981] MEMORY_VA: VMA number 29: Start at 0x7f6ababc6000 to 0x7f6ababf6000
Nov 21 12:40:42 pc-geoffrey kernel: [ 9597.620982] MEMORY_VA: VMA number 30: Start at 0x7f6ababf6000 to 0x7f6ababf8000
Nov 21 12:40:42 pc-geoffrey kernel: [ 9597.620982] MEMORY_VA: VMA number 31: Start at 0x7f6ababf8000 to 0x7f6ababfc000
Nov 21 12:40:42 pc-geoffrey kernel: [ 9597.620983] MEMORY_VA: VMA number 32: Start at 0x7f6ababfc000 to 0x7f6ababff000
Nov 21 12:40:42 pc-geoffrey kernel: [ 9597.620984] MEMORY_VA: VMA number 33: Start at 0x7f6ababff000 to 0x7f6abac17000
Nov 21 12:40:42 pc-geoffrey kernel: [ 9597.620984] MEMORY_VA: VMA number 34: Start at 0x7f6abac17000 to 0x7f6abac22000
Nov 21 12:40:42 pc-geoffrey kernel: [ 9597.620985] MEMORY_VA: VMA number 35: Start at 0x7f6abac22000 to 0x7f6abac23000
Nov 21 12:40:42 pc-geoffrey kernel: [ 9597.620985] MEMORY_VA: VMA number 36: Start at 0x7f6abac23000 to 0x7f6abac24000
Nov 21 12:40:42 pc-geoffrey kernel: [ 9597.620986] MEMORY_VA: VMA number 37: Start at 0x7f6abac24000 to 0x7f6abac26000
Nov 21 12:40:42 pc-geoffrey kernel: [ 9597.620986] MEMORY_VA: VMA number 38: Start at 0x7f6abac49000 to 0x7f6abac4a000
Nov 21 12:40:42 pc-geoffrey kernel: [ 9597.620987] MEMORY_VA: VMA number 39: Start at 0x7f6abac4a000 to 0x7f6abac6a000
Nov 21 12:40:42 pc-geoffrey kernel: [ 9597.620988] MEMORY_VA: VMA number 40: Start at 0x7f6abac6a000 to 0x7f6abac72000
Nov 21 12:40:42 pc-geoffrey kernel: [ 9597.620988] MEMORY_VA: VMA number 41: Start at 0x7f6abac73000 to 0x7f6abac74000
Nov 21 12:40:42 pc-geoffrey kernel: [ 9597.620989] MEMORY_VA: VMA number 42: Start at 0x7f6abac74000 to 0x7f6abac75000
Nov 21 12:40:42 pc-geoffrey kernel: [ 9597.620989] MEMORY_VA: VMA number 43: Start at 0x7f6abac75000 to 0x7f6abac76000
Nov 21 12:40:42 pc-geoffrey kernel: [ 9597.620990] MEMORY_VA: VMA number 44: Start at 0x7ffc7eee3000 to 0x7ffc7ef04000
Nov 21 12:40:42 pc-geoffrey kernel: [ 9597.620991] MEMORY_VA: VMA number 45: Start at 0x7ffc7ef8e000 to 0x7ffc7ef92000
Nov 21 12:40:42 pc-geoffrey kernel: [ 9597.620991] MEMORY_VA: VMA number 46: Start at 0x7ffc7ef92000 to 0x7ffc7ef94000
Nov 21 12:40:42 pc-geoffrey kernel: [ 9597.620992] MEMORY_VA: Code Segment start 0x5654d3541000 to 0x5654d355a21d
Nov 21 12:40:42 pc-geoffrey kernel: [ 9597.620992] MEMORY_VA: Data Segment start 0x5654d3565c70 to 0x5654d3567080
Nov 21 12:40:42 pc-geoffrey kernel: [ 9597.620993] MEMORY_VA: Stack Segment start 0x7ffc7ef02360

Mapping to devices

Linux provides a system call named mmap() for mapping devices or files into memory directly from user space. In the next two sections, we will see how to create a copy-on-write (COW) mapping and how to share memory between processes.

  • Copy-On-Write
  • Shared

When you write to a copy-on-write region, nothing is written to the underlying file and no other process can see your changes. It is a private region.

Shared memory allows different processes to read the same area of memory. It is one of the fastest ways to share data between processes. The illustration shows how memory sharing works:

Shared memory

The program below creates new mappings of the file descriptor passed to the mmap function, and it creates two threads. One thread writes data into its mapping and the other one reads it:

$ touch foo.txt
$ cat shared.c
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>
#include <fcntl.h>
#include <string.h>
#include <pthread.h>

struct book{
    int index;
    char author[64];
    char title[64];
};
enum state{
    PAUSE,
    RESUME
};
void *thread_p1(void *data){
    int indexSharedMem = 0;
    struct book s_book;
    int *m;
    int *fd = (void*)data;
    size_t s = sizeof(struct book) + sizeof(int);
    int i;

    m = mmap(NULL, s, PROT_READ | PROT_WRITE, MAP_SHARED, *fd, 0);

    if (m == MAP_FAILED){
        printf("Failed to init mmap\n");
        perror("mmap()");
    }
    s_book.index = 1;
    strcpy(s_book.author, "Alexandre Dumas");
    strcpy(s_book.title, "Les trois mousquetaires");

    m[0] = PAUSE;
    m[1] = s_book.index;

    memset(m + 2, 0, 12);

    indexSharedMem = 2;
    for (i = 0; i < strlen(s_book.author) + 1 /* +1 for the \0 */; i++)
        m[indexSharedMem++] = s_book.author[i];

    for (i = 0; i < strlen(s_book.title) + 1 /* +1 for the \0 */; i++){
        m[indexSharedMem++] = s_book.title[i];
    }
    // We indicate to read
    m[0] = RESUME;
    printf("Thread1: Writing to the shared memory:\n"
           "\tId: %d\n\tAuthor: %s\n\tTitle: %s\n",
        s_book.index, s_book.author, s_book.title);
    munmap(m, s);
    return NULL;
}
void *thread_p2(void *data){
    int done = 0;
    int indexSharedMem = 0;
    struct book s_book;
    int *m;
    size_t s = sizeof(struct book) + sizeof(int);
    int *fd = (void*)data;
    int i = 0;

    m = mmap(NULL, s, PROT_READ | PROT_WRITE, MAP_SHARED, *fd, 0);

    if (m == MAP_FAILED){
        printf("Failed to init mmap\n");
        perror("mmap()");
    }

    while (!done){
        if (m[0] == RESUME){
            char c;
            s_book.index = m[1];

            /* The strings start at slot 2, after the state and the index */
            indexSharedMem = 2;
            i = 0;
            do{
                c = m[indexSharedMem++];
                s_book.author[i++] = c;
            }while (c != '\0');

            i = 0;
            do{
                c = m[indexSharedMem++];
                s_book.title[i++] = c;
            }while (c != '\0');

            done = 1;
        }
    }
    printf("Thread2: Reading from the shared memory:\n"
           "\tId: %d\n\tAuthor: %s\n\tTitle: %s\n",
        s_book.index, s_book.author, s_book.title);
    munmap(m, s);
    return NULL;
}
int main(void){
    int fd;
    pthread_t writer, reader;

    fd = open("foo.txt", O_RDWR);
    if (fd < 0){
        printf("Failed to open the file\n");
        perror("open()");
        return -1;
    }
    /* Grow the file to one page, so that the whole mapping is backed by it */
    if (ftruncate(fd, sysconf(_SC_PAGESIZE)) < 0){
        perror("ftruncate()");
        close(fd);
        return -1;
    }

    // Create our threads
    pthread_create(&writer, NULL, thread_p1, (void*)&fd);
    pthread_create(&reader, NULL, thread_p2, (void*)&fd);

    // Wait for both threads before closing the file
    pthread_join(writer, NULL);
    pthread_join(reader, NULL);

    close(fd);

    return 0;
}
$ gcc -Wall -O0 -ggdb shared.c -lpthread -o shared && ./shared
Thread1: Writing to the shared memory:
	Id: 1
	Author: Alexandre Dumas
	Title: Les trois mousquetaires
Thread2: Reading from the shared memory:
	Id: 1
	Author: Alexandre Dumas
	Title: Les trois mousquetaires

For copy-on-write, you would call the mmap function like this:

int *m = mmap(NULL, s, PROT_READ | PROT_WRITE, MAP_PRIVATE, *fd, 0);

Each process can read and write to its own mapping, but it cannot see what the other processes write.

That's it for this article. It is a long one, but I really enjoyed writing it. We could dig deeper into memory, but that will be the subject of other articles. In another article, I will explain how memory pages work.
