深入浅出虚拟内存

517次阅读  |  发布于4年以前

大家好,我是程序喵!

今天的文章讲的是如何通过/proc文件系统找到正在运行的进程的字符串所在的虚拟内存地址,并通过更改此内存地址的内容来更改字符串内容,闲言少叙,进入正题。

虚拟内存

虚拟内存是一种实现在计算机软硬件之间的内存管理技术,它将程序使用到的内存地址(虚拟地址)映射到计算机内存中的物理地址,虚拟内存使得应用程序从繁琐的管理内存空间任务中解放出来,提高了内存隔离带来的安全性,虚拟内存地址通常是连续的地址空间,由操作系统的内存管理模块控制,在触发缺页中断时利用分页技术将实际的物理内存分配给虚拟内存,而且64位机器虚拟内存的空间大小远超出实际物理内存的大小,使得进程可以使用比物理内存大小更多的内存空间。

在深入研究虚拟内存前,有几个关键点:

图片来源:Holberton School

上图并不是特别详细的内存管理图,高地址其实还有内核空间等等,但这不是这篇文章的主题。从图中可以看到高地址存储着命令行参数和环境变量,之后是栈空间、堆空间和可执行程序,其中栈空间向下延伸,堆空间向上增长,堆空间需要使用malloc分配,是动态分配的内存的一部分。

首先通过一个简单的C程序探究虚拟内存。

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

/**
 * main - 使用strdup创建一个字符串的拷贝,strdup内部会使用malloc分配空间,
 * 返回新空间的地址,这段地址空间需要外部自行使用free释放
 *
 * Return: EXIT_FAILURE if malloc failed. Otherwise EXIT_SUCCESS
 */
int main(void)
{
    char *s;

    s = strdup("test_memory");
    if (s == NULL)
    {
        fprintf(stderr, "Can't allocate mem with malloc\n");
        return (EXIT_FAILURE);
    }
    printf("%p\n", (void *)s);
    return (EXIT_SUCCESS);
}

编译运行:gcc -Wall -Wextra -pedantic -Werror main.c -o test; ./test
输出:0x88f010

我的机器是64位机器,进程的虚拟内存高地址为0xffffffffffffffff, 低地址为0x0,而0x88f010远小于0xffffffffffffffff,因此大概可以推断出被复制的字符串的地址(堆地址)是在内存低地址附近,具体可以通过/proc文件系统验证。 ls /proc目录可以看到好多文件,这里主要关注/proc/[pid]/mem和/proc/[pid]/maps

mem & maps

man proc

/proc/[pid]/mem
    This file can be used to access the pages of a process's memory through open(2), read(2), and lseek(2).

/proc/[pid]/maps
    A  file containing the currently mapped memory regions and their access permissions.
          See mmap(2) for some further information about memory mappings.

              The format of the file is:

       address           perms offset  dev   inode       pathname
       00400000-00452000 r-xp 00000000 08:02 173521      /usr/bin/dbus-daemon
       00651000-00652000 r--p 00051000 08:02 173521      /usr/bin/dbus-daemon
       00652000-00655000 rw-p 00052000 08:02 173521      /usr/bin/dbus-daemon
       00e03000-00e24000 rw-p 00000000 00:00 0           [heap]
       00e24000-011f7000 rw-p 00000000 00:00 0           [heap]
       ...
       35b1800000-35b1820000 r-xp 00000000 08:02 135522  /usr/lib64/ld-2.15.so
       35b1a1f000-35b1a20000 r--p 0001f000 08:02 135522  /usr/lib64/ld-2.15.so
       35b1a20000-35b1a21000 rw-p 00020000 08:02 135522  /usr/lib64/ld-2.15.so
       35b1a21000-35b1a22000 rw-p 00000000 00:00 0
       35b1c00000-35b1dac000 r-xp 00000000 08:02 135870  /usr/lib64/libc-2.15.so
       35b1dac000-35b1fac000 ---p 001ac000 08:02 135870  /usr/lib64/libc-2.15.so
       35b1fac000-35b1fb0000 r--p 001ac000 08:02 135870  /usr/lib64/libc-2.15.so
       35b1fb0000-35b1fb2000 rw-p 001b0000 08:02 135870  /usr/lib64/libc-2.15.so
       ...
       f2c6ff8c000-7f2c7078c000 rw-p 00000000 00:00 0    [stack:986]
       ...
       7fffb2c0d000-7fffb2c2e000 rw-p 00000000 00:00 0   [stack]
       7fffb2d48000-7fffb2d49000 r-xp 00000000 00:00 0   [vdso]

              The address field is the address space in the process that the mapping occupies.
          The perms field is a set of permissions:

                   r = read
                   w = write
                   x = execute
                   s = shared
                   p = private (copy on write)

              The offset field is the offset into the file/whatever;
          dev is the device (major:minor); inode is the inode on that device.   0  indicates
              that no inode is associated with the memory region,
          as would be the case with BSS (uninitialized data).

              The  pathname field will usually be the file that is backing the mapping.
          For ELF files, you can easily coordinate with the offset field
              by looking at the Offset field in the ELF program headers (readelf -l).

              There are additional helpful pseudo-paths:

                   [stack]
                          The initial process's (also known as the main thread's) stack.

                   [stack:<tid>] (since Linux 3.4)
                          A thread's stack (where the <tid> is a thread ID).
              It corresponds to the /proc/[pid]/task/[tid]/ path.

                   [vdso] The virtual dynamically linked shared object.

                   [heap] The process's heap.

              If the pathname field is blank, this is an anonymous mapping as obtained via the mmap(2) function.
          There is no easy  way  to  coordinate
              this back to a process's source, short of running it through gdb(1), strace(1), or similar.

              Under Linux 2.0 there is no field giving pathname.

通过mem文件可以访问和修改整个进程的内存页,通过maps可以看到进程当前已映射的内存区域,有地址和访问权限偏移量等,从maps中可以看到堆空间是在低地址而栈空间是在高地址. 从maps中可以看到heap的访问权限是rw,即可写,所以可以通过堆地址找到上个示例程序中字符串的地址,并通过修改mem文件对应地址的内容,就可以修改字符串的内容啦,程序:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/**              
 * main - uses strdup to create a new string, loops forever-ever
 *                
 * Return: EXIT_FAILURE if malloc failed. Other never returns
 */
int main(void)
{
     char *s;
     unsigned long int i;

     s = strdup("test_memory");
     if (s == NULL)
     {
          fprintf(stderr, "Can't allocate mem with malloc\n");
          return (EXIT_FAILURE);
     }
     i = 0;
     while (s)
     {
          printf("[%lu] %s (%p)\n", i, s, (void *)s);
          sleep(1);
          i++;
     }
     return (EXIT_SUCCESS);
}
编译运行:gcc -Wall -Wextra -pedantic -Werror main.c -o loop; ./loop
输出:
[0] test_memory (0x21dc010)
[1] test_memory (0x21dc010)
[2] test_memory (0x21dc010)
[3] test_memory (0x21dc010)
[4] test_memory (0x21dc010)
[5] test_memory (0x21dc010)
[6] test_memory (0x21dc010)
...

这里可以写一个脚本通过/proc文件系统找到字符串所在位置并修改其内容,相应的输出也会更改。

首先找到进程的进程号

ps aux | grep ./loop | grep -v grep
zjucad    2542  0.0  0.0   4352   636 pts/3    S+   12:28   0:00 ./loop

2542即为loop程序的进程号,cat /proc/2542/maps得到

00400000-00401000 r-xp 00000000 08:01 811716                             /home/zjucad/wangzhiqiang/loop
00600000-00601000 r--p 00000000 08:01 811716                             /home/zjucad/wangzhiqiang/loop
00601000-00602000 rw-p 00001000 08:01 811716                             /home/zjucad/wangzhiqiang/loop
021dc000-021fd000 rw-p 00000000 00:00 0                                  [heap]
7f2adae2a000-7f2adafea000 r-xp 00000000 08:01 8661324                    /lib/x86_64-linux-gnu/libc-2.23.so
7f2adafea000-7f2adb1ea000 ---p 001c0000 08:01 8661324                    /lib/x86_64-linux-gnu/libc-2.23.so
7f2adb1ea000-7f2adb1ee000 r--p 001c0000 08:01 8661324                    /lib/x86_64-linux-gnu/libc-2.23.so
7f2adb1ee000-7f2adb1f0000 rw-p 001c4000 08:01 8661324                    /lib/x86_64-linux-gnu/libc-2.23.so
7f2adb1f0000-7f2adb1f4000 rw-p 00000000 00:00 0
7f2adb1f4000-7f2adb21a000 r-xp 00000000 08:01 8661310                    /lib/x86_64-linux-gnu/ld-2.23.so
7f2adb3fa000-7f2adb3fd000 rw-p 00000000 00:00 0
7f2adb419000-7f2adb41a000 r--p 00025000 08:01 8661310                    /lib/x86_64-linux-gnu/ld-2.23.so
7f2adb41a000-7f2adb41b000 rw-p 00026000 08:01 8661310                    /lib/x86_64-linux-gnu/ld-2.23.so
7f2adb41b000-7f2adb41c000 rw-p 00000000 00:00 0
7ffd51bb3000-7ffd51bd4000 rw-p 00000000 00:00 0                          [stack]
7ffd51bdd000-7ffd51be0000 r--p 00000000 00:00 0                          [vvar]
7ffd51be0000-7ffd51be2000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

看见堆地址范围021dc000-021fd000,并且可读可写,而且 021dc000<0x21dc010<021fd000,这就可以确认字符串的地址在堆中,在堆中的索引是0x10(至于为什么是0x10,后面会讲到),这时可以通过mem文件到0x21dc010地址修改内容,字符串输出的内容也会随之更改,这里通过python脚本实现此功能。

#!/usr/bin/env python3
'''            
Locates and replaces the first occurrence of a string in the heap
of a process    

Usage: ./read_write_heap.py PID search_string replace_by_string
Where:          
- PID is the pid of the target process
- search_string is the ASCII string you are looking to overwrite
- replace_by_string is the ASCII string you want to replace
  search_string with
'''

import sys

def print_usage_and_exit():
    print('Usage: {} pid search write'.format(sys.argv[0]))
    sys.exit(1)

# check usage  
if len(sys.argv) != 4:
    print_usage_and_exit()

# get the pid from args
pid = int(sys.argv[1])
if pid <= 0:
    print_usage_and_exit()
search_string = str(sys.argv[2])
if search_string  == "":
    print_usage_and_exit()
write_string = str(sys.argv[3])
if search_string  == "":
    print_usage_and_exit()

# open the maps and mem files of the process
maps_filename = "/proc/{}/maps".format(pid)
print("[*] maps: {}".format(maps_filename))
mem_filename = "/proc/{}/mem".format(pid)
print("[*] mem: {}".format(mem_filename))

# try opening the maps file
try:
    maps_file = open('/proc/{}/maps'.format(pid), 'r')
except IOError as e:
    print("[ERROR] Can not open file {}:".format(maps_filename))
    print("        I/O error({}): {}".format(e.errno, e.strerror))
    sys.exit(1)

for line in maps_file:
    sline = line.split(' ')
    # check if we found the heap
    if sline[-1][:-1] != "[heap]":
        continue
    print("[*] Found [heap]:")

    # parse line
    addr = sline[0]
    perm = sline[1]
    offset = sline[2]
    device = sline[3]
    inode = sline[4]
    pathname = sline[-1][:-1]
    print("\tpathname = {}".format(pathname))
    print("\taddresses = {}".format(addr))
    print("\tpermisions = {}".format(perm))
    print("\toffset = {}".format(offset))
    print("\tinode = {}".format(inode))

    # check if there is read and write permission
    if perm[0] != 'r' or perm[1] != 'w':
        print("[*] {} does not have read/write permission".format(pathname))
        maps_file.close()
        exit(0)

    # get start and end of the heap in the virtual memory
    addr = addr.split("-")
    if len(addr) != 2: # never trust anyone, not even your OS :)
        print("[*] Wrong addr format")
        maps_file.close()
        exit(1)
    addr_start = int(addr[0], 16)
    addr_end = int(addr[1], 16)
    print("\tAddr start [{:x}] | end [{:x}]".format(addr_start, addr_end))

    # open and read mem
    try:
        mem_file = open(mem_filename, 'rb+')
    except IOError as e:
        print("[ERROR] Can not open file {}:".format(mem_filename))
        print("        I/O error({}): {}".format(e.errno, e.strerror))
        maps_file.close()
        exit(1)

    # read heap  
    mem_file.seek(addr_start)
    heap = mem_file.read(addr_end - addr_start)

    # find string
    try:
        i = heap.index(bytes(search_string, "ASCII"))
    except Exception:
        print("Can't find '{}'".format(search_string))
        maps_file.close()
        mem_file.close()
        exit(0)
    print("[*] Found '{}' at {:x}".format(search_string, i))

    # write the new string
    print("[*] Writing '{}' at {:x}".format(write_string, addr_start + i))
    mem_file.seek(addr_start + i)
    mem_file.write(bytes(write_string, "ASCII"))

    # close files
    maps_file.close()
    mem_file.close()

    # there is only one heap in our example
    break

运行这个Python脚本

zjucad@zjucad-ONDA-H110-MINI-V3-01:~/wangzhiqiang$ sudo ./loop.py 2542 test_memory test_hello
[*] maps: /proc/2542/maps
[*] mem: /proc/2542/mem
[*] Found [heap]:
        pathname = [heap]
        addresses = 021dc000-021fd000
        permisions = rw-p
        offset = 00000000
        inode = 0
        Addr start [21dc000] | end [21fd000]
[*] Found 'test_memory' at 10
[*] Writing 'test_hello' at 21dc010

同时字符串输出的内容也已更改

[633] test_memory (0x21dc010)
[634] test_memory (0x21dc010)
[635] test_memory (0x21dc010)
[636] test_memory (0x21dc010)
[637] test_memory (0x21dc010)
[638] test_memory (0x21dc010)
[639] test_memory (0x21dc010)
[640] test_helloy (0x21dc010)
[641] test_helloy (0x21dc010)
[642] test_helloy (0x21dc010)
[643] test_helloy (0x21dc010)
[644] test_helloy (0x21dc010)
[645] test_helloy (0x21dc010)

实验成功!

Copyright© 2013-2020

All Rights Reserved 京ICP备2023019179号-8