Dynamic Binary Analysis with Intel Pin
Intro to Intel Pin
Dynamic Binary Instrumentation (DBI) is a technique for analyzing a running program by injecting analysis code into it at runtime. The injected code, called instrumentation code, runs in the context of the instrumented program and has access to real runtime values. DBI is powerful because it needs only the binary, not the program’s source code. Because it observes actual execution, it can also instrument programs that generate code dynamically, which is difficult for static analysis. To security researchers, DBI frameworks are invaluable tools because they support fuzzing, control flow analysis, and vulnerability detection without modifying or recompiling the target.
For this blog, I’ll explore Intel’s Pin tool and Linux system call hooking. Pin offers a comprehensive framework for creating Pin tools that instrument at differing levels of granularity. You can find links to the Pin documentation in the references section. Also check out Gal Diskin’s slides from BlackHat for a more hands-on overview of Pin’s functionality.
Identifying Linux System Calls
The main function of our pin tool example will be to intercept and identify the system calls made by a program. For reference, we can view the Linux x86_64 system call table here: https://blog.rchapman.org/posts/Linux_System_Call_Table_for_x86_64/.
This table will help to identify the system calls by the mapped system call number.
One of the advantages of DBI is that we do not need the source code for analysis. For the sake of simplicity, the Python script below will be our target for instrumentation. We know that it returns the response of a GET request to Google.
import urllib2
page = urllib2.urlopen("https://www.google.com").read()
We can use the strace tool to see the system calls made.
# strace python http.py
execve("/usr/bin/python", ["python", "http.py"], [/* 19 vars */]) = 0
[TRUNCATED]
socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 3
sendto(3, "GET / HTTP/1.1\r\nAccept-Encoding:"..., 117, 0, NULL, 0) = 117
recvfrom(3, "HTTP/1.1 200 OK\r\nDate: Mon, 15 M"..., 8192, 0, NULL, NULL) = 1418
recvfrom(3, "d\"><meta content=\"@GoogleDoodles"..., 7422, 0, NULL, NULL) = 2836
recvfrom(3, "ocation,b=a.href.indexOf(\"#\");if"..., 4586, 0, NULL, NULL) = 4586
recvfrom(3, "b\" value=\"Google Search\" name=\"b"..., 8192, 0, NULL, NULL) = 3154
recvfrom(3, "", 5038, 0, NULL, NULL) = 0
recvfrom(3, "", 8192, 0, NULL, NULL) = 0
close(3) = 0
[TRUNCATED]
The strace output above gives us an abundance of information to work with, but we will focus on the system calls we want to intercept: sendto and recvfrom. These system calls are used to transmit messages to and from sockets. We can see the arguments provided to both of the system calls and we will try to read those same arguments with our pin tool.
Hooking sendto and recvfrom
The Pin API for system calls starts with two main functions: PIN_AddSyscallEntryFunction and PIN_AddSyscallExitFunction. These functions register callback functions for before and after the execution of the system call, respectively. The registered callback functions allow us to add instrumentation code before and after every system call is executed.
PIN_AddSyscallEntryFunction(&syscallEntryCallback, NULL);
PIN_AddSyscallExitFunction(&syscallExitCallback, NULL);
We can get the system call number with the PIN_GetSyscallNumber function, which returns the number of the system call in the current context. Likewise, we can get the arguments for the current system call with PIN_GetSyscallArgument, where ‘i’ is the zero-based ordinal number of the argument.
// sendto: 44, recvfrom: 45
PIN_GetSyscallNumber(ctxt, std);
PIN_GetSyscallArgument(ctxt, std, i);
By referencing the man pages for our intercepted system calls, we know that the second argument holds a pointer to a buffer containing the message contents to be sent or received. The third argument is the length of that buffer. Because argument numbering starts at zero, these are arguments 1 and 2. Once we intercept our system call, we can read the value of the buffer with the code below.
// Argument 1 is the buffer pointer, argument 2 is the buffer length.
ADDRINT buf = PIN_GetSyscallArgument(ctxt, std, 1);
ADDRINT len = PIN_GetSyscallArgument(ctxt, std, 2);
int buflen = (int)len;
char *bufptr = (char *)buf;
// Print the buffer one byte at a time.
for (int i = 0; i < buflen; i++, bufptr++) {
    fprintf(stdout, "%c", *bufptr);
}
The buffer pointer is our starting point. We walk the buffer byte by byte, dereferencing the pointer to read each value until we have read the number of bytes given by the length argument. Putting it all together, we can see some of the results below.
# ../../../pin -t obj-intel64/syscalltest.so -- python http.py
call PIN_AddSyscallEntryFunction
call PIN_AddSyscallExitFunction
call PIN_StartProgram()
[TRUNCATED]
systemcall sendto: 44
buffer start: 0x7ff81ef26eb4
length: 117
GET / HTTP/1.1
Accept-Encoding: identity
Host: www.google.com
Connection: close
User-Agent: Python-urllib/2.7
[TRUNCATED]
systemcall recvfrom: 45
buffer start: 0x5644e5db7934
length: 8192
emtype="https://schema.org/WebPage" lang="en"><head><meta content="Search the world's information, including webpages, images, videos and more. Google has many special features to help you find exactly what you're looking for." name="description"><meta content="noodp" name="robots"><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"><meta content="/images/branding/googleg/1x/googleg_standard_color_128dp.png" itemprop="image"><title>Google</title><script>
[TRUNCATED]
The output of the Pin tool is far from clean, but it does contain the information we want to intercept: the GET request and its response. We can identify the system calls associated with network communications and even see the values of the arguments passed back and forth. Imagine that our target program sent login credentials in a GET request. We could retrieve that information in the same way.
systemcall sendto: 44
buffer start: 0x7f3b3dcf61c4
length: 146
GET /login?user=admin&pass=badpass HTTP/1.1
Accept-Encoding: identity
Host: www.notarealhost.com
Connection: close
User-Agent: Python-urllib/2.7
The Pin tool in this post only scratches the surface of the functionality that the Pin framework has to offer. In the future, I hope to create more complex tools for fuzzing.
You can find the example code at https://github.com/NetSPI/Pin.
References
- https://media.blackhat.com/bh-us-11/Diskin/BH_US_11_Diskin_Binary_Instrumentation_Slides.pdf
- https://software.intel.com/sites/landingpage/pintool/docs/81205/Pin/html/
- https://software.intel.com/sites/landingpage/pintool/docs/81205/Pin/html/group__PIN__SYSCALL__API.html
- https://blog.rchapman.org/posts/Linux_System_Call_Table_for_x86_64/
- https://linux.die.net/man/2/sendto
- https://linux.die.net/man/2/recvfrom