Python code obfuscator
But how many clues to your original Python source code does the result still provide? For example, can reverse engineers still see how you named each of your variables, functions, classes, class methods, etc.?
You might be surprised: even in native machine code, in the absence of the original and intermediate files, Nuitka-generated code implementing these entities is still labeled with names derived from your original Python source code.
Here is a brief piece of Python code, which could be used as a small Python program, or as a rather silly library.
```python
# Python 2 code: raw_input() was renamed input() in Python 3.
import sys

def hello_name(name_to_greet_by):
    greeting_str = "Hello, " + name_to_greet_by + ".\n"
    sys.stdout.write(greeting_str)

name_in = raw_input("What is your name?")
hello_name(name_in)
```
Let's run Nuitka with the --exe flag, which is the normal way to build an executable with Nuitka, and look at the results.
Nuitka converts the Python code into C++ code. This is special C++ code, however: it does much of its work by interfacing with the internals of the Python interpreter, making calls into libpython and various other Python libraries. Below is part of one of the C++ source files (__constants.hpp) that Nuitka generated from the Python source code above.
```cpp
....
extern PyObject *_python_str_plain_greeting_str;
extern PyObject *_python_str_plain_hello_name;
extern PyObject *_python_str_plain_inspect;
extern PyObject *_python_str_plain_name_in;
extern PyObject *_python_str_plain_name_to_greet_by;
extern PyObject *_python_str_plain_open;
extern PyObject *_python_str_plain_print;
extern PyObject *_python_str_plain_range;
extern PyObject *_python_str_plain_raw_input;
extern PyObject *_python_str_plain_read;
extern PyObject *_python_str_plain_stdout;
extern PyObject *_python_str_plain_strip;
extern PyObject *_python_str_plain_sys;
extern PyObject *_python_str_plain_write;
extern PyObject *_python_tuple_empty;
extern PyObject *_python_tuple_str_digest_f6429fe0f1a76611670df7e1234af936_tuple;
extern PyObject *_python_tuple_str_plain_name_to_greet_by_tuple;
....
```
Even this short excerpt from the generated C++ code alludes to the names of items in our Python code: the function hello_name, its parameter name_to_greet_by, and the variables greeting_str and name_in.
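The names that leak are simply the identifiers the Python program defines. The following minimal sketch lists them using Python 3's standard ast module. It is a helper of our own, not part of Nuitka, and the function name collect_defined_names is hypothetical. It gathers function and class names, parameter names, and names that are assigned to. Imports and attribute names are deliberately ignored.

```python
import ast

# Source of the example program above. raw_input() is Python 2, but the call
# is still valid syntax, so Python 3's ast module can parse it.
EXAMPLE_SOURCE = '''
import sys
def hello_name(name_to_greet_by):
    greeting_str = "Hello, " + name_to_greet_by + ".\\n"
    sys.stdout.write(greeting_str)
name_in = raw_input("What is your name?")
hello_name(name_in)
'''

def collect_defined_names(source):
    """Return the set of identifiers defined by the given Python source.

    Hypothetical helper: covers functions, classes, parameters and
    assigned names; imported modules and attribute names are ignored.
    """
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.add(node.name)
        elif isinstance(node, ast.arg):
            names.add(node.arg)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            names.add(node.id)
    return names

print(sorted(collect_defined_names(EXAMPLE_SOURCE)))
# prints ['greeting_str', 'hello_name', 'name_in', 'name_to_greet_by']
```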
But does the native (compiled and linked) binary built by Nuitka still give away this wealth of helpful hints?
Let's use Nuitka's --exe flag to build an executable, then take that executable to another machine where none of the original or intermediate files from the build are available.
There, let's use objdump -x to dump the executable's headers. These include the symbol table, which lists the symbols the Nuitka-created executable defines (both local and global) as well as the undefined symbols (marked *UND*) that it expects libpython and other shared libraries to supply. Some of these symbols show which code and data in the executable correspond to named functions, function arguments, variables, etc. in the original Python code.
```
$ objdump -x
....
080510a0 l     F .text   0000092b  _ZL48_fparse_function_1_hello_name_of_module___main__P7_objectS0_S0_
08058848 l     O .bss    00000004  _ZZL45impl_function_1_hello_name_of_module___main__P7_objectS0_E46frame_function_1_hello_name_of_module___main__
....
08058840 l     O .bss    00000004  _ZL25_mvar___main___hello_name
08058844 l     O .bss    00000004  _ZL22_mvar___main___name_in
....
_python_tuple_str_plain_name_to_greet_by_tuple
080587c4 g     O .bss    00000004  .hidden
....
_python_str_plain_name_in
00000000       F *UND*   00000000  PyNumber_CoerceEx
080577c0 g     O .bss    000000c4  PyCallIter_Type
0804fbe0 g     F .text   00000141  .hidden
....
_python_str_plain_name_to_greet_by
00000000       F *UND*   00000000  PyFile_SoftSpace
00000000       F *UND*   00000000  _PyObject_GC_NewVar
080582c0 g     O .bss    000000c4  PyMethod_Type
....
0804dc60 g     F .text   0000008e  .hidden _Z11BUILTIN_OCTP7_object
080587a0 g     O .bss    00000004  .hidden _python_str_plain_greeting_str
0804a1f0       F *UND*   00000000  PyObject_GenericGetAttr
00000000       F *UND*   00000000  __cxa_end_catch@@CXXABI_1.3
....
```
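The symbol table shows that the generated binary executable still reveals the original names of the Python code's functions, function arguments, and variables. For example, _ZL25_mvar___main___hello_name and _ZL22_mvar___main___name_in evidently label the module-level names hello_name and name_in of __main__, and _python_str_plain_name_to_greet_by is the string constant holding the function's parameter name. Symbols derived from these names still label the code and data that correspond to and implement these entities.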
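A rough way to repeat this check without objdump is to search the executable's raw bytes for each source-level name, much as the Unix strings or grep tools would. The sketch below is our own and not part of Nuitka. find_leaked_names is a hypothetical helper, and it assumes the names are stored as plain ASCII, which is how they appear in the symbol names above. Very short names can produce false positives, because a plain substring search cannot tell "in" apart from part of a longer word.

```python
def find_leaked_names(binary_data, names):
    """Return the subset of names that occur anywhere in binary_data (bytes).

    Symbol tables and string constants store identifiers as plain bytes,
    so a substring search is a reasonable first approximation.
    """
    return {name for name in names if name.encode("ascii") in binary_data}

# Usage against a real executable (the path is only a placeholder):
#   with open("path/to/executable", "rb") as f:
#       print(sorted(find_leaked_names(f.read(), {"hello_name", "name_in"})))

# Self-contained demo on bytes copied from two of the symbol names shown above.
sample = b"_ZL25_mvar___main___hello_name\0_python_str_plain_greeting_str\0"
print(sorted(find_leaked_names(sample, {"hello_name", "greeting_str", "name_in"})))
# prints ['greeting_str', 'hello_name']
```

Combined with a list of the program's own identifiers, this gives a quick before-and-after measure of how much naming information a build still carries.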
You might wonder why everything in the generated native code is still labeled so informatively.
The reason is that even though this native machine code may run about twice as fast as the original Python, it still does its work in small pieces that closely match the way a standard Python interpreter would run the original source code. The generated native code handles data in analogous ways and makes very frequent calls into libpython and various other Python libraries. Because it makes these many library calls and generally replicates the work done by the Python interpreter, practically all of the symbols (names) defined in the Python source code are preserved, so that the generated code can faithfully emulate the interpreter.
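The following Python 3 sketch shows why names are so hard to discard. It demonstrates three places where the standard interpreter itself needs source-level names at run time. The sketch describes CPython, not Nuitka's generated code, and it is a variant of the example above with raw_input dropped and a return value added for convenience. Any implementation that faithfully reproduces these semantics, as Nuitka aims to, has to keep the names in some form.

```python
import sys

def hello_name(name_to_greet_by):
    greeting_str = "Hello, " + name_to_greet_by + ".\n"
    sys.stdout.write(greeting_str)
    return greeting_str

# 1. Keyword arguments are matched by parameter name at call time,
#    so the string "name_to_greet_by" must still exist at run time.
hello_name(name_to_greet_by="Ada")  # prints: Hello, Ada.

# 2. Module-level functions and variables live in the module's namespace
#    dictionary, keyed by their names.
print("hello_name" in globals())  # prints: True

# 3. The function's code object records its parameter and local variable names.
print(hello_name.__code__.co_varnames)  # prints: ('name_to_greet_by', 'greeting_str')
```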
Nuitka's purpose is to generate working C++ code that can be compiled to native code. It is not designed to obliterate symbols, or to determine which symbols could safely be obliterated without breaking functionality.
C++ is high on the list of languages that reverse engineers have dealt with for many years, and the tools for reverse engineering it are relatively advanced. So what we now have is native machine code to which a reverse engineer could apply a C++ decompiler, and the resulting C++ code would be far more informative and helpful than the usual output of decompiling C++.
Unlike typical release binaries built from C or C++ source, a normal Nuitka-built binary incorporates helpful labels showing the original names of nearly everything.
Those labels are undesirable if we want to fight reverse engineering, but Nuitka's design does not allow it to remove them easily and safely.
BitBoost's 'bobs' Python obfuscator, unlike Nuitka, can recognize which symbols (names) can be obliterated from Python code while preserving the code's functionality. For this reason, the obfuscator can increase resistance to reverse engineering by removing large amounts of information that Nuitka by itself would preserve.
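To give a rough idea of what obliterating names involves, here is a deliberately minimal sketch using the ast module of Python 3.9 or later. It is not how bobs works, and LocalRenamer is a hypothetical class. It renames only ordinary parameters and assigned local variables inside top-level functions. It assumes that callers never pass those parameters by keyword, that the code does not use locals(), eval, exec, global or nonlocal, and that there are no nested functions. Verifying assumptions like these across a whole program is the hard part that a real obfuscator has to solve.

```python
import ast

class LocalRenamer(ast.NodeTransformer):
    """Rename plain parameters and assigned locals in each top-level function.

    Sketch only: ignores *args, keyword-only and positional-only parameters,
    and does not descend into nested functions or check for name clashes.
    """

    def visit_FunctionDef(self, node):
        # Gather parameter names first, then every name assigned in the body.
        local_names = [a.arg for a in node.args.args]
        for child in ast.walk(node):
            if isinstance(child, ast.Name) and isinstance(child.ctx, ast.Store):
                if child.id not in local_names:
                    local_names.append(child.id)
        mapping = {old: "_v%d" % i for i, old in enumerate(local_names)}
        # Rewrite the parameter list and every use of the local names.
        for arg in node.args.args:
            arg.arg = mapping[arg.arg]
        for child in ast.walk(node):
            if isinstance(child, ast.Name) and child.id in mapping:
                child.id = mapping[child.id]
        return node

SOURCE = '''
import sys
def hello_name(name_to_greet_by):
    greeting_str = "Hello, " + name_to_greet_by + ".\\n"
    sys.stdout.write(greeting_str)
'''

new_source = ast.unparse(LocalRenamer().visit(ast.parse(SOURCE)))
print(new_source)

# The renamed function still behaves the same when called positionally.
namespace = {}
exec(compile(new_source, "<obfuscated>", "exec"), namespace)
namespace["hello_name"]("Ada")  # prints: Hello, Ada.
```

The function name hello_name is left alone because other code may call it by name, and so is the module name sys. Deciding which names are safe to change is exactly the kind of analysis described above.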
By running Nuitka on the obfuscator's output code instead of on the original Python code, we can make Nuitka's output considerably more resistant to reverse engineering.
Cython is a tool for creating shared libraries (such as UNIX .so files or Windows DLLs) from either pure Python code, or Python-like code that uses some extensions to the standard Python language.
Although Cython and Nuitka are separate projects with somewhat different design goals, some of their key fundamental implementation strategies are similar. In both tools, much of the Python functionality is provided by the standard Python interpreter and its libraries. In the generated output code, symbols (names) are preserved so that the code can work much as a Python interpreter does, make frequent calls into libpython, and let the resulting shared libraries be used with other code. Cython therefore similarly preserves large amounts of symbolic information that is helpful to reverse engineers.