BitBoost Python code obfuscator

Does Nuitka Remove Or Obscure Names of Variables, Functions, Classes, Etc. From Your Python Code?

(Nope!
It Preserves Them For Reverse Engineers!)

Introduction

Nuitka* is a tool for increasing the execution speed of Python code. Nuitka can convert a large subset of working Python source code into C++ code, and can even build (platform-dependent) native machine code from it, in order to increase execution speed.

But how many clues are provided to your original Python source code? For example, can reverse engineers still see how you named each of your variables, functions, classes, class methods, etc.?

You might be surprised: even in native machine code, in the absence of the original and intermediate files, Nuitka-generated code implementing these entities is still labeled with names derived from your original Python source code.

Following Python code and its symbols/names through the Nuitka build process

Here is a brief piece of Python code, which could be used as a small Python program, or as a rather silly library.

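# Python 2 example (raw_input was renamed to input in Python 3).
import sys

def hello_name(name_to_greet_by):
  # Build the greeting, then write it to standard output.
  greeting_str = "Hello, " + name_to_greet_by + ".\n"
  sys.stdout.write(greeting_str)

# Ask for a name, then greet the user.
name_in = raw_input("What is your name?")

hello_name(name_in)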

Let's run Nuitka with the --exe flag, which is the normal way to build an executable with Nuitka, and look at the results.

Nuitka converts the Python code into C++ code. It is special C++ code, however, which does much of its work by interfacing to the internals of the Python interpreter, making calls into libpython and various other Python libraries. Below is part of one of the C++ source code files (__constants.hpp) generated by Nuitka from the Python source code above.

....
extern PyObject *_python_str_plain_greeting_str;
extern PyObject *_python_str_plain_hello_name;
extern PyObject *_python_str_plain_inspect;
extern PyObject *_python_str_plain_name_in;
extern PyObject *_python_str_plain_name_to_greet_by;
extern PyObject *_python_str_plain_open;
extern PyObject *_python_str_plain_print;
extern PyObject *_python_str_plain_range;
extern PyObject *_python_str_plain_raw_input;
extern PyObject *_python_str_plain_read;
extern PyObject *_python_str_plain_stdout;
extern PyObject *_python_str_plain_strip;
extern PyObject *_python_str_plain_sys;
extern PyObject *_python_str_plain_write;
extern PyObject *_python_tuple_empty;
extern PyObject *_python_tuple_str_digest_f6429fe0f1a76611670df7e1234af936_tuple;
extern PyObject *_python_tuple_str_plain_name_to_greet_by_tuple;    
....    

So far, the above excerpt from the generated C++ code shows that it refers to many items in our Python code by name, including the function hello_name, its argument name_to_greet_by, and the variables greeting_str and name_in.

But does the native (compiled and linked) binary built by Nuitka still give away this wealth of helpful hints?

Let's use Nuitka's --exe flag to build an executable, and take the resulting executable to another machine where none of the original or intermediate data from building the executable are even available.

There, let's use objdump -x to dump the headers from the executable. The output includes the executable's symbol table, which lets us see which symbols the Nuitka-created executable contains. Some of these symbols show which code and data in the executable correspond to named functions, function arguments, variables, etc. within the original Python code.

$objdump -x
....
080510a0 l     F .text	0000092b              _ZL48_fparse_function_1_hello_name_of_module___main__P7_objectS0_S0_
08058848 l     O .bss	00000004              _ZZL45impl_function_1_hello_name_of_module___main__P7_objectS0_E46frame_function_1_hello_name_of_module___main__
....
08058840 l     O .bss	00000004              _ZL25_mvar___main___hello_name
08058844 l     O .bss	00000004              _ZL22_mvar___main___name_in
.... _python_tuple_str_plain_name_to_greet_by_tuple
080587c4 g     O .bss	00000004              .hidden 
.... _python_str_plain_name_in
00000000       F *UND*	00000000              PyNumber_CoerceEx
080577c0 g     O .bss	000000c4              PyCallIter_Type
0804fbe0 g     F .text	00000141              .hidden 
.... _python_str_plain_name_to_greet_by
00000000       F *UND*	00000000              PyFile_SoftSpace
00000000       F *UND*	00000000              _PyObject_GC_NewVar
080582c0 g     O .bss	000000c4              PyMethod_Type
....
0804dc60 g     F .text	0000008e              .hidden _Z11BUILTIN_OCTP7_object
080587a0 g     O .bss	00000004              .hidden _python_str_plain_greeting_str
0804a1f0       F *UND*	00000000              PyObject_GenericGetAttr
00000000       F *UND*	00000000              __cxa_end_catch@@CXXABI_1.3
....

The generated binary executable still shows us the original names of the Python code's functions, function arguments, and variables. Symbols derived from these names still label the code and data that correspond to and implement these entities.
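To illustrate how little effort it takes to read these labels, here is a minimal Python sketch that pulls the original names back out of a few of the symbols listed above. The regular expressions are assumptions inferred only from the symbols in this dump, not a documented Nuitka naming scheme. The helper name recover_names is hypothetical, and the module name __main__ is hard-coded for simplicity.

```python
import re

# A few symbol names copied from the objdump output above.
SYMBOLS = [
    "_ZL48_fparse_function_1_hello_name_of_module___main__P7_objectS0_S0_",
    "_ZL25_mvar___main___hello_name",
    "_ZL22_mvar___main___name_in",
    "_python_str_plain_greeting_str",
    "_python_str_plain_name_to_greet_by",
    "PyMethod_Type",  # an ordinary libpython symbol, matched by no pattern
]

# Assumed patterns, guessed from the dump above (not an official scheme).
PATTERNS = {
    "function": re.compile(r"function_\d+_(\w+?)_of_module_"),
    "module variable": re.compile(r"_mvar___main___(\w+)"),
    "string constant": re.compile(r"_python_str_plain_(\w+)"),
}

def recover_names(symbols):
    """Group the Python-level names found in the symbols by kind."""
    found = {kind: set() for kind in PATTERNS}
    for symbol in symbols:
        for kind, pattern in PATTERNS.items():
            for name in pattern.findall(symbol):
                found[kind].add(name)
    return {kind: sorted(names) for kind, names in found.items()}

recovered = recover_names(SYMBOLS)
for kind, names in recovered.items():
    print(kind + ": " + ", ".join(names))
```

Even this crude pattern matching recovers the function name, both module-level variable names, and the names of the local variable and the function argument.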

You might wonder, why is everything in the generated native code still labeled in such an informative way?

The reason is that, even though this is native machine code that will most likely run about 2x faster than the original Python, it still does its work in small pieces that rather closely match the way in which a standard Python interpreter would run the original Python source code. The generated native code handles data in analogous ways, and makes very frequent calls to libpython and various other Python libraries. When making these many library calls and generally replicating the work done by the Python interpreter, practically all of the symbols/names defined in the Python source code are preserved, to facilitate the emulation of a Python interpreter.
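The standard interpreter's own dependence on names can be seen directly. The sketch below is written for Python 3 and uses only standard CPython introspection attributes. It is background on the interpreter, offered as an assumption about why code that mimics the interpreter also carries names, and it does not show Nuitka's internals.

```python
import sys

def hello_name(name_to_greet_by):
    greeting_str = "Hello, " + name_to_greet_by + ".\n"
    sys.stdout.write(greeting_str)

code = hello_name.__code__

# Names of the function's argument and local variable, stored as strings.
local_names = code.co_varnames

# Names of the globals and attributes the function body refers to.
global_and_attr_names = code.co_names

# Attribute lookup is done by name string at run time.
looked_up = getattr(sys, "stdout")

print(local_names)
print(global_and_attr_names)
```

Every identifier used in the function is kept as a string inside the compiled code object, because lookups of globals and attributes happen by name while the program runs.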

The purpose of Nuitka is to generate working C++ that can be compiled to native code, not to obliterate any symbols, nor to figure out if any symbols can safely be obliterated without breaking functionality.

C++ is high on the list of languages that reverse engineers have dealt with for many years, and for which reverse engineering tools are relatively advanced. So what we now have is native machine code on which a reverse engineer could use a C++ decompiler, and the C++ code the reverse engineer gets back will be much more informative and helpful than the usual results of decompiling C++.

Unlike release binaries that were originally built from native C or C++, a normal Nuitka-built binary incorporates helpful labels showing the original names of nearly everything.

Those labels are undesirable if we want to fight reverse engineering, but Nuitka is not designed to remove them easily and safely.

How can you use Nuitka and make the resulting binaries more resistant to reverse engineering?

BitBoost's 'bobs' Python obfuscator, unlike Nuitka, can recognize which symbols (names) can be obliterated from Python code while preserving the code's functionality. For this reason, the obfuscator can increase resistance to reverse engineering by removing large amounts of information that Nuitka by itself would preserve.

By running Nuitka on the obfuscator's output code instead of on the original Python code, we can considerably increase the resistance to reverse engineering of Nuitka's output code.
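The general idea of removing names before compilation can be shown with a toy example. The sketch below is not how the 'bobs' obfuscator works, and the names LocalRenamer and obfuscate_locals are hypothetical. It only renames function arguments and local variables using Python's standard ast module (Python 3.9 or later, for ast.unparse), and it ignores nested functions, global declarations, and callers that pass arguments by keyword.

```python
import ast

SOURCE = '''
def hello_name(name_to_greet_by):
    greeting_str = "Hello, " + name_to_greet_by + "."
    return greeting_str
'''

class LocalRenamer(ast.NodeTransformer):
    """Toy pass: give each function's arguments and locals opaque names."""

    def visit_FunctionDef(self, node):
        # Collect argument names, then names assigned inside the function.
        local_names = [a.arg for a in node.args.args]
        for child in ast.walk(node):
            if (isinstance(child, ast.Name)
                    and isinstance(child.ctx, ast.Store)
                    and child.id not in local_names):
                local_names.append(child.id)
        mapping = {name: "v%d" % i for i, name in enumerate(local_names)}
        # Rewrite every argument definition and every use of a mapped name.
        for child in ast.walk(node):
            if isinstance(child, ast.arg) and child.arg in mapping:
                child.arg = mapping[child.arg]
            elif isinstance(child, ast.Name) and child.id in mapping:
                child.id = mapping[child.id]
        return node

def obfuscate_locals(source):
    """Return source with local names replaced; function names are kept."""
    tree = LocalRenamer().visit(ast.parse(source))
    return ast.unparse(tree)

obfuscated = obfuscate_locals(SOURCE)
print(obfuscated)

# The renamed code still behaves the same way.
namespace = {}
exec(obfuscated, namespace)
```

The function name hello_name is left alone here because other code may call it by that name. Deciding which names can safely be changed is the hard part that a real obfuscator has to solve.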

Cython and Nuitka

Cython is a tool for creating shared libraries (such as UNIX .so files or Windows DLLs) from either pure Python code, or Python-like code that uses some extensions to the standard Python language.

Although Cython and Nuitka are separate projects with somewhat different design goals, some key fundamental implementation strategies are similar. In both tools, much of the implementation of Python functionality is provided by the standard Python interpreter and its libraries. In the generated output code, symbols (names) are preserved to enable working in a similar way to a Python interpreter, to support easy, frequent linking to libpython, and to facilitate use of the resulting shared libraries with other code. Therefore Cython similarly preserves large amounts of symbolic information helpful for reverse engineers.

Footnotes

* Nuitka is not developed by BitBoost, although it can be used in combination with BitBoost products such as BitBoost's 'bobs' Python obfuscator.