Reducing Executable Size


There are so many large and bloated applications around today that most people assume this is normal, and quite willingly put up with poor performance and having to buy larger hard drives.

In this tutorial I will present all of the techniques I know (and use) to reduce the size of executables. Some of these techniques can be applied to any C / C++ Win32 project, whilst other techniques are quite restrictive as to the type of application you can apply them to. I regularly use the techniques on this page to build tiny executables - less than 1 KB sometimes!

The techniques described here concentrate on reducing the file size of an executable. They only partly reduce the amount of memory (RAM) required by a program. If your program uses a lot of memory to load bitmaps or animations, then this is a completely different issue which can only be controlled by you, the programmer.

Dependence on third-party libraries

If you are using MFC then you know what I’m talking about. MFC (and many other frameworks or libraries) by their very nature introduce code bloat. This is because libraries tend to be general purpose. So, even though you use only a small fraction of what a library offers, you are stuck with using the whole library. If you really want to build a small executable file then you have to bite the bullet and stay away from such frameworks. You have to have complete control over what gets compiled into your executable, and MFC doesn’t give you that control. Jump to the section on Compiler and Linker Settings to try to minimize the damage your MFC project is doing to your hard disk!

Debug Vs Release build

It constantly amazes me how many people don’t understand the difference between a debug and a release build of their application. Always distribute the release version of your program. A debug build has a lot of bloat in it which, whilst helping the debugger to do its job properly, doesn’t do you any favours if you are trying to keep your program small. A release build, on the other hand, has minimal bloat and is optimized, which usually reduces code size.

If you really want to include debugging information for your release builds, then configure Visual Studio’s linker options to use an external debugging database (a separate .pdb file) which the exe refers to.

Static Vs Dynamic linking of library files

One of the first things you could do to reduce executable size is to use the DLL version of your standard libraries. This may be the MFC DLL (mfc42.dll) or the DLL version of the C run-time (MSVCRT.DLL). In either case, you will end up with a smaller executable. There are disadvantages to this approach though. You may still need to distribute the runtime DLL at some point (for example, Win9x doesn’t include MSVCRT.DLL).

I used to say that it is best to statically link to your runtime library. However, with Windows 95/98/ME pretty much dead, there is little reason not to link against MSVCRT.DLL. Your executable will be very small and you get all the benefits of using the entire C-runtime.

The C-runtime and default libraries

The C-runtime is a collection of functions and services that you probably didn’t even know existed. The run-time performs many tasks which are required by a C or C++ implementation.

You may have noticed that the smallest C program you can write in Visual Studio is usually around 20-30 KB. Even the tiny “Hello, world” program results in a not so tiny executable size.

#include <stdio.h>

int main(void)
{
    printf("hello, world\n");
    return 0;
}

This is not a fault of the compiler or linker. It is only doing what you are telling it to do. A normal, standard C program gets its own copy of the C runtime. Unless you tell the linker otherwise, this is what you will always get. The following program illustrates not only the above “hello world” example, but also a very simplified outline of what happens during program startup:

// 
// This is the same "hello world" as above
//
#include <stdio.h>

int main(void)
{
    printf("hello, world\n");
    return 0;
}

//
// This is what you never get to see...it is the proper
// entry-point into this application.
//
int mainCRTStartup()
{
    int retval;    
    
    init_heap();
    parse_command_line();    
    init_global_vars();    
    init_exception_handling();

    // finally call the user-defined main
    retval = main();

    // terminate all threads and exit
    ExitProcess(retval);
}

If we write our own minimal version of the C runtime, then we can remove this extra code and rely on the operating system to do much of the work for us. In a moment I will tell you how to remove the default libraries from your program, but there are a few rules you need to obey if you are to do this.

The following list contains some of the tasks that go on behind the scenes in a C or C++ program. You need to be aware of these tasks, and how to write software that doesn’t rely on the features they provide.

Heap Management

Every time you call malloc, new[], free or delete[], the heap manager is behind the scenes reserving memory from the operating system and trying to keep memory fragmentation down. The heap manager is really just a collection of functions which get called by malloc, free etc. You don’t NEED the heap manager though. If you replace the heap manager with small functions which call Win32 memory functions, then you can save a lot of space in your program. You lose a bit of performance, but there is always a tradeoff in anything you do.

Startup code

If you didn’t know already, it is very uncommon for main to be the first function called during the lifetime of your executable. Under Visual C, the standard entry-point into a Windows application is called WinMainCRTStartup, whilst a console application starts life in the mainCRTStartup function.

These entry-point functions perform many tasks before your main function is called: initializing the heap manager, retrieving the command-line arguments and putting them into the argv[] array, and finally calling your main function. The final job of the startup function is to call ExitProcess. Without this, the program may not shut down cleanly: on NT-based Windows, returning from the entry point ends only the initial thread, so the process lingers if any other thread is still running.
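As a rough illustration of the command-line step, here is a sketch that splits a raw command line into an argv-style array. The name split_args and its simplified rules (whitespace separates arguments, double quotes group them) are assumptions made for this example; the real runtime follows more elaborate quoting and backslash rules.

```c
/* Hypothetical helper, not part of any library.
   Splits cmdline in place on spaces and tabs; text inside double quotes
   stays together. Returns the number of arguments stored in argv. */
int split_args(char *cmdline, char **argv, int max_args)
{
    int   argc = 0;
    char *p = cmdline;

    while (*p != '\0' && argc < max_args) {
        while (*p == ' ' || *p == '\t')      /* skip leading whitespace */
            p++;
        if (*p == '\0')
            break;

        if (*p == '"') {                     /* quoted argument */
            argv[argc++] = ++p;
            while (*p != '\0' && *p != '"')
                p++;
        } else {                             /* plain argument */
            argv[argc++] = p;
            while (*p != '\0' && *p != ' ' && *p != '\t')
                p++;
        }
        if (*p != '\0')
            *p++ = '\0';                     /* terminate and move on */
    }
    return argc;
}
```

In a program without the C runtime, the input would typically be a writable copy of the string returned by GetCommandLine, since the function modifies the buffer it is given.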

Math routines

Every time you use certain floating-point operations, or 64-bit integer arithmetic, an appropriate CRT function is automatically included to do the actual arithmetic if the requested operation does not exist as part of the Intel instruction set. This could be something as simple as a 64-bit integer division (e.g. __int64 a = b / 5) or a conversion from floating point to integer. If you look at the assembler that the compiler generates for operations like these, you can clearly see the resulting function calls.
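To make the difference concrete, here is a small sketch with two hypothetical functions, average64 and average32. It is written with long long so that it builds on current compilers; Visual C++ 6 spells the same type __int64. Which helper routine (if any) gets called depends on the compiler and target, so treat the comments as what I would expect from a 32-bit x86 build rather than a guarantee.

```c
/* 64-bit divide: 32-bit x86 has no single instruction for this, so a
   32-bit compiler normally emits a call to a runtime helper routine. */
long long average64(long long total, long long count)
{
    return count != 0 ? total / count : 0;
}

/* 32-bit divide: maps directly onto the processor's divide instruction,
   so no helper routine needs to be linked in. */
unsigned int average32(unsigned int total, unsigned int count)
{
    return count != 0 ? total / count : 0;
}
```

If the values are known to fit in 32 bits, using the narrower type keeps the helper routine out of the executable altogether.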

Standard library calls

Whenever you call printf, atoi, strcpy or any other C library function, then that function is linked into your executable. Some functions are small (strcpy, atoi etc), but complicated functions like printf will always add a few KB to your program. If you can find other ways to perform these tasks then you can reduce the amount of extra code that is linked into your application. There are many Win32 equivalents of the standard C library calls which you can use instead.
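As one example of "finding another way", a program that only ever prints unsigned numbers does not need the whole of printf. The sketch below is a hand-written decimal formatter; format_uint is a name invented for this example, and it assumes unsigned int is 32 bits wide.

```c
/* Hypothetical replacement for printf("%u", ...).
   Writes value as decimal text into out (which must hold at least
   11 bytes) and returns the number of characters written, excluding
   the terminating zero. */
int format_uint(unsigned int value, char *out)
{
    char tmp[10];           /* a 32-bit value has at most 10 digits */
    int  n = 0, i;

    do {                    /* produce digits, least significant first */
        tmp[n++] = (char)('0' + value % 10);
        value /= 10;
    } while (value != 0);

    for (i = 0; i < n; i++) /* reverse them into the output buffer */
        out[i] = tmp[n - 1 - i];
    out[n] = '\0';
    return n;
}
```

The resulting string can then be passed to whichever Win32 output call the program already uses.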

Static object initialization

Any variable or structure defined at global scope (i.e. outside of any function) needs to hold the right value before your functions first use it. Values that are compile-time constants are simply stored in the executable’s data section, but C++ objects with constructors, and variables whose initial value has to be computed, must be initialized at program startup, and the linker adds the necessary C runtime code to do this for you.
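One way to live without that startup code is to keep global data to constants and plain zero-filled storage, and to set up anything else through an explicit call made at the top of your entry-point function. The sketch below shows the idea; all of the names in it (g_title, g_squares, init_globals and so on) are made up for illustration.

```c
/* A constant initializer: stored directly in the data section,
   so no startup code is needed for it. */
static const char g_title[] = "tiny";

/* Zero-filled storage: also needs no startup code. */
static unsigned int g_squares[8];
static int          g_ready;

/* State that has to be computed is set up by an explicit call,
   made once at the start of the entry-point function. */
void init_globals(void)
{
    unsigned int i;
    for (i = 0; i < 8; i++)
        g_squares[i] = i * i;
    g_ready = 1;
}

/* Returns n squared from the table, or 0 if the table is not ready
   or n is out of range. */
unsigned int square_of(unsigned int n)
{
    return (g_ready && n < 8) ? g_squares[n] : 0;
}

const char *app_title(void)
{
    return g_title;
}
```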

C++ exception handling

God only knows how much extra code is included when you use exception handling. If you use this feature then you’ll have your work cut out trying to reduce your executable size. If you want small programs, then turn off exception handling.

If you have decided to remove the C runtime from your applications, then you need to be very careful when you write your code. You can no longer rely on static object initialization. The C library calls need to be used with care, because many library functions have to be initialized correctly in the standard startup code. You cannot use new, delete, malloc or free unless you actually write these yourself.

Use Win32 equivalents

There are several standard-C functions which have direct replacements in the Windows API. Because these API calls are inside the operating system, this frees up space in your executable that might have been used otherwise. Using the Win32 functions instead will obviously make your code unportable, so you need to decide how badly you want to reduce your exe size. Here is a list of C functions, and the equivalent Win32 function.

Standard function    Win32 equivalent
malloc               HeapAlloc
free                 HeapFree
strcpy               lstrcpy
strcat               lstrcat
strncpy              lstrcpyn
strncat              StrNCat (shlwapi)
strlen               lstrlen
strcmp               lstrcmp
strcmpi              lstrcmpi
memcpy               CopyMemory
memset               FillMemory or ZeroMemory
memmove              MoveMemory
toupper              CharUpper
tolower              CharLower
isalpha              IsCharAlpha
isalnum              IsCharAlphaNumeric
islower              IsCharLower
isupper              IsCharUpper
sprintf              wsprintf
vsprintf             wvsprintf

There is also a comprehensive list available on the Microsoft site which lists even more functions:

http://support.microsoft.com/default.aspx?scid=kb;EN-US;q99456

There is a small group of functions that are always available to you, even if you remove the default libraries. These are called intrinsic functions. These intrinsic functions might not be the same versions as those found in the standard libraries, and might not be as optimized. If you want to use these functions, it is always best to include the relevant headers.

  • memcmp, memcpy, memset
  • strcmp, strcpy, strlen, strcat, strset
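When a routine you need is neither available as an intrinsic nor covered by a Win32 call, it is usually only a few lines to write by hand. The two functions below are a sketch of that idea; the tiny_ prefix is my own naming for this example, chosen so they cannot clash with the real library names.

```c
#include <stddef.h>

/* Hand-written stand-in for strlen. */
size_t tiny_strlen(const char *s)
{
    const char *p = s;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

/* Hand-written stand-in for strcpy: copies src, including its
   terminating zero, into dst and returns dst. */
char *tiny_strcpy(char *dst, const char *src)
{
    char *out = dst;
    while ((*out++ = *src++) != '\0')
        ;
    return dst;
}
```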

Removing the C run-time library (RTL)

This is pretty straightforward. Just click on the “Ignore all default libraries” option in the Link tab in your project settings, or use the /NODEFAULTLIB linker setting. However, you probably won’t be able to build your project any more.

Visual C++ requires that you provide three functions (for a normal C++ program): _purecall, new and delete. A C program requires malloc and free. Obviously, if you don’t do ANY memory allocation then you don’t need them. Here are minimal implementations of these functions:

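#include <windows.h>

// Route C++ allocations straight to the Win32 process heap
void * __cdecl operator new(size_t bytes)
{
  return HeapAlloc(GetProcessHeap(), 0, bytes);
}

void __cdecl operator delete(void *ptr)
{
  if(ptr) HeapFree(GetProcessHeap(), 0, ptr);
}

// Called if a pure virtual function is ever invoked
extern "C" int __cdecl _purecall(void)
{
  return 0;
}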
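For a plain C program the same idea applies to malloc and free. The sketch below uses the hypothetical names tiny_malloc and tiny_free rather than the real ones, and the #else branch exists only so that the example also builds on other platforms; it plays no part in the size reduction itself.

```c
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>

/* Thin wrappers: the process heap is managed by the operating system,
   so no heap manager has to be linked into the executable. */
void *tiny_malloc(size_t bytes)
{
    return HeapAlloc(GetProcessHeap(), 0, bytes);
}

void tiny_free(void *ptr)
{
    if (ptr)
        HeapFree(GetProcessHeap(), 0, ptr);
}

#else
#include <stdlib.h>

/* Fallback for non-Windows builds of this sketch only. */
void *tiny_malloc(size_t bytes) { return malloc(bytes); }
void  tiny_free(void *ptr)      { free(ptr); }

#endif
```

In a real project built with /NODEFAULTLIB you would give these functions the standard names, so that existing calls to malloc and free resolve to them.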

This just leaves the entry-point functions. Define whichever one is required by the type of executable you are building. At the very least, your entry-point function must call either main, WinMain or DllMain.

int __cdecl mainCRTStartup();
int __cdecl WinMainCRTStartup();
BOOL __stdcall _DllMainCRTStartup(HINSTANCE, DWORD, LPVOID);

More information in MSDN

There are two important articles which you should read if you want to understand more. I strongly suggest you read the October 1996 “Under the Hood” article, written by Matt Pietrek in the Microsoft Systems Journal. This article was updated recently and published in the January 2001 “Under the Hood” column of MSDN magazine. Go here to read this article online.

The second article you should read is called “Remove Fatty Deposits from Your Applications Using Our 32-Bit Liposuction Tools”, also published in the October 1996 MSJ magazine. Go here to read this article online.

Go to Matt’s homepage here.

Once you have read these articles, then you don’t need to do anything other than download Matt’s LIBCTINY and link it into your application. You don’t need to alter your link settings at all. This cool little library replaces the default C run time and gives you tiny programs with no effort at all. Most of the programs on this site were compiled and linked using this library. The sample program on this page includes LIBCTINY and shows you how to create a small program.

Use the right Compiler settings

If you don’t want to remove the C run-time, or you want to go even smaller, then read on. Careful use of compiler and linker settings can have a dramatic effect on the size of your application.

As long as you select the Release build for your project then you are on the right track. However, there are a few important options that you should make sure are set for all your source files.

Switch   State   Description
/Og      on      Global optimizations.
/Os      on      Favour small code.
/Oy      on      No frame pointers.
/Zl      on      Prevents the compiler from inserting a “defaultlib” reference to each object file, which causes the C runtime to be implicitly linked.
/Gy      on      Enables function-level linking. Prevents unused functions from being included.
/GX      off     Enables C++ exception handling. You really don’t want this set if you are aiming for small executables.
/GZ      off     Enable run-time checks for call stack validation

You can turn on the required optimizations by using a #pragma statement at the top of your source files.

#pragma optimize("gsy", on)

Use the right Linker settings

There are several aspects of an executable file’s structure that can be adjusted with the linker settings. The linker that ships with Visual C++ is very powerful, and with the right options you can get some amazingly small programs out of it.

Switch              State   Description
/FILEALIGN:number   on      Undocumented (VC 6 only). Specifies the alignment of each section in the executable, as it is stored on disk. number is in bytes, and must be a power of 2. The default is 4096, but a smaller number results in a smaller executable because there will be less padding between the sections. Whilst very small numbers can be used here, an executable sometimes won’t load correctly with a file alignment of less than the “safe” 512.
/ALIGN:number       N/A     Specifies the alignment of each section in the executable, when it is mapped into memory. Has no relation to file-alignment (above), but an apparent bug in Visual Studio causes this option to also affect file-section alignment, as well as the intended image-section alignment, under certain circumstances. Default is 4096; it is best to leave this value alone.
/NODEFAULTLIB       on      Prevents the linker from using the standard C library.
/ENTRY:function     on      Tells the linker the name of the function to be used as the executable entry-point.
/OPT:NOWIN98        on      Undocumented. Essentially the same as the FILEALIGN option. Reverts to 512 byte file alignment, instead of the default 4096 byte alignment. This setting causes slightly slower load times, but smaller executables.
/MERGE:from=to      on      This option combines the first section with the second section, often reducing exe size. Be careful with section read/write attributes.
/OPT:REF            on      Prevents functions and data that are never used from being included.
/INCREMENTAL        off     This setting enables incremental linking. Whilst this speeds up the build time of your program, it increases executable size.
/FORCE:MULTIPLE     ??      Sometimes useful when you are messing with removing the C startup code.

A quick word about the /FILEALIGN and /ALIGN linker options. Playing with these settings individually is often good enough. You can reduce executable size quite considerably by setting /FILEALIGN:512. However, some magic can be achieved by setting both /ALIGN and /FILEALIGN to the same value (I’ll discuss this later on). Note that setting /ALIGN through the Project->Settings dialog box erroneously affects the /FILEALIGN setting as well.
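To see why a smaller file alignment helps, it is worth doing the arithmetic. The sketch below rounds each section up to the file alignment and adds the results together. The function names and the section sizes in the note that follows are invented for illustration, and the headers are left out of the total.

```c
/* Rounds size up to the next multiple of alignment.
   alignment must be a power of two. */
unsigned long align_up(unsigned long size, unsigned long alignment)
{
    return (size + alignment - 1) & ~(alignment - 1);
}

/* Total space taken on disk by count sections once each one has been
   padded out to the file alignment. Headers are not included. */
unsigned long padded_total(const unsigned long *sizes, int count,
                           unsigned long alignment)
{
    unsigned long total = 0;
    int i;

    for (i = 0; i < count; i++)
        total += align_up(sizes[i], alignment);
    return total;
}
```

For three hypothetical sections of 700, 120 and 40 bytes, padded_total gives 12288 bytes with an alignment of 4096 and 2048 bytes with an alignment of 512, even though the sections only hold 860 bytes of real content between them.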

These linker settings can be specified using the standard project-settings dialog, or (more helpfully) by the following #pragmas.

// Set section alignment to be nice and small
#pragma comment(linker, "/FILEALIGN:0x200")

// Merge all default sections into the .data section.
#pragma comment(linker,"/merge:.rdata=.data")
#pragma comment(linker,"/merge:.text=.data")
#pragma comment(linker,"/merge:.reloc=.data")

Placing just these lines into one of your project’s files will result in considerably smaller programs. Use this handy header file which you can #include in every project.

Control your Stack

It is a little-known fact that Visual C++ automatically inserts what are known as “Stack Probes” into your executable if it detects (at compile time) that a function’s local variables are going to take up more than one page (4 KB) of stack.

If you store too much data (i.e. buffers / arrays) as local variables in your functions, then this will often result in stack-probes being inserted. It’s not a big deal - the code is there to make sure that there is enough memory committed to the stack during runtime. However, it still adds a small amount of code (the _chkstk function, for example), so being careful with the amount of local-data in your functions can help to eliminate these probes.
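Here is a sketch of the difference. Both functions below do the same job, but the first keeps an 8 KB buffer on the stack, which is the kind of function I would expect to get a stack probe, whilst the second keeps its buffer in static storage. All of the names are hypothetical, and note that the static version is not safe to call from several threads at once.

```c
#include <stddef.h>

#define BIG_BUF_SIZE 8192   /* two 4 KB pages */

/* Copies text into buf (truncating if needed) and returns the sum of
   the bytes copied. */
static unsigned int fill_and_sum(char *buf, size_t cap, const char *text)
{
    unsigned int sum = 0;
    size_t i;

    for (i = 0; i + 1 < cap && text[i] != '\0'; i++) {
        buf[i] = text[i];
        sum += (unsigned char)text[i];
    }
    buf[i] = '\0';
    return sum;
}

/* More than one page of locals: a candidate for a stack probe. */
unsigned int sum_with_stack_buffer(const char *text)
{
    char buf[BIG_BUF_SIZE];
    return fill_and_sum(buf, sizeof buf, text);
}

/* The buffer lives in static storage, so the stack frame stays small. */
unsigned int sum_with_static_buffer(const char *text)
{
    static char buf[BIG_BUF_SIZE];
    return fill_and_sum(buf, sizeof buf, text);
}
```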

Use an Executable Packer

An executable packer or compressor is a tool which squeezes a program down as small as possible. Basically, they work by compressing an executable file, which is added as a “payload” to a very small decompressor stub. When you run a packed executable, you are really running a small decompressor program. When executed, the stub decompresses the attached (compressed) executable and runs that. No other external decompressing programs are required.
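To give a feel for how little code a decompressor stub needs, here is a toy run-length decoder. It is only a stand-in chosen for its simplicity: real packers use much stronger compression, and the function name and the (count, byte) payload format are assumptions made for this sketch.

```c
#include <stddef.h>

/* Expands (count, byte) pairs from src into dst. Returns the number of
   bytes written, or 0 if the input is empty or malformed, or if dst is
   too small to hold the result. */
size_t rle_unpack(const unsigned char *src, size_t src_len,
                  unsigned char *dst, size_t dst_cap)
{
    size_t in = 0, out = 0;

    if (src_len % 2 != 0)                 /* pairs only */
        return 0;

    while (in < src_len) {
        unsigned char count = src[in++];
        unsigned char value = src[in++];

        if ((size_t)count > dst_cap - out)
            return 0;                     /* would overflow dst */
        while (count-- > 0)
            dst[out++] = value;
    }
    return out;
}
```

A real stub would follow a step like this by fixing up the unpacked image in memory and jumping to its original entry point.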

The disadvantage of this technique is that multiple copies of your program will not share the same memory location, so more RAM is required when running multiple instances. For small programs though an executable packer can work wonders - sometimes reducing exe size by as much as 50%.

The best packer I have found is called UPX (Ultimate Packer for eXecutables). It is fast, compresses very well, and is also FREE. So what are you waiting for? Go to https://upx.github.io/ and get your copy now!

The Smallest Win32 Executable

OK, let’s put together everything in this tutorial and see how small we can get an executable to be. The following single C program compiles down to a tiny 480 bytes using Visual C++ 6.0!!

#include <windows.h>

// Make section alignment really small
#pragma comment(linker, "/FILEALIGN:16")
#pragma comment(linker, "/ALIGN:16")

// Merge sections
#pragma comment(linker, "/MERGE:.rdata=.data")
#pragma comment(linker, "/MERGE:.text=.data")
#pragma comment(linker, "/MERGE:.reloc=.data")

// Favour small code
#pragma optimize("gsy", on)

// Single entrypoint
int WinMainCRTStartup()
{
   return 0;
}

That’s a pretty small program! OK, it doesn’t do a great deal, and it also only executes under NT/2K, but it just goes to show that with a little understanding of how a C program works, together with how your compiler and linker generate your application binary, you can get some pretty impressive results. I realise that it is possible to shave maybe 100 bytes further from this program - but you will need to resort to assembly language and a Hex Editor to be able to beat this all-C solution.

How does it work? The #pragma settings do a lot of the work, but the basic reason that this executable is so small is that there is no C-runtime included. Why? Not because of the “no-default-libraries” linker switch, but because WinMainCRTStartup has been explicitly defined. When this program is compiled, the linker has to locate the default entry-point function (defined with the /ENTRY linker switch). By default this is WinMainCRTStartup. The linker finds this function in the compiled object file for the program, before it looks in the import libraries for the project. Because our version of WinMainCRTStartup doesn’t call any of the C-runtime startup routines, these functions never get referenced, and the linker doesn’t include them - it’s as simple as that.

Conclusion

I have presented a few techniques that you can use to keep your executables small. This subject can be pretty complicated, especially when you start to use some of the more powerful compiler and linker settings. However, you only need to perform two steps to make small programs.

  • Link with the LIBCTINY.LIB library file
  • #include the “AggressiveOptimize.h” header file

Do only these two things and you don’t need to bother with much of what I have presented. I guarantee that you will get results! Download the sample hello-world application to see how to start off.