gcc link time optimization for EDK2
November 26, 2014
work in progress

 

Introduction

As of version 4.9.2, gcc link time optimization is working reasonably well with EDK2. The code size reduction nearly matches that of the Microsoft compiler. For some EDK2 projects it is usable today, though work is needed to get every project building. A Duet boot test passes for both IA32 and X64 builds.

 

OvmfPkg X64 Release          Build Time   SECFV   FVMAIN     DXEFV    PEIFV
     SVN 16303                seconds     bytes    bytes     bytes    bytes
gcc 4.9.2 edk2 std              38       22,752  811,624  4,003,568  151,472  
gcc 4.9.2 -Os                   45       13,376  823,224  2,804,144   96,496
gcc 4.9.2 -Os -mabi=ms          44       13,408  802,728  2,768,752   95,644
gcc 4.9.2 -Os -mabi=ms -flto    70       11,392  741,584  2,427,952   56,080
VS2010 edk2 std                 37       11,584  728,536  2,346,728   55,984

 

ShellPkg Release            X64 Shell.efi    IA32 Shell.efi     AARCH64
   SVN 16303                    bytes             bytes          bytes
gcc 4.9.2 edk2 std            1,335,808         1,028,736       931,968
gcc 4.9.2 -Os                 1,046,528           862,688
gcc 4.9.2 -Os -mabi=ms        1,028,192
gcc 4.9.2 -Os -mabi=ms -flto    992,896           838,400       882,464
VS2010 edk2 std                 884,576           787,520

 

DuetPkg Debug         X64 DUETEFIMAINFV.Fv   IA32 DUETEFIMAINFV.Fv
   SVN 16303                   bytes                  bytes
gcc 4.9.2 edk2 std           2,359,296             1,703,936
gcc 4.9.2 -Os                1,769,472             1,310,720
gcc 4.9.2 -Os -mabi=ms       1,638,400
gcc 4.9.2 -Os -mabi=ms -flto   917,504               786,432
VS2010 edk2 std                851,968               655,360

 

ArmPlatformPkg Release         FVMAIN.Fv        ArmPlatformBds.efi
   SVN 16303   AARCH64          bytes                 bytes
gcc 4.9.2 edk2 std (-Os)       561,088               83,968
gcc 4.9.2 -Os -flto            334,080               49,600 
 

 

Building gcc

How can a gcc tool chain be identified as lto capable? Compile any source file with gcc -flto. If the option produces no error message, the compiler can do link time optimization across object files. That alone is of no value to the EDK2 project, because the EDK2 build process puts object files into static libraries and then links the static libraries together. For gcc to do link time optimization across static libraries, it must have been built with the configuration option --enable-plugin. Execute gcc -v and look for --enable-plugin in the "Configured with" section. If it is present, gcc was built with the needed plugin support. In short, EDK2 needs a gcc that accepts the -flto option and also shows --enable-plugin in its gcc -v output.
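The two gcc checks above can be scripted. The following is a minimal sketch, assuming a POSIX shell. The variable CC and the helper name has_plugin_config are made up for illustration and are not part of any tool chain.

```shell
#!/bin/sh
# CC is a hypothetical variable naming the compiler under test (default: gcc).
CC="${CC:-gcc}"

# Succeeds if the `gcc -v` text on stdin mentions --enable-plugin.
has_plugin_config() {
    grep -q -e '--enable-plugin'
}

if command -v "$CC" >/dev/null 2>&1; then
    # Check 1: does the compiler accept -flto when compiling a trivial file?
    if printf 'int main(void){return 0;}\n' |
        "$CC" -flto -x c -c -o /dev/null - 2>/dev/null; then
        echo "-flto: accepted"
    else
        echo "-flto: rejected"
    fi
    # Check 2: gcc -v prints its configuration on stderr, so merge it into the pipe.
    if "$CC" -v 2>&1 | has_plugin_config; then
        echo "--enable-plugin: present"
    else
        echo "--enable-plugin: missing"
    fi
else
    echo "$CC: not found"
fi
```

Set CC to a cross compiler name to test a different tool chain. Both lines must report success before the compiler is a candidate for EDK2 LTO builds.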

In addition to the gcc requirements for link time optimization across static libraries, there is a binutils configuration requirement. The ar utility that builds the static library must also be built with plugin support. To meet this requirement, binutils must have been configured with --enable-plugins. To check for this requirement, run ar with no arguments so that the help text is displayed. The help text must contain: --plugin <p> - load the specified plugin.
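The ar check can be scripted the same way. This is a sketch only, assuming a POSIX shell and a binutils ar. The variable AR and the helper name ar_help_has_plugin are hypothetical.

```shell
#!/bin/sh
# AR is a hypothetical variable naming the archiver under test (default: ar).
AR="${AR:-ar}"

# Succeeds if the ar help text on stdin lists the --plugin option.
ar_help_has_plugin() {
    grep -q -e '--plugin <p>'
}

if command -v "$AR" >/dev/null 2>&1; then
    # Run with no arguments, ar prints its help text and exits non-zero.
    # Merge stderr into the pipe; only the exit status of grep matters here.
    if "$AR" 2>&1 | ar_help_has_plugin; then
        echo "ar plugin support: present"
    else
        echo "ar plugin support: missing"
    fi
else
    echo "$AR: not found"
fi
```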

In general, the gcc compilers found preinstalled on Linux systems will meet these requirements. On the other hand, essentially all Windows hosted gcc tool chains lack the needed gcc plugin support. Why is this? Attempting to build a Windows hosted gcc with --enable-plugin shows why. The build stops with the message: "Building GCC with plugin support requires a host that supports -fPIC, -shared, -ldl and -rdynamic." An Internet search shows that others encounter the same problem when trying to build a Windows hosted gcc using --enable-plugin.

How do we solve the plugin problem for Windows hosted gcc builds? I noticed that Windows hosted binutils can be successfully built with plugin support. Binutils uses some simple wrapper functions to route the Linux dynamic library functions to Windows DLL handling functions. I applied that same code to gcc. The next problem was to keep the configure script from stopping the build with the error about "-fPIC, -shared, -ldl and -rdynamic". I did this by modifying the configure.ac script and rerunning autoconf. This appears to solve the problem, at least so far. For the patch and build instructions, see the Tool Chain Build Instructions for the gcc492lto set of tool chains. For prebuilt Windows hosted lto enabled gcc tool chains, look here (gcc492lto-*_11-04-2014.7z).

Here is an easy way to confirm your gcc tool chain can be used for EDK2 LTO builds. Create these two C files:

foo.c: int foo (int value) {return value;}
main.c: extern int foo (int a); int main (void) {return foo (0) * 0x22222222;}

Now execute these commands:

gcc -flto -O3 -c main.c foo.c
gcc-ar -rc main.a main.o
gcc-ar -rc foo.a foo.o
gcc -flto -O3 -nostartfiles -nodefaultlibs main.a foo.a -Wl,--entry,main -omain.exe
objdump -Mintel -d main.exe

The expected result is:

main.exe: file format elf64-x86-64
Disassembly of section .text:

00000000004000f0 <main>:
4000f0: 31 c0 xor eax,eax
4000f2: c3 ret

If instead you get this:

sorry - this program has been built without plugin support

.. then your compiler is built for LTO but is missing plugin support.
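The outcome of this test can also be classified by a script. The helper below is a rough sketch and not part of any tool chain; the name classify_lto_result is made up. It assumes that a surviving foo symbol in the disassembly means foo was not inlined across the static libraries. That is a heuristic drawn from the expected result shown above, not a guarantee.

```shell
#!/bin/sh
# Reads the combined output of the test commands on stdin and prints a verdict.
classify_lto_result() {
    out=$(cat)
    case "$out" in
        # gcc-ar reported that plugin support is missing.
        *"built without plugin support"*) echo "no-plugin" ;;
        # foo still has its own symbol, so it was not inlined into main.
        *"<foo>:"*) echo "not-inlined" ;;
        # Only main remains, as in the expected result.
        *"<main>:"*) echo "lto-ok" ;;
        *) echo "unknown" ;;
    esac
}
```

A typical use would be to pipe the last command into it: objdump -Mintel -d main.exe 2>&1 | classify_lto_result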

 

Adding gcc link time optimization to EDK2

The next problem to solve is adding an option for gcc link time optimization to the EDK2 build process. Here are the gcc requirements for applying link time optimization to a typical application build:

1) Add -flto to the compile flags
2) Use gcc to launch ld instead of invoking ld directly
3) Include the compile flags on the link command line
4) Use gcc-ar in place of ar when building static libraries

But there is another set of requirements that affect EDK2:

5) Library code that resolves helper function calls generated by the compiler must be compiled without the -flto flag
6) These libraries must be prefixed with -Wl,-plugin-opt=-pass-through= on the link command line.
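The six requirements can be illustrated with a small sketch. The file and library names (app.c, util.c, libutil.a, libintrinsics.a, libstackcheck.a) and the helper name pass_through_args are placeholders chosen for this example, not names used by EDK2 or by the patch.

```shell
#!/bin/sh
# Requirements 1-4 for an ordinary build would look roughly like this.
# The file names are placeholders, so the commands are left commented out:
#   gcc -Os -flto -c app.c util.c           # 1) -flto in the compile flags
#   gcc-ar rc libutil.a util.o              # 4) gcc-ar in place of ar
#   gcc -Os -flto app.o libutil.a -o app    # 2) link through gcc, 3) repeat the compile flags

# Requirement 6: prefix each helper library (built without -flto, requirement 5)
# with -Wl,-plugin-opt=-pass-through= so it can be placed on the link command line.
pass_through_args() {
    args=""
    for lib in "$@"; do
        args="${args:+$args }-Wl,-plugin-opt=-pass-through=$lib"
    done
    printf '%s\n' "$args"
}

pass_through_args libintrinsics.a libstackcheck.a
```

The difficulty in EDK2 is not forming the prefix itself but knowing the library paths at the point where the link command line is generated.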

Requirements 5 and 6 are handled automatically when using -flto with an application build. The libraries that resolve compiler generated helper function calls (libc and libgcc) are part of the compiler distribution. They are compiled without -flto, so requirement 5 is met. The default gcc specs file (gcc -dumpspecs) uses the pass-through-libs function to generate the needed -Wl,-plugin-opt=-pass-through= prefixed library arguments.

While making the EDK2 build meet requirement 5 is easy, the same is not true for requirement 6 in some cases. For example, the ARM build resolves compiler generated helper function calls with CompilerIntrinsicsLib.lib and BaseStackCheckLib.lib. The path to these libraries varies with the package name and build options. Without a significant modification to the EDK2 build tools, there is no way to generate the required prefixed path for use on the linker command line. This limitation affects ARM builds the most because gcc ARM code uses more compiler generated helper function calls than other target architectures. For this reason, link time optimization is not enabled for ARM in the gcc LTO EDK2 patch. The use of compiler generated helper function calls is kept to a minimum in x86 code. For the IA32 builds the compiler generates a floating point helper call when building stdlib. The patch disables link time optimization for IA32 stdlib to avoid the -Wl,-plugin-opt=-pass-through= requirement.

The patch used for these tests can be found here. It adds a link time optimization option to the GCC49 tool chain for IA32, X64, and AARCH64 builds. It also adds the -Os flag to the X64 build.

The patch has been tested with Windows hosted builds only, but is expected to give the same results on Linux. Apply it to an unmodified EDK2 rev 16449 or similar. To use the patch from Windows, download the gcc 4.9.2 lto compilers. To use the patch from Linux, try using the same compiler normally used for EDK2 builds.

The patch uses the same environment variables as for the normal Windows hosted gcc builds. The following additional environment variables are available:

GCC49_X64_EXTRA_CC_FLAGS      Set this variable to -Wno-error to allow non-fatal warnings
GCC49_IA32_EXTRA_CC_FLAGS     Set this variable to -Wno-error to allow non-fatal warnings
GCC49_AARCH64_EXTRA_CC_FLAGS  Set this variable to -Wno-error to allow non-fatal warnings
GCC49_ARM_EXTRA_CC_FLAGS      Set this variable to -Wno-error to allow non-fatal warnings

The above environment variables must be set to -Wno-error as shown because gcc link time optimization exposes a few warnings that do not occur with the standard builds.
 

Example environment variable settings for Windows hosted LTO builds:

set UEFI_BUILD_TOOLS=%cd%\tools
set NASM_PREFIX=%UEFI_BUILD_TOOLS%\nasm211\
set IASL_PREFIX=%UEFI_BUILD_TOOLS%\ASL\

set GCC49_BIN=%UEFI_BUILD_TOOLS%\gcc492lto-x86\bin\
set GCC49_DLL=%UEFI_BUILD_TOOLS%\gcc492lto-x86\dll\;%GCC49_BIN%
set GCC49_AARCH64_PREFIX=%UEFI_BUILD_TOOLS%\gcc492lto-aarch64\bin\
set GCC49_ARM_PREFIX=%UEFI_BUILD_TOOLS%\gcc492lto-arm\bin\

set GCC49_X64_EXTRA_CC_FLAGS=-Wno-error
set GCC49_IA32_EXTRA_CC_FLAGS=-Wno-error
set GCC49_AARCH64_EXTRA_CC_FLAGS=-Wno-error
set GCC49_ARM_EXTRA_CC_FLAGS=-Wno-error
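
For a Linux host, the -Wno-error settings might be written as follows. This is a sketch under an assumption the text above does not confirm: that the patched build reads the same GCC49_*_EXTRA_CC_FLAGS variables on Linux as it does on Windows. The tool path variables are omitted because their Linux values depend on the local installation.

```shell
#!/bin/sh
# Assumption: the patched EDK2 tool chain definitions read these variables
# on a Linux host in the same way as on a Windows host.
for arch in X64 IA32 AARCH64 ARM; do
    export "GCC49_${arch}_EXTRA_CC_FLAGS=-Wno-error"
done

# Show what was set.
env | grep -e '^GCC49_.*_EXTRA_CC_FLAGS=' | sort
```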