Link fast: Improve build and launch times
Description: Discover how to improve your app's build and runtime linking performance. We'll take you behind the scenes to learn more about linking, your options, and the latest updates that improve the link performance of your app.
What is linking?
A linker is needed whenever we'd like to use code written in libraries or frameworks
Two types of linking:
- Static linking
  - happens when you build your app
  - can impact both your app build time and app size
- Dynamic linking
  - happens when your app is launched
  - can impact your app launch time
What is static linking?
Let's have a look at its history to explain what it is.
1970s
Initially programs were simple: just one source file. Building was running the compiler on your single source file and it produced the executable program.
However, this didn't scale. Separating a program into multiple source files improved readability and also meant not recompiling every function every time you build.
To make multiple source file programs possible, the compiler was split into two parts:
- the first part (cc, the C compiler) compiles source code to a new intermediate "relocatable object" .o file
- the second part (ld, the static linker) reads the relocatable .o files and produces a single executable program
Late 1970s - Static libraries
Sharing multiple .o files to share functionality across programs was cumbersome, so the concept of static libraries was born.
At the time, the standard way to bundle files together was the archiving tool ar, used for backups and distributions.
A static library was just multiple .o files put into an archive via ar, resulting in a .a static library file.
ld was taught how to read .o files directly out of a .a archive file.
This worked great, but program size started to grow as more static libraries were used. To solve this, ld started pulling .o files from a static library only when they would resolve some undefined symbol. This is called selective loading.
Selective loading meant that each program would only get the .o objects of a .a library that the program actually needed.
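Selective loading can be sketched as a small toy model. This is only an illustration of the idea, not how ld is implemented: the object files, their symbol names, and the `selective_load` helper are all hypothetical.

```python
def selective_load(program_objects, archive):
    """Toy model: pull archive members only when they define a needed symbol.

    Each object is a dict with 'defines' and 'uses' sets of symbol names.
    `archive` maps member name -> object, in archive order.
    """
    defined, undefined = set(), set()
    for obj in program_objects:
        defined |= obj["defines"]
        undefined |= obj["uses"]
    undefined -= defined

    loaded = []
    progress = True
    while progress:  # a loaded member may itself need further members
        progress = False
        for name, member in archive.items():
            if name in loaded:
                continue
            if member["defines"] & undefined:
                loaded.append(name)
                defined |= member["defines"]
                undefined |= member["uses"]
                undefined -= defined
                progress = True
    return loaded, undefined


# Hypothetical inputs: one program object and a three-member static library.
main_o = {"defines": {"_main"}, "uses": {"_parse"}}
libutil_a = {
    "parse.o": {"defines": {"_parse"}, "uses": {"_log"}},
    "log.o": {"defines": {"_log"}, "uses": set()},
    "zip.o": {"defines": {"_zip"}, "uses": set()},
}
loaded, unresolved = selective_load([main_o], libutil_a)
print(loaded)  # zip.o is never pulled in, because nothing needs _zip
```

Note that parse.o drags in log.o: loading a member can create new undefined symbols, which is why the model loops until nothing changes.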
Recent ld64 improvements
ld64 is Apple's static linker
- 2x faster in Xcode 14 thanks to better utilization of machine cores
Some of the ways ld64 was improved:
- content copied in parallel from the input to the output file
- multiple __LINKEDIT sections built in parallel
- UUID computation and codesigning hashes done in parallel
- Optimized algorithms (e.g., exports-trie builder now uses C++ string_view objects to represent the string slices of each symbol)
- Accelerated UUID computation (now SHA256)
- adoption of latest crypto libraries which take advantage of hardware acceleration when computing the UUID of a binary
Static linking best practices
When to use static libraries?
- static libraries work best for stable code that is not being actively changed
- consider moving code under active development out of a static library to reduce build time
Linker options
- Some options that can affect your app's link time.
- Add them under Other Linker Flags in your target/product Build Settings tab
-all_load (-force_load)
- Selective loading from archives slows down the linker because, to make builds reproducible and follow traditional static library semantics, the linker has to process static libraries in a fixed, serial order
- If you don't need this behavior, pass -all_load to the linker: this flag lets the linker parse all .a files in parallel, just like .o files
- You may get duplicate symbol errors if multiple static libraries offer the same symbol:
  - if your app does clever tricks where it has multiple static libraries implementing the same symbols, and depends on the command line order of the static libraries to drive which implementation is used, then this option is not for you
- -all_load may make your program bigger because "unused" code is now being added in. To compensate for that, you can use the linker option -dead_strip, which causes the linker to remove unreachable code and data
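Removing unreachable code can be pictured as a reachability walk over symbol references. The sketch below is a toy model under that assumption: the symbol names and the `dead_strip` helper are hypothetical, and the real linker decides liveness at a finer granularity and with more roots than a single entry point.

```python
def dead_strip(references, roots):
    """Return the symbols reachable from `roots`; everything else is dead.

    `references` maps a symbol to the symbols it refers to.
    """
    live, worklist = set(), list(roots)
    while worklist:
        sym = worklist.pop()
        if sym in live:
            continue
        live.add(sym)
        worklist.extend(references.get(sym, ()))
    return live


# Hypothetical symbol graph after linking everything in.
refs = {
    "_main": ["_parse"],
    "_parse": ["_log"],
    "_log": [],
    "_zip": ["_log"],  # only present because every archive member was loaded
}
live = dead_strip(refs, roots=["_main"])
dead = sorted(set(refs) - live)
print(dead)
```

Here `_zip` refers to live code but nothing refers to it, so it is the only symbol stripped.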
-no_exported_symbols
- one part of the __LINKEDIT segment that the linker generates is an exports trie, which is a prefix tree that encodes all the exported symbol names, addresses, and flags
- whereas all dylibs need to have exported symbols, a main app binary usually does not need any (because, usually, nothing ever looks up symbols in the main executable)
- with this flag the linker skips the creation of the trie data structure in __LINKEDIT
- ⚠️ the exports trie is required if:
  - the app loads plugins
  - you use XCTest with your app as the host environment to run XCTest bundles
To see how big your trie is (and check whether it's worth skipping its creation), run:
$ dyld_info -exports /path/to/binary | wc -l
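To make "prefix tree of exported symbols" concrete, here is a minimal per-character trie. It is a sketch of the data structure only: the on-disk exports trie is more compact than this, and the symbol names, addresses, and helper functions are hypothetical.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # next character -> TrieNode
        self.address = None  # set only on nodes that end an exported symbol


def build_exports_trie(exports):
    """Insert every exported name so that common prefixes share nodes."""
    root = TrieNode()
    for name, address in exports.items():
        node = root
        for ch in name:
            node = node.children.setdefault(ch, TrieNode())
        node.address = address
    return root


def lookup(root, name):
    """Walk the trie one character at a time; None means 'not exported'."""
    node = root
    for ch in name:
        node = node.children.get(ch)
        if node is None:
            return None
    return node.address


def count_nodes(node):
    return 1 + sum(count_nodes(child) for child in node.children.values())


exports = {"_malloc": 0x1000, "_malloc_size": 0x1040, "_free": 0x2000}
trie = build_exports_trie(exports)
print(hex(lookup(trie, "_malloc_size")))
print(lookup(trie, "_calloc"))
print(count_nodes(trie))
```

`_malloc` and `_malloc_size` share their first seven nodes, which is the space saving a prefix tree buys. Building the structure still costs time proportional to the total length of all exported names, which is the work -no_exported_symbols avoids.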
-no_deduplicate
- by default, the linker combines C++ functions that have the exact same code but different names (happens a lot due to C++ template expansions)
- this is an expensive algorithm
- the linker only looks at weak-def symbols, which are the ones the C++ compiler emits for template expansions that were not inlined
- deduplication is an app size optimization
- Xcode passes -no_deduplicate for Debug builds (builds faster, but the app binary is slightly larger)
- Clang adds this flag if you run the clang link line with -O0
- use -no_deduplicate in your custom Debug build systems
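The effect of deduplication can be sketched by folding functions with identical bodies onto one name. This is a toy model, not the linker's algorithm: function bodies are treated as opaque byte strings, and the symbol names and the `deduplicate` helper are hypothetical.

```python
def deduplicate(functions):
    """Fold functions with byte-identical bodies onto one canonical name.

    `functions` maps symbol name -> machine code bytes (weak-def symbols only).
    Returns a mapping from every name to the name whose code is kept.
    """
    canonical_for_body = {}
    alias = {}
    for name, body in functions.items():
        keeper = canonical_for_body.setdefault(body, name)
        alias[name] = keeper
    return alias


# Hypothetical template expansions that compiled to the same instructions.
weak_defs = {
    "vector<int>::size": b"\x01\x02\x03",
    "vector<long>::size": b"\x01\x02\x03",  # same code, different name
    "vector<int>::clear": b"\x09\x09",
}
alias = deduplicate(weak_defs)
kept = set(alias.values())
print(len(weak_defs) - len(kept), "function body removed")
```

Skipping this pass with -no_deduplicate trades the duplicate bodies (a larger binary) for a faster link.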
What is dynamic linking?
Let's start from its history to explain what it is.
Early 1990s - Dynamic libraries
(picking up from where we left off with static linking's history)
The more static libraries a program uses:
- the slower a program linking time is (at build time)
- the bigger a final executable size is
Instead of using ar to build libraries (and output a .a file), we could use ld (and output a .dylib file). These new libraries are called dynamic libraries ("dylibs"), also known as DSOs or DLLs on other platforms.
The difference now is that ld treats linking with a dynamic library differently:
- instead of copying code out of the library into the final program, the linker just records a kind of promise
- That is, it records the symbol name used from the dynamic library and what the library's path will be at runtime
Thanks to this:
- the linker no longer copies library code into your executable
- your program's static link time is now proportional to the size of your code, independent of the number of dylibs you link with
- your program executable's size is only affected by your code (and any static libraries, but not dynamic ones)
When executing, the Virtual Memory system will reuse the same loaded dynamic library across multiple processes (that need to use that .dylib)
Cons:
- slower launch time
  - launching the app is no longer just loading one executable, but also all the associated .dylibs, which then need to be linked
- more dirty memory (__DATA pages)
  - with static libraries, the linker would co-locate all globals from all static libraries into the same __DATA pages in the main executable
  - with dynamic libraries, each library defines its own __DATA page
- requires a runtime linker (dynamic linker)
  - dyld will need to fulfill the promise made during build time, which means it needs to resolve the promised symbols for your executable
Inside dynamic linking
- An executable binary is divided up into segments, such as __TEXT, __DATA, __LINKEDIT
  - segments are always a multiple of the page size for the OS
  - each segment has a different permission
  - __TEXT, the segment that contains your code, has execute permissions: the CPU may treat the bytes on the page as machine code instructions
- At runtime, dyld has to mmap() the executable into memory with each segment's permissions
All the above is true for .dylibs as well
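The "multiple of the page size" rule is just a round-up. A small sketch, assuming 16 KB pages (the page size on arm64 Apple devices); the `segment_vm_size` helper is hypothetical:

```python
PAGE_SIZE = 16 * 1024  # assumption: 16 KB pages, as on arm64 Apple devices


def segment_vm_size(content_size, page_size=PAGE_SIZE):
    """Round a segment's content size up to a whole number of pages."""
    pages = -(-content_size // page_size)  # ceiling division
    return pages * page_size


print(segment_vm_size(1))      # 16384: even one byte occupies a full page
print(segment_vm_size(16384))  # 16384: exactly one page
print(segment_vm_size(20000))  # 32768: spills into a second page
```

This rounding is one reason many small dylibs cost more memory than one binary: each dylib's __DATA occupies at least one page of its own.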
Fixups
- the main executable has various pointers to symbols belonging to the .dylibs
- the values of those pointers (like other memory addresses) cannot be known until runtime
- because __TEXT segments cannot change (at least in systems based on code signing), these dynamic symbols/call sites become a call to a stub synthesized by the linker in __TEXT
- the stub loads a pointer from the __DATA segment, and jumps to that location
- unlike __TEXT, the __DATA segment can change at runtime (when copied into memory)
- hence, when dyld is resolving our promises, really it's just setting the correct pointers into our executable's __DATA segment (in memory), pointing to the relevant .dylib symbols (also loaded into memory)
The __LINKEDIT segment contains the information dyld needs to drive what fixups are done
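The stub indirection can be mimicked in a few lines. This is only an analogy: a dictionary stands in for the writable __DATA segment, plain functions stand in for machine code, and every name here (`data_segment`, `malloc_stub`, `libsystem_malloc`, `bind_at_launch`) is hypothetical.

```python
# __DATA: writable pointer slots, filled in by the dynamic linker at launch.
data_segment = {"_malloc$ptr": None}


def malloc_stub(size):
    """Stands in for the read-only stub the static linker places in __TEXT."""
    target = data_segment["_malloc$ptr"]  # load the pointer from __DATA
    return target(size)                   # jump to it


def libsystem_malloc(size):
    """Hypothetical implementation living in another dylib."""
    return bytearray(size)


def bind_at_launch():
    """What the dynamic linker does: store the real address in __DATA."""
    data_segment["_malloc$ptr"] = libsystem_malloc


bind_at_launch()
buffer = malloc_stub(8)  # the call site and the stub never changed
print(len(buffer))
```

Only the slot in `data_segment` was written at "launch"; the code that calls through it stayed untouched, which is what lets __TEXT remain read-only and signed.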
Three kinds of fixups:
- rebases
  - when a dylib or app has a pointer that points within itself
  - needed due to ASLR, meaning that even internal pointers need to be rebased at app launch
- binds
  - symbolic references
  - their target is a symbol name (not a number like in rebases, as those were just offsets of the targets within the image)
  - e.g., a pointer to the function malloc:
    - the string _malloc is stored in __LINKEDIT
    - dyld uses that string to look up the actual address of malloc in the exports trie of libSystem.dylib
    - then, dyld stores that value in the location specified by the bind
- chained fixups
  - new this year
  - make __LINKEDIT smaller
  - instead of storing all the fixup locations, __LINKEDIT stores just where the first fixup location is in each __DATA page, as well as a list of the imported symbols
  - the rest of the info is encoded in __DATA
  - it's called chained because, in the 64-bit pointer location in __DATA, some of the bits contain the offset to the next fixup location (hence we jump from fixup to fixup in a chain-like manner)
  - supported when deploying to iOS 13.4 and later
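The chain walk can be sketched with a made-up encoding. The bit layout below is invented for illustration and is not the real on-disk format; the page contents, import names, addresses, and the `encode`/`apply_chain` helpers are all hypothetical. It shows a rebase adding the ASLR slide, a bind resolving through the import list, and the "next" bits leading from one fixup to the following one.

```python
NEXT_SHIFT, NEXT_MASK = 51, 0xFFF  # hypothetical layout, not the real one
BIND_BIT = 1 << 63
PAYLOAD_MASK = (1 << 51) - 1


def encode(payload, next_delta, is_bind):
    """Pack a fixup into one 64-bit slot: payload, distance to next, kind."""
    value = payload | (next_delta << NEXT_SHIFT)
    return value | BIND_BIT if is_bind else value


def apply_chain(page, first, slide, imports, resolved):
    """Walk one page's chain, replacing each encoded slot with a real pointer."""
    index = first
    while True:
        raw = page[index]
        payload = raw & PAYLOAD_MASK
        next_delta = (raw >> NEXT_SHIFT) & NEXT_MASK
        if raw & BIND_BIT:
            page[index] = resolved[imports[payload]]  # bind: look up by name
        else:
            page[index] = payload + slide             # rebase: add ASLR slide
        if next_delta == 0:                           # end of the chain
            break
        index += next_delta


imports = ["_malloc", "_free"]  # stored once, outside the page
resolved = {"_malloc": 0x7000_1000, "_free": 0x7000_2000}
page = [
    0xAAAA,                                       # plain data, not a fixup
    encode(0x4010, next_delta=2, is_bind=False),  # first fixup: a rebase
    0xBBBB,                                       # plain data, skipped over
    encode(0, next_delta=1, is_bind=True),        # bind to imports[0]
    encode(1, next_delta=0, is_bind=True),        # bind to imports[1], last
]
apply_chain(page, first=1, slide=0x1_0000_0000, imports=imports, resolved=resolved)
print([hex(v) for v in page])
```

The only per-page metadata needed from outside the page is `first`; everything else is read from the slots themselves, which is why this scheme shrinks __LINKEDIT.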
How dyld works
1. dyld starts with the main executable, and parses the Mach-O to find the dependent dylibs (the promised dynamic libraries your executable needs in order to run)
2. for each .dylib, dyld finds it and mmap()s it
3. then dyld does steps 1-2 recursively for each .dylib
4. once everything is loaded, dyld looks up all the bind symbols needed
5. once the lookup is completed, dyld uses those addresses when applying fixups
6. once all the fixups are done, dyld runs initializers, bottom up
Since 2017, steps 1 to 4 are cached, as they're the same at every app launch (they need to be redone only on app/OS updates).
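The recursive loading and the bottom-up initializer order can be sketched as two graph walks. This is a toy model, not dyld's implementation: the image names, the dependency table, and both helper functions are hypothetical.

```python
def load_image(name, dependencies, loaded):
    """Map an image, then recurse into the dylibs it depends on."""
    if name in loaded:
        return
    loaded.append(name)  # stands in for mmap()-ing the file
    for dep in dependencies.get(name, []):
        load_image(dep, dependencies, loaded)


def initializer_order(name, dependencies, seen=None, order=None):
    """Bottom up: an image's dependencies initialize before the image does."""
    seen = set() if seen is None else seen
    order = [] if order is None else order
    if name in seen:
        return order
    seen.add(name)
    for dep in dependencies.get(name, []):
        initializer_order(dep, dependencies, seen, order)
    order.append(name)  # post-order: appended only after all dependencies
    return order


# Hypothetical app with one framework; both depend on libSystem.
deps = {
    "MyApp": ["Networking.dylib", "libSystem.dylib"],
    "Networking.dylib": ["libSystem.dylib"],
    "libSystem.dylib": [],
}
loaded = []
load_image("MyApp", deps, loaded)
print(loaded)
print(initializer_order("MyApp", deps))
```

Loading proceeds top down from the main executable, while initializers run in the reverse direction, so the lowest-level library is initialized first and the app last.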
Recent dyld improvements
dyld is Apple's dynamic linker
Page-in linking
- dyld no longer applies fixups to all dylibs at launch
- instead, the kernel applies fixups to your __DATA pages lazily, on page-in
  - it has always been the case that the first use of some address in some page of an mmap()ed region triggered the kernel to read in that page
  - now, if it is a __DATA page, the kernel will also apply the fixups that page needs
- Apple OSes have had a special case of page-in linking for over a decade, for OS dylibs in the dyld shared cache
- this year, page-in linking has been generalized and made available to everyone
- reduces dirty memory
- reduces launch time
- __DATA_CONST pages are clean, which means they can be evicted and recreated just like __TEXT pages, reducing memory pressure
- available in iOS 16, macOS 13, and watchOS 9
- requires chained fixups, thus requiring your app to target iOS 13.4 or later
  - because with chained fixups, most of the fixup information is encoded in the __DATA segment on disk, which means it is available to the kernel during page-in
- does not work for dlopen(), just dylibs linked at launch (in this case dyld does the fixups during the dlopen call)
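The lazy behavior of page-in linking can be modeled as "apply a page's fixups the first time the page is touched". This is a sketch of the idea only, not kernel behavior: the `LazilyFixedImage` class, the page contents, and the fixup values are hypothetical.

```python
class LazilyFixedImage:
    """Toy model: fixups are applied to a page only when it is first touched."""

    def __init__(self, pages, fixups_by_page):
        self.pages = pages                   # page index -> list of slots
        self.pending = dict(fixups_by_page)  # page index -> {slot: value}
        self.fixed_pages = []

    def read(self, page_index, slot):
        if page_index in self.pending:       # first touch: the "page-in"
            for s, value in self.pending.pop(page_index).items():
                self.pages[page_index][s] = value
            self.fixed_pages.append(page_index)
        return self.pages[page_index][slot]


image = LazilyFixedImage(
    pages={0: [0, 0], 1: [0, 0], 2: [0, 0]},
    fixups_by_page={0: {1: 0x1000}, 1: {0: 0x2000}, 2: {0: 0x3000}},
)
print(hex(image.read(1, 0)))
print(image.fixed_pages)
```

Pages 0 and 2 are never touched, so no work is done for them. That skipped work is where the launch time saving comes from.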
Dynamic linking best practices
- use fewer dylibs
- optimize or eliminate static initializers (code that always runs pre-main)
- find the sweet spot for static vs. dynamic libraries
New tools
Two new tools:
dyld_usage
- CLI tool that logs dyld operations (similar to fs_usage)
- available in macOS 13
- uses the same mechanism as Instruments.app to trace
dyld_info
- allows you to inspect binaries (similar to nm or otool)
- can show info about Mach-O files and dylibs in the dyld cache (the dyld_info tool uses the same code as dyld and can thus see files/binaries not on disk)
- available in macOS 12
- how to view the exports (will show all the exported symbols in the dylib, and the offset of each symbol from the start of the dylib):
$ dyld_info -exports /path/to/bin
- how to view the fixups:
$ dyld_info -fixups /path/to/bin