This is an old article: reading notes written in June 2012 (original link). They were written for my own reference and contain many of my own ideas, which may well mislead readers; the article is also too long for me to have time to revise, so criticism is welcome.

Recently (in the first half of 2012), on Jiahua Guo's recommendation, I read the book "The Self-Cultivation of Programmers: Linking, Loading, and Libraries" from the LUG library and felt I had found a treasure. However, final exams are approaching and I do not have time to finish the whole book, so these notes cover only part of it.

There are two ways of constructing a software design: one is to make it so simple that there are obviously no deficiencies, and the other is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.

-- Hoare, in his Turing Award lecture "The Emperor's Old Clothes"

0 Idle Talk About Compilation

0.1 Executable File ≠ Compilation + Assembly

We know that C source code is turned into assembly by the compiler and then into machine code by the assembler. For the small programs of a few dozen or a few hundred lines that we wrote while learning C, this seems to be the whole story: the entire source sits in a single .c file, and only the library functions need any extra handling.

For such application scenarios, I imagine the execution process:

  • The machine code of the program is "copied" directly from the disk into memory (ignoring any complications for now);
  • The program starts from a fixed point (say, virtual address 0x80000000), so the operating system only needs a jump instruction;
  • The program needs its arguments and environment variables, so stipulate that argc is stored at 0x70000000, followed by argc pointers to the argument strings (argv); the environment variables can likewise be assigned a fixed address;
  • At that fixed point the compiler places a piece of "stub code" whose only job is to push argc and argv onto the stack and call main; when main returns, it restores the stack to its initial state and hands main's return value back to the operating system.

My initial idea of the compilation process: printf and the other standard library functions must have source code, so as long as that source is copied into the user's source file, everything compiles and no "function name not found" problem arises. At this point, executable file = compilation + assembly.

Of course, the source code of a library function may be very long, and compiling it every time would be too slow; besides, some commercial compiler vendors may be unwilling to provide source code at all. So these libraries need to be compiled in advance, and the pre-compiled functions then have to be assembled together with the user's code. Now the executable no longer looks like the product of compilation and assembly alone: how should these pre-compiled libraries be combined with the freshly compiled binaries?

My improved compilation process: Each standard library function is compiled into a piece of machine code, stored in a file named after the function name.

  • According to the library functions used and the lengths of their machine code, lay out the position of each library function;
  • Compile the user's program into machine code; the call addresses of the library functions are already determined;
  • Place the machine code of each library function at the position just assigned.

If the compiler only had to solve the "pre-compilation" of standard library functions, such a mechanism would seem to be enough; but real projects never consist of a single C file. The simplest approach is to treat all C files "equally" and just paste them into one big C file. But C, unlike object-oriented languages, has no encapsulation unit above the function, so function names would have to become very long to avoid collisions between functions of different modules.

0.2 Encapsulation

Because I had never written a C program longer than 1000 lines, I did not appreciate the importance of the extern keyword until I read the Linux kernel source. Functions marked with this keyword can be called across files. One way to look at it is that the "file" becomes a kind of "namespace" or "class": files that include it are like classes related to it by inheritance, the variables and functions declared extern are "protected", and everything else is "private". Of course, C's include mechanism is far weaker than the inheritance mechanisms of object-oriented languages, but with the powerful helper Makefile, the logical relationships among source files can still be sorted out.

Jiahua Guo and I once debated whether Makefile is necessary. I argued at the time that the kernel's naming rules are quite strict, so each file could simply include the other (n-1) files in the kernel and there would be no need to bother writing a Makefile (ignoring the problem of producing bzImage and other compressed formats). Jiahua Guo raised three objections:

  • If only a non-core source file is modified, my design would require recompiling everything, whereas make can use timestamps and dependencies to determine which files are affected and need recompiling, cutting build time;
  • An important kernel mechanism is the kernel module. Without Makefiles there would be no modules to speak of, and the extensibility of the whole kernel would suffer greatly; of course a "simplified Makefile" could be invented just to define kernel modules, but Makefile is already good enough;
  • However rigorous the naming, a kernel of tens of millions of lines will inevitably have extern symbols "hit by accident", and it is better to specify dependencies explicitly.

More seriously, this is a question of encapsulation. I used to believe that a module should be "fully open" to other modules, so that one could understand the other module's internals and write "targeted" code; I know this violates the basic principle of encapsulation. The first edition of "The Mythical Man-Month" voiced the same doubt about encapsulation, but in the twentieth-anniversary edition Brooks changed his view and concluded that encapsulation is right. Microsoft's Office programs know each other's internals intimately (or messily), which to some extent makes the suite better integrated, but at a heavy cost in overall stability, to say nothing of interoperability with other software. Writing a program whose parts poke around in each other's internals is easy; maintaining it afterwards will drive people crazy. M$ is good at first shipping a "usable" system and then patching it up, which is driven more by commercial considerations and is incompatible with the perfectionist UNIX culture.

In fact, the reason my own programs have not collapsed into a mess is that clear boundaries were drawn at design time, even though they were never written down explicitly. In multi-person collaboration, however, specifying interfaces and boundaries in documents is far less effective than using the language's built-in mechanisms to express the interfaces and relationships between modules in a logical form (include, class inheritance, Makefile). Rather than taking the attitude that "rules are dead, people are alive" and relying on self-discipline to maintain the internal boundaries of the code, it is better to set up a system that stays "unchanged for a hundred years" and revise it only when a major change cannot be worked around. After a year of thinking it over, I believe the prevailing view of the software engineering community is correct.

0.3 ABI

I digress. Function calls were mentioned earlier, and that already raises the issue of binary interfaces. When I first learned C, the teacher said that in the C calling convention the caller pushes the arguments onto the stack and the callee takes them off the stack. Thinking about it now: if a function has very few parameters and is not a good candidate for inlining, the overhead of repeatedly pushing onto the stack is relatively large, so why not stipulate that the first argument goes in one register, the second in another, and so on? Passing arguments on the stack and passing them in registers are indeed two different conventions. It is easy to imagine that if different compilers adopt different conventions, or even different options of the same compiler select different conventions, the resulting binaries cannot be linked together.

The issues covered by an ABI (Application Binary Interface) also include (I do not know C++, so the C++-related ones are not listed):

  • The sizes of built-in types (int, float, etc.)
  • How those types are laid out in storage
  • Alignment rules (aligned to 4 bytes or to 8 bytes)
  • The layout of composite types such as struct and union (for example, the ordering of bit fields)
  • The naming and resolution of external symbols
  • How parameters are passed and how return values are passed
  • The layout of the stack, such as the order in which local variables and parameters are pushed
  • In function calls, whether registers are saved by the caller or the callee, and which registers

The makers of the C language standard took an evasive approach: in C99 these issues are "implementation-defined". Parameters such as the sizes of built-in types are exposed in headers like limits.h, while many more issues remain opaque to developers. Over time, the ABIs of compilers on UNIX systems have been converging on the LSB (Linux Standard Base) and the Intel Itanium C++ ABI, leaving GCC and MSVC as two camps that will be hard to reconcile any time soon.
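
To make a few of these items concrete, here is a small example of my own that prints some implementation-defined quantities; the numbers it prints depend on the compiler and the ABI it targets:

#include <stdio.h>
#include <limits.h>
#include <stddef.h>

struct sample { char c; int i; double d; };   /* padding and alignment are ABI decisions */

int main(void)
{
    printf("CHAR_BIT = %d, sizeof(int) = %zu, sizeof(long) = %zu\n",
           CHAR_BIT, sizeof(int), sizeof(long));
    printf("sizeof(struct sample) = %zu, offsetof(i) = %zu, offsetof(d) = %zu\n",
           sizeof(struct sample),
           offsetof(struct sample, i), offsetof(struct sample, d));
    return 0;
}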

1 Linking

To build a project composed of many C files, each C file is first compiled into some "intermediate format", and then the pieces are combined. The intermediate format is the "object file", and the combining process is "linking". Linking looks simple at first glance, but many problems appear once you think about it:

  • At compile time you do not know about the other files, so even when calling a function within the same file, which memory address should the call refer to? More concretely, after the final binary is loaded into memory, can the virtual address of each function be determined? If addresses were assigned arbitrarily, the function addresses from different C files would conflict.
  • The external symbols (functions, variables) referenced at compile time are unknown; how can they be referenced at all?

The first problem is not hard to solve. Although the virtual address of each function cannot be determined, its offset relative to the call site can be. The processor has relative addressing modes, so references within the object code of a single file can use relative addressing.

The second problem: having the C compiler read in all the source to be compiled, do a "pre-compilation" pass to lay out every symbol's position, and then compile "officially" is not a good idea. First, libraries could then never be distributed as compiled binaries; every build would have to compile all code together with the library sources, which is unacceptably slow. Second, the C compiler would take on too many responsibilities and become a huge, bloated monolith, which works against a modular compilation system. In UNIX, cheap process creation, rich yet concise inter-process communication (pipes, redirection, sockets), and the unifying "everything is a file" concept encourage many cooperating small tools (cc, as, ld, make) to form a large system with clear internal boundaries.

Since machine code has no notion of "function" or "variable", only memory addresses, the machine code compiled from a single file can hardly express references to external functions and variables. The small field that holds a relative-addressing offset certainly cannot describe which external symbol is being referenced and the offset relative to that symbol.

My idea: since symbols cannot be recovered from machine code, invert the process and build an index. Store a table in the "intermediate format" that records, for every reference to an external function or variable, the position of the referencing instruction and the name of the referenced symbol. The relative-addressing field of the corresponding instruction then only needs to hold the offset relative to the symbol's base address (it cannot simply be filled with 0: the address of element a[10] of an int array, for instance, is the address of symbol a plus 10*sizeof(int)).

During linking:

  • Gather all the "intermediate format" files to be linked and obtain the full list of symbols and the space each one needs;
  • Lay out the positions of all symbols in virtual memory;
  • Following the "reference table", find every instruction that needs patching and correct its addressing offset in turn. The calculation is easy (a small worked example follows this list): relative addressing offset = the target symbol's virtual address - the current instruction's virtual address + the offset relative to the symbol's base (the value previously saved in the relative-addressing field). I did not consider absolute addressing at the time, but it is similar: absolute addressing offset = the target symbol's virtual address + the offset relative to the symbol's base. Of course, the addressing mode in use and the position of the offset within the instruction have to be determined according to the instruction format.

This is the legendary "relocation" process, the "reference table" is the "relocation table", and the "intermediate file" is a "relocatable file".
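
To make the arithmetic in the last bullet concrete, here is a tiny calculation of my own with made-up addresses: "sym" is the symbol's assigned virtual address, "patch" is the virtual address of the field being corrected, and "addend" is the saved offset relative to the symbol's base.

#include <stdio.h>
#include <inttypes.h>

int main(void)
{
    uint32_t sym    = 0x08049000;  /* virtual address assigned to the symbol       */
    uint32_t addend = 40;          /* saved offset from the symbol base, e.g. 10*4 */
    uint32_t patch  = 0x08048123;  /* virtual address of the field being patched   */

    printf("absolute addressing offset: 0x%08" PRIx32 "\n", sym + addend);
    printf("relative addressing offset: 0x%08" PRIx32 "\n", sym + addend - patch);
    return 0;
}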

I was still worried about where to store this "reference table": how can such data be kept apart from the machine code? Fortunately the designers of executable file formats have considered this: an object file contains much more than the code and data to be loaded into memory.

2 Object Files

2.1 Object File Structure

An object file cannot just be a pile of machine code. Many file formats begin with a magic number: the first line of a script file is "#!/path/to/interpreter", and Microsoft Word 97/2003 documents begin with the bytes D0 CF 11 E0. Magic numbers let commands like file report the file type and let a Linux desktop environment choose which program opens a file; for executable files they matter even more. Linux's execve system call reads the first 128 bytes of the file and matches them against the available executable-loading routines: seeing a file that starts with the two bytes "#!", it knows to invoke the interpreter named after the "#!"; seeing the four bytes 0x7F 'E' 'L' 'F', it knows this is an ELF executable; seeing a file that starts with the bytes 0xCAFEBABE, it knows this is a Java class file (why cafe?).
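
Checking a magic number is itself trivial; here is a minimal sketch of my own that recognizes only the two signatures just mentioned:

#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    unsigned char buf[4] = {0};
    FILE *f;

    if (argc < 2 || !(f = fopen(argv[1], "rb"))) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }
    fread(buf, 1, sizeof buf, f);
    fclose(f);

    if (memcmp(buf, "\x7f" "ELF", 4) == 0)          /* 0x7F 'E' 'L' 'F' */
        puts("looks like an ELF file");
    else if (memcmp(buf, "#!", 2) == 0)
        puts("looks like a script with an interpreter line");
    else
        puts("no magic number I recognize");
    return 0;
}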

Besides the magic number, the ELF header also records the ELF file type (relocatable? executable? shared object? In UNIX the type of a file is not decided by its extension), the ELF version (putting a version field into a file format helps extensibility), the target platform, the ABI, the location of the section and program header tables, and other information. If these do not match the current environment, the kernel refuses to execute the file rather than discovering the error halfway through and exiting inexplicably; this too is a kind of error-prevention mechanism.

What information needs to be stored in an object file?

  • Code and data, obviously, and they must be kept apart: code is generally read-only and executable, while data is generally readable and writable but not executable.
  • Is all the data in a program readable and writable? Variables declared const and string constants never need to be written, so mapping them read-only at load time improves the program's security, and the burner in an embedded system can write them into ROM, saving precious RAM. So a read-only data segment is needed as well.
  • In programming contests we often declare a huge global array a[2000][2000] at the top of the program. Such an array obviously cannot be stored in the data segment, or the executable would be many megabytes (with 4-byte ints, 2000 × 2000 × 4 bytes ≈ 15 MB). Uninitialized data therefore goes into the BSS segment, and the operating system's executable-loading routine only has to allocate that much memory when it initializes the process.
  • The relocation table mentioned earlier also has to be stored.
  • To support debugging tools (single-stepping and the like), extra sections are needed for information such as source line numbers (.line).
  • Compiler version information (.comment) and additional compiler notes (.note).
  • The dynamic-linking information discussed later (.dynamic).
  • Program initialization and termination code, used for C++ global construction and destruction (a small example follows at the end of this subsection). Could this code simply be placed at the beginning and end of the code section? No: when several object files are linked together, the initialization code still has to end up at the beginning and end of the combined result, and the linker cannot tell which parts of a code section are initialization code. Logically different things should not be mixed together; as long as a kind of information does not waste too much space or hurt performance, it should be kept, because once it is discarded to save trouble, a later processing stage that wants it can never get it back.

Since an object file has to store many kinds of information, a sectioning mechanism is needed: each kind of information goes into its own section, which keeps things logically clear. A "section" here is often also called a "segment", but do not confuse it with a "segment" of memory. Sectioning has its costs too; one nuisance is that an offset stored in an ELF file must specify both which section it is in and the offset within that section. Besides the file header, the most important structure in an ELF file is the section header table, an array of structures describing each section: its name, length, offset within the file, read/write permissions, and so on.
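
Incidentally, GCC exposes this initialization/termination machinery to C code through the constructor and destructor attributes. This example is mine, not from the book; the attributes are GCC/Clang-specific, and exactly which sections the functions land in (.init_array/.fini_array on typical modern Linux toolchains) is a toolchain detail:

#include <stdio.h>

__attribute__((constructor))
static void before_main(void)   /* collected into the object file's initialization code */
{
    puts("runs before main");
}

__attribute__((destructor))
static void after_main(void)    /* collected into the termination code */
{
    puts("runs after main returns");
}

int main(void)
{
    puts("main body");
    return 0;
}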

2.2 Symbol Table

In the discussion of the relocation mechanism, my first thought was to store the symbol-name strings directly in the relocation table. Spoiled by high-level languages, we can treat strings as a basic data type, but in a C structure a variable-length string has to live in space outside the structure and be pointed to. So should the strings be placed at the end of each section, gathered at the end of the whole object file, or scattered wherever convenient? In any case, haphazardly placed strings would ruin the uniformity of the file format, and the program would constantly need to convert between in-memory string pointers and in-file string offsets; without a unified mechanism this is simply a nightmare.

The ELF format has two dedicated string tables (sections): .strtab for ordinary strings such as symbol names, and .shstrtab for the strings used by the section header table, i.e. section names. (I do not understand why the section table gets special treatment.) Strings are stored very simply, one after another, each ending with "\0" as the boundary. Noting that section names are themselves stored in a string table, finding the section that holds the string table looks like a chicken-and-egg problem. In fact e_shstrndx in the ELF header is the index of the .shstrtab section within the section header table, and the file offset of the section header table itself is given by e_shoff in the ELF header. The ELF format has quite a few interlocking details like this.
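
A string table is easy to model in C; here is a toy version of my own (offsets chosen by hand) showing how names are packed together and referred to by byte offset rather than by pointer:

#include <stdio.h>

/* names packed back to back, each terminated by '\0'; offset 0 is the empty string */
static const char strtab[] = "\0main\0printf\0.text\0.strtab";

int main(void)
{
    unsigned offsets[] = {1, 6, 13, 19};   /* offsets of "main", "printf", ".text", ".strtab" */
    for (unsigned i = 0; i < sizeof offsets / sizeof offsets[0]; i++)
        printf("offset %2u -> \"%s\"\n", offsets[i], strtab + offsets[i]);
    return 0;
}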

With a mechanism for storing strings, recording a symbol only requires its index in the string table (i.e., which string it is). Function names and variable names that "do not fit" in machine code can thus be placed in the string table; what is needed is a mapping from a symbol's position in the machine code to its name in the string table. This mapping is the "symbol table" (the .symtab section). In fact, each entry of the symbol table describes not only the section the symbol belongs to, its offset within that section, and the index of its name in the string table, but also the symbol type (data object, function, section, file name, etc.), the size of the corresponding data type, and so on.
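
For reference (my addition, based on the standard <elf.h> definitions), the 32-bit symbol-table entry holds exactly the fields just described:

#include <stdint.h>

typedef uint32_t Elf32_Word;
typedef uint32_t Elf32_Addr;
typedef uint16_t Elf32_Half;

typedef struct {
    Elf32_Word    st_name;   /* index of the symbol's name in the string table          */
    Elf32_Addr    st_value;  /* the symbol's value: typically its offset or address     */
    Elf32_Word    st_size;   /* size of the object the symbol refers to                 */
    unsigned char st_info;   /* type (object, function, ...) and binding (local, global, weak) */
    unsigned char st_other;  /* essentially unused in the classic specification         */
    Elf32_Half    st_shndx;  /* index of the section the symbol is defined in           */
} Elf32_Sym;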

The symbol table also plays a key role in how dynamic languages are interpreted. In PHP, for example, "variable variables" (i.e., $a = "varname"; $varname = 100; then $$a equals 100), "executing dynamically generated code" and other "terrifying" features are implemented inside the PHP interpreter with a mapping from variable names to storage addresses in memory.

In fact, PHP's Zend engine uses the zval structure internally to represent variables. Implementing weak typing on top of a strongly typed language is not complicated:

typedef struct _zval_struct {
    zvalue_value value;    /* the variable's value */
    zend_uint refcount;    /* reference count, used for copy-on-write */
    zend_uchar type;       /* variable type */
    zend_uchar is_ref;     /* whether it is a reference (if so, no copy is made on write) */
} zval;

typedef union _zvalue_value {
    long lval;             /* integer types (including bool) and resources (a resource is
                              identified by an integer number, like a file descriptor in C) */
    double dval;           /* floating-point type */
    struct {
        char *val;
        int len;
    } str;                 /* string type */
    HashTable *ht;         /* associative array type (like a dict in Python), stored as a hash table */
    zend_object_value obj; /* object type */
} zvalue_value;

This is not a hard solution to come up with. When I was thinking about building an online C interpreter, I proposed a similar scheme for representing strongly typed C variables in weakly typed JavaScript.
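
For illustration, here is a stripped-down sketch of the same tagged-union idea, with my own names and nothing Zend-specific in it:

#include <stdio.h>

enum vtype { T_LONG, T_DOUBLE, T_STRING };

struct value {                  /* a type tag plus a union: a "weakly typed" value in C */
    enum vtype type;
    union {
        long   lval;
        double dval;
        struct { const char *ptr; int len; } str;
    } u;
};

static void print_value(const struct value *v)
{
    switch (v->type) {
    case T_LONG:   printf("%ld\n", v->u.lval); break;
    case T_DOUBLE: printf("%f\n", v->u.dval); break;
    case T_STRING: printf("%.*s\n", v->u.str.len, v->u.str.ptr); break;
    }
}

int main(void)
{
    struct value v;

    v.type = T_STRING; v.u.str.ptr = "hello"; v.u.str.len = 5;
    print_value(&v);                  /* the same storage holds a string ... */

    v.type = T_LONG; v.u.lval = 100;
    print_value(&v);                  /* ... and now an integer */
    return 0;
}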

In PHP's execution engine Zend, variables (zval) are stored in symbol tables implemented as hash tables, with the variable name as the key and the zval as the value. The global symbol table stores variables in the top-level scope (those not inside any class or function), and every function and class method gets its own symbol table during execution: when a function or method is called, a symbol table is created for it and made the active one, all variables defined inside the function live in that table, and the table is destroyed when the function returns. The global symbol table is used only outside functions and methods. PHP's odd rule that global variables must be declared with global inside a function is probably related to this.

PHP functions are global, so they are not stored in those symbol tables. Functions are divided into internal functions (written in C, in the PHP core and extensions) and user functions (written in PHP): internal functions live in the function table, while user functions point to their opcode (intermediate code) sequences. Since PHP is not the focus of this article, interested readers can look up the zend_internal_function, zend_op_array and zend_function structures themselves. Class methods have scope (they are valid only for instances of the class), so all three structures contain a pointer to the "class" (zend_class_entry). When a function call is executed, if a function in the right scope is found in the internal function table it is called directly; otherwise the user functions are searched for one in the right scope, and zend_execute is called to run its opcodes.

struct _zend_op {
    opcode_handler_t handler; /* handler function, corresponding to the opcode */
    znode result;             /* result of the operation */
    znode op1;                /* operand 1 */
    znode op2;                /* operand 2; not every operation uses result, op1 and op2 at once */
    ulong extended_value;     /* extra information needed during execution, a set of flags */
    uint lineno;              /* line number in the source (for debugging and error reporting) */
    zend_uchar opcode;        /* operation type, essentially the instruction (e.g. FETCH_W fetches
                                 a variable into a "register" for writing, ASSIGN assigns, ECHO prints) */
};

typedef struct _znode {
    /* operand type: constant, variable (user-visible), temporary variable (engine-internal),
       or compiled variable (a bit like a register, so that not every use of a variable
       has to go through the hash table, which would be too slow) */
    int op_type;
    union {
        zval constant;          /* constant */
        zend_uint var;          /* variable (visible or temporary) */
        zend_uint opline_num;   /* Needs to be signed */
        zend_op_array *op_array;
        zend_op *jmp_addr;
        struct {
            zend_uint var;      /* dummy */
            zend_uint type;
        } EA;                   /* compiled variable */
    } u;
} znode;

Returning to the symbol table in a C object file: there is a problem that does not look serious yet. Extern symbols with the same name in different files denote the same symbol, and duplicate definitions are not allowed; assembly programs, however, have no extern mechanism, so if an assembly file defines a symbol main, no C program linked with it may define a main function. Unlike PHP, where each scope has its own dynamic symbol table, an object file has only one symbol table, fixed at compile time. For this reason UNIX C stipulated that all global symbol names be prefixed with an underscore, which is called symbol decoration; MSVC still keeps this tradition, while GCC has dropped it by default. In C++, however, symbol management is not so simple. First, C++ allows several functions with the same name but different parameter types, so the decorated name must encode the parameter types; second, symbols with the same name may exist in different namespaces and classes, so the decorated name must also encode the namespace and class. The concrete decoration scheme is left to each implementation.

Having understood the object file format, the linking mechanism described earlier still needs a few details refined:

  • In a relocatable file, the offset in a relocation entry is the offset of the first byte of the location to be patched (not the first byte of the instruction) relative to the start of its section (not relative to the start of the file); in an executable or shared object file (used for dynamic linking), it is the virtual address of the first byte of the location to be patched, which will be used during dynamic linking.
  • How do we know the names of symbols in other files? After all, machine code contains only memory addresses, not symbol names. In fact, all exported (extern) symbols are recorded in the object file's symbol table.
  • The relocation table does not actually store the name strings of external symbols; each entry stores the relocation type (which selects the strategy for patching the addressing offset) and the index of the symbol in the symbol table (the struct sketch below shows this layout).
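
For reference (again my addition, from the standard <elf.h> definitions), the 32-bit relocation entry is just this pair of fields; r_info packs the symbol-table index and the relocation type into one word, and the standard macros take it apart:

#include <stdint.h>

typedef uint32_t Elf32_Addr;
typedef uint32_t Elf32_Word;

typedef struct {
    Elf32_Addr r_offset;  /* where to patch: a section offset or a virtual address, as described above */
    Elf32_Word r_info;    /* symbol-table index combined with the relocation type */
} Elf32_Rel;

#define ELF32_R_SYM(info)  ((info) >> 8)     /* which symbol the entry refers to           */
#define ELF32_R_TYPE(info) ((info) & 0xff)   /* how the addressing offset is to be patched */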

2.3 Weak Symbols and Weak References

When writing programs we sometimes want functions with a variable number of arguments, for example a function whose last few arguments are rarely needed: we omit them to get default values and supply them only when necessary. This can be done with C's variable-argument mechanism. Similarly, we may want to pick certain functional modules out of a library and build several variants with different features without changing the library's linking characteristics, or override a library function with our own version. In C++ these are easily handled with class inheritance and function overloading, but what about C users? GCC provides a "weak symbol" mechanism, which requires adding the attribute

__attribute__((weak))

before the symbol definition. Ordinary symbols, by contrast, are called "strong symbols".

  • The same strong symbol may not be defined more than once ("definition" and "declaration" are two different things);
  • If a symbol is defined as a strong symbol in one place and as a weak symbol elsewhere, the strong definition is chosen;
  • If a symbol is a weak symbol in all of its definitions, the definition occupying the most space is chosen.

The compiler treats definitions of uninitialized global variables as weak symbols, so that the linker can determine their size during linking and finally allocate space for them in the BSS segment. Thus the same global variable may be initialized only once, while uninitialized globals of the same name may appear in several files. This is also necessary: some common header files define common global variables, every source file includes them directly or indirectly, so every relocatable file compiled from those sources contains definitions of these globals, and these relocatable files must still link correctly.

Weak symbols in ELF files have their "section the symbol is in" field set to SHN_COMMON. Correspondingly, symbols not defined in the current ELF file are marked SHN_UNDEF, and symbols with absolute values (including strong symbols, file names, etc.) are marked SHN_ABS.

Symbol definitions come in strong and weak forms; what about symbol references? GCC also provides a "weak reference" mechanism: adding the attribute

__attribute__((weakref))

before the symbol declaration means that if the symbol is never defined, GCC will not report an error but will substitute a special value (0), so the program has a chance to run without that function being provided instead of failing to link, which makes a program's features easier to trim and recombine.
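
As a concrete illustration (mine, and GCC-specific), the commonly used variant puts the plain weak attribute on a declaration to obtain a weak reference and then tests the symbol at run time:

#include <stdio.h>

/* weak reference: if no object file defines this symbol,
   it resolves to 0 instead of causing a link error */
extern void optional_feature(void) __attribute__((weak));

int main(void)
{
    if (optional_feature)
        optional_feature();
    else
        puts("optional_feature was not linked in, skipping it");
    return 0;
}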

Weak references in ELF files have their symbol binding set to STB_WEAK. Correspondingly, global symbols visible outside the object file (such as those declared extern) are STB_GLOBAL, and local symbols invisible outside are STB_LOCAL.

3 Dynamic Linking

3.1 The Dilemma of Loading

We already know that an executable file is divided into several segments: some hold data, some hold code, and some, such as the string tables, contain information that does not need to be loaded into memory at run time at all. So starting a process is not as simple as reading the file into memory and jumping to the start address, although deciding whether a given segment needs to be loaded is not very difficult.

The crudest method is to copy all the libraries the program depends on into the executable at link time, and at load time simply map each region of virtual memory to a region of physical memory. But this has serious problems: every C program depends on libc, so the disk would hold thousands of copies of libc and every process in memory would carry its own copy as well, an enormous waste; and if one module of the program needs updating, the entire executable has to be downloaded again.

After the invention of virtual memory, with hardware support, a program's logical addresses were "decoupled" from physical addresses, and only the mapping from virtual to physical addresses has to be maintained. Supposing the hardware had no virtual memory mechanism, my idea would be to make "every executable dynamically loaded": when the operating system creates a process, it finds a free region of physical memory and relocates the executable for that load position, i.e. patches all absolute addressing offsets. No relocation table is needed afterwards because processes do not need to call one another. Of course, virtual memory also provides logical isolation between processes and the distinction between user mode and kernel mode, which are hard to achieve without hardware support.

3.2 The Concept of Runtime Loading

In fact, when I first tried to conceive a dynamic linking mechanism, my thinking ran entirely along the lines of a dynamic language. (All discussion in this section ignores the virtual memory mechanism.)

  • All dynamic link libraries retain relocation information. (The current system is also like this)
  • The operating system maintains a "dynamic link library table", which records all the dynamic link libraries in the system and the declarations of their exported functions.
  • The operating system provides a system call to execute dynamic link library functions: execfunc(char *function_name, …).

    • The first parameter of execfunc is a pointer to function_name
    • The subsequent parameters are the parameters of the original function (similar to the variable parameter mechanism of C language);
    • function_name is a string constant in the executable file;
    • The return value of execfunc is stored in the place stipulated by ABI (such as specifying a register for integer type and floating point type);
    • execfunc is not a function in the sense of C language, because its return value type is uncertain.
  • The operating system provides a system call to query the dynamic link library table: queryfunc(char *function_name), which returns the parameter and return value types of this function for the compiler to handle.

  • When the compiler encounters a function name that cannot be "internally resolved":
    • Query the dynamic link library table;
    • Change the target address of the call instruction of the function calling the dynamic link library to execfunc (ignoring the details of the system call mechanism for the time being);
    • Add a string constant function_name to represent the function name;
    • Add an instruction to push the pointer of the string constant onto the stack before the call instruction (assuming that the ABI stipulates that parameters are passed through the stack, and the order of pushing onto the stack is also consistent with this example);
    • Handle the return value of execfunc according to the requirements of the original program (such as assigning it to other memory units or registers).
  • The program loads and executes normally, just like there is no dynamic link.
  • When the program executes to what was originally a function calling the dynamic link library, it is actually calling execfunc.
  • As the name suggests, execfunc:
    • Find the dynamic link library where this function is located from the dynamic link library table of the operating system;
    • If this dynamic link library has not been loaded yet, find a memory region to load it into; relocate the object file; load the relocated object file;
    • Call the dynamic link library function with its own variable parameters (if C language is not easy to implement, the operating system can completely use assembly language to implement this code)
    • Return the return value of the dynamic link library function to the caller (C language also does not have a "dynamic return value type", so it also needs to use assembly language)
  • In this way, every call into a dynamic link library takes a detour through execfunc, but the job finally gets done. Of course, execfunc and queryfunc need not be operating-system mechanisms; they could be internal mechanisms of the C compiler, but then their implementation would have to exist in every executable that needs dynamic linking, and writing a loader into every executable is a big overhead with poor compatibility.

In fact, a set of PHP programs I wrote implements a similar dynamic loading mechanism, except that instead of having the PHP interpreter modify code, it uses PHP's error handling to catch the exception thrown by an undefined function call, finds and loads the corresponding library from a list, and then returns to re-execute the PHP code that triggered the exception. As a dynamic language, PHP loads external files dynamically by nature, and function calls are "looked up as you go" in a hash table, so such a mechanism fits it quite well.

In C, however, such a mechanism adds a lot of overhead to every call into a dynamic link library. If the program is allowed to modify its code segment during execution, there is a "load on first use" mechanism that pays the cost only once:

  • The operating system still needs to maintain the dynamic link library table.
  • The operating system provides a system call to load dynamic link library functions: `void loadfunc(char *function_name)`.
  • The first parameter of loadfunc is a pointer to function_name;
  • `function_name` is a string constant in the executable file;
  • The operating system still provides system calls to query the dynamic link library table.

  • When the compiler encounters a function name that cannot be “internally resolved”:

    • Query the dynamic link library table;
    • Change the target address of the call instruction of the function calling the dynamic link library to loadfunc;
    • Add a string constant function_name to represent the function name;
    • Add an instruction to push the string constant pointer onto the stack before the call instruction (assuming that the parameters are passed through the stack, other parameters have been pushed onto the stack before this instruction).
  • The program loads and executes normally, as if there is no dynamic linking.
  • When the program executes to what was originally a function call to the dynamic link library, it is actually calling `loadfunc`.
  • The highlight of `loadfunc` is here:
    • The string constant pointer (the first parameter) is popped from the stack, and other elements in the stack (such as the return address) are adjusted accordingly;
    • Find the dynamic link library where this function is located from the dynamic link library table of the operating system;
    • If this dynamic link library has not been loaded yet, find a memory region to load it into; relocate the object file; load the relocated object file;
    • Now come a few key assembly instructions: set the code segment to writable (this has a bit of a hacker flavor);
    • Based on the return address in the call stack, modify the instruction corresponding to the return address (i.e., call `loadfunc`) to call `function_name`; (modify the addressing offset)
    • Modify the instruction before call `function_name` (i.e., pushing the string constant pointer) to `nop`; (it's no longer needed)
    • Modify the return address to its previous instruction; (to call `function_name` after returning)
    • Set the code segment to read-only (optional).
  • After `loadfunc` returns, execution proceeds to call `function_name`; the parameters are already in place and we never touched the function's return-value handling at all, so the first call goes through smoothly;
  • Subsequent calls to `function_name` differ from an ordinary call only by a few extra `nop` instructions between pushing the parameters and the call; `loadfunc` has quietly retired behind the scenes.

Of course, making the code segment writable in order to load a dynamic link library does feel a bit unseemly :)

The above is a possibility, not a fact. According to "Principles of Programming Languages", between 1936 and 1945 the German scientist Konrad Zuse designed a series of sophisticated relay computers and the Plankalkül language (not published until 1972). The language is astonishingly complete: the basic data type is the bit, from which integer and floating-point types are built, and it also includes arrays and records (which can be nested). Its control structures include an iteration statement similar to for and a selection statement similar to if. Of course the language's notation looks (to us) strange. Zuse's "sample programs" include sorting arrays, testing graph connectivity, computing square roots, parsing simple logical formulas, and a 49-page chess algorithm. I am a bit skeptical about whether this was fabricated by someone in 1972. If it was not, then, as the book says, "we can only guess now in which direction programming languages would have developed if Zuse had not been working in Germany in 1945 and his work had been published promptly". Perhaps the design of modern compilation systems and operating systems is one of the better possibilities.

3.3 Position Independent Code

Let us step out of the verbal trap of "dynamic linking": instead of loading during execution, load the dynamic link libraries when the process is initialized. The simplest way, of course, is to relocate and load all the required dynamic link libraries into the process's virtual address space at process initialization.

In the discussion so far we have treated dynamic link libraries as ordinary object files; so can object files be used directly for dynamic linking?

At first glance there seems to be no problem. But we have been ignoring the existence of virtual memory. Loading requires relocating the dynamic library's code, and the loaded code then depends on where it was placed. This means that for one copy of a dynamic library's code to be shared by multiple processes, the library must appear at the same virtual address in all of those processes' address spaces.

One method is to reserve address space system-wide for each dynamic link library when it is first loaded (not overlapping any other library); a process created later that needs the library maps it at the predetermined address (i.e., the same virtual addresses map to the same physical memory). But then the total number of dynamic libraries in the system cannot grow too large, or they will "fill up" the 3 GB of user-mode virtual address space. For a 32-bit system that may run many heterogeneous tasks and has only 3 GB of user address space, reserving address space like this is unacceptable; on a 64-bit system, admittedly, the address space looks inexhaustible.

So the path of directly relocating object files will not do. Our goal is actually simple: we want the shared instruction part of a dynamic link library to need no changes when its load address changes. The basic idea of the implementation is therefore to separate out the parts of the instructions that need modification and put them together with the data. The instruction part can then be shared among processes, while the read-write data part naturally has an independent copy in each process. This is the legendary "Position Independent Code" (PIC).

Let’s analyze different types of address references:

Internal function calls and jumps in the module

It is not difficult to find that these instructions are address-independent as long as they use relative addressing mode.

Internal data access in the module

Instructions here cannot hold absolute addresses, so data references must also be relative; but data access, unlike jumps, cannot specify an offset relative to the program counter. The compiler therefore uses a very clever trick: a call instruction pushes the return address (the address of the instruction after the call) onto the stack, and addressing relative to that retrieved return address can reach any data in the module. This, too, turns out to be address-independent.

Function calls and jumps between modules

We need to call a function in another module whose address is not known before that module is loaded, yet the code of our own module must not be modified; and if the relevant instructions were placed in the data segment, the data segment is not executable. ELF's approach is to set up an array of pointers in the data segment, the global offset table (GOT), holding the address of each external function. A call to an external function is made indirectly through the corresponding GOT entry; once the other module has been loaded, the address of the function is simply filled into its GOT entry.

Data access between modules

Data access between modules is similar to function calls: the target address is also found through the global offset table. The GOT entry simply holds the symbol's address, regardless of whether the symbol is data or a function.

Function calls and jumps between the module and the executable file

This is no different from function calls between modules, just find the external symbol through the symbol table in the executable file and relocate it.

Data access between the module and the executable file

Executable files are generally not compiled as position-independent code, so a shared-library variable that the executable declares extern is placed in the executable's BSS segment as uninitialized data, even though the shared library also contains a copy of that variable (possibly initialized). The shared library gives way: during dynamic linking the corresponding GOT entries are pointed at the copy in the BSS segment, and that copy may also be initialized from the one in the shared library. In fact, when generating position-independent code the compiler cannot tell whether an extern variable is a global defined in another file of the same module or a cross-module reference, so it has to go through the GOT in a position-independent way regardless.

Cross-module address references in data

For example, a common piece of code:

extern int a;
int *b = &a;

The fundamental difference between a cross-module address reference stored in data and cross-module data access from code is that the former lives in the data segment while the latter lives in the code segment. The code segment cannot be modified at will, so it has to go through the global offset table indirectly; the data segment, of which every process has its own copy, can be modified. So it suffices to record this kind of address reference in the dynamic linking information and, during dynamic linking, patch the value at the corresponding position in the data segment (here, set b in the data segment to the address of a in the shared library).

To generate a position-independent shared library with GCC, just add the "-fPIC" option; for executables, "-fPIE" achieves the same position-independent effect.

3.4 Dynamic Linker

We have seen that dynamic linking is a fairly complex affair. The discussion so far may give the illusion that all of this work is done by the operating system. In Windows the dynamic linker is indeed part of the operating system, and the whole Windows system depends heavily on dynamic link libraries:

  • All system calls are wrapped as the WINAPI, and the WINAPI is defined in dynamic link libraries such as kernel32.dll and ntdll.dll;
  • Most interaction between Windows applications goes through COM (the Component Object Model), which underlies ActiveX, .NET and other "call a component on demand" mechanisms (for example, having Media Player play a video inside the browser, or manipulating Word documents from PHP), and the foundation of COM is the dynamic link library.
  • In Linux, famous for its modularity, the kernel only gives dynamic linking a "push". After loading an ELF executable the kernel returns to user space and hands control to the program's entry point. For a statically linked executable, the entry point is the address specified by e_entry in the ELF header; for a dynamically linked executable, unless some special mechanism is adopted, the whole problem of dynamic linking is dumped onto the executable itself. Dynamic linking is rather complex, and keeping a dynamic linker inside every executable would obviously be wasteful. So is the path of the dynamic linker in the file system specified by the operating system?

The .interp ("interpreter") section of an ELF file stores a string: the path of the dynamic linker. The kernel reads the dynamic linker named by .interp into memory, maps it into the user address space, and then hands control to it. What happens next? How does the dynamic linker pull itself up by its own bootstraps?

/lib/ld-x.y.z.so (x,y,z are version numbers) is such a magical thing.

  • The dynamic linker's entry function _dl_start() performs a "bootstrap": it does the relocation work for itself. During this phase relative addressing within the dynamic linker is fine, but absolute addressing is not yet possible, so this part has to be handled with extra care.
  • Load the symbol table of the program. After the bootstrap is completed, you can freely call functions in the program and access global variables.
  • _dl_start_final() collects some basic runtime values
  • _dl_sysdep_start() performs some platform-related processing
  • _dl_main() determines the specified user entry address; if that is the dynamic linker itself, then the dynamic linker is being run directly as an executable file.

The dynamic linker then loads the shared objects the program depends on, resolves symbols and performs relocation, which is the dynamic linking process described earlier. Obviously the dynamic linker itself must be statically linked and cannot depend on other dynamic libraries, otherwise nobody would resolve its dependencies. The dynamic linker is itself PIC (position-independent code): first, because it must relocate itself during bootstrap, and relocating the data segment is simpler than relocating the code segment; second, because PIC code can be shared physically, so all programs together need only one copy of the dynamic linker in memory, saving memory.

From the description above it is easy to see that the dynamic linker can also be run directly. The kernel simply looks for the .interp section: if it is absent, it jumps straight to e_entry; if it is present, it loads the dynamic linker and jumps to the dynamic linker's e_entry.

3.5 Runtime Loading

Dynamic linking solves the problem of a shared library's instructions occupying duplicate space in different processes, but doing all of the dynamic linking at initialization has a flaw: program execution is local, many modules will not be used any time soon, and loading them all at once is a waste. This problem is essentially an "internal affair" of the process, unrelated to dynamic linking itself.

To load modules on demand at run time, early programmers organized modules into a tree according to their call relationships and used "overlay loading" to save memory. It exploits the constraint that certain modules never coexist, letting them share the same address range and thereby saving memory. This method requires programmers to spend a lot of effort arranging the overlay structure of the modules, and every executable must also carry an "overlay manager" in its header.

So can the operating system provide a system call that loads a dynamic link library while the program is running, the way PHP can include another file at any point? Compared with "automatic loading on call", run-time loading shifts the task from the operating system to the programmer: the operating system does not load anything automatically, and the programmer must declare a module before using it. Compared with overlay loading, no program needs to carry its own overlay loader, and using virtual memory mapping is more flexible than fixed address assignment. With a run-time loading mechanism, both "automatic loading", with its large performance impact, and overlay loading, with its large impact on programming complexity, can be retired.

Suppose first that loaded code must not be modified, so loading a dynamic module cannot involve relocating code that is already loaded. That means that when the executable is loaded, the entry address of every function in every dynamic library that might later be loaded must already be known. This is not hard to arrange:

  • When the operating system creates the process, it lists the dynamic link libraries that might be loaded and pre-allocates virtual address space for them;
  • The executable is relocated according to this allocation table and then loaded;
  • The system call responsible for loading a dynamic library fetches that library;
  • If the newly loaded library depends on other libraries that have not yet been given pre-allocated address space, space is pre-allocated for them too;
  • Now that the addresses of the library's callers and dependencies are all fixed, the library is relocated and loaded into its pre-allocated region.
  • In this way, thanks to the pre-allocated address space, neither the loaded executable nor the libraries need their code segments modified after loading.

This "reserve the address space" design not only suffers from insufficient address space, it also requires the programmer to list explicitly "all dynamic link libraries that might be loaded", which is not convenient.

The designer of runtime loading adopted a “lazy” method: provide some interfaces, let programmers find the symbols they need to use by themselves, and then indirectly call them through the address of the function or variable (function pointer, variable pointer).

In Linux, the kernel does not “overstep” to manage these things, and the runtime loading mechanism is provided by the API of the dynamic linker (/lib/libdl.so.2), including:

  • Open dynamic library (dlopen)
  • Find symbols (dlsym)
  • Error handling (dlerror)
  • Close a dynamic library (dlclose)

`void *dlopen(const char *filename, int flag);`

  • filename: the absolute path of the dynamic library, or a path relative to the standard directories such as /lib and /usr/lib.
  • flag: how symbols are resolved. There are two modes of run-time loading: resolve all functions as soon as the module is loaded (RTLD_NOW), or bind a function the first time it is used (RTLD_LAZY), similar to the scheme I proposed in section 3.2. RTLD_NOW helps expose undefined-symbol errors as early as possible when debugging, while in actual use RTLD_LAZY speeds up library loading and gives true "on-demand" binding.
  • Return value: a handle for the loaded module, referring to the module's symbol table. Interestingly, if filename is 0 the returned handle is that of the global symbol table, so an address can be looked up by function name at run time and called, on top of which something like the "reflection" of high-level languages can be built.

`void *dlsym(void *handle, char *symbol);`

  • handle: the symbol-table handle returned by dlopen.
  • symbol: the name of the symbol to look up. If it is not found in the current module's symbol table, the shared objects it depends on are searched in breadth-first order.
  • Return value: for a function, the function's address; for a variable, the variable's address; for a constant, the value of the constant; NULL if the symbol is not found.

dlclose unloads a module. Note that unloading here and loading with dlopen can be repeated; each module keeps a reference count.

dlerror is used to determine whether the most recent dlopen, dlsym or dlclose call succeeded; a NULL (0) return from dlsym does not necessarily mean the symbol was not found, it could simply be a constant whose value is 0.
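
Here is a minimal sketch of my own, along the lines of the classic dlopen(3) example (it assumes a Linux system where libm.so.6 is available; on older glibc, link with -ldl): it loads libm at run time, looks up cos, calls it through a function pointer, and closes the library.

#include <stdio.h>
#include <dlfcn.h>

int main(void)
{
    void *handle = dlopen("libm.so.6", RTLD_LAZY);   /* bind symbols on first use */
    if (!handle) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return 1;
    }

    dlerror();   /* clear any stale error before the lookup */
    double (*cosine)(double) = (double (*)(double))dlsym(handle, "cos");
    const char *err = dlerror();
    if (err) {
        fprintf(stderr, "dlsym: %s\n", err);
        dlclose(handle);
        return 1;
    }

    printf("cos(0.0) = %f\n", cosine(0.0));
    dlclose(handle);
    return 0;
}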

The difference between run-time loading and loading at initialization is that the latter is transparent to the programmer and finishes loading the shared libraries before the first line of code runs, while the former explicitly calls the API provided by the dynamic linker from inside the program. For example, a web server can load new modules for a new configuration without restarting, and a browser can load the required plugin when it encounters a page with Flash.

3.6 Delayed Binding

Dynamic linking is much more flexible than static linking, but it comes at the cost of sacrificing some performance. The performance of a dynamically linked program is generally 1%~5% lower than that of a statically linked program. The main reasons are:

  • In position-independent code, every cross-module function call and data access has to go indirectly through the GOT (Global Offset Table);
  • Not every function in a dynamic link library is actually used during a program's run, yet time is still spent loading and relocating those modules.
    In fact, as early as section 3.2, the idea of "loading a function when it is first called" was proposed, which is called "delayed binding" in ELF.

When we call a function in an external module, dynamic linking jumps to it indirectly through the Global Offset Table (GOT). The simplest scheme is to make the corresponding GOT entry initially point to a "stub function" that performs the loading work, overwrites the jump address in the GOT with the address of the now-loaded external function, and then calls that function. This is similar to the idea in section 3.2, except that what used to be a modification of the code segment becomes a modification of the GOT in the data segment, so the code segment can still be shared between processes and the security risks of a writable code segment are avoided.
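Before looking at ELF's concrete implementation, here is a minimal C sketch of this idea under stated assumptions (the names got_entry, stub, and resolve_symbol are made up for illustration; real ELF does all of this at the machine-code level, not in C): the "GOT entry" is just a writable function pointer that initially points to a stub, and the stub resolves the real target on the first call and patches the pointer.

#include <stdio.h>

/* The "external" function we want to bind lazily (a stand-in for a library function). */
static double target_function(double x) { return 2.0 * x; }

/* A stand-in for the dynamic linker's symbol resolution and relocation work. */
static double (*resolve_symbol(void))(double)
{
    printf("resolving symbol on first use...\n");
    return target_function;
}

static double stub(double x);               /* the PLT-like stub function */
static double (*got_entry)(double) = stub;  /* "GOT entry": initially points to the stub */

static double stub(double x)
{
    got_entry = resolve_symbol();  /* patch the data segment (the "GOT"), not the code segment */
    return got_entry(x);           /* finish the call that triggered the binding */
}

int main(void)
{
    printf("%f\n", got_entry(1.0));  /* first call: goes through the stub and resolves */
    printf("%f\n", got_entry(2.0));  /* later calls: jump straight to target_function */
    return 0;
}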

ELF's implementation is similar but adds one more layer of indirection: each external function has a corresponding stub, a call to the function is actually a call to the stub, and inside the stub the jump and runtime binding are performed through the GOT. Such a "stub function" is called a PLT (Procedure Linkage Table) entry.

func@plt:
jmp *(func@GOT)
push index
push moduleID
jmp _dl_runtime_resolve
  1. The linker initializes the GOT entry for `func` to the address of the `push index` instruction below, so on the first call the `jmp *(func@GOT)` simply falls through to the next instruction. From the second call onward, the jump goes directly to the external function through `func@GOT` and returns from there, never executing `push index` and the instructions that follow.
  2. `index` is the subscript of the symbol `func` in the relocation table ".rel.plt"; push it onto the stack.
  3. Push the current module's ID onto the stack (module IDs are assigned by the dynamic linker).
  4. Jump to the dynamic linker's `_dl_runtime_resolve()`, which takes `moduleID` and `index` as parameters, performs symbol resolution and relocation, fills the real address of `func` into `func@GOT`, and then transfers control to `func`.
    In the actual implementation, ELF splits the GOT into two tables, ".got" and ".got.plt": ".got" holds the addresses of global variable references, while ".got.plt" holds the addresses of function references. The first three entries of .got.plt have special meanings:
  • The first item saves the address of the .dynamic segment, which describes the dynamic linking information of this module;
  • The second item saves the module ID of this module, which is initialized by the dynamic linker when loading the module;
  • The third item saves the address of `_dl_runtime_resolve()`, which is initialized by the dynamic linker when loading the module.
    To reduce code duplication, ELF moves the last two instructions of the example above into the first entry of the PLT (PLT0), and stipulates that each PLT entry is 16 bytes long, just enough to hold the three instructions `jmp *(func@GOT)`, `push index`, `jmp PLT0`.

Dynamic link libraries are not set in stone; they need to be updated too. There is a vivid example in "Essential COM": suppose a programmer has implemented an O(1) string search algorithm, with the following header file:

class __declspec(dllexport) StringFind {
    char *p;            // the string
public:
    StringFind(char *p);
    ~StringFind();
    int Find(char *p);  // search for a string and return the position where it is found
    int Length();       // return the length of the string
};

After receiving praise from major vendors, the programmer decided to keep improving: the Length() member function simply calls strlen() internally to get the string length, which is inefficient, so the programmer added a length member to cache the string length, and also added a Substring member function to extract a substring:

class __declspec(dllexport) StringFind {
    char *p;            // the string
    int length;         // length of the string
public:
    StringFind(char *p);
    ~StringFind();
    int Find(char *p);  // search for a string and return the position where it is found
    int Length();       // return the length of the string
    char* Substring(int pos, int len); // return the substring of length len starting at position pos
};

The vendor packaged the new DLL as a patch that simply overwrote the old one, and soon received a flood of complaints. The main reason: a new StringFind object occupies 8 bytes, but the main module of existing programs had only allocated 4 bytes for it, so the length member being accessed does not actually belong to the StringFind object, leading to erroneous data accesses and crashes.

On the Windows platform, the Component Object Model (COM) is a complex mechanism developed by Microsoft to solve these compatibility problems between programs (not just version problems). In .NET, an assembly (made up of several executable files or dynamic link libraries) carries a Manifest file that describes the assembly's name, version number, resources, and dependencies (including DLLs). The Windows system directory contains a WinSxS (Windows Side-by-Side) directory, in which each version of a DLL gets its own subdirectory named after the platform type, compiler, DLL name, public key, and version number, so that multiple versions of a dynamic link library do not conflict. Of course, this requires the dynamic link library and the main program to be built in the same compilation environment. Windows has no common runtime library download repository comparable to a Linux "source" (package repository), so a program often has to ship the runtime libraries it needs when it is released.

In fact, the design goal of DLLs was never "shared objects" as such, but to promote modularization, so that modules can be loosely combined, reused, and upgraded. The runtime loading mechanism lets functional modules exist as plugins, which is the basis of technologies such as ActiveX. Because a DLL's data segment can be shared between processes, DLLs are also one form of inter-process communication on Windows (although third parties can access that shared data segment as well, which poses security risks). In the UNIX tradition, such modularization usually takes the form of one process per module, with processes coordinated through pipes, sockets and other inter-process communication mechanisms. This demands more effort from programmers but provides better encapsulation. Since programs in the Windows tradition are mostly closed-source, internal interfaces are easy to unify, so modules mostly use direct function calls, and communication between server and client also tends to use remote procedure calls (RPC) rather than transparent text protocols.

In Linux, the version issue of shared libraries is solved by a simple method of including the version number in the file name. The naming rule of shared libraries is libname.so.x.y.z:

  • x is the major version number; shared libraries with different major version numbers are incompatible;
  • y is the minor version number; among libraries with the same major version number, a higher minor version is backward compatible with a lower one;
  • z is the release version number; it makes no interface changes, and shared libraries with the same major and minor version numbers are fully compatible.
    So how does the dynamic linker know which version of a shared library a program needs? Linux records library dependencies with the SO-NAME mechanism: the SO-NAME is libname.so.x, keeping only the major version number. Exploiting the property that, of two shared libraries with the same SO-NAME, the one with the higher minor version is compatible with the lower one, the system creates a symbolic link named after the SO-NAME for each shared library, pointing to the library with the highest minor version among those sharing the same major version. In this way, modules that use shared libraries only need to specify the major version (the SO-NAME) when compiling and linking, without pinning the full version number; outdated redundant shared libraries can then be deleted promptly to save disk space.

The dependencies between software packages in Linux are largely dependencies between shared libraries. Since shared libraries are usually open source or freely downloadable, the package manager automatically fetches and installs the required shared libraries from the "source" (package repository), so programs do not have to carry a heavy load of shared libraries with them. When a new shared library is installed into the system (i.e., placed in /lib, /usr/lib, /usr/local/lib, or another directory listed in /etc/ld.so.conf), the ldconfig tool must be run to traverse the shared library directories, create or update the SO-NAME symlinks so that they point to the newest shared libraries, and refresh the SO-NAME cache (/etc/ld.so.cache) that speeds up shared library lookup.

Is the symbol version problem fully solved? If the dynamic linker only checks the major version number at link time, then when a program depends on a shared library with a higher minor version number, the dynamic linker may fail to detect the version mismatch, which brings back the problem described at the beginning of this section. Moreover, since "libraries with the same major version number must keep minor versions backward compatible", any interface change that is not backward compatible forces a major version bump. Linux therefore adopts a finer-grained versioning mechanism: in executable files and shared libraries, every imported or exported symbol carries a set of major and minor version numbers, and symbols with the same name can exist in multiple versions. In this way, a version 1.2 shared library can contain both the 1.2 and 1.1 versions of a library function, and the dynamic linker tries to find the matching version for each function reference in the executable; even if the 1.2 and 1.1 versions of that function are incompatible, programs built against either version can still be linked and run correctly.

GCC exposes the assembler's .symver directive (via inline asm) to specify symbol versions. For example, to change the interface of strstr without bumping the major version number:

asm(".symver old_strstr, strstr@VERS_1.1");
asm(".symver new_strstr, strstr@VERS_1.2");

int old_strstr(char *haystack, char *needle); // 返回needle在haystack中第一次出现的offset,未找到返回-1
int new_strstr(char *haystack, char *needle, bool direction); // direction用于指定从前向后查找还是从后向前查找
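A hedged sketch of how such declarations might be wired up inside the shared library itself (the version node names VERS_1.1/VERS_1.2, the file names, and the placeholder implementations are assumptions for illustration, not from the original notes): the .symver directive binds each C function to one version of the exported name strstr, a double "@@" marks the default version that newly linked programs bind to, and the version nodes themselves must be declared in a linker version script passed via --version-script.

/* strstr_versions.c -- hypothetical sketch, not the library's real code.
 * Build (assumed): gcc -shared -fPIC strstr_versions.c \
 *                      -Wl,--version-script=vers.map -o libmystr.so
 * where vers.map (assumed) declares the version nodes:
 *   VERS_1.1 { global: strstr; local: *; };
 *   VERS_1.2 { global: strstr; } VERS_1.1;
 */
#include <stdbool.h>
#include <string.h>

asm(".symver old_strstr, strstr@VERS_1.1");   /* old binaries keep binding to this version */
asm(".symver new_strstr, strstr@@VERS_1.2");  /* "@@": default version for new links */

/* Placeholder two-argument implementation: offset of needle in haystack, or -1. */
int old_strstr(char *haystack, char *needle)
{
    size_t nlen = strlen(needle);
    for (size_t i = 0; haystack[i] != '\0'; i++)
        if (strncmp(haystack + i, needle, nlen) == 0)
            return (int)i;
    return nlen == 0 ? 0 : -1;
}

/* Placeholder three-argument implementation; the backward search is omitted. */
int new_strstr(char *haystack, char *needle, bool direction)
{
    (void)direction;
    return old_strstr(haystack, needle);
}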

3.8 Data Structures in Object Files

Based on the previous discussions on compilation, static linking, and dynamic linking, the classification of object files is actually quite clear:

  • Relocatable files (.o) that contain code and data and can be used for linking
  • Executable files (a.out) that contain programs that can be directly executed
  • Shared object files (.so) for use by the dynamic linker
  • Less obvious: core dump files, which record the contents of a process's address space and other information when the process terminates unexpectedly.
    Both Windows' PE (Portable Executable) format and Linux's ELF (Executable and Linkable Format) are variants of the COFF (Common Object File Format) format.

Below are some common ELF segments listed in alphabetical order (I haven’t verified them one by one, especially the parts related to C++. Please correct me if there are any mistakes):

.bss Uninitialized data (global variables)
.comment Compiler version information
.ctors Global constructor pointers
.data Initialized data (global variables, static variables)
.data.rel.ro Read-only data, similar to .rodata, but it will be rewritten during relocation, then set to read-only
.debug Debugging information, using gcc's -g or -ggdb parameters
.dtors Global destructor pointers
.dynamic Dynamic linking information, storing the address of the dynamic linking symbol table, string table address and size, hash table address, shared object's SO-NAME, search path, initialization code address, end code address, dependent shared object file name, dynamic linking relocation table address, relocation entry count, etc.
.dynstr Symbol names (string table) of dynamic linking symbols
.dynsym Symbol table related to dynamic linking. Note that .symtab often saves all symbols, while .dynsym only saves symbols needed for dynamic linking, and does not save symbols used only internally in the module.
.eh_frame Related to C++ exception handling
.eh_frame_hdr Related to C++ exception handling
.fini Code executed when the program exits, equivalent to the "destructor" of main()
.fini_array Function pointers that need to be executed when the program or shared object exits
.gnu.version Dynamic linking symbol version, each symbol in .dynsym corresponds to an item (the sequence number of the required version in .gnu.version_d)
.gnu.version_d Definitions of dynamic linking symbol versions, each version's flag, sequence number, shared library name, major and minor version numbers
.gnu.version_r Requirements of dynamic linking symbol versions, dependent shared library name and version sequence number
.got Global Offset Table (used for indirect jumps or references in dynamic linking)
.got.plt The part of the Global Offset Table that holds the addresses of function references; the PLT entries jump indirectly through it (the PLT "stub functions" themselves live in the .plt section)
.hash Hash table of the symbol table, used to speed up symbol lookup
.init Initialization code before main() is executed, equivalent to the "constructor" of main()
.init_array Function pointers that need to be executed when the program or shared object is initialized
.interp File path of the dynamic linker
.line Line number information for debugging, using gcc's -g or -ggdb parameters
.note Additional platform-related information added by the compiler, linker, and operating system
.note.ABI-tag Specifies the program's ABI
.preinit_array Function pointers executed before the initialization phase, executed before .init_array
.rel.data In static linking files, the relocation table of the data segment
.rel.dyn In dynamic linking files, the relocation table for data references (.got, .data)
.rel.plt In dynamic linking files, the relocation table for function references (.got.plt)
.rel.text In static linking files, the relocation table of the code segment
.rodata Read-only data (constants, string constants)
.shstrtab String table that saves the names of each segment
.strtab String table, usually the strings corresponding to the symbol names in the symbol table
.symtab Symbol table, symbol information needed for static linking
.tbss Uninitialized data for each thread (.bss is shared by all threads)
.tdata Initialized data for each thread (.data is shared by all threads)
.text Code segment (why not called .code?)
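To make the table a bit more concrete, here is a tiny hedged example of where typical C definitions usually end up (placement can vary with compiler, options, and optimization level; it can be checked with readelf -S or objdump -h):

#include <stdio.h>

int bss_var;                    /* uninitialized global        -> .bss    */
int data_var = 42;              /* initialized global          -> .data   */
const char ro_msg[] = "hello";  /* read-only constant data     -> .rodata */
__thread int tls_counter = 1;   /* initialized per-thread data -> .tdata  */

int main(void)                  /* machine code                -> .text   */
{
    static int local_static;    /* uninitialized static        -> .bss    */
    printf("%d %d %s %d %d\n", bss_var, data_var, ro_msg, tls_counter, local_static);
    return 0;
}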
(End of the article)

