[Repost] The State of Optimization in GCC



Main Text

25.1 Design Overview

Link time optimization is implemented as a GCC front end for a bytecode representation of GIMPLE that is emitted in special sections of .o files. Currently, LTO support is enabled in most ELF-based systems, as well as darwin, cygwin and mingw systems.

By default, object files generated with LTO support contain only GIMPLE bytecode. Such objects are called “slim”, and they require that tools like ar and nm understand symbol tables of LTO sections. For most targets these tools have been extended to use the plugin infrastructure, so GCC can support “slim” objects consisting of the intermediate code alone.

GIMPLE bytecode can also be saved alongside the final object code if the -ffat-lto-objects option is passed, or if no plugin support is detected for ar and nm when GCC is configured. This makes object files generated with LTO support larger than regular object files. The “fat” object format makes it possible to ship one set of fat objects that can be used both for development and for the production of optimized builds. A perhaps surprising side effect of this feature is that any mistake in the toolchain (e.g. an older libtool calling ld directly) leads to the LTO information not being used. This is both an advantage, as the system is more robust, and a disadvantage, as the user is not informed that the optimization has been disabled.
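The difference between slim and fat objects comes down to one compile-time flag. The following is a minimal sketch that only builds the command lines and does not run them; the helper name `compile_command`, the file name `foo.c`, and the choice of `-O2` are assumptions made for illustration.

```python
import shlex

def compile_command(source, fat=False, compiler="gcc"):
    """Build (but do not run) a compile command that emits an LTO object."""
    obj = source.rsplit(".", 1)[0] + ".o"
    cmd = [compiler, "-O2", "-flto"]
    # -ffat-lto-objects keeps GIMPLE bytecode alongside the final object code;
    # -fno-fat-lto-objects asks for a slim object explicitly.
    cmd.append("-ffat-lto-objects" if fat else "-fno-fat-lto-objects")
    cmd += ["-c", source, "-o", obj]
    return cmd

slim = compile_command("foo.c")
fat = compile_command("foo.c", fat=True)
print(shlex.join(slim))
print(shlex.join(fat))
```

If slim objects are later placed in a static archive, the archiver needs plugin support; GCC installs wrappers such as gcc-ar and gcc-nm for this purpose.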

At the highest level, LTO splits the compiler in two. The first half (the “writer”) produces a streaming representation of all the internal data structures needed to optimize and generate code. This includes declarations, types, the callgraph and the GIMPLE representation of function bodies.

When -flto is given during compilation of a source file, the pass manager executes all the passes in all_lto_gen_passes. Currently, this phase is composed of two IPA passes:

  • pass_ipa_lto_gimple_out: This pass executes the function lto_output in lto-streamer-out.cc, which traverses the call graph, encoding every reachable declaration, type and function. This generates an in-memory representation of all the file sections described below.
  • pass_ipa_lto_finish_out: This pass executes the function produce_asm_for_decls in lto-streamer-out.cc, which takes the memory image built in the previous pass and encodes it in the corresponding ELF file sections.
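The two-pass structure of the writer can be illustrated with a toy model. This is not GCC's streamer: the call graph, the section names `decls` and `callgraph`, and the JSON encoding are all hypothetical stand-ins. The first function builds an in-memory image of everything reachable, and the second turns that image into bytes.

```python
import json

# Hypothetical toy call graph: function name -> list of callees.
CALLGRAPH = {
    "main": ["parse", "run"],
    "parse": ["helper"],
    "run": ["helper"],
    "helper": [],
    "unused": ["helper"],
}

def build_sections(callgraph, roots):
    """Pass 1 (toy): walk the call graph and build an in-memory image."""
    seen, stack = set(), list(roots)
    while stack:
        fn = stack.pop()
        if fn in seen:
            continue
        seen.add(fn)
        stack.extend(callgraph[fn])
    reachable = sorted(seen)
    return {
        "decls": reachable,
        "callgraph": {fn: callgraph[fn] for fn in reachable},
    }

def emit_sections(image):
    """Pass 2 (toy): encode each in-memory section as bytes."""
    return {name: json.dumps(body).encode() for name, body in image.items()}

image = build_sections(CALLGRAPH, roots=["main"])
sections = emit_sections(image)
print(image["decls"])
```

Keeping the traversal separate from the encoding mirrors the split described above: one pass decides what goes into the image, the other decides how it is laid out in the file.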

The second half of LTO support is the “reader”. This is implemented as the GCC front end lto1 in lto/lto.cc. When collect2 detects a link set of .o/.a files with LTO information and -flto is enabled, it invokes lto1, which reads the set of files and aggregates them into a single translation unit for optimization. The main entry point for the reader is lto/lto.cc:lto_main.
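To see what aggregating a set of files into a single translation unit buys, consider a toy model in which each object file carries a list of defined functions and a list of call edges. The data layout and the function `aggregate` are hypothetical and far simpler than what lto1 actually reads.

```python
def aggregate(units):
    """Merge per-file summaries (toy) into one program-wide unit."""
    defined, edges = set(), []
    for unit in units:
        defined |= set(unit["defines"])
        edges += unit["calls"]
    # Calls whose target has no body anywhere in the link set stay opaque.
    unresolved = sorted({callee for _, callee in edges if callee not in defined})
    return {"defines": sorted(defined), "calls": edges, "unresolved": unresolved}

# Two hypothetical object files: a.o calls into b.o and into the C library.
a = {"defines": ["main"], "calls": [("main", "helper"), ("main", "puts")]}
b = {"defines": ["helper"], "calls": []}

program = aggregate([a, b])
print(program["unresolved"])
```

After the merge, the call from main to helper has a visible body on both ends, which is what makes cross-file optimization possible; only the call to puts remains external in this example.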

25.1.1 LTO modes of operation

One of the main goals of the GCC link-time infrastructure was to allow effective compilation of large programs. For this reason GCC implements two link-time compilation modes.

  1. LTO mode, in which the whole program is read into the compiler at link-time and optimized much as if it were a single source-level compilation unit.
  2. WHOPR or partitioned mode, designed to utilize multiple CPUs and/or a distributed compilation environment to quickly link large applications. WHOPR stands for WHOle Program optimizeR (not to be confused with the semantics of -fwhole-program). It partitions the aggregated callgraph from many different .o files and distributes the compilation of the sub-graphs to different CPUs.

    Note that distributed compilation is not implemented yet, but since the parallelism is facilitated via generating a Makefile, it would be easy to implement.

WHOPR splits LTO into three main stages:

  1. Local generation (LGEN): This stage executes in parallel. Every file in the program is compiled into the intermediate language and packaged together with the local call-graph and summary information. This stage is the same for both the LTO and WHOPR compilation modes.
  2. Whole Program Analysis (WPA): WPA is performed sequentially. The global call-graph is generated, and a global analysis procedure makes transformation decisions. The global call-graph is partitioned to facilitate parallel optimization during phase 3. The results of the WPA stage are stored in new object files, which contain the partitions of the program expressed in the intermediate language together with the optimization decisions.
  3. Local transformations (LTRANS): This stage executes in parallel. All the decisions made during phase 2 are implemented locally in each partitioned object file, and the final object code is generated. Optimizations which cannot be decided efficiently during phase 2 may be performed on the local call-graph partitions.
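The shape of the three stages (parallel, then sequential, then parallel again) can be sketched with a thread pool. Everything here is a toy: the functions `lgen`, `wpa`, and `ltrans` are hypothetical placeholders that pass strings around instead of compiling anything, and the round-robin assignment stands in for real partitioning.

```python
from concurrent.futures import ThreadPoolExecutor

def lgen(source):
    """LGEN (toy): compile one file into IL plus a local summary."""
    return {"name": source, "summary": "summary of " + source}

def wpa(units, n_partitions):
    """WPA (toy): sequential whole-program step that assigns partitions."""
    partitions = [[] for _ in range(n_partitions)]
    for i, unit in enumerate(sorted(units, key=lambda u: u["name"])):
        partitions[i % n_partitions].append(unit)
    return partitions

def ltrans(partition):
    """LTRANS (toy): produce 'object code' for one partition."""
    return "+".join(unit["name"] for unit in partition)

sources = ["a.c", "b.c", "c.c", "d.c"]
with ThreadPoolExecutor() as pool:
    units = list(pool.map(lgen, sources))         # stage 1: parallel
    partitions = wpa(units, n_partitions=2)       # stage 2: sequential
    objects = list(pool.map(ltrans, partitions))  # stage 3: parallel
print(objects)
```

Only the middle stage needs a view of the whole program, so it is the only one that cannot be spread across workers in this model.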

WHOPR can be seen as an extension of the usual LTO mode of compilation. In LTO, WPA and LTRANS are executed within a single execution of the compiler, after the whole program has been read into memory.

When compiling in WHOPR mode, the callgraph is partitioned during the WPA stage. The whole program is split into a given number of partitions of roughly the same size. The compiler tries to minimize the number of references which cross partition boundaries. The main advantage of WHOPR is to allow the parallel execution of LTRANS stages, which are the most time-consuming part of the compilation process. Additionally, it avoids the need to load the whole program into memory.
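The two goals stated here, partitions of roughly equal size and few references crossing partition boundaries, pull against each other. The greedy heuristic below is a sketch of that trade-off only; it is not the algorithm GCC uses, and the function names, sizes, and edges are made up for illustration.

```python
def partition(sizes, edges, n):
    """Greedy toy partitioner: place each function next to its already-placed
    neighbours where possible, subject to a per-partition size cap."""
    cap = -(-sum(sizes.values()) // n)  # ceiling of the average partition size
    parts = [set() for _ in range(n)]
    load = [0] * n
    neighbours = {fn: set() for fn in sizes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    for fn in sorted(sizes, key=sizes.get, reverse=True):
        fits = [i for i in range(n) if load[i] + sizes[fn] <= cap]
        candidates = fits or [load.index(min(load))]
        # Prefer the partition sharing the most neighbours, then the emptiest.
        best = max(candidates, key=lambda i: (len(neighbours[fn] & parts[i]), -load[i]))
        parts[best].add(fn)
        load[best] += sizes[fn]
    return parts

def crossing(parts, edges):
    """Count references whose endpoints lie in different partitions."""
    where = {fn: i for i, part in enumerate(parts) for fn in part}
    return sum(1 for a, b in edges if where[a] != where[b])

sizes = {"a": 4, "b": 4, "c": 4, "d": 4}
edges = [("a", "b"), ("c", "d"), ("b", "c")]
parts = partition(sizes, edges, n=2)
print(parts, crossing(parts, edges))
```

In this example the size cap forces the chain a, b, c, d to be cut once, so exactly one reference ends up crossing a partition boundary.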
