diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2024-11-30 13:41:50 -0800 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2024-11-30 13:41:50 -0800 |
commit | 6a34dfa15d6edf7e78b8118d862d2db0889cf669 (patch) | |
tree | f5bc36ccaa5251fa660cf2efdb00d6ffa3ace36b /Documentation | |
parent | 0e287d31b62bb53ad81d5e59778384a40f8b6f56 (diff) | |
parent | e6064da6461f989a357f2e280d7f8d4155267c4c (diff) |
Merge tag 'kbuild-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Add generic support for built-in boot DTB files
- Enable TAB cycling for dialog buttons in nconfig
- Fix issues in streamline_config.pl
- Refactor Kconfig
- Add support for Clang's AutoFDO (Automatic Feedback-Directed
Optimization)
- Add support for Clang's Propeller, a profile-guided optimization.
- Change the working directory to the external module directory for M=
builds
- Support building external modules in a separate output directory
- Enable objtool for *.mod.o and additional kernel objects
- Use lz4 instead of deprecated lz4c
- Work around a performance issue with "git describe"
- Refactor modpost
* tag 'kbuild-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (85 commits)
kbuild: rename .tmp_vmlinux.kallsyms0.syms to .tmp_vmlinux0.syms
gitignore: Don't ignore 'tags' directory
kbuild: add dependency from vmlinux to resolve_btfids
modpost: replace tdb_hash() with hash_str()
kbuild: deb-pkg: add python3:native to build dependency
genksyms: reduce indentation in export_symbol()
modpost: improve error messages in device_id_check()
modpost: rename alias symbol for MODULE_DEVICE_TABLE()
modpost: rename variables in handle_moddevtable()
modpost: move strstarts() to modpost.h
modpost: convert do_usb_table() to a generic handler
modpost: convert do_of_table() to a generic handler
modpost: convert do_pnp_device_entry() to a generic handler
modpost: convert do_pnp_card_entries() to a generic handler
modpost: call module_alias_printf() from all do_*_entry() functions
modpost: pass (struct module *) to do_*_entry() functions
modpost: remove DEF_FIELD_ADDR_VAR() macro
modpost: deduplicate MODULE_ALIAS() for all drivers
modpost: introduce module_alias_printf() helper
modpost: remove unnecessary check in do_acpi_entry()
...
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/dev-tools/autofdo.rst | 168 | ||||
-rw-r--r-- | Documentation/dev-tools/coccinelle.rst | 22 | ||||
-rw-r--r-- | Documentation/dev-tools/index.rst | 2 | ||||
-rw-r--r-- | Documentation/dev-tools/propeller.rst | 162 | ||||
-rw-r--r-- | Documentation/kbuild/kbuild.rst | 8 | ||||
-rw-r--r-- | Documentation/kbuild/kconfig-language.rst | 4 | ||||
-rw-r--r-- | Documentation/kbuild/makefiles.rst | 14 | ||||
-rw-r--r-- | Documentation/kbuild/modules.rst | 29 |
8 files changed, 390 insertions, 19 deletions
diff --git a/Documentation/dev-tools/autofdo.rst b/Documentation/dev-tools/autofdo.rst new file mode 100644 index 000000000000..1f0a451e9ccd --- /dev/null +++ b/Documentation/dev-tools/autofdo.rst @@ -0,0 +1,168 @@ +.. SPDX-License-Identifier: GPL-2.0 + +=================================== +Using AutoFDO with the Linux kernel +=================================== + +This enables AutoFDO build support for the kernel when using +the Clang compiler. AutoFDO (Auto-Feedback-Directed Optimization) +is a type of profile-guided optimization (PGO) used to enhance the +performance of binary executables. It gathers information about the +frequency of execution of various code paths within a binary using +hardware sampling. This data is then used to guide the compiler's +optimization decisions, resulting in a more efficient binary. AutoFDO +is a powerful optimization technique, and data indicates that it can +significantly improve kernel performance. It's especially beneficial +for workloads affected by front-end stalls. + +For AutoFDO builds, unlike non-FDO builds, the user must supply a +profile. Acquiring an AutoFDO profile can be done in several ways. +AutoFDO profiles are created by converting hardware sampling using +the "perf" tool. It is crucial that the workload used to create these +perf files is representative; they must exhibit runtime +characteristics similar to the workloads that are intended to be +optimized. Failure to do so will result in the compiler optimizing +for the wrong objective. + +The AutoFDO profile often encapsulates the program's behavior. If the +performance-critical codes are architecture-independent, the profile +can be applied across platforms to achieve performance gains. For +instance, using the profile generated on Intel architecture to build +a kernel for AMD architecture can also yield performance improvements. + +There are two methods for acquiring a representative profile: +(1) Sample real workloads using a production environment. +(2) Generate the profile using a representative load test. +When enabling the AutoFDO build configuration without providing an +AutoFDO profile, the compiler only modifies the dwarf information in +the kernel without impacting runtime performance. It's advisable to +use a kernel binary built with the same AutoFDO configuration to +collect the perf profile. While it's possible to use a kernel built +with different options, it may result in inferior performance. + +One can collect profiles using AutoFDO build for the previous kernel. +AutoFDO employs relative line numbers to match the profiles, offering +some tolerance for source changes. This mode is commonly used in a +production environment for profile collection. + +In a profile collection based on a load test, the AutoFDO collection +process consists of the following steps: + +#. Initial build: The kernel is built with AutoFDO options + without a profile. + +#. Profiling: The above kernel is then run with a representative + workload to gather execution frequency data. This data is + collected using hardware sampling, via perf. AutoFDO is most + effective on platforms supporting advanced PMU features like + LBR on Intel machines. + +#. AutoFDO profile generation: Perf output file is converted to + the AutoFDO profile via offline tools. + +The support requires a Clang compiler LLVM 17 or later. + +Preparation +=========== + +Configure the kernel with:: + + CONFIG_AUTOFDO_CLANG=y + +Customization +============= + +The default CONFIG_AUTOFDO_CLANG setting covers kernel space objects for +AutoFDO builds. One can, however, enable or disable AutoFDO build for +individual files and directories by adding a line similar to the following +to the respective kernel Makefile: + +- For enabling a single file (e.g. foo.o) :: + + AUTOFDO_PROFILE_foo.o := y + +- For enabling all files in one directory :: + + AUTOFDO_PROFILE := y + +- For disabling one file :: + + AUTOFDO_PROFILE_foo.o := n + +- For disabling all files in one directory :: + + AUTOFDO_PROFILE := n + +Workflow +======== + +Here is an example workflow for AutoFDO kernel: + +1) Build the kernel on the host machine with LLVM enabled, + for example, :: + + $ make menuconfig LLVM=1 + + Turn on AutoFDO build config:: + + CONFIG_AUTOFDO_CLANG=y + + With a configuration that with LLVM enabled, use the following command:: + + $ scripts/config -e AUTOFDO_CLANG + + After getting the config, build with :: + + $ make LLVM=1 + +2) Install the kernel on the test machine. + +3) Run the load tests. The '-c' option in perf specifies the sample + event period. We suggest using a suitable prime number, like 500009, + for this purpose. + + - For Intel platforms:: + + $ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c <count> -o <perf_file> -- <loadtest> + + - For AMD platforms: + + The supported systems are: Zen3 with BRS, or Zen4 with amd_lbr_v2. To check, + + For Zen3:: + + $ cat proc/cpuinfo | grep " brs" + + For Zen4:: + + $ cat proc/cpuinfo | grep amd_lbr_v2 + + The following command generated the perf data file:: + + $ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a -N -b -c <count> -o <perf_file> -- <loadtest> + +4) (Optional) Download the raw perf file to the host machine. + +5) To generate an AutoFDO profile, two offline tools are available: + create_llvm_prof and llvm_profgen. The create_llvm_prof tool is part + of the AutoFDO project and can be found on GitHub + (https://github.com/google/autofdo), version v0.30.1 or later. + The llvm_profgen tool is included in the LLVM compiler itself. It's + important to note that the version of llvm_profgen doesn't need to match + the version of Clang. It needs to be the LLVM 19 release of Clang + or later, or just from the LLVM trunk. :: + + $ llvm-profgen --kernel --binary=<vmlinux> --perfdata=<perf_file> -o <profile_file> + + or :: + + $ create_llvm_prof --binary=<vmlinux> --profile=<perf_file> --format=extbinary --out=<profile_file> + + Note that multiple AutoFDO profile files can be merged into one via:: + + $ llvm-profdata merge -o <profile_file> <profile_1> <profile_2> ... <profile_n> + +6) Rebuild the kernel using the AutoFDO profile file with the same config as step 1, + (Note CONFIG_AUTOFDO_CLANG needs to be enabled):: + + $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<profile_file> diff --git a/Documentation/dev-tools/coccinelle.rst b/Documentation/dev-tools/coccinelle.rst index 535ce126fb4f..6e70a1e9a3c0 100644 --- a/Documentation/dev-tools/coccinelle.rst +++ b/Documentation/dev-tools/coccinelle.rst @@ -250,25 +250,17 @@ variables for .cocciconfig is as follows: - Your directory from which spatch is called is processed next - The directory provided with the ``--dir`` option is processed last, if used -Since coccicheck runs through make, it naturally runs from the kernel -proper dir; as such the second rule above would be implied for picking up a -.cocciconfig when using ``make coccicheck``. - ``make coccicheck`` also supports using M= targets. If you do not supply any M= target, it is assumed you want to target the entire kernel. The kernel coccicheck script has:: - if [ "$KBUILD_EXTMOD" = "" ] ; then - OPTIONS="--dir $srctree $COCCIINCLUDE" - else - OPTIONS="--dir $KBUILD_EXTMOD $COCCIINCLUDE" - fi - -KBUILD_EXTMOD is set when an explicit target with M= is used. For both cases -the spatch ``--dir`` argument is used, as such third rule applies when whether -M= is used or not, and when M= is used the target directory can have its own -.cocciconfig file. When M= is not passed as an argument to coccicheck the -target directory is the same as the directory from where spatch was called. + OPTIONS="--dir $srcroot $COCCIINCLUDE" + +Here, $srcroot refers to the source directory of the target: it points to the +external module's source directory when M= used, and otherwise, to the kernel +source directory. The third rule ensures the spatch reads the .cocciconfig from +the target directory, allowing external modules to have their own .cocciconfig +file. If not using the kernel's coccicheck target, keep the above precedence order logic of .cocciconfig reading. If using the kernel's coccicheck target, diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst index 53d4d124f9c5..3c0ac08b2709 100644 --- a/Documentation/dev-tools/index.rst +++ b/Documentation/dev-tools/index.rst @@ -34,6 +34,8 @@ Documentation/dev-tools/testing-overview.rst ktap checkuapi gpio-sloppy-logic-analyzer + autofdo + propeller .. only:: subproject and html diff --git a/Documentation/dev-tools/propeller.rst b/Documentation/dev-tools/propeller.rst new file mode 100644 index 000000000000..92195958e3db --- /dev/null +++ b/Documentation/dev-tools/propeller.rst @@ -0,0 +1,162 @@ +.. SPDX-License-Identifier: GPL-2.0 + +===================================== +Using Propeller with the Linux kernel +===================================== + +This enables Propeller build support for the kernel when using Clang +compiler. Propeller is a profile-guided optimization (PGO) method used +to optimize binary executables. Like AutoFDO, it utilizes hardware +sampling to gather information about the frequency of execution of +different code paths within a binary. Unlike AutoFDO, this information +is then used right before linking phase to optimize (among others) +block layout within and across functions. + +A few important notes about adopting Propeller optimization: + +#. Although it can be used as a standalone optimization step, it is + strongly recommended to apply Propeller on top of AutoFDO, + AutoFDO+ThinLTO or Instrument FDO. The rest of this document + assumes this paradigm. + +#. Propeller uses another round of profiling on top of + AutoFDO/AutoFDO+ThinLTO/iFDO. The whole build process involves + "build-afdo - train-afdo - build-propeller - train-propeller - + build-optimized". + +#. Propeller requires LLVM 19 release or later for Clang/Clang++ + and the linker(ld.lld). + +#. In addition to LLVM toolchain, Propeller requires a profiling + conversion tool: https://github.com/google/autofdo with a release + after v0.30.1: https://github.com/google/autofdo/releases/tag/v0.30.1. + +The Propeller optimization process involves the following steps: + +#. Initial building: Build the AutoFDO or AutoFDO+ThinLTO binary as + you would normally do, but with a set of compile-time / link-time + flags, so that a special metadata section is created within the + kernel binary. The special section is only intend to be used by the + profiling tool, it is not part of the runtime image, nor does it + change kernel run time text sections. + +#. Profiling: The above kernel is then run with a representative + workload to gather execution frequency data. This data is collected + using hardware sampling, via perf. Propeller is most effective on + platforms supporting advanced PMU features like LBR on Intel + machines. This step is the same as profiling the kernel for AutoFDO + (the exact perf parameters can be different). + +#. Propeller profile generation: Perf output file is converted to a + pair of Propeller profiles via an offline tool. + +#. Optimized build: Build the AutoFDO or AutoFDO+ThinLTO optimized + binary as you would normally do, but with a compile-time / + link-time flag to pick up the Propeller compile time and link time + profiles. This build step uses 3 profiles - the AutoFDO profile, + the Propeller compile-time profile and the Propeller link-time + profile. + +#. Deployment: The optimized kernel binary is deployed and used + in production environments, providing improved performance + and reduced latency. + +Preparation +=========== + +Configure the kernel with:: + + CONFIG_AUTOFDO_CLANG=y + CONFIG_PROPELLER_CLANG=y + +Customization +============= + +The default CONFIG_PROPELLER_CLANG setting covers kernel space objects +for Propeller builds. One can, however, enable or disable Propeller build +for individual files and directories by adding a line similar to the +following to the respective kernel Makefile: + +- For enabling a single file (e.g. foo.o):: + + PROPELLER_PROFILE_foo.o := y + +- For enabling all files in one directory:: + + PROPELLER_PROFILE := y + +- For disabling one file:: + + PROPELLER_PROFILE_foo.o := n + +- For disabling all files in one directory:: + + PROPELLER__PROFILE := n + + +Workflow +======== + +Here is an example workflow for building an AutoFDO+Propeller kernel: + +1) Assuming an AutoFDO profile is already collected following + instructions in the AutoFDO document, build the kernel on the host + machine, with AutoFDO and Propeller build configs :: + + CONFIG_AUTOFDO_CLANG=y + CONFIG_PROPELLER_CLANG=y + + and :: + + $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<autofdo-profile-name> + +2) Install the kernel on the test machine. + +3) Run the load tests. The '-c' option in perf specifies the sample + event period. We suggest using a suitable prime number, like 500009, + for this purpose. + + - For Intel platforms:: + + $ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c <count> -o <perf_file> -- <loadtest> + + - For AMD platforms:: + + $ perf record --pfm-event RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a -N -b -c <count> -o <perf_file> -- <loadtest> + + Note you can repeat the above steps to collect multiple <perf_file>s. + +4) (Optional) Download the raw perf file(s) to the host machine. + +5) Use the create_llvm_prof tool (https://github.com/google/autofdo) to + generate Propeller profile. :: + + $ create_llvm_prof --binary=<vmlinux> --profile=<perf_file> + --format=propeller --propeller_output_module_name + --out=<propeller_profile_prefix>_cc_profile.txt + --propeller_symorder=<propeller_profile_prefix>_ld_profile.txt + + "<propeller_profile_prefix>" can be something like "/home/user/dir/any_string". + + This command generates a pair of Propeller profiles: + "<propeller_profile_prefix>_cc_profile.txt" and + "<propeller_profile_prefix>_ld_profile.txt". + + If there are more than 1 perf_file collected in the previous step, + you can create a temp list file "<perf_file_list>" with each line + containing one perf file name and run:: + + $ create_llvm_prof --binary=<vmlinux> --profile=@<perf_file_list> + --format=propeller --propeller_output_module_name + --out=<propeller_profile_prefix>_cc_profile.txt + --propeller_symorder=<propeller_profile_prefix>_ld_profile.txt + +6) Rebuild the kernel using the AutoFDO and Propeller + profiles. :: + + CONFIG_AUTOFDO_CLANG=y + CONFIG_PROPELLER_CLANG=y + + and :: + + $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<profile_file> CLANG_PROPELLER_PROFILE_PREFIX=<propeller_profile_prefix> diff --git a/Documentation/kbuild/kbuild.rst b/Documentation/kbuild/kbuild.rst index 1796b3eba37b..17c9f920f03d 100644 --- a/Documentation/kbuild/kbuild.rst +++ b/Documentation/kbuild/kbuild.rst @@ -137,12 +137,18 @@ Specify the output directory when building the kernel. This variable can also be used to point to the kernel output directory when building external modules against a pre-built kernel in a separate build directory. Please note that this does NOT specify the output directory for the -external modules themselves. +external modules themselves. (Use KBUILD_EXTMOD_OUTPUT for that purpose.) The output directory can also be specified using "O=...". Setting "O=..." takes precedence over KBUILD_OUTPUT. +KBUILD_EXTMOD_OUTPUT +-------------------- +Specify the output directory for external modules. + +Setting "MO=..." takes precedence over KBUILD_EXTMOD_OUTPUT. + KBUILD_EXTRA_WARN ----------------- Specify the extra build checks. The same value can be assigned by passing diff --git a/Documentation/kbuild/kconfig-language.rst b/Documentation/kbuild/kconfig-language.rst index 43037be96a16..2619fdf56e68 100644 --- a/Documentation/kbuild/kconfig-language.rst +++ b/Documentation/kbuild/kconfig-language.rst @@ -412,8 +412,8 @@ choices:: <choice block> "endchoice" -This defines a choice group and accepts any of the above attributes as -options. +This defines a choice group and accepts "prompt", "default", "depends on", and +"help" attributes as options. A choice only allows a single config entry to be selected. diff --git a/Documentation/kbuild/makefiles.rst b/Documentation/kbuild/makefiles.rst index 7964e0c245ae..d36519f194dc 100644 --- a/Documentation/kbuild/makefiles.rst +++ b/Documentation/kbuild/makefiles.rst @@ -449,6 +449,20 @@ $(obj) to prerequisites are referenced with $(src) (because they are not generated files). +$(srcroot) + $(srcroot) refers to the root of the source you are building, which can be + either the kernel source or the external modules source, depending on whether + KBUILD_EXTMOD is set. This can be either a relative or an absolute path, but + if KBUILD_ABS_SRCTREE=1 is set, it is always an absolute path. + +$(srctree) + $(srctree) refers to the root of the kernel source tree. When building the + kernel, this is the same as $(srcroot). + +$(objtree) + $(objtree) refers to the root of the kernel object tree. It is ``.`` when + building the kernel, but it is different when building external modules. + $(kecho) echoing information to user in a rule is often a good practice but when execution ``make -s`` one does not expect to see any output diff --git a/Documentation/kbuild/modules.rst b/Documentation/kbuild/modules.rst index cd5a54d91e6d..101de236cd0c 100644 --- a/Documentation/kbuild/modules.rst +++ b/Documentation/kbuild/modules.rst @@ -59,6 +59,12 @@ Command Syntax $ make -C /lib/modules/`uname -r`/build M=$PWD modules_install + Starting from Linux 6.13, you can use the -f option instead of -C. This + will avoid unnecessary change of the working directory. The external + module will be output to the directory where you invoke make. + + $ make -f /lib/modules/`uname -r`/build/Makefile M=$PWD + Options ------- @@ -66,7 +72,10 @@ Options of the kernel output directory if the kernel was built in a separate build directory.) - make -C $KDIR M=$PWD + You can optionally pass MO= option if you want to build the modules in + a separate directory. + + make -C $KDIR M=$PWD [MO=$BUILD_DIR] -C $KDIR The directory that contains the kernel and relevant build @@ -80,6 +89,9 @@ Options directory where the external module (kbuild file) is located. + MO=$BUILD_DIR + Specifies a separate output directory for the external module. + Targets ------- @@ -215,6 +227,21 @@ Separate Kbuild File and Makefile consisting of several hundred lines, and here it really pays off to separate the kbuild part from the rest. + Linux 6.13 and later support another way. The external module Makefile + can include the kernel Makefile directly, rather than invoking sub Make. + + Example 3:: + + --> filename: Kbuild + obj-m := 8123.o + 8123-y := 8123_if.o 8123_pci.o + + --> filename: Makefile + KDIR ?= /lib/modules/$(shell uname -r)/build + export KBUILD_EXTMOD := $(realpath $(dir $(lastword $(MAKEFILE_LIST)))) + include $(KDIR)/Makefile + + Building Multiple Modules ------------------------- |