summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)Author
2018-10-23Merge branch 'perf-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "The main updates in this cycle were: - Lots of perf tooling changes too voluminous to list (big perf trace and perf stat improvements, lots of libtraceevent reorganization, etc.), so I'll list the authors and refer to the changelog for details: Benjamin Peterson, Jérémie Galarneau, Kim Phillips, Peter Zijlstra, Ravi Bangoria, Sangwon Hong, Sean V Kelley, Steven Rostedt, Thomas Gleixner, Ding Xiang, Eduardo Habkost, Thomas Richter, Andi Kleen, Sanskriti Sharma, Adrian Hunter, Tzvetomir Stoyanov, Arnaldo Carvalho de Melo, Jiri Olsa. ... with the bulk of the changes written by Jiri Olsa, Tzvetomir Stoyanov and Arnaldo Carvalho de Melo. - Continued intel_rdt work with a focus on playing well with perf events. This also imported some non-perf RDT work due to dependencies. (Reinette Chatre) - Implement counter freezing for Arch Perfmon v4 (Skylake and newer). This allows to speed up the PMI handler by avoiding unnecessary MSR writes and make it more accurate. (Andi Kleen) - kprobes cleanups and simplification (Masami Hiramatsu) - Intel Goldmont PMU updates (Kan Liang) - ... plus misc other fixes and updates" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (155 commits) kprobes/x86: Use preempt_enable() in optimized_callback() x86/intel_rdt: Prevent pseudo-locking from using stale pointers kprobes, x86/ptrace.h: Make regs_get_kernel_stack_nth() not fault on bad stack perf/x86/intel: Export mem events only if there's PEBS support x86/cpu: Drop pointless static qualifier in punit_dev_state_show() x86/intel_rdt: Fix initial allocation to consider CDP x86/intel_rdt: CBM overlap should also check for overlap with CDP peer x86/intel_rdt: Introduce utility to obtain CDP peer tools lib traceevent, perf tools: Move struct tep_handler definition in a local header file tools lib traceevent: Separate out tep_strerror() for strerror_r() issues perf python: More portable way to make CFLAGS work with clang perf python: Make clang_has_option() work on Python 3 perf tools: Free temporary 'sys' string in read_event_files() perf tools: Avoid double free in read_event_file() perf tools: Free 'printk' string in parse_ftrace_printk() perf tools: Cleanup trace-event-info 'tdata' leak perf strbuf: Match va_{add,copy} with va_end perf test: S390 does not support watchpoints in test 22 perf auxtrace: Include missing asm/bitsperlong.h to get BITS_PER_LONG tools include: Adopt linux/bits.h ...
2018-10-23Merge branch 'locking-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking and misc x86 updates from Ingo Molnar: "Lots of changes in this cycle - in part because locking/core attracted a number of related x86 low level work which was easier to handle in a single tree: - Linux Kernel Memory Consistency Model updates (Alan Stern, Paul E. McKenney, Andrea Parri) - lockdep scalability improvements and micro-optimizations (Waiman Long) - rwsem improvements (Waiman Long) - spinlock micro-optimization (Matthew Wilcox) - qspinlocks: Provide a liveness guarantee (more fairness) on x86. (Peter Zijlstra) - Add support for relative references in jump tables on arm64, x86 and s390 to optimize jump labels (Ard Biesheuvel, Heiko Carstens) - Be a lot less permissive on weird (kernel address) uaccess faults on x86: BUG() when uaccess helpers fault on kernel addresses (Jann Horn) - macrofy x86 asm statements to un-confuse the GCC inliner. (Nadav Amit) - ... and a handful of other smaller changes as well" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits) locking/lockdep: Make global debug_locks* variables read-mostly locking/lockdep: Fix debug_locks off performance problem locking/pvqspinlock: Extend node size when pvqspinlock is configured locking/qspinlock_stat: Count instances of nested lock slowpaths locking/qspinlock, x86: Provide liveness guarantee x86/asm: 'Simplify' GEN_*_RMWcc() macros locking/qspinlock: Rework some comments locking/qspinlock: Re-order code locking/lockdep: Remove duplicated 'lock_class_ops' percpu array x86/defconfig: Enable CONFIG_USB_XHCI_HCD=y futex: Replace spin_is_locked() with lockdep locking/lockdep: Make class->ops a percpu counter and move it under CONFIG_DEBUG_LOCKDEP=y x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs x86/cpufeature: Macrofy inline assembly code to work around GCC inlining bugs x86/extable: Macrofy inline assembly code to work around GCC inlining bugs x86/paravirt: Work around GCC inlining bugs when compiling paravirt ops x86/bug: Macrofy the BUG table section handling, to work around GCC inlining bugs x86/alternatives: Macrofy lock prefixes to work around GCC inlining bugs x86/refcount: Work around GCC inlining bug x86/objtool: Use asm macros to work around GCC inlining bugs ...
2018-10-23Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "The biggest change in this cycle is the conclusion of the big 'simplify RCU to two primary flavors' consolidation work - i.e. there's a single RCU flavor for any kernel variant (PREEMPT and !PREEMPT): - Consolidate the RCU-bh, RCU-preempt, and RCU-sched flavors into a single flavor similar to RCU-sched in !PREEMPT kernels and into a single flavor similar to RCU-preempt (but also waiting on preempt-disabled sequences of code) in PREEMPT kernels. This branch also includes a refactoring of rcu_{nmi,irq}_{enter,exit}() from Byungchul Park. - Now that there is only one RCU flavor in any given running kernel, the many "rsp" pointers are no longer required, and this cleanup series removes them. - This branch carries out additional cleanups made possible by the RCU flavor consolidation, including inlining now-trivial functions, updating comments and definitions, and removing now-unneeded rcutorture scenarios. - Now that there is only one flavor of RCU in any running kernel, there is also only on rcu_data structure per CPU. This means that the rcu_dynticks structure can be merged into the rcu_data structure, a task taken on by this branch. This branch also contains a -rt-related fix from Mike Galbraith. There were also other updates: - Documentation updates, including some good-eye catches from Joel Fernandes. - SRCU updates, most notably changes enabling call_srcu() to be invoked very early in the boot sequence. - Torture-test updates, including some preliminary work towards making rcutorture better able to find problems that result in insufficient grace-period forward progress. - Initial changes to RCU to better promote forward progress of grace periods, including fixing a bug found by Marius Hillenbrand and David Woodhouse, with the fix suggested by Peter Zijlstra" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (140 commits) srcu: Make early-boot call_srcu() reuse workqueue lists rcutorture: Test early boot call_srcu() srcu: Make call_srcu() available during very early boot rcu: Convert rcu_state.ofl_lock to raw_spinlock_t rcu: Remove obsolete ->dynticks_fqs and ->cond_resched_completed rcu: Switch ->dynticks to rcu_data structure, remove rcu_dynticks rcu: Switch dyntick nesting counters to rcu_data structure rcu: Switch urgent quiescent-state requests to rcu_data structure rcu: Switch lazy counts to rcu_data structure rcu: Switch last accelerate/advance to rcu_data structure rcu: Switch ->tick_nohz_enabled_snap to rcu_data structure rcu: Merge rcu_dynticks structure into rcu_data structure rcu: Remove unused rcu_dynticks_snap() from Tiny RCU rcu: Convert "1UL << x" to "BIT(x)" rcu: Avoid resched_cpu() when rescheduling the current CPU rcu: More aggressively enlist scheduler aid for nohz_full CPUs rcu: Compute jiffies_till_sched_qs from other kernel parameters rcu: Provide functions for determining if call_rcu() has been invoked rcu: Eliminate ->rcu_qs_ctr from the rcu_dynticks structure rcu: Motivate Tiny RCU forward progress ...
2018-10-23Merge branch 'x86/cache' into perf/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-23Merge tag 'pm-4.20-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "These make hibernation on 32-bit x86 systems work in all of the cases in which it works on 64-bit x86 ones, update the menu cpuidle governor and the "polling" state to make them more efficient, add more hardware support to cpufreq drivers and fix issues with some of them, fix a bug in the conservative cpufreq governor, fix the operating performance points (OPP) framework and make it more stable, update the devfreq subsystem to take changes in the APIs used by into account and clean up some things all over. Specifics: - Backport hibernation bug fixes from x86-64 to x86-32 and consolidate hibernation handling on x86 to allow 32-bit systems to work in all of the cases in which 64-bit ones work (Zhimin Gu, Chen Yu). - Fix hibernation documentation (Vladimir D. Seleznev). - Update the menu cpuidle governor to fix a couple of issues with it, make it more efficient in some cases and clean it up (Rafael Wysocki). - Rework the cpuidle polling state implementation to make it more efficient (Rafael Wysocki). - Clean up the cpuidle core somewhat (Fieah Lim). - Fix the cpufreq conservative governor to take policy limits into account properly in some cases (Rafael Wysocki). - Add support for retrieving guaranteed performance information to the ACPI CPPC library and make the intel_pstate driver use it to expose the CPU base frequency via sysfs on systems with the hardware-managed P-states (HWP) feature enabled (Srinivas Pandruvada). - Fix clang warning in the CPPC cpufreq driver (Nathan Chancellor). - Get rid of device_node.name printing from cpufreq (Rob Herring). - Remove unnecessary unlikely() from the cpufreq core (Igor Stoppa). - Add support for the r8a7744 SoC to the cpufreq-dt driver (Biju Das). - Update the dt-platdev cpufreq driver to allow RK3399 to have separate tunables per cluster (Dmitry Torokhov). - Fix the dma_alloc_coherent() usage in the tegra186 cpufreq driver (Christoph Hellwig). - Make the imx6q cpufreq driver read OCOTP through nvmem for imx6ul/imx6ull (Anson Huang). - Fix several bugs in the operating performance points (OPP) framework and make it more stable (Viresh Kumar, Dave Gerlach). - Update the devfreq subsystem to take changes in the APIs used by into account, fix some issues with it and make it stop print device_node.name directly (Bjorn Andersson, Enric Balletbo i Serra, Matthias Kaehlcke, Rob Herring, Vincent Donnefort, zhong jiang). - Prepare the generic power domains (genpd) framework for dealing with domains containing CPUs (Ulf Hansson). - Prevent sysfs attributes representing low-power S0 residency counters from being exposed if low-power S0 support is not indicated in ACPI FADT (Rajneesh Bhardwaj). - Get rid of custom CPU features macros for Intel CPUs from the intel_idle and RAPL drivers (Andy Shevchenko). - Update the tasks freezer to list tasks that refused to freeze and caused a system transition to a sleep state to be aborted (Todd Brandt). - Update the pm-graph set of tools to v5.2 (Todd Brandt). - Fix some issues in the cpupower utility (Anders Roxell, Prarit Bhargava)" * tag 'pm-4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (73 commits) PM / Domains: Document flags for genpd PM / Domains: Deal with multiple states but no governor in genpd PM / Domains: Don't treat zero found compatible idle states as an error cpuidle: menu: Avoid computations when result will be discarded cpuidle: menu: Drop redundant comparison cpufreq: tegra186: don't pass GFP_DMA32 to dma_alloc_coherent() cpufreq: conservative: Take limits changes into account properly Documentation: intel_pstate: Add base_frequency information cpufreq: intel_pstate: Add base_frequency attribute ACPI / CPPC: Add support for guaranteed performance cpuidle: menu: Simplify checks related to the polling state PM / tools: sleepgraph and bootgraph: upgrade to v5.2 PM / tools: sleepgraph: first batch of v5.2 changes cpupower: Fix coredump on VMWare cpupower: Fix AMD Family 0x17 msr_pstate size cpufreq: imx6q: read OCOTP through nvmem for imx6ul/imx6ull cpufreq: dt-platdev: allow RK3399 to have separate tunables per cluster cpuidle: poll_state: Revise loop termination condition cpuidle: menu: Move the latency_req == 0 special case check cpuidle: menu: Avoid computations for very close timers ...
2018-10-22umh: Add command line to user mode helpersOlivier Brunel
User mode helpers were spawned without a command line, and because an empty command line is used by many tools to identify processes as kernel threads, this could cause some issues. Notably during killing spree on shutdown, since such helper would then be skipped (i.e. not killed) which would result in the process remaining alive, and thus preventing unmouting of the rootfs (as experienced with the bpfilter umh). Fixes: 449325b52b7a ("umh: introduce fork_usermode_blob() helper") Signed-off-by: Olivier Brunel <jjk@jjacky.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-23Merge tag 'regulator-v5.0' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator updates from Mark Brown: "The biggest chunk of the regulator changes for this release outside of the new drivers is the conversion of the fixed regulator to use the GPIO descriptor API, there's a small addition to the GPIO API plus a bunch of updates to board files to implement it. This is some really welcome work from Linus Walleij that's had a bunch of review and has been sitting in -next for a while so I'm fairly happy there's no major issues. - Helpers for overlapping linear ranges. - Display opmode and consumer requested load in the regualtor_summary file in debugfs, plus a fix there. - Support for the fun and entertaining power off mechanism that the pfuze100 hardware implements. - Conversion of the fixed regulator API to use GPIO descriptors, including pulling in a bunch of patches to a bunch of board files. - New drivers for Cirrus Logic Lochnagar, Qualcomm PMS405, Rohm BD71847, ST PMIC1, and TI LM363x devices" * tag 'regulator-v5.0' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: (36 commits) regulator: lochnagar: Use a consisent comment style for SPDX header regulator: bd718x7: Remove struct bd718xx_pmic regulator: Fetch enable gpiods nonexclusive regulator/gpio: Allow nonexclusive GPIO access regulator: lochnagar: Add support for the Cirrus Logic Lochnagar regulator: stpmic1: Return REGULATOR_MODE_INVALID for invalid mode regulator: stpmic1: add stpmic1 regulator driver dt-bindings: regulator: document stpmic1 pmic regulators regulator: axp20x: Mark expected switch fall-throughs regulator: bd718xx: fix build warning on x86_64 regulator: fixed: Default enable high on DT regulators regulator: bd718xx: rename bd71837 to 718xx regulator: bd718XX use pickable ranges regulator/mfd: bd718xx: rename bd71837/bd71847 common instances regulator: Support regulators where voltage ranges are selectable mfd: dt bindings: add BD71847 device-tree binding documentation regulator: dt bindings: add BD71847 device-tree binding documentation regulator/mfd: Support ROHM BD71847 power management IC regulator: da905{2,5}: Remove unnecessary array check regulator: qcom: Add PMS405 regulators ...
2018-10-22Merge tag 'dma-mapping-4.20' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds
Pull dma mapping updates from Christoph Hellwig: "First batch of dma-mapping changes for 4.20. There will be a second PR as some big changes were only applied just before the end of the merge window, and I want to give them a few more days in linux-next. Summary: - mostly more consolidation of the direct mapping code, including converting over hexagon, and merging the coherent and non-coherent code into a single dma_map_ops instance (me) - cleanups for the dma_configure/dma_unconfigure callchains (me) - better handling of dma_masks in odd setups (me, Alexander Duyck) - better debugging of passing vmalloc address to the DMA API (Stephen Boyd) - CMA command line parsing fix (He Zhe)" * tag 'dma-mapping-4.20' of git://git.infradead.org/users/hch/dma-mapping: (27 commits) dma-direct: respect DMA_ATTR_NO_WARN dma-mapping: translate __GFP_NOFAIL to DMA_ATTR_NO_WARN dma-direct: document the zone selection logic dma-debug: Check for drivers mapping invalid addresses in dma_map_single() dma-direct: fix return value of dma_direct_supported dma-mapping: move dma_default_get_required_mask under ifdef dma-direct: always allow dma mask <= physiscal memory size dma-direct: implement complete bus_dma_mask handling dma-direct: refine dma_direct_alloc zone selection dma-direct: add an explicit dma_direct_get_required_mask dma-mapping: make the get_required_mask method available unconditionally unicore32: remove swiotlb support Revert "dma-mapping: clear dev->dma_ops in arch_teardown_dma_ops" dma-mapping: support non-coherent devices in dma_common_get_sgtable dma-mapping: consolidate the dma mmap implementations dma-mapping: merge direct and noncoherent ops dma-mapping: move the dma_coherent flag to struct device MIPS: don't select DMA_MAYBE_COHERENT from DMA_PERDEV_COHERENT dma-mapping: add the missing ARCH_HAS_SYNC_DMA_FOR_CPU_ALL declaration dma-mapping: fix panic caused by passing empty cma command line argument ...
2018-10-22Merge tag 'for-4.20/block-20181021' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block layer updates from Jens Axboe: "This is the main pull request for block changes for 4.20. This contains: - Series enabling runtime PM for blk-mq (Bart). - Two pull requests from Christoph for NVMe, with items such as; - Better AEN tracking - Multipath improvements - RDMA fixes - Rework of FC for target removal - Fixes for issues identified by static checkers - Fabric cleanups, as prep for TCP transport - Various cleanups and bug fixes - Block merging cleanups (Christoph) - Conversion of drivers to generic DMA mapping API (Christoph) - Series fixing ref count issues with blkcg (Dennis) - Series improving BFQ heuristics (Paolo, et al) - Series improving heuristics for the Kyber IO scheduler (Omar) - Removal of dangerous bio_rewind_iter() API (Ming) - Apply single queue IPI redirection logic to blk-mq (Ming) - Set of fixes and improvements for bcache (Coly et al) - Series closing a hotplug race with sysfs group attributes (Hannes) - Set of patches for lightnvm: - pblk trace support (Hans) - SPDX license header update (Javier) - Tons of refactoring patches to cleanly abstract the 1.2 and 2.0 specs behind a common core interface. (Javier, Matias) - Enable pblk to use a common interface to retrieve chunk metadata (Matias) - Bug fixes (Various) - Set of fixes and updates to the blk IO latency target (Josef) - blk-mq queue number updates fixes (Jianchao) - Convert a bunch of drivers from the old legacy IO interface to blk-mq. This will conclude with the removal of the legacy IO interface itself in 4.21, with the rest of the drivers (me, Omar) - Removal of the DAC960 driver. The SCSI tree will introduce two replacement drivers for this (Hannes)" * tag 'for-4.20/block-20181021' of git://git.kernel.dk/linux-block: (204 commits) block: setup bounce bio_sets properly blkcg: reassociate bios when make_request() is called recursively blkcg: fix edge case for blk_get_rl() under memory pressure nvme-fabrics: move controller options matching to fabrics nvme-rdma: always have a valid trsvcid mtip32xx: fully switch to the generic DMA API rsxx: switch to the generic DMA API umem: switch to the generic DMA API sx8: switch to the generic DMA API sx8: remove dead IF_64BIT_DMA_IS_POSSIBLE code skd: switch to the generic DMA API ubd: remove use of blk_rq_map_sg nvme-pci: remove duplicate check drivers/block: Remove DAC960 driver nvme-pci: fix hot removal during error handling nvmet-fcloop: suppress a compiler warning nvme-core: make implicit seed truncation explicit nvmet-fc: fix kernel-doc headers nvme-fc: rework the request initialization code nvme-fc: introduce struct nvme_fcp_op_w_sgl ...
2018-10-22Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: "Apart from some new arm64 features and clean-ups, this also contains the core mmu_gather changes for tracking the levels of the page table being cleared and a minor update to the generic compat_sys_sigaltstack() introducing COMPAT_SIGMINSKSZ. Summary: - Core mmu_gather changes which allow tracking the levels of page-table being cleared together with the arm64 low-level flushing routines - Support for the new ARMv8.5 PSTATE.SSBS bit which can be used to mitigate Spectre-v4 dynamically without trapping to EL3 firmware - Introduce COMPAT_SIGMINSTKSZ for use in compat_sys_sigaltstack - Optimise emulation of MRS instructions to ID_* registers on ARMv8.4 - Support for Common Not Private (CnP) translations allowing threads of the same CPU to share the TLB entries - Accelerated crc32 routines - Move swapper_pg_dir to the rodata section - Trap WFI instruction executed in user space - ARM erratum 1188874 workaround (arch_timer) - Miscellaneous fixes and clean-ups" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (78 commits) arm64: KVM: Guests can skip __install_bp_hardening_cb()s HYP work arm64: cpufeature: Trap CTR_EL0 access only where it is necessary arm64: cpufeature: Fix handling of CTR_EL0.IDC field arm64: cpufeature: ctr: Fix cpu capability check for late CPUs Documentation/arm64: HugeTLB page implementation arm64: mm: Use __pa_symbol() for set_swapper_pgd() arm64: Add silicon-errata.txt entry for ARM erratum 1188873 Revert "arm64: uaccess: implement unsafe accessors" arm64: mm: Drop the unused cpu parameter MAINTAINERS: fix bad sdei paths arm64: mm: Use #ifdef for the __PAGETABLE_P?D_FOLDED defines arm64: Fix typo in a comment in arch/arm64/mm/kasan_init.c arm64: xen: Use existing helper to check interrupt status arm64: Use daifflag_restore after bp_hardening arm64: daifflags: Use irqflags functions for daifflags arm64: arch_timer: avoid unused function warning arm64: Trap WFI executed in userspace arm64: docs: Document SSBS HWCAP arm64: docs: Fix typos in ELF hwcaps arm64/kprobes: remove an extra semicolon in arch_prepare_kprobe ...
2018-10-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf-next 2018-10-21 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Implement two new kind of BPF maps, that is, queue and stack map along with new peek, push and pop operations, from Mauricio. 2) Add support for MSG_PEEK flag when redirecting into an ingress psock sk_msg queue, and add a new helper bpf_msg_push_data() for insert data into the message, from John. 3) Allow for BPF programs of type BPF_PROG_TYPE_CGROUP_SKB to use direct packet access for __skb_buff, from Song. 4) Use more lightweight barriers for walking perf ring buffer for libbpf and perf tool as well. Also, various fixes and improvements from verifier side, from Daniel. 5) Add per-symbol visibility for DSO in libbpf and hide by default global symbols such as netlink related functions, from Andrey. 6) Two improvements to nfp's BPF offload to check vNIC capabilities in case prog is shared with multiple vNICs and to protect against mis-initializing atomic counters, from Jakub. 7) Fix for bpftool to use 4 context mode for the nfp disassembler, also from Jakub. 8) Fix a return value comparison in test_libbpf.sh and add several bpftool improvements in bash completion, documentation of bpf fs restrictions and batch mode summary print, from Quentin. 9) Fix a file resource leak in BPF selftest's load_kallsyms() helper, from Peng. 10) Fix an unused variable warning in map_lookup_and_delete_elem(), from Alexei. 11) Fix bpf_skb_adjust_room() signature in BPF UAPI helper doc, from Nicolas. 12) Add missing executables to .gitignore in BPF selftests, from Anders. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-22x86/stackprotector: Remove the call to boot_init_stack_canary() from ↵Christophe Leroy
cpu_startup_entry() The following commit: d7880812b359 ("idle: Add the stack canary init to cpu_startup_entry()") ... added an x86 specific boot_init_stack_canary() call to the generic cpu_startup_entry() as a temporary hack, with the intention to remove the #ifdef CONFIG_X86 later. More than 5 years later let's finally realize that plan! :-) While implementing stack protector support for PowerPC, we found that calling boot_init_stack_canary() is also needed for PowerPC which uses per task (TLS) stack canary like the X86. However, calling boot_init_stack_canary() would break architectures using a global stack canary (ARM, SH, MIPS and XTENSA). Instead of modifying the #ifdef CONFIG_X86 to an even messier: #if defined(CONFIG_X86) || defined(CONFIG_PPC) PowerPC implemented the call to boot_init_stack_canary() in the function calling cpu_startup_entry(). Let's try the same cleanup on the x86 side as well. On x86 we have two functions calling cpu_startup_entry(): - start_secondary() - cpu_bringup_and_idle() start_secondary() already calls boot_init_stack_canary(), so it's good, and this patch adds the call to boot_init_stack_canary() in cpu_bringup_and_idle(). I.e. now x86 catches up to the rest of the world and the ugly init sequence in init/main.c can be removed from cpu_startup_entry(). As a final benefit we can also remove the <linux/stackprotector.h> dependency from <linux/sched.h>. [ mingo: Improved the changelog a bit, added language explaining x86 borkage and sched.h change. ] Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linuxppc-dev@lists.ozlabs.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/20181020072649.5B59310483E@pc16082vm.idsi0.si.c-s.fr Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
David Ahern's dump indexing bug fix in 'net' overlapped the change of the function signature of inet6_fill_ifaddr() in 'net-next'. Trivially resolved. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-21Merge remote-tracking branches 'regulator/topic/bd718xx' and ↵Mark Brown
'regulator/topic/pfuze100' into regulator-next
2018-10-21memremap: Convert to XArrayMatthew Wilcox
Use the new xa_store_range function instead of the radix tree. Signed-off-by: Matthew Wilcox <willy@infradead.org>
2018-10-20bpf, verifier: avoid retpoline for map push/pop/peek operationDaniel Borkmann
Extend prior work from 09772d92cd5a ("bpf: avoid retpoline for lookup/update/delete calls on maps") to also apply to the recently added map helpers that perform push/pop/peek operations so that the indirect call can be avoided. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-20bpf, verifier: remove unneeded flow key in check_helper_mem_accessDaniel Borkmann
They PTR_TO_FLOW_KEYS is not used today to be passed into a helper as memory, so it can be removed from check_helper_mem_access(). Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-20bpf, verifier: reject xadd on flow key memoryDaniel Borkmann
We should not enable xadd operation for flow key memory if not needed there anyway. There is no such issue as described in the commit f37a8cb84cce ("bpf: reject stores into ctx via st and xadd") since there's no context rewriter for flow keys today, but it also shouldn't become part of the user facing behavior to allow for it. After patch: 0: (79) r7 = *(u64 *)(r1 +144) 1: (b7) r3 = 4096 2: (db) lock *(u64 *)(r7 +0) += r3 BPF_XADD stores into R7 flow_keys is not allowed Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-20bpf, verifier: fix register type dump in xadd and stDaniel Borkmann
Using reg_type_str[insn->dst_reg] is incorrect since insn->dst_reg contains the register number but not the actual register type. Add a small reg_state() helper and use it to get to the type. Also fix up the test_verifier test cases that have an incorrect errstr. Fixes: 9d2be44a7f33 ("bpf: Reuse canonical string formatter for ctx errs") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-20Merge branch 'sched-urgent-for-linus' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Ingo writes: "scheduler fixes: Two fixes: a CFS-throttling bug fix, and an interactivity fix." * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/fair: Fix the min_vruntime update logic in dequeue_entity() sched/fair: Fix throttle_list starvation with low CFS quota
2018-10-20Merge tag 'trace-v4.19-rc8-2' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Steven writes: "tracing: A few small fixes to synthetic events Masami found some issues with the creation of synthetic events. The first two patches fix handling of unsigned type, and handling of a space before an ending semi-colon. The third patch adds a selftest to test the processing of synthetic events." * tag 'trace-v4.19-rc8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: selftests: ftrace: Add synthetic event syntax testcase tracing: Fix synthetic event to allow semicolon at end tracing: Fix synthetic event to accept unsigned modifier
2018-10-19tracing: Fix synthetic event to allow semicolon at endMasami Hiramatsu
Fix synthetic event to allow independent semicolon at end. The synthetic_events interface accepts a semicolon after the last word if there is no space. # echo "myevent u64 var;" >> synthetic_events But if there is a space, it returns an error. # echo "myevent u64 var ;" > synthetic_events sh: write error: Invalid argument This behavior is difficult for users to understand. Let's allow the last independent semicolon too. Link: http://lkml.kernel.org/r/153986835420.18251.2191216690677025744.stgit@devbox Cc: Shuah Khan <shuah@kernel.org> Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Cc: stable@vger.kernel.org Fixes: commit 4b147936fa50 ("tracing: Add support for 'synthetic' events") Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-10-19tracing: Fix synthetic event to accept unsigned modifierMasami Hiramatsu
Fix synthetic event to accept unsigned modifier for its field type correctly. Currently, synthetic_events interface returns error for "unsigned" modifiers as below; # echo "myevent unsigned long var" >> synthetic_events sh: write error: Invalid argument This is because argv_split() breaks "unsigned long" into "unsigned" and "long", but parse_synth_field() doesn't expected it. With this fix, synthetic_events can handle the "unsigned long" correctly like as below; # echo "myevent unsigned long var" >> synthetic_events # cat synthetic_events myevent unsigned long var Link: http://lkml.kernel.org/r/153986832571.18251.8448135724590496531.stgit@devbox Cc: Shuah Khan <shuah@kernel.org> Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Cc: stable@vger.kernel.org Fixes: commit 4b147936fa50 ("tracing: Add support for 'synthetic' events") Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-10-19bpf: remove unused variableAlexei Starovoitov
fix the following warning ../kernel/bpf/syscall.c: In function ‘map_lookup_and_delete_elem’: ../kernel/bpf/syscall.c:1010:22: warning: unused variable ‘ptr’ [-Wunused-variable] void *key, *value, *ptr; ^~~ Fixes: bd513cd08f10 ("bpf: add MAP_LOOKUP_AND_DELETE_ELEM syscall") Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-19bpf: add cg_skb_is_valid_access for BPF_PROG_TYPE_CGROUP_SKBSong Liu
BPF programs of BPF_PROG_TYPE_CGROUP_SKB need to access headers in the skb. This patch enables direct access of skb for these programs. Two helper functions bpf_compute_and_save_data_end() and bpf_restore_data_end() are introduced. There are used in __cgroup_bpf_run_filter_skb(), to compute proper data_end for the BPF program, and restore original data afterwards. Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-19bpf: add MAP_LOOKUP_AND_DELETE_ELEM syscallMauricio Vasquez B
The previous patch implemented a bpf queue/stack maps that provided the peek/pop/push functions. There is not a direct relationship between those functions and the current maps syscalls, hence a new MAP_LOOKUP_AND_DELETE_ELEM syscall is added, this is mapped to the pop operation in the queue/stack maps and it is still to implement in other kind of maps. Signed-off-by: Mauricio Vasquez B <mauricio.vasquez@polito.it> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-19bpf: add queue and stack mapsMauricio Vasquez B
Queue/stack maps implement a FIFO/LIFO data storage for ebpf programs. These maps support peek, pop and push operations that are exposed to eBPF programs through the new bpf_map[peek/pop/push] helpers. Those operations are exposed to userspace applications through the already existing syscalls in the following way: BPF_MAP_LOOKUP_ELEM -> peek BPF_MAP_LOOKUP_AND_DELETE_ELEM -> pop BPF_MAP_UPDATE_ELEM -> push Queue/stack maps are implemented using a buffer, tail and head indexes, hence BPF_F_NO_PREALLOC is not supported. As opposite to other maps, queue and stack do not use RCU for protecting maps values, the bpf_map[peek/pop] have a ARG_PTR_TO_UNINIT_MAP_VALUE argument that is a pointer to a memory zone where to save the value of a map. Basically the same as ARG_PTR_TO_UNINIT_MEM, but the size has not be passed as an extra argument. Our main motivation for implementing queue/stack maps was to keep track of a pool of elements, like network ports in a SNAT, however we forsee other use cases, like for exampling saving last N kernel events in a map and then analysing from userspace. Signed-off-by: Mauricio Vasquez B <mauricio.vasquez@polito.it> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-19bpf/verifier: add ARG_PTR_TO_UNINIT_MAP_VALUEMauricio Vasquez B
ARG_PTR_TO_UNINIT_MAP_VALUE argument is a pointer to a memory zone used to save the value of a map. Basically the same as ARG_PTR_TO_UNINIT_MEM, but the size has not be passed as an extra argument. This will be used in the following patch that implements some new helpers that receive a pointer to be filled with a map value. Signed-off-by: Mauricio Vasquez B <mauricio.vasquez@polito.it> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-19bpf/syscall: allow key to be null in map functionsMauricio Vasquez B
This commit adds the required logic to allow key being NULL in case the key_size of the map is 0. A new __bpf_copy_key function helper only copies the key from userpsace when key_size != 0, otherwise it enforces that key must be null. Signed-off-by: Mauricio Vasquez B <mauricio.vasquez@polito.it> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-19bpf: rename stack trace map operationsMauricio Vasquez B
In the following patches queue and stack maps (FIFO and LIFO datastructures) will be implemented. In order to avoid confusion and a possible name clash rename stack_map_ops to stack_trace_map_ops Signed-off-by: Mauricio Vasquez B <mauricio.vasquez@polito.it> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-10-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
net/sched/cls_api.c has overlapping changes to a call to nlmsg_parse(), one (from 'net') added rtm_tca_policy instead of NULL to the 5th argument, and another (from 'net-next') added cb->extack instead of NULL to the 6th argument. net/ipv4/ipmr_base.c is a case of a bug fix in 'net' being done to code which moved (to mr_table_dump)) in 'net-next'. Thanks to David Ahern for the heads up. Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-19genirq: Fix race on spurious interrupt detectionLukas Wunner
Commit 1e77d0a1ed74 ("genirq: Sanitize spurious interrupt detection of threaded irqs") made detection of spurious interrupts work for threaded handlers by: a) incrementing a counter every time the thread returns IRQ_HANDLED, and b) checking whether that counter has increased every time the thread is woken. However for oneshot interrupts, the commit unmasks the interrupt before incrementing the counter. If another interrupt occurs right after unmasking but before the counter is incremented, that interrupt is incorrectly considered spurious: time | irq_thread() | irq_thread_fn() | action->thread_fn() | irq_finalize_oneshot() | unmask_threaded_irq() /* interrupt is unmasked */ | | /* interrupt fires, incorrectly deemed spurious */ | | atomic_inc(&desc->threads_handled); /* counter is incremented */ v This is observed with a hi3110 CAN controller receiving data at high volume (from a separate machine sending with "cangen -g 0 -i -x"): The controller signals a huge number of interrupts (hundreds of millions per day) and every second there are about a dozen which are deemed spurious. In theory with high CPU load and the presence of higher priority tasks, the number of incorrectly detected spurious interrupts might increase beyond the 99,900 threshold and cause disablement of the interrupt. In practice it just increments the spurious interrupt count. But that can cause people to waste time investigating it over and over. Fix it by moving the accounting before the invocation of irq_finalize_oneshot(). [ tglx: Folded change log update ] Fixes: 1e77d0a1ed74 ("genirq: Sanitize spurious interrupt detection of threaded irqs") Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Mathias Duckeck <m.duckeck@kunbus.de> Cc: Akshay Bhat <akshay.bhat@timesys.com> Cc: Casey Fitzpatrick <casey.fitzpatrick@timesys.com> Cc: stable@vger.kernel.org # v3.16+ Link: https://lkml.kernel.org/r/1dfd8bbd16163940648045495e3e9698e63b50ad.1539867047.git.lukas@wunner.de
2018-10-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netGreg Kroah-Hartman
David writes: "Networking 1) Fix gro_cells leak in xfrm layer, from Li RongQing. 2) BPF selftests change RLIMIT_MEMLOCK blindly, don't do that. From Eric Dumazet. 3) AF_XDP calls synchronize_net() under RCU lock, fix from Björn Töpel. 4) Out of bounds packet access in _decode_session6(), from Alexei Starovoitov. 5) Several ethtool bugs, where we copy a struct into the kernel twice and our validations of the values in the first copy can be invalidated by the second copy due to asynchronous updates to the memory by the user. From Wenwen Wang. 6) Missing netlink attribute validation in cls_api, from Davide Caratti. 7) LLC SAP sockets neet to be SOCK_RCU FREE, from Cong Wang. 8) rxrpc operates on wrong kvec, from Yue Haibing. 9) A regression was introduced by the disassosciation of route neighbour references in rt6_probe(), causing probe for neighbourless routes to not be properly rate limited. Fix from Sabrina Dubroca. 10) Unsafe RCU locking in tipc, from Tung Nguyen. 11) Use after free in inet6_mc_check(), from Eric Dumazet. 12) PMTU from icmp packets should update the SCTP transport pathmtu, from Xin Long. 13) Missing peer put on error in rxrpc, from David Howells. 14) Fix pedit in nfp driver, from Pieter Jansen van Vuuren. 15) Fix overflowing shift statement in qla3xxx driver, from Nathan Chancellor. 16) Fix Spectre v1 in ptp code, from Gustavo A. R. Silva. 17) udp6_unicast_rcv_skb() interprets udpv6_queue_rcv_skb() return value in an inverted manner, fix from Paolo Abeni. 18) Fix missed unresolved entries in ipmr dumps, from Nikolay Aleksandrov. 19) Fix NAPI handling under high load, we can completely miss events when NAPI has to loop more than one time in a cycle. From Heiner Kallweit." * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (49 commits) ip6_tunnel: Fix encapsulation layout tipc: fix info leak from kernel tipc_event net: socket: fix a missing-check bug net: sched: Fix for duplicate class dump r8169: fix NAPI handling under high load net: ipmr: fix unresolved entry dumps net: mscc: ocelot: Fix comment in ocelot_vlant_wait_for_completion() sctp: fix the data size calculation in sctp_data_size virtio_net: avoid using netif_tx_disable() for serializing tx routine udp6: fix encap return code for resubmitting mlxsw: core: Fix use-after-free when flashing firmware during init sctp: not free the new asoc when sctp_wait_for_connect returns err sctp: fix race on sctp_id2asoc r8169: re-enable MSI-X on RTL8168g net: bpfilter: use get_pid_task instead of pid_task ptp: fix Spectre v1 vulnerability net: qla3xxx: Remove overflowing shift statement geneve, vxlan: Don't set exceptions if skb->len < mtu geneve, vxlan: Don't check skb_dst() twice sctp: get pr_assoc and pr_stream all status with SCTP_PR_SCTP_ALL instead ...
2018-10-19swiotlb: add support for non-coherent DMAChristoph Hellwig
Handle architectures that are not cache coherent directly in the main swiotlb code by calling arch_sync_dma_for_{device,cpu} in all the right places from the various dma_map/unmap/sync methods when the device is non-coherent. Because swiotlb now uses dma_direct_alloc for the coherent allocation that side is already taken care of by the dma-direct code calling into arch_dma_{alloc,free} for devices that are non-coherent. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2018-10-19swiotlb: don't dip into swiotlb pool for coherent allocationsChristoph Hellwig
All architectures that support swiotlb also have a zone that backs up these less than full addressing allocations (usually ZONE_DMA32). Because of that it is rather pointless to fall back to the global swiotlb buffer if the normal dma direct allocation failed - the only thing this will do is to eat up bounce buffers that would be more useful to serve streaming mappings. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2018-10-19swiotlb: refactor swiotlb_map_pageChristoph Hellwig
Remove the somewhat useless map_single function, and replace it with a swiotlb_bounce_page handler that handles everything related to actually bouncing a page. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2018-10-19swiotlb: use swiotlb_map_page in swiotlb_map_sg_attrsChristoph Hellwig
No need to duplicate the code - map_sg is equivalent to map_page for each page in the scatterlist. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2018-10-19swiotlb: merge swiotlb_unmap_page and unmap_singleChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2018-10-19swiotlb: remove the overflow bufferChristoph Hellwig
Like all other dma mapping drivers just return an error code instead of an actual memory buffer. The reason for the overflow buffer was that at the time swiotlb was invented there was no way to check for dma mapping errors, but this has long been fixed. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2018-10-19swiotlb: do not panic on mapping failuresChristoph Hellwig
All properly written drivers now have error handling in the dma_map_single / dma_map_page callers. As swiotlb_tbl_map_single already prints a useful warning when running out of swiotlb pool space we can also remove swiotlb_full entirely as it serves no purpose now. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
2018-10-19swiotlb: mark is_swiotlb_buffer staticChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2018-10-19swiotlb: remove a pointless commentChristoph Hellwig
This comments describes an aspect of the map_sg interface that isn't even exploited by swiotlb. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2018-10-19locking/lockdep: Fix debug_locks off performance problemWaiman Long
It was found that when debug_locks was turned off because of a problem found by the lockdep code, the system performance could drop quite significantly when the lock_stat code was also configured into the kernel. For instance, parallel kernel build time on a 4-socket x86-64 server nearly doubled. Further analysis into the cause of the slowdown traced back to the frequent call to debug_locks_off() from the __lock_acquired() function probably due to some inconsistent lockdep states with debug_locks off. The debug_locks_off() function did an unconditional atomic xchg to write a 0 value into debug_locks which had already been set to 0. This led to severe cacheline contention in the cacheline that held debug_locks. As debug_locks is being referenced in quite a few different places in the kernel, this greatly slow down the system performance. To prevent that trashing of debug_locks cacheline, lock_acquired() and lock_contended() now checks the state of debug_locks before proceeding. The debug_locks_off() function is also modified to check debug_locks before calling __debug_locks_off(). Signed-off-by: Waiman Long <longman@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Link: http://lkml.kernel.org/r/1539913518-15598-1-git-send-email-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-18softirq: Fix typo in __do_softirq() commentsYangtao Li
s/s/as [ mingo: Also add a missing 'the', add proper punctuation and clarify what 'swap' means here. ] Signed-off-by: Yangtao Li <tiny.windzz@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: alexander.levin@verizon.com Cc: frederic@kernel.org Cc: joel@joelfernandes.org Cc: paulmck@linux.vnet.ibm.com Cc: rostedt@goodmis.org Link: http://lkml.kernel.org/r/20181018142133.12341-1-tiny.windzz@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-18Merge branches 'acpi-pm' and 'pm-sleep'Rafael J. Wysocki
* acpi-pm: ACPI / PM: LPIT: Register sysfs attributes based on FADT * pm-sleep: x86-32, hibernate: Adjust in_suspend after resumed on 32bit system x86-32, hibernate: Set up temporary text mapping for 32bit system x86-32, hibernate: Switch to relocated restore code during resume on 32bit system x86-32, hibernate: Switch to original page table after resumed x86-32, hibernate: Use the page size macro instead of constant value x86-32, hibernate: Use temp_pgt as the temporary page table x86, hibernate: Rename temp_level4_pgt to temp_pgt x86-32, hibernate: Enable CONFIG_ARCH_HIBERNATION_HEADER on 32bit system x86, hibernate: Extract the common code of 64/32 bit system x86-32/asm/power: Create stack frames in hibernate_asm_32.S PM / hibernate: Check the success of generating md5 digest before hibernation x86, hibernate: Fix nosave_regions setup for hibernation PM / sleep: Show freezing tasks that caused a suspend abort PM / hibernate: Documentation: fix image_size default value
2018-10-18Merge tag 'trace-v4.19-rc8' of ↵Greg Kroah-Hartman
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Steven writes: "tracing: Two fixes for 4.19 This fixes two bugs: - Fix size mismatch of tracepoint array - Have preemptirq test module use same clock source of the selftest" * tag 'trace-v4.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Use trace_clock_local() for looping in preemptirq_delay_test.c tracepoint: Fix tracepoint array element size mismatch
2018-10-17tracing: Use trace_clock_local() for looping in preemptirq_delay_test.cSteven Rostedt (VMware)
The preemptirq_delay_test module is used for the ftrace selftest code that tests the latency tracers. The problem is that it uses ktime for the delay loop, and then checks the tracer to see if the delay loop is caught, but the tracer uses trace_clock_local() which uses various different other clocks to measure the latency. As ktime uses the clock cycles, and the code then converts that to nanoseconds, it causes rounding errors, and the preemptirq latency tests are failing due to being off by 1 (it expects to see a delay of 500000 us, but the delay is only 499999 us). This is happening due to a rounding error in the ktime (which is totally legit). The purpose of the test is to see if it can catch the delay, not to test the accuracy between trace_clock_local() and ktime_get(). Best to use apples to apples, and have the delay loop use the same clock as the latency tracer does. Cc: stable@vger.kernel.org Fixes: f96e8577da102 ("lib: Add module for testing preemptoff/irqsoff latency tracers") Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-10-17tracepoint: Fix tracepoint array element size mismatchMathieu Desnoyers
commit 46e0c9be206f ("kernel: tracepoints: add support for relative references") changes the layout of the __tracepoint_ptrs section on architectures supporting relative references. However, it does so without turning struct tracepoint * const into const int elsewhere in the tracepoint code, which has the following side-effect: Setting mod->num_tracepoints is done in by module.c: mod->tracepoints_ptrs = section_objs(info, "__tracepoints_ptrs", sizeof(*mod->tracepoints_ptrs), &mod->num_tracepoints); Basically, since sizeof(*mod->tracepoints_ptrs) is a pointer size (rather than sizeof(int)), num_tracepoints is erroneously set to half the size it should be on 64-bit arch. So a module with an odd number of tracepoints misses the last tracepoint due to effect of integer division. So in the module going notifier: for_each_tracepoint_range(mod->tracepoints_ptrs, mod->tracepoints_ptrs + mod->num_tracepoints, tp_module_going_check_quiescent, NULL); the expression (mod->tracepoints_ptrs + mod->num_tracepoints) actually evaluates to something within the bounds of the array, but miss the last tracepoint if the number of tracepoints is odd on 64-bit arch. Fix this by introducing a new typedef: tracepoint_ptr_t, which is either "const int" on architectures that have PREL32 relocations, or "struct tracepoint * const" on architectures that does not have this feature. Also provide a new tracepoint_ptr_defer() static inline to encapsulate deferencing this type rather than duplicate code and ugly idefs within the for_each_tracepoint_range() implementation. This issue appears in 4.19-rc kernels, and should ideally be fixed before the end of the rc cycle. Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Jessica Yu <jeyu@kernel.org> Link: http://lkml.kernel.org/r/20181013191050.22389-1-mathieu.desnoyers@efficios.com Link: http://lkml.kernel.org/r/20180704083651.24360-7-ard.biesheuvel@linaro.org Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Ingo Molnar <mingo@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morris <james.morris@microsoft.com> Cc: James Morris <jmorris@namei.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Nicolas Pitre <nico@linaro.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Russell King <linux@armlinux.org.uk> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Thomas Garnier <thgarnie@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2018-10-17locking/pvqspinlock: Extend node size when pvqspinlock is configuredWaiman Long
The qspinlock code supports up to 4 levels of slowpath nesting using four per-CPU mcs_spinlock structures. For 64-bit architectures, they fit nicely in one 64-byte cacheline. For para-virtualized (PV) qspinlocks it needs to store more information in the per-CPU node structure than there is space for. It uses a trick to use a second cacheline to hold the extra information that it needs. So PV qspinlock needs to access two extra cachelines for its information whereas the native qspinlock code only needs one extra cacheline. Freshly added counter profiling of the qspinlock code, however, revealed that it was very rare to use more than two levels of slowpath nesting. So it doesn't make sense to penalize PV qspinlock code in order to have four mcs_spinlock structures in the same cacheline to optimize for a case in the native qspinlock code that rarely happens. Extend the per-CPU node structure to have two more long words when PV qspinlock locks are configured to hold the extra data that it needs. As a result, the PV qspinlock code will enjoy the same benefit of using just one extra cacheline like the native counterpart, for most cases. [ mingo: Minor changelog edits. ] Signed-off-by: Waiman Long <longman@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Link: http://lkml.kernel.org/r/1539697507-28084-2-git-send-email-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-17locking/qspinlock_stat: Count instances of nested lock slowpathsWaiman Long
Queued spinlock supports up to 4 levels of lock slowpath nesting - user context, soft IRQ, hard IRQ and NMI. However, we are not sure how often the nesting happens. So add 3 more per-CPU stat counters to track the number of instances where nesting index goes to 1, 2 and 3 respectively. On a dual-socket 64-core 128-thread Zen server, the following were the new stat counter values under different circumstances: State slowpath index1 index2 index3 ----- -------- ------ ------ ------- After bootup 1,012,150 82 0 0 After parallel build + perf-top 125,195,009 82 0 0 So the chance of having more than 2 levels of nesting is extremely low. [ mingo: Minor changelog edits. ] Signed-off-by: Waiman Long <longman@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Link: http://lkml.kernel.org/r/1539697507-28084-1-git-send-email-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>