summaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)Author
2019-05-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Out of bounds access in xfrm IPSEC policy unlink, from Yue Haibing. 2) Missing length check for esp4 UDP encap, from Sabrina Dubroca. 3) Fix byte order of RX STBC access in mac80211, from Johannes Berg. 4) Inifnite loop in bpftool map create, from Alban Crequy. 5) Register mark fix in ebpf verifier after pkt/null checks, from Paul Chaignon. 6) Properly use rcu_dereference_sk_user_data in L2TP code, from Eric Dumazet. 7) Buffer overrun in marvell phy driver, from Andrew Lunn. 8) Several crash and statistics handling fixes to bnxt_en driver, from Michael Chan and Vasundhara Volam. 9) Several fixes to the TLS layer from Jakub Kicinski (copying negative amounts of data in reencrypt, reencrypt frag copying, blind nskb->sk NULL deref, etc). 10) Several UDP GRO fixes, from Paolo Abeni and Eric Dumazet. 11) PID/UID checks on ipv6 flow labels are inverted, from Willem de Bruijn. 12) Use after free in l2tp, from Eric Dumazet. 13) IPV6 route destroy races, also from Eric Dumazet. 14) SCTP state machine can erroneously run recursively, fix from Xin Long. 15) Adjust AF_PACKET msg_name length checks, add padding bytes if necessary. From Willem de Bruijn. 16) Preserve skb_iif, so that forwarded packets have consistent values even if fragmentation is involved. From Shmulik Ladkani. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (69 commits) udp: fix GRO packet of death ipv6: A few fixes on dereferencing rt->from rds: ib: force endiannes annotation selftests: fib_rule_tests: print the result and return 1 if any tests failed ipv4: ip_do_fragment: Preserve skb_iif during fragmentation net/tls: avoid NULL pointer deref on nskb->sk in fallback selftests: fib_rule_tests: Fix icmp proto with ipv6 packet: validate msg_namelen in send directly packet: in recvmsg msg_name return at least sizeof sockaddr_ll sctp: avoid running the sctp state machine recursively stmmac: pci: Fix typo in IOT2000 comment Documentation: fix netdev-FAQ.rst markup warning ipv6: fix races in ip6_dst_destroy() l2ip: fix possible use-after-free appletalk: Set error code if register_snap_client failed net: dsa: bcm_sf2: fix buffer overflow doing set_rxnfc rxrpc: Fix net namespace cleanup ipv6/flowlabel: wait rcu grace period before put_pid() vrf: Use orig netdev to count Ip6InNoRoutes and a fresh route lookup when sending dest unreach tcp: add sanity tests in tcp_add_backlog() ...
2019-05-02perf/x86/amd: Update generic hardware cache events for Family 17hKim Phillips
Add a new amd_hw_cache_event_ids_f17h assignment structure set for AMD families 17h and above, since a lot has changed. Specifically: L1 Data Cache The data cache access counter remains the same on Family 17h. For DC misses, PMCx041's definition changes with Family 17h, so instead we use the L2 cache accesses from L1 data cache misses counter (PMCx060,umask=0xc8). For DC hardware prefetch events, Family 17h breaks compatibility for PMCx067 "Data Prefetcher", so instead, we use PMCx05a "Hardware Prefetch DC Fills." L1 Instruction Cache PMCs 0x80 and 0x81 (32-byte IC fetches and misses) are backward compatible on Family 17h. For prefetches, we remove the erroneous PMCx04B assignment which counts how many software data cache prefetch load instructions were dispatched. LL - Last Level Cache Removing PMCs 7D, 7E, and 7F assignments, as they do not exist on Family 17h, where the last level cache is L3. L3 counters can be accessed using the existing AMD Uncore driver. Data TLB On Intel machines, data TLB accesses ("dTLB-loads") are assigned to counters that count load/store instructions retired. This is inconsistent with instruction TLB accesses, where Intel implementations report iTLB misses that hit in the STLB. Ideally, dTLB-loads would count higher level dTLB misses that hit in lower level TLBs, and dTLB-load-misses would report those that also missed in those lower-level TLBs, therefore causing a page table walk. That would be consistent with instruction TLB operation, remove the redundancy between dTLB-loads and L1-dcache-loads, and prevent perf from producing artificially low percentage ratios, i.e. the "0.01%" below: 42,550,869 L1-dcache-loads 41,591,860 dTLB-loads 4,802 dTLB-load-misses # 0.01% of all dTLB cache hits 7,283,682 L1-dcache-stores 7,912,392 dTLB-stores 310 dTLB-store-misses On AMD Families prior to 17h, the "Data Cache Accesses" counter is used, which is slightly better than load/store instructions retired, but still counts in terms of individual load/store operations instead of TLB operations. So, for AMD Families 17h and higher, this patch assigns "dTLB-loads" to a counter for L1 dTLB misses that hit in the L2 dTLB, and "dTLB-load-misses" to a counter for L1 DTLB misses that caused L2 DTLB misses and therefore also caused page table walks. This results in a much more accurate view of data TLB performance: 60,961,781 L1-dcache-loads 4,601 dTLB-loads 963 dTLB-load-misses # 20.93% of all dTLB cache hits Note that for all AMD families, data loads and stores are combined in a single accesses counter, so no 'L1-dcache-stores' are reported separately, and stores are counted with loads in 'L1-dcache-loads'. Also note that the "% of all dTLB cache hits" string is misleading because (a) "dTLB cache": although TLBs can be considered caches for page tables, in this context, it can be misinterpreted as data cache hits because the figures are similar (at least on Intel), and (b) not all those loads (technically accesses) technically "hit" at that hardware level. "% of all dTLB accesses" would be more clear/accurate. Instruction TLB On Intel machines, 'iTLB-loads' measure iTLB misses that hit in the STLB, and 'iTLB-load-misses' measure iTLB misses that also missed in the STLB and completed a page table walk. For AMD Family 17h and above, for 'iTLB-loads' we replace the erroneous instruction cache fetches counter with PMCx084 "L1 ITLB Miss, L2 ITLB Hit". For 'iTLB-load-misses' we still use PMCx085 "L1 ITLB Miss, L2 ITLB Miss", but set a 0xff umask because without it the event does not get counted. Branch Predictor (BPU) PMCs 0xc2 and 0xc3 continue to be valid across all AMD Families. Node Level Events Family 17h does not have a PMCx0e9 counter, and corresponding counters have not been made available publicly, so for now, we mark them as unsupported for Families 17h and above. Reference: "Open-Source Register Reference For AMD Family 17h Processors Models 00h-2Fh" Released 7/17/2018, Publication #56255, Revision 3.03: https://www.amd.com/system/files/TechDocs/56255_OSRR.pdf [ mingo: tidied up the line breaks. ] Signed-off-by: Kim Phillips <kim.phillips@amd.com> Cc: <stable@vger.kernel.org> # v4.9+ Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Liška <mliska@suse.cz> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Pu Wen <puwen@hygon.cn> Cc: Stephane Eranian <eranian@google.com> Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Lendacky <Thomas.Lendacky@amd.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-kernel@vger.kernel.org Cc: linux-perf-users@vger.kernel.org Fixes: e40ed1542dd7 ("perf/x86: Add perf support for AMD family-17h processors") Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-05-02s390: simplify disabled_waitMartin Schwidefsky
The disabled_wait() function uses its argument as the PSW address when it stops the CPU with a wait PSW that is disabled for interrupts. The different callers sometimes use a specific number like 0xdeadbeef to indicate a specific failure, the early boot code uses 0 and some other calls sites use __builtin_return_address(0). At the time a dump is created the current PSW and the registers of a CPU are written to lowcore to make them avaiable to the dump analysis tool. For a CPU stopped with disabled_wait the PSW and the registers do not really make sense together, the PSW address does not point to the function the registers belong to. Simplify disabled_wait() by using _THIS_IP_ for the PSW address and drop the argument to the function. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2019-05-02s390/ftrace: use HAVE_FUNCTION_GRAPH_RET_ADDR_PTRMartin Schwidefsky
Make the call chain more reliable by tagging the ftrace stack entries with the stack pointer that is associated with the return address. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2019-05-02s390/unwind: introduce stack unwind APIMartin Schwidefsky
Rework the dump_trace() stack unwinder interface to support different unwinding algorithms. The new interface looks like this: struct unwind_state state; unwind_for_each_frame(&state, task, regs, start_stack) do_something(state.sp, state.ip, state.reliable); The unwind_bc.c file contains the implementation for the classic back-chain unwinder. One positive side effect of the new code is it now handles ftraced functions gracefully. It prints the real name of the return function instead of 'return_to_handler'. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2019-05-02s390/opcodes: add missing instructions to the disassemblerMartin Schwidefsky
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2019-05-02s390/bug: add entry size to the __bug_table sectionMartin Schwidefsky
Change the __EMIT_BUG inline assembly to emit mergeable __bug_table entries with type @progbits and specify the size of each entry. The entry size is encoded sh_entsize field of the section definition, it allows to identify which struct bug_entry to use to decode the entries. This will be needed for the objtool support. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2019-05-02s390: use proper expoline sections for .dma codeMartin Schwidefsky
The text_dma.S code uses its own macro to generate an inline version of an expoline. To make it easier to identify all expolines in the kernel use a thunk and a branch to the thunk just like the rest of the kernel code does it. The name of the text_dma.S expoline thunk is __dma__s390_indirect_jump_r14 and the section is named .dma.text.__s390_indirect_jump_r14. This will be needed for the objtool support. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2019-05-02s390/nospec: rename assembler generated expoline thunksMartin Schwidefsky
The assembler version of the expoline thunk use the naming __s390x_indirect_jump_rxuse_ry while the compiler generates names like __s390_indirect_jump_rx_use_ry. Make the naming more consistent. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2019-05-02s390: add missing ENDPROC statements to assembler functionsMartin Schwidefsky
The assembler code in arch/s390 misses proper ENDPROC statements to properly end functions in a few places. Add them. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2019-05-02powerpc/32s: Fix BATs setting with CONFIG_STRICT_KERNEL_RWXChristophe Leroy
Serge reported some crashes with CONFIG_STRICT_KERNEL_RWX enabled on a book3s32 machine. Analysis shows two issues: - BATs addresses and sizes are not properly aligned. - There is a gap between the last address covered by BATs and the first address covered by pages. Memory mapped with DBATs: 0: 0xc0000000-0xc07fffff 0x00000000 Kernel RO coherent 1: 0xc0800000-0xc0bfffff 0x00800000 Kernel RO coherent 2: 0xc0c00000-0xc13fffff 0x00c00000 Kernel RW coherent 3: 0xc1400000-0xc23fffff 0x01400000 Kernel RW coherent 4: 0xc2400000-0xc43fffff 0x02400000 Kernel RW coherent 5: 0xc4400000-0xc83fffff 0x04400000 Kernel RW coherent 6: 0xc8400000-0xd03fffff 0x08400000 Kernel RW coherent 7: 0xd0400000-0xe03fffff 0x10400000 Kernel RW coherent Memory mapped with pages: 0xe1000000-0xefffffff 0x21000000 240M rw present dirty accessed This patch fixes both issues. With the patch, we get the following which is as expected: Memory mapped with DBATs: 0: 0xc0000000-0xc07fffff 0x00000000 Kernel RO coherent 1: 0xc0800000-0xc0bfffff 0x00800000 Kernel RO coherent 2: 0xc0c00000-0xc0ffffff 0x00c00000 Kernel RW coherent 3: 0xc1000000-0xc1ffffff 0x01000000 Kernel RW coherent 4: 0xc2000000-0xc3ffffff 0x02000000 Kernel RW coherent 5: 0xc4000000-0xc7ffffff 0x04000000 Kernel RW coherent 6: 0xc8000000-0xcfffffff 0x08000000 Kernel RW coherent 7: 0xd0000000-0xdfffffff 0x10000000 Kernel RW coherent Memory mapped with pages: 0xe0000000-0xefffffff 0x20000000 256M rw present dirty accessed Fixes: 63b2bc619565 ("powerpc/mm/32s: Use BATs for STRICT_KERNEL_RWX") Reported-by: Serge Belyshev <belyshev@depni.sinp.msu.ru> Acked-by: Segher Boessenkool <segher@kernel.crashing.org> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2019-05-02Merge branch 'spi-5.2' into spi-nextMark Brown
2019-05-02spi: ep93xx: Convert to use CS GPIO descriptorsLinus Walleij
This converts the EP93xx SPI master driver to use GPIO descriptors for chip select handling. EP93xx was using platform data to pass in GPIO lines, by converting all board files to use GPIO descriptor tables the core will look up the GPIO lines from the SPI device in the same manner as for device tree. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Mark Brown <broonie@kernel.org>
2019-05-01Merge tag 'arc-5.1-final' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc Pull ARC fixes from Vineet Gupta: "A few minor fixes for ARC. - regression in memset if line size !64 - avoid panic if PAE and IOC" * tag 'arc-5.1-final' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: ARC: memset: fix build with L1_CACHE_SHIFT != 6 ARC: [hsdk] Make it easier to add PAE40 region to DTB ARC: PAE40: don't panic and instead turn off hw ioc
2019-05-01gcc-9: properly declare the {pv,hv}clock_page storageLinus Torvalds
The pvlock_page and hvclock_page variables are (as the name implies) addresses to pages, created by the linker script. But we declared them as just "extern u8" variables, which _works_, but now that gcc does some more bounds checking, it causes warnings like warning: array subscript 1 is outside array bounds of ‘u8[1]’ when we then access more than one byte from those variables. Fix this by simply making the declaration of the variables match reality, which makes the compiler happy too. Signed-off-by: Linus Torvalds <torvalds@-linux-foundation.org>
2019-05-01Merge branch 'for-next/timers' of ↵Will Deacon
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux into for-next/core Conflicts: arch/arm64/Kconfig arch/arm64/include/asm/arch_timer.h
2019-05-01Merge branch 'for-next/mitigations' of ↵Will Deacon
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux into for-next/core
2019-05-01Merge branch 'for-next/futex' of ↵Will Deacon
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux into for-next/core
2019-05-01arm64/speculation: Support 'mitigations=' cmdline optionJosh Poimboeuf
Configure arm64 runtime CPU speculation bug mitigations in accordance with the 'mitigations=' cmdline option. This affects Meltdown, Spectre v2, and Speculative Store Bypass. The default behavior is unchanged. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> [will: reorder checks so KASLR implies KPTI and SSBS is affected by cmdline] Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-05-01arm64: ssbs: Don't treat CPUs with SSBS as unaffected by SSBWill Deacon
SSBS provides a relatively cheap mitigation for SSB, but it is still a mitigation and its presence does not indicate that the CPU is unaffected by the vulnerability. Tweak the mitigation logic so that we report the correct string in sysfs. Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-05-01arm64: enable generic CPU vulnerabilites supportMian Yousaf Kaukab
Enable CPU vulnerabilty show functions for spectre_v1, spectre_v2, meltdown and store-bypass. Signed-off-by: Mian Yousaf Kaukab <ykaukab@suse.de> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Stefan Wahren <stefan.wahren@i2se.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-05-01arm64: add sysfs vulnerability show for speculative store bypassJeremy Linton
Return status based on ssbd_state and __ssb_safe. If the mitigation is disabled, or the firmware isn't responding then return the expected machine state based on a whitelist of known good cores. Given a heterogeneous machine, the overall machine vulnerability defaults to safe but is reset to unsafe when we miss the whitelist and the firmware doesn't explicitly tell us the core is safe. In order to make that work we delay transitioning to vulnerable until we know the firmware isn't responding to avoid a case where we miss the whitelist, but the firmware goes ahead and reports the core is not vulnerable. If all the cores in the machine have SSBS, then __ssb_safe will remain true. Tested-by: Stefan Wahren <stefan.wahren@i2se.com> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-05-01arm64: Fix size of __early_cpu_boot_statusArun KS
__early_cpu_boot_status is of type long. Use quad assembler directive to allocate proper size. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Arun KS <arunks@codeaurora.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-05-01KVM: nVMX: Fix size checks in vmx_set_nested_stateJim Mattson
The size checks in vmx_nested_state are wrong because the calculations are made based on the size of a pointer to a struct kvm_nested_state rather than the size of a struct kvm_nested_state. Reported-by: Felix Wilhelm <fwilhelm@google.com> Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Drew Schmitt <dasch@google.com> Reviewed-by: Marc Orr <marcorr@google.com> Reviewed-by: Peter Shier <pshier@google.com> Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com> Fixes: 8fcc4b5923af5de58b80b53a069453b135693304 Cc: stable@ver.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-30KVM: lapic: Check for in-kernel LAPIC before deferencing apic pointerSean Christopherson
...to avoid dereferencing a null pointer when querying the per-vCPU timer advance. Fixes: 39497d7660d98 ("KVM: lapic: Track lapic timer advance per vCPU") Reported-by: syzbot+f7e65445a40d3e0e4ebf@syzkaller.appspotmail.com Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-30x86/kvm/mmu: reset MMU context when 32-bit guest switches PAEVitaly Kuznetsov
Commit 47c42e6b4192 ("KVM: x86: fix handling of role.cr4_pae and rename it to 'gpte_size'") introduced a regression: 32-bit PAE guests stopped working. The issue appears to be: when guest switches (enables) PAE we need to re-initialize MMU context (set context->root_level, do reset_rsvds_bits_mask(), ...) but init_kvm_tdp_mmu() doesn't do that because we threw away is_pae(vcpu) flag from mmu role. Restore it to kvm_mmu_extended_role (as we now don't need it in base role) to fix the issue. Fixes: 47c42e6b4192 ("KVM: x86: fix handling of role.cr4_pae and rename it to 'gpte_size'") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-30KVM: x86: Whitelist port 0x7e for pre-incrementing %ripSean Christopherson
KVM's recent bug fix to update %rip after emulating I/O broke userspace that relied on the previous behavior of incrementing %rip prior to exiting to userspace. When running a Windows XP guest on AMD hardware, Qemu may patch "OUT 0x7E" instructions in reaction to the OUT itself. Because KVM's old behavior was to increment %rip before exiting to userspace to handle the I/O, Qemu manually adjusted %rip to account for the OUT instruction. Arguably this is a userspace bug as KVM requires userspace to re-enter the kernel to complete instruction emulation before taking any other actions. That being said, this is a bit of a grey area and breaking userspace that has worked for many years is bad. Pre-increment %rip on OUT to port 0x7e before exiting to userspace to hack around the issue. Fixes: 45def77ebf79e ("KVM: x86: update %rip after emulating IO") Reported-by: Simon Becherer <simon@becherer.de> Reported-and-tested-by: Iakov Karpov <srid@rkmail.ru> Reported-by: Gabriele Balducci <balducci@units.it> Reported-by: Antti Antinoja <reader@fennosys.fi> Cc: stable@vger.kernel.org Cc: Takashi Iwai <tiwai@suse.com> Cc: Jiri Slaby <jslaby@suse.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-04-30x86/mm/mem_encrypt: Disable all instrumentation for early SME setupGary Hook
Enablement of AMD's Secure Memory Encryption feature is determined very early after start_kernel() is entered. Part of this procedure involves scanning the command line for the parameter 'mem_encrypt'. To determine intended state, the function sme_enable() uses library functions cmdline_find_option() and strncmp(). Their use occurs early enough such that it cannot be assumed that any instrumentation subsystem is initialized. For example, making calls to a KASAN-instrumented function before KASAN is set up will result in the use of uninitialized memory and a boot failure. When AMD's SME support is enabled, conditionally disable instrumentation of these dependent functions in lib/string.c and arch/x86/lib/cmdline.c. [ bp: Get rid of intermediary nostackp var and cleanup whitespace. ] Fixes: aca20d546214 ("x86/mm: Add support to make use of Secure Memory Encryption") Reported-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Gary R Hook <gary.hook@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Boris Brezillon <bbrezillon@kernel.org> Cc: Coly Li <colyli@suse.de> Cc: "dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: "luto@kernel.org" <luto@kernel.org> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: "mingo@redhat.com" <mingo@redhat.com> Cc: "peterz@infradead.org" <peterz@infradead.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/155657657552.7116.18363762932464011367.stgit@sosrh3.amd.com
2019-04-30clocksource/arm_arch_timer: Use arch_timer_read_counter to access stable ↵Marc Zyngier
counters Instead of always going via arch_counter_get_cntvct_stable to access the counter workaround, let's have arch_timer_read_counter point to the right method. For that, we need to track whether any CPU in the system has a workaround for the counter. This is done by having an atomic variable tracking this. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-04-30clocksource/arm_arch_timer: Remove use of workaround static keyMarc Zyngier
The use of a static key in a hotplug path has proved to be a real nightmare, and makes it impossible to have scream-free lockdep kernel. Let's remove the static key altogether, and focus on something saner. Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-04-30clocksource/arm_arch_timer: Drop use of static key in arch_timer_reg_read_stableMarc Zyngier
Let's start with the removal of the arch_timer_read_ool_enabled static key in arch_timer_reg_read_stable. It is not a fast path, and we can simplify things a bit. Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-04-30clocksource/arm_arch_timer: Direcly assign set_next_event workaroundMarc Zyngier
When a given timer is affected by an erratum and requires an alternative implementation of set_next_event, we do a rather complicated dance to detect and call the workaround on each set_next_event call. This is clearly idiotic, as we can perfectly detect whether this CPU requires a workaround while setting up the clock event device. This only requires the CPU-specific detection to be done a bit earlier, and we can then safely override the set_next_event pointer if we have a workaround associated to that CPU. Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by; Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-04-30arm64: Use arch_timer_read_counter instead of arch_counter_get_cntvctMarc Zyngier
Only arch_timer_read_counter will guarantee that workarounds are applied. So let's use this one instead of arch_counter_get_cntvct. Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-04-30ARM: vdso: Remove dependency with the arch_timer driver internalsMarc Zyngier
The VDSO code uses the kernel helper that was originally designed to abstract the access between 32 and 64bit systems. It worked so far because this function is declared as 'inline'. As we're about to revamp that part of the code, the VDSO would break. Let's fix it by doing what should have been done from the start, a proper system register access. Reviewed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-04-30arm64: Apply ARM64_ERRATUM_1188873 to Neoverse-N1Marc Zyngier
Neoverse-N1 is also affected by ARM64_ERRATUM_1188873, so let's add it to the list of affected CPUs. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> [will: Update silicon-errata.txt] Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-04-30arm64: Add part number for Neoverse N1Marc Zyngier
New CPU, new part number. You know the drill. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-04-30arm64: Make ARM64_ERRATUM_1188873 depend on COMPATMarc Zyngier
Since ARM64_ERRATUM_1188873 only affects AArch32 EL0, it makes some sense that it should depend on COMPAT. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-04-30arm64: Restrict ARM64_ERRATUM_1188873 mitigation to AArch32Marc Zyngier
We currently deal with ARM64_ERRATUM_1188873 by always trapping EL0 accesses for both instruction sets. Although nothing wrong comes out of that, people trying to squeeze the last drop of performance from buggy HW find this over the top. Oh well. Let's change the mitigation by flipping the counter enable bit on return to userspace. Non-broken HW gets an extra branch on the fast path, which is hopefully not the end of the world. The arch timer workaround is also removed. Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-04-30arm64: mm: Remove pte_unmap_nested()Qian Cai
As of commit ece0e2b6406a ("mm: remove pte_*map_nested()"), pte_unmap_nested() is no longer used and can be removed from the arm64 code. Signed-off-by: Qian Cai <cai@lca.pw> [will: also remove pte_offset_map_nested()] Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-04-30arm64: Fix compiler warning from pte_unmap() with -Wunused-but-set-variableQian Cai
When building with -Wunused-but-set-variable, the compiler shouts about a number of pte_unmap() users, since this expands to an empty macro on arm64: | mm/gup.c: In function 'gup_pte_range': | mm/gup.c:1727:16: warning: variable 'ptem' set but not used | [-Wunused-but-set-variable] | mm/gup.c: At top level: | mm/memory.c: In function 'copy_pte_range': | mm/memory.c:821:24: warning: variable 'orig_dst_pte' set but not used | [-Wunused-but-set-variable] | mm/memory.c:821:9: warning: variable 'orig_src_pte' set but not used | [-Wunused-but-set-variable] | mm/swap_state.c: In function 'swap_ra_info': | mm/swap_state.c:641:15: warning: variable 'orig_pte' set but not used | [-Wunused-but-set-variable] | mm/madvise.c: In function 'madvise_free_pte_range': | mm/madvise.c:318:9: warning: variable 'orig_pte' set but not used | [-Wunused-but-set-variable] Rewrite pte_unmap() as a static inline function, which silences the warnings. Signed-off-by: Qian Cai <cai@lca.pw> Signed-off-by: Will Deacon <will.deacon@arm.com>
2019-04-30x86/alternatives: Add comment about module removal racesNadav Amit
Add a comment to clarify that users of text_poke() must ensure that no races with module removal take place. Signed-off-by: Nadav Amit <namit@vmware.com> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <akpm@linux-foundation.org> Cc: <ard.biesheuvel@linaro.org> Cc: <deneen.t.dock@intel.com> Cc: <kernel-hardening@lists.openwall.com> Cc: <kristen@linux.intel.com> Cc: <linux_dti@icloud.com> Cc: <will.deacon@arm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190426001143.4983-22-namit@vmware.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-30x86/kprobes: Use vmalloc special flagRick Edgecombe
Use new flag VM_FLUSH_RESET_PERMS for handling freeing of special permissioned memory in vmalloc and remove places where memory was set NX and RW before freeing which is no longer needed. Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <akpm@linux-foundation.org> Cc: <ard.biesheuvel@linaro.org> Cc: <deneen.t.dock@intel.com> Cc: <kernel-hardening@lists.openwall.com> Cc: <kristen@linux.intel.com> Cc: <linux_dti@icloud.com> Cc: <will.deacon@arm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190426001143.4983-21-namit@vmware.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-30x86/ftrace: Use vmalloc special flagRick Edgecombe
Use new flag VM_FLUSH_RESET_PERMS for handling freeing of special permissioned memory in vmalloc and remove places where memory was set NX and RW before freeing which is no longer needed. Tested-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: <akpm@linux-foundation.org> Cc: <ard.biesheuvel@linaro.org> Cc: <deneen.t.dock@intel.com> Cc: <kernel-hardening@lists.openwall.com> Cc: <kristen@linux.intel.com> Cc: <linux_dti@icloud.com> Cc: <will.deacon@arm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190426001143.4983-20-namit@vmware.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-30mm/hibernation: Make hibernation handle unmapped pagesRick Edgecombe
Make hibernate handle unmapped pages on the direct map when CONFIG_ARCH_HAS_SET_ALIAS=y is set. These functions allow for setting pages to invalid configurations, so now hibernate should check if the pages have valid mappings and handle if they are unmapped when doing a hibernate save operation. Previously this checking was already done when CONFIG_DEBUG_PAGEALLOC=y was configured. It does not appear to have a big hibernating performance impact. The speed of the saving operation before this change was measured as 819.02 MB/s, and after was measured at 813.32 MB/s. Before: [ 4.670938] PM: Wrote 171996 kbytes in 0.21 seconds (819.02 MB/s) After: [ 4.504714] PM: Wrote 178932 kbytes in 0.22 seconds (813.32 MB/s) Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: <akpm@linux-foundation.org> Cc: <ard.biesheuvel@linaro.org> Cc: <deneen.t.dock@intel.com> Cc: <kernel-hardening@lists.openwall.com> Cc: <kristen@linux.intel.com> Cc: <linux_dti@icloud.com> Cc: <will.deacon@arm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190426001143.4983-16-namit@vmware.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-30x86/mm/cpa: Add set_direct_map_*() functionsRick Edgecombe
Add two new functions set_direct_map_default_noflush() and set_direct_map_invalid_noflush() for setting the direct map alias for the page to its default valid permissions and to an invalid state that cannot be cached in a TLB, respectively. These functions do not flush the TLB. Note, __kernel_map_pages() does something similar but flushes the TLB and doesn't reset the permission bits to default on all architectures. Also add an ARCH config ARCH_HAS_SET_DIRECT_MAP for specifying whether these have an actual implementation or a default empty one. Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <akpm@linux-foundation.org> Cc: <ard.biesheuvel@linaro.org> Cc: <deneen.t.dock@intel.com> Cc: <kernel-hardening@lists.openwall.com> Cc: <kristen@linux.intel.com> Cc: <linux_dti@icloud.com> Cc: <will.deacon@arm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190426001143.4983-15-namit@vmware.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-30x86/alternatives: Remove the return value of text_poke_*()Nadav Amit
The return value of text_poke_early() and text_poke_bp() is useless. Remove it. Signed-off-by: Nadav Amit <namit@vmware.com> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <akpm@linux-foundation.org> Cc: <ard.biesheuvel@linaro.org> Cc: <deneen.t.dock@intel.com> Cc: <kernel-hardening@lists.openwall.com> Cc: <kristen@linux.intel.com> Cc: <linux_dti@icloud.com> Cc: <will.deacon@arm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190426001143.4983-14-namit@vmware.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-30x86/jump-label: Remove support for custom text pokerNadav Amit
There are only two types of text poking: early and breakpoint based. The use of a function pointer to perform text poking complicates the code and is probably inefficient due to the use of indirect branches. Signed-off-by: Nadav Amit <namit@vmware.com> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <akpm@linux-foundation.org> Cc: <ard.biesheuvel@linaro.org> Cc: <deneen.t.dock@intel.com> Cc: <kernel-hardening@lists.openwall.com> Cc: <kristen@linux.intel.com> Cc: <linux_dti@icloud.com> Cc: <will.deacon@arm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190426001143.4983-13-namit@vmware.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-30x86/modules: Avoid breaking W^X while loading modulesNadav Amit
When modules and BPF filters are loaded, there is a time window in which some memory is both writable and executable. An attacker that has already found another vulnerability (e.g., a dangling pointer) might be able to exploit this behavior to overwrite kernel code. Prevent having writable executable PTEs in this stage. In addition, avoiding having W+X mappings can also slightly simplify the patching of modules code on initialization (e.g., by alternatives and static-key), as would be done in the next patch. This was actually the main motivation for this patch. To avoid having W+X mappings, set them initially as RW (NX) and after they are set as RO set them as X as well. Setting them as executable is done as a separate step to avoid one core in which the old PTE is cached (hence writable), and another which sees the updated PTE (executable), which would break the W^X protection. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Suggested-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Nadav Amit <namit@vmware.com> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <akpm@linux-foundation.org> Cc: <ard.biesheuvel@linaro.org> Cc: <deneen.t.dock@intel.com> Cc: <kernel-hardening@lists.openwall.com> Cc: <kristen@linux.intel.com> Cc: <linux_dti@icloud.com> Cc: <will.deacon@arm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jessica Yu <jeyu@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Rik van Riel <riel@surriel.com> Link: https://lkml.kernel.org/r/20190426001143.4983-12-namit@vmware.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-30x86/kprobes: Set instruction page as executableNadav Amit
Set the page as executable after allocation. This patch is a preparatory patch for a following patch that makes module allocated pages non-executable. While at it, do some small cleanup of what appears to be unnecessary masking. Signed-off-by: Nadav Amit <namit@vmware.com> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <akpm@linux-foundation.org> Cc: <ard.biesheuvel@linaro.org> Cc: <deneen.t.dock@intel.com> Cc: <kernel-hardening@lists.openwall.com> Cc: <kristen@linux.intel.com> Cc: <linux_dti@icloud.com> Cc: <will.deacon@arm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190426001143.4983-11-namit@vmware.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-30x86/ftrace: Set trampoline pages as executableNadav Amit
Since alloc_module() will not set the pages as executable soon, set ftrace trampoline pages as executable after they are allocated. For the time being, do not change ftrace to use the text_poke() interface. As a result, ftrace still breaks W^X. Signed-off-by: Nadav Amit <namit@vmware.com> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: <akpm@linux-foundation.org> Cc: <ard.biesheuvel@linaro.org> Cc: <deneen.t.dock@intel.com> Cc: <kernel-hardening@lists.openwall.com> Cc: <kristen@linux.intel.com> Cc: <linux_dti@icloud.com> Cc: <will.deacon@arm.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190426001143.4983-10-namit@vmware.com Signed-off-by: Ingo Molnar <mingo@kernel.org>