summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-09-26wimax/i2400m: remove unlikely() from WARN*() conditionDenis Efremov
"unlikely(WARN_ON(x))" is excessive. WARN_ON() already uses unlikely() internally. Link: http://lkml.kernel.org/r/20190829165025.15750-6-efremov@linux.com Signed-off-by: Denis Efremov <efremov@linux.com> Cc: Inaky Perez-Gonzalez <inaky.perez-gonzalez@intel.com> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-26fs: remove unlikely() from WARN_ON() conditionDenis Efremov
"unlikely(WARN_ON(x))" is excessive. WARN_ON() already uses unlikely() internally. Link: http://lkml.kernel.org/r/20190829165025.15750-5-efremov@linux.com Signed-off-by: Denis Efremov <efremov@linux.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Joe Perches <joe@perches.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-26xen/events: remove unlikely() from WARN() conditionDenis Efremov
"unlikely(WARN(x))" is excessive. WARN() already uses unlikely() internally. Link: http://lkml.kernel.org/r/20190829165025.15750-4-efremov@linux.com Signed-off-by: Denis Efremov <efremov@linux.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Joe Perches <joe@perches.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-26Merge tag 'wireless-drivers-for-davem-2019-09-26' of ↵David S. Miller
https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers Kalle Valo says: ==================== wireless-drivers fixes for 5.4 First set of fixes for 5.4 sent during the merge window. Most are regressions fixes but the mt7615 problem has been since it was merged. iwlwifi * fix a build regression related CONFIG_THERMAL * avoid using GEO_TX_POWER_LIMIT command on certain firmware versions rtw88 * fixes for skb leaks zd1211rw * fix a compiler warning on 32 bit mt76 * fix the firmware paths for mt7615 to match with linux-firmware wil6210 * fix use of skb after free ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-26bpf: Clean up indentation issue in BTF kflag processingColin Ian King
There is a statement that is indented one level too deeply, remove the extraneous tab. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20190925093835.19515-1-colin.king@canonical.com
2019-09-26jffs2: Fix mounting under new mount APIDavid Howells
The mounting of jffs2 is broken due to the changes from the new mount API because it specifies a "source" operation, but then doesn't actually process it. But because it specified it, it doesn't return -ENOPARAM and the caller doesn't process it either and the source gets lost. Fix this by simply removing the source parameter from jffs2 and letting the VFS deal with it in the default manner. To test it, enable CONFIG_MTD_MTDRAM and allow the default size and erase block size parameters, then try and mount the /dev/mtdblock<N> file that that creates as jffs2. No need to initialise it. Fixes: ec10a24f10c8 ("vfs: Convert jffs2 to use the new mount API") Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David Howells <dhowells@redhat.com> cc: David Woodhouse <dwmw2@infradead.org> cc: Richard Weinberger <richard@nod.at> cc: linux-mtd@lists.infradead.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-09-26libbpf: Teach btf_dumper to emit stand-alone anonymous enum definitionsAndrii Nakryiko
BTF-to-C converter previously skipped anonymous enums in an assumption that those are embedded in struct's field definitions. This is not always the case and a lot of kernel constants are defined as part of anonymous enums. This change fixes the logic by eagerly marking all types as either referenced by any other type or not. This is enough to distinguish two classes of anonymous enums and emit previously omitted enum definitions. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20190925203745.3173184-1-andriin@fb.com
2019-09-26MAINTAINERS: Add myself as reviewer for the PWM subsystemUwe Kleine-König
I spend some time in the nearer past reviewing PWM patches. Honor this by adding me as a reviewer. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
2019-09-26MAINTAINERS: Add patchwork link for PWM entryUwe Kleine-König
This instance collects patches and Thierry updates the patches' status there, so I consider it used and suitable to document it officially. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
2019-09-26MAINTAINERS: Add a selection of PWM related keywords to the PWM entryUwe Kleine-König
This is just a small subset of the relevant functions, but should at least catch all new code as every consumer has to call pwm_apply_state() (or the legacy function pwm_config()) and every PWM provider has to implement pwm_ops. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
2019-09-26pwm: mediatek: Add MT7629 compatible stringSam Shih
This adds pwm support for MT7629, and separate mt7629 compatible string from mt7622 Signed-off-by: Sam Shih <sam.shih@mediatek.com> Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
2019-09-26io_uring: make CQ ring wakeups be more efficientJens Axboe
For batched IO, it's not uncommon for waiters to ask for more than 1 IO to complete before being woken up. This is a problem with wait_event() since tasks will get woken for every IO that completes, re-check condition, then go back to sleep. For batch counts on the order of what you do for high IOPS, that can result in 10s of extra wakeups for the waiting task. Add a private wake function that checks for the wake up count criteria being met before calling autoremove_wake_function(). Pavel reports that one test case he has runs 40% faster with proper batching of wakeups. Reported-by: Pavel Begunkov <asml.silence@gmail.com> Tested-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-26ipv6: do not free rt if FIB_LOOKUP_NOREF is set on suppress ruleJason A. Donenfeld
Commit 7d9e5f422150 removed references from certain dsts, but accounting for this never translated down into the fib6 suppression code. This bug was triggered by WireGuard users who use wg-quick(8), which uses the "suppress-prefix" directive to ip-rule(8) for routing all of their internet traffic without routing loops. The test case added here causes the reference underflow by causing packets to evaluate a suppress rule. Fixes: 7d9e5f422150 ("ipv6: convert major tx path to use RT6_LOOKUP_F_DST_NOREF") Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Acked-by: Wei Wang <weiwan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-26openvswitch: change type of UPCALL_PID attribute to NLA_UNSPECLi RongQing
userspace openvswitch patch "(dpif-linux: Implement the API functions to allow multiple handler threads read upcall)" changes its type from U32 to UNSPEC, but leave the kernel unchanged and after kernel 6e237d099fac "(netlink: Relax attr validation for fixed length types)", this bug is exposed by the below warning [ 57.215841] netlink: 'ovs-vswitchd': attribute type 5 has an invalid length. Fixes: 5cd667b0a456 ("openvswitch: Allow each vport to have an array of 'port_id's") Signed-off-by: Li RongQing <lirongqing@baidu.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-26dt-bindings: net: ravb: Add support for r8a774b1 SoCBiju Das
Document RZ/G2N (R8A774B1) SoC bindings. Signed-off-by: Biju Das <biju.das@bp.renesas.com> Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-26net: stmmac: Fix page pool sizeThierry Reding
The size of individual pages in the page pool in given by an order. The order is the binary logarithm of the number of pages that make up one of the pages in the pool. However, the driver currently passes the number of pages rather than the order, so it ends up wasting quite a bit of memory. Fix this by taking the binary logarithm and passing that in the order field. Fixes: 2af6106ae949 ("net: stmmac: Introducing support for Page Pool") Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-26macsec: drop skb sk before calling gro_cells_receiveXin Long
Fei Liu reported a crash when doing netperf on a topo of macsec dev over veth: [ 448.919128] refcount_t: underflow; use-after-free. [ 449.090460] Call trace: [ 449.092895] refcount_sub_and_test+0xb4/0xc0 [ 449.097155] tcp_wfree+0x2c/0x150 [ 449.100460] ip_rcv+0x1d4/0x3a8 [ 449.103591] __netif_receive_skb_core+0x554/0xae0 [ 449.108282] __netif_receive_skb+0x28/0x78 [ 449.112366] netif_receive_skb_internal+0x54/0x100 [ 449.117144] napi_gro_complete+0x70/0xc0 [ 449.121054] napi_gro_flush+0x6c/0x90 [ 449.124703] napi_complete_done+0x50/0x130 [ 449.128788] gro_cell_poll+0x8c/0xa8 [ 449.132351] net_rx_action+0x16c/0x3f8 [ 449.136088] __do_softirq+0x128/0x320 The issue was caused by skb's true_size changed without its sk's sk_wmem_alloc increased in tcp/skb_gro_receive(). Later when the skb is being freed and the skb's truesize is subtracted from its sk's sk_wmem_alloc in tcp_wfree(), underflow occurs. macsec is calling gro_cells_receive() to receive a packet, which actually requires skb->sk to be NULL. However when macsec dev is over veth, it's possible the skb->sk is still set if the skb was not unshared or expanded from the peer veth. ip_rcv() is calling skb_orphan() to drop the skb's sk for tproxy, but it is too late for macsec's calling gro_cells_receive(). So fix it by dropping the skb's sk earlier on rx path of macsec. Fixes: 5491e7c6b1a9 ("macsec: enable GRO and RPS on macsec devices") Reported-by: Xiumei Mu <xmu@redhat.com> Reported-by: Fei Liu <feliu@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-26iocost: bump up default latency targets for hard disksTejun Heo
The default hard disk param sets latency targets at 50ms. As the default target percentiles are zero, these don't directly regulate vrate; however, they're still used to calculate the period length - 100ms in this case. This is excessively low. A SATA drive with QD32 saturated with random IOs can easily reach avg completion latency of several hundred msecs. A period duration which is substantially lower than avg completion latency can lead to wildly fluctuating vrate. Let's bump up the default latency targets to 250ms so that the period duration is sufficiently long. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-26iocost: improve nr_lagging handlingTejun Heo
Some IOs may span multiple periods. As latencies are collected on completion, the inbetween periods won't register them and may incorrectly decide to increase vrate. nr_lagging tracks these IOs to avoid those situations. Currently, whenever there are IOs which are spanning from the previous period, busy_level is reset to 0 if negative thus suppressing vrate increase. This has the following two problems. * When latency target percentiles aren't set, vrate adjustment should only be governed by queue depth depletion; however, the current code keeps nr_lagging active which pulls in latency results and can keep down vrate unexpectedly. * When lagging condition is detected, it resets the entire negative busy_level. This turned out to be way too aggressive on some devices which sometimes experience extended latencies on a small subset of commands. In addition, a lagging IO will be accounted as latency target miss on completion anyway and resetting busy_level amplifies its impact unnecessarily. This patch fixes the above two problems by disabling nr_lagging counting when latency target percentiles aren't set and blocking vrate increases when there are lagging IOs while leaving busy_level as-is. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-26iocost: better trace vrate changesTejun Heo
vrate_adj tracepoint traces vrate changes; however, it does so only when busy_level is non-zero. busy_level turning to zero can sometimes be as interesting an event. This patch also enables vrate_adj tracepoint on other vrate related events - busy_level changes and non-zero nr_lagging. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-26Merge tag 'mlx5-fixes-2019-09-24' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== Mellanox, mlx5 fixes 2019-09-24 This series introduces some fixes to mlx5 driver. For more information please see tag log below. Please pull and let me know if there is any problem. For -stable v4.20: ('net/mlx5e: Fix traffic duplication in ethtool steering') For -stable v4.19: ('net/mlx5: Add device ID of upcoming BlueField-2') For -stable v5.3: ('net/mlx5e: Fix matching on tunnel addresses type') ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-26smb3: pass mode bits into create callsSteve French
We need to populate an ACL (security descriptor open context) on file and directory correct. This patch passes in the mode. Followon patch will build the open context and the security descriptor (from the mode) that goes in the open context. Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2019-09-26net: print proper warning on dst underflowJason A. Donenfeld
Proper warnings with stack traces make it much easier to figure out what's doing the double free and create more meaningful bug reports from users. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-26net/sched: cbs: Fix not adding cbs instance to listVinicius Costa Gomes
When removing a cbs instance when offloading is enabled, the crash below can be observed. The problem happens because that when offloading is enabled, the cbs instance is not added to the list. Also, the current code doesn't handle correctly the case when offload is disabled without removing the qdisc: if the link speed changes the credit calculations will be wrong. When we create the cbs instance with offloading enabled, it's not added to the notification list, when later we disable offloading, it's not in the list, so link speed changes will not affect it. The solution for both issues is the same, add the cbs instance being created unconditionally to the global list, even if the link state notification isn't useful "right now". Crash log: [518758.189866] BUG: kernel NULL pointer dereference, address: 0000000000000000 [518758.189870] #PF: supervisor read access in kernel mode [518758.189871] #PF: error_code(0x0000) - not-present page [518758.189872] PGD 0 P4D 0 [518758.189874] Oops: 0000 [#1] SMP PTI [518758.189876] CPU: 3 PID: 4825 Comm: tc Not tainted 5.2.9 #1 [518758.189877] Hardware name: Gigabyte Technology Co., Ltd. Z390 AORUS ULTRA/Z390 AORUS ULTRA-CF, BIOS F7 03/14/2019 [518758.189881] RIP: 0010:__list_del_entry_valid+0x29/0xa0 [518758.189883] Code: 90 48 b8 00 01 00 00 00 00 ad de 55 48 8b 17 4c 8b 47 08 48 89 e5 48 39 c2 74 27 48 b8 00 02 00 00 00 00 ad de 49 39 c0 74 2d <49> 8b 30 48 39 fe 75 3d 48 8b 52 08 48 39 f2 75 4c b8 01 00 00 00 [518758.189885] RSP: 0018:ffffa27e43903990 EFLAGS: 00010207 [518758.189887] RAX: dead000000000200 RBX: ffff8bce69f0f000 RCX: 0000000000000000 [518758.189888] RDX: 0000000000000000 RSI: ffff8bce69f0f064 RDI: ffff8bce69f0f1e0 [518758.189890] RBP: ffffa27e43903990 R08: 0000000000000000 R09: ffff8bce69e788c0 [518758.189891] R10: ffff8bce62acd400 R11: 00000000000003cb R12: ffff8bce69e78000 [518758.189892] R13: ffff8bce69f0f140 R14: 0000000000000000 R15: 0000000000000000 [518758.189894] FS: 00007fa1572c8f80(0000) GS:ffff8bce6e0c0000(0000) knlGS:0000000000000000 [518758.189895] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [518758.189896] CR2: 0000000000000000 CR3: 000000040a398006 CR4: 00000000003606e0 [518758.189898] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [518758.189899] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [518758.189900] Call Trace: [518758.189904] cbs_destroy+0x32/0xa0 [sch_cbs] [518758.189906] qdisc_destroy+0x45/0x120 [518758.189907] qdisc_put+0x25/0x30 [518758.189908] qdisc_graft+0x2c1/0x450 [518758.189910] tc_get_qdisc+0x1c8/0x310 [518758.189912] ? get_page_from_freelist+0x91a/0xcb0 [518758.189914] rtnetlink_rcv_msg+0x293/0x360 [518758.189916] ? kmem_cache_alloc_node_trace+0x178/0x260 [518758.189918] ? __kmalloc_node_track_caller+0x38/0x50 [518758.189920] ? rtnl_calcit.isra.0+0xf0/0xf0 [518758.189922] netlink_rcv_skb+0x48/0x110 [518758.189923] rtnetlink_rcv+0x10/0x20 [518758.189925] netlink_unicast+0x15b/0x1d0 [518758.189926] netlink_sendmsg+0x1ea/0x380 [518758.189929] sock_sendmsg+0x2f/0x40 [518758.189930] ___sys_sendmsg+0x295/0x2f0 [518758.189932] ? ___sys_recvmsg+0x151/0x1e0 [518758.189933] ? do_wp_page+0x7e/0x450 [518758.189935] __sys_sendmsg+0x48/0x80 [518758.189937] __x64_sys_sendmsg+0x1a/0x20 [518758.189939] do_syscall_64+0x53/0x1f0 [518758.189941] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [518758.189942] RIP: 0033:0x7fa15755169a [518758.189944] Code: 48 c7 c0 ff ff ff ff eb be 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 18 b8 2e 00 00 00 c5 fc 77 0f 05 <48> 3d 00 f0 ff ff 77 5e c3 0f 1f 44 00 00 48 83 ec 28 89 54 24 1c [518758.189946] RSP: 002b:00007ffda58b60b8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [518758.189948] RAX: ffffffffffffffda RBX: 000055e4b836d9a0 RCX: 00007fa15755169a [518758.189949] RDX: 0000000000000000 RSI: 00007ffda58b6128 RDI: 0000000000000003 [518758.189951] RBP: 00007ffda58b6190 R08: 0000000000000001 R09: 000055e4b9d848a0 [518758.189952] R10: 0000000000000000 R11: 0000000000000246 R12: 000000005d654b49 [518758.189953] R13: 0000000000000000 R14: 00007ffda58b6230 R15: 00007ffda58b6210 [518758.189955] Modules linked in: sch_cbs sch_etf sch_mqprio netlink_diag unix_diag e1000e igb intel_pch_thermal thermal video backlight pcc_cpufreq [518758.189960] CR2: 0000000000000000 [518758.189961] ---[ end trace 6a13f7aaf5376019 ]--- [518758.189963] RIP: 0010:__list_del_entry_valid+0x29/0xa0 [518758.189964] Code: 90 48 b8 00 01 00 00 00 00 ad de 55 48 8b 17 4c 8b 47 08 48 89 e5 48 39 c2 74 27 48 b8 00 02 00 00 00 00 ad de 49 39 c0 74 2d <49> 8b 30 48 39 fe 75 3d 48 8b 52 08 48 39 f2 75 4c b8 01 00 00 00 [518758.189967] RSP: 0018:ffffa27e43903990 EFLAGS: 00010207 [518758.189968] RAX: dead000000000200 RBX: ffff8bce69f0f000 RCX: 0000000000000000 [518758.189969] RDX: 0000000000000000 RSI: ffff8bce69f0f064 RDI: ffff8bce69f0f1e0 [518758.189971] RBP: ffffa27e43903990 R08: 0000000000000000 R09: ffff8bce69e788c0 [518758.189972] R10: ffff8bce62acd400 R11: 00000000000003cb R12: ffff8bce69e78000 [518758.189973] R13: ffff8bce69f0f140 R14: 0000000000000000 R15: 0000000000000000 [518758.189975] FS: 00007fa1572c8f80(0000) GS:ffff8bce6e0c0000(0000) knlGS:0000000000000000 [518758.189976] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [518758.189977] CR2: 0000000000000000 CR3: 000000040a398006 CR4: 00000000003606e0 [518758.189979] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [518758.189980] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Fixes: e0a7683d30e9 ("net/sched: cbs: fix port_rate miscalculation") Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-26drivers: net: Fix Kconfig indentationKrzysztof Kozlowski
Adjust indentation from spaces to tab (+optional two spaces) as in coding style with command like: $ sed -e 's/^ /\t/' -i */Kconfig Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Acked-by: Kalle Valo <kvalo@codeaurora.org> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-26net: Fix Kconfig indentationKrzysztof Kozlowski
Adjust indentation from spaces to tab (+optional two spaces) as in coding style with command like: $ sed -e 's/^ /\t/' -i */Kconfig Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Acked-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-26MAINTAINERS: add Yanjun to FORCEDETH maintainers listRain River
Yanjun has been spending quite a lot of time fixing bugs in FORCEDETH source code. I'd like to add Yanjun to maintainers list. Signed-off-by: Rain River <rain.1986.08.12@gmail.com> Acked-by: Zhu Yanjun <yanjun.zhu@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-09-26block: don't release queue's sysfs lock during switching elevatorMing Lei
cecf5d87ff20 ("block: split .sysfs_lock into two locks") starts to release & acquire sysfs_lock before registering/un-registering elevator queue during switching elevator for avoiding potential deadlock from showing & storing 'queue/iosched' attributes and removing elevator's kobject. Turns out there isn't such deadlock because 'q->sysfs_lock' isn't required in .show & .store of queue/iosched's attributes, and just elevator's sysfs lock is acquired in elv_iosched_store() and elv_iosched_show(). So it is safe to hold queue's sysfs lock when registering/un-registering elevator queue. The biggest issue is that commit cecf5d87ff20 assumes that concurrent write on 'queue/scheduler' can't happen. However, this assumption isn't true, because kernfs_fop_write() only guarantees that concurrent write aren't called on the same open file, but the write could be from different open on the file. So we can't release & re-acquire queue's sysfs lock during switching elevator, otherwise use-after-free on elevator could be triggered. Fixes the issue by not releasing queue's sysfs lock during switching elevator. Fixes: cecf5d87ff20 ("block: split .sysfs_lock into two locks") Cc: Christoph Hellwig <hch@infradead.org> Cc: Hannes Reinecke <hare@suse.com> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-26blk-mq: move lockdep_assert_held() into elevator_exitMing Lei
Commit c48dac137a62 ("block: don't hold q->sysfs_lock in elevator_init_mq") removes q->sysfs_lock from elevator_init_mq(), but forgot to deal with lockdep_assert_held() called in blk_mq_sched_free_requests() which is run in failure path of elevator_init_mq(). blk_mq_sched_free_requests() is called in the following 3 functions: elevator_init_mq() elevator_exit() blk_cleanup_queue() In blk_cleanup_queue(), blk_mq_sched_free_requests() is followed exactly by 'mutex_lock(&q->sysfs_lock)'. So moving the lockdep_assert_held() from blk_mq_sched_free_requests() into elevator_exit() for fixing the report by syzbot. Reported-by: syzbot+da3b7677bb913dc1b737@syzkaller.appspotmail.com Fixed: c48dac137a62 ("block: don't hold q->sysfs_lock in elevator_init_mq") Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-26Merge tag 'perf-core-for-mingo-5.5-20190925' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo: perf record: Stephane Eranian: - Fix priv level with branch sampling for paranoid=2, i.e. the kernel checks if perf_event_attr_attr.exclude_hv is set in addition to .exclude_kernel, so reset both to zero. Arnaldo Carvalho de Melo: - Don't warn about not being able to read kernel maps (kallsyms, etc) when kernel samples aren't being collected. perf list: Kim Phillips: - Allow plurals for metric, metricgroup., i.e.: $ perf list metrics was showing nothing, which is very confusing, make it work like: $ perf stat metric perf stat: Andi Kleen: - Free memory access/leaks detected via valgrind, related to metrics. Libraries: libperf: Jiri Olsa: - Move more stuff from tools/perf, this time a first stab at moving perf_mmap methods. libtracevent: Steven Rostedt (VMware): - Round up in tep_print_event() time precision. Tzvetomir Stoyanov (VMware): - Man pages for event print and related and plugins APIs. - Move traceevent plugins in its own subdirectory. Feature detection: Thomas Richter: - Add detection of java-11-openjdk-devel package, in addition to the older versions supported. Architecture specific: S/390: Thomas Richter (2): - Include JVMTI support for s390 Vendor events: AMD: Kim Phillips: - Add L3 cache events for Family 17h. - Remove redundant '['. PowerPC: Mamatha Inamdar: - Remove P8 HW events which are not supported. Cleanups: Arnaldo Carvalho de Melo: - Remove needless headers, add needed ones, move things around to reduce the headers dependency tree, speeding up builds by not doing needless compiles when unrelated stuff gets changed. - Ditch unused code that was dragging headers. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-09-26Merge tag 'drm-fixes-5.4-2019-09-25' of ↵Dave Airlie
git://people.freedesktop.org/~agd5f/linux into drm-next drm-fixes-5.4-2019-09-25: amdgpu: - Fix a 64 bit divide - Prevent a memory leak in a failure case in dc - Load proper gfx firmware on navi14 variants drm-fixes-5.4-2019-09-19: amdgpu: - Add more navi12 and navi14 PCI ids - Misc fixes for renoir - Fix bandwidth issues with multiple displays on vega20 - Support for Dali - Fix a possible oops with KFD on hawaii - Fix for backlight level after resume on some APUs - Other misc fixes Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexdeucher@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190925213500.3490-1-alexander.deucher@amd.com
2019-09-25checkpatch: check for nested (un)?likely() callsDenis Efremov
IS_ERR(), IS_ERR_OR_NULL(), IS_ERR_VALUE() and WARN*() already contain unlikely() optimization internally. Thus, there is no point in calling these functions and defines under likely()/unlikely(). This check is based on the coccinelle rule developed by Enrico Weigelt https://lore.kernel.org/lkml/1559767582-11081-1-git-send-email-info@metux.net/ Link: http://lkml.kernel.org/r/20190829165025.15750-1-efremov@linux.com Signed-off-by: Denis Efremov <efremov@linux.com> Cc: Joe Perches <joe@perches.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Anton Altaparmakov <anton@tuxera.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Boris Pismenny <borisp@mellanox.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Denis Efremov <efremov@linux.com> Cc: Dennis Dalessandro <dennis.dalessandro@intel.com> Cc: Inaky Perez-Gonzalez <inaky.perez-gonzalez@intel.com> Cc: Juergen Gross <jgross@suse.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: Mike Marciniszyn <mike.marciniszyn@intel.com> Cc: Rob Clark <robdclark@gmail.com> Cc: Saeed Mahameed <saeedm@mellanox.com> Cc: Sean Paul <sean@poorly.run> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25hexagon: drop empty and unused free_initrd_memMike Rapoport
hexagon never reserves or initializes initrd and the only mention of it is the empty free_initrd_mem() function. As we have a generic implementation of free_initrd_mem(), there is no need to define an empty stub for the hexagon implementation and it can be dropped. Link: http://lkml.kernel.org/r/1565858133-25852-1-git-send-email-rppt@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Richard Kuo <rkuo@codeaurora.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25mm: factor out common parts between MADV_COLD and MADV_PAGEOUTMinchan Kim
There are many common parts between MADV_COLD and MADV_PAGEOUT. This patch factor them out to save code duplication. Link: http://lkml.kernel.org/r/20190726023435.214162-6-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Suggested-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Chris Zankel <chris@zankel.net> Cc: Daniel Colascione <dancol@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Hillf Danton <hdanton@sina.com> Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: kbuild test robot <lkp@intel.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Oleksandr Natalenko <oleksandr@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Shakeel Butt <shakeelb@google.com> Cc: Sonny Rao <sonnyrao@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tim Murray <timmurray@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25mm: introduce MADV_PAGEOUTMinchan Kim
When a process expects no accesses to a certain memory range for a long time, it could hint kernel that the pages can be reclaimed instantly but data should be preserved for future use. This could reduce workingset eviction so it ends up increasing performance. This patch introduces the new MADV_PAGEOUT hint to madvise(2) syscall. MADV_PAGEOUT can be used by a process to mark a memory range as not expected to be used for a long time so that kernel reclaims *any LRU* pages instantly. The hint can help kernel in deciding which pages to evict proactively. A note: It doesn't apply SWAP_CLUSTER_MAX LRU page isolation limit intentionally because it's automatically bounded by PMD size. If PMD size(e.g., 256) makes some trouble, we could fix it later by limit it to SWAP_CLUSTER_MAX[1]. - man-page material MADV_PAGEOUT (since Linux x.x) Do not expect access in the near future so pages in the specified regions could be reclaimed instantly regardless of memory pressure. Thus, access in the range after successful operation could cause major page fault but never lose the up-to-date contents unlike MADV_DONTNEED. Pages belonging to a shared mapping are only processed if a write access is allowed for the calling process. MADV_PAGEOUT cannot be applied to locked pages, Huge TLB pages, or VM_PFNMAP pages. [1] https://lore.kernel.org/lkml/20190710194719.GS29695@dhcp22.suse.cz/ [minchan@kernel.org: clear PG_active on MADV_PAGEOUT] Link: http://lkml.kernel.org/r/20190802200643.GA181880@google.com [akpm@linux-foundation.org: resolve conflicts with hmm.git] Link: http://lkml.kernel.org/r/20190726023435.214162-5-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Reported-by: kbuild test robot <lkp@intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Chris Zankel <chris@zankel.net> Cc: Daniel Colascione <dancol@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Oleksandr Natalenko <oleksandr@redhat.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Sonny Rao <sonnyrao@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tim Murray <timmurray@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25mm: change PAGEREF_RECLAIM_CLEAN with PAGE_REFRECLAIMMinchan Kim
The local variable references in shrink_page_list is PAGEREF_RECLAIM_CLEAN as default. It is for preventing to reclaim dirty pages when CMA try to migrate pages. Strictly speaking, we don't need it because CMA didn't allow to write out by .may_writepage = 0 in reclaim_clean_pages_from_list. Moreover, it has a problem to prevent anonymous pages's swap out even though force_reclaim = true in shrink_page_list on upcoming patch. So this patch makes references's default value to PAGEREF_RECLAIM and rename force_reclaim with ignore_references to make it more clear. This is a preparatory work for next patch. Link: http://lkml.kernel.org/r/20190726023435.214162-3-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Chris Zankel <chris@zankel.net> Cc: Daniel Colascione <dancol@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Hillf Danton <hdanton@sina.com> Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: kbuild test robot <lkp@intel.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Oleksandr Natalenko <oleksandr@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Shakeel Butt <shakeelb@google.com> Cc: Sonny Rao <sonnyrao@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tim Murray <timmurray@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25mm: introduce MADV_COLDMinchan Kim
Patch series "Introduce MADV_COLD and MADV_PAGEOUT", v7. - Background The Android terminology used for forking a new process and starting an app from scratch is a cold start, while resuming an existing app is a hot start. While we continually try to improve the performance of cold starts, hot starts will always be significantly less power hungry as well as faster so we are trying to make hot start more likely than cold start. To increase hot start, Android userspace manages the order that apps should be killed in a process called ActivityManagerService. ActivityManagerService tracks every Android app or service that the user could be interacting with at any time and translates that into a ranked list for lmkd(low memory killer daemon). They are likely to be killed by lmkd if the system has to reclaim memory. In that sense they are similar to entries in any other cache. Those apps are kept alive for opportunistic performance improvements but those performance improvements will vary based on the memory requirements of individual workloads. - Problem Naturally, cached apps were dominant consumers of memory on the system. However, they were not significant consumers of swap even though they are good candidate for swap. Under investigation, swapping out only begins once the low zone watermark is hit and kswapd wakes up, but the overall allocation rate in the system might trip lmkd thresholds and cause a cached process to be killed(we measured performance swapping out vs. zapping the memory by killing a process. Unsurprisingly, zapping is 10x times faster even though we use zram which is much faster than real storage) so kill from lmkd will often satisfy the high zone watermark, resulting in very few pages actually being moved to swap. - Approach The approach we chose was to use a new interface to allow userspace to proactively reclaim entire processes by leveraging platform information. This allowed us to bypass the inaccuracy of the kernel’s LRUs for pages that are known to be cold from userspace and to avoid races with lmkd by reclaiming apps as soon as they entered the cached state. Additionally, it could provide many chances for platform to use much information to optimize memory efficiency. To achieve the goal, the patchset introduce two new options for madvise. One is MADV_COLD which will deactivate activated pages and the other is MADV_PAGEOUT which will reclaim private pages instantly. These new options complement MADV_DONTNEED and MADV_FREE by adding non-destructive ways to gain some free memory space. MADV_PAGEOUT is similar to MADV_DONTNEED in a way that it hints the kernel that memory region is not currently needed and should be reclaimed immediately; MADV_COLD is similar to MADV_FREE in a way that it hints the kernel that memory region is not currently needed and should be reclaimed when memory pressure rises. This patch (of 5): When a process expects no accesses to a certain memory range, it could give a hint to kernel that the pages can be reclaimed when memory pressure happens but data should be preserved for future use. This could reduce workingset eviction so it ends up increasing performance. This patch introduces the new MADV_COLD hint to madvise(2) syscall. MADV_COLD can be used by a process to mark a memory range as not expected to be used in the near future. The hint can help kernel in deciding which pages to evict early during memory pressure. It works for every LRU pages like MADV_[DONTNEED|FREE]. IOW, It moves active file page -> inactive file LRU active anon page -> inacdtive anon LRU Unlike MADV_FREE, it doesn't move active anonymous pages to inactive file LRU's head because MADV_COLD is a little bit different symantic. MADV_FREE means it's okay to discard when the memory pressure because the content of the page is *garbage* so freeing such pages is almost zero overhead since we don't need to swap out and access afterward causes just minor fault. Thus, it would make sense to put those freeable pages in inactive file LRU to compete other used-once pages. It makes sense for implmentaion point of view, too because it's not swapbacked memory any longer until it would be re-dirtied. Even, it could give a bonus to make them be reclaimed on swapless system. However, MADV_COLD doesn't mean garbage so reclaiming them requires swap-out/in in the end so it's bigger cost. Since we have designed VM LRU aging based on cost-model, anonymous cold pages would be better to position inactive anon's LRU list, not file LRU. Furthermore, it would help to avoid unnecessary scanning if system doesn't have a swap device. Let's start simpler way without adding complexity at this moment. However, keep in mind, too that it's a caveat that workloads with a lot of pages cache are likely to ignore MADV_COLD on anonymous memory because we rarely age anonymous LRU lists. * man-page material MADV_COLD (since Linux x.x) Pages in the specified regions will be treated as less-recently-accessed compared to pages in the system with similar access frequencies. In contrast to MADV_FREE, the contents of the region are preserved regardless of subsequent writes to pages. MADV_COLD cannot be applied to locked pages, Huge TLB pages, or VM_PFNMAP pages. [akpm@linux-foundation.org: resolve conflicts with hmm.git] Link: http://lkml.kernel.org/r/20190726023435.214162-2-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Reported-by: kbuild test robot <lkp@intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Chris Zankel <chris@zankel.net> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Daniel Colascione <dancol@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Oleksandr Natalenko <oleksandr@redhat.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Sonny Rao <sonnyrao@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tim Murray <timmurray@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25mm: untag user pointers in mmap/munmap/mremap/brkCatalin Marinas
There isn't a good reason to differentiate between the user address space layout modification syscalls and the other memory permission/attributes ones (e.g. mprotect, madvise) w.r.t. the tagged address ABI. Untag the user addresses on entry to these functions. Link: http://lkml.kernel.org/r/20190821164730.47450-2-catalin.marinas@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Will Deacon <will@kernel.org> Acked-by: Andrey Konovalov <andreyknvl@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Szabolcs Nagy <szabolcs.nagy@arm.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Dave P Martin <Dave.Martin@arm.com> Cc: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25vfio/type1: untag user pointers in vaddr_get_pfnAndrey Konovalov
This patch is a part of a series that extends kernel ABI to allow to pass tagged user pointers (with the top byte set to something else other than 0x00) as syscall arguments. vaddr_get_pfn() uses provided user pointers for vma lookups, which can only by done with untagged pointers. Untag user pointers in this function. Link: http://lkml.kernel.org/r/87422b4d72116a975896f2b19b00f38acbd28f33.1563904656.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Will Deacon <will@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Jens Wiklander <jens.wiklander@linaro.org> Cc: Khalid Aziz <khalid.aziz@oracle.com> Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25tee/shm: untag user pointers in tee_shm_registerAndrey Konovalov
This patch is a part of a series that extends kernel ABI to allow to pass tagged user pointers (with the top byte set to something else other than 0x00) as syscall arguments. tee_shm_register()->optee_shm_unregister()->check_mem_type() uses provided user pointers for vma lookups (via __check_mem_type()), which can only by done with untagged pointers. Untag user pointers in this function. Link: http://lkml.kernel.org/r/4b993f33196b3566ac81285ff8453219e2079b45.1563904656.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Jens Wiklander <jens.wiklander@linaro.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Eric Auger <eric.auger@redhat.com> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Khalid Aziz <khalid.aziz@oracle.com> Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25media/v4l2-core: untag user pointers in videobuf_dma_contig_user_getAndrey Konovalov
This patch is a part of a series that extends kernel ABI to allow to pass tagged user pointers (with the top byte set to something else other than 0x00) as syscall arguments. videobuf_dma_contig_user_get() uses provided user pointers for vma lookups, which can only by done with untagged pointers. Untag the pointers in this function. Link: http://lkml.kernel.org/r/100436d5f8e4349a78f27b0bbb27e4801fcb946b.1563904656.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Eric Auger <eric.auger@redhat.com> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Jens Wiklander <jens.wiklander@linaro.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25drm/radeon: untag user pointers in radeon_gem_userptr_ioctlAndrey Konovalov
This patch is a part of a series that extends kernel ABI to allow to pass tagged user pointers (with the top byte set to something else other than 0x00) as syscall arguments. In radeon_gem_userptr_ioctl() an MMU notifier is set up with a (tagged) userspace pointer. The untagged address should be used so that MMU notifiers for the untagged address get correctly matched up with the right BO. This funcation also calls radeon_ttm_tt_pin_userptr(), which uses provided user pointers for vma lookups, which can only by done with untagged pointers. This patch untags user pointers in radeon_gem_userptr_ioctl(). Link: http://lkml.kernel.org/r/c856babeb67195b35603b8d5ba386a2819cec5ff.1563904656.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com> Reviewed-by: Kees Cook <keescook@chromium.org> Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Eric Auger <eric.auger@redhat.com> Cc: Jens Wiklander <jens.wiklander@linaro.org> Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25drm/amdgpu: untag user pointersAndrey Konovalov
This patch is a part of a series that extends kernel ABI to allow to pass tagged user pointers (with the top byte set to something else other than 0x00) as syscall arguments. In amdgpu_gem_userptr_ioctl() and amdgpu_amdkfd_gpuvm.c/init_user_pages() an MMU notifier is set up with a (tagged) userspace pointer. The untagged address should be used so that MMU notifiers for the untagged address get correctly matched up with the right BO. This patch untag user pointers in amdgpu_gem_userptr_ioctl() for the GEM case and in amdgpu_amdkfd_gpuvm_ alloc_memory_of_gpu() for the KFD case. This also makes sure that an untagged pointer is passed to amdgpu_ttm_tt_get_user_pages(), which uses it for vma lookups. Link: http://lkml.kernel.org/r/d684e1df08f2ecb6bc292e222b64fa9efbc26e69.1563904656.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Eric Auger <eric.auger@redhat.com> Cc: Jens Wiklander <jens.wiklander@linaro.org> Cc: Khalid Aziz <khalid.aziz@oracle.com> Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25userfaultfd: untag user pointersAndrey Konovalov
This patch is a part of a series that extends kernel ABI to allow to pass tagged user pointers (with the top byte set to something else other than 0x00) as syscall arguments. userfaultfd code use provided user pointers for vma lookups, which can only by done with untagged pointers. Untag user pointers in validate_range(). Link: http://lkml.kernel.org/r/cdc59ddd7011012ca2e689bc88c3b65b1ea7e413.1563904656.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Eric Auger <eric.auger@redhat.com> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Jens Wiklander <jens.wiklander@linaro.org> Cc: Khalid Aziz <khalid.aziz@oracle.com> Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25fs/namespace: untag user pointers in copy_mount_optionsAndrey Konovalov
This patch is a part of a series that extends kernel ABI to allow to pass tagged user pointers (with the top byte set to something else other than 0x00) as syscall arguments. In copy_mount_options a user address is being subtracted from TASK_SIZE. If the address is lower than TASK_SIZE, the size is calculated to not allow the exact_copy_from_user() call to cross TASK_SIZE boundary. However if the address is tagged, then the size will be calculated incorrectly. Untag the address before subtracting. Link: http://lkml.kernel.org/r/1de225e4a54204bfd7f25dac2635e31aa4aa1d90.1563904656.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Eric Auger <eric.auger@redhat.com> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Jens Wiklander <jens.wiklander@linaro.org> Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25mm: untag user pointers in get_vaddr_framesAndrey Konovalov
This patch is a part of a series that extends kernel ABI to allow to pass tagged user pointers (with the top byte set to something else other than 0x00) as syscall arguments. get_vaddr_frames uses provided user pointers for vma lookups, which can only by done with untagged pointers. Instead of locating and changing all callers of this function, perform untagging in it. Link: http://lkml.kernel.org/r/28f05e49c92b2a69c4703323d6c12208f3d881fe.1563904656.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Eric Auger <eric.auger@redhat.com> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Jens Wiklander <jens.wiklander@linaro.org> Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25mm: untag user pointers in mm/gup.cAndrey Konovalov
This patch is a part of a series that extends kernel ABI to allow to pass tagged user pointers (with the top byte set to something else other than 0x00) as syscall arguments. mm/gup.c provides a kernel interface that accepts user addresses and manipulates user pages directly (for example get_user_pages, that is used by the futex syscall). Since a user can provided tagged addresses, we need to handle this case. Add untagging to gup.c functions that use user addresses for vma lookups. Link: http://lkml.kernel.org/r/4731bddba3c938658c10ff4ed55cc01c60f4c8f8.1563904656.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Eric Auger <eric.auger@redhat.com> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Jens Wiklander <jens.wiklander@linaro.org> Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25mm: untag user pointers passed to memory syscallsAndrey Konovalov
This patch is a part of a series that extends kernel ABI to allow to pass tagged user pointers (with the top byte set to something else other than 0x00) as syscall arguments. This patch allows tagged pointers to be passed to the following memory syscalls: get_mempolicy, madvise, mbind, mincore, mlock, mlock2, mprotect, mremap, msync, munlock, move_pages. The mmap and mremap syscalls do not currently accept tagged addresses. Architectures may interpret the tag as a background colour for the corresponding vma. Link: http://lkml.kernel.org/r/aaf0c0969d46b2feb9017f3e1b3ef3970b633d91.1563904656.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Eric Auger <eric.auger@redhat.com> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Jens Wiklander <jens.wiklander@linaro.org> Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25lib: untag user pointers in strn*_userAndrey Konovalov
Patch series "arm64: untag user pointers passed to the kernel", v19. === Overview arm64 has a feature called Top Byte Ignore, which allows to embed pointer tags into the top byte of each pointer. Userspace programs (such as HWASan, a memory debugging tool [1]) might use this feature and pass tagged user pointers to the kernel through syscalls or other interfaces. Right now the kernel is already able to handle user faults with tagged pointers, due to these patches: 1. 81cddd65 ("arm64: traps: fix userspace cache maintenance emulation on a tagged pointer") 2. 7dcd9dd8 ("arm64: hw_breakpoint: fix watchpoint matching for tagged pointers") 3. 276e9327 ("arm64: entry: improve data abort handling of tagged pointers") This patchset extends tagged pointer support to syscall arguments. As per the proposed ABI change [3], tagged pointers are only allowed to be passed to syscalls when they point to memory ranges obtained by anonymous mmap() or sbrk() (see the patchset [3] for more details). For non-memory syscalls this is done by untaging user pointers when the kernel performs pointer checking to find out whether the pointer comes from userspace (most notably in access_ok). The untagging is done only when the pointer is being checked, the tag is preserved as the pointer makes its way through the kernel and stays tagged when the kernel dereferences the pointer when perfoming user memory accesses. The mmap and mremap (only new_addr) syscalls do not currently accept tagged addresses. Architectures may interpret the tag as a background colour for the corresponding vma. Other memory syscalls (mprotect, etc.) don't do user memory accesses but rather deal with memory ranges, and untagged pointers are better suited to describe memory ranges internally. Thus for memory syscalls we untag pointers completely when they enter the kernel. === Other approaches One of the alternative approaches to untagging that was considered is to completely strip the pointer tag as the pointer enters the kernel with some kind of a syscall wrapper, but that won't work with the countless number of different ioctl calls. With this approach we would need a custom wrapper for each ioctl variation, which doesn't seem practical. An alternative approach to untagging pointers in memory syscalls prologues is to inspead allow tagged pointers to be passed to find_vma() (and other vma related functions) and untag them there. Unfortunately, a lot of find_vma() callers then compare or subtract the returned vma start and end fields against the pointer that was being searched. Thus this approach would still require changing all find_vma() callers. === Testing The following testing approaches has been taken to find potential issues with user pointer untagging: 1. Static testing (with sparse [2] and separately with a custom static analyzer based on Clang) to track casts of __user pointers to integer types to find places where untagging needs to be done. 2. Static testing with grep to find parts of the kernel that call find_vma() (and other similar functions) or directly compare against vm_start/vm_end fields of vma. 3. Static testing with grep to find parts of the kernel that compare user pointers with TASK_SIZE or other similar consts and macros. 4. Dynamic testing: adding BUG_ON(has_tag(addr)) to find_vma() and running a modified syzkaller version that passes tagged pointers to the kernel. Based on the results of the testing the requried patches have been added to the patchset. === Notes This patchset is meant to be merged together with "arm64 relaxed ABI" [3]. This patchset is a prerequisite for ARM's memory tagging hardware feature support [4]. This patchset has been merged into the Pixel 2 & 3 kernel trees and is now being used to enable testing of Pixel phones with HWASan. Thanks! [1] http://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html [2] https://github.com/lucvoo/sparse-dev/commit/5f960cb10f56ec2017c128ef9d16060e0145f292 [3] https://lkml.org/lkml/2019/6/12/745 [4] https://community.arm.com/processors/b/blog/posts/arm-a-profile-architecture-2018-developments-armv85a This patch (of 11) This patch is a part of a series that extends kernel ABI to allow to pass tagged user pointers (with the top byte set to something else other than 0x00) as syscall arguments. strncpy_from_user and strnlen_user accept user addresses as arguments, and do not go through the same path as copy_from_user and others, so here we need to handle the case of tagged user addresses separately. Untag user pointers passed to these functions. Note, that this patch only temporarily untags the pointers to perform validity checks, but then uses them as is to perform user memory accesses. [andreyknvl@google.com: fix sparc4 build] Link: http://lkml.kernel.org/r/CAAeHK+yx4a-P0sDrXTUxMvO2H0CJZUFPffBrg_cU7oJOZyC7ew@mail.gmail.com Link: http://lkml.kernel.org/r/c5a78bcad3e94d6cda71fcaa60a423231ae71e4c.1563904656.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com> Acked-by: Kees Cook <keescook@chromium.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Eric Auger <eric.auger@redhat.com> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Jens Wiklander <jens.wiklander@linaro.org> Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-25lib/lzo/lzo1x_compress.c: fix alignment bug in lzo-rleDave Rodgman
Fix an unaligned access which breaks on platforms where this is not permitted (e.g., Sparc). Link: http://lkml.kernel.org/r/20190912145502.35229-1-dave.rodgman@arm.com Signed-off-by: Dave Rodgman <dave.rodgman@arm.com> Cc: Dave Rodgman <dave.rodgman@arm.com> Cc: Markus F.X.J. Oberhumer <markus@oberhumer.com> Cc: Minchan Kim <minchan@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>