Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- Sumanth Korikkar has taught s390 to allocate hotplug-time page frames
from hotplugged memory rather than only from main memory. Series
"implement "memmap on memory" feature on s390".
- More folio conversions from Matthew Wilcox in the series
"Convert memcontrol charge moving to use folios"
"mm: convert mm counter to take a folio"
- Chengming Zhou has optimized zswap's rbtree locking, providing
significant reductions in system time and modest but measurable
reductions in overall runtimes. The series is "mm/zswap: optimize the
scalability of zswap rb-tree".
- Chengming Zhou has also provided the series "mm/zswap: optimize zswap
lru list" which provides measurable runtime benefits in some
swap-intensive situations.
- And Chengming Zhou further optimizes zswap in the series "mm/zswap:
optimize for dynamic zswap_pools". Measured improvements are modest.
- zswap cleanups and simplifications from Yosry Ahmed in the series
"mm: zswap: simplify zswap_swapoff()".
- In the series "Add DAX ABI for memmap_on_memory", Vishal Verma has
contributed several DAX cleanups as well as adding a sysfs tunable to
control the memmap_on_memory setting when the dax device is
hotplugged as system memory.
- Johannes Weiner has added the large series "mm: zswap: cleanups",
which does that.
- More DAMON work from SeongJae Park in the series
"mm/damon: make DAMON debugfs interface deprecation unignorable"
"selftests/damon: add more tests for core functionalities and corner cases"
"Docs/mm/damon: misc readability improvements"
"mm/damon: let DAMOS feeds and tame/auto-tune itself"
- In the series "mm/mempolicy: weighted interleave mempolicy and sysfs
extension" Rakie Kim has developed a new mempolicy interleaving
policy wherein we allocate memory across nodes in a weighted fashion
rather than uniformly. This is beneficial in heterogeneous memory
environments appearing with CXL.
- Christophe Leroy has contributed some cleanup and consolidation work
against the ARM pagetable dumping code in the series "mm: ptdump:
Refactor CONFIG_DEBUG_WX and check_wx_pages debugfs attribute".
- Luis Chamberlain has added some additional xarray selftesting in the
series "test_xarray: advanced API multi-index tests".
- Muhammad Usama Anjum has reworked the selftest code to make its
human-readable output conform to the TAP ("Test Anything Protocol")
format. Amongst other things, this opens up the use of third-party
tools to parse and process out selftesting results.
- Ryan Roberts has added fork()-time PTE batching of THP ptes in the
series "mm/memory: optimize fork() with PTE-mapped THP". Mainly
targeted at arm64, this significantly speeds up fork() when the
process has a large number of pte-mapped folios.
- David Hildenbrand also gets in on the THP pte batching game in his
series "mm/memory: optimize unmap/zap with PTE-mapped THP". It
implements batching during munmap() and other pte teardown
situations. The microbenchmark improvements are nice.
- And in the series "Transparent Contiguous PTEs for User Mappings"
Ryan Roberts further utilizes arm's pte's contiguous bit ("contpte
mappings"). Kernel build times on arm64 improved nicely. Ryan's
series "Address some contpte nits" provides some followup work.
- In the series "mm/hugetlb: Restore the reservation" Breno Leitao has
fixed an obscure hugetlb race which was causing unnecessary page
faults. He has also added a reproducer under the selftest code.
- In the series "selftests/mm: Output cleanups for the compaction
test", Mark Brown did what the title claims.
- Kinsey Ho has added the series "mm/mglru: code cleanup and
refactoring".
- Even more zswap material from Nhat Pham. The series "fix and extend
zswap kselftests" does as claimed.
- In the series "Introduce cpu_dcache_is_aliasing() to fix DAX
regression" Mathieu Desnoyers has cleaned up and fixed rather a mess
in our handling of DAX on archiecctures which have virtually aliasing
data caches. The arm architecture is the main beneficiary.
- Lokesh Gidra's series "per-vma locks in userfaultfd" provides
dramatic improvements in worst-case mmap_lock hold times during
certain userfaultfd operations.
- Some page_owner enhancements and maintenance work from Oscar Salvador
in his series
"page_owner: print stacks and their outstanding allocations"
"page_owner: Fixup and cleanup"
- Uladzislau Rezki has contributed some vmalloc scalability
improvements in his series "Mitigate a vmap lock contention". It
realizes a 12x improvement for a certain microbenchmark.
- Some kexec/crash cleanup work from Baoquan He in the series "Split
crash out from kexec and clean up related config items".
- Some zsmalloc maintenance work from Chengming Zhou in the series
"mm/zsmalloc: fix and optimize objects/page migration"
"mm/zsmalloc: some cleanup for get/set_zspage_mapping()"
- Zi Yan has taught the MM to perform compaction on folios larger than
order=0. This a step along the path to implementaton of the merging
of large anonymous folios. The series is named "Enable >0 order folio
memory compaction".
- Christoph Hellwig has done quite a lot of cleanup work in the
pagecache writeback code in his series "convert write_cache_pages()
to an iterator".
- Some modest hugetlb cleanups and speedups in Vishal Moola's series
"Handle hugetlb faults under the VMA lock".
- Zi Yan has changed the page splitting code so we can split huge pages
into sizes other than order-0 to better utilize large folios. The
series is named "Split a folio to any lower order folios".
- David Hildenbrand has contributed the series "mm: remove
total_mapcount()", a cleanup.
- Matthew Wilcox has sought to improve the performance of bulk memory
freeing in his series "Rearrange batched folio freeing".
- Gang Li's series "hugetlb: parallelize hugetlb page init on boot"
provides large improvements in bootup times on large machines which
are configured to use large numbers of hugetlb pages.
- Matthew Wilcox's series "PageFlags cleanups" does that.
- Qi Zheng's series "minor fixes and supplement for ptdesc" does that
also. S390 is affected.
- Cleanups to our pagemap utility functions from Peter Xu in his series
"mm/treewide: Replace pXd_large() with pXd_leaf()".
- Nico Pache has fixed a few things with our hugepage selftests in his
series "selftests/mm: Improve Hugepage Test Handling in MM
Selftests".
- Also, of course, many singleton patches to many things. Please see
the individual changelogs for details.
* tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (435 commits)
mm/zswap: remove the memcpy if acomp is not sleepable
crypto: introduce: acomp_is_async to expose if comp drivers might sleep
memtest: use {READ,WRITE}_ONCE in memory scanning
mm: prohibit the last subpage from reusing the entire large folio
mm: recover pud_leaf() definitions in nopmd case
selftests/mm: skip the hugetlb-madvise tests on unmet hugepage requirements
selftests/mm: skip uffd hugetlb tests with insufficient hugepages
selftests/mm: dont fail testsuite due to a lack of hugepages
mm/huge_memory: skip invalid debugfs new_order input for folio split
mm/huge_memory: check new folio order when split a folio
mm, vmscan: retry kswapd's priority loop with cache_trim_mode off on failure
mm: add an explicit smp_wmb() to UFFDIO_CONTINUE
mm: fix list corruption in put_pages_list
mm: remove folio from deferred split list before uncharging it
filemap: avoid unnecessary major faults in filemap_fault()
mm,page_owner: drop unnecessary check
mm,page_owner: check for null stack_record before bumping its refcount
mm: swap: fix race between free_swap_and_cache() and swapoff()
mm/treewide: align up pXd_leaf() retval across archs
mm/treewide: drop pXd_large()
...
|
|
Most architectures only support a single hardcoded page size. In order
to ensure that each one of these sets the corresponding Kconfig symbols,
change over the PAGE_SHIFT definition to the common one and allow
only the hardware page size to be selected.
Acked-by: Guo Ren <guoren@kernel.org>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Stafford Horne <shorne@gmail.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
Introduce a generic way to query whether the data cache is virtually
aliased on all architectures. Its purpose is to ensure that subsystems
which are incompatible with virtually aliased data caches (e.g. FS_DAX)
can reliably query this.
For data cache aliasing, there are three scenarios dependending on the
architecture. Here is a breakdown based on my understanding:
A) The data cache is always aliasing:
* arc
* csky
* m68k (note: shared memory mappings are incoherent ? SHMLBA is missing there.)
* sh
* parisc
B) The data cache aliasing is statically known or depends on querying CPU
state at runtime:
* arm (cache_is_vivt() || cache_is_vipt_aliasing())
* mips (cpu_has_dc_aliases)
* nios2 (NIOS2_DCACHE_SIZE > PAGE_SIZE)
* sparc32 (vac_cache_size > PAGE_SIZE)
* sparc64 (L1DCACHE_SIZE > PAGE_SIZE)
* xtensa (DCACHE_WAY_SIZE > PAGE_SIZE)
C) The data cache is never aliasing:
* alpha
* arm64 (aarch64)
* hexagon
* loongarch (but with incoherent write buffers, which are disabled since
commit d23b7795 ("LoongArch: Change SHMLBA from SZ_64K to PAGE_SIZE"))
* microblaze
* openrisc
* powerpc
* riscv
* s390
* um
* x86
Require architectures in A) and B) to select ARCH_HAS_CPU_CACHE_ALIASING and
implement "cpu_dcache_is_aliasing()".
Architectures in C) don't select ARCH_HAS_CPU_CACHE_ALIASING, and thus
cpu_dcache_is_aliasing() simply evaluates to "false".
Note that this leaves "cpu_icache_is_aliasing()" to be implemented as future
work. This would be useful to gate features like XIP on architectures
which have aliasing CPU dcache-icache but not CPU dcache-dcache.
Use "cpu_dcache" and "cpu_cache" rather than just "dcache" and "cache"
to clarify that we really mean "CPU data cache" and "CPU cache" to
eliminate any possible confusion with VFS "dentry cache" and "page
cache".
Link: https://lore.kernel.org/lkml/20030910210416.GA24258@mail.jlokier.co.uk/
Link: https://lkml.kernel.org/r/20240215144633.96437-9-mathieu.desnoyers@efficios.com
Fixes: d92576f1167c ("dax: does not work correctly with virtual aliasing caches")
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: kernel test robot <lkp@intel.com>
Cc: Michael Sclafani <dm-devel@lists.linux.dev>
Cc: Mike Snitzer <snitzer@kernel.org>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
This modifies our user mode stack expansion code to always take the
mmap_lock for writing before modifying the VM layout.
It's actually something we always technically should have done, but
because we didn't strictly need it, we were being lazy ("opportunistic"
sounds so much better, doesn't it?) about things, and had this hack in
place where we would extend the stack vma in-place without doing the
proper locking.
And it worked fine. We just needed to change vm_start (or, in the case
of grow-up stacks, vm_end) and together with some special ad-hoc locking
using the anon_vma lock and the mm->page_table_lock, it all was fairly
straightforward.
That is, it was all fine until Ruihan Li pointed out that now that the
vma layout uses the maple tree code, we *really* don't just change
vm_start and vm_end any more, and the locking really is broken. Oops.
It's not actually all _that_ horrible to fix this once and for all, and
do proper locking, but it's a bit painful. We have basically three
different cases of stack expansion, and they all work just a bit
differently:
- the common and obvious case is the page fault handling. It's actually
fairly simple and straightforward, except for the fact that we have
something like 24 different versions of it, and you end up in a maze
of twisty little passages, all alike.
- the simplest case is the execve() code that creates a new stack.
There are no real locking concerns because it's all in a private new
VM that hasn't been exposed to anybody, but lockdep still can end up
unhappy if you get it wrong.
- and finally, we have GUP and page pinning, which shouldn't really be
expanding the stack in the first place, but in addition to execve()
we also use it for ptrace(). And debuggers do want to possibly access
memory under the stack pointer and thus need to be able to expand the
stack as a special case.
None of these cases are exactly complicated, but the page fault case in
particular is just repeated slightly differently many many times. And
ia64 in particular has a fairly complicated situation where you can have
both a regular grow-down stack _and_ a special grow-up stack for the
register backing store.
So to make this slightly more manageable, the bulk of this series is to
first create a helper function for the most common page fault case, and
convert all the straightforward architectures to it.
Thus the new 'lock_mm_and_find_vma()' helper function, which ends up
being used by x86, arm, powerpc, mips, riscv, alpha, arc, csky, hexagon,
loongarch, nios2, sh, sparc32, and xtensa. So we not only convert more
than half the architectures, we now have more shared code and avoid some
of those twisty little passages.
And largely due to this common helper function, the full diffstat of
this series ends up deleting more lines than it adds.
That still leaves eight architectures (ia64, m68k, microblaze, openrisc,
parisc, s390, sparc64 and um) that end up doing 'expand_stack()'
manually because they are doing something slightly different from the
normal pattern. Along with the couple of special cases in execve() and
GUP.
So there's a couple of patches that first create 'locked' helper
versions of the stack expansion functions, so that there's a obvious
path forward in the conversion. The execve() case is then actually
pretty simple, and is a nice cleanup from our old "grow-up stackls are
special, because at execve time even they grow down".
The #ifdef CONFIG_STACK_GROWSUP in that code just goes away, because
it's just more straightforward to write out the stack expansion there
manually, instead od having get_user_pages_remote() do it for us in some
situations but not others and have to worry about locking rules for GUP.
And the final step is then to just convert the remaining odd cases to a
new world order where 'expand_stack()' is called with the mmap_lock held
for reading, but where it might drop it and upgrade it to a write, only
to return with it held for reading (in the success case) or with it
completely dropped (in the failure case).
In the process, we remove all the stack expansion from GUP (where
dropping the lock wouldn't be ok without special rules anyway), and add
it in manually to __access_remote_vm() for ptrace().
Thanks to Adrian Glaubitz and Frank Scheiner who tested the ia64 cases.
Everything else here felt pretty straightforward, but the ia64 rules for
stack expansion are really quite odd and very different from everything
else. Also thanks to Vegard Nossum who caught me getting one of those
odd conditions entirely the wrong way around.
Anyway, I think I want to actually move all the stack expansion code to
a whole new file of its own, rather than have it split up between
mm/mmap.c and mm/memory.c, but since this will have to be backported to
the initial maple tree vma introduction anyway, I tried to keep the
patches _fairly_ minimal.
Also, while I don't think it's valid to expand the stack from GUP, the
final patch in here is a "warn if some crazy GUP user wants to try to
expand the stack" patch. That one will be reverted before the final
release, but it's left to catch any odd cases during the merge window
and release candidates.
Reported-by: Ruihan Li <lrh2000@pku.edu.cn>
* branch 'expand-stack':
gup: add warning if some caller would seem to want stack expansion
mm: always expand the stack with the mmap write lock held
execve: expand new process stack manually ahead of time
mm: make find_extend_vma() fail if write lock not held
powerpc/mm: convert coprocessor fault to lock_mm_and_find_vma()
mm/fault: convert remaining simple cases to lock_mm_and_find_vma()
arm/mm: Convert to using lock_mm_and_find_vma()
riscv/mm: Convert to using lock_mm_and_find_vma()
mips/mm: Convert to using lock_mm_and_find_vma()
powerpc/mm: Convert to using lock_mm_and_find_vma()
arm64/mm: Convert to using lock_mm_and_find_vma()
mm: make the page fault mmap locking killable
mm: introduce new 'lock_mm_and_find_vma()' page fault helper
|
|
This does the simple pattern conversion of alpha, arc, csky, hexagon,
loongarch, nios2, sh, sparc32, and xtensa to the lock_mm_and_find_vma()
helper. They all have the regular fault handling pattern without odd
special cases.
The remaining architectures all have something that keeps us from a
straightforward conversion: ia64 and parisc have stacks that can grow
both up as well as down (and ia64 has special address region checks).
And m68k, microblaze, openrisc, sparc64, and um end up having extra
rules about only expanding the stack down a limited amount below the
user space stack pointer. That is something that x86 used to do too
(long long ago), and it probably could just be skipped, but it still
makes the conversion less than trivial.
Note that this conversion was done manually and with the exception of
alpha without any build testing, because I have a fairly limited cross-
building environment. The cases are all simple, and I went through the
changes several times, but...
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Switch to the CPU hotplug core state tracking and synchronization
mechanim. No functional change intended.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck
Link: https://lore.kernel.org/r/20230512205256.747254502@linutronix.de
|
|
Pull arch/csky updates from Guo Ren:
- Remove CPU_TLB_SIZE config
- Prevent spurious page faults
* tag 'csky-for-linus-6.4' of https://github.com/c-sky/csky-linux:
csky: mmu: Prevent spurious page faults
csky: remove obsolete config CPU_TLB_SIZE
|
|
The default value of ARCH_FORCE_MAX_ORDER matches the generic default
defined in the MM code, the architecture does not support huge pages, so
there is no need to keep ARCH_FORCE_MAX_ORDER option available.
Drop it.
Link: https://lkml.kernel.org/r/20230324052233.2654090-5-rppt@kernel.org
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: David Miller <davem@davemloft.net>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Rich Felker <dalias@libc.org>
Cc: "Russell King (Oracle)" <linux@armlinux.org.uk>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Commit 9d35dc3006a9 ("csky: Revert mmu ASID mechanism") removes the only
use of CONFIG_CPU_TLB_SIZE. Since then, this config has no effect and can
be deleted.
Remove the obsolete config CPU_TLB_SIZE.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
|
|
MAX_ORDER currently defined as number of orders page allocator supports:
user can ask buddy allocator for page order between 0 and MAX_ORDER-1.
This definition is counter-intuitive and lead to number of bugs all over
the kernel.
Change the definition of MAX_ORDER to be inclusive: the range of orders
user can ask from buddy allocator is 0..MAX_ORDER now.
[kirill@shutemov.name: fix min() warning]
Link: https://lkml.kernel.org/r/20230315153800.32wib3n5rickolvh@box
[akpm@linux-foundation.org: fix another min_t warning]
[kirill@shutemov.name: fixups per Zi Yan]
Link: https://lkml.kernel.org/r/20230316232144.b7ic4cif4kjiabws@box.shutemov.name
[akpm@linux-foundation.org: fix underlining in docs]
Link: https://lore.kernel.org/oe-kbuild-all/202303191025.VRCTk6mP-lkp@intel.com/
Link: https://lkml.kernel.org/r/20230315113133.11326-11-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc]
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
This reverts commit 9866d141a0977ace974400bf1f793dfc163409ce.
The csky rseq support has been merged without ever notifying the rseq
maintainers, and without any of the required asssembler glue in the rseq
selftests, which means it is entirely untested.
It is also derived from a non-upstream riscv patch which has known bugs.
The assembly part of this revert should be carefully reviewed by the
architecture maintainer because it touches code which has changed since
the merge of the reverted patch.
The rseq selftests assembly glue should be introduced at the same time
as the architecture rseq support. Without the presence of any test, I
recommend reverting rseq support from csky for now.
Link: https://lore.kernel.org/lkml/1257037909.25426.1626705790861.JavaMail.zimbra@efficios.com/
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-csky@vger.kernel.org
|
|
There is a spelling mistake in a Kconfig option description. Fix it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
|
|
To follow the existing per-arch conventions, using "current_stack_pointer"
to set sp.
This will let it be used in non-arch places(like HARDENED_USERCOPY).
Refer to the implementation of riscv commit fdecfea09328 ("riscv: Rename
"sp_in_global" to "current_stack_pointer"").
Link: https://lore.kernel.org/lkml/20220224060411.1855683-1-keescook@chromium.org/
Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
|
|
This Kconfig option is used by individual arch to set its desired
MAX_ORDER. Rename it to reflect its actual use.
Link: https://lkml.kernel.org/r/20220815143959.1511278-1-zi.yan@sent.com
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Zi Yan <ziy@nvidia.com>
Acked-by: Guo Ren <guoren@kernel.org> [csky]
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Catalin Marinas <catalin.marinas@arm.com> [arm64]
Acked-by: Huacai Chen <chenhuacai@kernel.org> [LoongArch]
Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc]
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Taichi Sugaya <sugaya.taichi@socionext.com>
Cc: Neil Armstrong <narmstrong@baylibre.com>
Cc: Qin Jian <qinjian@cqplus1.com>
Cc: Guo Ren <guoren@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Zankel <chris@zankel.net>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Pull csky updates from Guo Ren:
- Add jump-label implementation
- Add qspinlock support
- Enable ARCH_INLINE_READ*/WRITE*/SPIN*
- Some fixups and a coding convention
* tag 'csky-for-linus-6.0-rc1' of https://github.com/c-sky/csky-linux:
csky: abiv1: Fixup compile error
csky: cmpxchg: Coding convention for BUILD_BUG()
csky: Enable ARCH_INLINE_READ*/WRITE*/SPIN*
csky: Add qspinlock support
csky: Add jump-label implementation
csky: Move HEAD_TEXT_SECTION out of __init_begin-end
csky: Correct position of _stext
csky: Use the bitmap API to allocate bitmaps
csky/kprobe: reclaim insn_slot on kprobe unregistration
|
|
Enable ARCH_INLINE_READ*/WRITE*/SPIN* when !PREEMPTION, it is copied
from arch/arm64. It could reduce procedure calls and improves
performance.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
|
|
Enable qspinlock by the requirements mentioned in a8ad07e5240c9
("asm-generic: qspinlock: Indicate the use of mixed-size atomics").
C-SKY only has "ldex/stex" for all atomic operations. So csky give a
strong forward guarantee for "ldex/stex." That means when ldex grabbed
the cache line into $L1, it would block other cores from snooping the
address with several cycles. The atomic_fetch_add & xchg16 has the same
forward guarantee level in C-SKY.
Qspinlock has better code size and performance in a fast path.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
|
|
Add jump-label implementation for static branch
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
|
|
Context tracking is going to be used not only to track user transitions
but also idle/IRQs/NMIs. The user tracking part will then become a
separate feature. Prepare Kconfig for that.
[ frederic: Apply Max Filippov feedback. ]
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Nicolas Saenz Julienne <nsaenz@kernel.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Cc: Yu Liao <liaoyu15@huawei.com>
Cc: Phil Auld <pauld@redhat.com>
Cc: Paul Gortmaker<paul.gortmaker@windriver.com>
Cc: Alex Belits <abelits@marvell.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
|
|
Try to access RAM with the largest bit width possible, but without
doing unaligned accesses.
A further improvement could be to use multiple read and writes as the
assembly version was trying to do.
Tested on a BeagleV Starlight with a SiFive U74 core, where the
improvement is noticeable.
Signed-off-by: Matteo Croce <mcroce@microsoft.com>
Co-developed-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
|
|
There are no remaining callers of set_fs(), so CONFIG_SET_FS
can be removed globally, along with the thread_info field and
any references to it.
This turns access_ok() into a cheaper check against TASK_SIZE_MAX.
As CONFIG_SET_FS is now gone, drop all remaining references to
set_fs()/get_fs(), mm_segment_t, user_addr_max() and uaccess_kernel().
Acked-by: Sam Ravnborg <sam@ravnborg.org> # for sparc32 changes
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Tested-by: Sergey Matyukevich <sergey.matyukevich@synopsys.com> # for arc changes
Acked-by: Stafford Horne <shorne@gmail.com> # [openrisc, asm-generic]
Acked-by: Dinh Nguyen <dinguyen@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
Now that all architectures have a working futex implementation in any
configuration, remove the runtime detection code.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Acked-by: Vineet Gupta <vgupta@kernel.org>
Acked-by: Max Filippov <jcmvbkbc@gmail.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Link: https://lore.kernel.org/r/20211026100432.1730393-2-arnd@kernel.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"Updates for the interrupt subsystem:
Core changes:
- Prevent a potential deadlock when initial priority is assigned to a
newly created interrupt thread. A recent change to plug a race
between cpuset and __sched_setscheduler() introduced a new lock
dependency which is now triggered. Break the lock dependency chain
by moving the priority assignment to the thread function.
- A couple of small updates to make the irq core RT safe.
- Confine the irq_cpu_online/offline() API to the only left unfixable
user Cavium Octeon so that it does not grow new usage.
- A small documentation update
Driver changes:
- A large cross architecture rework to move irq_enter/exit() into the
architecture code to make addressing the NOHZ_FULL/RCU issues
simpler.
- The obligatory new irq chip driver for Microchip EIC
- Modularize a few irq chip drivers
- Expand usage of devm_*() helpers throughout the driver code
- The usual small fixes and improvements all over the place"
* tag 'irq-core-2021-10-31' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits)
h8300: Fix linux/irqchip.h include mess
dt-bindings: irqchip: renesas-irqc: Document r8a774e1 bindings
MIPS: irq: Avoid an unused-variable error
genirq: Hide irq_cpu_{on,off}line() behind a deprecated option
irqchip/mips-gic: Get rid of the reliance on irq_cpu_online()
MIPS: loongson64: Drop call to irq_cpu_offline()
irq: remove handle_domain_{irq,nmi}()
irq: remove CONFIG_HANDLE_DOMAIN_IRQ_IRQENTRY
irq: riscv: perform irqentry in entry code
irq: openrisc: perform irqentry in entry code
irq: csky: perform irqentry in entry code
irq: arm64: perform irqentry in entry code
irq: arm: perform irqentry in entry code
irq: add a (temporary) CONFIG_HANDLE_DOMAIN_IRQ_IRQENTRY
irq: nds32: avoid CONFIG_HANDLE_DOMAIN_IRQ
irq: arc: avoid CONFIG_HANDLE_DOMAIN_IRQ
irq: add generic_handle_arch_irq()
irq: unexport handle_irq_desc()
irq: simplify handle_domain_{irq,nmi}()
irq: mips: simplify do_domain_IRQ()
...
|
|
Now that entry code handles IRQ entry (including setting the IRQ regs)
before calling irqchip code, irqchip code can safely call
generic_handle_domain_irq(), and there's no functional reason for it to
call handle_domain_irq().
Let's cement this split of responsibility and remove handle_domain_irq()
entirely, updating irqchip drivers to call generic_handle_domain_irq().
For consistency, handle_domain_nmi() is similarly removed and replaced
with a generic_handle_domain_nmi() function which also does not perform
any entry logic.
Previously handle_domain_{irq,nmi}() had a WARN_ON() which would fire
when they were called in an inappropriate context. So that we can
identify similar issues going forward, similar WARN_ON_ONCE() logic is
added to the generic_handle_*() functions, and comments are updated for
clarity and consistency.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
|
|
In preparation for removing HANDLE_DOMAIN_IRQ_IRQENTRY, have arch/csky
perform all the irqentry accounting in its entry code. As arch/csky uses
GENERIC_IRQ_MULTI_HANDLER, we can use generic_handle_arch_irq() to do
so.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
|
|
Going forward we want architecture/entry code to perform all the
necessary work to enter/exit IRQ context, with irqchip code merely
handling the mapping of the interrupt to any handler(s). Among other
reasons, this is necessary to consistently fix some longstanding issues
with the ordering of lockdep/RCU/tracing instrumentation which many
architectures get wrong today in their entry code.
Importantly, rcu_irq_{enter,exit}() must be called precisely once per
IRQ exception, so that rcu_is_cpu_rrupt_from_idle() can correctly
identify when an interrupt was taken from an idle context which must be
explicitly preempted. Currently handle_domain_irq() calls
rcu_irq_{enter,exit}() via irq_{enter,exit}(), but entry code needs to
be able to call rcu_irq_{enter,exit}() earlier for correct ordering
across lockdep/RCU/tracing updates for sequences such as:
lockdep_hardirqs_off(CALLER_ADDR0);
rcu_irq_enter();
trace_hardirqs_off_finish();
To permit each architecture to be converted to the new style in turn,
this patch adds a new CONFIG_HANDLE_DOMAIN_IRQ_IRQENTRY selected by all
current users of HANDLE_DOMAIN_IRQ, which gates the existing behaviour.
When CONFIG_HANDLE_DOMAIN_IRQ_IRQENTRY is not selected,
handle_domain_irq() requires entry code to perform the
irq_{enter,exit}() work, with an explicit check for this matching the
style of handle_domain_nmi().
Subsequent patches will:
1) Add the necessary IRQ entry accounting to each architecture in turn,
dropping CONFIG_HANDLE_DOMAIN_IRQ_IRQENTRY from that architecture's
Kconfig.
2) Remove CONFIG_HANDLE_DOMAIN_IRQ_IRQENTRY once it is no longer
selected.
3) Convert irqchip drivers to consistently use
generic_handle_domain_irq() rather than handle_domain_irq().
4) Remove handle_domain_irq() and CONFIG_HANDLE_DOMAIN_IRQ.
... which should leave us with a clear split of responsiblity across the
entry and irqchip code, making it possible to perform additional
cleanups and fixes for the aforementioned longstanding issues with entry
code.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
|
|
Building csky:allmodconfig results in the following build errors.
arch/csky/mm/tcm.c:9:2: error:
#error "You should define ITCM_RAM_BASE"
9 | #error "You should define ITCM_RAM_BASE"
| ^~~~~
arch/csky/mm/tcm.c:14:2: error:
#error "You should define DTCM_RAM_BASE"
14 | #error "You should define DTCM_RAM_BASE"
| ^~~~~
arch/csky/mm/tcm.c:18:2: error:
#error "You should define correct DTCM_RAM_BASE"
18 | #error "You should define correct DTCM_RAM_BASE"
This is seen with compile tests since those enable HAVE_TCM,
but do not provide useful default values for ITCM_RAM_BASE or
DTCM_RAM_BASE. Disable HAVE_TCM for commpile tests to avoid
the error.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Guo Ren <guoren@kernel.org>
|
|
Compiling csky:allmodconfig with an upstream C compiler results
in the following error.
csky-linux-gcc: error:
unrecognized command-line option '-mbacktrace';
did you mean '-fbacktrace'?
Select ARCH_WANT_FRAME_POINTERS only if gcc supports it to
avoid the error.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Guo Ren <guoren@kernel.org>
|
|
Make architectures select TRACE_IRQFLAGS_SUPPORT instead of
having many defines.
Link: https://lkml.kernel.org/r/20210731052233.4703-2-masahiroy@kernel.org
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com> #arch/arc
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
|
|
- GENERIC_ALLOCATOR is duplicated
- Remove USB_ARCH_HAS_OHCI & USB_ARCH_HAS_EHCI, because they
have been removed from linux.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Cc: Arnd Bergmann <arnd@arndb.de>
|
|
e1000's #define of CONFIG_RAM_BASE conflicts with a Kconfig symbol in
arch/csky/Kconfig.
The symbol in e1000 has been around longer, so change arch/csky/ to use
DRAM_BASE instead of RAM_BASE to remove the conflict. (although e1000
is also a 2-line change)
Link: https://lkml.kernel.org/r/20210411055335.7111-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Guo Ren <guoren@kernel.org>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
HAVE_GENERIC_VDSO
It could help to reduce the latency of the time-related functions
in user space.
We have referenced arm's and riscv's implementation for the patch.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Cc: Vincent Chen <vincent.chen@sifive.com>
Cc: Arnd Bergmann <arnd@arndb.de>
|
|
Here is the log after enabled:
[ 1.798972] kmemleak: Kernel memory leak detector initialized (mem pool available: 15851)
[ 1.798983] kmemleak: Automatic memory scanning thread started
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
|
|
There are two implementation of spinlock in arch/csky:
- simple one (NR_CPU = 1,2)
- tick's one (NR_CPU = 3,4)
Remove the simple one.
There is already smp_mb in spinlock, so remove the definition of
smp_mb__after_spinlock.
Link: https://lore.kernel.org/linux-csky/20200807081253.GD2674@hirez.programming.kicks-ass.net/#t
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Cc: Peter Zijlstra <peterz@infradead.org>k
Cc: Arnd Bergmann <arnd@arndb.de>
|
|
Arnd said:
I would guess that for csky, this is a mistake, as the architecture
is fairly new and should be able to implement it.
Guo reply:
The c610, c807, c810 don't support SMP, so futex_cmpxchg_enabled = 1
with asm-generic's implementation.
For c860, there is no HAVE_FUTEX_CMPXCHG and cmpxchg_inatomic/inuser
implementation, so futex_cmpxchg_enabled = 0.
Thx for point it out, we'll implement cmpxchg_inatomic/inuser for
C860 and still use asm-generic for non-smp CPUs.
LTP test:
futex_wait01 1 TPASS : futex_wait(): errno=ETIMEDOUT(110): Connection timed out
futex_wait01 2 TPASS : futex_wait(): errno=EAGAIN/EWOULDBLOCK(11): Resource temporarily unavailable
futex_wait01 3 TPASS : futex_wait(): errno=ETIMEDOUT(110): Connection timed out
futex_wait01 4 TPASS : futex_wait(): errno=EAGAIN/EWOULDBLOCK(11): Resource temporarily unavailable
futex_wait02 1 TPASS : futex_wait() woken up
futex_wait03 1 TPASS : futex_wait() woken up
futex_wait04 1 TPASS : futex_wait() returned -1: errno=EAGAIN/EWOULDBLOCK(11): Resource temporarily unavailable
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Cc: Arnd Bergmann <arnd@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/lkml/CAK8P3a3+WaQNyJ6Za2qfu6=0mBgU1hApnRXrdp1b1=P7wwyRUg@mail.gmail.com/
|
|
There are two ways for translating va to pa for csky:
- Use TLB(Translate Lookup Buffer) and PTW (Page Table Walk)
- Use SSEG0/1 (Simple Segment Mapping)
We use tlb mapping 0-2G and 3G-4G virtual address area and SSEG0/1
are for 2G-2.5G and 2.5G-3G translation. We could disable SSEG0
to use 2G-2.5G as TLB user mapping.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic cross-architecture timer cleanup from Arnd Bergmann:
"This cleans up two ancient timer features that were never completed in
the past, CONFIG_GENERIC_CLOCKEVENTS and CONFIG_ARCH_USES_GETTIMEOFFSET.
There was only one user left for the ARCH_USES_GETTIMEOFFSET variant
of clocksource implementations, the ARM EBSA110 platform. Rather than
changing to use modern timekeeping, we remove the platform entirely as
Russell no longer uses his machine and nobody else seems to have one
any more.
The conditional code for using arch_gettimeoffset() is removed as a
result.
For CONFIG_GENERIC_CLOCKEVENTS, there are still a couple of platforms
not using clockevent drivers: parisc, ia64, most of m68k, and one Arm
platform. These all do timer ticks slighly differently, and this gets
cleaned up to the point they at least all call the same helper
function.
Instead of most platforms using 'select GENERIC_CLOCKEVENTS' in
Kconfig, the polarity is now reversed, with the few remaining ones
selecting LEGACY_TIMER_TICK instead"
* tag 'asm-generic-timers-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
timekeeping: default GENERIC_CLOCKEVENTS to enabled
timekeeping: remove xtime_update
m68k: remove timer_interrupt() function
m68k: change remaining timers to legacy_timer_tick
m68k: m68328: use legacy_timer_tick()
m68k: sun3/sun3c: use legacy_timer_tick
m68k: split heartbeat out of timer function
m68k: coldfire: use legacy_timer_tick()
parisc: use legacy_timer_tick
ARM: rpc: use legacy_timer_tick
ia64: convert to legacy_timer_tick
timekeeping: add CONFIG_LEGACY_TIMER_TICK
timekeeping: remove arch_gettimeoffset
net: remove am79c961a driver
ARM: remove ebsa110 platform
|
|
No reason having the same code in every architecture.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Guo Ren <guoren@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20201103095857.681196473@linutronix.de
|
|
Almost all machines use GENERIC_CLOCKEVENTS, so it feels wrong to
require each one to select that symbol manually.
Instead, enable it whenever CONFIG_LEGACY_TIMER_TICK is disabled as
a simplification. It should be possible to select both
GENERIC_CLOCKEVENTS and LEGACY_TIMER_TICK from an architecture now
and decide at runtime between the two.
For the clockevents arch-support.txt file, this means that additional
architectures are marked as TODO when they have at least one machine
that still uses LEGACY_TIMER_TICK, rather than being marked 'ok' when
at least one machine has been converted. This means that both m68k and
arm (for riscpc) revert to TODO.
At this point, we could just always enable CONFIG_GENERIC_CLOCKEVENTS
rather than leaving it off when not needed. I built an m68k
defconfig kernel (using gcc-10.1.0) and found that this would add
around 5.5KB in kernel image size:
text data bss dec hex filename
3861936 1092236 196656 5150828 4e986c obj-m68k/vmlinux-no-clockevent
3866201 1093832 196184 5156217 4ead79 obj-m68k/vmlinux-clockevent
On Arm (MACH_RPC), that difference appears to be twice as large,
around 11KB on top of an 6MB vmlinux.
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull initial set_fs() removal from Al Viro:
"Christoph's set_fs base series + fixups"
* 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: Allow a NULL pos pointer to __kernel_read
fs: Allow a NULL pos pointer to __kernel_write
powerpc: remove address space overrides using set_fs()
powerpc: use non-set_fs based maccess routines
x86: remove address space overrides using set_fs()
x86: make TASK_SIZE_MAX usable from assembly code
x86: move PAGE_OFFSET, TASK_SIZE & friends to page_{32,64}_types.h
lkdtm: remove set_fs-based tests
test_bitmap: remove user bitmap tests
uaccess: add infrastructure for kernel builds with set_fs()
fs: don't allow splice read/write without explicit ops
fs: don't allow kernel reads and writes without iter ops
sysctl: Convert to iter interfaces
proc: add a read_iter method to proc proc_ops
proc: cleanup the compat vs no compat file ops
proc: remove a level of indentation in proc_get_inode
|
|
In order to make adding configurable features into seccomp easier,
it's better to have the options at one single location, considering
especially that the bulk of seccomp code is arch-independent. An quick
look also show that many SECCOMP descriptions are outdated; they talk
about /proc rather than prctl.
As a result of moving the config option and keeping it default on,
architectures arm, arm64, csky, riscv, sh, and xtensa did not have SECCOMP
on by default prior to this and SECCOMP will be default in this change.
Architectures microblaze, mips, powerpc, s390, sh, and sparc have an
outdated depend on PROC_FS and this dependency is removed in this change.
Suggested-by: Jann Horn <jannh@google.com>
Link: https://lore.kernel.org/lkml/CAG48ez1YWz9cnp08UZgeieYRhHdqh-ch7aNwc4JRBnGyrmgfMg@mail.gmail.com/
Signed-off-by: YiFei Zhu <yifeifz2@illinois.edu>
[kees: added HAVE_ARCH_SECCOMP help text, tweaked wording]
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/9ede6ef35c847e58d61e476c6a39540520066613.1600951211.git.yifeifz2@illinois.edu
|
|
Add a CONFIG_SET_FS option that is selected by architecturess that
implement set_fs, which is all of them initially. If the option is not
set stubs for routines related to overriding the address space are
provided so that architectures can start to opt out of providing set_fs.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Pull arch/csky updates from Guo Ren:
"New features:
- seccomp-filter
- err-injection
- top-down&random mmap-layout
- irq_work
- show_ipi
- context-tracking
Fixes & Optimizations:
- kprobe_on_ftrace
- optimize panic print"
* tag 'csky-for-linus-5.9-rc1' of https://github.com/c-sky/csky-linux:
csky: Add context tracking support
csky: Add arch_show_interrupts for IPI interrupts
csky: Add irq_work support
csky: Fixup warning by EXPORT_SYMBOL(kmap)
csky: Set CONFIG_NR_CPU 4 as default
csky: Use top-down mmap layout
csky: Optimize the trap processing flow
csky: Add support for function error injection
csky: Fixup kprobes handler couldn't change pc
csky: Fixup duplicated restore sp in RESTORE_REGS_FTRACE
csky: Add cpu feature register hint for smp
csky: Add SECCOMP_FILTER supported
csky: remove unusued thread_saved_pc and *_segments functions/macros
|
|
This patch support context tracking with no hz full.
Here is the test result with dynticks-testing (see tick_stop):
cat /sys/kernel/debug/tracing/per_cpu/cpu0/trace
tracer: nop
entries-in-buffer/entries-written: 356/356 #P:1
_-----=> irqs-off
/ _----=> need-resched
| / _---=> hardirq/softirq
|| / _--=> preempt-depth
||| / delay
TASK-PID CPU# |||| TIMESTAMP FUNCTION
| | | |||| | |
...
sleep-192 [000] d.h. 167.088270: hrtimer_expire_entry: hrtimer=(ptrval) function=tick_sched_timer now=166436355700
sleep-192 [000] d.h. 167.092279: hrtimer_expire_entry: hrtimer=(ptrval) function=tick_sched_timer now=166440365700
<idle>-0 [000] d.h2 167.096492: hrtimer_expire_entry: hrtimer=(ptrval) function=tick_sched_timer now=166444578400
<idle>-0 [000] d..1 167.097876: tick_stop: success=1 dependency=NONE
^^^^^^^^^
<idle>-0 [000] d.h1 168.818206: hrtimer_expire_entry: hrtimer=(ptrval) function=tick_sched_timer now=168166280900
kworker/u2:0-7 [000] .... 168.821760: workqueue_execute_start: work struct (ptrval): function wb_workfn
kworker/u2:0-7 [000] d.h1 168.824464: hrtimer_expire_entry: hrtimer=(ptrval) function=tick_sched_timer now=168172547100
kworker/0:1-18 [000] .... 168.825053: workqueue_execute_start: work struct (ptrval): function vmstat_update
kworker/0:1-18 [000] .... 168.825238: workqueue_execute_start: work struct (ptrval): function vmstat_shepherd
kworker/0:1-18 [000] .... 168.825516: workqueue_execute_start: work struct (ptrval): function neigh_periodic_work
<idle>-0 [000] d..1 168.826121: tick_stop: success=1 dependency=NONE
kworker/u2:0-7 [000] .... 169.377327: workqueue_execute_start: work struct (ptrval): function flush_to_ldisc
<idle>-0 [000] d..1 169.379832: tick_stop: success=1 dependency=NONE
kworker/u2:0-7 [000] .... 169.607935: workqueue_execute_start: work struct (ptrval): function flush_to_ldisc
kworker/u2:0-7 [000] d.h1 169.608148: hrtimer_expire_entry: hrtimer=(ptrval) function=tick_sched_timer now=168956235500
<idle>-0 [000] d..1 169.608654: tick_stop: success=1 dependency=NONE
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Cc: Greentime Hu <greentime.hu@sifive.com>
Cc: Arnd Bergmann <arnd@arndb.de>
|
|
The C860 processors support 4 cores smp for maximum, so set NR_CPU
to 4 as default
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Cc: Arnd Bergmann <arnd@arndb.de>
|
|
Follow riscv mmap layout with commit "riscv: make mmap allocation
top-down by default (54c95a11cc1b)".
Before:
cat /proc/self/maps
00008000-000dc000 r-xp 00000000 fe:00 17 /bin/busybox
000dc000-000dd000 r--p 000d3000 fe:00 17 /bin/busybox
000dd000-000de000 rw-p 000d4000 fe:00 17 /bin/busybox
2aaa8000-2aac6000 r-xp 00000000 fe:00 196 /lib/ld-2.28.9000.so
2aac6000-2aac7000 r-xp 00000000 00:00 0 [vdso]
2aac7000-2aac8000 r--p 0001e000 fe:00 196 /lib/ld-2.28.9000.so
2aac8000-2aac9000 rw-p 0001f000 fe:00 196 /lib/ld-2.28.9000.so
2aac9000-2aad9000 r-xp 00000000 fe:00 219 /lib/libresolv-2.28.9000.so
2aad9000-2aada000 r--p 0000f000 fe:00 219 /lib/libresolv-2.28.9000.so
2aada000-2aadb000 rw-p 00010000 fe:00 219 /lib/libresolv-2.28.9000.so
2aadb000-2aadd000 rw-p 00000000 00:00 0
2aadd000-2ac27000 r-xp 00000000 fe:00 203 /lib/libc-2.28.9000.so
2ac27000-2ac28000 ---p 0014a000 fe:00 203 /lib/libc-2.28.9000.so
2ac28000-2ac2a000 r--p 0014a000 fe:00 203 /lib/libc-2.28.9000.so
2ac2a000-2ac2b000 rw-p 0014c000 fe:00 203 /lib/libc-2.28.9000.so
2ac2b000-2ac2e000 rw-p 00000000 00:00 0
7fb99000-7fbba000 rwxp 00000000 00:00 0 [stack]
After:
cat /proc/self/maps
00008000-000dc000 r-xp 00000000 fe:00 17 /bin/busybox
000dc000-000dd000 r--p 000d3000 fe:00 17 /bin/busybox
000dd000-000de000 rw-p 000d4000 fe:00 17 /bin/busybox
77e13000-77f5d000 r-xp 00000000 fe:00 203 /lib/libc-2.28.9000.so
77f5d000-77f5e000 ---p 0014a000 fe:00 203 /lib/libc-2.28.9000.so
77f5e000-77f60000 r--p 0014a000 fe:00 203 /lib/libc-2.28.9000.so
77f60000-77f61000 rw-p 0014c000 fe:00 203 /lib/libc-2.28.9000.so
77f61000-77f66000 rw-p 00000000 00:00 0
77f66000-77f76000 r-xp 00000000 fe:00 219 /lib/libresolv-2.28.9000.so
77f76000-77f77000 r--p 0000f000 fe:00 219 /lib/libresolv-2.28.9000.so
77f77000-77f78000 rw-p 00010000 fe:00 219 /lib/libresolv-2.28.9000.so
77f78000-77f96000 r-xp 00000000 fe:00 196 /lib/ld-2.28.9000.so
77f96000-77f97000 r-xp 00000000 00:00 0 [vdso]
77f97000-77f98000 r--p 0001e000 fe:00 196 /lib/ld-2.28.9000.so
77f98000-77f99000 rw-p 0001f000 fe:00 196 /lib/ld-2.28.9000.so
7fd7b000-7fd9c000 rwxp 00000000 00:00 0 [stack]
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Cc: Arnd Bergmann <arnd@arndb.de>
|
|
Inspired by the commit 42d038c4fb00 ("arm64: Add support for function
error injection"), this patch supports function error injection for
csky.
This patch mainly support two functions: one is regs_set_return_value()
which is used to overwrite the return value; the another function is
override_function_with_return() which is to override the probed
function returning and jump to its caller.
Test log:
cd /sys/kernel/debug/fail_function/
echo sys_clone > inject
echo 100 > probability
echo 1 > interval
ls /
[ 108.644163] FAULT_INJECTION: forcing a failure.
[ 108.644163] name fail_function, interval 1, probability 100, space 0, times 1
[ 108.647799] CPU: 0 PID: 104 Comm: sh Not tainted 5.8.0-rc5+ #46
[ 108.648384] Call Trace:
[ 108.649339] [<8005eed4>] walk_stackframe+0x0/0xf0
[ 108.649679] [<8005f16a>] show_stack+0x32/0x5c
[ 108.649927] [<8040f9d2>] dump_stack+0x6e/0x9c
[ 108.650271] [<80406f7e>] should_fail+0x15e/0x1ac
[ 108.650720] [<80118ba8>] fei_kprobe_handler+0x28/0x5c
[ 108.651519] [<80754110>] kprobe_breakpoint_handler+0x144/0x1cc
[ 108.652289] [<8005d6da>] trap_c+0x8e/0x110
[ 108.652816] [<8005ce8c>] csky_trap+0x5c/0x70
-sh: can't fork: Invalid argument
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Cc: Arnd Bergmann <arnd@arndb.de>
|
|
secure_computing() is called first in syscall_trace_enter() so that
a system call will be aborted quickly without doing succeeding syscall
tracing if seccomp rules want to deny that system call.
TODO:
- Update https://github.com/seccomp/libseccomp csky support
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Cc: Arnd Bergmann <arnd@arndb.de>
|
|
All architectures support copy_thread_tls() now, so remove the legacy
copy_thread() function and the HAVE_COPY_THREAD_TLS config option. Everyone
uses the same process creation calling convention based on
copy_thread_tls() and struct kernel_clone_args. This will make it easier to
maintain the core process creation code under kernel/, simplifies the
callpaths and makes the identical for all architectures.
Cc: linux-arch@vger.kernel.org
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Acked-by: Greentime Hu <green.hu@gmail.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
|
|
The implementation of show_stack will panic with wrong fp:
addr = *fp++;
because the fp isn't checked properly.
The current implementations of show_stack, wchan and stack_trace
haven't been designed properly, so just deprecate them.
This patch is a reference to riscv's way, all codes are modified from
arm's. The patch is passed with:
- cat /proc/<pid>/stack
- cat /proc/<pid>/wchan
- echo c > /proc/sysrq-trigger
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
|