Age | Commit message (Collapse) | Author |
|
After switching to filesystem context support, we no longer need to
handle mount options string when chasing dfs referrals. Now, we set
the new values directly into smb3_fs_context.
Start working on a separate source file to handle most dfs related
mount functions as connect.c has already became too big. The
remaining functions will be moved gradually in follow-up patches.
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Use filesystem context support to handle dfs links.
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
To work around some Window servers that return
STATUS_OBJECT_NAME_INVALID on query infos under DFS namespaces that
contain non-ASCII characters, we started checking for -ENOENT on every
file open, and if so, then send additional requests to figure out
whether it is a DFS link or not. It means that all those requests
will be sent to every non-existing file.
So, in order to reduce the number of roundtrips, check earlier whether
status code is STATUS_OBJECT_NAME_INVALID and tcon supports dfs, and
if so, then map -ENOENT to -EREMOTE so mount or automount will take
care of chasing the DFS link -- if it isn't an DFS link, then -ENOENT
will be returned appropriately.
Before patch
SMB2 438 Create Request File: ada.test\dfs\foo;GetInfo Request...
SMB2 310 Create Response, Error: STATUS_OBJECT_NAME_NOT_FOUND;...
SMB2 228 Ioctl Request FSCTL_DFS_GET_REFERRALS, File: \ada.test\dfs\foo
SMB2 143 Ioctl Response, Error: STATUS_OBJECT_PATH_NOT_FOUND
SMB2 438 Create Request File: ada.test\dfs\foo;GetInfo Request...
SMB2 310 Create Response, Error: STATUS_OBJECT_NAME_NOT_FOUND;...
SMB2 228 Ioctl Request FSCTL_DFS_GET_REFERRALS, File: \ada.test\dfs\foo
SMB2 143 Ioctl Response, Error: STATUS_OBJECT_PATH_NOT_FOUND
After patch
SMB2 438 Create Request File: ada.test\dfs\foo;GetInfo Request...
SMB2 310 Create Response, Error: STATUS_OBJECT_NAME_NOT_FOUND;...
SMB2 438 Create Request File: ada.test\dfs\foo;GetInfo Request...
SMB2 310 Create Response, Error: STATUS_OBJECT_NAME_NOT_FOUND;...
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
cifs_tcon::status wasn't correctly updated to TID_GOOD after
establishing initial IPC connection thus staying at TID_NEW as long as
it wasn't reconnected.
Cc: stable@vger.kernel.org
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
cifs_tcon::status wasn't correctly updated to TID_GOOD after initial
tree connect thus staying at TID_NEW as long as it was connected.
Cc: stable@vger.kernel.org
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
git://git.samba.org/sfrench/cifs-2.6
Pull cifs client updates from Steve French:
- SMB3.1.1 POSIX Extensions fixes
- remove use of generic_writepages() and ->cifs_writepage(), in favor
of ->cifs_writepages() and ->migrate_folio()
- memory management fixes
- mount parm parsing fixes
- minor cleanup fixes
* tag '6.2-rc-smb3-client-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Remove duplicated include in cifsglob.h
cifs: fix oops during encryption
cifs: print warning when conflicting soft vs. hard mount options specified
cifs: fix missing display of three mount options
cifs: fix various whitespace errors in headers
cifs: minor cleanup of some headers
cifs: skip alloc when request has no pages
cifs: remove ->writepage
cifs: stop using generic_writepages
cifs: wire up >migrate_folio
cifs: Parse owner/group for stat in smb311 posix extensions
cifs: Add "extbuf" and "extbuflen" args to smb2_compound_op()
Fix path in cifs/usage.rst
|
|
Pull kvm updates from Paolo Bonzini:
"ARM64:
- Enable the per-vcpu dirty-ring tracking mechanism, together with an
option to keep the good old dirty log around for pages that are
dirtied by something other than a vcpu.
- Switch to the relaxed parallel fault handling, using RCU to delay
page table reclaim and giving better performance under load.
- Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping
option, which multi-process VMMs such as crosvm rely on (see merge
commit 382b5b87a97d: "Fix a number of issues with MTE, such as
races on the tags being initialised vs the PG_mte_tagged flag as
well as the lack of support for VM_SHARED when KVM is involved.
Patches from Catalin Marinas and Peter Collingbourne").
- Merge the pKVM shadow vcpu state tracking that allows the
hypervisor to have its own view of a vcpu, keeping that state
private.
- Add support for the PMUv3p5 architecture revision, bringing support
for 64bit counters on systems that support it, and fix the
no-quite-compliant CHAIN-ed counter support for the machines that
actually exist out there.
- Fix a handful of minor issues around 52bit VA/PA support (64kB
pages only) as a prefix of the oncoming support for 4kB and 16kB
pages.
- Pick a small set of documentation and spelling fixes, because no
good merge window would be complete without those.
s390:
- Second batch of the lazy destroy patches
- First batch of KVM changes for kernel virtual != physical address
support
- Removal of a unused function
x86:
- Allow compiling out SMM support
- Cleanup and documentation of SMM state save area format
- Preserve interrupt shadow in SMM state save area
- Respond to generic signals during slow page faults
- Fixes and optimizations for the non-executable huge page errata
fix.
- Reprogram all performance counters on PMU filter change
- Cleanups to Hyper-V emulation and tests
- Process Hyper-V TLB flushes from a nested guest (i.e. from a L2
guest running on top of a L1 Hyper-V hypervisor)
- Advertise several new Intel features
- x86 Xen-for-KVM:
- Allow the Xen runstate information to cross a page boundary
- Allow XEN_RUNSTATE_UPDATE flag behaviour to be configured
- Add support for 32-bit guests in SCHEDOP_poll
- Notable x86 fixes and cleanups:
- One-off fixes for various emulation flows (SGX, VMXON, NRIPS=0).
- Reinstate IBPB on emulated VM-Exit that was incorrectly dropped
a few years back when eliminating unnecessary barriers when
switching between vmcs01 and vmcs02.
- Clean up vmread_error_trampoline() to make it more obvious that
params must be passed on the stack, even for x86-64.
- Let userspace set all supported bits in MSR_IA32_FEAT_CTL
irrespective of the current guest CPUID.
- Fudge around a race with TSC refinement that results in KVM
incorrectly thinking a guest needs TSC scaling when running on a
CPU with a constant TSC, but no hardware-enumerated TSC
frequency.
- Advertise (on AMD) that the SMM_CTL MSR is not supported
- Remove unnecessary exports
Generic:
- Support for responding to signals during page faults; introduces
new FOLL_INTERRUPTIBLE flag that was reviewed by mm folks
Selftests:
- Fix an inverted check in the access tracking perf test, and restore
support for asserting that there aren't too many idle pages when
running on bare metal.
- Fix build errors that occur in certain setups (unsure exactly what
is unique about the problematic setup) due to glibc overriding
static_assert() to a variant that requires a custom message.
- Introduce actual atomics for clear/set_bit() in selftests
- Add support for pinning vCPUs in dirty_log_perf_test.
- Rename the so called "perf_util" framework to "memstress".
- Add a lightweight psuedo RNG for guest use, and use it to randomize
the access pattern and write vs. read percentage in the memstress
tests.
- Add a common ucall implementation; code dedup and pre-work for
running SEV (and beyond) guests in selftests.
- Provide a common constructor and arch hook, which will eventually
be used by x86 to automatically select the right hypercall (AMD vs.
Intel).
- A bunch of added/enabled/fixed selftests for ARM64, covering
memslots, breakpoints, stage-2 faults and access tracking.
- x86-specific selftest changes:
- Clean up x86's page table management.
- Clean up and enhance the "smaller maxphyaddr" test, and add a
related test to cover generic emulation failure.
- Clean up the nEPT support checks.
- Add X86_PROPERTY_* framework to retrieve multi-bit CPUID values.
- Fix an ordering issue in the AMX test introduced by recent
conversions to use kvm_cpu_has(), and harden the code to guard
against similar bugs in the future. Anything that tiggers
caching of KVM's supported CPUID, kvm_cpu_has() in this case,
effectively hides opt-in XSAVE features if the caching occurs
before the test opts in via prctl().
Documentation:
- Remove deleted ioctls from documentation
- Clean up the docs for the x86 MSR filter.
- Various fixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (361 commits)
KVM: x86: Add proper ReST tables for userspace MSR exits/flags
KVM: selftests: Allocate ucall pool from MEM_REGION_DATA
KVM: arm64: selftests: Align VA space allocator with TTBR0
KVM: arm64: Fix benign bug with incorrect use of VA_BITS
KVM: arm64: PMU: Fix period computation for 64bit counters with 32bit overflow
KVM: x86: Advertise that the SMM_CTL MSR is not supported
KVM: x86: remove unnecessary exports
KVM: selftests: Fix spelling mistake "probabalistic" -> "probabilistic"
tools: KVM: selftests: Convert clear/set_bit() to actual atomics
tools: Drop "atomic_" prefix from atomic test_and_set_bit()
tools: Drop conflicting non-atomic test_and_{clear,set}_bit() helpers
KVM: selftests: Use non-atomic clear/set bit helpers in KVM tests
perf tools: Use dedicated non-atomic clear/set bit helpers
tools: Take @bit as an "unsigned long" in {clear,set}_bit() helpers
KVM: arm64: selftests: Enable single-step without a "full" ucall()
KVM: x86: fix APICv/x2AVIC disabled when vm reboot by itself
KVM: Remove stale comment about KVM_REQ_UNHALT
KVM: Add missing arch for KVM_CREATE_DEVICE and KVM_{SET,GET}_DEVICE_ATTR
KVM: Reference to kvm_userspace_memory_region in doc and comments
KVM: Delete all references to removed KVM_SET_MEMORY_ALIAS ioctl
...
|
|
Pull ksmbd updates from Steve French:
"Six ksmbd server fixes"
* tag '6.2-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: Convert to use sysfs_emit()/sysfs_emit_at() APIs
ksmbd: Fix resource leak in smb2_lock()
ksmbd: Fix resource leak in ksmbd_session_rpc_open()
ksmbd: replace one-element arrays with flexible-array members
ksmbd: use F_SETLK when unlocking a file
ksmbd: set SMB2_SESSION_FLAG_ENCRYPT_DATA when enforcing data encryption for this share
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"In this round, we've added two features: F2FS_IOC_START_ATOMIC_REPLACE
and a per-block age-based extent cache.
F2FS_IOC_START_ATOMIC_REPLACE is a variant of the previous atomic
write feature which guarantees a per-file atomicity. It would be more
efficient than AtomicFile implementation in Android framework.
The per-block age-based extent cache implements another type of extent
cache in memory which keeps the per-block age in a file, so that block
allocator could split the hot and cold data blocks more accurately.
Enhancements:
- introduce F2FS_IOC_START_ATOMIC_REPLACE
- refactor extent_cache to add a new per-block-age-based extent cache support
- introduce discard_urgent_util, gc_mode, max_ordered_discard sysfs knobs
- add proc entry to show discard_plist info
- optimize iteration over sparse directories
- add barrier mount option
Bug fixes:
- avoid victim selection from previous victim section
- fix to enable compress for newly created file if extension matches
- set zstd compress level correctly
- initialize locks early in f2fs_fill_super() to fix bugs reported by syzbot
- correct i_size change for atomic writes
- allow to read node block after shutdown
- allow to set compression for inlined file
- fix gc mode when gc_urgent_high_remaining is 1
- should put a page when checking the summary info
Minor fixes and various clean-ups in GC, discard, debugfs, sysfs, and
doc"
* tag 'f2fs-for-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (63 commits)
f2fs: reset wait_ms to default if any of the victims have been selected
f2fs: fix some format WARNING in debug.c and sysfs.c
f2fs: don't call f2fs_issue_discard_timeout() when discard_cmd_cnt is 0 in f2fs_put_super()
f2fs: fix iostat parameter for discard
f2fs: Fix spelling mistake in label: free_bio_enrty_cache -> free_bio_entry_cache
f2fs: add block_age-based extent cache
f2fs: allocate the extent_cache by default
f2fs: refactor extent_cache to support for read and more
f2fs: remove unnecessary __init_extent_tree
f2fs: move internal functions into extent_cache.c
f2fs: specify extent cache for read explicitly
f2fs: introduce f2fs_is_readonly() for readability
f2fs: remove F2FS_SET_FEATURE() and F2FS_CLEAR_FEATURE() macro
f2fs: do some cleanup for f2fs module init
MAINTAINERS: Add f2fs bug tracker link
f2fs: remove the unused flush argument to change_curseg
f2fs: open code allocate_segment_by_default
f2fs: remove struct segment_allocation default_salloc_ops
f2fs: introduce discard_urgent_util sysfs node
f2fs: define MIN_DISCARD_GRANULARITY macro
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull kernel hardening updates from Kees Cook:
- Convert flexible array members, fix -Wstringop-overflow warnings, and
fix KCFI function type mismatches that went ignored by maintainers
(Gustavo A. R. Silva, Nathan Chancellor, Kees Cook)
- Remove the remaining side-effect users of ksize() by converting
dma-buf, btrfs, and coredump to using kmalloc_size_roundup(), add
more __alloc_size attributes, and introduce full testing of all
allocator functions. Finally remove the ksize() side-effect so that
each allocation-aware checker can finally behave without exceptions
- Introduce oops_limit (default 10,000) and warn_limit (default off) to
provide greater granularity of control for panic_on_oops and
panic_on_warn (Jann Horn, Kees Cook)
- Introduce overflows_type() and castable_to_type() helpers for cleaner
overflow checking
- Improve code generation for strscpy() and update str*() kern-doc
- Convert strscpy and sigphash tests to KUnit, and expand memcpy tests
- Always use a non-NULL argument for prepare_kernel_cred()
- Disable structleak plugin in FORTIFY KUnit test (Anders Roxell)
- Adjust orphan linker section checking to respect CONFIG_WERROR (Xin
Li)
- Make sure siginfo is cleared for forced SIGKILL (haifeng.xu)
- Fix um vs FORTIFY warnings for always-NULL arguments
* tag 'hardening-v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (31 commits)
ksmbd: replace one-element arrays with flexible-array members
hpet: Replace one-element array with flexible-array member
um: virt-pci: Avoid GCC non-NULL warning
signal: Initialize the info in ksignal
lib: fortify_kunit: build without structleak plugin
panic: Expose "warn_count" to sysfs
panic: Introduce warn_limit
panic: Consolidate open-coded panic_on_warn checks
exit: Allow oops_limit to be disabled
exit: Expose "oops_count" to sysfs
exit: Put an upper limit on how often we can oops
panic: Separate sysctl logic from CONFIG_SMP
mm/pgtable: Fix multiple -Wstringop-overflow warnings
mm: Make ksize() a reporting-only function
kunit/fortify: Validate __alloc_size attribute results
drm/sti: Fix return type of sti_{dvo,hda,hdmi}_connector_mode_valid()
drm/fsl-dcu: Fix return type of fsl_dcu_drm_connector_mode_valid()
driver core: Add __alloc_size hint to devm allocators
overflow: Introduce overflows_type() and castable_to_type()
coredump: Proactively round up to kmalloc bucket size
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux
Pull orangefs updates from Mike Marshall:
- fix problems with memory leaks on exit in sysfs and debufs (Zhang)
- remove an unused variable and an unneeded assignment (Colin)
* tag 'for-linus-6.2-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
orangefs: Fix kmemleak in orangefs_{kernel,client}_debug_init()
orangefs: Fix kmemleak in orangefs_sysfs_init()
orangefs: Fix kmemleak in orangefs_prepare_debugfs_help_string()
orangefs: Fix sysfs not cleanup when dev init failed
orangefs: remove redundant assignment to variable buffer_index
orangefs: remove variable i
|
|
Pull cph update from Ilya Dryomov:
"A fix to facilitate prompt cap releases on async creates from Xiubo.
This should address sporadic "client isn't responding to mclientcaps
(revoke) ..." warnings and potential associated MDS hangs"
* tag 'ceph-for-6.2-rc1' of https://github.com/ceph/ceph-client:
ceph: try to check caps immediately after async creating finishes
ceph: remove useless session parameter for check_caps()
|
|
Pull XFS updates from Darrick Wong:
"The highlight of this is a batch of fixes for the online metadata
checking code as we start the loooong march towards merging online
repair. I aim to merge that in time for the 2023 LTS.
There are also a large number of data corruption and race condition
fixes in this patchset. Most notably fixed are write() calls to
unwritten extents racing with writeback, which required some late(r
than I prefer) code changes to iomap to support the necessary
revalidations. I don't really like iomap changes going in past -rc4,
but Dave and I have been working on it long enough that I chose to
push it for 6.2 anyway.
There are also a number of other subtle problems fixed, including the
log racing with inode writeback to write inodes with incorrect link
count to disk; file data mapping corruptions as a result of incorrect
lock cycling when attaching dquots; refcount metadata corruption if
one actually manages to share a block 2^32 times; and the log
clobbering cow staging extents if they were formerly metadata blocks.
Summary:
- Fix a race condition w.r.t. percpu inode free counters
- Fix a broken error return in xfs_remove
- Print FS UUID at mount/unmount time
- Numerous fixes to the online fsck code
- Fix inode locking inconsistency problems when dealing with realtime
metadata files
- Actually merge pull requests so that we capture the cover letter
contents
- Fix a race between rebuilding VFS inode state and the AIL flushing
inodes that could cause corrupt inodes to be written to the
filesystem
- Fix a data corruption problem resulting from a write() to an
unwritten extent racing with writeback started on behalf of memory
reclaim changing the extent state
- Add debugging knobs so that we can test iomap invalidation
- Fix the blockdev pagecache contents being stale after unmounting
the filesystem, leading to spurious xfs_db errors and corrupt
metadumps
- Fix a file mapping corruption bug due to ilock cycling when
attaching dquots to a file during delalloc reservation
- Fix a refcount btree corruption problem due to the refcount
adjustment code not handling MAXREFCOUNT correctly, resulting in
unnecessary record splits
- Fix COW staging extent alloctions not being classified as USERDATA,
which results in filestreams being ignored and possible data
corruption if the allocation was filled from the AGFL and the block
buffer is still being tracked in the AIL
- Fix new duplicated includes
- Fix a race between the dquot shrinker and dquot freeing that could
cause a UAF"
* tag 'xfs-6.2-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (50 commits)
xfs: dquot shrinker doesn't check for XFS_DQFLAG_FREEING
xfs: Remove duplicated include in xfs_iomap.c
xfs: invalidate xfs_bufs when allocating cow extents
xfs: get rid of assert from xfs_btree_islastblock
xfs: estimate post-merge refcounts correctly
xfs: hoist refcount record merge predicates
xfs: fix super block buf log item UAF during force shutdown
xfs: wait iclog complete before tearing down AIL
xfs: attach dquots to inode before reading data/cow fork mappings
xfs: shut up -Wuninitialized in xfsaild_push
xfs: use memcpy, not strncpy, to format the attr prefix during listxattr
xfs: invalidate block device page cache during unmount
xfs: add debug knob to slow down write for fun
xfs: add debug knob to slow down writeback for fun
xfs: drop write error injection is unfixable, remove it
xfs: use iomap_valid method to detect stale cached iomaps
iomap: write iomap validity checks
xfs: xfs_bmap_punch_delalloc_range() should take a byte range
iomap: buffered write failure should not truncate the page cache
xfs,iomap: move delalloc punching to iomap
...
|
|
./fs/cifs/cifsglob.h: linux/scatterlist.h is included more than once.
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=3459
Fixes: f7f291e14dde ("cifs: fix oops during encryption")
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- More userfaultfs work from Peter Xu
- Several convert-to-folios series from Sidhartha Kumar and Huang Ying
- Some filemap cleanups from Vishal Moola
- David Hildenbrand added the ability to selftest anon memory COW
handling
- Some cpuset simplifications from Liu Shixin
- Addition of vmalloc tracing support by Uladzislau Rezki
- Some pagecache folioifications and simplifications from Matthew
Wilcox
- A pagemap cleanup from Kefeng Wang: we have VM_ACCESS_FLAGS, so use
it
- Miguel Ojeda contributed some cleanups for our use of the
__no_sanitize_thread__ gcc keyword.
This series should have been in the non-MM tree, my bad
- Naoya Horiguchi improved the interaction between memory poisoning and
memory section removal for huge pages
- DAMON cleanups and tuneups from SeongJae Park
- Tony Luck fixed the handling of COW faults against poisoned pages
- Peter Xu utilized the PTE marker code for handling swapin errors
- Hugh Dickins reworked compound page mapcount handling, simplifying it
and making it more efficient
- Removal of the autonuma savedwrite infrastructure from Nadav Amit and
David Hildenbrand
- zram support for multiple compression streams from Sergey Senozhatsky
- David Hildenbrand reworked the GUP code's R/O long-term pinning so
that drivers no longer need to use the FOLL_FORCE workaround which
didn't work very well anyway
- Mel Gorman altered the page allocator so that local IRQs can remnain
enabled during per-cpu page allocations
- Vishal Moola removed the try_to_release_page() wrapper
- Stefan Roesch added some per-BDI sysfs tunables which are used to
prevent network block devices from dirtying excessive amounts of
pagecache
- David Hildenbrand did some cleanup and repair work on KSM COW
breaking
- Nhat Pham and Johannes Weiner have implemented writeback in zswap's
zsmalloc backend
- Brian Foster has fixed a longstanding corner-case oddity in
file[map]_write_and_wait_range()
- sparse-vmemmap changes for MIPS, LoongArch and NIOS2 from Feiyang
Chen
- Shiyang Ruan has done some work on fsdax, to make its reflink mode
work better under xfstests. Better, but still not perfect
- Christoph Hellwig has removed the .writepage() method from several
filesystems. They only need .writepages()
- Yosry Ahmed wrote a series which fixes the memcg reclaim target
beancounting
- David Hildenbrand has fixed some of our MM selftests for 32-bit
machines
- Many singleton patches, as usual
* tag 'mm-stable-2022-12-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (313 commits)
mm/hugetlb: set head flag before setting compound_order in __prep_compound_gigantic_folio
mm: mmu_gather: allow more than one batch of delayed rmaps
mm: fix typo in struct pglist_data code comment
kmsan: fix memcpy tests
mm: add cond_resched() in swapin_walk_pmd_entry()
mm: do not show fs mm pc for VM_LOCKONFAULT pages
selftests/vm: ksm_functional_tests: fixes for 32bit
selftests/vm: cow: fix compile warning on 32bit
selftests/vm: madv_populate: fix missing MADV_POPULATE_(READ|WRITE) definitions
mm/gup_test: fix PIN_LONGTERM_TEST_READ with highmem
mm,thp,rmap: fix races between updates of subpages_mapcount
mm: memcg: fix swapcached stat accounting
mm: add nodes= arg to memory.reclaim
mm: disable top-tier fallback to reclaim on proactive reclaim
selftests: cgroup: make sure reclaim target memcg is unprotected
selftests: cgroup: refactor proactive reclaim code to reclaim_until()
mm: memcg: fix stale protection of reclaim target memcg
mm/mmap: properly unaccount memory on mas_preallocate() failure
omfs: remove ->writepage
jfs: remove ->writepage
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI updates from Ard Biesheuvel:
"Another fairly sizable pull request, by EFI subsystem standards.
Most of the work was done by me, some of it in collaboration with the
distro and bootloader folks (GRUB, systemd-boot), where the main focus
has been on removing pointless per-arch differences in the way EFI
boots a Linux kernel.
- Refactor the zboot code so that it incorporates all the EFI stub
logic, rather than calling the decompressed kernel as a EFI app.
- Add support for initrd= command line option to x86 mixed mode.
- Allow initrd= to be used with arbitrary EFI accessible file systems
instead of just the one the kernel itself was loaded from.
- Move some x86-only handling and manipulation of the EFI memory map
into arch/x86, as it is not used anywhere else.
- More flexible handling of any random seeds provided by the boot
environment (i.e., systemd-boot) so that it becomes available much
earlier during the boot.
- Allow improved arch-agnostic EFI support in loaders, by setting a
uniform baseline of supported features, and adding a generic magic
number to the DOS/PE header. This should allow loaders such as GRUB
or systemd-boot to reduce the amount of arch-specific handling
substantially.
- (arm64) Run EFI runtime services from a dedicated stack, and use it
to recover from synchronous exceptions that might occur in the
firmware code.
- (arm64) Ensure that we don't allocate memory outside of the 48-bit
addressable physical range.
- Make EFI pstore record size configurable
- Add support for decoding CXL specific CPER records"
* tag 'efi-next-for-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: (43 commits)
arm64: efi: Recover from synchronous exceptions occurring in firmware
arm64: efi: Execute runtime services from a dedicated stack
arm64: efi: Limit allocations to 48-bit addressable physical region
efi: Put Linux specific magic number in the DOS header
efi: libstub: Always enable initrd command line loader and bump version
efi: stub: use random seed from EFI variable
efi: vars: prohibit reading random seed variables
efi: random: combine bootloader provided RNG seed with RNG protocol output
efi/cper, cxl: Decode CXL Error Log
efi/cper, cxl: Decode CXL Protocol Error Section
efi: libstub: fix efi_load_initrd_dev_path() kernel-doc comment
efi: x86: Move EFI runtime map sysfs code to arch/x86
efi: runtime-maps: Clarify purpose and enable by default for kexec
efi: pstore: Add module parameter for setting the record size
efi: xen: Set EFI_PARAVIRT for Xen dom0 boot on all architectures
efi: memmap: Move manipulation routines into x86 arch tree
efi: memmap: Move EFI fake memmap support into x86 arch tree
efi: libstub: Undeprecate the command line initrd loader
efi: libstub: Add mixed mode support to command line initrd loader
efi: libstub: Permit mixed mode return types other than efi_status_t
...
|
|
Pull block updates from Jens Axboe:
- NVMe pull requests via Christoph:
- Support some passthrough commands without CAP_SYS_ADMIN (Kanchan
Joshi)
- Refactor PCIe probing and reset (Christoph Hellwig)
- Various fabrics authentication fixes and improvements (Sagi
Grimberg)
- Avoid fallback to sequential scan due to transient issues (Uday
Shankar)
- Implement support for the DEAC bit in Write Zeroes (Christoph
Hellwig)
- Allow overriding the IEEE OUI and firmware revision in configfs
for nvmet (Aleksandr Miloserdov)
- Force reconnect when number of queue changes in nvmet (Daniel
Wagner)
- Minor fixes and improvements (Uros Bizjak, Joel Granados, Sagi
Grimberg, Christoph Hellwig, Christophe JAILLET)
- Fix and cleanup nvme-fc req allocation (Chaitanya Kulkarni)
- Use the common tagset helpers in nvme-pci driver (Christoph
Hellwig)
- Cleanup the nvme-pci removal path (Christoph Hellwig)
- Use kstrtobool() instead of strtobool (Christophe JAILLET)
- Allow unprivileged passthrough of Identify Controller (Joel
Granados)
- Support io stats on the mpath device (Sagi Grimberg)
- Minor nvmet cleanup (Sagi Grimberg)
- MD pull requests via Song:
- Code cleanups (Christoph)
- Various fixes
- Floppy pull request from Denis:
- Fix a memory leak in the init error path (Yuan)
- Series fixing some batch wakeup issues with sbitmap (Gabriel)
- Removal of the pktcdvd driver that was deprecated more than 5 years
ago, and subsequent removal of the devnode callback in struct
block_device_operations as no users are now left (Greg)
- Fix for partition read on an exclusively opened bdev (Jan)
- Series of elevator API cleanups (Jinlong, Christoph)
- Series of fixes and cleanups for blk-iocost (Kemeng)
- Series of fixes and cleanups for blk-throttle (Kemeng)
- Series adding concurrent support for sync queues in BFQ (Yu)
- Series bringing drbd a bit closer to the out-of-tree maintained
version (Christian, Joel, Lars, Philipp)
- Misc drbd fixes (Wang)
- blk-wbt fixes and tweaks for enable/disable (Yu)
- Fixes for mq-deadline for zoned devices (Damien)
- Add support for read-only and offline zones for null_blk
(Shin'ichiro)
- Series fixing the delayed holder tracking, as used by DM (Yu,
Christoph)
- Series enabling bio alloc caching for IRQ based IO (Pavel)
- Series enabling userspace peer-to-peer DMA (Logan)
- BFQ waker fixes (Khazhismel)
- Series fixing elevator refcount issues (Christoph, Jinlong)
- Series cleaning up references around queue destruction (Christoph)
- Series doing quiesce by tagset, enabling cleanups in drivers
(Christoph, Chao)
- Series untangling the queue kobject and queue references (Christoph)
- Misc fixes and cleanups (Bart, David, Dawei, Jinlong, Kemeng, Ye,
Yang, Waiman, Shin'ichiro, Randy, Pankaj, Christoph)
* tag 'for-6.2/block-2022-12-08' of git://git.kernel.dk/linux: (247 commits)
blktrace: Fix output non-blktrace event when blk_classic option enabled
block: sed-opal: Don't include <linux/kernel.h>
sed-opal: allow using IOC_OPAL_SAVE for locking too
blk-cgroup: Fix typo in comment
block: remove bio_set_op_attrs
nvmet: don't open-code NVME_NS_ATTR_RO enumeration
nvme-pci: use the tagset alloc/free helpers
nvme: add the Apple shared tag workaround to nvme_alloc_io_tag_set
nvme: only set reserved_tags in nvme_alloc_io_tag_set for fabrics controllers
nvme: consolidate setting the tagset flags
nvme: pass nr_maps explicitly to nvme_alloc_io_tag_set
block: bio_copy_data_iter
nvme-pci: split out a nvme_pci_ctrl_is_dead helper
nvme-pci: return early on ctrl state mismatch in nvme_reset_work
nvme-pci: rename nvme_disable_io_queues
nvme-pci: cleanup nvme_suspend_queue
nvme-pci: remove nvme_pci_disable
nvme-pci: remove nvme_disable_admin_queue
nvme: merge nvme_shutdown_ctrl into nvme_disable_ctrl
nvme: use nvme_wait_ready in nvme_shutdown_ctrl
...
|
|
Pull io_uring updates from Jens Axboe:
- Always ensure proper ordering in case of CQ ring overflow, which then
means we can remove some work-arounds for that (Dylan)
- Support completion batching for multishot, greatly increasing the
efficiency for those (Dylan)
- Flag epoll/eventfd wakeups done from io_uring, so that we can easily
tell if we're recursing into io_uring again.
Previously, this would have resulted in repeated multishot
notifications if we had a dependency there. That could happen if an
eventfd was registered as the ring eventfd, and we multishot polled
for events on it. Or if an io_uring fd was added to epoll, and
io_uring had a multishot request for the epoll fd.
Test cases here:
https://git.kernel.dk/cgit/liburing/commit/?id=919755a7d0096fda08fb6d65ac54ad8d0fe027cd
Previously these got terminated when the CQ ring eventually
overflowed, now it's handled gracefully (me).
- Tightening of the IOPOLL based completions (Pavel)
- Optimizations of the networking zero-copy paths (Pavel)
- Various tweaks and fixes (Dylan, Pavel)
* tag 'for-6.2/io_uring-2022-12-08' of git://git.kernel.dk/linux: (41 commits)
io_uring: keep unlock_post inlined in hot path
io_uring: don't use complete_post in kbuf
io_uring: spelling fix
io_uring: remove io_req_complete_post_tw
io_uring: allow multishot polled reqs to defer completion
io_uring: remove overflow param from io_post_aux_cqe
io_uring: add lockdep assertion in io_fill_cqe_aux
io_uring: make io_fill_cqe_aux static
io_uring: add io_aux_cqe which allows deferred completion
io_uring: allow defer completion for aux posted cqes
io_uring: defer all io_req_complete_failed
io_uring: always lock in io_apoll_task_func
io_uring: remove iopoll spinlock
io_uring: iopoll protect complete_post
io_uring: inline __io_req_complete_put()
io_uring: remove io_req_tw_post_queue
io_uring: use io_req_task_complete() in timeout
io_uring: hold locks for io_req_complete_failed
io_uring: add completion locking for iopoll
io_uring: kill io_cqring_ev_posted() and __io_cq_unlock_post()
...
|
|
Pull iomap update from Darrick Wong:
- Minor code cleanup to eliminate unnecessary bit shifting
* tag 'iomap-6.2-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
iomap: directly use logical block size
|
|
Pull vfs remap_range update from Darrick Wong:
- Make some minor adjustments to the remap range preparation function
to skip file updates when the request length is adjusted downwards to
zero.
* tag 'vfs-6.2-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
fs/remap_range: avoid spurious writeback on zero length request
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping
Pull simple-xattr updates from Christian Brauner:
"This ports the simple xattr infrastucture to rely on a simple rbtree
protected by a read-write lock instead of a linked list protected by a
spinlock.
A while ago we received reports about scaling issues for filesystems
using the simple xattr infrastructure that also support setting a
larger number of xattrs. Specifically, cgroups and tmpfs.
Both cgroupfs and tmpfs can be mounted by unprivileged users in
unprivileged containers and root in an unprivileged container can set
an unrestricted number of security.* xattrs and privileged users can
also set unlimited trusted.* xattrs. A few more words on further that
below. Other xattrs such as user.* are restricted for kernfs-based
instances to a fairly limited number.
As there are apparently users that have a fairly large number of
xattrs we should scale a bit better. Using a simple linked list
protected by a spinlock used for set, get, and list operations doesn't
scale well if users use a lot of xattrs even if it's not a crazy
number.
Let's switch to a simple rbtree protected by a rwlock. It scales way
better and gets rid of the perf issues some people reported. We
originally had fancier solutions even using an rcu+seqlock protected
rbtree but we had concerns about being to clever and also that
deletion from an rbtree with rcu+seqlock isn't entirely safe.
The rbtree plus rwlock is perfectly fine. By far the most common
operation is getting an xattr. While setting an xattr is not and
should be comparatively rare. And listxattr() often only happens when
copying xattrs between files or together with the contents to a new
file.
Holding a lock across listxattr() is unproblematic because it doesn't
list the values of xattrs. It can only be used to list the names of
all xattrs set on a file. And the number of xattr names that can be
listed with listxattr() is limited to XATTR_LIST_MAX aka 65536 bytes.
If a larger buffer is passed then vfs_listxattr() caps it to
XATTR_LIST_MAX and if more xattr names are found it will return
-E2BIG. In short, the maximum amount of memory that can be retrieved
via listxattr() is limited and thus listxattr() bounded.
Of course, the API is broken as documented on xattr(7) already. While
I have no idea how the xattr api ended up in this state we should
probably try to come up with something here at some point. An iterator
pattern similar to readdir() as an alternative to listxattr() or
something else.
Right now it is extremly strange that users can set millions of xattrs
but then can't use listxattr() to know which xattrs are actually set.
And it's really trivial to do:
for i in {1..1000000}; do setfattr -n security.$i -v $i ./file1; done
And around 5000 xattrs it's impossible to use listxattr() to figure
out which xattrs are actually set. So I have suggested that we try to
limit the number of xattrs for simple xattrs at least. But that's a
future patch and I don't consider it very urgent.
A bonus of this port to rbtree+rwlock is that we shrink the memory
consumption for users of the simple xattr infrastructure.
This also adds kernel documentation to all the functions"
* tag 'fs.xattr.simple.rework.rbtree.rwlock.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
xattr: use rbtree for simple_xattrs
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull lsm updates from Paul Moore:
- Improve the error handling in the device cgroup such that memory
allocation failures when updating the access policy do not
potentially alter the policy.
- Some minor fixes to reiserfs to ensure that it properly releases
LSM-related xattr values.
- Update the security_socket_getpeersec_stream() LSM hook to take
sockptr_t values.
Previously the net/BPF folks updated the getsockopt code in the
network stack to leverage the sockptr_t type to make it easier to
pass both kernel and __user pointers, but unfortunately when they did
so they didn't convert the LSM hook.
While there was/is no immediate risk by not converting the LSM hook,
it seems like this is a mistake waiting to happen so this patch
proactively does the LSM hook conversion.
- Convert vfs_getxattr_alloc() to return an int instead of a ssize_t
and cleanup the callers. Internally the function was never going to
return anything larger than an int and the callers were doing some
very odd things casting the return value; this patch fixes all that
and helps bring a bit of sanity to vfs_getxattr_alloc() and its
callers.
- More verbose, and helpful, LSM debug output when the system is booted
with "lsm.debug" on the command line. There are examples in the
commit description, but the quick summary is that this patch provides
better information about which LSMs are enabled and the ordering in
which they are processed.
- General comment and kernel-doc fixes and cleanups.
* tag 'lsm-pr-20221212' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm:
lsm: Fix description of fs_context_parse_param
lsm: Add/fix return values in lsm_hooks.h and fix formatting
lsm: Clarify documentation of vm_enough_memory hook
reiserfs: Add missing calls to reiserfs_security_free()
lsm,fs: fix vfs_getxattr_alloc() return type and caller error paths
device_cgroup: Roll back to original exceptions after copy failure
LSM: Better reporting of actual LSMs at boot
lsm: make security_socket_getpeersec_stream() sockptr_t safe
audit: Fix some kernel-doc warnings
lsm: remove obsoleted comments for security hooks
fs: edit a comment made in bad taste
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux
Pull landlock updates from Mickaël Salaün:
"This adds file truncation support to Landlock, contributed by Günther
Noack. As described by Günther [1], the goal of these patches is to
work towards a more complete coverage of file system operations that
are restrictable with Landlock.
The known set of currently unsupported file system operations in
Landlock is described at [2]. Out of the operations listed there,
truncate is the only one that modifies file contents, so these patches
should make it possible to prevent the direct modification of file
contents with Landlock.
The new LANDLOCK_ACCESS_FS_TRUNCATE access right covers both the
truncate(2) and ftruncate(2) families of syscalls, as well as open(2)
with the O_TRUNC flag. This includes usages of creat() in the case
where existing regular files are overwritten.
Additionally, this introduces a new Landlock security blob associated
with opened files, to track the available Landlock access rights at
the time of opening the file. This is in line with Unix's general
approach of checking the read and write permissions during open(), and
associating this previously checked authorization with the opened
file. An ongoing patch documents this use case [3].
In order to treat truncate(2) and ftruncate(2) calls differently in an
LSM hook, we split apart the existing security_path_truncate hook into
security_path_truncate (for truncation by path) and
security_file_truncate (for truncation of previously opened files)"
Link: https://lore.kernel.org/r/20221018182216.301684-1-gnoack3000@gmail.com [1]
Link: https://www.kernel.org/doc/html/v6.1/userspace-api/landlock.html#filesystem-flags [2]
Link: https://lore.kernel.org/r/20221209193813.972012-1-mic@digikod.net [3]
* tag 'landlock-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
samples/landlock: Document best-effort approach for LANDLOCK_ACCESS_FS_REFER
landlock: Document Landlock's file truncation support
samples/landlock: Extend sample tool to support LANDLOCK_ACCESS_FS_TRUNCATE
selftests/landlock: Test ftruncate on FDs created by memfd_create(2)
selftests/landlock: Test FD passing from restricted to unrestricted processes
selftests/landlock: Locally define __maybe_unused
selftests/landlock: Test open() and ftruncate() in multiple scenarios
selftests/landlock: Test file truncation support
landlock: Support file truncation
landlock: Document init_layer_masks() helper
landlock: Refactor check_access_path_dual() into is_access_to_paths_allowed()
security: Create file_truncate hook from path_truncate hook
|
|
git://git.infradead.org/users/hch/configfs
Pull configfs updates from Christoph Hellwig:
- fix a memory leak in configfs_create_dir (Chen Zhongjin)
- remove mentions of committable items that were implemented (Bartosz
Golaszewski)
* tag 'configfs-6.2-2022-12-13' of git://git.infradead.org/users/hch/configfs:
configfs: remove mentions of committable items
configfs: fix possible memory leak in configfs_create_dir()
|
|
Pull NFS client updates from Trond Myklebust
"Bugfixes:
- Fix NULL pointer dereference in the mount parser
- Fix memory stomp in decode_attr_security_label
- Fix credential leak in _nfs4_discover_trunking()
- Fix buffer leak in rpcrdma_req_create()
- Fix leaked socket in rpc_sockname()
- Fix deadlock between nfs4_open_recover_helper() and delegreturn
- Fix an Oops in nfs_d_automount()
- Fix potential race in nfs_call_unlink()
- Multiple fixes for the open context mode
- NFSv4.2 READ_PLUS fixes
- Fix a regression in which small rsize/wsize values are being
forbidden
- Fail client initialisation if the NFSv4.x state manager thread
can't run
- Avoid spurious warning of lost lock that is being unlocked.
- Ensure the initialisation of struct nfs4_label
Features and cleanups:
- Trigger the "ls -l" readdir heuristic sooner
- Clear the file access cache upon login to ensure supplementary
group info is in sync between the client and server
- pnfs: Fix up the logging of layout stateids
- NFSv4.2: Change the default KConfig value for READ_PLUS
- Use sysfs_emit() instead of scnprintf() where appropriate"
* tag 'nfs-for-6.2-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (24 commits)
NFSv4.2: Change the default KConfig value for READ_PLUS
NFSv4.x: Fail client initialisation if state manager thread can't run
fs: nfs: sysfs: use sysfs_emit() to instead of scnprintf()
NFS: use sysfs_emit() to instead of scnprintf()
NFS: Allow very small rsize & wsize again
NFSv4.2: Fix up READ_PLUS alignment
NFSv4.2: Set the correct size scratch buffer for decoding READ_PLUS
SUNRPC: Fix missing release socket in rpc_sockname()
xprtrdma: Fix regbuf data not freed in rpcrdma_req_create()
NFS: avoid spurious warning of lost lock that is being unlocked.
nfs: fix possible null-ptr-deref when parsing param
NFSv4: check FMODE_EXEC from open context mode in nfs4_opendata_access()
NFS: make sure open context mode have FMODE_EXEC when file open for exec
NFS4.x/pnfs: Fix up logging of layout stateids
NFS: Fix a race in nfs_call_unlink()
NFS: Fix an Oops in nfs_d_automount()
NFSv4: Fix a deadlock between nfs4_open_recover_helper() and delegreturn
NFSv4: Fix a credential leak in _nfs4_discover_trunking()
NFS: Trigger the "ls -l" readdir heuristic sooner
NFSv4.2: Fix initialisation of struct nfs4_label
...
|
|
Pull nfsd updates from Chuck Lever:
"This release introduces support for the CB_RECALL_ANY operation. NFSD
can send this operation to request that clients return any delegations
they choose. The server uses this operation to handle low memory
scenarios or indicate to a client when that client has reached the
maximum number of delegations the server supports.
The NFSv4.2 READ_PLUS operation has been simplified temporarily whilst
support for sparse files in local filesystems and the VFS is improved.
Two major data structure fixes appear in this release:
- The nfs4_file hash table is replaced with a resizable hash table to
reduce the latency of NFSv4 OPEN operations.
- Reference counting in the NFSD filecache has been hardened against
races.
In furtherance of removing support for NFSv2 in a subsequent kernel
release, a new Kconfig option enables server-side support for NFSv2 to
be left out of a kernel build.
MAINTAINERS has been updated to indicate that changes to fs/exportfs
should go through the NFSD tree"
* tag 'nfsd-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (49 commits)
NFSD: Avoid clashing function prototypes
SUNRPC: Fix crasher in unwrap_integ_data()
SUNRPC: Make the svc_authenticate tracepoint conditional
NFSD: Use only RQ_DROPME to signal the need to drop a reply
SUNRPC: Clean up xdr_write_pages()
SUNRPC: Don't leak netobj memory when gss_read_proxy_verf() fails
NFSD: add CB_RECALL_ANY tracepoints
NFSD: add delegation reaper to react to low memory condition
NFSD: add support for sending CB_RECALL_ANY
NFSD: refactoring courtesy_client_reaper to a generic low memory shrinker
trace: Relocate event helper files
NFSD: pass range end to vfs_fsync_range() instead of count
lockd: fix file selection in nlmsvc_cancel_blocked
lockd: ensure we use the correct file descriptor when unlocking
lockd: set missing fl_flags field when retrieving args
NFSD: Use struct_size() helper in alloc_session()
nfsd: return error if nfs4_setacl fails
lockd: set other missing fields when unlocking files
NFSD: Add an nfsd_file_fsync tracepoint
sunrpc: svc: Remove an unused static function svc_ungetu32()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"This round there are a lot of cleanups and moved code so the diffstat
looks huge, otherwise there are some nice performance improvements and
an update to raid56 reliability.
User visible features:
- raid56 reliability vs performance trade off:
- fix destructive RMW for raid5 data (raid6 still needs work): do
full checksum verification for all data during RMW cycle, this
should prevent rewriting potentially corrupted data without
notice
- stripes are cached in memory which should reduce the performance
impact but still can hurt some workloads
- checksums are verified after repair again
- this is the last option without introducing additional features
(write intent bitmap, journal, another tree), the extra checksum
read/verification was supposed to be avoided by the original
implementation exactly for performance reasons but that caused
all the reliability problems
- discard=async by default for devices that support it
- implement emergency flush reserve to avoid almost all unnecessary
transaction aborts due to ENOSPC in cases where there are too many
delayed refs or delayed allocation
- skip block group synchronization if there's no change in used
bytes, can reduce transaction commit count for some workloads
Performance improvements:
- fiemap and lseek:
- overall speedup due to skipping unnecessary or duplicate
searches (-40% run time)
- cache some data structures and sharedness of extents (-30% run
time)
- send:
- faster backref resolution when finding clones
- cached leaf to root mapping for faster backref walking
- improved clone/sharing detection
- overall run time improvements (-70%)
Core:
- module initialization converted to a table of function pointers run
in a sequence
- preparation for fscrypt, extend passing file names across calls,
dir item can store encryption status
- raid56 updates:
- more accurate error tracking of sectors within stripe
- simplify recovery path and remove dedicated endio worker kthread
- simplify scrub call paths
- refactoring to support the extra data checksum verification
during RMW cycle
- tree block parentness checks consolidated and done at metadata read
time
- improved error handling
- cleanups:
- move a lot of code for better synchronization between kernel and
user space sources, split big files
- enum cleanups
- GFP flag cleanups
- header file cleanups, prototypes, dependencies
- redundant parameter cleanups
- inline extent handling simplifications
- inode parameter conversion
- data structure cleanups, reductions, renames, merges"
* tag 'for-6.2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (249 commits)
btrfs: print transaction aborted messages with an error level
btrfs: sync some cleanups from progs into uapi/btrfs.h
btrfs: do not BUG_ON() on ENOMEM when dropping extent items for a range
btrfs: fix extent map use-after-free when handling missing device in read_one_chunk
btrfs: remove outdated logic from overwrite_item() and add assertion
btrfs: unify overwrite_item() and do_overwrite_item()
btrfs: replace strncpy() with strscpy()
btrfs: fix uninitialized variable in find_first_clear_extent_bit
btrfs: fix uninitialized parent in insert_state
btrfs: add might_sleep() annotations
btrfs: add stack helpers for a few btrfs items
btrfs: add nr_global_roots to the super block definition
btrfs: remove BTRFS_LEAF_DATA_OFFSET
btrfs: add helpers for manipulating leaf items and data
btrfs: add eb to btrfs_node_key_ptr_offset
btrfs: pass the extent buffer for the btrfs_item_nr helpers
btrfs: move the csum helpers into ctree.h
btrfs: move eb offset helpers into extent_io.h
btrfs: move file_extent_item helpers into file-item.h
btrfs: move leaf_data_end into ctree.c
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
Pull dlm updates from David Teigland:
"These patches include the usual cleanups and minor fixes, the removal
of code that is no longer needed due to recent improvements, and
improvements to processing large volumes of messages during heavy
locking activity.
Summary:
- Misc code cleanup
- Fix a couple of socket handling bugs: a double release on an error
path and a data-ready race in an accept loop
- Remove code for resending dir-remove messages. This code is no
longer needed since the midcomms layer now ensures the messages are
resent if needed
- Add tracepoints for dlm messages
- Improve callback queueing by replacing the fixed array with a list
- Simplify the handling of a remove message followed by a lookup
message by sending both without releasing a spinlock in between
- Improve the concurrency of sending and receiving messages by
holding locks for a shorter time, and changing how workqueues are
used
- Remove old code for shutting down sockets, which is no longer
needed with the reliable connection handling that was recently
added"
* tag 'dlm-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: (37 commits)
fs: dlm: fix building without lockdep
fs: dlm: parallelize lowcomms socket handling
fs: dlm: don't init error value
fs: dlm: use saved sk_error_report()
fs: dlm: use sock2con without checking null
fs: dlm: remove dlm_node_addrs lookup list
fs: dlm: don't put dlm_local_addrs on heap
fs: dlm: cleanup listen sock handling
fs: dlm: remove socket shutdown handling
fs: dlm: use listen sock as dlm running indicator
fs: dlm: use list_first_entry_or_null
fs: dlm: remove twice INIT_WORK
fs: dlm: add midcomms init/start functions
fs: dlm: add dst nodeid for msg tracing
fs: dlm: rename seq to h_seq for msg tracing
fs: dlm: rename DLM_IFL_NEED_SCHED to DLM_IFL_CB_PENDING
fs: dlm: ast do WARN_ON_ONCE() on hotpath
fs: dlm: drop lkb ref in bug case
fs: dlm: avoid false-positive checker warning
fs: dlm: use WARN_ON_ONCE() instead of WARN_ON()
...
|
|
Pull jfs updates from David Kleikamp:
"Assorted JFS fixes for 6.2"
* tag 'jfs-6.2' of https://github.com/kleikamp/linux-shaggy:
jfs: makes diUnmount/diMount in jfs_mount_rw atomic
jfs: Fix a typo in function jfs_umount
fs: jfs: fix shift-out-of-bounds in dbDiscardAG
jfs: Fix fortify moan in symlink
jfs: remove redundant assignments to ipaimap and ipaimap2
jfs: remove unused declarations for jfs
fs/jfs/jfs_xattr.h: Fix spelling typo in comment
MAINTAINERS: git://github -> https://github.com for kleikamp
fs/jfs: replace ternary operator with min_t()
fs: jfs: fix shift-out-of-bounds in dbAllocAG
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull udf and ext2 fixes from Jan Kara:
- a couple of smaller cleanups and fixes for ext2
- fixes of a data corruption issues in udf when handling holes and
preallocation extents
- fixes and cleanups of several smaller issues in udf
- add maintainer entry for isofs
* tag 'fixes_for_v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
udf: Fix extending file within last block
udf: Discard preallocation before extending file with a hole
udf: Do not bother looking for prealloc extents if i_lenExtents matches i_size
udf: Fix preallocation discarding at indirect extent boundary
udf: Increase UDF_MAX_READ_VERSION to 0x0260
fs/ext2: Fix code indentation
ext2: unbugger ext2_empty_dir()
udf: remove ->writepage
ext2: remove ->writepage
ext2: Don't flush page immediately for DIRSYNC directories
ext2: Fix some kernel-doc warnings
maintainers: Add ISOFS entry
udf: Avoid double brelse() in udf_rename()
fs: udf: Optimize udf_free_in_core_inode and udf_find_fileset function
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping
Pull xattr audit fix from Seth Forshee:
"This is a single patch to remove auditing of the capability check in
simple_xattr_list().
This check is done to check whether trusted xattrs should be included
by listxattr(2). SELinux will normally log a denial when capable() is
called and the task's SELinux context doesn't have the corresponding
capability permission allowed, which can end up spamming the log.
Since a failed check here cannot be used to infer malicious intent,
auditing is of no real value, and it makes sense to stop auditing the
capability check"
* tag 'fs.xattr.simple.noaudit.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
fs: don't audit the capability check in simple_xattr_list()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping
Pull squashfs update from Seth Forshee:
"This is a simple patch to enable idmapped mounts for squashfs.
All functionality squashfs needs to support idmapped mounts is already
implemented in generic VFS code, so all that is needed is to set
FS_ALLOW_IDMAP in fs_flags"
* tag 'fs.idmapped.squashfs.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
squashfs: enable idmapped mounts
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse update from Miklos Szeredi:
- Allow some write requests to proceed in parallel
- Fix a performance problem with allow_sys_admin_access
- Add a special kind of invalidation that doesn't immediately purge
submounts
- On revalidation treat the target of rename(RENAME_NOREPLACE) the same
as open(O_EXCL)
- Use type safe helpers for some mnt_userns transformations
* tag 'fuse-update-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: Rearrange fuse_allow_current_process checks
fuse: allow non-extending parallel direct writes on the same file
fuse: remove the unneeded result variable
fuse: port to vfs{g,u}id_t and associated helpers
fuse: Remove user_ns check for FUSE_DEV_IOC_CLONE
fuse: always revalidate rename target dentry
fuse: add "expire only" mode to FUSE_NOTIFY_INVAL_ENTRY
fs/fuse: Replace kmap() with kmap_local_page()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs update from Miklos Szeredi:
- Fix a couple of bugs found by syzbot
- Don't ingore some open flags set by fcntl(F_SETFL)
- Fix failure to create a hard link in certain cases
- Use type safe helpers for some mnt_userns transformations
- Improve performance of mount
- Misc cleanups
* tag 'ovl-update-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: Kconfig: Fix spelling mistake "undelying" -> "underlying"
ovl: use inode instead of dentry where possible
ovl: Add comment on upperredirect reassignment
ovl: use plain list filler in indexdir and workdir cleanup
ovl: do not reconnect upper index records in ovl_indexdir_cleanup()
ovl: fix comment typos
ovl: port to vfs{g,u}id_t and associated helpers
ovl: Use ovl mounter's fsuid and fsgid in ovl_link()
ovl: Use "buf" flexible array for memcpy() destination
ovl: update ->f_iocb_flags when ovl_change_flags() modifies ->f_flags
ovl: fix use inode directly in rcu-walk mode
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs updates from Gao Xiang:
"In this cycle, large folios are now enabled in the iomap/fscache mode
for uncompressed files first. In order to do that, we've also cleaned
up better interfaces between erofs and fscache, which are acked by
fscache/netfs folks and included in this pull request.
Other than that, there are random fixes around erofs over fscache and
crafted images by syzbot, minor cleanups and documentation updates.
Summary:
- Enable large folios for iomap/fscache mode
- Avoid sysfs warning due to mounting twice with the same fsid and
domain_id in fscache mode
- Refine fscache interface among erofs, fscache, and cachefiles
- Use kmap_local_page() only for metabuf
- Fixes around crafted images found by syzbot
- Minor cleanups and documentation updates"
* tag 'erofs-for-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: validate the extent length for uncompressed pclusters
erofs: fix missing unmap if z_erofs_get_extent_compressedlen() fails
erofs: Fix pcluster memleak when its block address is zero
erofs: use kmap_local_page() only for erofs_bread()
erofs: enable large folios for fscache mode
erofs: support large folios for fscache mode
erofs: switch to prepare_ondemand_read() in fscache mode
fscache,cachefiles: add prepare_ondemand_read() callback
erofs: clean up cached I/O strategies
erofs: update documentation
erofs: check the uniqueness of fsid in shared domain in advance
erofs: enable large folios for iomap mode
|
|
git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt
Pull fsverity updates from Eric Biggers:
"The main change this cycle is to stop using the PG_error flag to track
verity failures, and instead just track failures at the bio level.
This follows a similar fscrypt change that went into 6.1, and it is a
step towards freeing up PG_error for other uses.
There's also one other small cleanup"
* tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
fsverity: simplify fsverity_get_digest()
fsverity: stop using PG_error to track error status
|
|
Pull fscrypt updates from Eric Biggers:
"This release adds SM4 encryption support, contributed by Tianjia
Zhang. SM4 is a Chinese block cipher that is an alternative to AES.
I recommend against using SM4, but (according to Tianjia) some people
are being required to use it. Since SM4 has been turning up in many
other places (crypto API, wireless, TLS, OpenSSL, ARMv8 CPUs, etc.),
it hasn't been very controversial, and some people have to use it, I
don't think it would be fair for me to reject this optional feature.
Besides the above, there are a couple cleanups"
* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
fscrypt: add additional documentation for SM4 support
fscrypt: remove unused Speck definitions
fscrypt: Add SM4 XTS/CTS symmetric algorithm support
blk-crypto: Add support for SM4-XTS blk crypto mode
fscrypt: add comment for fscrypt_valid_enc_modes_v1()
fscrypt: pass super_block to fscrypt_put_master_key_activeref()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"A large number of cleanups and bug fixes, with many of the bug fixes
found by Syzbot and fuzzing. (Many of the bug fixes involve less-used
ext4 features such as fast_commit, inline_data and bigalloc)
In addition, remove the writepage function for ext4, since the
medium-term plan is to remove ->writepage() entirely. (The VM doesn't
need or want writepage() for writeback, since it is fine with
->writepages() so long as ->migrate_folio() is implemented)"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (58 commits)
ext4: fix reserved cluster accounting in __es_remove_extent()
ext4: fix inode leak in ext4_xattr_inode_create() on an error path
ext4: allocate extended attribute value in vmalloc area
ext4: avoid unaccounted block allocation when expanding inode
ext4: initialize quota before expanding inode in setproject ioctl
ext4: stop providing .writepage hook
mm: export buffer_migrate_folio_norefs()
ext4: switch to using write_cache_pages() for data=journal writeout
jbd2: switch jbd2_submit_inode_data() to use fs-provided hook for data writeout
ext4: switch to using ext4_do_writepages() for ordered data writeout
ext4: move percpu_rwsem protection into ext4_writepages()
ext4: provide ext4_do_writepages()
ext4: add support for writepages calls that cannot map blocks
ext4: drop pointless IO submission from ext4_bio_write_page()
ext4: remove nr_submitted from ext4_bio_write_page()
ext4: move keep_towrite handling to ext4_bio_write_page()
ext4: handle redirtying in ext4_bio_write_page()
ext4: fix kernel BUG in 'ext4_write_inline_data_end()'
ext4: make ext4_mb_initialize_context return void
ext4: fix deadlock due to mbcache entry corruption
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping
Pull idmapping updates from Christian Brauner:
"Last cycle we've already made the interaction with idmapped mounts
more robust and type safe by introducing the vfs{g,u}id_t type. This
cycle we concluded the conversion and removed the legacy helpers.
Currently we still pass around the plain namespace that was attached
to a mount. This is in general pretty convenient but it makes it easy
to conflate namespaces that are relevant on the filesystem - with
namespaces that are relevent on the mount level. Especially for
filesystem developers without detailed knowledge in this area this can
be a potential source for bugs.
Instead of passing the plain namespace we introduce a dedicated type
struct mnt_idmap and replace the pointer with a pointer to a struct
mnt_idmap. There are no semantic or size changes for the mount struct
caused by this.
We then start converting all places aware of idmapped mounts to rely
on struct mnt_idmap. Once the conversion is done all helpers down to
the really low-level make_vfs{g,u}id() and from_vfs{g,u}id() will take
a struct mnt_idmap argument instead of two namespace arguments. This
way it becomes impossible to conflate the two removing and thus
eliminating the possibility of any bugs. Fwiw, I fixed some issues in
that area a while ago in ntfs3 and ksmbd in the past. Afterwards only
low-level code can ultimately use the associated namespace for any
permission checks. Even most of the vfs can be completely obivious
about this ultimately and filesystems will never interact with it in
any form in the future.
A struct mnt_idmap currently encompasses a simple refcount and pointer
to the relevant namespace the mount is idmapped to. If a mount isn't
idmapped then it will point to a static nop_mnt_idmap and if it
doesn't that it is idmapped. As usual there are no allocations or
anything happening for non-idmapped mounts. Everthing is carefully
written to be a nop for non-idmapped mounts as has always been the
case.
If an idmapped mount is created a struct mnt_idmap is allocated and a
reference taken on the relevant namespace. Each mount that gets
idmapped or inherits the idmap simply bumps the reference count on
struct mnt_idmap. Just a reminder that we only allow a mount to change
it's idmapping a single time and only if it hasn't already been
attached to the filesystems and has no active writers.
The actual changes are fairly straightforward but this will have huge
benefits for maintenance and security in the long run even if it
causes some churn.
Note that this also makes it possible to extend struct mount_idmap in
the future. For example, it would be possible to place the namespace
pointer in an anonymous union together with an idmapping struct. This
would allow us to expose an api to userspace that would let it specify
idmappings directly instead of having to go through the detour of
setting up namespaces at all"
* tag 'fs.idmapped.mnt_idmap.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
acl: conver higher-level helpers to rely on mnt_idmap
fs: introduce dedicated idmap type for mounts
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping
Pull vfsuid updates from Christian Brauner:
"Last cycle we introduced the vfs{g,u}id_t types and associated helpers
to gain type safety when dealing with idmapped mounts. That initial
work already converted a lot of places over but there were still some
left,
This converts all remaining places that still make use of non-type
safe idmapping helpers to rely on the new type safe vfs{g,u}id based
helpers.
Afterwards it removes all the old non-type safe helpers"
* tag 'fs.vfsuid.conversion.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
fs: remove unused idmapping helpers
ovl: port to vfs{g,u}id_t and associated helpers
fuse: port to vfs{g,u}id_t and associated helpers
ima: use type safe idmapping helpers
apparmor: use type safe idmapping helpers
caps: use type safe idmapping helpers
fs: use type safe idmapping helpers
mnt_idmapping: add missing helpers
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping
Pull setgid inheritance updates from Christian Brauner:
"This contains the work to make setgid inheritance consistent between
modifying a file and when changing ownership or mode as this has been
a repeated source of very subtle bugs. The gist is that we perform the
same permission checks in the write path as we do in the ownership and
mode changing paths after this series where we're currently doing
different things.
We've already made setgid inheritance a lot more consistent and
reliable in the last releases by moving setgid stripping from the
individual filesystems up into the vfs. This aims to make the logic
even more consistent and easier to understand and also to fix
long-standing overlayfs setgid inheritance bugs. Miklos was nice
enough to just let me carry the trivial overlayfs patches from Amir
too.
Below is a more detailed explanation how the current difference in
setgid handling lead to very subtle bugs exemplified via overlayfs
which is a victim of the current rules. I hope this explains why I
think taking the regression risk here is worth it.
A long while ago I found a few setgid inheritance bugs in overlayfs in
the write path in certain conditions. Amir recently picked this back
up in [1] and I jumped on board to fix this more generally.
On the surface all that overlayfs would need to fix setgid inheritance
would be to call file_remove_privs() or file_modified() but actually
that isn't enough because the setgid inheritance api is wildly
inconsistent in that area.
Before this pr setgid stripping in file_remove_privs()'s old
should_remove_suid() helper was inconsistent with other parts of the
vfs. Specifically, it only raises ATTR_KILL_SGID if the inode is
S_ISGID and S_IXGRP but not if the inode isn't in the caller's groups
and the caller isn't privileged over the inode although we require
this already in setattr_prepare() and setattr_copy() and so all
filesystem implement this requirement implicitly because they have to
use setattr_{prepare,copy}() anyway.
But the inconsistency shows up in setgid stripping bugs for overlayfs
in xfstests (e.g., generic/673, generic/683, generic/685, generic/686,
generic/687). For example, we test whether suid and setgid stripping
works correctly when performing various write-like operations as an
unprivileged user (fallocate, reflink, write, etc.):
echo "Test 1 - qa_user, non-exec file $verb"
setup_testfile
chmod a+rws $junk_file
commit_and_check "$qa_user" "$verb" 64k 64k
The test basically creates a file with 6666 permissions. While the
file has the S_ISUID and S_ISGID bits set it does not have the S_IXGRP
set.
On a regular filesystem like xfs what will happen is:
sys_fallocate()
-> vfs_fallocate()
-> xfs_file_fallocate()
-> file_modified()
-> __file_remove_privs()
-> dentry_needs_remove_privs()
-> should_remove_suid()
-> __remove_privs()
newattrs.ia_valid = ATTR_FORCE | kill;
-> notify_change()
-> setattr_copy()
In should_remove_suid() we can see that ATTR_KILL_SUID is raised
unconditionally because the file in the test has S_ISUID set.
But we also see that ATTR_KILL_SGID won't be set because while the
file is S_ISGID it is not S_IXGRP (see above) which is a condition for
ATTR_KILL_SGID being raised.
So by the time we call notify_change() we have attr->ia_valid set to
ATTR_KILL_SUID | ATTR_FORCE.
Now notify_change() sees that ATTR_KILL_SUID is set and does:
ia_valid = attr->ia_valid |= ATTR_MODE
attr->ia_mode = (inode->i_mode & ~S_ISUID);
which means that when we call setattr_copy() later we will definitely
update inode->i_mode. Note that attr->ia_mode still contains S_ISGID.
Now we call into the filesystem's ->setattr() inode operation which
will end up calling setattr_copy(). Since ATTR_MODE is set we will
hit:
if (ia_valid & ATTR_MODE) {
umode_t mode = attr->ia_mode;
vfsgid_t vfsgid = i_gid_into_vfsgid(mnt_userns, inode);
if (!vfsgid_in_group_p(vfsgid) &&
!capable_wrt_inode_uidgid(mnt_userns, inode, CAP_FSETID))
mode &= ~S_ISGID;
inode->i_mode = mode;
}
and since the caller in the test is neither capable nor in the group
of the inode the S_ISGID bit is stripped.
But assume the file isn't suid then ATTR_KILL_SUID won't be raised
which has the consequence that neither the setgid nor the suid bits
are stripped even though it should be stripped because the inode isn't
in the caller's groups and the caller isn't privileged over the inode.
If overlayfs is in the mix things become a bit more complicated and
the bug shows up more clearly.
When e.g., ovl_setattr() is hit from ovl_fallocate()'s call to
file_remove_privs() then ATTR_KILL_SUID and ATTR_KILL_SGID might be
raised but because the check in notify_change() is questioning the
ATTR_KILL_SGID flag again by requiring S_IXGRP for it to be stripped
the S_ISGID bit isn't removed even though it should be stripped:
sys_fallocate()
-> vfs_fallocate()
-> ovl_fallocate()
-> file_remove_privs()
-> dentry_needs_remove_privs()
-> should_remove_suid()
-> __remove_privs()
newattrs.ia_valid = ATTR_FORCE | kill;
-> notify_change()
-> ovl_setattr()
/* TAKE ON MOUNTER'S CREDS */
-> ovl_do_notify_change()
-> notify_change()
/* GIVE UP MOUNTER'S CREDS */
/* TAKE ON MOUNTER'S CREDS */
-> vfs_fallocate()
-> xfs_file_fallocate()
-> file_modified()
-> __file_remove_privs()
-> dentry_needs_remove_privs()
-> should_remove_suid()
-> __remove_privs()
newattrs.ia_valid = attr_force | kill;
-> notify_change()
The fix for all of this is to make file_remove_privs()'s
should_remove_suid() helper perform the same checks as we already
require in setattr_prepare() and setattr_copy() and have
notify_change() not pointlessly requiring S_IXGRP again. It doesn't
make any sense in the first place because the caller must calculate
the flags via should_remove_suid() anyway which would raise
ATTR_KILL_SGID
Note that some xfstests will now fail as these patches will cause the
setgid bit to be lost in certain conditions for unprivileged users
modifying a setgid file when they would've been kept otherwise. I
think this risk is worth taking and I explained and mentioned this
multiple times on the list [2].
Enforcing the rules consistently across write operations and
chmod/chown will lead to losing the setgid bit in cases were it
might've been retained before.
While I've mentioned this a few times but it's worth repeating just to
make sure that this is understood. For the sake of maintainability,
consistency, and security this is a risk worth taking.
If we really see regressions for workloads the fix is to have special
setgid handling in the write path again with different semantics from
chmod/chown and possibly additional duct tape for overlayfs. I'll
update the relevant xfstests with if you should decide to merge this
second setgid cleanup.
Before that people should be aware that there might be failures for
fstests where unprivileged users modify a setgid file"
Link: https://lore.kernel.org/linux-fsdevel/20221003123040.900827-1-amir73il@gmail.com [1]
Link: https://lore.kernel.org/linux-fsdevel/20221122142010.zchf2jz2oymx55qi@wittgenstein [2]
* tag 'fs.ovl.setgid.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
fs: use consistent setgid checks in is_sxid()
ovl: remove privs in ovl_fallocate()
ovl: remove privs in ovl_copyfile()
attr: use consistent sgid stripping checks
attr: add setattr_should_drop_sgid()
fs: move should_remove_suid()
attr: add in_group_or_capable()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping
Pull VFS acl updates from Christian Brauner:
"This contains the work that builds a dedicated vfs posix acl api.
The origins of this work trace back to v5.19 but it took quite a while
to understand the various filesystem specific implementations in
sufficient detail and also come up with an acceptable solution.
As we discussed and seen multiple times the current state of how posix
acls are handled isn't nice and comes with a lot of problems: The
current way of handling posix acls via the generic xattr api is error
prone, hard to maintain, and type unsafe for the vfs until we call
into the filesystem's dedicated get and set inode operations.
It is already the case that posix acls are special-cased to death all
the way through the vfs. There are an uncounted number of hacks that
operate on the uapi posix acl struct instead of the dedicated vfs
struct posix_acl. And the vfs must be involved in order to interpret
and fixup posix acls before storing them to the backing store, caching
them, reporting them to userspace, or for permission checking.
Currently a range of hacks and duct tape exist to make this work. As
with most things this is really no ones fault it's just something that
happened over time. But the code is hard to understand and difficult
to maintain and one is constantly at risk of introducing bugs and
regressions when having to touch it.
Instead of continuing to hack posix acls through the xattr handlers
this series builds a dedicated posix acl api solely around the get and
set inode operations.
Going forward, the vfs_get_acl(), vfs_remove_acl(), and vfs_set_acl()
helpers must be used in order to interact with posix acls. They
operate directly on the vfs internal struct posix_acl instead of
abusing the uapi posix acl struct as we currently do. In the end this
removes all of the hackiness, makes the codepaths easier to maintain,
and gets us type safety.
This series passes the LTP and xfstests suites without any
regressions. For xfstests the following combinations were tested:
- xfs
- ext4
- btrfs
- overlayfs
- overlayfs on top of idmapped mounts
- orangefs
- (limited) cifs
There's more simplifications for posix acls that we can make in the
future if the basic api has made it.
A few implementation details:
- The series makes sure to retain exactly the same security and
integrity module permission checks. Especially for the integrity
modules this api is a win because right now they convert the uapi
posix acl struct passed to them via a void pointer into the vfs
struct posix_acl format to perform permission checking on the mode.
There's a new dedicated security hook for setting posix acls which
passes the vfs struct posix_acl not a void pointer. Basing checking
on the posix acl stored in the uapi format is really unreliable.
The vfs currently hacks around directly in the uapi struct storing
values that frankly the security and integrity modules can't
correctly interpret as evidenced by bugs we reported and fixed in
this area. It's not necessarily even their fault it's just that the
format we provide to them is sub optimal.
- Some filesystems like 9p and cifs need access to the dentry in
order to get and set posix acls which is why they either only
partially or not even at all implement get and set inode
operations. For example, cifs allows setxattr() and getxattr()
operations but doesn't allow permission checking based on posix
acls because it can't implement a get acl inode operation.
Thus, this patch series updates the set acl inode operation to take
a dentry instead of an inode argument. However, for the get acl
inode operation we can't do this as the old get acl method is
called in e.g., generic_permission() and inode_permission(). These
helpers in turn are called in various filesystem's permission inode
operation. So passing a dentry argument to the old get acl inode
operation would amount to passing a dentry to the permission inode
operation which we shouldn't and probably can't do.
So instead of extending the existing inode operation Christoph
suggested to add a new one. He also requested to ensure that the
get and set acl inode operation taking a dentry are consistently
named. So for this version the old get acl operation is renamed to
->get_inode_acl() and a new ->get_acl() inode operation taking a
dentry is added. With this we can give both 9p and cifs get and set
acl inode operations and in turn remove their complex custom posix
xattr handlers.
In the future I hope to get rid of the inode method duplication but
it isn't like we have never had this situation. Readdir is just one
example. And frankly, the overall gain in type safety and the more
pleasant api wise are simply too big of a benefit to not accept
this duplication for a while.
- We've done a full audit of every codepaths using variant of the
current generic xattr api to get and set posix acls and
surprisingly it isn't that many places. There's of course always a
chance that we might have missed some and if so I'm sure we'll find
them soon enough.
The crucial codepaths to be converted are obviously stacking
filesystems such as ecryptfs and overlayfs.
For a list of all callers currently using generic xattr api helpers
see [2] including comments whether they support posix acls or not.
- The old vfs generic posix acl infrastructure doesn't obey the
create and replace semantics promised on the setxattr(2) manpage.
This patch series doesn't address this. It really is something we
should revisit later though.
The patches are roughly organized as follows:
(1) Change existing set acl inode operation to take a dentry
argument (Intended to be a non-functional change)
(2) Rename existing get acl method (Intended to be a non-functional
change)
(3) Implement get and set acl inode operations for filesystems that
couldn't implement one before because of the missing dentry.
That's mostly 9p and cifs (Intended to be a non-functional
change)
(4) Build posix acl api, i.e., add vfs_get_acl(), vfs_remove_acl(),
and vfs_set_acl() including security and integrity hooks
(Intended to be a non-functional change)
(5) Implement get and set acl inode operations for stacking
filesystems (Intended to be a non-functional change)
(6) Switch posix acl handling in stacking filesystems to new posix
acl api now that all filesystems it can stack upon support it.
(7) Switch vfs to new posix acl api (semantical change)
(8) Remove all now unused helpers
(9) Additional regression fixes reported after we merged this into
linux-next
Thanks to Seth for a lot of good discussion around this and
encouragement and input from Christoph"
* tag 'fs.acl.rework.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping: (36 commits)
posix_acl: Fix the type of sentinel in get_acl
orangefs: fix mode handling
ovl: call posix_acl_release() after error checking
evm: remove dead code in evm_inode_set_acl()
cifs: check whether acl is valid early
acl: make vfs_posix_acl_to_xattr() static
acl: remove a slew of now unused helpers
9p: use stub posix acl handlers
cifs: use stub posix acl handlers
ovl: use stub posix acl handlers
ecryptfs: use stub posix acl handlers
evm: remove evm_xattr_acl_change()
xattr: use posix acl api
ovl: use posix acl api
ovl: implement set acl method
ovl: implement get acl method
ecryptfs: implement set acl method
ecryptfs: implement get acl method
ksmbd: use vfs_remove_acl()
acl: add vfs_remove_acl()
...
|
|
Pull misc vfs updates from Al Viro:
"misc pile"
* tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: sysv: Fix sysv_nblocks() returns wrong value
get rid of INT_LIMIT, use type_max() instead
btrfs: replace INT_LIMIT(loff_t) with OFFSET_MAX
fs: simplify vfs_get_super
fs: drop useless condition from inode_needs_update_time
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull namespace fix from Al Viro:
"Fix weird corner case in copy_mnt_ns()"
* tag 'pull-namespace' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
copy_mnt_ns(): handle a corner case (overmounted mntns bindings) saner
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull iov_iter updates from Al Viro:
"iov_iter work; most of that is about getting rid of direction
misannotations and (hopefully) preventing more of the same for the
future"
* tag 'pull-iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
use less confusing names for iov_iter direction initializers
iov_iter: saner checks for attempt to copy to/from iterator
[xen] fix "direction" argument of iov_iter_kvec()
[vhost] fix 'direction' argument of iov_iter_{init,bvec}()
[target] fix iov_iter_bvec() "direction" argument
[s390] memcpy_real(): WRITE is "data source", not destination...
[s390] zcore: WRITE is "data source", not destination...
[infiniband] READ is "data destination", not source...
[fsi] WRITE is "data source", not destination...
[s390] copy_oldmem_kernel() - WRITE is "data source", not destination
csum_and_copy_to_iter(): handle ITER_DISCARD
get rid of unlikely() on page_copy_sane() calls
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull elf coredumping updates from Al Viro:
"Unification of regset and non-regset sides of ELF coredump handling.
Collecting per-thread register values is the only thing that needs to
be ifdefed there..."
* tag 'pull-elfcore' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
[elf] get rid of get_note_info_size()
[elf] unify regset and non-regset cases
[elf][non-regset] use elf_core_copy_task_regs() for dumper as well
[elf][non-regset] uninline elf_core_copy_task_fpregs() (and lose pt_regs argument)
elf_core_copy_task_regs(): task_pt_regs is defined everywhere
[elf][regset] simplify thread list handling in fill_note_info()
[elf][regset] clean fill_note_info() a bit
kill extern of vsyscall32_sysctl
kill coredump_params->regs
kill signal_pt_regs()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
- A ptrace API cleanup series from Sergey Shtylyov
- Fixes and cleanups for kexec from ye xingchen
- nilfs2 updates from Ryusuke Konishi
- squashfs feature work from Xiaoming Ni: permit configuration of the
filesystem's compression concurrency from the mount command line
- A series from Akinobu Mita which addresses bound checking errors when
writing to debugfs files
- A series from Yang Yingliang to address rapidio memory leaks
- A series from Zheng Yejian to address possible overflow errors in
encode_comp_t()
- And a whole shower of singleton patches all over the place
* tag 'mm-nonmm-stable-2022-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (79 commits)
ipc: fix memory leak in init_mqueue_fs()
hfsplus: fix bug causing custom uid and gid being unable to be assigned with mount
rapidio: devices: fix missing put_device in mport_cdev_open
kcov: fix spelling typos in comments
hfs: Fix OOB Write in hfs_asc2mac
hfs: fix OOB Read in __hfs_brec_find
relay: fix type mismatch when allocating memory in relay_create_buf()
ocfs2: always read both high and low parts of dinode link count
io-mapping: move some code within the include guarded section
kernel: kcsan: kcsan_test: build without structleak plugin
mailmap: update email for Iskren Chernev
eventfd: change int to __u64 in eventfd_signal() ifndef CONFIG_EVENTFD
rapidio: fix possible UAF when kfifo_alloc() fails
relay: use strscpy() is more robust and safer
cpumask: limit visibility of FORCE_NR_CPUS
acct: fix potential integer overflow in encode_comp_t()
acct: fix accuracy loss for input value of encode_comp_t()
linux/init.h: include <linux/build_bug.h> and <linux/stringify.h>
rapidio: rio: fix possible name leak in rio_register_mport()
rapidio: fix possible name leaks when rio_add_device() fails
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull random number generator updates from Jason Donenfeld:
- Replace prandom_u32_max() and various open-coded variants of it,
there is now a new family of functions that uses fast rejection
sampling to choose properly uniformly random numbers within an
interval:
get_random_u32_below(ceil) - [0, ceil)
get_random_u32_above(floor) - (floor, U32_MAX]
get_random_u32_inclusive(floor, ceil) - [floor, ceil]
Coccinelle was used to convert all current users of
prandom_u32_max(), as well as many open-coded patterns, resulting in
improvements throughout the tree.
I'll have a "late" 6.1-rc1 pull for you that removes the now unused
prandom_u32_max() function, just in case any other trees add a new
use case of it that needs to converted. According to linux-next,
there may be two trivial cases of prandom_u32_max() reintroductions
that are fixable with a 's/.../.../'. So I'll have for you a final
conversion patch doing that alongside the removal patch during the
second week.
This is a treewide change that touches many files throughout.
- More consistent use of get_random_canary().
- Updates to comments, documentation, tests, headers, and
simplification in configuration.
- The arch_get_random*_early() abstraction was only used by arm64 and
wasn't entirely useful, so this has been replaced by code that works
in all relevant contexts.
- The kernel will use and manage random seeds in non-volatile EFI
variables, refreshing a variable with a fresh seed when the RNG is
initialized. The RNG GUID namespace is then hidden from efivarfs to
prevent accidental leakage.
These changes are split into random.c infrastructure code used in the
EFI subsystem, in this pull request, and related support inside of
EFISTUB, in Ard's EFI tree. These are co-dependent for full
functionality, but the order of merging doesn't matter.
- Part of the infrastructure added for the EFI support is also used for
an improvement to the way vsprintf initializes its siphash key,
replacing an sleep loop wart.
- The hardware RNG framework now always calls its correct random.c
input function, add_hwgenerator_randomness(), rather than sometimes
going through helpers better suited for other cases.
- The add_latent_entropy() function has long been called from the fork
handler, but is a no-op when the latent entropy gcc plugin isn't
used, which is fine for the purposes of latent entropy.
But it was missing out on the cycle counter that was also being mixed
in beside the latent entropy variable. So now, if the latent entropy
gcc plugin isn't enabled, add_latent_entropy() will expand to a call
to add_device_randomness(NULL, 0), which adds a cycle counter,
without the absent latent entropy variable.
- The RNG is now reseeded from a delayed worker, rather than on demand
when used. Always running from a worker allows it to make use of the
CPU RNG on platforms like S390x, whose instructions are too slow to
do so from interrupts. It also has the effect of adding in new inputs
more frequently with more regularity, amounting to a long term
transcript of random values. Plus, it helps a bit with the upcoming
vDSO implementation (which isn't yet ready for 6.2).
- The jitter entropy algorithm now tries to execute on many different
CPUs, round-robining, in hopes of hitting even more memory latencies
and other unpredictable effects. It also will mix in a cycle counter
when the entropy timer fires, in addition to being mixed in from the
main loop, to account more explicitly for fluctuations in that timer
firing. And the state it touches is now kept within the same cache
line, so that it's assured that the different execution contexts will
cause latencies.
* tag 'random-6.2-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: (23 commits)
random: include <linux/once.h> in the right header
random: align entropy_timer_state to cache line
random: mix in cycle counter when jitter timer fires
random: spread out jitter callback to different CPUs
random: remove extraneous period and add a missing one in comments
efi: random: refresh non-volatile random seed when RNG is initialized
vsprintf: initialize siphash key using notifier
random: add back async readiness notifier
random: reseed in delayed work rather than on-demand
random: always mix cycle counter in add_latent_entropy()
hw_random: use add_hwgenerator_randomness() for early entropy
random: modernize documentation comment on get_random_bytes()
random: adjust comment to account for removed function
random: remove early archrandom abstraction
random: use random.trust_{bootloader,cpu} command line option only
stackprotector: actually use get_random_canary()
stackprotector: move get_random_canary() into stackprotector.h
treewide: use get_random_u32_inclusive() when possible
treewide: use get_random_u32_{above,below}() instead of manual loop
treewide: use get_random_u32_below() instead of deprecated function
...
|
|
In non-foreground gc mode, if no victim is selected, the gc process
will wait for no_gc_sleep_time before waking up again. In this
subsequent time, even though a victim will be selected, the gc process
still waits for no_gc_sleep_time before waking up. The configuration
of wait_ms is not reasonable.
After any of the victims have been selected, we need to reset wait_ms to
default sleep time from no_gc_sleep_time.
Signed-off-by: Yuwei Guan <Yuwei.Guan@zeekrlife.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
To fix:
WARNING: function definition argument 'struct f2fs_attr *' should also have an identifier name
+ ssize_t (*show)(struct f2fs_attr *, struct f2fs_sb_info *, char *);
WARNING: return sysfs_emit(...) formats should include a terminating newline
+ return sysfs_emit(buf, "(none)");
WARNING: Prefer 'unsigned int' to bare use of 'unsigned'
+ unsigned npages = NODE_MAPPING(sbi)->nrpages;
WARNING: Missing a blank line after declarations
+ unsigned npages = COMPRESS_MAPPING(sbi)->nrpages;
+ si->page_mem += (unsigned long long)npages << PAGE_SHIFT;
WARNING: quoted string split across lines
+ seq_printf(s, "CP merge (Queued: %4d, Issued: %4d, Total: %4d, "
+ "Cur time: %4d(ms), Peak time: %4d(ms))\n",
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|