<feed xmlns='http://www.w3.org/2005/Atom'>
<title>pm24.git/mm/filemap_xip.c, branch v3.10</title>
<subtitle>Unnamed repository; edit this file 'description' to name the repository.
</subtitle>
<id>https://git.kobert.dev/pm24.git/atom?h=v3.10</id>
<link rel='self' href='https://git.kobert.dev/pm24.git/atom?h=v3.10'/>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/'/>
<updated>2013-04-09T18:12:56Z</updated>
<entry>
<title>lift sb_start_write() out of -&gt;write()</title>
<updated>2013-04-09T18:12:56Z</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2013-03-20T17:04:20Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=03d95eb2f2578083a3f6286262e1cb5d88a00c02'/>
<id>urn:sha1:03d95eb2f2578083a3f6286262e1cb5d88a00c02</id>
<content type='text'>
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>mm: move all mmu notifier invocations to be done outside the PT lock</title>
<updated>2012-10-09T07:22:58Z</updated>
<author>
<name>Sagi Grimberg</name>
<email>sagig@mellanox.com</email>
</author>
<published>2012-10-08T23:33:33Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=2ec74c3ef2d8c58d71e0e00336fb6b891192155a'/>
<id>urn:sha1:2ec74c3ef2d8c58d71e0e00336fb6b891192155a</id>
<content type='text'>
In order to allow sleeping during mmu notifier calls, we need to avoid
invoking them under the page table spinlock.  This patch solves the
problem by calling invalidate_page notification after releasing the lock
(but before freeing the page itself), or by wrapping the page invalidation
with calls to invalidate_range_begin and invalidate_range_end.

To prevent accidental changes to the invalidate_range_end arguments after
the call to invalidate_range_begin, the patch introduces a convention of
saving the arguments in consistently named locals:

	unsigned long mmun_start;	/* For mmu_notifiers */
	unsigned long mmun_end;	/* For mmu_notifiers */

	...

	mmun_start = ...
	mmun_end = ...
	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);

	...

	mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);

The patch changes code to use this convention for all calls to
mmu_notifier_invalidate_range_start/end, except those where the calls are
close enough so that anyone who glances at the code can see the values
aren't changing.

This patchset is a preliminary step towards on-demand paging design to be
added to the RDMA stack.

Why do we want on-demand paging for Infiniband?

  Applications register memory with an RDMA adapter using system calls,
  and subsequently post IO operations that refer to the corresponding
  virtual addresses directly to HW.  Until now, this was achieved by
  pinning the memory during the registration calls.  The goal of on demand
  paging is to avoid pinning the pages of registered memory regions (MRs).
   This will allow users the same flexibility they get when swapping any
  other part of their processes address spaces.  Instead of requiring the
  entire MR to fit in physical memory, we can allow the MR to be larger,
  and only fit the current working set in physical memory.

Why should anyone care?  What problems are users currently experiencing?

  This can make programming with RDMA much simpler.  Today, developers
  that are working with more data than their RAM can hold need either to
  deregister and reregister memory regions throughout their process's
  life, or keep a single memory region and copy the data to it.  On demand
  paging will allow these developers to register a single MR at the
  beginning of their process's life, and let the operating system manage
  which pages needs to be fetched at a given time.  In the future, we
  might be able to provide a single memory access key for each process
  that would provide the entire process's address as one large memory
  region, and the developers wouldn't need to register memory regions at
  all.

Is there any prospect that any other subsystems will utilise these
infrastructural changes?  If so, which and how, etc?

  As for other subsystems, I understand that XPMEM wanted to sleep in
  MMU notifiers, as Christoph Lameter wrote at
  http://lkml.indiana.edu/hypermail/linux/kernel/0802.1/0460.html and
  perhaps Andrea knows about other use cases.

  Scheduling in mmu notifications is required since we need to sync the
  hardware with the secondary page tables change.  A TLB flush of an IO
  device is inherently slower than a CPU TLB flush, so our design works by
  sending the invalidation request to the device, and waiting for an
  interrupt before exiting the mmu notifier handler.

Avi said:

  kvm may be a buyer.  kvm::mmu_lock, which serializes guest page
  faults, also protects long operations such as destroying large ranges.
  It would be good to convert it into a spinlock, but as it is used inside
  mmu notifiers, this cannot be done.

  (there are alternatives, such as keeping the spinlock and using a
  generation counter to do the teardown in O(1), which is what the "may"
  is doing up there).

[akpm@linux-foundation.orgpossible speed tweak in hugetlb_cow(), cleanups]
Signed-off-by: Andrea Arcangeli &lt;andrea@qumranet.com&gt;
Signed-off-by: Sagi Grimberg &lt;sagig@mellanox.com&gt;
Signed-off-by: Haggai Eran &lt;haggaie@mellanox.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Xiao Guangrong &lt;xiaoguangrong@linux.vnet.ibm.com&gt;
Cc: Or Gerlitz &lt;ogerlitz@mellanox.com&gt;
Cc: Haggai Eran &lt;haggaie@mellanox.com&gt;
Cc: Shachar Raindel &lt;raindel@mellanox.com&gt;
Cc: Liran Liss &lt;liranl@mellanox.com&gt;
Cc: Christoph Lameter &lt;cl@linux-foundation.org&gt;
Cc: Avi Kivity &lt;avi@redhat.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: replace vma prio_tree with an interval tree</title>
<updated>2012-10-09T07:22:39Z</updated>
<author>
<name>Michel Lespinasse</name>
<email>walken@google.com</email>
</author>
<published>2012-10-08T23:31:25Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=6b2dbba8b6ac4df26f72eda1e5ea7bab9f950e08'/>
<id>urn:sha1:6b2dbba8b6ac4df26f72eda1e5ea7bab9f950e08</id>
<content type='text'>
Implement an interval tree as a replacement for the VMA prio_tree.  The
algorithms are similar to lib/interval_tree.c; however that code can't be
directly reused as the interval endpoints are not explicitly stored in the
VMA.  So instead, the common algorithm is moved into a template and the
details (node type, how to get interval endpoints from the node, etc) are
filled in using the C preprocessor.

Once the interval tree functions are available, using them as a
replacement to the VMA prio tree is a relatively simple, mechanical job.

Signed-off-by: Michel Lespinasse &lt;walken@google.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Hillf Danton &lt;dhillf@gmail.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Catalin Marinas &lt;catalin.marinas@arm.com&gt;
Cc: Andrea Arcangeli &lt;aarcange@redhat.com&gt;
Cc: David Woodhouse &lt;dwmw2@infradead.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: kill vma flag VM_CAN_NONLINEAR</title>
<updated>2012-10-09T07:22:17Z</updated>
<author>
<name>Konstantin Khlebnikov</name>
<email>khlebnikov@openvz.org</email>
</author>
<published>2012-10-08T23:28:46Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=0b173bc4daa8f8ec03a85abf5e47b23502ff80af'/>
<id>urn:sha1:0b173bc4daa8f8ec03a85abf5e47b23502ff80af</id>
<content type='text'>
Move actual pte filling for non-linear file mappings into the new special
vma operation: -&gt;remap_pages().

Filesystems must implement this method to get non-linear mapping support,
if it uses filemap_fault() then generic_file_remap_pages() can be used.

Now device drivers can implement this method and obtain nonlinear vma support.

Signed-off-by: Konstantin Khlebnikov &lt;khlebnikov@openvz.org&gt;
Cc: Alexander Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Carsten Otte &lt;cotte@de.ibm.com&gt;
Cc: Chris Metcalf &lt;cmetcalf@tilera.com&gt;	#arch/tile
Cc: Cyrill Gorcunov &lt;gorcunov@openvz.org&gt;
Cc: Eric Paris &lt;eparis@redhat.com&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: James Morris &lt;james.l.morris@oracle.com&gt;
Cc: Jason Baron &lt;jbaron@redhat.com&gt;
Cc: Kentaro Takeda &lt;takedakn@nttdata.co.jp&gt;
Cc: Matt Helsley &lt;matthltc@us.ibm.com&gt;
Cc: Nick Piggin &lt;npiggin@kernel.dk&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Robert Richter &lt;robert.richter@amd.com&gt;
Cc: Suresh Siddha &lt;suresh.b.siddha@intel.com&gt;
Cc: Tetsuo Handa &lt;penguin-kernel@I-love.SAKURA.ne.jp&gt;
Cc: Venkatesh Pallipadi &lt;venki@google.com&gt;
Acked-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>fs: Protect write paths by sb_start_write - sb_end_write</title>
<updated>2012-07-31T05:45:47Z</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2012-06-12T14:20:37Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=14da9200140f8d722ad1767dfabadebd8b34f2ad'/>
<id>urn:sha1:14da9200140f8d722ad1767dfabadebd8b34f2ad</id>
<content type='text'>
There are several entry points which dirty pages in a filesystem.  mmap
(handled by block_page_mkwrite()), buffered write (handled by
__generic_file_aio_write()), splice write (generic_file_splice_write),
truncate, and fallocate (these can dirty last partial page - handled inside
each filesystem separately). Protect these places with sb_start_write() and
sb_end_write().

-&gt;page_mkwrite() calls are particularly complex since they are called with
mmap_sem held and thus we cannot use standard sb_start_write() due to lock
ordering constraints. We solve the problem by using a special freeze protection
sb_start_pagefault() which ranks below mmap_sem.

BugLink: https://bugs.launchpad.net/bugs/897421
Tested-by: Kamal Mostafa &lt;kamal@canonical.com&gt;
Tested-by: Peter M. Petrakis &lt;peter.petrakis@canonical.com&gt;
Tested-by: Dann Frazier &lt;dann.frazier@canonical.com&gt;
Tested-by: Massimo Morana &lt;massimo.morana@canonical.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>mm: Make default vm_ops provide -&gt;page_mkwrite handler</title>
<updated>2012-07-30T21:02:48Z</updated>
<author>
<name>Jan Kara</name>
<email>jack@suse.cz</email>
</author>
<published>2012-06-12T14:20:29Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=4fcf1c6205fcfc7a226a144ae4d83b7f5415cab8'/>
<id>urn:sha1:4fcf1c6205fcfc7a226a144ae4d83b7f5415cab8</id>
<content type='text'>
Make default vm_ops provide -&gt;page_mkwrite handler. Currently it only updates
file's modification times and gets locked page but later it will also handle
filesystem freezing.

BugLink: https://bugs.launchpad.net/bugs/897421
Tested-by: Kamal Mostafa &lt;kamal@canonical.com&gt;
Tested-by: Peter M. Petrakis &lt;peter.petrakis@canonical.com&gt;
Tested-by: Dann Frazier &lt;dann.frazier@canonical.com&gt;
Tested-by: Massimo Morana &lt;massimo.morana@canonical.com&gt;
Signed-off-by: Jan Kara &lt;jack@suse.cz&gt;
Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>fs: introduce inode operation -&gt;update_time</title>
<updated>2012-06-01T16:07:25Z</updated>
<author>
<name>Josef Bacik</name>
<email>josef@redhat.com</email>
</author>
<published>2012-03-26T13:59:21Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=c3b2da314834499f34cba94f7053e55f6d6f92d8'/>
<id>urn:sha1:c3b2da314834499f34cba94f7053e55f6d6f92d8</id>
<content type='text'>
Btrfs has to make sure we have space to allocate new blocks in order to modify
the inode, so updating time can fail.  We've gotten around this by having our
own file_update_time but this is kind of a pain, and Christoph has indicated he
would like to make xfs do something different with atime updates.  So introduce
-&gt;update_time, where we will deal with i_version an a/m/c time updates and
indicate which changes need to be made.  The normal version just does what it
has always done, updates the time and marks the inode dirty, and then
filesystems can choose to do something different.

I've gone through all of the users of file_update_time and made them check for
errors with the exception of the fault code since it's complicated and I wasn't
quite sure what to do there, also Jan is going to be pushing the file time
updates into page_mkwrite for those who have it so that should satisfy btrfs and
make it not a big deal to check the file_update_time() return code in the
generic fault path. Thanks,

Signed-off-by: Josef Bacik &lt;josef@redhat.com&gt;
</content>
</entry>
<entry>
<title>mm/filemap_xip.c: fix race condition in xip_file_fault()</title>
<updated>2012-02-04T00:16:41Z</updated>
<author>
<name>Carsten Otte</name>
<email>carsteno@de.ibm.com</email>
</author>
<published>2012-02-03T23:37:14Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=99f02ef1f18631eb0a4e0ea0a3d56878dbcb4b90'/>
<id>urn:sha1:99f02ef1f18631eb0a4e0ea0a3d56878dbcb4b90</id>
<content type='text'>
Fix a race condition that shows in conjunction with xip_file_fault() when
two threads of the same user process fault on the same memory page.

In this case, the race winner will install the page table entry and the
unlucky loser will cause an oops: xip_file_fault calls vm_insert_pfn (via
vm_insert_mixed) which drops out at this check:

	retval = -EBUSY;
	if (!pte_none(*pte))
		goto out_unlock;

The resulting -EBUSY return value will trigger a BUG_ON() in
xip_file_fault.

This fix simply considers the fault as fixed in this case, because the
race winner has successfully installed the pte.

[akpm@linux-foundation.org: use conventional (and consistent) comment layout]
Reported-by: David Sadler &lt;dsadler@us.ibm.com&gt;
Signed-off-by: Carsten Otte &lt;cotte@de.ibm.com&gt;
Reported-by: Louis Alex Eisner &lt;leisner@cs.ucsd.edu&gt;
Cc: Hugh Dickins &lt;hughd@google.com&gt;
Cc: &lt;stable@vger.kernel.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>mm: Map most files to use export.h instead of module.h</title>
<updated>2011-10-31T13:20:12Z</updated>
<author>
<name>Paul Gortmaker</name>
<email>paul.gortmaker@windriver.com</email>
</author>
<published>2011-10-16T06:01:52Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=b95f1b31b75588306e32b2afd32166cad48f670b'/>
<id>urn:sha1:b95f1b31b75588306e32b2afd32166cad48f670b</id>
<content type='text'>
The files changed within are only using the EXPORT_SYMBOL
macro variants.  They are not using core modular infrastructure
and hence don't need module.h but only the export.h header.

Signed-off-by: Paul Gortmaker &lt;paul.gortmaker@windriver.com&gt;
</content>
</entry>
<entry>
<title>mm: Convert i_mmap_lock to a mutex</title>
<updated>2011-05-25T15:39:18Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>a.p.zijlstra@chello.nl</email>
</author>
<published>2011-05-25T00:12:06Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=3d48ae45e72390ddf8cc5256ac32ed6f7a19cbea'/>
<id>urn:sha1:3d48ae45e72390ddf8cc5256ac32ed6f7a19cbea</id>
<content type='text'>
Straightforward conversion of i_mmap_lock to a mutex.

Signed-off-by: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Acked-by: Hugh Dickins &lt;hughd@google.com&gt;
Cc: Benjamin Herrenschmidt &lt;benh@kernel.crashing.org&gt;
Cc: David Miller &lt;davem@davemloft.net&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Cc: Russell King &lt;rmk@arm.linux.org.uk&gt;
Cc: Paul Mundt &lt;lethal@linux-sh.org&gt;
Cc: Jeff Dike &lt;jdike@addtoit.com&gt;
Cc: Richard Weinberger &lt;richard@nod.at&gt;
Cc: Tony Luck &lt;tony.luck@intel.com&gt;
Cc: KAMEZAWA Hiroyuki &lt;kamezawa.hiroyu@jp.fujitsu.com&gt;
Cc: Mel Gorman &lt;mel@csn.ul.ie&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: Nick Piggin &lt;npiggin@kernel.dk&gt;
Cc: Namhyung Kim &lt;namhyung@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
