Age | Commit message (Collapse) | Author |
|
This allows callers of migrate_vma() to allocate new page for empty CPU
page table entry (pte_none or back by zero page). This is only for
anonymous memory and it won't allow new page to be instanced if the
userfaultfd is armed.
This is useful to device driver that want to migrate a range of virtual
address and would rather allocate new memory than having to fault later
on.
Link: http://lkml.kernel.org/r/20170817000548.32038-18-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mark Hairgrove <mhairgrove@nvidia.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Sherry Cheung <SCheung@nvidia.com>
Cc: Subhash Gutti <sgutti@nvidia.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Bob Liu <liubo95@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Allow to unmap and restore special swap entry of un-addressable
ZONE_DEVICE memory.
Link: http://lkml.kernel.org/r/20170817000548.32038-17-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Mark Hairgrove <mhairgrove@nvidia.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Sherry Cheung <SCheung@nvidia.com>
Cc: Subhash Gutti <sgutti@nvidia.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Bob Liu <liubo95@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Common case for migration of virtual address range is page are map only
once inside the vma in which migration is taking place. Because we
already walk the CPU page table for that range we can directly do the
unmap there and setup special migration swap entry.
Link: http://lkml.kernel.org/r/20170817000548.32038-16-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Evgeny Baskakov <ebaskakov@nvidia.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Mark Hairgrove <mhairgrove@nvidia.com>
Signed-off-by: Sherry Cheung <SCheung@nvidia.com>
Signed-off-by: Subhash Gutti <sgutti@nvidia.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Bob Liu <liubo95@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This patch add a new memory migration helpers, which migrate memory
backing a range of virtual address of a process to different memory (which
can be allocated through special allocator). It differs from numa
migration by working on a range of virtual address and thus by doing
migration in chunk that can be large enough to use DMA engine or special
copy offloading engine.
Expected users are any one with heterogeneous memory where different
memory have different characteristics (latency, bandwidth, ...). As an
example IBM platform with CAPI bus can make use of this feature to migrate
between regular memory and CAPI device memory. New CPU architecture with
a pool of high performance memory not manage as cache but presented as
regular memory (while being faster and with lower latency than DDR) will
also be prime user of this patch.
Migration to private device memory will be useful for device that have
large pool of such like GPU, NVidia plans to use HMM for that.
Link: http://lkml.kernel.org/r/20170817000548.32038-15-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Evgeny Baskakov <ebaskakov@nvidia.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Mark Hairgrove <mhairgrove@nvidia.com>
Signed-off-by: Sherry Cheung <SCheung@nvidia.com>
Signed-off-by: Subhash Gutti <sgutti@nvidia.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Bob Liu <liubo95@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Introduce a new migration mode that allow to offload the copy to a device
DMA engine. This changes the workflow of migration and not all
address_space migratepage callback can support this.
This is intended to be use by migrate_vma() which itself is use for thing
like HMM (see include/linux/hmm.h).
No additional per-filesystem migratepage testing is needed. I disables
MIGRATE_SYNC_NO_COPY in all problematic migratepage() callback and i
added comment in those to explain why (part of this patch). The commit
message is unclear it should say that any callback that wish to support
this new mode need to be aware of the difference in the migration flow
from other mode.
Some of these callbacks do extra locking while copying (aio, zsmalloc,
balloon, ...) and for DMA to be effective you want to copy multiple
pages in one DMA operations. But in the problematic case you can not
easily hold the extra lock accross multiple call to this callback.
Usual flow is:
For each page {
1 - lock page
2 - call migratepage() callback
3 - (extra locking in some migratepage() callback)
4 - migrate page state (freeze refcount, update page cache, buffer
head, ...)
5 - copy page
6 - (unlock any extra lock of migratepage() callback)
7 - return from migratepage() callback
8 - unlock page
}
The new mode MIGRATE_SYNC_NO_COPY:
1 - lock multiple pages
For each page {
2 - call migratepage() callback
3 - abort in all problematic migratepage() callback
4 - migrate page state (freeze refcount, update page cache, buffer
head, ...)
} // finished all calls to migratepage() callback
5 - DMA copy multiple pages
6 - unlock all the pages
To support MIGRATE_SYNC_NO_COPY in the problematic case we would need a
new callback migratepages() (for instance) that deals with multiple
pages in one transaction.
Because the problematic cases are not important for current usage I did
not wanted to complexify this patchset even more for no good reason.
Link: http://lkml.kernel.org/r/20170817000548.32038-14-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mark Hairgrove <mhairgrove@nvidia.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Sherry Cheung <SCheung@nvidia.com>
Cc: Subhash Gutti <sgutti@nvidia.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Bob Liu <liubo95@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This introduce a dummy HMM device class so device driver can use it to
create hmm_device for the sole purpose of registering device memory. It
is useful to device driver that want to manage multiple physical device
memory under same struct device umbrella.
Link: http://lkml.kernel.org/r/20170817000548.32038-13-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Evgeny Baskakov <ebaskakov@nvidia.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Mark Hairgrove <mhairgrove@nvidia.com>
Signed-off-by: Sherry Cheung <SCheung@nvidia.com>
Signed-off-by: Subhash Gutti <sgutti@nvidia.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Bob Liu <liubo95@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This introduce a simple struct and associated helpers for device driver to
use when hotpluging un-addressable device memory as ZONE_DEVICE. It will
find a unuse physical address range and trigger memory hotplug for it
which allocates and initialize struct page for the device memory.
Device driver should use this helper during device initialization to
hotplug the device memory. It should only need to remove the memory once
the device is going offline (shutdown or hotremove). There should not be
any userspace API to hotplug memory expect maybe for host device driver to
allow to add more memory to a guest device driver.
Device's memory is manage by the device driver and HMM only provides
helpers to that effect.
Link: http://lkml.kernel.org/r/20170817000548.32038-12-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Evgeny Baskakov <ebaskakov@nvidia.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Mark Hairgrove <mhairgrove@nvidia.com>
Signed-off-by: Sherry Cheung <SCheung@nvidia.com>
Signed-off-by: Subhash Gutti <sgutti@nvidia.com>
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Bob Liu <liubo95@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
HMM pages (private or public device pages) are ZONE_DEVICE page and thus
need special handling when it comes to lru or refcount. This patch make
sure that memcontrol properly handle those when it face them. Those pages
are use like regular pages in a process address space either as anonymous
page or as file back page. So from memcg point of view we want to handle
them like regular page for now at least.
Link: http://lkml.kernel.org/r/20170817000548.32038-11-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mark Hairgrove <mhairgrove@nvidia.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Sherry Cheung <SCheung@nvidia.com>
Cc: Subhash Gutti <sgutti@nvidia.com>
Cc: Bob Liu <liubo95@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
HMM pages (private or public device pages) are ZONE_DEVICE page and
thus you can not use page->lru fields of those pages. This patch
re-arrange the uncharge to allow single page to be uncharge without
modifying the lru field of the struct page.
There is no change to memcontrol logic, it is the same as it was
before this patch.
Link: http://lkml.kernel.org/r/20170817000548.32038-10-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mark Hairgrove <mhairgrove@nvidia.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Sherry Cheung <SCheung@nvidia.com>
Cc: Subhash Gutti <sgutti@nvidia.com>
Cc: Bob Liu <liubo95@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
A ZONE_DEVICE page that reach a refcount of 1 is free ie no longer have
any user. For device private pages this is important to catch and thus we
need to special case put_page() for this.
Link: http://lkml.kernel.org/r/20170817000548.32038-9-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Mark Hairgrove <mhairgrove@nvidia.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Sherry Cheung <SCheung@nvidia.com>
Cc: Subhash Gutti <sgutti@nvidia.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Bob Liu <liubo95@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
HMM (heterogeneous memory management) need struct page to support
migration from system main memory to device memory. Reasons for HMM and
migration to device memory is explained with HMM core patch.
This patch deals with device memory that is un-addressable memory (ie CPU
can not access it). Hence we do not want those struct page to be manage
like regular memory. That is why we extend ZONE_DEVICE to support
different types of memory.
A persistent memory type is define for existing user of ZONE_DEVICE and a
new device un-addressable type is added for the un-addressable memory
type. There is a clear separation between what is expected from each
memory type and existing user of ZONE_DEVICE are un-affected by new
requirement and new use of the un-addressable type. All specific code
path are protect with test against the memory type.
Because memory is un-addressable we use a new special swap type for when a
page is migrated to device memory (this reduces the number of maximum swap
file).
The main two additions beside memory type to ZONE_DEVICE is two callbacks.
First one, page_free() is call whenever page refcount reach 1 (which
means the page is free as ZONE_DEVICE page never reach a refcount of 0).
This allow device driver to manage its memory and associated struct page.
The second callback page_fault() happens when there is a CPU access to an
address that is back by a device page (which are un-addressable by the
CPU). This callback is responsible to migrate the page back to system
main memory. Device driver can not block migration back to system memory,
HMM make sure that such page can not be pin into device memory.
If device is in some error condition and can not migrate memory back then
a CPU page fault to device memory should end with SIGBUS.
[arnd@arndb.de: fix warning]
Link: http://lkml.kernel.org/r/20170823133213.712917-1-arnd@arndb.de
Link: http://lkml.kernel.org/r/20170817000548.32038-8-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mark Hairgrove <mhairgrove@nvidia.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Sherry Cheung <SCheung@nvidia.com>
Cc: Subhash Gutti <sgutti@nvidia.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Bob Liu <liubo95@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
There are new users of memory hotplug emerging. Some of them require
different subset of arch_add_memory. There are some which only require
allocation of struct pages without mapping those pages to the kernel
address space. We currently have __add_pages for that purpose. But this
is rather lowlevel and not very suitable for the code outside of the
memory hotplug. E.g. x86_64 wants to update max_pfn which should be done
by the caller. Introduce add_pages() which should care about those
details if they are needed. Each architecture should define its
implementation and select CONFIG_ARCH_HAS_ADD_PAGES. All others use the
currently existing __add_pages.
Link: http://lkml.kernel.org/r/20170817000548.32038-7-jglisse@redhat.com
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mark Hairgrove <mhairgrove@nvidia.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Sherry Cheung <SCheung@nvidia.com>
Cc: Subhash Gutti <sgutti@nvidia.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Bob Liu <liubo95@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This handles page fault on behalf of device driver, unlike
handle_mm_fault() it does not trigger migration back to system memory for
device memory.
Link: http://lkml.kernel.org/r/20170817000548.32038-6-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Evgeny Baskakov <ebaskakov@nvidia.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Mark Hairgrove <mhairgrove@nvidia.com>
Signed-off-by: Sherry Cheung <SCheung@nvidia.com>
Signed-off-by: Subhash Gutti <sgutti@nvidia.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Bob Liu <liubo95@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This does not use existing page table walker because we want to share
same code for our page fault handler.
Link: http://lkml.kernel.org/r/20170817000548.32038-5-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Evgeny Baskakov <ebaskakov@nvidia.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Mark Hairgrove <mhairgrove@nvidia.com>
Signed-off-by: Sherry Cheung <SCheung@nvidia.com>
Signed-off-by: Subhash Gutti <sgutti@nvidia.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Bob Liu <liubo95@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This is a heterogeneous memory management (HMM) process address space
mirroring. In a nutshell this provide an API to mirror process address
space on a device. This boils down to keeping CPU and device page table
synchronize (we assume that both device and CPU are cache coherent like
PCIe device can be).
This patch provide a simple API for device driver to achieve address space
mirroring thus avoiding each device driver to grow its own CPU page table
walker and its own CPU page table synchronization mechanism.
This is useful for NVidia GPU >= Pascal, Mellanox IB >= mlx5 and more
hardware in the future.
[jglisse@redhat.com: fix hmm for "mmu_notifier kill invalidate_page callback"]
Link: http://lkml.kernel.org/r/20170830231955.GD9445@redhat.com
Link: http://lkml.kernel.org/r/20170817000548.32038-4-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Evgeny Baskakov <ebaskakov@nvidia.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Mark Hairgrove <mhairgrove@nvidia.com>
Signed-off-by: Sherry Cheung <SCheung@nvidia.com>
Signed-off-by: Subhash Gutti <sgutti@nvidia.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Bob Liu <liubo95@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
HMM provides 3 separate types of functionality:
- Mirroring: synchronize CPU page table and device page table
- Device memory: allocating struct page for device memory
- Migration: migrating regular memory to device memory
This patch introduces some common helpers and definitions to all of
those 3 functionality.
Link: http://lkml.kernel.org/r/20170817000548.32038-3-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Signed-off-by: Evgeny Baskakov <ebaskakov@nvidia.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Mark Hairgrove <mhairgrove@nvidia.com>
Signed-off-by: Sherry Cheung <SCheung@nvidia.com>
Signed-off-by: Subhash Gutti <sgutti@nvidia.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Bob Liu <liubo95@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Patch series "HMM (Heterogeneous Memory Management)", v25.
Heterogeneous Memory Management (HMM) (description and justification)
Today device driver expose dedicated memory allocation API through their
device file, often relying on a combination of IOCTL and mmap calls.
The device can only access and use memory allocated through this API.
This effectively split the program address space into object allocated
for the device and useable by the device and other regular memory
(malloc, mmap of a file, share memory, â) only accessible by
CPU (or in a very limited way by a device by pinning memory).
Allowing different isolated component of a program to use a device thus
require duplication of the input data structure using device memory
allocator. This is reasonable for simple data structure (array, grid,
image, â) but this get extremely complex with advance data
structure (list, tree, graph, â) that rely on a web of memory
pointers. This is becoming a serious limitation on the kind of work
load that can be offloaded to device like GPU.
New industry standard like C++, OpenCL or CUDA are pushing to remove
this barrier. This require a shared address space between GPU device
and CPU so that GPU can access any memory of a process (while still
obeying memory protection like read only). This kind of feature is also
appearing in various other operating systems.
HMM is a set of helpers to facilitate several aspects of address space
sharing and device memory management. Unlike existing sharing mechanism
that rely on pining pages use by a device, HMM relies on mmu_notifier to
propagate CPU page table update to device page table.
Duplicating CPU page table is only one aspect necessary for efficiently
using device like GPU. GPU local memory have bandwidth in the TeraBytes/
second range but they are connected to main memory through a system bus
like PCIE that is limited to 32GigaBytes/second (PCIE 4.0 16x). Thus it
is necessary to allow migration of process memory from main system memory
to device memory. Issue is that on platform that only have PCIE the
device memory is not accessible by the CPU with the same properties as
main memory (cache coherency, atomic operations, ...).
To allow migration from main memory to device memory HMM provides a set of
helper to hotplug device memory as a new type of ZONE_DEVICE memory which
is un-addressable by CPU but still has struct page representing it. This
allow most of the core kernel logic that deals with a process memory to
stay oblivious of the peculiarity of device memory.
When page backing an address of a process is migrated to device memory the
CPU page table entry is set to a new specific swap entry. CPU access to
such address triggers a migration back to system memory, just like if the
page was swap on disk. HMM also blocks any one from pinning a ZONE_DEVICE
page so that it can always be migrated back to system memory if CPU access
it. Conversely HMM does not migrate to device memory any page that is pin
in system memory.
To allow efficient migration between device memory and main memory a new
migrate_vma() helpers is added with this patchset. It allows to leverage
device DMA engine to perform the copy operation.
This feature will be use by upstream driver like nouveau mlx5 and probably
other in the future (amdgpu is next suspect in line). We are actively
working on nouveau and mlx5 support. To test this patchset we also worked
with NVidia close source driver team, they have more resources than us to
test this kind of infrastructure and also a bigger and better userspace
eco-system with various real industry workload they can be use to test and
profile HMM.
The expected workload is a program builds a data set on the CPU (from
disk, from network, from sensors, â). Program uses GPU API (OpenCL,
CUDA, ...) to give hint on memory placement for the input data and also
for the output buffer. Program call GPU API to schedule a GPU job, this
happens using device driver specific ioctl. All this is hidden from
programmer point of view in case of C++ compiler that transparently
offload some part of a program to GPU. Program can keep doing other stuff
on the CPU while the GPU is crunching numbers.
It is expected that CPU will not access the same data set as the GPU while
GPU is working on it, but this is not mandatory. In fact we expect some
small memory object to be actively access by both GPU and CPU concurrently
as synchronization channel and/or for monitoring purposes. Such object
will stay in system memory and should not be bottlenecked by system bus
bandwidth (rare write and read access from both CPU and GPU).
As we are relying on device driver API, HMM does not introduce any new
syscall nor does it modify any existing ones. It does not change any
POSIX semantics or behaviors. For instance the child after a fork of a
process that is using HMM will not be impacted in anyway, nor is there any
data hazard between child COW or parent COW of memory that was migrated to
device prior to fork.
HMM assume a numbers of hardware features. Device must allow device page
table to be updated at any time (ie device job must be preemptable).
Device page table must provides memory protection such as read only.
Device must track write access (dirty bit). Device must have a minimum
granularity that match PAGE_SIZE (ie 4k).
Reviewer (just hint):
Patch 1 HMM documentation
Patch 2 introduce core infrastructure and definition of HMM, pretty
small patch and easy to review
Patch 3 introduce the mirror functionality of HMM, it relies on
mmu_notifier and thus someone familiar with that part would be
in better position to review
Patch 4 is an helper to snapshot CPU page table while synchronizing with
concurrent page table update. Understanding mmu_notifier makes
review easier.
Patch 5 is mostly a wrapper around handle_mm_fault()
Patch 6 add new add_pages() helper to avoid modifying each arch memory
hot plug function
Patch 7 add a new memory type for ZONE_DEVICE and also add all the logic
in various core mm to support this new type. Dan Williams and
any core mm contributor are best people to review each half of
this patchset
Patch 8 special case HMM ZONE_DEVICE pages inside put_page() Kirill and
Dan Williams are best person to review this
Patch 9 allow to uncharge a page from memory group without using the lru
list field of struct page (best reviewer: Johannes Weiner or
Vladimir Davydov or Michal Hocko)
Patch 10 Add support to uncharge ZONE_DEVICE page from a memory cgroup (best
reviewer: Johannes Weiner or Vladimir Davydov or Michal Hocko)
Patch 11 add helper to hotplug un-addressable device memory as new type
of ZONE_DEVICE memory (new type introducted in patch 3 of this
serie). This is boiler plate code around memory hotplug and it
also pick a free range of physical address for the device memory.
Note that the physical address do not point to anything (at least
as far as the kernel knows).
Patch 12 introduce a new hmm_device class as an helper for device driver
that want to expose multiple device memory under a common fake
device driver. This is usefull for multi-gpu configuration.
Anyone familiar with device driver infrastructure can review
this. Boiler plate code really.
Patch 13 add a new migrate mode. Any one familiar with page migration is
welcome to review.
Patch 14 introduce a new migration helper (migrate_vma()) that allow to
migrate a range of virtual address of a process using device DMA
engine to perform the copy. It is not limited to do copy from and
to device but can also do copy between any kind of source and
destination memory. Again anyone familiar with migration code
should be able to verify the logic.
Patch 15 optimize the new migrate_vma() by unmapping pages while we are
collecting them. This can be review by any mm folks.
Patch 16 add unaddressable memory migration to helper introduced in patch
7, this can be review by anyone familiar with migration code
Patch 17 add a feature that allow device to allocate non-present page on
the GPU when migrating a range of address to device memory. This
is an helper for device driver to avoid having to first allocate
system memory before migration to device memory
Patch 18 add a new kind of ZONE_DEVICE memory for cache coherent device
memory (CDM)
Patch 19 add an helper to hotplug CDM memory
Previous patchset posting :
v1 http://lwn.net/Articles/597289/
v2 https://lkml.org/lkml/2014/6/12/559
v3 https://lkml.org/lkml/2014/6/13/633
v4 https://lkml.org/lkml/2014/8/29/423
v5 https://lkml.org/lkml/2014/11/3/759
v6 http://lwn.net/Articles/619737/
v7 http://lwn.net/Articles/627316/
v8 https://lwn.net/Articles/645515/
v9 https://lwn.net/Articles/651553/
v10 https://lwn.net/Articles/654430/
v11 http://www.gossamer-threads.com/lists/linux/kernel/2286424
v12 http://www.kernelhub.org/?msg=972982&p=2
v13 https://lwn.net/Articles/706856/
v14 https://lkml.org/lkml/2016/12/8/344
v15 http://www.mail-archive.com/linux-kernel@xxxxxxxxxxxxxxx/msg1304107.html
v16 http://www.spinics.net/lists/linux-mm/msg119814.html
v17 https://lkml.org/lkml/2017/1/27/847
v18 https://lkml.org/lkml/2017/3/16/596
v19 https://lkml.org/lkml/2017/4/5/831
v20 https://lwn.net/Articles/720715/
v21 https://lkml.org/lkml/2017/4/24/747
v22 http://lkml.iu.edu/hypermail/linux/kernel/1705.2/05176.html
v23 https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1404788.html
v24 https://lwn.net/Articles/726691/
This patch (of 19):
This adds documentation for HMM (Heterogeneous Memory Management). It
presents the motivation behind it, the features necessary for it to be
useful and and gives an overview of how this is implemented.
Link: http://lkml.kernel.org/r/20170817000548.32038-2-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mark Hairgrove <mhairgrove@nvidia.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Sherry Cheung <SCheung@nvidia.com>
Cc: Subhash Gutti <sgutti@nvidia.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Bob Liu <liubo95@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This patch enables thp migration for memory hotremove.
Link: http://lkml.kernel.org/r/20170717193955.20207-11-zi.yan@sent.com
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This patch enables thp migration for move_pages(2).
Link: http://lkml.kernel.org/r/20170717193955.20207-10-zi.yan@sent.com
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This patch enables thp migration for mbind(2) and migrate_pages(2).
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Soft dirty bit is designed to keep tracked over page migration. This
patch makes it work in the same manner for thp migration too.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When THP migration is being used, memory management code needs to handle
pmd migration entries properly. This patch uses !pmd_present() or
is_swap_pmd() (depending on whether pmd_none() needs separate code or
not) to check pmd migration entries at the places where a pmd entry is
present.
Since pmd-related code uses split_huge_page(), split_huge_pmd(),
pmd_trans_huge(), pmd_trans_unstable(), or
pmd_none_or_trans_huge_or_clear_bad(), this patch:
1. adds pmd migration entry split code in split_huge_pmd(),
2. takes care of pmd migration entries whenever pmd_trans_huge() is present,
3. makes pmd_none_or_trans_huge_or_clear_bad() pmd migration entry aware.
Since split_huge_page() uses split_huge_pmd() and pmd_trans_unstable()
is equivalent to pmd_none_or_trans_huge_or_clear_bad(), we do not change
them.
Until this commit, a pmd entry should be:
1. pointing to a pte page,
2. is_swap_pmd(),
3. pmd_trans_huge(),
4. pmd_devmap(), or
5. pmd_none().
Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Add thp migration's core code, including conversions between a PMD entry
and a swap entry, setting PMD migration entry, removing PMD migration
entry, and waiting on PMD migration entries.
This patch makes it possible to support thp migration. If you fail to
allocate a destination page as a thp, you just split the source thp as
we do now, and then enter the normal page migration. If you succeed to
allocate destination thp, you enter thp migration. Subsequent patches
actually enable thp migration for each caller of page migration by
allowing its get_new_page() callback to allocate thps.
[zi.yan@cs.rutgers.edu: fix gcc-4.9.0 -Wmissing-braces warning]
Link: http://lkml.kernel.org/r/A0ABA698-7486-46C3-B209-E95A9048B22C@cs.rutgers.edu
[akpm@linux-foundation.org: fix x86_64 allnoconfig warning]
Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Introduce CONFIG_ARCH_ENABLE_THP_MIGRATION to limit thp migration
functionality to x86_64, which should be safer at the first step.
Link: http://lkml.kernel.org/r/20170717193955.20207-5-zi.yan@sent.com
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
TTU_MIGRATION is used to convert pte into migration entry until thp
split completes. This behavior conflicts with thp migration added later
patches, so let's introduce a new TTU flag specifically for freezing.
try_to_unmap() is used both for thp split (via freeze_page()) and page
migration (via __unmap_and_move()). In freeze_page(), ttu_flag given
for head page is like below (assuming anonymous thp):
(TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS | TTU_RMAP_LOCKED | \
TTU_MIGRATION | TTU_SPLIT_HUGE_PMD)
and ttu_flag given for tail pages is:
(TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS | TTU_RMAP_LOCKED | \
TTU_MIGRATION)
__unmap_and_move() calls try_to_unmap() with ttu_flag:
(TTU_MIGRATION | TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS)
Now I'm trying to insert a branch for thp migration at the top of
try_to_unmap_one() like below
static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
unsigned long address, void *arg)
{
...
/* PMD-mapped THP migration entry */
if (!pvmw.pte && (flags & TTU_MIGRATION)) {
if (!PageAnon(page))
continue;
set_pmd_migration_entry(&pvmw, page);
continue;
}
...
}
so try_to_unmap() for tail pages called by thp split can go into thp
migration code path (which converts *pmd* into migration entry), while
the expectation is to freeze thp (which converts *pte* into migration
entry.)
I detected this failure as a "bad page state" error in a testcase where
split_huge_page() is called from queue_pages_pte_range().
Link: http://lkml.kernel.org/r/20170717193955.20207-4-zi.yan@sent.com
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
_PAGE_PSE is used to distinguish between a truly non-present
(_PAGE_PRESENT=0) PMD, and a PMD which is undergoing a THP split and
should be treated as present.
But _PAGE_SWP_SOFT_DIRTY currently uses the _PAGE_PSE bit, which would
cause confusion between one of those PMDs undergoing a THP split, and a
soft-dirty PMD. Dropping _PAGE_PSE check in pmd_present() does not work
well, because it can hurt optimization of tlb handling in thp split.
Thus, we need to move the bit.
In the current kernel, bits 1-4 are not used in non-present format since
commit 00839ee3b299 ("x86/mm: Move swap offset/type up in PTE to work
around erratum"). So let's move _PAGE_SWP_SOFT_DIRTY to bit 1. Bit 7
is used as reserved (always clear), so please don't use it for other
purpose.
Link: http://lkml.kernel.org/r/20170717193955.20207-3-zi.yan@sent.com
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Patch series "mm: page migration enhancement for thp", v9.
Motivations:
1. THP migration becomes important in the upcoming heterogeneous memory
systems. As David Nellans from NVIDIA pointed out from other threads
(http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1349227.html),
future GPUs or other accelerators will have their memory managed by
operating systems. Moving data into and out of these memory nodes
efficiently is critical to applications that use GPUs or other
accelerators. Existing page migration only supports base pages, which
has a very low memory bandwidth utilization. My experiments (see
below) show THP migration can migrate pages more efficiently.
2. Base page migration vs THP migration throughput.
Here are cross-socket page migration results from calling
move_pages() syscall:
In x86_64, a Intel two-socket E5-2640v3 box,
- single 4KB base page migration takes 62.47 us, using 0.06 GB/s BW,
- single 2MB THP migration takes 658.54 us, using 2.97 GB/s BW,
- 512 4KB base page migration takes 1987.38 us, using 0.98 GB/s BW.
In ppc64, a two-socket Power8 box,
- single 64KB base page migration takes 49.3 us, using 1.24 GB/s BW,
- single 16MB THP migration takes 2202.17 us, using 7.10 GB/s BW,
- 256 64KB base page migration takes 2543.65 us, using 6.14 GB/s BW.
THP migration can give us 3x and 1.15x throughput over base page
migration in x86_64 and ppc64 respectivley.
You can test it out by using the code here:
https://github.com/x-y-z/thp-migration-bench
3. Existing page migration splits THP before migration and cannot
guarantee the migrated pages are still contiguous. Contiguity is
always what GPUs and accelerators look for. Without THP migration,
khugepaged needs to do extra work to reassemble the migrated pages
back to THPs.
This patch (of 10):
Introduce a separate check routine related to MPOL_MF_INVERT flag. This
patch just does cleanup, no behavioral change.
Link: http://lkml.kernel.org/r/20170717193955.20207-2-zi.yan@sent.com
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The fix in the parent made me look at that function, and react to how
illogical and illegible the array initializer was.
Use named array indexes to make it clearer what is going on, and make
the initializer not depend silently on the exact index numbers.
[ The initializer now also shows an odd inconsistency in the naming:
note the IWCM vs IWPM.. - Linus ]
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: Doug Ledford <dledford@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The netlink message sent with type == 0, which doesn't have any client
behind it, caused to the overflow in max_num_ops array.
Fix it by declaring zero number of ops for the first client.
Fixes: c9901724a2f1 ("RDMA/netlink: Remove netlink clients infrastructure")
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Remove our use of 'gperf' for generating perfect hashes from some of our
build tools.
This removal was prompted by Masahiro Yamada sending out a patch that
removes all our pre-generated files, and when I tested it, I noticed
that the gperf version I have (3.1) apparently generates code that no
longer works with out code-base because the function interfaces
generated by gperf have changed.
We really don't care that much, and the gperf people changed their
interfaces in ways that makes it annoying to work with them. Tools that
make it hard to use them should not be used, and the kernel is not at
all interested in some autoconf mess. So remove the gperf dependency
entirely.
It turns out that if you ignore the pre-generated files, the use of
gperf apparently saved us a whopping fifteen lines of code. It
obviously wasn't worth it, considering that the pre-generated files are
about 500 lines.
I sent this out as a patch about three weeks ago, and got absolutely
zero responses. So let's see if anybody notices now that I merge it.
Because there might be serious bugs here, but it WorksForMe(tm).
* gperf-removal:
Remove gperf usage from toolchain
|
|
Pull SCSI updates from James Bottomley:
"This is mostly updates of the usual suspects: lpfc, qla2xxx, hisi_sas,
megaraid_sas, zfcp and a host of minor updates.
The major driver change here is the elimination of the block based
cciss driver in favour of the SCSI based hpsa driver (which now drives
all the legacy cases cciss used to be required for). Plus a reset
handler clean up and the redo of the SAS SMP handler to use bsg lib"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (279 commits)
scsi: scsi-mq: Always unprepare before requeuing a request
scsi: Show .retries and .jiffies_at_alloc in debugfs
scsi: Improve requeuing behavior
scsi: Call scsi_initialize_rq() for filesystem requests
scsi: qla2xxx: Reset the logo flag, after target re-login.
scsi: qla2xxx: Fix slow mem alloc behind lock
scsi: qla2xxx: Clear fc4f_nvme flag
scsi: qla2xxx: add missing includes for qla_isr
scsi: qla2xxx: Fix an integer overflow in sysfs code
scsi: aacraid: report -ENOMEM to upper layer from aac_convert_sgraw2()
scsi: aacraid: get rid of one level of indentation
scsi: aacraid: fix indentation errors
scsi: storvsc: fix memory leak on ring buffer busy
scsi: scsi_transport_sas: switch to bsg-lib for SMP passthrough
scsi: smartpqi: remove the smp_handler stub
scsi: hpsa: remove the smp_handler stub
scsi: bsg-lib: pass the release callback through bsg_setup_queue
scsi: Rework handling of scsi_device.vpd_pg8[03]
scsi: Rework the code for caching Vital Product Data (VPD)
scsi: rcu: Introduce rcu_swap_protected()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk
Pull printk updates from Petr Mladek:
- Do not allow use of freed init data and code even when boot consoles
are forced to stay. Also check for the init memory more precisely.
- Some code clean up by starting contributors.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
printk: Clean up do_syslog() error handling
printk/console: Enhance the check for consoles using init memory
printk/console: Always disable boot consoles that use init memory before it is freed
printk: Modify operators of printed_len and text_len
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit updates from Paul Moore:
"A small pull request for audit this time, only four patches and only
two with any real code changes.
Those two changes are the removal of a pointless SELinux AVC
initialization audit event and a fix to improve the audit timestamp
overhead.
The other two patches are comment cleanup and administrative updates,
nothing very exciting.
Everything passes our tests"
* tag 'audit-pr-20170907' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: update the function comments
selinux: remove AVC init audit log message
audit: update the audit info in MAINTAINERS
audit: Reduce overhead using a coarse clock
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull secureexec update from Kees Cook:
"This series has the ultimate goal of providing a sane stack rlimit
when running set*id processes.
To do this, the bprm_secureexec LSM hook is collapsed into the
bprm_set_creds hook so the secureexec-ness of an exec can be
determined early enough to make decisions about rlimits and the
resulting memory layouts. Other logic acting on the secureexec-ness of
an exec is similarly consolidated. Capabilities needed some special
handling, but the refactoring removed other special handling, so that
was a wash"
* tag 'secureexec-v4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
exec: Consolidate pdeath_signal clearing
exec: Use sane stack rlimit under secureexec
exec: Consolidate dumpability logic
smack: Remove redundant pdeath_signal clearing
exec: Use secureexec for clearing pdeath_signal
exec: Use secureexec for setting dumpability
LSM: drop bprm_secureexec hook
commoncap: Move cap_elevated calculation into bprm_set_creds
commoncap: Refactor to remove bprm_secureexec hook
smack: Refactor to remove bprm_secureexec hook
selinux: Refactor to remove bprm_secureexec hook
apparmor: Refactor to remove bprm_secureexec hook
binfmt: Introduce secureexec flag
exec: Correct comments about "point of no return"
exec: Rename bprm->cred_prepared to called_set_creds
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull gcc plugins update from Kees Cook:
"This finishes the porting work on randstruct, and introduces a new
option to structleak, both noted below:
- For the randstruct plugin, enable automatic randomization of
structures that are entirely function pointers (along with a couple
designated initializer fixes).
- For the structleak plugin, provide an option to perform zeroing
initialization of all otherwise uninitialized stack variables that
are passed by reference (Ard Biesheuvel)"
* tag 'gcc-plugins-v4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
gcc-plugins: structleak: add option to init all vars used as byref args
randstruct: Enable function pointer struct detection
drivers/net/wan/z85230.c: Use designated initializers
drm/amd/powerplay: rv: Use designated initializers
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull pstore update from Kees Cook:
"Make pstore permissions more versatile by removing CAP_SYSLOG
requirement and defining more restrictive root directory DAC
permissions default (0750, which can be adjust after boot unlike the
CAP_SYSLOG check).
Suggested by Nick Kralevich"
* tag 'pstore-v4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
Revert "pstore: Honor dmesg_restrict sysctl on dmesg dumps"
pstore: Make default pstorefs root dir perms 0750
|
|
Pull cifs update from Steve French:
"Enable xattr support for smb3 and also a bugfix"
* tag '4.14-smb3-xattr-enable' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Check for timeout on Negotiate stage
cifs: Add support for writing attributes on SMB2+
cifs: Add support for reading attributes on SMB2+
|
|
Pull aio fix from Ben LaHaise:
"Improve aio-nr counting on large SMP systems.
It has been in linux-next for quite some time"
* git://git.kvack.org/~bcrl/aio-next:
fs: aio: fix the increment of aio-nr and counting against aio-max-nr
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull quota scaling updates from Jan Kara:
"This contains changes to make the quota subsystem more scalable.
Reportedly it improves number of files created per second on ext4
filesystem on fast storage by about a factor of 2x"
* 'quota_scaling' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: (28 commits)
quota: Add lock annotations to struct members
quota: Reduce contention on dq_data_lock
fs: Provide __inode_get_bytes()
quota: Inline dquot_[re]claim_reserved_space() into callsite
quota: Inline inode_{incr,decr}_space() into callsites
quota: Inline functions into their callsites
ext4: Disable dirty list tracking of dquots when journalling quotas
quota: Allow disabling tracking of dirty dquots in a list
quota: Remove dq_wait_unused from dquot
quota: Move locking into clear_dquot_dirty()
quota: Do not dirty bad dquots
quota: Fix possible corruption of dqi_flags
quota: Propagate ->quota_read errors from v2_read_file_info()
quota: Fix error codes in v2_read_file_info()
quota: Push dqio_sem down to ->read_file_info()
quota: Push dqio_sem down to ->write_file_info()
quota: Push dqio_sem down to ->get_next_id()
quota: Push dqio_sem down to ->release_dqblk()
quota: Remove locking for writing to the old quota format
quota: Do not acquire dqio_sem for dquot overwrites in v2 format
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull UDF, reiserfs, quota, fsnotify cleanups from Jan Kara:
"Several UDF, reiserfs, quota and fsnotify cleanups.
Note that there is also a patch updating MAINTAINERS entry for
notification subsystem to point to me as a maintainer since current
entries are stale"
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fsnotify: make dnotify_fsnotify_ops const
isofs: Delete an unnecessary variable initialisation in isofs_read_inode()
isofs: Adjust four checks for null pointers
isofs: Delete an error message for a failed memory allocation in isofs_read_inode()
quota_v2: Delete an error message for a failed memory allocation in v2_read_file_info()
fs-udf: Delete an error message for a failed memory allocation in two functions
fs-udf: Improve six size determinations
fs-udf: Adjust two checks for null pointers
reiserfs: fix spelling mistake: "tranasction" -> "transaction"
MAINTAINERS: Update entries for notification subsystem
uapi/linux/quota.h: Do not include linux/errno.h
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull DeviceTree updates from Rob Herring:
"There's a few orphans in the conversion to %pOF printf specifiers
included here that no one else picked up.
Summary:
- Convert more DT code to use of_property_read_* API.
- Improve DT overlay support when adding multiple overlays
- Convert printk's to %pOF format specifiers. Most went via subsystem
trees, but picked up the remaining orphans
- Correct unittests to use preferred "okay" for "status" property
value
- Add a KASLR seed property
- Vendor prefixes for Mellanox, Theobroma System, Adaptrum, Moxa
- Fix modalias buffer handling
- Clean-up of include paths for building dtbs
- Add bindings for amc6821, isl1208, tsl2x7x, srf02, and srf10
devices
- Add nvmem bindings for MediaTek MT7623 and MT7622 SoC
- Add compatible string for Allwinner H5 Mali-450 GPU
- Fix links to old OpenFirmware docs with new mirror on
devicetree.org
- Remove status property from binding doc examples"
* tag 'devicetree-for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (45 commits)
devicetree: Adjust status "ok" -> "okay" under drivers/of/
dt-bindings: Remove "status" from examples
dt-bindings: pinctrl: sh-pfc: Use generic node name
dt-bindings: Add vendor Mellanox
dt-binding: net/phy: fix interrupts description
virt: Convert to using %pOF instead of full_name
macintosh: Convert to using %pOF instead of full_name
ide: pmac: Convert to using %pOF instead of full_name
microblaze: Convert to using %pOF instead of full_name
dt-bindings: usb: musb: Grammar s/the/to/, s/is/are/
of: Use PLATFORM_DEVID_NONE definition
of/device: Fix of_device_get_modalias() buffer handling
of/device: Prevent buffer overflow in of_device_modalias()
dt-bindings: add amc6821, isl1208 trivial bindings
dt-bindings: add vendor prefix for Theobroma Systems
of: search scripts/dtc/include-prefixes path for both CPP and DTC
of: remove arch/$(SRCARCH)/boot/dts from include search path for CPP
of: remove drivers/of/testcase-data from include search path for CPP
of: return of_get_cpu_node from of_cpu_device_node_get if CPUs are not registered
iio: srf08: add device tree binding for srf02 and srf10
...
|
|
git://anongit.freedesktop.org/git/drm-intel
Pull i916 drm fixes from Rodrigo Vivi:
"Since Dave is on paternity leave we are sending drm/i915 fixes for
v4.14-rc1 directly to you as he had asked us to do.
The most critical ones are the GPU reset fix for gen2-4 and GVT fix
for a regression that is blocking gvt init to work on your tree.
The rest is general fixes for patches coming from drm-next"
Acked-by: Dave Airlie <airlied@redhat.com>
* tag 'drm-intel-next-fixes-2017-09-07' of git://anongit.freedesktop.org/git/drm-intel:
drm/i915: Re-enable GTT following a device reset
drm/i915: Annotate user relocs with __user
drm/i915: Silence sparse by using gfp_t
drm/i915: Add __rcu to radix tree slot pointer
drm/i915: Fix the missing PPAT cache attributes on CNL
drm/i915/gvt: Remove one duplicated MMIO
drm/i915: Fix enum pipe vs. enum transcoder for the PCH transcoder
drm/i915: Make i2c lock ops static
drm/i915: Make i9xx_load_ycbcr_conversion_matrix() static
drm/i915/edp: Increase T12 panel delay to 900 ms to fix DP AUX CH timeouts
drm/i915: Ignore duplicate VMA stored within the per-object handle LUT
drm/i915: Skip fence alignemnt check for the CCS plane
drm/i915: Treat fb->offsets[] as a raw byte offset instead of a linear offset
drm/i915: Always wake the device to flush the GTT
drm/i915: Recreate vmapping even when the object is pinned
drm/i915: Quietly cancel FBC activation if CRTC is turned off before worker
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds
Pull LED updates from Jacek Anaszewski:
"LED class drivers improvements:
leds-pca955x:
- add Device Tree support and bindings
- use devm_led_classdev_register()
- add GPIO support
- prevent crippled LED class device name
- check for I2C errors
leds-gpio:
- add optional retain-state-shutdown DT property
- allow LED to retain state at shutdown
leds-tlc591xx:
- merge conditional tests
- add missing of_node_put
leds-powernv:
- delete an error message for a failed memory allocation in
powernv_led_create()
leds-is31fl32xx.c
- convert to using custom %pOF printf format specifier
Constify attribute_group structures in:
- leds-blinkm
- leds-lm3533
Make several arrays static const in:
- leds-aat1290
- leds-lp5521
- leds-lp5562
- leds-lp8501"
* tag 'leds_for_4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds:
leds: pca955x: check for I2C errors
leds: gpio: Allow LED to retain state at shutdown
dt-bindings: leds: gpio: Add optional retain-state-shutdown property
leds: powernv: Delete an error message for a failed memory allocation in powernv_led_create()
leds: lp8501: make several arrays static const
leds: lp5562: make several arrays static const
leds: lp5521: make several arrays static const
leds: aat1290: make array max_mm_current_percent static const
leds: pca955x: Prevent crippled LED device name
leds: lm3533: constify attribute_group structure
dt-bindings: leds: add pca955x
leds: pca955x: add GPIO support
leds: pca955x: use devm_led_classdev_register
leds: pca955x: add device tree support
leds: Convert to using %pOF instead of full_name
leds: blinkm: constify attribute_group structures.
leds: tlc591xx: add missing of_node_put
leds: tlc591xx: merge conditional tests
|
|
Pull dmaengine updates from Vinod Koul:
"This one features the usual updates to the drivers and one good part
of removing DA_SG from core as it has no users.
Summary:
- Remove DMA_SG support as we have no users for this feature
- New driver for Altera / Intel mSGDMA IP core
- Support for memset in dmatest and qcom_hidma driver
- Update for non cyclic mode in k3dma, bunch of update in bam_dma,
bcm sba-raid
- Constify device ids across drivers"
* tag 'dmaengine-4.14-rc1' of git://git.infradead.org/users/vkoul/slave-dma: (52 commits)
dmaengine: sun6i: support V3s SoC variant
dmaengine: sun6i: make gate bit in sun8i's DMA engines a common quirk
dmaengine: rcar-dmac: document R8A77970 bindings
dmaengine: xilinx_dma: Fix error code format specifier
dmaengine: altera: Use macros instead of structs to describe the registers
dmaengine: ti-dma-crossbar: Fix dra7 reserve function
dmaengine: pl330: constify amba_id
dmaengine: pl08x: constify amba_id
dmaengine: bcm-sba-raid: Remove redundant SBA_REQUEST_STATE_COMPLETED
dmaengine: bcm-sba-raid: Explicitly ACK mailbox message after sending
dmaengine: bcm-sba-raid: Add debugfs support
dmaengine: bcm-sba-raid: Remove redundant SBA_REQUEST_STATE_RECEIVED
dmaengine: bcm-sba-raid: Re-factor sba_process_deferred_requests()
dmaengine: bcm-sba-raid: Pre-ack async tx descriptor
dmaengine: bcm-sba-raid: Peek mbox when we have no free requests
dmaengine: bcm-sba-raid: Alloc resources before registering DMA device
dmaengine: bcm-sba-raid: Improve sba_issue_pending() run duration
dmaengine: bcm-sba-raid: Increase number of free sba_request
dmaengine: bcm-sba-raid: Allow arbitrary number free sba_request
dmaengine: bcm-sba-raid: Remove reqs_free_count from sba_device
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight
Pull backlight updates from Lee Jones:
"Fix-ups:
- Constification; pwm_bl
- Use new GPIO API; gpio_backlight
- Remove unused functionality; gpio_backlight
Bug Fixes:
- Fix artificial MAXREG limit; lm3630a_bl"
* tag 'backlight-next-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/backlight:
backlight: gpio_backlight: Delete pdata inversion
backlight: gpio_backlight: Convert to use GPIO descriptor
backlight: pwm_bl: Make of_device_ids const
backlight: lm3630a: Bump REG_MAX value to 0x50 instead of 0x1F
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd
Pull MFD updates from Lee Jones:
"New Drivers
- RK805 Power Management IC (PMIC)
- ROHM BD9571MWV-M MFD Power Management IC (PMIC)
- Texas Instruments TPS68470 Power Management IC (PMIC) & LEDs
New Device Support:
- Add support for HiSilicon Hi6421v530 to hi6421-pmic-core
- Add support for X-Powers AXP806 to axp20x
- Add support for X-Powers AXP813 to axp20x
- Add support for Intel Sunrise Point LPSS to intel-lpss-pci
New Functionality:
- Amend API to provide register layout; atmel-smc
Fix-ups:
- DT re-work; omap, nokia
- Header file location change {I2C => MFD}; dm355evm_msp, tps65010
- Fix chip ID formatting issue(s); rk808
- Optionally register touchscreen devices; da9052-core
- Documentation improvements; twl-core
- Constification; rtsx_pcr, ab8500-core, da9055-i2c, da9052-spi
- Drop unnecessary static declaration; max8925-i2c
- Kconfig changes (missing deps and remove module support)
- Slim down oversized licence statement; hi6421-pmic-core
- Use managed resources (devm_*); lp87565
- Supply proper error checking/handling; t7l66xb
Bug Fixes:
- Fix counter duplication issue; da9052-core
- Fix potential NULL deference issue; max8998
- Leave SPI-NOR write-protection bit alone; lpc_ich
- Ensure device is put into reset during suspend; intel-lpss
- Correct register offset variable size; omap-usb-tll"
* tag 'mfd-next-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (61 commits)
mfd: intel_soc_pmic: Differentiate between Bay and Cherry Trail CRC variants
mfd: intel_soc_pmic: Export separate mfd-cell configs for BYT and CHT
dt-bindings: mfd: Add bindings for ZII RAVE devices
mfd: omap-usb-tll: Fix register offsets
mfd: da9052: Constify spi_device_id
mfd: intel-lpss: Put I2C and SPI controllers into reset state on suspend
mfd: da9055: Constify i2c_device_id
mfd: intel-lpss: Add missing PCI ID for Intel Sunrise Point LPSS devices
mfd: t7l66xb: Handle return value of clk_prepare_enable
mfd: Add ROHM BD9571MWV-M PMIC DT bindings
mfd: intel_soc_pmic_chtwc: Turn Kconfig option into a bool
mfd: lp87565: Convert to use devm_mfd_add_devices()
mfd: Add support for TPS68470 device
mfd: lpc_ich: Do not touch SPI-NOR write protection bit on Haswell/Broadwell
mfd: syscon: atmel-smc: Add helper to retrieve register layout
mfd: axp20x: Use correct platform device ID for many PEK
dt-bindings: mfd: axp20x: Introduce bindings for AXP813
mfd: axp20x: Add support for AXP813 PMIC
dt-bindings: mfd: axp20x: Add AXP806 to supported list of chips
mfd: Add ROHM BD9571MWV-M MFD PMIC driver
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov:
- a new GPIO bit-banging driver implementing PS/2 protocol
- a new power key driver for Rockchip RK805 PMIC
- bunch of patches constifying various device ID structures
- Elan I2C touchpad driver now supports devices with 2 buttons
- other assorted fixes
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (76 commits)
Input: byd - make array seq static, reduces object code size
Input: xilinx_ps2 - fix multiline comment style
Input: pxa27x_keypad - handle return value of clk_prepare_enable
Input: tegra-kbc - handle return value of clk_prepare_enable
Input: PS/2 gpio bit banging driver for serio bus
Input: xen-kbdfront - enable auto repeat for xen keyboard frontend driver
Input: ambakmi - constify amba_id
Input: atmel_mxt_ts - add support for reset line
Input: atmel_mxt_ts - use more managed resources
Input: wacom_w8001 - constify serio_device_id
Input: tsc40 - constify serio_device_id
Input: touchwin - constify serio_device_id
Input: touchright - constify serio_device_id
Input: touchit213 - constify serio_device_id
Input: penmount - constify serio_device_id
Input: mtouch - constify serio_device_id
Input: inexio - constify serio_device_id
Input: hampshire - constify serio_device_id
Input: gunze - constify serio_device_id
Input: fujitsu_ts - constify serio_device_id
...
|
|
git://git.linaro.org/landing-teams/working/fujitsu/integration
Pull mailbox updates from Jassi Brar:
"Just behavorial changes to a controller driver: the Broadcom's Flexrm
mailbox driver has been modifified to support debugfs and TX-Done
mechanism by ACK.
Nothing for the core mailbox stack"
* tag 'mailbox-v4.14' of git://git.linaro.org/landing-teams/working/fujitsu/integration:
mailbox: bcm-flexrm-mailbox: Use txdone_ack instead of txdone_poll
mailbox: bcm-flexrm-mailbox: Use bitmap instead of IDA
mailbox: bcm-flexrm-mailbox: Fix mask used in CMPL_START_ADDR_VALUE()
mailbox: bcm-flexrm-mailbox: Add debugfs support
mailbox: bcm-flexrm-mailbox: Set IRQ affinity hint for FlexRM ring IRQs
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:
"Brazil's Independence Day pull request :-)
This is one of the biggest media pull requests, with 625 patches
affecting almost all parts of media (RC, DVB, V4L2, CEC, docs).
This contains:
- A lot of new drivers:
* DVB frontends: mxl5xx, stv0910, stv6111;
* camera flash: as3645a led driver;
* HDMI receiver: adv748X;
* camera sensor: Omnivision 6650 5M driver (ov6650);
* HDMI CEC: ao-cec meson driver;
* V4L2: Qualcom camss driver;
* Remote controller: gpio-ir-tx, pwm-ir-tx and zx-irdec drivers.
- The DDbridge DVB driver got a massive update, with makes it in sync
with modern hardware from that vendor;
- There's an important milestone on this series: the DVB
documentation was written in 2003, but only started to be updated
in 2007. It also used to contain several gaps from the time it was
kept out of tree, mentioning error codes and device nodes that
never existed upstream. On this series, it received a massive
update: all non-deprecated digital TV APIs are now in sync with the
current implementation;
- Some DVB APIs that aren't used by any upstream driver got removed;
- Other parts of the media documentation algo got updated, fixing
some bugs on its PDF output and making it compatible with Sphinx
version 1.6.
As the number of hacks required to build PDF output reduced, I hope
we'll have less troubles as newer versions of our documentation
toolchain are released (famous last words);
- As usual, lots of driver cleanups and improvements"
* tag 'media/v4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (624 commits)
media: leds: as3645a: add V4L2_FLASH_LED_CLASS dependency
media: get rid of removed DMX_GET_CAPS and DMX_SET_SOURCE leftovers
media: Revert "[media] v4l: async: make v4l2 coexist with devicetree nodes in a dt overlay"
media: staging: atomisp: sh_css_calloc shall return a pointer to the allocated space
media: Revert "[media] lirc_dev: remove superfluous get/put_device() calls"
media: add qcom_camss.rst to v4l-drivers rst file
media: dvb headers: make checkpatch happier
media: dvb uapi: move frontend legacy API to another part of the book
media: pixfmt-srggb12p.rst: better format the table for PDF output
media: docs-rst: media: Don't use \small for V4L2_PIX_FMT_SRGGB10 documentation
media: index.rst: don't write "Contents:" on PDF output
media: pixfmt*.rst: replace a two dots by a comma
media: vidioc-g-fmt.rst: adjust table format
media: vivid.rst: add a blank line to correct ReST format
media: v4l2 uapi book: get rid of driver programming's chapter
media: format.rst: use the right markup for important notes
media: docs-rst: cardlists: change their format to flat-tables
media: em28xx-cardlist.rst: update to reflect last changes
media: v4l2-event.rst: adjust table to fit on PDF output
media: docs: don't show ToC for each part on PDF output
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound updates from Takashi Iwai:
"We have touched quite a lot of files but with fewer changes at this
cycle; as you can see, most of changes are trivial fixes, especially
constification patches.
Among the massive attacks by constification gangs, we had a few core
changes (mostly for ASoC core), as well the fixes and the updates by
major vendors.
Some highlights:
ALSA core:
- Fix possible races in control API user-TLV codes
- Small cleanup of PCM core
ASoC:
- Continued work for componentization; still half-baked, but we're
certainly progressing
- Use of devres for jack detection GPIOs, rather as a cleanup
- Jack detection support for Qualcomm MSM8916
- Support for Allwinner H3, Cirrus Logic CS43130, Intel Kabylake
systems with RT5663, Realtek RT274, TI TLV320AIC32x6 and Wolfson
WM8523"
* tag 'sound-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (512 commits)
ALSA: hda/ca0132 - Fix memory leak at error path
ALSA: hda: Fix forget to free resource in error handling code path in hda_codec_driver_probe
ASoC: cs43130: Fix unused compiler warnings for PM runtime
ASoC: cs43130: Fix possible Oops with invalid dev_id
ASoC: cs43130: fix spelling mistake: "irq_occurrance" -> "irq_occurrence"
ALSA: atmel: Remove leftovers of AVR32 removal
ALSA: atmel: convert AC97c driver to GPIO descriptor API
ALSA: hda/realtek - Enable jack detection function for Intel ALC700
ALSA: hda: Fix regression of hdmi eld control created based on invalid pcm
ASoC: Intel: Skylake: Add IPC to configure the copier secondary pins
ASoC: add missing compile rule for max98371
ASoC: add missing compile rule for sirf-audio-codec
ASoC: add missing compile rule for max98371
ASoC: cs43130: Add devicetree bindings for CS43130
ASoC: cs43130: Add support for CS43130 codec
ASoC: make clock direction configurable in asoc-simple
ALSA: ctxfi: Remove null check before kfree
ASoC: max98927: Changed device property read function
ASoC: max98927: Modified DAPM widget and map to enable/disable VI sense path
ASoC: max98927: Added PM suspend and resume function
...
|