summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2012-01-10mm: writeback: cleanups in preparation for per-zone dirty limitsJohannes Weiner
The next patch will introduce per-zone dirty limiting functions in addition to the traditional global dirty limiting. Rename determine_dirtyable_memory() to global_dirtyable_memory() before adding the zone-specific version, and fix up its documentation. Also, move the functions to determine the dirtyable memory and the function to calculate the dirty limit based on that together so that their relationship is more apparent and that they can be commented on as a group. Signed-off-by: Johannes Weiner <jweiner@redhat.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Acked-by: Mel Gorman <mel@suse.de> Reviewed-by: Michal Hocko <mhocko@suse.cz> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Jan Kara <jack@suse.cz> Cc: Shaohua Li <shaohua.li@intel.com> Cc: Rik van Riel <riel@redhat.com> Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-10mm: exclude reserved pages from dirtyable memoryJohannes Weiner
Per-zone dirty limits try to distribute page cache pages allocated for writing across zones in proportion to the individual zone sizes, to reduce the likelihood of reclaim having to write back individual pages from the LRU lists in order to make progress. This patch: The amount of dirtyable pages should not include the full number of free pages: there is a number of reserved pages that the page allocator and kswapd always try to keep free. The closer (reclaimable pages - dirty pages) is to the number of reserved pages, the more likely it becomes for reclaim to run into dirty pages: +----------+ --- | anon | | +----------+ | | | | | | -- dirty limit new -- flusher new | file | | | | | | | | | -- dirty limit old -- flusher old | | | +----------+ --- reclaim | reserved | +----------+ | kernel | +----------+ This patch introduces a per-zone dirty reserve that takes both the lowmem reserve as well as the high watermark of the zone into account, and a global sum of those per-zone values that is subtracted from the global amount of dirtyable pages. The lowmem reserve is unavailable to page cache allocations and kswapd tries to keep the high watermark free. We don't want to end up in a situation where reclaim has to clean pages in order to balance zones. Not treating reserved pages as dirtyable on a global level is only a conceptual fix. In reality, dirty pages are not distributed equally across zones and reclaim runs into dirty pages on a regular basis. But it is important to get this right before tackling the problem on a per-zone level, where the distance between reclaim and the dirty pages is mostly much smaller in absolute numbers. [akpm@linux-foundation.org: fix highmem build] Signed-off-by: Johannes Weiner <jweiner@redhat.com> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Acked-by: Mel Gorman <mgorman@suse.de> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Jan Kara <jack@suse.cz> Cc: Shaohua Li <shaohua.li@intel.com> Cc: Chris Mason <chris.mason@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-10vmscan: add task name to warn_scan_unevictable() messagesKOSAKI Motohiro
If we need to know a usecase, caller program name is critical important. Show it. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> David Rientjes <rientjes@google.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-10mm, debug: test for online nid when allocating on single nodeDavid Rientjes
Calling alloc_pages_exact_node() means the allocation only passes the zonelist of a single node into the page allocator. If that node isn't online, it's zonelist may never have been initialized causing a strange oops that may not immediately be clear. I recently debugged an issue where node 0 wasn't online and an allocator was passing 0 to alloc_pages_exact_node() and it resulted in a NULL pointer on zonelist->_zoneref. If CONFIG_DEBUG_VM is enabled, though, it would be nice to catch this a bit earlier. Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-10fadvise: only initiate writeback for specified range with FADV_DONTNEEDShawn Bohrer
Previously POSIX_FADV_DONTNEED would start writeback for the entire file when the bdi was not write congested. This negatively impacts performance if the file contains dirty pages outside of the requested range. This change uses __filemap_fdatawrite_range() to only initiate writeback for the requested range. Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com> Acked-by: Johannes Weiner <jweiner@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-10slub: min order when debug_guardpage_minorder > 0Stanislaw Gruszka
Disable slub debug facilities and allocate slabs at minimal order when debug_guardpage_minorder > 0 to increase probability to catch random memory corruption by cpu exception. Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-10PM/Hibernate: do not count debug pages as savableStanislaw Gruszka
When debugging with CONFIG_DEBUG_PAGEALLOC and debug_guardpage_minorder > 0, we have lot of free pages that are not marked so. Snapshot code account them as savable, what cause hibernate memory preallocation failure. It is pretty hard to make hibernate allocation succeed with debug_guardpage_minorder=1. This change at least make it possible when system has relatively big amount of RAM. Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Acked-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-10mm: more intensive memory corruption debuggingStanislaw Gruszka
With CONFIG_DEBUG_PAGEALLOC configured, the CPU will generate an exception on access (read,write) to an unallocated page, which permits us to catch code which corrupts memory. However the kernel is trying to maximise memory usage, hence there are usually few free pages in the system and buggy code usually corrupts some crucial data. This patch changes the buddy allocator to keep more free/protected pages and to interlace free/protected and allocated pages to increase the probability of catching corruption. When the kernel is compiled with CONFIG_DEBUG_PAGEALLOC, debug_guardpage_minorder defines the minimum order used by the page allocator to grant a request. The requested size will be returned with the remaining pages used as guard pages. The default value of debug_guardpage_minorder is zero: no change from current behaviour. [akpm@linux-foundation.org: tweak documentation, s/flg/flag/] Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-10kernel.h: add BUILD_BUG() macroDavid Daney
We can place this in definitions that we expect the compiler to remove by dead code elimination. If this assertion fails, we get a nice error message at build time. The GCC function attribute error("message") was added in version 4.3, so we define a new macro __linktime_error(message) to expand to this for GCC-4.3 and later. This will give us an error diagnostic from the compiler on the line that fails. For other compilers __linktime_error(message) expands to nothing, and we have to be content with a link time error, but at least we will still get a build error. BUILD_BUG() expands to the undefined function __build_bug_failed() and will fail at link time if the compiler ever emits code for it. On GCC-4.3 and later, attribute((error())) is used so that the failure will be noted at compile time instead. Signed-off-by: David Daney <david.daney@cavium.com> Acked-by: David Rientjes <rientjes@google.com> Cc: DM <dm.n9107@gmail.com> Cc: Ralf Baechle <ralf@linux-mips.org> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-10mm/hugetlb.c: fix virtual address handling in hugetlb faultKAMEZAWA Hiroyuki
handle_mm_fault() passes 'faulted' address to hugetlb_fault(). This address is not aligned to a hugepage boundary. Most of the functions for hugetlb pages are aware of that and calculate an alignment themselves. However some functions such as copy_user_huge_page() and clear_huge_page() don't handle alignment by themselves. This patch make hugeltb_fault() fix the alignment and pass an aligned addresss (to address of a faulted hugepage) to functions. [akpm@linux-foundation.org: use &=] Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-10hugetlb: clarify hugetlb_instantiation_mutex usageMichal Hocko
Let's make it clear that we cannot race with other fault handlers due to hugetlb (global) mutex. Also make it clear that we want to keep pte_same checks anayway to have a transition from the global mutex easier. Signed-off-by: Michal Hocko <mhocko@suse.cz> Cc: Hillf Danton <dhillf@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <jweiner@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-10hugetlb: detect race upon page allocation failure during COWHillf Danton
Currently we are not rechecking pte_same in hugetlb_cow after we take ptl lock again in the page allocation failure code path and simply retry again. This is not an issue at the moment because hugetlb fault path is protected by hugetlb_instantiation_mutex so we cannot race. The original page is locked and so we cannot race even with the page migration. Let's add the pte_same check anyway as we want to be consistent with the other check later in this function and be safe if we ever remove the mutex. [mhocko@suse.cz: reworded the changelog] Signed-off-by: Hillf Danton <dhillf@gmail.com> Signed-off-by: Michal Hocko <mhocko@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <jweiner@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-10mm: account reaped page cache on inode cache pruningKonstantin Khlebnikov
Inode cache pruning indirectly reclaims page-cache by invalidating mapping pages. Let's account them into reclaim-state to notice this progress in memory reclaimer. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-10mm: avoid livelock on !__GFP_FS allocationsMel Gorman
Colin Cross reported; Under the following conditions, __alloc_pages_slowpath can loop forever: gfp_mask & __GFP_WAIT is true gfp_mask & __GFP_FS is false reclaim and compaction make no progress order <= PAGE_ALLOC_COSTLY_ORDER These conditions happen very often during suspend and resume, when pm_restrict_gfp_mask() effectively converts all GFP_KERNEL allocations into __GFP_WAIT. The oom killer is not run because gfp_mask & __GFP_FS is false, but should_alloc_retry will always return true when order is less than PAGE_ALLOC_COSTLY_ORDER. In his fix, he avoided retrying the allocation if reclaim made no progress and __GFP_FS was not set. The problem is that this would result in GFP_NOIO allocations failing that previously succeeded which would be very unfortunate. The big difference between GFP_NOIO and suspend converting GFP_KERNEL to behave like GFP_NOIO is that normally flushers will be cleaning pages and kswapd reclaims pages allowing GFP_NOIO to succeed after a short delay. The same does not necessarily apply during suspend as the storage device may be suspended. This patch special cases the suspend case to fail the page allocation if reclaim cannot make progress and adds some documentation on how gfp_allowed_mask is currently used. Failing allocations like this may cause suspend to abort but that is better than a livelock. [mgorman@suse.de: Rework fix to be suspend specific] [rientjes@google.com: Move suspended device check to should_alloc_retry] Reported-by: Colin Cross <ccross@android.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: David Rientjes <rientjes@google.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-10mm: reduce the amount of work done when updating min_free_kbytesMel Gorman
When min_free_kbytes is updated, some pageblocks are marked MIGRATE_RESERVE. Ordinarily, this work is unnoticable as it happens early in boot but on large machines with 1TB of memory, this has been reported to delay boot times, probably due to the NUMA distances involved. The bulk of the work is due to calling calling pageblock_is_reserved() an unnecessary amount of times and accessing far more struct page metadata than is necessary. This patch significantly reduces the amount of work done by setup_zone_migrate_reserve() improving boot times on 1TB machines. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-10mm: migrate: one less atomic operationJacobo Giralt
migrate_page_move_mapping() drops a reference from the old page after unfreezing its counter. Both operations can be merged into a single atomic operation by directly unfreezing to one less reference. The same applies to migrate_huge_page_move_mapping(). Signed-off-by: Jacobo Giralt <jacobo.giralt@gmail.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Johannes Weiner <jweiner@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-10mm-tracepoint: fix documentation and examplesKonstantin Khlebnikov
We renamed the page-free mm tracepoints. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-10mm-tracepoint: rename page-free eventsKonstantin Khlebnikov
Rename mm_page_free_direct into mm_page_free and mm_pagevec_free into mm_page_free_batched Since v2.6.33-5426-gc475dab the kernel triggers mm_page_free_direct for all freed pages, not only for directly freed. So, let's name it properly. For pages freed via page-list we also trigger mm_page_free_batched event. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-10mm: remove unused pagevec_freeKonstantin Khlebnikov
It not exported and now nobody uses it. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-10mm: add free_hot_cold_page_list() helperKonstantin Khlebnikov
This patch adds helper free_hot_cold_page_list() to free list of 0-order pages. It frees pages directly from list without temporary page-vector. It also calls trace_mm_pagevec_free() to simulate pagevec_free() behaviour. bloat-o-meter: add/remove: 1/1 grow/shrink: 1/3 up/down: 267/-295 (-28) function old new delta free_hot_cold_page_list - 264 +264 get_page_from_freelist 2129 2132 +3 __pagevec_free 243 239 -4 split_free_page 380 373 -7 release_pages 606 510 -96 free_page_list 188 - -188 Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Minchan Kim <minchan.kim@gmail.com> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-10vmscan: activate executable pages after first usageKonstantin Khlebnikov
Logic added in commit 8cab4754d24a0 ("vmscan: make mapped executable pages the first class citizen") was noticeably weakened in commit 645747462435d84 ("vmscan: detect mapped file pages used only once"). Currently these pages can become "first class citizens" only after second usage. After this patch page_check_references() will activate they after first usage, and executable code gets yet better chance to stay in memory. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Shaohua Li <shaohua.li@intel.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-10vmscan: promote shared file mapped pagesKonstantin Khlebnikov
Commit 645747462435 ("vmscan: detect mapped file pages used only once") greatly decreases lifetime of single-used mapped file pages. Unfortunately it also decreases life time of all shared mapped file pages. Because after commit bf3f3bc5e7347 ("mm: don't mark_page_accessed in fault path") page-fault handler does not mark page active or even referenced. Thus page_check_references() activates file page only if it was used twice while it stays in inactive list, meanwhile it activates anon pages after first access. Inactive list can be small enough, this way reclaimer can accidentally throw away any widely used page if it wasn't used twice in short period. After this patch page_check_references() also activate file mapped page at first inactive list scan if this page is already used multiple times via several ptes. I found this while trying to fix degragation in rhel6 (~2.6.32) from rhel5 (~2.6.18). There a complete mess with >100 web/mail/spam/ftp containers, they share all their files but there a lot of anonymous pages: ~500mb shared file mapped memory and 15-20Gb non-shared anonymous memory. In this situation major-pagefaults are very costly, because all containers share the same page. In my load kernel created a disproportionate pressure on the file memory, compared with the anonymous, they equaled only if I raise swappiness up to 150 =) These patches actually wasn't helped a lot in my problem, but I saw noticable (10-20 times) reduce in count and average time of major-pagefault in file-mapped areas. Actually both patches are fixes for commit v2.6.33-5448-g6457474, because it was aimed at one scenario (singly used pages), but it breaks the logic in other scenarios (shared and/or executable pages) Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Acked-by: Pekka Enberg <penberg@kernel.org> Acked-by: Minchan Kim <minchan.kim@gmail.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Shaohua Li <shaohua.li@intel.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-10mm/page-writeback.c: make determine_dirtyable_memory static againJohannes Weiner
The tracing ring-buffer used this function briefly, but not anymore. Make it local to the writeback code again. Also, move the function so that no forward declaration needs to be reintroduced. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Michal Hocko <mhocko@suse.cz> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-09Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: vfs: new helper - d_make_root() dcache: use a dispose list in select_parent ceph: d_alloc_root() may fail ext4: fix failure exits isofs: inode leak on mount failure
2012-01-09vfs: new helper - d_make_root()Al Viro
d_alloc_root() with iput() in case of allocation failure... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-09dcache: use a dispose list in select_parentDave Chinner
select_parent currently abuses the dentry cache LRU to provide cleanup features for child dentries that need to be freed. It moves them to the tail of the LRU, then tells shrink_dcache_parent() to calls __shrink_dcache_sb to unconditionally move them to a dispose list (as DCACHE_REFERENCED is ignored). __shrink_dcache_sb() has to relock the dentries to move them off the LRU onto the dispose list, but otherwise does not touch the dentries that select_parent() moved to the tail of the LRU. It then passses the dispose list to shrink_dentry_list() which tries to free the dentries. IOWs, the use of __shrink_dcache_sb() is superfluous - we can build exactly the same list of dentries for disposal directly in select_parent() and call shrink_dentry_list() instead of calling __shrink_dcache_sb() to do that. This means that we avoid long holds on the lru lock walking the LRU moving dentries to the dispose list We also avoid the need to relock each dentry just to move it off the LRU, reducing the numebr of times we lock each dentry to dispose of them in shrink_dcache_parent() from 3 to 2 times. Further, we remove one of the two callers of __shrink_dcache_sb(). This also means that __shrink_dcache_sb can be moved into back into prune_dcache_sb() and we no longer have to handle referenced dentries conditionally, simplifying the code. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-nextLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next: sparc32: remove unused file: include/asm/pgtsun4.h sparc32: fix PAGE_SIZE definition sparc32: enable different preemptions models sparc32: support atomic64_t apbuart: fix section mismatch warning sparc32: drop useless preprocessor conditional in atomic_32.h sparc32: drop unused atomic24 support
2012-01-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: igmp: Avoid zero delay when receiving odd mixture of IGMP queries netdev: make net_device_ops const bcm63xx: make ethtool_ops const usbnet: make ethtool_ops const net: Fix build with INET disabled. net: introduce netif_addr_lock_nested() and call if when appropriate net: correct lock name in dev_[uc/mc]_sync documentations. net: sk_update_clone is only used in net/core/sock.c 8139cp: fix missing napi_gro_flush. pktgen: set correct max and min in pktgen_setup_inject() smsc911x: Unconditionally include linux/smscphy.h in smsc911x.h asix: fix infinite loop in rx_fixup() net: Default UDP and UNIX diag to 'n'. r6040: fix typo in use of MCR0 register bits net: fix sock_clone reference mismatch with tcp memcontrol
2012-01-09Merge tag 'clk' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-socLinus Torvalds
clock management changes for i.MX Another simple series related to clock management, this time only for imx. * tag 'clk' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: ARM: mxs: select HAVE_CLK_PREPARE for clock clk: add config option HAVE_CLK_PREPARE into Kconfig ASoC: mxs-saif: convert to clk_prepare/clk_unprepare video: mxsfb: convert to clk_prepare/clk_unprepare serial: mxs-auart: convert to clk_prepare/clk_unprepare net: flexcan: convert to clk_prepare/clk_unprepare mtd: gpmi-lib: convert to clk_prepare/clk_unprepare mmc: mxs-mmc: convert to clk_prepare/clk_unprepare dma: mxs-dma: convert to clk_prepare/clk_unprepare net: fec: add clk_prepare/clk_unprepare ARM: mxs: convert platform code to clk_prepare/clk_unprepare clk: add helper functions clk_prepare_enable and clk_disable_unprepare Fix up trivial conflicts in drivers/net/ethernet/freescale/fec.c due to commit 0ebafefcaa7a ("net: fec: add clk_prepare/clk_unprepare") clashing trivially with commit e163cc97f9ac ("net/fec: fix the .remove code").
2012-01-09Merge tag 'timer' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-socLinus Torvalds
timer changes for msm A very simple series. We used to have more churn in the timer area, so this is kept separate. Will probably put this into the drivers series next time. * tag 'timer' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: msm: timer: Use clockevents_config_and_register() msm: timer: Setup interrupt after registering clockevent msm: timer: Remove SoC specific #ifdefs msm: timer: Remove msm_clocks[] and simplify code msm: timer: Fix ONESHOT mode interrupts msm: timer: Use GPT for clockevents and DGT for clocksource msm: timer: Cleanup #includes and #defines msm: timer: Tighten #ifdef for local timer support
2012-01-09Merge tag 'pm' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-socLinus Torvalds
power management changes for omap and imx A significant part of the changes for these two platforms went into power management, so they are split out into a separate branch. * tag 'pm' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (65 commits) ARM: imx6: remove __CPUINIT annotation from v7_invalidate_l1 ARM: imx6: fix v7_invalidate_l1 by adding I-Cache invalidation ARM: imx6q: resume PL310 only when CACHE_L2X0 defined ARM: imx6q: build pm code only when CONFIG_PM selected ARM: mx5: use generic irq chip pm interface for pm functions on ARM: omap: pass minimal SoC/board data for UART from dt arm/dts: Add minimal device tree support for omap2420 and omap2430 omap-serial: Add minimal device tree support omap-serial: Use default clock speed (48Mhz) if not specified omap-serial: Get rid of all pdev->id usage ARM: OMAP2+: hwmod: Add a new flag to handle hwmods left enabled at init ARM: OMAP4: PRM: use PRCM interrupt handler ARM: OMAP3: pm: use prcm chain handler ARM: OMAP: hwmod: add support for selecting mpu_irq for each wakeup pad ARM: OMAP2+: mux: add support for PAD wakeup interrupts ARM: OMAP: PRCM: add suspend prepare / finish support ARM: OMAP: PRCM: add support for chain interrupt handler ARM: OMAP3/4: PRM: add functions to read pending IRQs, PRM barrier ARM: OMAP2+: hwmod: Add API to enable IO ring wakeup ARM: OMAP2+: mux: add wakeup-capable hwmod mux entries to dynamic list ...
2012-01-09Merge tag 'drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-socLinus Torvalds
Driver specific changes Again, a lot of platforms have changes in here: pxa, samsung, omap, at91, imx, ... * tag 'drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (54 commits) ARM: sa1100: clean up of the clock support ARM: pxa: add dummy clock for sa1100-rtc RTC: sa1100: support sa1100, pxa and mmp soc families RTC: sa1100: remove redundant code of setting alarm RTC: sa1100: Clean out ost register Input: zylonite-wm97xx - replace IRQ_GPIO() with gpio_to_irq() pcmcia: pxa: replace IRQ_GPIO() with gpio_to_irq() ARM: EXYNOS: Modified files for SPI consolidation work ARM: S5P64X0: Enable SDHCI support ARM: S5P64X0: Add lookup of sdhci-s3c clocks using generic names ARM: S5P64X0: Add HSMMC setup for host Controller ARM: EXYNOS: Add USB OHCI support to ORIGEN board USB: Add Samsung Exynos OHCI diver ARM: EXYNOS: Add USB OHCI support to SMDKV310 board ARM: EXYNOS: Add USB OHCI device net: macb: fix build break with !CONFIG_OF i2c: tegra: Support DVC controller in device tree i2c: tegra: Add __devinit/exit to probe/remove net/at91_ether: use gpio_is_valid for phy IRQ line ARM: at91/net: add macb ethernet controller in 9g45/9g20 DT ...
2012-01-09Merge tag 'devel' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-socLinus Torvalds
New feature development This adds support for new features, and contains stuff from most platforms. A number of these patches could have fit into other branches, too, but were small enough not to cause too much confusion here. * tag 'devel' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (28 commits) mfd/db8500-prcmu: remove support for early silicon revisions ARM: ux500: fix the smp_twd clock calculation ARM: ux500: remove support for early silicon revisions ARM: ux500: update register files ARM: ux500: register DB5500 PMU dynamically ARM: ux500: update ASIC detection for U5500 ARM: ux500: support DB8520 ARM: picoxcell: implement watchdog restart ARM: OMAP3+: hwmod data: Add the default clockactivity for I2C ARM: OMAP3: hwmod data: disable multiblock reads on MMC1/2 on OMAP34xx/35xx <= ES2.1 ARM: OMAP: USB: EHCI and OHCI hwmod structures for OMAP4 ARM: OMAP: USB: EHCI and OHCI hwmod structures for OMAP3 ARM: OMAP: hwmod data: Add support for AM35xx UART4/ttyO3 ARM: Orion: Remove address map info from all platform data structures ARM: Orion: Get address map from plat-orion instead of via platform_data ARM: Orion: mbus_dram_info consolidation ARM: Orion: Consolidate the address map setup ARM: Kirkwood: Add configuration for MPP12 as GPIO ARM: Kirkwood: Recognize A1 revision of 6282 chip ARM: ux500: update the MOP500 GPIO assignments ...
2012-01-09Merge tag 'boards' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-socLinus Torvalds
Board-level changes This adds and extends support for specific boards on a number of ARM platforms: omap, imx, samsung, tegra, ... * tag 'boards' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (49 commits) Enable 32 bit flash support for iMX21ADS board ARM: mx31pdk: Add MC13783 RTC support iomux-mx25: configuration to support CSPI3 on CSI pins MX1:apf9328: Add i2c support mioa701: add newly available DoC G3 chip arm/tegra: remove __initdata annotation from pinmux tables arm/tegra: Use bus notifiers to trigger pinmux setup arm/tegra: Refactor board-*-pinmux.c to share code arm/tegra: Fix mistake in Trimslice's pinmux arm/tegra: Rework Seaboard-vs-Ventana pinmux table arm/tegra: Remove useless entries from ventana_pinmux[] arm/tegra: PCIe: Remove include of mach/pinmux.h arm/tegra: Harmony PCIe: Don't touch pinmux arm/tegra: Add AUXDATA for tegra-pinmux and tegra-gpio arm/tegra: Split Seaboard GPIO table to allow for Ventana ARM: imx6q: generate imx6q dtb files arm/imx6q: Rename Sabreauto to Armadillo2 arm/imx6q-sabrelite: add enet phy ksz9021rn fixup arm/imx6: add imx6q sabrelite board support dts/imx: rename uart labels to consistent with hw spec ...
2012-01-09Merge tag 'soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-socLinus Torvalds
SoC-level changes for tegra and omap This adds support for the new tegra30 SoC, as well as small changes to support minor variations of existing omap SoCs. * tag 'soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (26 commits) arm/tegra: Compile tegra_dt_init_irq only when CONFIG_OF arm/tegra: Make MACH_TEGRA_DT depend on ARCH_TEGRA_2x_SOC arm/tegra: Delete tegra_init_clock() arm/tegra: Fix section mismatch errors in tegra30 pinmux arm/tegra: Fix section mismatch errors in tegra20 pinmux arm/tegra: refresh defconfig for tegra30 arm/tegra: add support for tegra30 based board cardhu arm/tegra: implement support for tegra30 arm/tegra: pinmux tables and definitions for tegra30 arm/tegra: add new fields to struct tegra_pingroup_desc arm/tegra: prepare pinmux code for multiple tegra variants arm/tegra: rename tegra20 pinmux files arm/tegra: generalize L2 cache initialization arm/tegra: use PMC reset arm/tegra: rename board-dt.c to board-dt-tegra20.c arm/tegra: prepare early init for multiple tegra variants arm/tegra: don't export clk_measure_input_freq arm/tegra: prepare clock code for multiple tegra variants arm/tegra: cleanup tegra20 support arm/tegra: clk_get should not be fatal ... Fix up trivial conflict in arch/arm/mach-tegra/board-dt-tegra20.c
2012-01-09Merge tag 'cleanup2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Cleanups for the Samsung platforms Various cleanup changes that the device driver changes are built upon. Since the samsung cleanups depend on the device tree series, which depends on the first set of cleanups for tegra. * tag 'cleanup2' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: ARM: EXYNOS: Use gpio_request_one ARM: S5PV210: Use gpio_request_one ARM: S3C64XX: Modified according to SPI consolidation work ARM: S5PV210: Modified files for SPI consolidation work ARM: S5P64X0: Modified files for SPI consolidation work ARM: S5PC100: Modified files for SPI consolidation work ARM: S3C64XX: Modified files for SPI consolidation work ARM: SAMSUNG: Consolidation of SPI platform devices to plat-samsung ARM: SAMSUNG: Remove SPI bus clocks from platform data ARM: S5PV210: Add SPI clkdev support ARM: S5P64X0: Add SPI clkdev support ARM: S5PC100: Add SPI clkdev support ARM: S3C64XX: Add SPI clkdev support spi/s3c64xx: Use bus clocks created using clkdev mmc: sdhci-s3c: Use generic clock names for sdhci bus clock options ARM: SAMSUNG: Add lookup of sdhci-s3c clocks using generic names ARM: SAMSUNG: Remove SDHCI bus clocks from platform data ARM: SAMSUNG: Use kmemdup rather than duplicating its implementation ARM: EXYNOS: remove exynos4_scu_enable()
2012-01-09Merge tag 'dt' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-socLinus Torvalds
Device tree conversions for samsung and tegra Both platforms had some initial device tree support, but this adds much more to actually make it usable. * tag 'dt' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (45 commits) ARM: dts: Add intial dts file for EXYNOS4210 SoC, SMDKV310 and ORIGEN ARM: EXYNOS: Add Exynos4 device tree enabled board file rtc: rtc-s3c: Add device tree support input: samsung-keypad: Add device tree support ARM: S5PV210: Modify platform data for pl330 driver ARM: S5PC100: Modify platform data for pl330 driver ARM: S5P64x0: Modify platform data for pl330 driver ARM: EXYNOS: Add a alias for pdma clocks ARM: EXYNOS: Limit usage of pl330 device instance to non-dt build ARM: SAMSUNG: Add device tree support for pl330 dma engine wrappers DMA: PL330: Add device tree support ARM: EXYNOS: Modify platform data for pl330 driver DMA: PL330: Infer transfer direction from transfer request instead of platform data DMA: PL330: move filter function into driver serial: samsung: Fix build for non-Exynos4210 devices serial: samsung: add device tree support serial: samsung: merge probe() function from all SoC specific extensions serial: samsung: merge all SoC specific port reset functions ARM: SAMSUNG: register uart clocks to clock lookup list serial: samsung: remove all uses of get_clksrc and set_clksrc ... Fix up fairly trivial conflicts in arch/arm/mach-s3c2440/clock.c and drivers/tty/serial/Kconfig both due to just adding code close to changes.
2012-01-09Merge tag 'cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-socLinus Torvalds
Cleanups on various subarchitectures Cleanup patches for various ARM platforms and some of their associated drivers, the bulk of these is for mach-91. Arnd ended up pulling in the restart branch from Russell in order to fix up some simple but annoying merge conflicts. * tag 'cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (44 commits) arm/at91: fix build of stamp9g20 ARM: u300: delete memory.h MAINTAINERS: add maintainer entry for Picochip picoxcell ARM: picoxcell: move io mappings to common.c ARM: picoxcell: don't reserve irq_descs ARM: picoxcell: remove mach/memory.h ARM: at91: delete the pcontrol_g20_defconfig arm/tegra: Remove code that's ifndef CONFIG_ARM_GIC arm/tegra: remove unused defines arm/tegra: fix variable formatting in makefile ARM: davinci: vpif: move code to driver core header from platform ARM: at91/gpio: fix display of number of irq setuped ARM: at91/gpio: drop PIN_BASE ARM: at91/udc: use gpio_is_valid to check the gpio ARM: at91/ohci: use gpio_is_valid to check the gpio ARM: at91/nand: use gpio_is_valid to check the gpio ARM: at91/mmc: use gpio_is_valid to check the gpio ARM: at91/ide: use gpio_is_valid to check the gpio ARM: at91/pata: use gpio_is_valid to check the gpio ARM: at91/soc: use gpio_is_valid to check the gpio ...
2012-01-09Merge tag 'fixes-non-critical' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Non-critical bug fixes Simple bug fixes that were not considered important enough for inclusion into 3.2. * tag 'fixes-non-critical' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: MAINTAINERS: update pxa and mmp ARM: pxa: Include linux/export.h in balloon3.c ARM: OMAP4: clock: Add CPU local timer clock node ARM: OMAP4: hwmod: Don't wait for the idle status if modulemode is not supported ARM: OMAP: AM3517/3505: fix crash on boot due to incorrect voltagedomain data ARM: OMAP: hwmod data: fix the panic on Nokia RM-680 during boot ARM: OMAP2+: DMA: Workaround for invalid destination position ARM: OMAP2+: DMA: Workaround for invalid source position
2012-01-09tracing/mm: Move include of trace/events/kmem.h out of header into slab.cSteven Rostedt
Including trace/events/*.h TRACE_EVENT() macro headers in other headers can cause strange side effects if another trace/event/*.h header includes that header. Having trace/events/kmem.h inside slab_def.h caused a compile error in sparc64 when changes were done to some header files. Moving the kmem.h trace header out of slab.h and into slab.c fixes the problem. Note, both slub.c and slob.c already include the trace/events/kmem.h file. Only slab.c had it missing. Link: http://lkml.kernel.org/r/20120105190405.1e3191fb5a43b2a0f1655e1f@canb.auug.org.au Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-01-09igmp: Avoid zero delay when receiving odd mixture of IGMP queriesBen Hutchings
Commit 5b7c84066733c5dfb0e4016d939757b38de189e4 ('ipv4: correct IGMP behavior on v3 query during v2-compatibility mode') added yet another case for query parsing, which can result in max_delay = 0. Substitute a value of 1, as in the usual v3 case. Reported-by: Simon McVittie <smcv@debian.org> References: http://bugs.debian.org/654876 Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-09netdev: make net_device_ops conststephen hemminger
More drivers where net_device_ops should be const. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-09bcm63xx: make ethtool_ops conststephen hemminger
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-09usbnet: make ethtool_ops conststephen hemminger
The ethtool_ops table of function pointers should be const. Fix all the usb network drivers. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-09net: Fix build with INET disabled.David S. Miller
> net/core/sock.c: In function 'sk_update_clone': > net/core/sock.c:1278:3: error: implicit declaration of function 'sock_update_memcg' Reported-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-09ceph: d_alloc_root() may failAl Viro
... and ceph_init_dentry(NULL) will oops Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-09Merge branch 'for-3.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: percpu: Remove irqsafe_cpu_xxx variants Fix up conflict in arch/x86/include/asm/percpu.h due to clash with cebef5beed3d ("x86: Fix and improve percpu_cmpxchg{8,16}b_double()") which edited the (now removed) irqsafe_cpu_cmpxchg*_double code.
2012-01-09Merge branch 'for-3.3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup * 'for-3.3' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (21 commits) cgroup: fix to allow mounting a hierarchy by name cgroup: move assignement out of condition in cgroup_attach_proc() cgroup: Remove task_lock() from cgroup_post_fork() cgroup: add sparse annotation to cgroup_iter_start() and cgroup_iter_end() cgroup: mark cgroup_rmdir_waitq and cgroup_attach_proc() as static cgroup: only need to check oldcgrp==newgrp once cgroup: remove redundant get/put of task struct cgroup: remove redundant get/put of old css_set from migrate cgroup: Remove unnecessary task_lock before fetching css_set on migration cgroup: Drop task_lock(parent) on cgroup_fork() cgroups: remove redundant get/put of css_set from css_set_check_fetched() resource cgroups: remove bogus cast cgroup: kill subsys->can_attach_task(), pre_attach() and attach_task() cgroup, cpuset: don't use ss->pre_attach() cgroup: don't use subsys->can_attach_task() or ->attach_task() cgroup: introduce cgroup_taskset and use it in subsys->can_attach(), cancel_attach() and attach() cgroup: improve old cgroup handling in cgroup_attach_proc() cgroup: always lock threadgroup during migration threadgroup: extend threadgroup_lock() to cover exit and exec threadgroup: rename signal->threadgroup_fork_lock to ->group_rwsem ... Fix up conflict in kernel/cgroup.c due to commit e0197aae59e5: "cgroups: fix a css_set not found bug in cgroup_attach_proc" that already mentioned that the bug is fixed (differently) in Tejun's cgroup patchset. This one, in other words.
2012-01-09ext4: fix failure exitsAl Viro
a) leaking root dentry is bad b) in case of failed ext4_mb_init() we don't want to do ext4_mb_release() c) OTOH, in the same case we *do* want ext4_ext_release() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-09Merge branch 'for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: ext2/3/4: delete unneeded includes of module.h ext{3,4}: Fix potential race when setversion ioctl updates inode udf: Mark LVID buffer as uptodate before marking it dirty ext3: Don't warn from writepage when readonly inode is spotted after error jbd: Remove j_barrier mutex reiserfs: Force inode evictions before umount to avoid crash reiserfs: Fix quota mount option parsing udf: Treat symlink component of type 2 as / udf: Fix deadlock when converting file from in-ICB one to normal one udf: Cleanup calling convention of inode_getblk() ext2: Fix error handling on inode bitmap corruption ext3: Fix error handling on inode bitmap corruption ext3: replace ll_rw_block with other functions ext3: NULL dereference in ext3_evict_inode() jbd: clear revoked flag on buffers before a new transaction started ext3: call ext3_mark_recovery_complete() when recovery is really needed