Age | Commit message (Collapse) | Author |
|
Implement the per-device PM QoS constraints by creating a device
PM QoS API, which calls the PM QoS constraints management core code.
The per-device latency constraints data strctures are stored
in the device dev_pm_info struct.
The device PM code calls the init and destroy of the per-device constraints
data struct in order to support the dynamic insertion and removal of the
devices in the system.
To minimize the data usage by the per-device constraints, the data struct
is only allocated at the first call to dev_pm_qos_add_request.
The data is later free'd when the device is removed from the system.
A global mutex protects the constraints users from the data being
allocated and free'd.
Signed-off-by: Jean Pihet <j-pihet@ti.com>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
|
|
In preparation for the per-device constratins support:
- rename update_target to pm_qos_update_target
- generalize and export pm_qos_update_target for usage by the upcoming
per-device latency constraints framework:
* operate on struct pm_qos_constraints for constraints management,
* introduce an 'action' parameter for constraints add/update/remove,
* the return value indicates if the aggregated constraint value has
changed,
- update the internal code to operate on struct pm_qos_constraints
- add a NULL pointer check in the API functions
Signed-off-by: Jean Pihet <j-pihet@ti.com>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
|
|
In preparation for the per-device constratins support, re-organize
the data strctures:
- add a struct pm_qos_constraints which contains the constraints
related data
- update struct pm_qos_object contents to the PM QoS internal object
data. Add a pointer to struct pm_qos_constraints
- update the internal code to use the new data structs.
Signed-off-by: Jean Pihet <j-pihet@ti.com>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
|
|
- Misc fixes to improve code readability:
* rename struct pm_qos_request_list to struct pm_qos_request,
* rename pm_qos_req parameter to req in internal code,
consistenly use req in the API parameters,
* update the in-kernel API callers to the new parameters names,
* rename of fields names (requests, list, node, constraints)
Signed-off-by: Jean Pihet <j-pihet@ti.com>
Acked-by: markgross <markgross@thegnar.org>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
|
|
The PM QoS implementation files are better named
kernel/power/qos.c and include/linux/pm_qos.h.
The PM QoS support is compiled under the CONFIG_PM option.
Signed-off-by: Jean Pihet <j-pihet@ti.com>
Acked-by: markgross <markgross@thegnar.org>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
|
|
Since the PM clock management code in drivers/base/power/clock_ops.c
is used for both runtime PM and system suspend/hibernation, the
definitions of data structures and headers related to it should not
be located in include/linux/pm_rumtime.h. Move them to a separate
header file.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
|
|
Currently pm_genpd_runtime_resume() has to walk the list of devices
from the device's PM domain to find the corresponding device list
object containing the need_restore field to check if the driver's
.runtime_resume() callback should be executed for the device.
This is suboptimal and can be simplified by using power.sybsys_data
to store device information used by the generic PM domains code.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
|
|
Since the power.subsys_data device field will be used by multiple
filesystems, introduce a reference counting mechanism for it to avoid
freeing it prematurely or changing its value at a wrong time.
Make the PM clocks management code that currently is the only user of
power.subsys_data use the new reference counting.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
|
|
Introduce struct pm_subsys_data that may be subclassed by subsystems
to store subsystem-specific information related to the device. Move
the clock management fields accessed through the power.subsys_data
pointer in struct device to the new strucutre.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
|
|
Since it is now possible for a PM domain to have multiple masters
instead of one parent, rename the "wait for parent" status to reflect
the new situation.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
|
|
Currently, for a given generic PM domain there may be only one parent
domain (i.e. a PM domain it depends on). However, there is at least
one real-life case in which there should be two parents (masters) for
one PM domain (the A3RV domain on SH7372 turns out to depend on the
A4LC domain and it depends on the A4R domain and the same time). For
this reason, allow a PM domain to have multiple parents (masters) by
introducing objects representing links between PM domains.
The (logical) links between PM domains represent relationships in
which one domain is a master (i.e. it is depended on) and another
domain is a slave (i.e. it depends on the master) with the rule that
the slave cannot be powered on if the master is not powered on and
the master cannot be powered off if the slave is not powered off.
Each struct generic_pm_domain object representing a PM domain has
two lists of links, a list of links in which it is a master and
a list of links in which it is a slave. The first of these lists
replaces the list of subdomains and the second one is used in place
of the parent pointer.
Each link is represented by struct gpd_link object containing
pointers to the master and the slave and two struct list_head
members allowing it to hook into two lists (the master's list
of "master" links and the slave's list of "slave" links). This
allows the code to get to the link from each side (either from
the master or from the slave) and follow it in each direction.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
|
|
The next patch will make it possible for a generic PM domain to have
multiple parents (i.e. multiple PM domains it depends on). To
prepare for that change it is necessary to change pm_genpd_poweron()
so that it doesn't jump to the start label after running itself
recursively for the parent domain. For this purpose, introduce a new
PM domain status value GPD_STATE_WAIT_PARENT that will be set by
pm_genpd_poweron() before calling itself recursively for the parent
domain and modify the code in drivers/base/power/domain.c so that
the GPD_STATE_WAIT_PARENT status is guaranteed to be preserved during
the execution of pm_genpd_poweron() for the parent.
This change also causes pm_genpd_add_subdomain() and
pm_genpd_remove_subdomain() to wait for started pm_genpd_poweron() to
complete and allows pm_genpd_runtime_resume() to avoid dropping the
lock after powering on the PM domain.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
|
|
Currently, pm_genpd_poweron() and pm_genpd_poweroff() need to take
the parent PM domain's lock in order to modify the parent's counter
of active subdomains in a nonracy way. This causes the locking to be
considerably complex and in fact is not necessary, because the
subdomain counters may be implemented as atomic fields and they
won't have to be modified under a lock.
Replace the unsigned in sd_count field in struct generic_pm_domain
by an atomic_t one and modify the code in drivers/base/power/domain.c
to take this change into account.
This patch doesn't change the locking yet, that is going to be done
in a separate subsequent patch.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: check size of FUSE_NOTIFY_INVAL_ENTRY message
fuse: mark pages accessed when written to
fuse: delete dead .write_begin and .write_end aops
fuse: fix flock
fuse: fix non-ANSI void function notation
|
|
Cleaning up the code a little bit. attempt_plug_merge() traverses the plug
list anyway, we can do the request counting there, so stack size is reduced
a little bit.
The motivation here is I suspect if we should count the requests for each
queue (task could handle multiple disks in the meantime), but my test doesn't
show it's worthy doing. If somebody proves we should do it, below change
will make that more easier.
Signed-off-by: Shaohua Li <shli@kernel.org>
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
tty_operations->remove is normally called like:
queue_release_one_tty
->tty_shutdown
->tty_driver_remove_tty
->tty_operations->remove
However tty_shutdown() is called from queue_release_one_tty() only if
tty_operations->shutdown is NULL. But for pty, it is not.
pty_unix98_shutdown() is used there as ->shutdown.
So tty_operations->remove of pty (i.e. pty_unix98_remove()) is never
called. This results in invalid pty_count. I.e. what can be seen in
/proc/sys/kernel/pty/nr.
I see this was already reported at:
https://lkml.org/lkml/2009/11/5/370
But it was not fixed since then.
This patch is kind of a hackish way. The problem lies in ->install. We
allocate there another tty (so-called tty->link). So ->install is
called once, but ->remove twice, for both tty and tty->link. The fix
here is to count both tty and tty->link and divide the count by 2 for
user.
And to have ->remove called, let's make tty_driver_remove_tty() global
and call that from pty_unix98_shutdown() (tty_operations->shutdown).
While at it, let's document that when ->shutdown is defined,
tty_shutdown() is not called.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Alan Cox <alan@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
Add a new REQ_PRIO to let requests preempt others in the cfq I/O schedule,
and lave REQ_META purely for marking requests as metadata in blktrace.
All existing callers of REQ_META except for XFS are updated to also
set REQ_PRIO for now.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
Replace all occurnanced of the undocumented READ_META with READ | REQ_META
and remove the unused WRITE_META define.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
Certain platform specific or Host-WiLink Interface specific actions would be
required to be taken when the chip is being enabled and after the chip is
disabled such as configuration of the mux modes for the GPIO of host connected
to the nshutdown of the chip or relinquishing UART after the chip is disabled.
Similar actions can also be taken when the chip is in deep sleep or when the
chip is awake. Performance enhancements such as configuring the host to run
faster when chip is awake and slower when chip is asleep can also be made
here.
Signed-off-by: Pavan Savoy <pavan_savoy@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
This patch (as1482) adds a macro for testing whether or not a
pm_message value represents an autosuspend or autoresume (i.e., a
runtime PM) event. Encapsulating this notion seems preferable to
open-coding the test all over the place.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
|
|
* 'for-linus' of git://git.kernel.dk/linux-block: (23 commits)
Revert "cfq: Remove special treatment for metadata rqs."
block: fix flush machinery for stacking drivers with differring flush flags
block: improve rq_affinity placement
blktrace: add FLUSH/FUA support
Move some REQ flags to the common bio/request area
allow blk_flush_policy to return REQ_FSEQ_DATA independent of *FLUSH
xen/blkback: Make description more obvious.
cfq-iosched: Add documentation about idling
block: Make rq_affinity = 1 work as expected
block: swim3: fix unterminated of_device_id table
block/genhd.c: remove useless cast in diskstats_show()
drivers/cdrom/cdrom.c: relax check on dvd manufacturer value
drivers/block/drbd/drbd_nl.c: use bitmap_parse instead of __bitmap_parse
bsg-lib: add module.h include
cfq-iosched: Reduce linked group count upon group destruction
blk-throttle: correctly determine sync bio
loop: fix deadlock when sysfs and LOOP_CLR_FD race against each other
loop: add BLK_DEV_LOOP_MIN_COUNT=%i to allow distros 0 pre-allocated loop devices
loop: add management interface for on-demand device allocation
loop: replace linked list of allocated devices with an idr index
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
PCI: OF: Don't crash when bridge parent is NULL.
PCI: export pcie_bus_configure_settings symbol
PCI: code and comments cleanup
PCI: make cardbus-bridge resources optional
PCI: make SRIOV resources optional
PCI : ability to relocate assigned pci-resources
PCI: honor child buses add_size in hot plug configuration
PCI: Set PCI-E Max Payload Size on fabric
|
|
Revert the pass-good area introduced in ffd1f609ab10 ("writeback:
introduce max-pause and pass-good dirty limits") and make the max-pause
area smaller and safe.
This fixes ~30% performance regression in the ext3 data=writeback
fio_mmap_randwrite_64k/fio_mmap_randrw_64k test cases, where there are
12 JBOD disks, on each disk runs 8 concurrent tasks doing reads+writes.
Using deadline scheduler also has a regression, but not that big as CFQ,
so this suggests we have some write starvation.
The test logs show that
- the disks are sometimes under utilized
- global dirty pages sometimes rush high to the pass-good area for
several hundred seconds, while in the mean time some bdi dirty pages
drop to very low value (bdi_dirty << bdi_thresh). Then suddenly the
global dirty pages dropped under global dirty threshold and bdi_dirty
rush very high (for example, 2 times higher than bdi_thresh). During
which time balance_dirty_pages() is not called at all.
So the problems are
1) The random writes progress so slow that they break the assumption of
the max-pause logic that "8 pages per 200ms is typically more than
enough to curb heavy dirtiers".
2) The max-pause logic ignored task_bdi_thresh and thus opens the possibility
for some bdi's to over dirty pages, leading to (bdi_dirty >> bdi_thresh)
and then (bdi_thresh >> bdi_dirty) for others.
3) The higher max-pause/pass-good thresholds somehow leads to the bad
swing of dirty pages.
The fix is to allow the task to slightly dirty over task_bdi_thresh, but
no way to exceed bdi_dirty and/or global dirty_thresh.
Tests show that it fixed the JBOD regression completely (both behavior
and performance), while still being able to cut down large pause times
in balance_dirty_pages() for single-disk cases.
Reported-by: Li Shaohua <shaohua.li@intel.com>
Tested-by: Li Shaohua <shaohua.li@intel.com>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
* 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM / Domains: Fix build for CONFIG_PM_RUNTIME unset
|
|
This allows the cast in lowmem_page_address (introduced as a warning
fixup to 33dd4e0ec911 "mm: make some struct page's const") to be
removed.
Propagate const'ness to page_to_section() as well since it is required
by __page_to_pfn.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Followup to 33dd4e0ec911 "mm: make some struct page's const" which missed the
HASHED_PAGE_VIRTUAL case.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
rtc: Limit RTC PIE frequency
rtc: Fix hrtimer deadlock
rtc: Handle errors correctly in rtc_irq_set_state()
Fixup trivial conflicts in drivers/rtc/interface.c due to slightly
trivially versions of the same patch coming in two different ways.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
irq: Track the owner of irq descriptor
irq: Always set IRQF_ONESHOT if no primary handler is specified
genirq: Fix wrong bit operation
|
|
Commit ae1b1539622fb46e51b4d13b3f9e5f4c713f86ae, block: reimplement
FLUSH/FUA to support merge, introduced a performance regression when
running any sort of fsyncing workload using dm-multipath and certain
storage (in our case, an HP EVA). The test I ran was fs_mark, and it
dropped from ~800 files/sec on ext4 to ~100 files/sec. It turns out
that dm-multipath always advertised flush+fua support, and passed
commands on down the stack, where those flags used to get stripped off.
The above commit changed that behavior:
static inline struct request *__elv_next_request(struct request_queue *q)
{
struct request *rq;
while (1) {
- while (!list_empty(&q->queue_head)) {
+ if (!list_empty(&q->queue_head)) {
rq = list_entry_rq(q->queue_head.next);
- if (!(rq->cmd_flags & (REQ_FLUSH | REQ_FUA)) ||
- (rq->cmd_flags & REQ_FLUSH_SEQ))
- return rq;
- rq = blk_do_flush(q, rq);
- if (rq)
- return rq;
+ return rq;
}
Note that previously, a command would come in here, have
REQ_FLUSH|REQ_FUA set, and then get handed off to blk_do_flush:
struct request *blk_do_flush(struct request_queue *q, struct request *rq)
{
unsigned int fflags = q->flush_flags; /* may change, cache it */
bool has_flush = fflags & REQ_FLUSH, has_fua = fflags & REQ_FUA;
bool do_preflush = has_flush && (rq->cmd_flags & REQ_FLUSH);
bool do_postflush = has_flush && !has_fua && (rq->cmd_flags &
REQ_FUA);
unsigned skip = 0;
...
if (blk_rq_sectors(rq) && !do_preflush && !do_postflush) {
rq->cmd_flags &= ~REQ_FLUSH;
if (!has_fua)
rq->cmd_flags &= ~REQ_FUA;
return rq;
}
So, the flush machinery was bypassed in such cases (q->flush_flags == 0
&& rq->cmd_flags & (REQ_FLUSH|REQ_FUA)).
Now, however, we don't get into the flush machinery at all. Instead,
__elv_next_request just hands a request with flush and fua bits set to
the scsi_request_fn, even if the underlying request_queue does not
support flush or fua.
The agreed upon approach is to fix the flush machinery to allow
stacking. While this isn't used in practice (since there is only one
request-based dm target, and that target will now reflect the flush
flags of the underlying device), it does future-proof the solution, and
make it function as designed.
In order to make this work, I had to add a field to the struct request,
inside the flush structure (to store the original req->end_io). Shaohua
had suggested overloading the union with rb_node and completion_data,
but the completion data is used by device mapper and can also be used by
other drivers. So, I didn't see a way around the additional field.
I tested this patch on an HP EVA with both ext4 and xfs, and it recovers
the lost performance. Comments and other testers, as always, are
appreciated.
Cheers,
Jeff
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc:
mmc: remove unused "ddr" parameter in struct mmc_ios
mmc: dw_mmc: Fix DDR mode support.
mmc: core: use defined R1_STATE_PRG macro for card status
mmc: sdhci: use f_max instead of host->clock for timeouts
mmc: sdhci: move timeout_clk calculation farther down
mmc: sdhci: check host->clock before using it as a denominator
mmc: Revert "mmc: sdhci: Fix SDHCI_QUIRK_TIMEOUT_USES_SDCLK"
mmc: tmio: eliminate unused variable 'mmc' warning
mmc: esdhc-imx: fix card interrupt loss on freescale eSDHC
mmc: sdhci-s3c: Fix build for header change
mmc: dw_mmc: Fix mask in IDMAC_SET_BUFFER1_SIZE macro
mmc: cb710: fix possible pci_dev leak in cb710_pci_configure()
mmc: core: Detect eMMC v4.5 ext_csd entries
mmc: mmc_test: avoid stalled file in debugfs
mmc: sdhci-s3c: add BROKEN_ADMA_ZEROLEN_DESC quirk
mmc: sdhci: pxav3: controller needs 32 bit ADMA addressing
mmc: sdhci: fix retuning timer wrongly deleted in sdhci_tasklet_finish
|
|
Function genpd_queue_power_off_work() is not defined for
CONFIG_PM_RUNTIME, so pm_genpd_poweroff_unused() causes a build
error to happen in that case. Fix the problem by making
pm_genpd_poweroff_unused() depend on CONFIG_PM_RUNTIME too.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
|
|
"mmc: dw_mmc: Fix DDR mode support" removed the last user.
Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Chris Ball <cjb@laptop.org>
|
|
* 'devicetree/merge' of git://git.secretlab.ca/git/linux-2.6:
dt: add empty of_get_property for non-dt
|
|
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
e1000e: increase driver version number
e1000e: alternate MAC address update
e1000e: do not disable receiver on 82574/82583
e1000e: alternate MAC address does not work on device id 0x1060
PCnet: Fix section mismatch
bnx2x: disable dcb on 578xx since not supported yet
bnx2x: properly clean indirect addresses
bnx2x: prevent race between undi_unload and load flows
bnx2x: fix select_queue when FCoE is disabled
bnx2x: init FCOE FP only once
ipv4: some rt_iif -> rt_route_iif conversions
net/bridge/netfilter/ebtables.c: use available error handling code
net/netlabel/netlabel_kapi.c: add missing cleanup code
net/irda: sh_sir: tidyup compile warning
net/irda: sh_sir: add missing header
net/irda: sh_irda: add missing header
slcan: ldisc generated skbs are received in softirq context
scm: Capture the full credentials of the scm sender
tcp: initialize variable ecn_ok in syncookies path
drivers/net/wireless/wl1251: add missing kfree
...
|
|
The patch http://lkml.org/lkml/2003/7/13/226 introduced an RLIMIT_NPROC
check in set_user() to check for NPROC exceeding via setuid() and
similar functions.
Before the check there was a possibility to greatly exceed the allowed
number of processes by an unprivileged user if the program relied on
rlimit only. But the check created new security threat: many poorly
written programs simply don't check setuid() return code and believe it
cannot fail if executed with root privileges. So, the check is removed
in this patch because of too often privilege escalations related to
buggy programs.
The NPROC can still be enforced in the common code flow of daemons
spawning user processes. Most of daemons do fork()+setuid()+execve().
The check introduced in execve() (1) enforces the same limit as in
setuid() and (2) doesn't create similar security issues.
Neil Brown suggested to track what specific process has exceeded the
limit by setting PF_NPROC_EXCEEDED process flag. With the change only
this process would fail on execve(), and other processes' execve()
behaviour is not changed.
Solar Designer suggested to re-check whether NPROC limit is still
exceeded at the moment of execve(). If the process was sleeping for
days between set*uid() and execve(), and the NPROC counter step down
under the limit, the defered execve() failure because NPROC limit was
exceeded days ago would be unexpected. If the limit is not exceeded
anymore, we clear the flag on successful calls to execve() and fork().
The flag is also cleared on successful calls to set_user() as the limit
was exceeded for the previous user, not the current one.
Similar check was introduced in -ow patches (without the process flag).
v3 - clear PF_NPROC_EXCEEDED on successful calls to set_user().
Reviewed-by: James Morris <jmorris@namei.org>
Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Add FLUSH/FUA support to blktrace. As FLUSH precedes WRITE and/or
FUA follows WRITE, use the same 'F' flag for both cases and
distinguish them by their (relative) position. The end results
look like (other flags might be shown also):
- WRITE: W
- WRITE_FLUSH: FW
- WRITE_FUA: WF
- WRITE_FLUSH_FUA: FWF
Note that we reuse TC_BARRIER due to lack of bit space of act_mask
so that the older versions of blktrace tools will report flush
requests as barriers from now on.
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
REQ_SECURE, REQ_FLUSH and REQ_FUA may all be set on a bio as well as
on a request, so relocate them to the shared part of the enum.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
|
|
The patch adds empty function of_get_property for non-dt build, so that
drivers migrating to dt can save some '#ifdef CONFIG_OF'.
This also fixes the current Tegra compile problem in linux-next.
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
|
|
In commit 2efaca927f5c ("mm/futex: fix futex writes on archs with SW
tracking of dirty & young") we forgot about MMU=n. This patch fixes
that.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: David Howells <dhowells@redhat.com>
Link: http://lkml.kernel.org/r/1311761831.24752.413.camel@twins
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Avoid annoying warnings from these functions ("discards qualifiers")
because they assign 'current_cred()' to a non-const pointer.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit 3295514841c2 ("fix rcu annotations noise in cred.h") accidentally
dropped the const of current->cred inside current_cred() by the
insertion of a cast to deal with an RCU annotation loss warning from
sparce.
Use an appropriate RCU wrapper instead so as not to lose the const.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit a9ff4f87 "fuse: support BSD locking semantics" overlooked a
number of issues with supporing flock locks over existing POSIX
locking infrastructure:
- it's not backward compatible, passing flock(2) calls to userspace
unconditionally (if userspace sets FUSE_POSIX_LOCKS)
- it doesn't cater for the fact that flock locks are automatically
unlocked on file release
- it doesn't take into account the fact that flock exclusive locks
(write locks) don't need an fd opened for write.
The last one invalidates the original premise of the patch that flock
locks can be emulated with POSIX locks.
This patch fixes the first two issues. The last one needs to be fixed
in userspace if the filesystem assumed that a write lock will happen
only on a file operned for write (as in the case of the current fuse
library).
Reported-by: Sebastian Pipping <webmaster@hartwork.org>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
|
|
Currently userland will barf when including linux/netlink.h unless it
precisely includes sys/socket.h first. The issue is where the
definition of "sa_family_t" comes from.
We've been back and forth on how to fix this issue in the past, see:
http://thread.gmane.org/gmane.linux.debian.devel.bugs.general/622621
http://thread.gmane.org/gmane.linux.network/143380
Ben Hutchings suggested we take a hint from how we handle the
sockaddr_storage type. First we define a "__kernel_sa_family_t"
to linux/socket.h that is always defined.
Then if __KERNEL__ is defined, we also define "sa_family_t" as
equal to "__kernel_sa_family_t".
Then in places like linux/netlink.h we use __kernel_sa_family_t
in user visible datastructures.
Reported-by: Michel Machado <michel@digirati.com.br>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
task->cred is declared as __rcu, and access to other tasks' ->cred is,
indeed, protected. Access to current->cred does not need rcu_dereference()
at all, since only the task itself can change its ->cred. sparse, of
course, has no way of knowing that...
Add force-cast in current_cred(), make current_fsuid() et.al. use it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
* 'for-linus' of git://git.open-osd.org/linux-open-osd:
ore: Make ore its own module
exofs: Rename raid engine from exofs/ios.c => ore
exofs: ios: Move to a per inode components & device-table
exofs: Move exofs specific osd operations out of ios.c
exofs: Add offset/length to exofs_get_io_state
exofs: Fix truncate for the raid-groups case
exofs: Small cleanup of exofs_fill_super
exofs: BUG: Avoid sbi realloc
exofs: Remove pnfs-osd private definitions
nfs_xdr: Move nfs4_string definition out of #ifdef CONFIG_NFS_V4
|
|
The inode structure layout is largely random, and some of the vfs paths
really do care. The path lookup in particular is already quite D$
intensive, and profiles show that accessing the 'inode->i_op->xyz'
fields is quite costly.
We already optimized the dcache to not unnecessarily load the d_op
structure for members that are often NULL using the DCACHE_OP_xyz bits
in dentry->d_flags, and this does something very similar for the inode
ops that are used during pathname lookup.
It also re-orders the fields so that the fields accessed by 'stat' are
together at the beginning of the inode structure, and roughly in the
order accessed.
The effect of this seems to be in the 1-2% range for an empty kernel
"make -j" run (which is fairly kernel-intensive, mostly in filename
lookup), so it's visible. The numbers are fairly noisy, though, and
likely depend a lot on exact microarchitecture. So there's more tuning
to be done.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Gcc tends to generate better code with small integers, including the
DCACHE_xyz flag tests - so move the common ones to be first in the list.
Also just remove the unused DCACHE_INOTIFY_PARENT_WATCHED and
DCACHE_AUTOFS_PENDING values, their users no longer exists in the source
tree.
And add a "unlikely()" to the DCACHE_OP_COMPARE test, since we want the
common case to be a nice straight-line fall-through.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
net: Compute protocol sequence numbers and fragment IDs using MD5.
crypto: Move md5_transform to lib/md5.c
|
|
Computers have become a lot faster since we compromised on the
partial MD4 hash which we use currently for performance reasons.
MD5 is a much safer choice, and is inline with both RFC1948 and
other ISS generators (OpenBSD, Solaris, etc.)
Furthermore, only having 24-bits of the sequence number be truly
unpredictable is a very serious limitation. So the periodic
regeneration and 8-bit counter have been removed. We compute and
use a full 32-bit sequence number.
For ipv6, DCCP was found to use a 32-bit truncated initial sequence
number (it needs 43-bits) and that is fixed here as well.
Reported-by: Dan Kaminsky <dan@doxpara.com>
Tested-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
We are going to use this for TCP/IP sequence number and fragment ID
generation.
Signed-off-by: David S. Miller <davem@davemloft.net>
|