<feed xmlns='http://www.w3.org/2005/Atom'>
<title>pm24.git/kernel/workqueue.c, branch v4.6-rc7</title>
<subtitle>Unnamed repository; edit this file 'description' to name the repository.
</subtitle>
<id>https://git.kobert.dev/pm24.git/atom?h=v4.6-rc7</id>
<link rel='self' href='https://git.kobert.dev/pm24.git/atom?h=v4.6-rc7'/>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/'/>
<updated>2016-04-27T19:03:59Z</updated>
<entry>
<title>Merge branch 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq</title>
<updated>2016-04-27T19:03:59Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2016-04-27T19:03:59Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=b75a2bf899b668b1d52de8846aafdbcf81349c73'/>
<id>urn:sha1:b75a2bf899b668b1d52de8846aafdbcf81349c73</id>
<content type='text'>
Pull workqueue fix from Tejun Heo:
 "So, it turns out we had a silly bug in the most fundamental part of
  workqueue for a very long time.  AFAICS, this dates back to pre-git
  era and has quite likely been there from the time workqueue was first
  introduced.

  A work item uses its PENDING bit to synchronize multiple queuers.
  Anyone who wins the PENDING bit owns the pending state of the work
  item.  Whether a queuer wins or loses the race, one thing should be
  guaranteed - there will soon be at least one execution of the work
  item - where "after" means that the execution instance would be able
  to see all the changes that the queuer has made prior to the queueing
  attempt.

  Unfortunately, we were missing a smp_mb() after clearing PENDING for
  execution, so nothing guaranteed visibility of the changes that a
  queueing loser has made, which manifested as a reproducible blk-mq
  stall.

  Lots of kudos to Roman for debugging the problem.  The patch for
  -stable is the minimal one.  For v3.7, Peter is working on a patch to
  make the code path slightly more efficient and less fragile"

* 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: fix ghost PENDING flag while doing MQ IO
</content>
</entry>
<entry>
<title>workqueue: fix ghost PENDING flag while doing MQ IO</title>
<updated>2016-04-26T15:23:22Z</updated>
<author>
<name>Roman Pen</name>
<email>roman.penyaev@profitbricks.com</email>
</author>
<published>2016-04-26T11:15:35Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=346c09f80459a3ad97df1816d6d606169a51001a'/>
<id>urn:sha1:346c09f80459a3ad97df1816d6d606169a51001a</id>
<content type='text'>
The bug in a workqueue leads to a stalled IO request in MQ ctx-&gt;rq_list
with the following backtrace:

[  601.347452] INFO: task kworker/u129:5:1636 blocked for more than 120 seconds.
[  601.347574]       Tainted: G           O    4.4.5-1-storage+ #6
[  601.347651] "echo 0 &gt; /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  601.348142] kworker/u129:5  D ffff880803077988     0  1636      2 0x00000000
[  601.348519] Workqueue: ibnbd_server_fileio_wq ibnbd_dev_file_submit_io_worker [ibnbd_server]
[  601.348999]  ffff880803077988 ffff88080466b900 ffff8808033f9c80 ffff880803078000
[  601.349662]  ffff880807c95000 7fffffffffffffff ffffffff815b0920 ffff880803077ad0
[  601.350333]  ffff8808030779a0 ffffffff815b01d5 0000000000000000 ffff880803077a38
[  601.350965] Call Trace:
[  601.351203]  [&lt;ffffffff815b0920&gt;] ? bit_wait+0x60/0x60
[  601.351444]  [&lt;ffffffff815b01d5&gt;] schedule+0x35/0x80
[  601.351709]  [&lt;ffffffff815b2dd2&gt;] schedule_timeout+0x192/0x230
[  601.351958]  [&lt;ffffffff812d43f7&gt;] ? blk_flush_plug_list+0xc7/0x220
[  601.352208]  [&lt;ffffffff810bd737&gt;] ? ktime_get+0x37/0xa0
[  601.352446]  [&lt;ffffffff815b0920&gt;] ? bit_wait+0x60/0x60
[  601.352688]  [&lt;ffffffff815af784&gt;] io_schedule_timeout+0xa4/0x110
[  601.352951]  [&lt;ffffffff815b3a4e&gt;] ? _raw_spin_unlock_irqrestore+0xe/0x10
[  601.353196]  [&lt;ffffffff815b093b&gt;] bit_wait_io+0x1b/0x70
[  601.353440]  [&lt;ffffffff815b056d&gt;] __wait_on_bit+0x5d/0x90
[  601.353689]  [&lt;ffffffff81127bd0&gt;] wait_on_page_bit+0xc0/0xd0
[  601.353958]  [&lt;ffffffff81096db0&gt;] ? autoremove_wake_function+0x40/0x40
[  601.354200]  [&lt;ffffffff81127cc4&gt;] __filemap_fdatawait_range+0xe4/0x140
[  601.354441]  [&lt;ffffffff81127d34&gt;] filemap_fdatawait_range+0x14/0x30
[  601.354688]  [&lt;ffffffff81129a9f&gt;] filemap_write_and_wait_range+0x3f/0x70
[  601.354932]  [&lt;ffffffff811ced3b&gt;] blkdev_fsync+0x1b/0x50
[  601.355193]  [&lt;ffffffff811c82d9&gt;] vfs_fsync_range+0x49/0xa0
[  601.355432]  [&lt;ffffffff811cf45a&gt;] blkdev_write_iter+0xca/0x100
[  601.355679]  [&lt;ffffffff81197b1a&gt;] __vfs_write+0xaa/0xe0
[  601.355925]  [&lt;ffffffff81198379&gt;] vfs_write+0xa9/0x1a0
[  601.356164]  [&lt;ffffffff811c59d8&gt;] kernel_write+0x38/0x50

The underlying device is a null_blk, with default parameters:

  queue_mode    = MQ
  submit_queues = 1

Verification that nullb0 has something inflight:

root@pserver8:~# cat /sys/block/nullb0/inflight
       0        1
root@pserver8:~# find /sys/block/nullb0/mq/0/cpu* -name rq_list -print -exec cat {} \;
...
/sys/block/nullb0/mq/0/cpu2/rq_list
CTX pending:
        ffff8838038e2400
...

During debug it became clear that stalled request is always inserted in
the rq_list from the following path:

   save_stack_trace_tsk + 34
   blk_mq_insert_requests + 231
   blk_mq_flush_plug_list + 281
   blk_flush_plug_list + 199
   wait_on_page_bit + 192
   __filemap_fdatawait_range + 228
   filemap_fdatawait_range + 20
   filemap_write_and_wait_range + 63
   blkdev_fsync + 27
   vfs_fsync_range + 73
   blkdev_write_iter + 202
   __vfs_write + 170
   vfs_write + 169
   kernel_write + 56

So blk_flush_plug_list() was called with from_schedule == true.

If from_schedule is true, that means that finally blk_mq_insert_requests()
offloads execution of __blk_mq_run_hw_queue() and uses kblockd workqueue,
i.e. it calls kblockd_schedule_delayed_work_on().

That means, that we race with another CPU, which is about to execute
__blk_mq_run_hw_queue() work.

Further debugging shows the following traces from different CPUs:

  CPU#0                                  CPU#1
  ----------------------------------     -------------------------------
  reqeust A inserted
  STORE hctx-&gt;ctx_map[0] bit marked
  kblockd_schedule...() returns 1
  &lt;schedule to kblockd workqueue&gt;
                                         request B inserted
                                         STORE hctx-&gt;ctx_map[1] bit marked
                                         kblockd_schedule...() returns 0
  *** WORK PENDING bit is cleared ***
  flush_busy_ctxs() is executed, but
  bit 1, set by CPU#1, is not observed

As a result request B pended forever.

This behaviour can be explained by speculative LOAD of hctx-&gt;ctx_map on
CPU#0, which is reordered with clear of PENDING bit and executed _before_
actual STORE of bit 1 on CPU#1.

The proper fix is an explicit full barrier &lt;mfence&gt;, which guarantees
that clear of PENDING bit is to be executed before all possible
speculative LOADS or STORES inside actual work function.

Signed-off-by: Roman Pen &lt;roman.penyaev@profitbricks.com&gt;
Cc: Gioh Kim &lt;gi-oh.kim@profitbricks.com&gt;
Cc: Michael Wang &lt;yun.wang@profitbricks.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: linux-block@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>Merge branch 'for-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq</title>
<updated>2016-03-19T03:05:39Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2016-03-19T03:05:39Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=ef504fa591aae6f6ebdf26edbe6ec0bfd32ea7d3'/>
<id>urn:sha1:ef504fa591aae6f6ebdf26edbe6ec0bfd32ea7d3</id>
<content type='text'>
Pull workqueue updates from Tejun Heo:
 "Three trivial workqueue changes"

* 'for-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: Fix comment for work_on_cpu()
  sched/core: Get rid of 'cpu' argument in wq_worker_sleeping()
  workqueue: Replace usage of init_name with dev_set_name()
</content>
</entry>
<entry>
<title>tags: Fix DEFINE_PER_CPU expansions</title>
<updated>2016-03-15T23:55:16Z</updated>
<author>
<name>Peter Zijlstra</name>
<email>peterz@infradead.org</email>
</author>
<published>2016-03-15T21:52:49Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=25528213fe9f75f4e286f08d35a73ca2bb634a50'/>
<id>urn:sha1:25528213fe9f75f4e286f08d35a73ca2bb634a50</id>
<content type='text'>
$ make tags
  GEN     tags
ctags: Warning: drivers/acpi/processor_idle.c:64: null expansion of name pattern "\1"
ctags: Warning: drivers/xen/events/events_2l.c:41: null expansion of name pattern "\1"
ctags: Warning: kernel/locking/lockdep.c:151: null expansion of name pattern "\1"
ctags: Warning: kernel/rcu/rcutorture.c:133: null expansion of name pattern "\1"
ctags: Warning: kernel/rcu/rcutorture.c:135: null expansion of name pattern "\1"
ctags: Warning: kernel/workqueue.c:323: null expansion of name pattern "\1"
ctags: Warning: net/ipv4/syncookies.c:53: null expansion of name pattern "\1"
ctags: Warning: net/ipv6/syncookies.c:44: null expansion of name pattern "\1"
ctags: Warning: net/rds/page.c:45: null expansion of name pattern "\1"

Which are all the result of the DEFINE_PER_CPU pattern:

  scripts/tags.sh:200:	'/\&lt;DEFINE_PER_CPU([^,]*, *\([[:alnum:]_]*\)/\1/v/'
  scripts/tags.sh:201:	'/\&lt;DEFINE_PER_CPU_SHARED_ALIGNED([^,]*, *\([[:alnum:]_]*\)/\1/v/'

The below cures them. All except the workqueue one are within reasonable
distance of the 80 char limit. TJ do you have any preference on how to
fix the wq one, or shall we just not care its too long?

Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: David S. Miller &lt;davem@davemloft.net&gt;
Acked-by: Rafael J. Wysocki &lt;rafael.j.wysocki@intel.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: "Paul E. McKenney" &lt;paulmck@linux.vnet.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>workqueue: Fix comment for work_on_cpu()</title>
<updated>2016-03-11T17:39:01Z</updated>
<author>
<name>Anna-Maria Gleixner</name>
<email>anna-maria@linutronix.de</email>
</author>
<published>2016-03-10T11:07:38Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=22aceb317678057dced5f1d6e3ac15acdb863e7b'/>
<id>urn:sha1:22aceb317678057dced5f1d6e3ac15acdb863e7b</id>
<content type='text'>
Function is processed in thread context, not in user context.

Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: Lai Jiangshan &lt;jiangshanlai@gmail.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Anna-Maria Gleixner &lt;anna-maria@linutronix.de&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>sched/core: Get rid of 'cpu' argument in wq_worker_sleeping()</title>
<updated>2016-03-02T15:28:47Z</updated>
<author>
<name>Alexander Gordeev</name>
<email>agordeev@redhat.com</email>
</author>
<published>2016-03-02T11:53:31Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=9b7f6597f013d449d6700d11820faf91ee0ec985'/>
<id>urn:sha1:9b7f6597f013d449d6700d11820faf91ee0ec985</id>
<content type='text'>
Given that wq_worker_sleeping() could only be called for a
CPU it is running on, we do not need passing a CPU ID as an
argument.

Suggested-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Signed-off-by: Alexander Gordeev &lt;agordeev@redhat.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>workqueue: Replace usage of init_name with dev_set_name()</title>
<updated>2016-02-17T21:14:18Z</updated>
<author>
<name>Lars-Peter Clausen</name>
<email>lars@metafoo.de</email>
</author>
<published>2016-02-17T20:04:41Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=23217b443b4b0439c8b55d3be0482d3cd7fbc5ac'/>
<id>urn:sha1:23217b443b4b0439c8b55d3be0482d3cd7fbc5ac</id>
<content type='text'>
The init_name property of the device struct is sort of a hack and should
only be used for statically allocated devices. Since the device is
dynamically allocated here it is safe to use the proper way to set a
devices name by calling dev_set_name().

Signed-off-by: Lars-Peter Clausen &lt;lars@metafoo.de&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>workqueue: handle NUMA_NO_NODE for unbound pool_workqueue lookup</title>
<updated>2016-02-10T17:13:05Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2016-02-03T18:54:25Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=d6e022f1d207a161cd88e08ef0371554680ffc46'/>
<id>urn:sha1:d6e022f1d207a161cd88e08ef0371554680ffc46</id>
<content type='text'>
When looking up the pool_workqueue to use for an unbound workqueue,
workqueue assumes that the target CPU is always bound to a valid NUMA
node.  However, currently, when a CPU goes offline, the mapping is
destroyed and cpu_to_node() returns NUMA_NO_NODE.

This has always been broken but hasn't triggered often enough before
874bbfe600a6 ("workqueue: make sure delayed work run in local cpu").
After the commit, workqueue forcifully assigns the local CPU for
delayed work items without explicit target CPU to fix a different
issue.  This widens the window where CPU can go offline while a
delayed work item is pending causing delayed work items dispatched
with target CPU set to an already offlined CPU.  The resulting
NUMA_NO_NODE mapping makes workqueue try to queue the work item on a
NULL pool_workqueue and thus crash.

While 874bbfe600a6 has been reverted for a different reason making the
bug less visible again, it can still happen.  Fix it by mapping
NUMA_NO_NODE to the default pool_workqueue from unbound_pwq_by_node().
This is a temporary workaround.  The long term solution is keeping CPU
-&gt; NODE mapping stable across CPU off/online cycles which is being
worked on.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: Mike Galbraith &lt;umgwanakikbuti@gmail.com&gt;
Cc: Tang Chen &lt;tangchen@cn.fujitsu.com&gt;
Cc: Rafael J. Wysocki &lt;rafael@kernel.org&gt;
Cc: Len Brown &lt;len.brown@intel.com&gt;
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/g/1454424264.11183.46.camel@gmail.com
Link: http://lkml.kernel.org/g/1453702100-2597-1-git-send-email-tangchen@cn.fujitsu.com
</content>
</entry>
<entry>
<title>workqueue: implement "workqueue.debug_force_rr_cpu" debug feature</title>
<updated>2016-02-09T22:59:38Z</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2016-02-09T22:59:38Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=f303fccb82928790ec58eea82722bd5c54d300b3'/>
<id>urn:sha1:f303fccb82928790ec58eea82722bd5c54d300b3</id>
<content type='text'>
Workqueue used to guarantee local execution for work items queued
without explicit target CPU.  The guarantee is gone now which can
break some usages in subtle ways.  To flush out those cases, this
patch implements a debug feature which forces round-robin CPU
selection for all such work items.

The debug feature defaults to off and can be enabled with a kernel
parameter.  The default can be flipped with a debug config option.

If you hit this commit during bisection, please refer to 041bd12e272c
("Revert "workqueue: make sure delayed work run in local cpu"") for
more information and ping me.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
<entry>
<title>workqueue: schedule WORK_CPU_UNBOUND work on wq_unbound_cpumask CPUs</title>
<updated>2016-02-09T22:59:38Z</updated>
<author>
<name>Mike Galbraith</name>
<email>umgwanakikbuti@gmail.com</email>
</author>
<published>2016-02-09T22:59:38Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=ef557180447fa9a7a0affd3abb21ecceb4b5e125'/>
<id>urn:sha1:ef557180447fa9a7a0affd3abb21ecceb4b5e125</id>
<content type='text'>
WORK_CPU_UNBOUND work items queued to a bound workqueue always run
locally.  This is a good thing normally, but not when the user has
asked us to keep unbound work away from certain CPUs.  Round robin
these to wq_unbound_cpumask CPUs instead, as perturbation avoidance
trumps performance.

tj: Cosmetic and comment changes.  WARN_ON_ONCE() dropped from empty
    (wq_unbound_cpumask AND cpu_online_mask).  If we want that, it
    should be done when config changes.

Signed-off-by: Mike Galbraith &lt;umgwanakikbuti@gmail.com&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
</entry>
</feed>
