<feed xmlns='http://www.w3.org/2005/Atom'>
<title>pm24.git/fs/dlm, branch v6.13</title>
<subtitle>Unnamed repository; edit this file 'description' to name the repository.
</subtitle>
<id>https://git.kobert.dev/pm24.git/atom?h=v6.13</id>
<link rel='self' href='https://git.kobert.dev/pm24.git/atom?h=v6.13'/>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/'/>
<updated>2024-11-18T16:05:57Z</updated>
<entry>
<title>dlm: fix dlm_recover_members refcount on error</title>
<updated>2024-11-18T16:05:57Z</updated>
<author>
<name>Alexander Aring</name>
<email>aahringo@redhat.com</email>
</author>
<published>2024-11-18T16:01:49Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=200b977ebbc313a59174ba971006a231b3533dc5'/>
<id>urn:sha1:200b977ebbc313a59174ba971006a231b3533dc5</id>
<content type='text'>
If dlm_recover_members() fails we don't drop the references of the
previous created root_list that holds and keep all rsbs alive during the
recovery. It might be not an unlikely event because ping_members() could
run into an -EINTR if another recovery progress was triggered again.

Fixes: 3a747f4a2ee8 ("dlm: move rsb root_list to ls_recover() stack")
Signed-off-by: Alexander Aring &lt;aahringo@redhat.com&gt;
Signed-off-by: David Teigland &lt;teigland@redhat.com&gt;
</content>
</entry>
<entry>
<title>dlm: fix recovery of middle conversions</title>
<updated>2024-11-15T19:39:36Z</updated>
<author>
<name>Alexander Aring</name>
<email>aahringo@redhat.com</email>
</author>
<published>2024-11-13T16:46:42Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=f74dacb4c81164d7578a11d5f8b660ad87059e6a'/>
<id>urn:sha1:f74dacb4c81164d7578a11d5f8b660ad87059e6a</id>
<content type='text'>
In one special case, recovery is unable to reliably rebuild
lock state by simply recreating lkb structs as sent from the
lock holders.  That case is when the lkb's include conversions
between PR and CW modes.

The recovery code has always recognized this special case,
but the implemention has always been broken, and would set
invalid modes in recovered lkb's.  Unpredictable or bogus
errors could then be returned for further locking calls on
these locks.

This bug has gone unnoticed for so long due to some
combination of:
- applications never or infrequently converting between PR/CW
- recovery not occuring during these conversions
- if the recovery bug does occur, the caller may not notice,
  depending on what further locking calls are made, e.g. if
  the lock is simply unlocked it may go unnoticed

However, a core analysis from a recent gfs2 bug report points
to this broken code.

PR = Protected Read
CW = Concurrent Write
PR and CW are incompatible
PR and PR are compatible
CW and CW are compatible

Example 1

node C, resource R
granted: PR node A
granted: PR node B
granted: NL node C
granted: NL node D

- A sends convert PR-&gt;CW to C
- C fails before A gets a reply
- recovery occurs

At this point, A does not know if it still holds
the lock in PR, or if its conversion to CW was granted:
- If A's conversion to CW was granted, then another
  node's CW lock may also have been granted.
- If A's conversion to CW was not granted, it still
  holds a PR lock, and other nodes may also hold PR locks.

So, the new master of R cannot simply recreate the lock
from A using granted mode PR and requested mode CW.
The new master must look at all the recovered locks to
determine the correct granted modes, and ensure that all
the recovered locks are recreated in compatible states.

The correct lock recovery steps in this example are:
- node D becomes the new master of R
- node B sends D its lkb, granted PR
- node A sends D its lkb, convert PR-&gt;CW
- D determines the correct lock state is:
  granted: PR node B
  convert: PR-&gt;CW node A

The lkb sent by each node was recreated without
any change on the new master node.

Example 2

node C, resource R
granted: PR node A
granted: NL node C
granted: NL node D
waiting: CW node B

- A sends convert PR-&gt;CW to C
- C grants the conversion to CW for A
- C grants the waiting request for CW to B
- C sends granted message to B, but fails
  before it can send the granted message to A
- B receives the granted message from C

At this point:
- A believes it is converting PR-&gt;CW
- B believes it is holding a CW lock

The correct lock recovery steps in this example are:
- node D becomes the new master of R
- node A sends D its lkb, convert PR-&gt;CW
- node B sends D its lkb, granted CW
- D determins the correct lock state is:
  granted: CW node B
  granted: CW node A

The lkb sent by B is recreated without change,
but the lkb sent by A is changed because the
granted mode was not compatible.

Fixes to make this work correctly:

recover_convert_waiter: should not make any changes
to a converting lkb that is still waiting for a reply
message.  It was previously setting grmode to IV, which
is invalid state, so the lkb would not be handled
correctly by other code.

receive_rcom_lock_args: was checking the wrong lkb field
(wait_type instead of status) to determine if the lkb is
being converted, and in need of inspection for this special
recovery.  It was also setting grmode to IV in the lkb,
causing it to be mishandled by other code.
Now, this function just puts the lkb, directly as sent,
onto the convert queue of the resource being recovered,
and corrects it in recover_conversion() later, if needed.

recover_conversion: the job of this function is to detect
and correct lkb states for the special PR/CW conversions.
The new code now checks for recovered lkbs on the granted
queue with grmode PR or CW, and takes the real grmode from
that.  Then it looks for lkbs on the convert queue with an
incompatible grmode (i.e. grmode PR when the real grmode is
CW, or v.v.)  These converting lkbs need to be fixed.
They are fixed by temporarily setting their grmode to NL,
so that grmodes are not incompatible and won't confuse other
locking code.  The converting lkb will then be granted at
the end of recovery, replacing the temporary NL grmode.

Signed-off-by: Alexander Aring &lt;aahringo@redhat.com&gt;
Signed-off-by: David Teigland &lt;teigland@redhat.com&gt;
</content>
</entry>
<entry>
<title>dlm: make add_to_waiters() that it can't fail</title>
<updated>2024-10-04T15:31:31Z</updated>
<author>
<name>Alexander Aring</name>
<email>aahringo@redhat.com</email>
</author>
<published>2024-10-04T15:13:43Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=dfe5a6cc42043d8a9433aea6d2e49887b72bf3ed'/>
<id>urn:sha1:dfe5a6cc42043d8a9433aea6d2e49887b72bf3ed</id>
<content type='text'>
If add_to_waiters() fails we have a problem because the previous called
functions such as validate_lock_args() or validate_unlock_args() sets
specific lkb values that are set for a request, there exists no way back
to revert those changes. When there is a pending lock request the
original request arguments will be overwritten with unknown
consequences.

The good news are that I believe those cases that we fail in
add_to_waiters() can't happen or very unlikely to happen (only if the DLM
user does stupid API things), but if so we have the above mentioned
problem.

There are two conditions that will be removed here. The first one is the
-EINVAL case which contains is_overlap_unlock() or (is_overlap_cancel()
and mstype == DLM_MSG_CANCEL).

The is_overlap_unlock() is missing for the normal UNLOCK case which is
moved to validate_unlock_args(). The is_overlap_cancel() already happens
in validate_unlock_args() when DLM_LKF_CANCEL is set. In case of
validate_lock_args() we check on is_overlap() when it is not a new request,
on a new request the lkb is always new and does not have those values set.

The -EBUSY check can't happen in case as for non new lock requests (when
DLM_LKF_CONVERT is set) we already check in validate_lock_args() for
lkb_wait_type and is_overlap(). Then there is only
validate_unlock_args() that will never hit the default case because
dlm_unlock() will produce DLM_MSG_UNLOCK and DLM_MSG_CANCEL messages.

Signed-off-by: Alexander Aring &lt;aahringo@redhat.com&gt;
Signed-off-by: David Teigland &lt;teigland@redhat.com&gt;
</content>
</entry>
<entry>
<title>dlm: dlm_config_info config fields to unsigned int</title>
<updated>2024-10-04T15:31:31Z</updated>
<author>
<name>Alexander Aring</name>
<email>aahringo@redhat.com</email>
</author>
<published>2024-10-04T15:13:42Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=cc5580bca3a4103265a7fa1e081f853319b8aa0f'/>
<id>urn:sha1:cc5580bca3a4103265a7fa1e081f853319b8aa0f</id>
<content type='text'>
We are using kstrtouint() to parse common integer fields. This patch
will switch to use unsigned int instead of int as we are parsing
unsigned integer values.

Signed-off-by: Alexander Aring &lt;aahringo@redhat.com&gt;
Signed-off-by: David Teigland &lt;teigland@redhat.com&gt;
</content>
</entry>
<entry>
<title>dlm: use dlm_config as only cluster configuration</title>
<updated>2024-10-04T15:31:31Z</updated>
<author>
<name>Alexander Aring</name>
<email>aahringo@redhat.com</email>
</author>
<published>2024-10-04T15:13:41Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=76e342d32f7fc536939ce60ed92b3f5e4addad0f'/>
<id>urn:sha1:76e342d32f7fc536939ce60ed92b3f5e4addad0f</id>
<content type='text'>
This patch removes the configfs storage fields from the dlm_cluster
structure to store per cluster values. Those fields also exists for the
dlm_config global variable and get stored in both when setting configfs
values. To read values it will always be read out from the dlm_cluster
configfs structure but this patch changes it to only use the global
dlm_config variable. Storing them in two places makes no sense as both
are able to be changed under certain conditions during DLM runtime.

Signed-off-by: Alexander Aring &lt;aahringo@redhat.com&gt;
Signed-off-by: David Teigland &lt;teigland@redhat.com&gt;
</content>
</entry>
<entry>
<title>dlm: handle port as __be16 network byte order</title>
<updated>2024-10-04T15:31:31Z</updated>
<author>
<name>Alexander Aring</name>
<email>aahringo@redhat.com</email>
</author>
<published>2024-10-04T15:13:40Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=f92a5be5717ea32830f829b080652c5c862b85e7'/>
<id>urn:sha1:f92a5be5717ea32830f829b080652c5c862b85e7</id>
<content type='text'>
This patch handles the DLM listen port setting internally as byte order
as it is a value that is used as network byte on the wire. The user
space still sets this value as host byte order for configfs as we don't
break UAPI here.

Signed-off-by: Alexander Aring &lt;aahringo@redhat.com&gt;
Signed-off-by: David Teigland &lt;teigland@redhat.com&gt;
</content>
</entry>
<entry>
<title>dlm: disallow different configs nodeid storages</title>
<updated>2024-10-04T15:31:31Z</updated>
<author>
<name>Alexander Aring</name>
<email>aahringo@redhat.com</email>
</author>
<published>2024-10-04T15:13:39Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=7138c7903468578f1ae57b1c7eac8b7082862995'/>
<id>urn:sha1:7138c7903468578f1ae57b1c7eac8b7082862995</id>
<content type='text'>
The DLM configfs path has usually a nodeid in it's directory path and
again a file to store the nodeid again in a separate storage. It is
forced that the user space will set both (the directory name and nodeid
file) storage to the same value if it doesn't do that we run in some
kind of broken state.

This patch will simply represent the file storage to it's upper
directory nodeid name. It will force the user now to use a valid
unsigned int as nodeid directory name and will ignore all nodeid writes
in the nodeid file storage as this will now always represent the upper
nodeid directory name.

Signed-off-by: Alexander Aring &lt;aahringo@redhat.com&gt;
Signed-off-by: David Teigland &lt;teigland@redhat.com&gt;
</content>
</entry>
<entry>
<title>dlm: fix possible lkb_resource null dereference</title>
<updated>2024-10-04T15:31:31Z</updated>
<author>
<name>Alexander Aring</name>
<email>aahringo@redhat.com</email>
</author>
<published>2024-10-04T15:13:38Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=b98333c67daf887c724cd692e88e2db9418c0861'/>
<id>urn:sha1:b98333c67daf887c724cd692e88e2db9418c0861</id>
<content type='text'>
This patch fixes a possible null pointer dereference when this function is
called from request_lock() as lkb-&gt;lkb_resource is not assigned yet,
only after validate_lock_args() by calling attach_lkb(). Another issue
is that a resource name could be a non printable bytearray and we cannot
assume to be ASCII coded.

The log functionality is probably never being hit when DLM is used in
normal way and no debug logging is enabled. The null pointer dereference
can only occur on a new created lkb that does not have the resource
assigned yet, it probably never hits the null pointer dereference but we
should be sure that other changes might not change this behaviour and we
actually can hit the mentioned null pointer dereference.

In this patch we just drop the printout of the resource name, the lkb id
is enough to make a possible connection to a resource name if this
exists.

Signed-off-by: Alexander Aring &lt;aahringo@redhat.com&gt;
Signed-off-by: David Teigland &lt;teigland@redhat.com&gt;
</content>
</entry>
<entry>
<title>dlm: fix swapped args sb_flags vs sb_status</title>
<updated>2024-10-04T15:31:31Z</updated>
<author>
<name>Alexander Aring</name>
<email>aahringo@redhat.com</email>
</author>
<published>2024-10-04T15:13:37Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=6d59f2fbfb18965f76ebcff40ab38da717cde798'/>
<id>urn:sha1:6d59f2fbfb18965f76ebcff40ab38da717cde798</id>
<content type='text'>
The arguments got swapped by commit 986ae3c2a8df ("dlm: fix race between
final callback and remove") fixing this now.

Fixes: 986ae3c2a8df ("dlm: fix race between final callback and remove")
Signed-off-by: Alexander Aring &lt;aahringo@redhat.com&gt;
Signed-off-by: David Teigland &lt;teigland@redhat.com&gt;
</content>
</entry>
<entry>
<title>[tree-wide] finally take no_llseek out</title>
<updated>2024-09-27T15:18:43Z</updated>
<author>
<name>Al Viro</name>
<email>viro@zeniv.linux.org.uk</email>
</author>
<published>2024-09-27T01:56:11Z</published>
<link rel='alternate' type='text/html' href='https://git.kobert.dev/pm24.git/commit/?id=cb787f4ac0c2e439ea8d7e6387b925f74576bdf8'/>
<id>urn:sha1:cb787f4ac0c2e439ea8d7e6387b925f74576bdf8</id>
<content type='text'>
no_llseek had been defined to NULL two years ago, in commit 868941b14441
("fs: remove no_llseek")

To quote that commit,

  At -rc1 we'll need do a mechanical removal of no_llseek -

  git grep -l -w no_llseek | grep -v porting.rst | while read i; do
	sed -i '/\&lt;no_llseek\&gt;/d' $i
  done

  would do it.

Unfortunately, that hadn't been done.  Linus, could you do that now, so
that we could finally put that thing to rest? All instances are of the
form
	.llseek = no_llseek,
so it's obviously safe.

Signed-off-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
</feed>
