pm24.git/fs/dlm, branch v6.13

dlm: fix dlm_recover_members refcount on error

2024-11-18T16:05:57Z

If dlm_recover_members() fails we don't drop the references of the previous created root_list that holds and keep all rsbs alive during the recovery. It might be not an unlikely event because ping_members() could run into an -EINTR if another recovery progress was triggered again. Fixes: 3a747f4a2ee8 ("dlm: move rsb root_list to ls_recover() stack") Signed-off-by: Alexander Aring Signed-off-by: David Teigland

dlm: fix recovery of middle conversions

2024-11-15T19:39:36Z

In one special case, recovery is unable to reliably rebuild lock state by simply recreating lkb structs as sent from the lock holders. That case is when the lkb's include conversions between PR and CW modes. The recovery code has always recognized this special case, but the implemention has always been broken, and would set invalid modes in recovered lkb's. Unpredictable or bogus errors could then be returned for further locking calls on these locks. This bug has gone unnoticed for so long due to some combination of: - applications never or infrequently converting between PR/CW - recovery not occuring during these conversions - if the recovery bug does occur, the caller may not notice, depending on what further locking calls are made, e.g. if the lock is simply unlocked it may go unnoticed However, a core analysis from a recent gfs2 bug report points to this broken code. PR = Protected Read CW = Concurrent Write PR and CW are incompatible PR and PR are compatible CW and CW are compatible Example 1 node C, resource R granted: PR node A granted: PR node B granted: NL node C granted: NL node D - A sends convert PR->CW to C - C fails before A gets a reply - recovery occurs At this point, A does not know if it still holds the lock in PR, or if its conversion to CW was granted: - If A's conversion to CW was granted, then another node's CW lock may also have been granted. - If A's conversion to CW was not granted, it still holds a PR lock, and other nodes may also hold PR locks. So, the new master of R cannot simply recreate the lock from A using granted mode PR and requested mode CW. The new master must look at all the recovered locks to determine the correct granted modes, and ensure that all the recovered locks are recreated in compatible states. The correct lock recovery steps in this example are: - node D becomes the new master of R - node B sends D its lkb, granted PR - node A sends D its lkb, convert PR->CW - D determines the correct lock state is: granted: PR node B convert: PR->CW node A The lkb sent by each node was recreated without any change on the new master node. Example 2 node C, resource R granted: PR node A granted: NL node C granted: NL node D waiting: CW node B - A sends convert PR->CW to C - C grants the conversion to CW for A - C grants the waiting request for CW to B - C sends granted message to B, but fails before it can send the granted message to A - B receives the granted message from C At this point: - A believes it is converting PR->CW - B believes it is holding a CW lock The correct lock recovery steps in this example are: - node D becomes the new master of R - node A sends D its lkb, convert PR->CW - node B sends D its lkb, granted CW - D determins the correct lock state is: granted: CW node B granted: CW node A The lkb sent by B is recreated without change, but the lkb sent by A is changed because the granted mode was not compatible. Fixes to make this work correctly: recover_convert_waiter: should not make any changes to a converting lkb that is still waiting for a reply message. It was previously setting grmode to IV, which is invalid state, so the lkb would not be handled correctly by other code. receive_rcom_lock_args: was checking the wrong lkb field (wait_type instead of status) to determine if the lkb is being converted, and in need of inspection for this special recovery. It was also setting grmode to IV in the lkb, causing it to be mishandled by other code. Now, this function just puts the lkb, directly as sent, onto the convert queue of the resource being recovered, and corrects it in recover_conversion() later, if needed. recover_conversion: the job of this function is to detect and correct lkb states for the special PR/CW conversions. The new code now checks for recovered lkbs on the granted queue with grmode PR or CW, and takes the real grmode from that. Then it looks for lkbs on the convert queue with an incompatible grmode (i.e. grmode PR when the real grmode is CW, or v.v.) These converting lkbs need to be fixed. They are fixed by temporarily setting their grmode to NL, so that grmodes are not incompatible and won't confuse other locking code. The converting lkb will then be granted at the end of recovery, replacing the temporary NL grmode. Signed-off-by: Alexander Aring Signed-off-by: David Teigland

dlm: make add_to_waiters() that it can't fail

2024-10-04T15:31:31Z

If add_to_waiters() fails we have a problem because the previous called functions such as validate_lock_args() or validate_unlock_args() sets specific lkb values that are set for a request, there exists no way back to revert those changes. When there is a pending lock request the original request arguments will be overwritten with unknown consequences. The good news are that I believe those cases that we fail in add_to_waiters() can't happen or very unlikely to happen (only if the DLM user does stupid API things), but if so we have the above mentioned problem. There are two conditions that will be removed here. The first one is the -EINVAL case which contains is_overlap_unlock() or (is_overlap_cancel() and mstype == DLM_MSG_CANCEL). The is_overlap_unlock() is missing for the normal UNLOCK case which is moved to validate_unlock_args(). The is_overlap_cancel() already happens in validate_unlock_args() when DLM_LKF_CANCEL is set. In case of validate_lock_args() we check on is_overlap() when it is not a new request, on a new request the lkb is always new and does not have those values set. The -EBUSY check can't happen in case as for non new lock requests (when DLM_LKF_CONVERT is set) we already check in validate_lock_args() for lkb_wait_type and is_overlap(). Then there is only validate_unlock_args() that will never hit the default case because dlm_unlock() will produce DLM_MSG_UNLOCK and DLM_MSG_CANCEL messages. Signed-off-by: Alexander Aring Signed-off-by: David Teigland

dlm: dlm_config_info config fields to unsigned int

2024-10-04T15:31:31Z

We are using kstrtouint() to parse common integer fields. This patch will switch to use unsigned int instead of int as we are parsing unsigned integer values. Signed-off-by: Alexander Aring Signed-off-by: David Teigland

dlm: use dlm_config as only cluster configuration

2024-10-04T15:31:31Z

This patch removes the configfs storage fields from the dlm_cluster structure to store per cluster values. Those fields also exists for the dlm_config global variable and get stored in both when setting configfs values. To read values it will always be read out from the dlm_cluster configfs structure but this patch changes it to only use the global dlm_config variable. Storing them in two places makes no sense as both are able to be changed under certain conditions during DLM runtime. Signed-off-by: Alexander Aring Signed-off-by: David Teigland

dlm: handle port as __be16 network byte order

2024-10-04T15:31:31Z

This patch handles the DLM listen port setting internally as byte order as it is a value that is used as network byte on the wire. The user space still sets this value as host byte order for configfs as we don't break UAPI here. Signed-off-by: Alexander Aring Signed-off-by: David Teigland

dlm: disallow different configs nodeid storages

2024-10-04T15:31:31Z

The DLM configfs path has usually a nodeid in it's directory path and again a file to store the nodeid again in a separate storage. It is forced that the user space will set both (the directory name and nodeid file) storage to the same value if it doesn't do that we run in some kind of broken state. This patch will simply represent the file storage to it's upper directory nodeid name. It will force the user now to use a valid unsigned int as nodeid directory name and will ignore all nodeid writes in the nodeid file storage as this will now always represent the upper nodeid directory name. Signed-off-by: Alexander Aring Signed-off-by: David Teigland

dlm: fix possible lkb_resource null dereference

2024-10-04T15:31:31Z

This patch fixes a possible null pointer dereference when this function is called from request_lock() as lkb->lkb_resource is not assigned yet, only after validate_lock_args() by calling attach_lkb(). Another issue is that a resource name could be a non printable bytearray and we cannot assume to be ASCII coded. The log functionality is probably never being hit when DLM is used in normal way and no debug logging is enabled. The null pointer dereference can only occur on a new created lkb that does not have the resource assigned yet, it probably never hits the null pointer dereference but we should be sure that other changes might not change this behaviour and we actually can hit the mentioned null pointer dereference. In this patch we just drop the printout of the resource name, the lkb id is enough to make a possible connection to a resource name if this exists. Signed-off-by: Alexander Aring Signed-off-by: David Teigland

dlm: fix swapped args sb_flags vs sb_status

2024-10-04T15:31:31Z

The arguments got swapped by commit 986ae3c2a8df ("dlm: fix race between final callback and remove") fixing this now. Fixes: 986ae3c2a8df ("dlm: fix race between final callback and remove") Signed-off-by: Alexander Aring Signed-off-by: David Teigland

[tree-wide] finally take no_llseek out

2024-09-27T15:18:43Z

no_llseek had been defined to NULL two years ago, in commit 868941b14441 ("fs: remove no_llseek") To quote that commit, At -rc1 we'll need do a mechanical removal of no_llseek - git grep -l -w no_llseek | grep -v porting.rst | while read i; do sed -i '/\/d' $i done would do it. Unfortunately, that hadn't been done. Linus, could you do that now, so that we could finally put that thing to rest? All instances are of the form .llseek = no_llseek, so it's obviously safe. Signed-off-by: Al Viro Signed-off-by: Linus Torvalds