summaryrefslogtreecommitdiff
path: root/fs/afs/vl_alias.c
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2020-06-05 16:26:36 -0700
committerLinus Torvalds <torvalds@linux-foundation.org>2020-06-05 16:26:36 -0700
commit9daa0a27a0bce6596be287fb1df372ff80bb1087 (patch)
tree0dae8f626d71339d79131a3f664abbe36547c5da /fs/afs/vl_alias.c
parent0b166a57e6222666292a481b742af92b50c3ba50 (diff)
parent8409f67b6437c4b327ee95a71081b9c7bfee0b00 (diff)
Merge tag 'afs-next-20200604' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull AFS updates from David Howells: "There's some core VFS changes which affect a couple of filesystems: - Make the inode hash table RCU safe and providing some RCU-safe accessor functions. The search can then be done without taking the inode_hash_lock. Care must be taken because the object may be being deleted and no wait is made. - Allow iunique() to avoid taking the inode_hash_lock. - Allow AFS's callback processing to avoid taking the inode_hash_lock when using the inode table to find an inode to notify. - Improve Ext4's time updating. Konstantin Khlebnikov said "For now, I've plugged this issue with try-lock in ext4 lazy time update. This solution is much better." Then there's a set of changes to make a number of improvements to the AFS driver: - Improve callback (ie. third party change notification) processing by: (a) Relying more on the fact we're doing this under RCU and by using fewer locks. This makes use of the RCU-based inode searching outlined above. (b) Moving to keeping volumes in a tree indexed by volume ID rather than a flat list. (c) Making the server and volume records logically part of the cell. This means that a server record now points directly at the cell and the tree of volumes is there. This removes an N:M mapping table, simplifying things. - Improve keeping NAT or firewall channels open for the server callbacks to reach the client by actively polling the fileserver on a timed basis, instead of only doing it when we have an operation to process. - Improving detection of delayed or lost callbacks by including the parent directory in the list of file IDs to be queried when doing a bulk status fetch from lookup. We can then check to see if our copy of the directory has changed under us without us getting notified. - Determine aliasing of cells (such as a cell that is pointed to be a DNS alias). This allows us to avoid having ambiguity due to apparently different cells using the same volume and file servers. - Improve the fileserver rotation to do more probing when it detects that all of the addresses to a server are listed as non-responsive. It's possible that an address that previously stopped responding has become responsive again. Beyond that, lay some foundations for making some calls asynchronous: - Turn the fileserver cursor struct into a general operation struct and hang the parameters off of that rather than keeping them in local variables and hang results off of that rather than the call struct. - Implement some general operation handling code and simplify the callers of operations that affect a volume or a volume component (such as a file). Most of the operation is now done by core code. - Operations are supplied with a table of operations to issue different variants of RPCs and to manage the completion, where all the required data is held in the operation object, thereby allowing these to be called from a workqueue. - Put the standard "if (begin), while(select), call op, end" sequence into a canned function that just emulates the current behaviour for now. There are also some fixes interspersed: - Don't let the EACCES from ICMP6 mapping reach the user as such, since it's confusing as to whether it's a filesystem error. Convert it to EHOSTUNREACH. - Don't use the epoch value acquired through probing a server. If we have two servers with the same UUID but in different cells, it's hard to draw conclusions from them having different epoch values. - Don't interpret the argument to the CB.ProbeUuid RPC as a fileserver UUID and look up a fileserver from it. - Deal with servers in different cells having the same UUIDs. In the event that a CB.InitCallBackState3 RPC is received, we have to break the callback promises for every server record matching that UUID. - Don't let afs_statfs return values that go below 0. - Don't use running fileserver probe state to make server selection and address selection decisions on. Only make decisions on final state as the running state is cleared at the start of probing" Acked-by: Al Viro <viro@zeniv.linux.org.uk> (fs/inode.c part) * tag 'afs-next-20200604' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: (27 commits) afs: Adjust the fileserver rotation algorithm to reprobe/retry more quickly afs: Show more a bit more server state in /proc/net/afs/servers afs: Don't use probe running state to make decisions outside probe code afs: Fix afs_statfs() to not let the values go below zero afs: Fix the by-UUID server tree to allow servers with the same UUID afs: Reorganise volume and server trees to be rooted on the cell afs: Add a tracepoint to track the lifetime of the afs_volume struct afs: Detect cell aliases 3 - YFS Cells with a canonical cell name op afs: Detect cell aliases 2 - Cells with no root volumes afs: Detect cell aliases 1 - Cells with root volumes afs: Implement client support for the YFSVL.GetCellName RPC op afs: Retain more of the VLDB record for alias detection afs: Fix handling of CB.ProbeUuid cache manager op afs: Don't get epoch from a server because it may be ambiguous afs: Build an abstraction around an "operation" concept afs: Rename struct afs_fs_cursor to afs_operation afs: Remove the error argument from afs_protocol_error() afs: Set error flag rather than return error from file status decode afs: Make callback processing more efficient. afs: Show more information in /proc/net/afs/servers ...
Diffstat (limited to 'fs/afs/vl_alias.c')
-rw-r--r--fs/afs/vl_alias.c382
1 files changed, 382 insertions, 0 deletions
diff --git a/fs/afs/vl_alias.c b/fs/afs/vl_alias.c
new file mode 100644
index 000000000000..093895c49c21
--- /dev/null
+++ b/fs/afs/vl_alias.c
@@ -0,0 +1,382 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* AFS cell alias detection
+ *
+ * Copyright (C) 2020 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+
+#include <linux/slab.h>
+#include <linux/sched.h>
+#include <linux/namei.h>
+#include <keys/rxrpc-type.h>
+#include "internal.h"
+
+/*
+ * Sample a volume.
+ */
+static struct afs_volume *afs_sample_volume(struct afs_cell *cell, struct key *key,
+ const char *name, unsigned int namelen)
+{
+ struct afs_volume *volume;
+ struct afs_fs_context fc = {
+ .type = 0, /* Explicitly leave it to the VLDB */
+ .volnamesz = namelen,
+ .volname = name,
+ .net = cell->net,
+ .cell = cell,
+ .key = key, /* This might need to be something */
+ };
+
+ volume = afs_create_volume(&fc);
+ _leave(" = %px", volume);
+ return volume;
+}
+
+/*
+ * Compare two addresses.
+ */
+static int afs_compare_addrs(const struct sockaddr_rxrpc *srx_a,
+ const struct sockaddr_rxrpc *srx_b)
+{
+ short port_a, port_b;
+ int addr_a, addr_b, diff;
+
+ diff = (short)srx_a->transport_type - (short)srx_b->transport_type;
+ if (diff)
+ goto out;
+
+ switch (srx_a->transport_type) {
+ case AF_INET: {
+ const struct sockaddr_in *a = &srx_a->transport.sin;
+ const struct sockaddr_in *b = &srx_b->transport.sin;
+ addr_a = ntohl(a->sin_addr.s_addr);
+ addr_b = ntohl(b->sin_addr.s_addr);
+ diff = addr_a - addr_b;
+ if (diff == 0) {
+ port_a = ntohs(a->sin_port);
+ port_b = ntohs(b->sin_port);
+ diff = port_a - port_b;
+ }
+ break;
+ }
+
+ case AF_INET6: {
+ const struct sockaddr_in6 *a = &srx_a->transport.sin6;
+ const struct sockaddr_in6 *b = &srx_b->transport.sin6;
+ diff = memcmp(&a->sin6_addr, &b->sin6_addr, 16);
+ if (diff == 0) {
+ port_a = ntohs(a->sin6_port);
+ port_b = ntohs(b->sin6_port);
+ diff = port_a - port_b;
+ }
+ break;
+ }
+
+ default:
+ BUG();
+ }
+
+out:
+ return diff;
+}
+
+/*
+ * Compare the address lists of a pair of fileservers.
+ */
+static int afs_compare_fs_alists(const struct afs_server *server_a,
+ const struct afs_server *server_b)
+{
+ const struct afs_addr_list *la, *lb;
+ int a = 0, b = 0, addr_matches = 0;
+
+ la = rcu_dereference(server_a->addresses);
+ lb = rcu_dereference(server_b->addresses);
+
+ while (a < la->nr_addrs && b < lb->nr_addrs) {
+ const struct sockaddr_rxrpc *srx_a = &la->addrs[a];
+ const struct sockaddr_rxrpc *srx_b = &lb->addrs[b];
+ int diff = afs_compare_addrs(srx_a, srx_b);
+
+ if (diff < 0) {
+ a++;
+ } else if (diff > 0) {
+ b++;
+ } else {
+ addr_matches++;
+ a++;
+ b++;
+ }
+ }
+
+ return addr_matches;
+}
+
+/*
+ * Compare the fileserver lists of two volumes. The server lists are sorted in
+ * order of ascending UUID.
+ */
+static int afs_compare_volume_slists(const struct afs_volume *vol_a,
+ const struct afs_volume *vol_b)
+{
+ const struct afs_server_list *la, *lb;
+ int i, a = 0, b = 0, uuid_matches = 0, addr_matches = 0;
+
+ la = rcu_dereference(vol_a->servers);
+ lb = rcu_dereference(vol_b->servers);
+
+ for (i = 0; i < AFS_MAXTYPES; i++)
+ if (la->vids[i] != lb->vids[i])
+ return 0;
+
+ while (a < la->nr_servers && b < lb->nr_servers) {
+ const struct afs_server *server_a = la->servers[a].server;
+ const struct afs_server *server_b = lb->servers[b].server;
+ int diff = memcmp(&server_a->uuid, &server_b->uuid, sizeof(uuid_t));
+
+ if (diff < 0) {
+ a++;
+ } else if (diff > 0) {
+ b++;
+ } else {
+ uuid_matches++;
+ addr_matches += afs_compare_fs_alists(server_a, server_b);
+ a++;
+ b++;
+ }
+ }
+
+ _leave(" = %d [um %d]", addr_matches, uuid_matches);
+ return addr_matches;
+}
+
+/*
+ * Compare root.cell volumes.
+ */
+static int afs_compare_cell_roots(struct afs_cell *cell)
+{
+ struct afs_cell *p;
+
+ _enter("");
+
+ rcu_read_lock();
+
+ hlist_for_each_entry_rcu(p, &cell->net->proc_cells, proc_link) {
+ if (p == cell || p->alias_of)
+ continue;
+ if (!p->root_volume)
+ continue; /* Ignore cells that don't have a root.cell volume. */
+
+ if (afs_compare_volume_slists(cell->root_volume, p->root_volume) != 0)
+ goto is_alias;
+ }
+
+ rcu_read_unlock();
+ _leave(" = 0");
+ return 0;
+
+is_alias:
+ rcu_read_unlock();
+ cell->alias_of = afs_get_cell(p);
+ return 1;
+}
+
+/*
+ * Query the new cell for a volume from a cell we're already using.
+ */
+static int afs_query_for_alias_one(struct afs_cell *cell, struct key *key,
+ struct afs_cell *p)
+{
+ struct afs_volume *volume, *pvol = NULL;
+ int ret;
+
+ /* Arbitrarily pick a volume from the list. */
+ read_seqlock_excl(&p->volume_lock);
+ if (!RB_EMPTY_ROOT(&p->volumes))
+ pvol = afs_get_volume(rb_entry(p->volumes.rb_node,
+ struct afs_volume, cell_node),
+ afs_volume_trace_get_query_alias);
+ read_sequnlock_excl(&p->volume_lock);
+ if (!pvol)
+ return 0;
+
+ _enter("%s:%s", cell->name, pvol->name);
+
+ /* And see if it's in the new cell. */
+ volume = afs_sample_volume(cell, key, pvol->name, pvol->name_len);
+ if (IS_ERR(volume)) {
+ afs_put_volume(cell->net, pvol, afs_volume_trace_put_query_alias);
+ if (PTR_ERR(volume) != -ENOMEDIUM)
+ return PTR_ERR(volume);
+ /* That volume is not in the new cell, so not an alias */
+ return 0;
+ }
+
+ /* The new cell has a like-named volume also - compare volume ID,
+ * server and address lists.
+ */
+ ret = 0;
+ if (pvol->vid == volume->vid) {
+ rcu_read_lock();
+ if (afs_compare_volume_slists(volume, pvol))
+ ret = 1;
+ rcu_read_unlock();
+ }
+
+ afs_put_volume(cell->net, volume, afs_volume_trace_put_query_alias);
+ afs_put_volume(cell->net, pvol, afs_volume_trace_put_query_alias);
+ return ret;
+}
+
+/*
+ * Query the new cell for volumes we know exist in cells we're already using.
+ */
+static int afs_query_for_alias(struct afs_cell *cell, struct key *key)
+{
+ struct afs_cell *p;
+
+ _enter("%s", cell->name);
+
+ if (mutex_lock_interruptible(&cell->net->proc_cells_lock) < 0)
+ return -ERESTARTSYS;
+
+ hlist_for_each_entry(p, &cell->net->proc_cells, proc_link) {
+ if (p == cell || p->alias_of)
+ continue;
+ if (RB_EMPTY_ROOT(&p->volumes))
+ continue;
+ if (p->root_volume)
+ continue; /* Ignore cells that have a root.cell volume. */
+ afs_get_cell(p);
+ mutex_unlock(&cell->net->proc_cells_lock);
+
+ if (afs_query_for_alias_one(cell, key, p) != 0)
+ goto is_alias;
+
+ if (mutex_lock_interruptible(&cell->net->proc_cells_lock) < 0) {
+ afs_put_cell(cell->net, p);
+ return -ERESTARTSYS;
+ }
+
+ afs_put_cell(cell->net, p);
+ }
+
+ mutex_unlock(&cell->net->proc_cells_lock);
+ _leave(" = 0");
+ return 0;
+
+is_alias:
+ cell->alias_of = p; /* Transfer our ref */
+ return 1;
+}
+
+/*
+ * Look up a VLDB record for a volume.
+ */
+static char *afs_vl_get_cell_name(struct afs_cell *cell, struct key *key)
+{
+ struct afs_vl_cursor vc;
+ char *cell_name = ERR_PTR(-EDESTADDRREQ);
+ bool skipped = false, not_skipped = false;
+ int ret;
+
+ if (!afs_begin_vlserver_operation(&vc, cell, key))
+ return ERR_PTR(-ERESTARTSYS);
+
+ while (afs_select_vlserver(&vc)) {
+ if (!test_bit(AFS_VLSERVER_FL_IS_YFS, &vc.server->flags)) {
+ vc.ac.error = -EOPNOTSUPP;
+ skipped = true;
+ continue;
+ }
+ not_skipped = true;
+ cell_name = afs_yfsvl_get_cell_name(&vc);
+ }
+
+ ret = afs_end_vlserver_operation(&vc);
+ if (skipped && !not_skipped)
+ ret = -EOPNOTSUPP;
+ return ret < 0 ? ERR_PTR(ret) : cell_name;
+}
+
+static int yfs_check_canonical_cell_name(struct afs_cell *cell, struct key *key)
+{
+ struct afs_cell *master;
+ char *cell_name;
+
+ cell_name = afs_vl_get_cell_name(cell, key);
+ if (IS_ERR(cell_name))
+ return PTR_ERR(cell_name);
+
+ if (strcmp(cell_name, cell->name) == 0) {
+ kfree(cell_name);
+ return 0;
+ }
+
+ master = afs_lookup_cell(cell->net, cell_name, strlen(cell_name),
+ NULL, false);
+ kfree(cell_name);
+ if (IS_ERR(master))
+ return PTR_ERR(master);
+
+ cell->alias_of = master; /* Transfer our ref */
+ return 1;
+}
+
+static int afs_do_cell_detect_alias(struct afs_cell *cell, struct key *key)
+{
+ struct afs_volume *root_volume;
+ int ret;
+
+ _enter("%s", cell->name);
+
+ ret = yfs_check_canonical_cell_name(cell, key);
+ if (ret != -EOPNOTSUPP)
+ return ret;
+
+ /* Try and get the root.cell volume for comparison with other cells */
+ root_volume = afs_sample_volume(cell, key, "root.cell", 9);
+ if (!IS_ERR(root_volume)) {
+ cell->root_volume = root_volume;
+ return afs_compare_cell_roots(cell);
+ }
+
+ if (PTR_ERR(root_volume) != -ENOMEDIUM)
+ return PTR_ERR(root_volume);
+
+ /* Okay, this cell doesn't have an root.cell volume. We need to
+ * locate some other random volume and use that to check.
+ */
+ return afs_query_for_alias(cell, key);
+}
+
+/*
+ * Check to see if a new cell is an alias of a cell we already have. At this
+ * point we have the cell's volume server list.
+ *
+ * Returns 0 if we didn't detect an alias, 1 if we found an alias and an error
+ * if we had problems gathering the data required. In the case the we did
+ * detect an alias, cell->alias_of is set to point to the assumed master.
+ */
+int afs_cell_detect_alias(struct afs_cell *cell, struct key *key)
+{
+ struct afs_net *net = cell->net;
+ int ret;
+
+ if (mutex_lock_interruptible(&net->cells_alias_lock) < 0)
+ return -ERESTARTSYS;
+
+ if (test_bit(AFS_CELL_FL_CHECK_ALIAS, &cell->flags)) {
+ ret = afs_do_cell_detect_alias(cell, key);
+ if (ret >= 0)
+ clear_bit_unlock(AFS_CELL_FL_CHECK_ALIAS, &cell->flags);
+ } else {
+ ret = cell->alias_of ? 1 : 0;
+ }
+
+ mutex_unlock(&net->cells_alias_lock);
+
+ if (ret == 1)
+ pr_notice("kAFS: Cell %s is an alias of %s\n",
+ cell->name, cell->alias_of->name);
+ return ret;
+}