summaryrefslogtreecommitdiff
path: root/fs/afs/addr_list.c
AgeCommit message (Collapse)Author
2018-10-24afs: Probe multiple fileservers simultaneouslyDavid Howells
Send probes to all the unprobed fileservers in a fileserver list on all addresses simultaneously in an attempt to find out the fastest route whilst not getting stuck for 20s on any server or address that we don't get a reply from. This alleviates the problem whereby attempting to access a new server can take a long time because the rotation algorithm ends up rotating through all servers and addresses until it finds one that responds. Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24afs: Eliminate the address pointer from the address list cursorDavid Howells
Eliminate the address pointer from the address list cursor as it's redundant (ac->addrs[ac->index] can be used to find the same address) and address lists must be replaced rather than being rearranged, so is of limited value. Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24afs: Allow dumping of server cursor on operation failureDavid Howells
Provide an option to allow the file or volume location server cursor to be dumped if the rotation routine falls off the end without managing to contact a server. Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24afs: Implement VL server rotationDavid Howells
Track VL servers as independent entities rather than lumping all their addresses together into one set and implement server-level rotation by: (1) Add the concept of a VL server list, where each server has its own separate address list. This code is similar to the FS server list. (2) Use the DNS resolver to retrieve a set of servers and their associated addresses, ports, preference and weight ratings. (3) In the case of a legacy DNS resolver or an address list given directly through /proc/net/afs/cells, create a list containing just a dummy server record and attach all the addresses to that. (4) Implement a simple rotation policy, for the moment ignoring the priorities and weights assigned to the servers. (5) Show the address list through /proc/net/afs/<cell>/vlservers. This also displays the source and status of the data as indicated by the upcall. Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-24afs: Improve FS server rotation error handlingDavid Howells
Improve the error handling in FS server rotation by: (1) Cache the latest useful error value for the fs operation as a whole in struct afs_fs_cursor separately from the error cached in the afs_addr_cursor struct. The one in the address cursor gets clobbered occasionally. Copy over the error to the fs operation only when it's something we'd be interested in passing to userspace. (2) Make it so that EDESTADDRREQ is the default that is seen only if no addresses are available to be accessed. (3) When calling utility functions, such as checking a volume status or probing a fileserver, don't let a successful result clobber the cached error in the cursor; instead, stash the result in a temporary variable until it has been assessed. (4) Don't return ETIMEDOUT or ETIME if a better error, such as ENETUNREACH, is already cached. (5) On leaving the rotation loop, turn any remote abort code into a more useful error than ECONNABORTED. Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation") Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-04rxrpc: Use IPv4 addresses throught the IPv6David Howells
AF_RXRPC opens an IPv6 socket through which to send and receive network packets, both IPv6 and IPv4. It currently turns AF_INET addresses into AF_INET-as-AF_INET6 addresses based on an assumption that this was necessary; on further inspection of the code, however, it turns out that the IPv6 code just farms packets aimed at AF_INET addresses out to the IPv4 code. Fix AF_RXRPC to use AF_INET addresses directly when given them. Fixes: 7b674e390e51 ("rxrpc: Fix IPv6 support") Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-04afs: Sort address lists so that they are in logical ascending orderDavid Howells
Sort address lists so that they are in logical ascending order rather than being partially in ascending order of the BE representations of those values. Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-04afs: Always build address lists using the helper functionsDavid Howells
Make the address list string parser use the helper functions for adding addresses to an address list so that they end up appropriately sorted. This will better handles overruns and make them easier to compare. It also reduces the number of places that addresses are handled, making it easier to fix the handling. Signed-off-by: David Howells <dhowells@redhat.com>
2018-10-04afs: Do better max capacity handling on address listsDavid Howells
Note the maximum allocated capacity in an afs_addr_list struct and discard addresses that would exceed it in afs_merge_fs_addr{4,6}(). Also, since the current maximum capacity is less than 255, reduce the relevant members to bytes. Signed-off-by: David Howells <dhowells@redhat.com>
2018-06-16Merge branch 'afs-proc' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull AFS updates from Al Viro: "Assorted AFS stuff - ended up in vfs.git since most of that consists of David's AFS-related followups to Christoph's procfs series" * 'afs-proc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: afs: Optimise callback breaking by not repeating volume lookup afs: Display manually added cells in dynamic root mount afs: Enable IPv6 DNS lookups afs: Show all of a server's addresses in /proc/fs/afs/servers afs: Handle CONFIG_PROC_FS=n proc: Make inline name size calculation automatic afs: Implement network namespacing afs: Mark afs_net::ws_cell as __rcu and set using rcu functions afs: Fix a Sparse warning in xdr_decode_AFSFetchStatus() proc: Add a way to make network proc files writable afs: Rearrange fs/afs/proc.c to remove remaining predeclarations. afs: Rearrange fs/afs/proc.c to move the show routines up afs: Rearrange fs/afs/proc.c by moving fops and open functions down afs: Move /proc management functions to the end of the file
2018-06-15afs: Enable IPv6 DNS lookupsDavid Howells
Remove the restriction on DNS lookup upcalls that prevents ipv6 addresses from being looked up. Signed-off-by: David Howells <dhowells@redhat.com>
2018-06-06Merge tag 'overflow-v4.18-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull overflow updates from Kees Cook: "This adds the new overflow checking helpers and adds them to the 2-factor argument allocators. And this adds the saturating size helpers and does a treewide replacement for the struct_size() usage. Additionally this adds the overflow testing modules to make sure everything works. I'm still working on the treewide replacements for allocators with "simple" multiplied arguments: *alloc(a * b, ...) -> *alloc_array(a, b, ...) and *zalloc(a * b, ...) -> *calloc(a, b, ...) as well as the more complex cases, but that's separable from this portion of the series. I expect to have the rest sent before -rc1 closes; there are a lot of messy cases to clean up. Summary: - Introduce arithmetic overflow test helper functions (Rasmus) - Use overflow helpers in 2-factor allocators (Kees, Rasmus) - Introduce overflow test module (Rasmus, Kees) - Introduce saturating size helper functions (Matthew, Kees) - Treewide use of struct_size() for allocators (Kees)" * tag 'overflow-v4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: treewide: Use struct_size() for devm_kmalloc() and friends treewide: Use struct_size() for vmalloc()-family treewide: Use struct_size() for kmalloc()-family device: Use overflow helpers for devm_kmalloc() mm: Use overflow helpers in kvmalloc() mm: Use overflow helpers in kmalloc_array*() test_overflow: Add memory allocation overflow tests overflow.h: Add allocation size calculation helpers test_overflow: Report test failures test_overflow: macrofy some more, do more tests for free lib: add runtime test of check_*_overflow functions compiler.h: enable builtin overflow checkers and add fallback code
2018-06-06treewide: Use struct_size() for kmalloc()-familyKees Cook
One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct foo { int stuff; void *entry[]; }; instance = kmalloc(sizeof(struct foo) + sizeof(void *) * count, GFP_KERNEL); Instead of leaving these open-coded and prone to type mistakes, we can now use the new struct_size() helper: instance = kmalloc(struct_size(instance, entry, count), GFP_KERNEL); This patch makes the changes for kmalloc()-family (and kvmalloc()-family) uses. It was done via automatic conversion with manual review for the "CHECKME" non-standard cases noted below, using the following Coccinelle script: // pkey_cache = kmalloc(sizeof *pkey_cache + tprops->pkey_tbl_len * // sizeof *pkey_cache->table, GFP_KERNEL); @@ identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc"; expression GFP; identifier VAR, ELEMENT; expression COUNT; @@ - alloc(sizeof(*VAR) + COUNT * sizeof(*VAR->ELEMENT), GFP) + alloc(struct_size(VAR, ELEMENT, COUNT), GFP) // mr = kzalloc(sizeof(*mr) + m * sizeof(mr->map[0]), GFP_KERNEL); @@ identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc"; expression GFP; identifier VAR, ELEMENT; expression COUNT; @@ - alloc(sizeof(*VAR) + COUNT * sizeof(VAR->ELEMENT[0]), GFP) + alloc(struct_size(VAR, ELEMENT, COUNT), GFP) // Same pattern, but can't trivially locate the trailing element name, // or variable name. @@ identifier alloc =~ "kmalloc|kzalloc|kvmalloc|kvzalloc"; expression GFP; expression SOMETHING, COUNT, ELEMENT; @@ - alloc(sizeof(SOMETHING) + COUNT * sizeof(ELEMENT), GFP) + alloc(CHECKME_struct_size(&SOMETHING, ELEMENT, COUNT), GFP) Signed-off-by: Kees Cook <keescook@chromium.org>
2018-05-14afs: Fix address list parsingDavid Howells
The parsing of port specifiers in the address list obtained from the DNS resolution upcall doesn't work as in4_pton() and in6_pton() will fail on encountering an unexpected delimiter (in this case, the '+' marking the port number). However, in*_pton() can't be given multiple specifiers. Fix this by finding the delimiter in advance and not relying on in*_pton() to find the end of the address for us. Fixes: 8b2a464ced77 ("afs: Add an address list concept") Signed-off-by: David Howells <dhowells@redhat.com>
2018-04-09afs: Fix checker warningsDavid Howells
Fix warnings raised by checker, including: (*) Warnings raised by unequal comparison for the purposes of sorting, where the endianness doesn't matter: fs/afs/addr_list.c:246:21: warning: restricted __be16 degrades to integer fs/afs/addr_list.c:246:30: warning: restricted __be16 degrades to integer fs/afs/addr_list.c:248:21: warning: restricted __be32 degrades to integer fs/afs/addr_list.c:248:49: warning: restricted __be32 degrades to integer fs/afs/addr_list.c:283:21: warning: restricted __be16 degrades to integer fs/afs/addr_list.c:283:30: warning: restricted __be16 degrades to integer (*) afs_set_cb_interest() is not actually used and can be removed. (*) afs_cell_gc_delay() should be provided with a sysctl. (*) afs_cell_destroy() needs to use rcu_access_pointer() to read cell->vl_addrs. (*) afs_init_fs_cursor() should be static. (*) struct afs_vnode::permit_cache needs to be marked __rcu. (*) afs_server_rcu() needs to use rcu_access_pointer(). (*) afs_destroy_server() should use rcu_access_pointer() on server->addresses as the server object is no longer accessible. (*) afs_find_server() casts __be16/__be32 values to int in order to directly compare them for the purpose of finding a match in a list, but is should also annotate the cast with __force to avoid checker warnings. (*) afs_check_permit() accesses vnode->permit_cache outside of the RCU readlock, though it doesn't then access the value; the extraneous access is deleted. False positives: (*) Conditional locking around the code in xdr_decode_AFSFetchStatus. This can be dealt with in a separate patch. fs/afs/fsclient.c:148:9: warning: context imbalance in 'xdr_decode_AFSFetchStatus' - different lock contexts for basic block (*) Incorrect handling of seq-retry lock context balance: fs/afs/inode.c:455:38: warning: context imbalance in 'afs_getattr' - different lock contexts for basic block fs/afs/server.c:52:17: warning: context imbalance in 'afs_find_server' - different lock contexts for basic block fs/afs/server.c:128:17: warning: context imbalance in 'afs_find_server_by_uuid' - different lock contexts for basic block Errors: (*) afs_lookup_cell_rcu() needs to break out of the seq-retry loop, not go round again if it successfully found the workstation cell. (*) Fix UUID decode in afs_deliver_cb_probe_uuid(). (*) afs_cache_permit() has a missing rcu_read_unlock() before one of the jumps to the someone_else_changed_it label. Move the unlock to after the label. (*) afs_vl_get_addrs_u() is using ntohl() rather than htonl() when encoding to XDR. (*) afs_deliver_yfsvl_get_endpoints() is using htonl() rather than ntohl() when decoding from XDR. Signed-off-by: David Howells <dhowells@redhat.com>
2018-02-06afs: Fix missing cursor clearanceDavid Howells
afs_select_fileserver() ends the address cursor it is using in the case in which we get some sort of network error and run out of addresses to iterate through, before it jumps to try the next server. This also needs to be done when the server aborts with some sort of error that means we should try the next server. Fix this by: (1) Move the iterate_address afs_end_cursor() call to the next_server case. (2) End the cursor in the failed case. (3) Make afs_end_cursor() clear the ->begun flag and ->addr pointer in the address cursor. (4) Make afs_end_cursor() able to be called on an already cleared cursor. Without this, something like the following oops may occur: AFS: Assertion failed 18446612134397189888 == 0 is false 0xffff88007c279f00 == 0x0 is false ------------[ cut here ]------------ kernel BUG at fs/afs/rotate.c:360! RIP: 0010:afs_select_fileserver+0x79b/0xa30 [kafs] Call Trace: afs_statfs+0xcc/0x180 [kafs] ? p9_client_statfs+0x9e/0x110 [9pnet] ? _cond_resched+0x19/0x40 statfs_by_dentry+0x6d/0x90 vfs_statfs+0x1b/0xc0 user_statfs+0x4b/0x80 SYSC_statfs+0x15/0x30 SyS_statfs+0xe/0x10 entry_SYSCALL_64_fastpath+0x20/0x83 Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: stable@vger.kernel.org
2017-11-13afs: Make use of the YFS service upgrade to fully support IPv6David Howells
YFS VL servers offer an upgraded Volume Location service that can return IPv6 addresses to fileservers and volume servers in addition to IPv4 addresses using the YFSVL.GetEndpoints operation which we should use if it's available. To this end: (1) Make rxrpc_kernel_recv_data() return the call's current service ID so that the caller can detect service upgrade and see what the service was upgraded to. (2) When we see a VL server address we haven't seen before, send a VL.GetCapabilities operation to it with the service upgrade bit set. If we get an upgrade to the YFS VL service, change the service ID in the address list for that address to use the upgraded service and set a flag to note that this appears to be a YFS-compatible server. (3) If, when a server's addresses are being looked up, we note that we previously detected a YFS-compatible server, then send the YFSVL.GetEndpoints operation rather than VL.GetAddrsU. (4) Build a fileserver address list from the reply of YFSVL.GetEndpoints, including both IPv4 and IPv6 addresses. Volume server addresses are discarded. (5) The address list is sorted by address and port now, instead of just address. This allows multiple servers on the same host sitting on different ports. Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13afs: Overhaul volume and server record caching and fileserver rotationDavid Howells
The current code assumes that volumes and servers are per-cell and are never shared, but this is not enforced, and, indeed, public cells do exist that are aliases of each other. Further, an organisation can, say, set up a public cell and a private cell with overlapping, but not identical, sets of servers. The difference is purely in the database attached to the VL servers. The current code will malfunction if it sees a server in two cells as it assumes global address -> server record mappings and that each server is in just one cell. Further, each server may have multiple addresses - and may have addresses of different families (IPv4 and IPv6, say). To this end, the following structural changes are made: (1) Server record management is overhauled: (a) Server records are made independent of cell. The namespace keeps track of them, volume records have lists of them and each vnode has a server on which its callback interest currently resides. (b) The cell record no longer keeps a list of servers known to be in that cell. (c) The server records are now kept in a flat list because there's no single address to sort on. (d) Server records are now keyed by their UUID within the namespace. (e) The addresses for a server are obtained with the VL.GetAddrsU rather than with VL.GetEntryByName, using the server's UUID as a parameter. (f) Cached server records are garbage collected after a period of non-use and are counted out of existence before purging is allowed to complete. This protects the work functions against rmmod. (g) The servers list is now in /proc/fs/afs/servers. (2) Volume record management is overhauled: (a) An RCU-replaceable server list is introduced. This tracks both servers and their coresponding callback interests. (b) The superblock is now keyed on cell record and numeric volume ID. (c) The volume record is now tied to the superblock which mounts it, and is activated when mounted and deactivated when unmounted. This makes it easier to handle the cache cookie without causing a double-use in fscache. (d) The volume record is loaded from the VLDB using VL.GetEntryByNameU to get the server UUID list. (e) The volume name is updated if it is seen to have changed when the volume is updated (the update is keyed on the volume ID). (3) The vlocation record is got rid of and VLDB records are no longer cached. Sufficient information is stored in the volume record, though an update to a volume record is now no longer shared between related volumes (volumes come in bundles of three: R/W, R/O and backup). and the following procedural changes are made: (1) The fileserver cursor introduced previously is now fleshed out and used to iterate over fileservers and their addresses. (2) Volume status is checked during iteration, and the server list is replaced if a change is detected. (3) Server status is checked during iteration, and the address list is replaced if a change is detected. (4) The abort code is saved into the address list cursor and -ECONNABORTED returned in afs_make_call() if a remote abort happened rather than translating the abort into an error message. This allows actions to be taken depending on the abort code more easily. (a) If a VMOVED abort is seen then this is handled by rechecking the volume and restarting the iteration. (b) If a VBUSY, VRESTARTING or VSALVAGING abort is seen then this is handled by sleeping for a short period and retrying and/or trying other servers that might serve that volume. A message is also displayed once until the condition has cleared. (c) If a VOFFLINE abort is seen, then this is handled as VBUSY for the moment. (d) If a VNOVOL abort is seen, the volume is rechecked in the VLDB to see if it has been deleted; if not, the fileserver is probably indicating that the volume couldn't be attached and needs salvaging. (e) If statfs() sees one of these aborts, it does not sleep, but rather returns an error, so as not to block the umount program. (5) The fileserver iteration functions in vnode.c are now merged into their callers and more heavily macroised around the cursor. vnode.c is removed. (6) Operations on a particular vnode are serialised on that vnode because the server will lock that vnode whilst it operates on it, so a second op sent will just have to wait. (7) Fileservers are probed with FS.GetCapabilities before being used. This is where service upgrade will be done. (8) A callback interest on a fileserver is set up before an FS operation is performed and passed through to afs_make_call() so that it can be set on the vnode if the operation returns a callback. The callback interest is passed through to afs_iget() also so that it can be set there too. In general, record updating is done on an as-needed basis when we try to access servers, volumes or vnodes rather than offloading it to work items and special threads. Notes: (1) Pre AFS-3.4 servers are no longer supported, though this can be added back if necessary (AFS-3.4 was released in 1998). (2) VBUSY is retried forever for the moment at intervals of 1s. (3) /proc/fs/afs/<cell>/servers no longer exists. Signed-off-by: David Howells <dhowells@redhat.com>
2017-11-13afs: Add an address list conceptDavid Howells
Add an RCU replaceable address list structure to hold a list of server addresses. The list also holds the To this end: (1) A cell's VL server address list can be loaded directly via insmod or echo to /proc/fs/afs/cells or dynamically from a DNS query for AFSDB or SRV records. (2) Anyone wanting to use a cell's VL server address must wait until the cell record comes online and has tried to obtain some addresses. (3) An FS server's address list, for the moment, has a single entry that is the key to the server list. This will change in the future when a server is instead keyed on its UUID and the VL.GetAddrsU operation is used. (4) An 'address cursor' concept is introduced to handle iteration through the address list. This is passed to the afs_make_call() as, in the future, stuff (such as abort code) that doesn't outlast the call will be returned in it. In the future, we might want to annotate the list with information about how each address fares. We might then want to propagate such annotations over address list replacement. Whilst we're at it, we allow IPv6 addresses to be specified in colon-delimited lists by enclosing them in square brackets. Signed-off-by: David Howells <dhowells@redhat.com>