author | Linus Torvalds <torvalds@linux-foundation.org> | 2023-02-25 11:48:02 -0800 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2023-02-25 11:48:02 -0800 |
commit | 84cc6674b76ba2cdac0df8037b4d8a22a6fc1b77 (patch) | |
tree | 6669f09d4ebc3c551e0234abd2f4b82cefd47ff3 | |
parent | 49d575926890e6ada930bf6f06d62b2fde8fce95 (diff) | |
parent | deeacf35c922da579637f5db625af20baafc66ed (diff) |
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio updates from Michael Tsirkin:
- device feature provisioning in ifcvf, mlx5
- new SolidNET driver
- support for zoned block device in virtio blk
- numa support in virtio pmem
- VIRTIO_F_RING_RESET support in vhost-net
- more debugfs entries in mlx5
- resume support in vdpa
- completion batching in virtio blk
- cleanup of dma api use in vdpa
- now simulating more features in vdpa-sim
- documentation, features, fixes all over the place
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (64 commits)
vdpa/mlx5: support device features provisioning
vdpa/mlx5: make MTU/STATUS presence conditional on feature bits
vdpa: validate device feature provisioning against supported class
vdpa: validate provisioned device features against specified attribute
vdpa: conditionally read STATUS in config space
vdpa: fix improper error message when adding vdpa dev
vdpa/mlx5: Initialize CVQ iotlb spinlock
vdpa/mlx5: Don't clear mr struct on destroy MR
vdpa/mlx5: Directly assign memory key
tools/virtio: enable to build with retpoline
vringh: fix a typo in comments for vringh_kiov
vhost-vdpa: print warning when vhost_vdpa_alloc_domain fails
scsi: virtio_scsi: fix handling of kmalloc failure
vdpa: Fix a couple of spelling mistakes in some messages
vhost-net: support VIRTIO_F_RING_RESET
vhost-scsi: convert sysfs snprintf and sprintf to sysfs_emit
vdpa: mlx5: support per virtqueue dma device
vdpa: set dma mask for vDPA device
virtio-vdpa: support per vq dma device
vdpa: introduce get_vq_dma_device()
...
47 files changed, 3535 insertions, 502 deletions
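Among the changes summarized above, the zoned block device support in virtio blk extends the in header of a request: a plain request gets back a single status byte, while a zone append also gets back the sector where the data landed. The following is a minimal userspace sketch of that layout, not the driver code itself. `struct toy_in_hdr`, `enum toy_op`, `toy_in_hdr_len()` and `toy_status_aliases()` are hypothetical names introduced for illustration, and fixed-width C types stand in for the kernel's `u8` and `__le64`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace mirror of the in-header part of a request. */
struct toy_in_hdr {
	union {
		uint8_t status;			/* plain requests: one status byte */
		struct {
			uint8_t status;		/* must alias the byte above */
			uint8_t reserved[7];
			uint64_t append_sector;	/* little-endian on the wire */
		} zone_append_in_hdr;
	};
};

enum toy_op { TOY_OP_WRITE, TOY_OP_ZONE_APPEND };

/* Number of bytes the device may write back for a given request type. */
static size_t toy_in_hdr_len(enum toy_op op)
{
	if (op == TOY_OP_ZONE_APPEND)
		return sizeof(((struct toy_in_hdr *)0)->zone_append_in_hdr);
	return sizeof(((struct toy_in_hdr *)0)->status);
}

/* Returns 1 when both status fields start at the same offset. */
static int toy_status_aliases(void)
{
	return offsetof(struct toy_in_hdr, status) ==
	       offsetof(struct toy_in_hdr, zone_append_in_hdr.status);
}
```

Because both `status` fields share one offset, completion code can read the status byte the same way for every request type and only needs to look at `append_sector` for zone appends.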
diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst
index 7a2584ab63c4..ff9aa1afdc62 100644
--- a/Documentation/driver-api/index.rst
+++ b/Documentation/driver-api/index.rst
@@ -108,6 +108,7 @@ available subsections can be seen below.
    vfio-mediated-device
    vfio
    vfio-pci-device-specific-driver-acceptance
+   virtio/index
    xilinx/index
    xillybus
    zorro
diff --git a/Documentation/driver-api/virtio/index.rst b/Documentation/driver-api/virtio/index.rst
new file mode 100644
index 000000000000..528b14b291e3
--- /dev/null
+++ b/Documentation/driver-api/virtio/index.rst
@@ -0,0 +1,11 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======
+Virtio
+======
+
+.. toctree::
+   :maxdepth: 1
+
+   virtio
+   writing_virtio_drivers
diff --git a/Documentation/driver-api/virtio/virtio.rst b/Documentation/driver-api/virtio/virtio.rst
new file mode 100644
index 000000000000..7947b4ca690e
--- /dev/null
+++ b/Documentation/driver-api/virtio/virtio.rst
@@ -0,0 +1,145 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _virtio:
+
+===============
+Virtio on Linux
+===============
+
+Introduction
+============
+
+Virtio is an open standard that defines a protocol for communication
+between drivers and devices of different types, see Chapter 5 ("Device
+Types") of the virtio spec (`[1]`_). Originally developed as a standard
+for paravirtualized devices implemented by a hypervisor, it can be used
+to interface any compliant device (real or emulated) with a driver.
+
+For illustrative purposes, this document will focus on the common case
+of a Linux kernel running in a virtual machine and using paravirtualized
+devices provided by the hypervisor, which exposes them as virtio devices
+via standard mechanisms such as PCI.
+
+
+Device - Driver communication: virtqueues
+=========================================
+
+Although the virtio devices are really an abstraction layer in the
+hypervisor, they're exposed to the guest as if they are physical devices
+using a specific transport method -- PCI, MMIO or CCW -- that is
+orthogonal to the device itself. The virtio spec defines these transport
+methods in detail, including device discovery, capabilities and
+interrupt handling.
+
+The communication between the driver in the guest OS and the device in
+the hypervisor is done through shared memory (that's what makes virtio
+devices so efficient) using specialized data structures called
+virtqueues, which are actually ring buffers [#f1]_ of buffer descriptors
+similar to the ones used in a network device:
+
+.. kernel-doc:: include/uapi/linux/virtio_ring.h
+    :identifiers: struct vring_desc
+
+All the buffers the descriptors point to are allocated by the guest and
+used by the host either for reading or for writing but not for both.
+
+Refer to Chapter 2.5 ("Virtqueues") of the virtio spec (`[1]`_) for the
+reference definitions of virtqueues and "Virtqueues and virtio ring: How
+the data travels" blog post (`[2]`_) for an illustrated overview of how
+the host device and the guest driver communicate.
+
+The :c:type:`vring_virtqueue` struct models a virtqueue, including the
+ring buffers and management data. Embedded in this struct is the
+:c:type:`virtqueue` struct, which is the data structure that's
+ultimately used by virtio drivers:
+
+.. kernel-doc:: include/linux/virtio.h
+    :identifiers: struct virtqueue
+
+The callback function pointed by this struct is triggered when the
+device has consumed the buffers provided by the driver. More
+specifically, the trigger will be an interrupt issued by the hypervisor
+(see vring_interrupt()). Interrupt request handlers are registered for
+a virtqueue during the virtqueue setup process (transport-specific).
+
+.. kernel-doc:: drivers/virtio/virtio_ring.c
+    :identifiers: vring_interrupt
+
+
+Device discovery and probing
+============================
+
+In the kernel, the virtio core contains the virtio bus driver and
+transport-specific drivers like `virtio-pci` and `virtio-mmio`. Then
+there are individual virtio drivers for specific device types that are
+registered to the virtio bus driver.
+
+How a virtio device is found and configured by the kernel depends on how
+the hypervisor defines it. Taking the `QEMU virtio-console
+<https://gitlab.com/qemu-project/qemu/-/blob/master/hw/char/virtio-console.c>`__
+device as an example. When using PCI as a transport method, the device
+will present itself on the PCI bus with vendor 0x1af4 (Red Hat, Inc.)
+and device id 0x1003 (virtio console), as defined in the spec, so the
+kernel will detect it as it would do with any other PCI device.
+
+During the PCI enumeration process, if a device is found to match the
+virtio-pci driver (according to the virtio-pci device table, any PCI
+device with vendor id = 0x1af4)::
+
+	/* Qumranet donated their vendor ID for devices 0x1000 thru 0x10FF. */
+	static const struct pci_device_id virtio_pci_id_table[] = {
+		{ PCI_DEVICE(PCI_VENDOR_ID_REDHAT_QUMRANET, PCI_ANY_ID) },
+		{ 0 }
+	};
+
+then the virtio-pci driver is probed and, if the probing goes well, the
+device is registered to the virtio bus::
+
+	static int virtio_pci_probe(struct pci_dev *pci_dev,
+				    const struct pci_device_id *id)
+	{
+		...
+
+		if (force_legacy) {
+			rc = virtio_pci_legacy_probe(vp_dev);
+			/* Also try modern mode if we can't map BAR0 (no IO space). */
+			if (rc == -ENODEV || rc == -ENOMEM)
+				rc = virtio_pci_modern_probe(vp_dev);
+			if (rc)
+				goto err_probe;
+		} else {
+			rc = virtio_pci_modern_probe(vp_dev);
+			if (rc == -ENODEV)
+				rc = virtio_pci_legacy_probe(vp_dev);
+			if (rc)
+				goto err_probe;
+		}
+
+		...
+
+		rc = register_virtio_device(&vp_dev->vdev);
+
+When the device is registered to the virtio bus the kernel will look
+for a driver in the bus that can handle the device and call that
+driver's ``probe`` method.
+
+At this point, the virtqueues will be allocated and configured by
+calling the appropriate ``virtio_find`` helper function, such as
+virtio_find_single_vq() or virtio_find_vqs(), which will end up calling
+a transport-specific ``find_vqs`` method.
+
+
+References
+==========
+
+_`[1]` Virtio Spec v1.2:
+https://docs.oasis-open.org/virtio/virtio/v1.2/virtio-v1.2.html
+
+.. Check for later versions of the spec as well.
+
+_`[2]` Virtqueues and virtio ring: How the data travels
+https://www.redhat.com/en/blog/virtqueues-and-virtio-ring-how-data-travels
+
+.. rubric:: Footnotes
+
+.. [#f1] that's why they may be also referred to as virtrings.
diff --git a/Documentation/driver-api/virtio/writing_virtio_drivers.rst b/Documentation/driver-api/virtio/writing_virtio_drivers.rst
new file mode 100644
index 000000000000..e14c58796d25
--- /dev/null
+++ b/Documentation/driver-api/virtio/writing_virtio_drivers.rst
@@ -0,0 +1,197 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _writing_virtio_drivers:
+
+======================
+Writing Virtio Drivers
+======================
+
+Introduction
+============
+
+This document serves as a basic guideline for driver programmers that
+need to hack a new virtio driver or understand the essentials of the
+existing ones. See :ref:`Virtio on Linux <virtio>` for a general
+overview of virtio.
+
+
+Driver boilerplate
+==================
+
+As a bare minimum, a virtio driver needs to register in the virtio bus
+and configure the virtqueues for the device according to its spec, the
+configuration of the virtqueues in the driver side must match the
+virtqueue definitions in the device. A basic driver skeleton could look
+like this::
+
+	#include <linux/virtio.h>
+	#include <linux/virtio_ids.h>
+	#include <linux/virtio_config.h>
+	#include <linux/module.h>
+
+	/* device private data (one per device) */
+	struct virtio_dummy_dev {
+		struct virtqueue *vq;
+	};
+
+	static void virtio_dummy_recv_cb(struct virtqueue *vq)
+	{
+		struct virtio_dummy_dev *dev = vq->vdev->priv;
+		char *buf;
+		unsigned int len;
+
+		while ((buf = virtqueue_get_buf(dev->vq, &len)) != NULL) {
+			/* process the received data */
+		}
+	}
+
+	static int virtio_dummy_probe(struct virtio_device *vdev)
+	{
+		struct virtio_dummy_dev *dev = NULL;
+
+		/* initialize device data */
+		dev = kzalloc(sizeof(struct virtio_dummy_dev), GFP_KERNEL);
+		if (!dev)
+			return -ENOMEM;
+
+		/* the device has a single virtqueue */
+		dev->vq = virtio_find_single_vq(vdev, virtio_dummy_recv_cb, "input");
+		if (IS_ERR(dev->vq)) {
+			int err = PTR_ERR(dev->vq);
+			kfree(dev);
+			return err;
+		}
+		vdev->priv = dev;
+
+		/* from this point on, the device can notify and get callbacks */
+		virtio_device_ready(vdev);
+
+		return 0;
+	}
+
+	static void virtio_dummy_remove(struct virtio_device *vdev)
+	{
+		struct virtio_dummy_dev *dev = vdev->priv;
+		void *buf;
+
+		/*
+		 * disable vq interrupts: equivalent to
+		 * vdev->config->reset(vdev)
+		 */
+		virtio_reset_device(vdev);
+
+		/* detach unused buffers */
+		while ((buf = virtqueue_detach_unused_buf(dev->vq)) != NULL)
+			kfree(buf);
+
+		/* remove virtqueues */
+		vdev->config->del_vqs(vdev);
+
+		kfree(dev);
+	}
+
+	static const struct virtio_device_id id_table[] = {
+		{ VIRTIO_ID_DUMMY, VIRTIO_DEV_ANY_ID },
+		{ 0 },
+	};
+
+	static struct virtio_driver virtio_dummy_driver = {
+		.driver.name = KBUILD_MODNAME,
+		.driver.owner = THIS_MODULE,
+		.id_table = id_table,
+		.probe = virtio_dummy_probe,
+		.remove = virtio_dummy_remove,
+	};
+
+	module_virtio_driver(virtio_dummy_driver);
+	MODULE_DEVICE_TABLE(virtio, id_table);
+	MODULE_DESCRIPTION("Dummy virtio driver");
+	MODULE_LICENSE("GPL");
+
+The device id ``VIRTIO_ID_DUMMY`` here is a placeholder, virtio drivers
+should be added only for devices that are defined in the spec, see
+include/uapi/linux/virtio_ids.h. Device ids need to be at least reserved
+in the virtio spec before being added to that file.
+
+If your driver doesn't have to do anything special in its ``init`` and
+``exit`` methods, you can use the module_virtio_driver() helper to
+reduce the amount of boilerplate code.
+
+The ``probe`` method does the minimum driver setup in this case
+(memory allocation for the device data) and initializes the
+virtqueue. virtio_device_ready() is used to enable the virtqueue and to
+notify the device that the driver is ready to manage the device
+("DRIVER_OK"). The virtqueues are anyway enabled automatically by the
+core after ``probe`` returns.
+
+.. kernel-doc:: include/linux/virtio_config.h
+    :identifiers: virtio_device_ready
+
+In any case, the virtqueues need to be enabled before adding buffers to
+them.
+
+Sending and receiving data
+==========================
+
+The virtio_dummy_recv_cb() callback in the code above will be triggered
+when the device notifies the driver after it finishes processing a
+descriptor or descriptor chain, either for reading or writing. However,
+that's only the second half of the virtio device-driver communication
+process, as the communication is always started by the driver regardless
+of the direction of the data transfer.
+
+To configure a buffer transfer from the driver to the device, first you
+have to add the buffers -- packed as `scatterlists` -- to the
+appropriate virtqueue using any of the virtqueue_add_inbuf(),
+virtqueue_add_outbuf() or virtqueue_add_sgs(), depending on whether you
+need to add one input `scatterlist` (for the device to fill in), one
+output `scatterlist` (for the device to consume) or multiple
+`scatterlists`, respectively. Then, once the virtqueue is set up, a call
+to virtqueue_kick() sends a notification that will be serviced by the
+hypervisor that implements the device::
+
+	struct scatterlist sg[1];
+	sg_init_one(sg, buffer, BUFLEN);
+	virtqueue_add_inbuf(dev->vq, sg, 1, buffer, GFP_ATOMIC);
+	virtqueue_kick(dev->vq);
+
+.. kernel-doc:: drivers/virtio/virtio_ring.c
+    :identifiers: virtqueue_add_inbuf
+
+.. kernel-doc:: drivers/virtio/virtio_ring.c
+    :identifiers: virtqueue_add_outbuf
+
+.. kernel-doc:: drivers/virtio/virtio_ring.c
+    :identifiers: virtqueue_add_sgs
+
+Then, after the device has read or written the buffers prepared by the
+driver and notifies it back, the driver can call virtqueue_get_buf() to
+read the data produced by the device (if the virtqueue was set up with
+input buffers) or simply to reclaim the buffers if they were already
+consumed by the device:
+
+.. kernel-doc:: drivers/virtio/virtio_ring.c
+    :identifiers: virtqueue_get_buf_ctx
+
+The virtqueue callbacks can be disabled and re-enabled using the
+virtqueue_disable_cb() and the family of virtqueue_enable_cb() functions
+respectively. See drivers/virtio/virtio_ring.c for more details:
+
+.. kernel-doc:: drivers/virtio/virtio_ring.c
+    :identifiers: virtqueue_disable_cb
+
+.. kernel-doc:: drivers/virtio/virtio_ring.c
+    :identifiers: virtqueue_enable_cb
+
+But note that some spurious callbacks can still be triggered under
+certain scenarios. The way to disable callbacks reliably is to reset the
+device or the virtqueue (virtio_reset_device()).
+
+
+References
+==========
+
+_`[1]` Virtio Spec v1.2:
+https://docs.oasis-open.org/virtio/virtio/v1.2/virtio-v1.2.html
+
+Check for later versions of the spec as well.
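The driver-initiated flow that the new documentation describes (add a buffer, kick, then reclaim the buffer with a get_buf call once the device is done) can be modeled in plain userspace C. This is only a toy sketch under stated assumptions: `struct toy_vq` and the `toy_*` functions are hypothetical stand-ins, not the kernel's virtqueue API, and the "device" is an ordinary function call rather than a hypervisor.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_RING_SIZE 4

/* Hypothetical stand-in for a virtqueue: a small ring of driver-owned buffers. */
struct toy_vq {
	void *buf[TOY_RING_SIZE];	/* buffers exposed by the driver */
	size_t cap[TOY_RING_SIZE];	/* capacity of each buffer */
	size_t len[TOY_RING_SIZE];	/* bytes written by the device */
	unsigned int avail;		/* buffers added by the driver so far */
	unsigned int used;		/* buffers completed by the device so far */
	unsigned int reclaimed;		/* buffers taken back by the driver so far */
	int kicked;			/* set once the driver has notified the device */
};

/* Driver side: expose an input buffer for the device to fill in. */
static int toy_add_inbuf(struct toy_vq *vq, void *buf, size_t cap)
{
	unsigned int slot;

	if (vq->avail - vq->reclaimed == TOY_RING_SIZE)
		return -1;	/* ring full */
	slot = vq->avail % TOY_RING_SIZE;
	vq->buf[slot] = buf;
	vq->cap[slot] = cap;
	vq->avail++;
	return 0;
}

/* Driver side: notify the device that buffers are waiting. */
static void toy_kick(struct toy_vq *vq)
{
	vq->kicked = 1;
}

/* Device side: fill the oldest pending buffer; returns 1 if one was completed. */
static int toy_device_service(struct toy_vq *vq, const void *data, size_t n)
{
	unsigned int slot;

	if (!vq->kicked || vq->used == vq->avail)
		return 0;	/* nothing to do until the driver adds and kicks */
	slot = vq->used % TOY_RING_SIZE;
	if (n > vq->cap[slot])
		n = vq->cap[slot];
	memcpy(vq->buf[slot], data, n);
	vq->len[slot] = n;
	vq->used++;
	return 1;
}

/* Driver side: reclaim the next completed buffer, or NULL if none is ready. */
static void *toy_get_buf(struct toy_vq *vq, size_t *len)
{
	unsigned int slot;

	if (vq->reclaimed == vq->used)
		return NULL;
	slot = vq->reclaimed % TOY_RING_SIZE;
	*len = vq->len[slot];
	vq->reclaimed++;
	return vq->buf[slot];
}
```

A caller would zero-initialize a `struct toy_vq`, add a buffer, kick, let the toy device run, and then loop on `toy_get_buf()` until it returns NULL, which mirrors the shape of the receive callback in the skeleton driver.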
diff --git a/MAINTAINERS b/MAINTAINERS
index d247b1137178..a3d7c8945762 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -22057,6 +22057,7 @@ S:	Maintained
 F:	Documentation/ABI/testing/sysfs-bus-vdpa
 F:	Documentation/ABI/testing/sysfs-class-vduse
 F:	Documentation/devicetree/bindings/virtio/
+F:	Documentation/driver-api/virtio/
 F:	drivers/block/virtio_blk.c
 F:	drivers/crypto/virtio/
 F:	drivers/net/virtio_net.c
@@ -22077,6 +22078,10 @@ IFCVF VIRTIO DATA PATH ACCELERATOR
 R:	Zhu Lingshan <lingshan.zhu@intel.com>
 F:	drivers/vdpa/ifcvf/
 
+SNET DPU VIRTIO DATA PATH ACCELERATOR
+R:	Alvaro Karsz <alvaro.karsz@solid-run.com>
+F:	drivers/vdpa/solidrun/
+
 VIRTIO BALLOON
 M:	"Michael S. Tsirkin" <mst@redhat.com>
 M:	David Hildenbrand <david@redhat.com>
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index dc6e9b989910..2723eede6f21 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -15,6 +15,7 @@
 #include <linux/blk-mq.h>
 #include <linux/blk-mq-virtio.h>
 #include <linux/numa.h>
+#include <linux/vmalloc.h>
 #include <uapi/linux/virtio_ring.h>
 
 #define PART_BITS 4
@@ -80,22 +81,51 @@ struct virtio_blk {
 	int num_vqs;
 	int io_queues[HCTX_MAX_TYPES];
 	struct virtio_blk_vq *vqs;
+
+	/* For zoned device */
+	unsigned int zone_sectors;
 };
 
 struct virtblk_req {
+	/* Out header */
 	struct virtio_blk_outhdr out_hdr;
-	u8 status;
+
+	/* In header */
+	union {
+		u8 status;
+
+		/*
+		 * The zone append command has an extended in header.
+		 * The status field in zone_append_in_hdr must have
+		 * the same offset in virtblk_req as the non-zoned
+		 * status field above.
+		 */
+		struct {
+			u8 status;
+			u8 reserved[7];
+			__le64 append_sector;
+		} zone_append_in_hdr;
+	};
+
+	size_t in_hdr_len;
+
 	struct sg_table sg_table;
 	struct scatterlist sg[];
 };
 
-static inline blk_status_t virtblk_result(struct virtblk_req *vbr)
+static inline blk_status_t virtblk_result(u8 status)
 {
-	switch (vbr->status) {
+	switch (status) {
 	case VIRTIO_BLK_S_OK:
 		return BLK_STS_OK;
 	case VIRTIO_BLK_S_UNSUPP:
 		return BLK_STS_NOTSUPP;
+	case VIRTIO_BLK_S_ZONE_OPEN_RESOURCE:
+		return BLK_STS_ZONE_OPEN_RESOURCE;
+	case VIRTIO_BLK_S_ZONE_ACTIVE_RESOURCE:
+		return BLK_STS_ZONE_ACTIVE_RESOURCE;
+	case VIRTIO_BLK_S_IOERR:
+	case VIRTIO_BLK_S_ZONE_UNALIGNED_WP:
 	default:
 		return BLK_STS_IOERR;
 	}
@@ -111,11 +141,11 @@ static inline struct virtio_blk_vq *get_virtio_blk_vq(struct blk_mq_hw_ctx *hctx
 
 static int virtblk_add_req(struct virtqueue *vq, struct virtblk_req *vbr)
 {
-	struct scatterlist hdr, status, *sgs[3];
+	struct scatterlist out_hdr, in_hdr, *sgs[3];
 	unsigned int num_out = 0, num_in = 0;
 
-	sg_init_one(&hdr, &vbr->out_hdr, sizeof(vbr->out_hdr));
-	sgs[num_out++] = &hdr;
+	sg_init_one(&out_hdr, &vbr->out_hdr, sizeof(vbr->out_hdr));
+	sgs[num_out++] = &out_hdr;
 
 	if (vbr->sg_table.nents) {
 		if (vbr->out_hdr.type & cpu_to_virtio32(vq->vdev, VIRTIO_BLK_T_OUT))
@@ -124,8 +154,8 @@ static int virtblk_add_req(struct virtqueue *vq, struct virtblk_req *vbr)
 			sgs[num_out + num_in++] = vbr->sg_table.sgl;
 	}
 
-	sg_init_one(&status, &vbr->status, sizeof(vbr->status));
-	sgs[num_out + num_in++] = &status;
+	sg_init_one(&in_hdr, &vbr->status, vbr->in_hdr_len);
+	sgs[num_out + num_in++] = &in_hdr;
 
 	return virtqueue_add_sgs(vq, sgs, num_out, num_in, vbr, GFP_ATOMIC);
 }
@@ -212,21 +242,22 @@ static blk_status_t virtblk_setup_cmd(struct virtio_device *vdev,
 				      struct request *req,
 				      struct virtblk_req *vbr)
 {
+	size_t in_hdr_len = sizeof(vbr->status);
 	bool unmap = false;
 	u32 type;
+	u64 sector = 0;
 
-	vbr->out_hdr.sector = 0;
+	/* Set fields for all request types */
+	vbr->out_hdr.ioprio = cpu_to_virtio32(vdev, req_get_ioprio(req));
 
 	switch (req_op(req)) {
 	case REQ_OP_READ:
 		type = VIRTIO_BLK_T_IN;
-		vbr->out_hdr.sector = cpu_to_virtio64(vdev,
-						      blk_rq_pos(req));
+		sector = blk_rq_pos(req);
 		break;
 	case REQ_OP_WRITE:
 		type = VIRTIO_BLK_T_OUT;
-		vbr->out_hdr.sector = cpu_to_virtio64(vdev,
-						      blk_rq_pos(req));
+		sector = blk_rq_pos(req);
 		break;
 	case REQ_OP_FLUSH:
 		type = VIRTIO_BLK_T_FLUSH;
@@ -241,16 +272,42 @@ static blk_status_t virtblk_setup_cmd(struct virtio_device *vdev,
 	case REQ_OP_SECURE_ERASE:
 		type = VIRTIO_BLK_T_SECURE_ERASE;
 		break;
-	case REQ_OP_DRV_IN:
-		type = VIRTIO_BLK_T_GET_ID;
+	case REQ_OP_ZONE_OPEN:
+		type = VIRTIO_BLK_T_ZONE_OPEN;
+		sector = blk_rq_pos(req);
+		break;
+	case REQ_OP_ZONE_CLOSE:
+		type = VIRTIO_BLK_T_ZONE_CLOSE;
+		sector = blk_rq_pos(req);
+		break;
+	case REQ_OP_ZONE_FINISH:
+		type = VIRTIO_BLK_T_ZONE_FINISH;
+		sector = blk_rq_pos(req);
 		break;
+	case REQ_OP_ZONE_APPEND:
+		type = VIRTIO_BLK_T_ZONE_APPEND;
+		sector = blk_rq_pos(req);
+		in_hdr_len = sizeof(vbr->zone_append_in_hdr);
+		break;
+	case REQ_OP_ZONE_RESET:
+		type = VIRTIO_BLK_T_ZONE_RESET;
+		sector = blk_rq_pos(req);
+		break;
+	case REQ_OP_ZONE_RESET_ALL:
+		type = VIRTIO_BLK_T_ZONE_RESET_ALL;
+		break;
+	case REQ_OP_DRV_IN:
+		/* Out header already filled in, nothing to do */
+		return 0;
 	default:
 		WARN_ON_ONCE(1);
 		return BLK_STS_IOERR;
 	}
 
+	/* Set fields for non-REQ_OP_DRV_IN request types */
+	vbr->in_hdr_len = in_hdr_len;
 	vbr->out_hdr.type = cpu_to_virtio32(vdev, type);
-	vbr->out_hdr.ioprio = cpu_to_virtio32(vdev, req_get_ioprio(req));
+	vbr->out_hdr.sector = cpu_to_virtio64(vdev, sector);
 
 	if (type == VIRTIO_BLK_T_DISCARD || type == VIRTIO_BLK_T_WRITE_ZEROES ||
 	    type == VIRTIO_BLK_T_SECURE_ERASE) {
@@ -264,39 +321,74 @@ static blk_status_t virtblk_setup_cmd(struct virtio_device *vdev,
 static inline void virtblk_request_done(struct request *req)
 {
 	struct virtblk_req *vbr = blk_mq_rq_to_pdu(req);
+	blk_status_t status = virtblk_result(vbr->status);
 
 	virtblk_unmap_data(req, vbr);
 	virtblk_cleanup_cmd(req);
-	blk_mq_end_request(req, virtblk_result(vbr));
+
+	if (req_op(req) == REQ_OP_ZONE_APPEND)
+		req->__sector = le64_to_cpu(vbr->zone_append_in_hdr.append_sector);
+
+	blk_mq_end_request(req, status);
+}
+
+static void virtblk_complete_batch(struct io_comp_batch *iob)
+{
+	struct request *req;
+
+	rq_list_for_each(&iob->req_list, req) {
+		virtblk_unmap_data(req, blk_mq_rq_to_pdu(req));
+		virtblk_cleanup_cmd(req);
+	}
+	blk_mq_end_request_batch(iob);
+}
+
+static int virtblk_handle_req(struct virtio_blk_vq *vq,
+			      struct io_comp_batch *iob)
+{
+	struct virtblk_req *vbr;
+	int req_done = 0;
+	unsigned int len;
+
+	while ((vbr = virtqueue_get_buf(vq->vq, &len)) != NULL) {
+		struct request *req = blk_mq_rq_from_pdu(vbr);
+
+		if (likely(!blk_should_fake_timeout(req->q)) &&
+		    !blk_mq_complete_request_remote(req) &&
+		    !blk_mq_add_to_batch(req, iob, vbr->status,
+					 virtblk_complete_batch))
+			virtblk_request_done(req);
+		req_done++;
+	}
+
+	return req_done;
 }
 
 static void virtblk_done(struct virtqueue *vq)
 {
 	struct virtio_blk *vblk = vq->vdev->priv;
-	bool req_done = false;
-	int qid = vq->index;
-	struct virtblk_req *vbr;
+	struct virtio_blk_vq *vblk_vq = &vblk->vqs[vq->index];
+	int req_done = 0;
 	unsigned long flags;
-	unsigned int len;
+	DEFINE_IO_COMP_BATCH(iob);
 
-	spin_lock_irqsave(&vblk->vqs[qid].lock, flags);
+	spin_lock_irqsave(&vblk_vq->lock, flags);
 	do {
 		virtqueue_disable_cb(vq);
-		while ((vbr = virtqueue_get_buf(vblk->vqs[qid].vq, &len)) != NULL) {
-			struct request *req = blk_mq_rq_from_pdu(vbr);
+		req_done += virtblk_handle_req(vblk_vq, &iob);
 
-			if (likely(!blk_should_fake_timeout(req->q)))
-				blk_mq_complete_request(req);
-			req_done = true;
-		}
 		if (unlikely(virtqueue_is_broken(vq)))
 			break;
 	} while (!virtqueue_enable_cb(vq));
 
-	/* In case queue is stopped waiting for more buffers. */
-	if (req_done)
+	if (req_done) {
+		if (!rq_list_empty(iob.req_list))
+			iob.complete(&iob);
+
+		/* In case queue is stopped waiting for more buffers. */
 		blk_mq_start_stopped_hw_queues(vblk->disk->queue, true);
-	spin_unlock_irqrestore(&vblk->vqs[qid].lock, flags);
+	}
+	spin_unlock_irqrestore(&vblk_vq->lock, flags);
 }
 
 static void virtio_commit_rqs(struct blk_mq_hw_ctx *hctx)
@@ -455,6 +547,275 @@ static void virtio_queue_rqs(struct request **rqlist)
 	*rqlist = requeue_list;
 }
 
+#ifdef CONFIG_BLK_DEV_ZONED
+static void *virtblk_alloc_report_buffer(struct virtio_blk *vblk,
+					  unsigned int nr_zones,
+					  unsigned int zone_sectors,
+					  size_t *buflen)
+{
+	struct request_queue *q = vblk->disk->queue;
+	size_t bufsize;
+	void *buf;
+
+	nr_zones = min_t(unsigned int, nr_zones,
+			 get_capacity(vblk->disk) >> ilog2(zone_sectors));
+
+	bufsize = sizeof(struct virtio_blk_zone_report) +
+		nr_zones * sizeof(struct virtio_blk_zone_descriptor);
+	bufsize = min_t(size_t, bufsize,
+			queue_max_hw_sectors(q) << SECTOR_SHIFT);
+	bufsize = min_t(size_t, bufsize, queue_max_segments(q) << PAGE_SHIFT);
+
+	while (bufsize >= sizeof(struct virtio_blk_zone_report)) {
+		buf = __vmalloc(bufsize, GFP_KERNEL | __GFP_NORETRY);
+		if (buf) {
+			*buflen = bufsize;
+			return buf;
+		}
+		bufsize >>= 1;
+	}
+
+	return NULL;
+}
+
+static int virtblk_submit_zone_report(struct virtio_blk *vblk,
+				       char *report_buf, size_t report_len,
+				       sector_t sector)
+{
+	struct request_queue *q = vblk->disk->queue;
+	struct request *req;
+	struct virtblk_req *vbr;
+	int err;
+
+	req = blk_mq_alloc_request(q, REQ_OP_DRV_IN, 0);
+	if (IS_ERR(req))
+		return PTR_ERR(req);
+
+	vbr = blk_mq_rq_to_pdu(req);
+	vbr->in_hdr_len = sizeof(vbr->status);
+	vbr->out_hdr.type = cpu_to_virtio32(vblk->vdev, VIRTIO_BLK_T_ZONE_REPORT);
+	vbr->out_hdr.sector = cpu_to_virtio64(vblk->vdev, sector);
+
+	err = blk_rq_map_kern(q, req, report_buf, report_len, GFP_KERNEL);
+	if (err)
+		goto out;
+
+	blk_execute_rq(req, false);
+	err = blk_status_to_errno(virtblk_result(vbr->status));
+out:
+	blk_mq_free_request(req);
+	return err;
+}
+
+static int virtblk_parse_zone(struct virtio_blk *vblk,
+			       struct virtio_blk_zone_descriptor *entry,
+			       unsigned int idx, unsigned int zone_sectors,
+			       report_zones_cb cb, void *data)
+{
+	struct blk_zone zone = { };
+
+	if (entry->z_type != VIRTIO_BLK_ZT_SWR &&
+	    entry->z_type != VIRTIO_BLK_ZT_SWP &&
+	    entry->z_type != VIRTIO_BLK_ZT_CONV) {
+		dev_err(&vblk->vdev->dev, "invalid zone type %#x\n",
+			entry->z_type);
+		return -EINVAL;
+	}
+
+	zone.type = entry->z_type;
+	zone.cond = entry->z_state;
+	zone.len = zone_sectors;
+	zone.capacity = le64_to_cpu(entry->z_cap);
+	zone.start = le64_to_cpu(entry->z_start);
+	if (zone.cond == BLK_ZONE_COND_FULL)
+		zone.wp = zone.start + zone.len;
+	else
+		zone.wp = le64_to_cpu(entry->z_wp);
+
+	return cb(&zone, idx, data);
+}
+
+static int virtblk_report_zones(struct gendisk *disk, sector_t sector,
+				 unsigned int nr_zones, report_zones_cb cb,
+				 void *data)
+{
+	struct virtio_blk *vblk = disk->private_data;
+	struct virtio_blk_zone_report *report;
+	unsigned int zone_sectors = vblk->zone_sectors;
+	unsigned int nz, i;
+	int ret, zone_idx = 0;
+	size_t buflen;
+
+	if (WARN_ON_ONCE(!vblk->zone_sectors))
+		return -EOPNOTSUPP;
+
+	report = virtblk_alloc_report_buffer(vblk, nr_zones,
+					     zone_sectors, &buflen);
+	if (!report)
+		return -ENOMEM;
+
+	while (zone_idx < nr_zones && sector < get_capacity(vblk->disk)) {
+		memset(report, 0, buflen);
+
+		ret = virtblk_submit_zone_report(vblk, (char *)report,
+						 buflen, sector);
+		if (ret) {
+			if (ret > 0)
+				ret = -EIO;
+			goto out_free;
+		}
+		nz = min((unsigned int)le64_to_cpu(report->nr_zones), nr_zones);
+		if (!nz)
+			break;
+
+		for (i = 0; i < nz && zone_idx < nr_zones; i++) {
+			ret = virtblk_parse_zone(vblk, &report->zones[i],
+						 zone_idx, zone_sectors, cb, data);
+			if (ret)
+				goto out_free;
+			sector = le64_to_cpu(report->zones[i].z_start) + zone_sectors;
+			zone_idx++;
+		}
+	}
+
+	if (zone_idx > 0)
+		ret = zone_idx;
+	else
+		ret = -EINVAL;
+out_free:
+	kvfree(report);
+	return ret;
+}
+
+static void virtblk_revalidate_zones(struct virtio_blk *vblk)
+{
+	u8 model;
+
+	if (!vblk->zone_sectors)
+		return;
+
+	virtio_cread(vblk->vdev, struct virtio_blk_config,
+		     zoned.model, &model);
+	if (!blk_revalidate_disk_zones(vblk->disk, NULL))
+		set_capacity_and_notify(vblk->disk, 0);
+}
+
+static int virtblk_probe_zoned_device(struct virtio_device *vdev,
+				       struct virtio_blk *vblk,
+				       struct request_queue *q)
+{
+	u32 v;
+	u8 model;
+	int ret;
+
+	virtio_cread(vdev, struct virtio_blk_config,
+		     zoned.model, &model);
+
+	switch (model) {
+	case VIRTIO_BLK_Z_NONE:
+		return 0;
+	case VIRTIO_BLK_Z_HM:
+		break;
+	case VIRTIO_BLK_Z_HA:
+		/*
+		 * Present the host-aware device as a regular drive.
+		 * TODO It is possible to add an option to make it appear
+		 * in the system as a zoned drive.
+		 */
+		return 0;
+	default:
+		dev_err(&vdev->dev, "unsupported zone model %d\n", model);
+		return -EINVAL;
+	}
+
+	dev_dbg(&vdev->dev, "probing host-managed zoned device\n");
+
+	disk_set_zoned(vblk->disk, BLK_ZONED_HM);
+	blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, q);
+
+	virtio_cread(vdev, struct virtio_blk_config,
+		     zoned.max_open_zones, &v);
+	disk_set_max_open_zones(vblk->disk, le32_to_cpu(v));
+
+	dev_dbg(&vdev->dev, "max open zones = %u\n", le32_to_cpu(v));
+
+	virtio_cread(vdev, struct virtio_blk_config,
+		     zoned.max_active_zones, &v);
+	disk_set_max_active_zones(vblk->disk, le32_to_cpu(v));
+	dev_dbg(&vdev->dev, "max active zones = %u\n", le32_to_cpu(v));
+
+	virtio_cread(vdev, struct virtio_blk_config,
+		     zoned.write_granularity, &v);
+	if (!v) {
+		dev_warn(&vdev->dev, "zero write granularity reported\n");
+		return -ENODEV;
+	}
+	blk_queue_physical_block_size(q, le32_to_cpu(v));
+	blk_queue_io_min(q, le32_to_cpu(v));
+
+	dev_dbg(&vdev->dev, "write granularity = %u\n", le32_to_cpu(v));
+
+	/*
+	 * virtio ZBD specification doesn't require zones to be a power of
+	 * two sectors in size, but the code in this driver expects that.
+	 */
+	virtio_cread(vdev, struct virtio_blk_config, zoned.zone_sectors, &v);
+	vblk->zone_sectors = le32_to_cpu(v);
+	if (vblk->zone_sectors == 0 || !is_power_of_2(vblk->zone_sectors)) {
+		dev_err(&vdev->dev,
+			"zoned device with non power of two zone size %u\n",
+			vblk->zone_sectors);
+		return -ENODEV;
+	}
+	dev_dbg(&vdev->dev, "zone sectors = %u\n", vblk->zone_sectors);
+
+	if (virtio_has_feature(vdev, VIRTIO_BLK_F_DISCARD)) {
+		dev_warn(&vblk->vdev->dev,
+			 "ignoring negotiated F_DISCARD for zoned device\n");
+		blk_queue_max_discard_sectors(q, 0);
+	}
+
+	ret = blk_revalidate_disk_zones(vblk->disk, NULL);
+	if (!ret) {
+		virtio_cread(vdev, struct virtio_blk_config,
+			     zoned.max_append_sectors, &v);
+		if (!v) {
+			dev_warn(&vdev->dev, "zero max_append_sectors reported\n");
+			return -ENODEV;
+		}
+		blk_queue_max_zone_append_sectors(q, le32_to_cpu(v));
+		dev_dbg(&vdev->dev, "max append sectors = %u\n", le32_to_cpu(v));
+	}
+
+	return ret;
+}
+
+static inline bool virtblk_has_zoned_feature(struct virtio_device *vdev)
+{
+	return virtio_has_feature(vdev, VIRTIO_BLK_F_ZONED);
+}
+#else
+
+/*
+ * Zoned block device support is not configured in this kernel.
+ * We only need to define a few symbols to avoid compilation errors.
+ */
+#define virtblk_report_zones       NULL
+static inline void virtblk_revalidate_zones(struct virtio_blk *vblk)
+{
+}
+static inline int virtblk_probe_zoned_device(struct virtio_device *vdev,
+			struct virtio_blk *vblk, struct request_queue *q)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline bool virtblk_has_zoned_feature(struct virtio_device *vdev)
+{
+	return false;
+}
+#endif /* CONFIG_BLK_DEV_ZONED */
+
 /* return id (s/n) string for *disk to *id_str
  */
 static int virtblk_get_id(struct gendisk *disk, char *id_str)
@@ -462,18 +823,24 @@ static int virtblk_get_id(struct gendisk *disk, char *id_str)
 	struct virtio_blk *vblk = disk->private_data;
 	struct request_queue *q = vblk->disk->queue;
 	struct request *req;
+	struct virtblk_req *vbr;
 	int err;
 
 	req = blk_mq_alloc_request(q, REQ_OP_DRV_IN, 0);
 	if (IS_ERR(req))
 		return PTR_ERR(req);
 
+	vbr = blk_mq_rq_to_pdu(req);
+	vbr->in_hdr_len = sizeof(vbr->status);
+	vbr->out_hdr.type = cpu_to_virtio32(vblk->vdev, VIRTIO_BLK_T_GET_ID);
+	vbr->out_hdr.sector = 0;
+
 	err = blk_rq_map_kern(q, req, id_str, VIRTIO_BLK_ID_BYTES, GFP_KERNEL);
 	if (err)
 		goto out;
 
 	blk_execute_rq(req, false);
-	err = blk_status_to_errno(virtblk_result(blk_mq_rq_to_pdu(req)));
+	err = blk_status_to_errno(virtblk_result(vbr->status));
 out:
 	blk_mq_free_request(req);
 	return err;
@@ -524,6 +891,7 @@ static const struct block_device_operations virtblk_fops = {
 	.owner = THIS_MODULE,
 	.getgeo = virtblk_getgeo,
 	.free_disk = virtblk_free_disk,
+	.report_zones = virtblk_report_zones,
 };
 
 static int index_to_minor(int index)
@@ -594,6 +962,7 @@ static void virtblk_config_changed_work(struct work_struct *work)
 	struct virtio_blk *vblk =
 		container_of(work, struct virtio_blk, config_work);
 
+	virtblk_revalidate_zones(vblk);
 	virtblk_update_capacity(vblk, true);
 }
 
@@ -835,36 +1204,15 @@ static void virtblk_map_queues(struct blk_mq_tag_set *set)
 	}
 }
 
-static void virtblk_complete_batch(struct io_comp_batch *iob)
-{
-	struct request *req;
-
-	rq_list_for_each(&iob->req_list, req) {
-		virtblk_unmap_data(req, blk_mq_rq_to_pdu(req));
-		virtblk_cleanup_cmd(req);
-	}
-	blk_mq_end_request_batch(iob);
-}
-
 static int virtblk_poll(struct blk_mq_hw_ctx *hctx, struct io_comp_batch *iob)
 {
 	struct virtio_blk *vblk = hctx->queue->queuedata;
 	struct virtio_blk_vq *vq = get_virtio_blk_vq(hctx);
-	struct virtblk_req *vbr;
 	unsigned long flags;
-	unsigned int len;
 	int found = 0;
 
 	spin_lock_irqsave(&vq->lock, flags);
-
-	while ((vbr = virtqueue_get_buf(vq->vq, &len)) != NULL) {
-		struct request *req = blk_mq_rq_from_pdu(vbr);
-
-		found++;
-		if (!blk_mq_add_to_batch(req, iob, vbr->status,
-						virtblk_complete_batch))
-			blk_mq_complete_request(req);
-	}
+	found = virtblk_handle_req(vq, iob);
 
 	if (found)
 		blk_mq_start_stopped_hw_queues(vblk->disk->queue, true);
@@ -1150,6 +1498,15 @@ static int virtblk_probe(struct virtio_device *vdev)
 	virtblk_update_capacity(vblk, false);
 	virtio_device_ready(vdev);
 
+	if (virtblk_has_zoned_feature(vdev)) {
+		err = virtblk_probe_zoned_device(vdev, vblk, q);
+		if (err)
+			goto out_cleanup_disk;
+	}
+
+	dev_info(&vdev->dev, "blk config size: %zu\n",
+		sizeof(struct virtio_blk_config));
+
 	err = device_add_disk(&vdev->dev, vblk->disk, virtblk_attr_groups);
 	if (err)
 		goto out_cleanup_disk;
@@ -1251,6 +1608,9 @@ static unsigned int features[] = {
 	VIRTIO_BLK_F_FLUSH, VIRTIO_BLK_F_TOPOLOGY, VIRTIO_BLK_F_CONFIG_WCE,
 	VIRTIO_BLK_F_MQ, VIRTIO_BLK_F_DISCARD, VIRTIO_BLK_F_WRITE_ZEROES,
 	VIRTIO_BLK_F_SECURE_ERASE,
+#ifdef CONFIG_BLK_DEV_ZONED
+	VIRTIO_BLK_F_ZONED,
+#endif /* CONFIG_BLK_DEV_ZONED */
 };
 
 static struct virtio_driver virtio_blk = {
diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
index 20da455d2ef6..a92eb172f0e7 100644
--- a/drivers/nvdimm/virtio_pmem.c
+++ b/drivers/nvdimm/virtio_pmem.c
@@ -32,7 +32,6 @@ static int init_vq(struct virtio_pmem *vpmem)
 static int virtio_pmem_probe(struct virtio_device *vdev)
 {
 	struct nd_region_desc ndr_desc = {};
-	int nid = dev_to_node(&vdev->dev);
 	struct nd_region *nd_region;
 	struct virtio_pmem *vpmem;
 	struct resource res;
@@ -79,7 +78,15 @@ static int virtio_pmem_probe(struct virtio_device *vdev)
 	dev_set_drvdata(&vdev->dev, vpmem->nvdimm_bus);
 
 	ndr_desc.res = &res;
-	ndr_desc.numa_node = nid;
+
+	ndr_desc.numa_node = memory_add_physaddr_to_nid(res.start);
+	ndr_desc.target_node = phys_to_target_node(res.start);
+	if (ndr_desc.target_node == NUMA_NO_NODE) {
+		ndr_desc.target_node = ndr_desc.numa_node;
+		dev_dbg(&vdev->dev, "changing target node from %d to %d",
+			NUMA_NO_NODE, ndr_desc.target_node);
+	}
+
 	ndr_desc.flush = async_pmem_flush;
 	ndr_desc.provider_data = vdev;
 	set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 494fa46f5767..44cab813bf95 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -5366,6 +5366,14 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x7901, quirk_no_flr);
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1502, quirk_no_flr);
 DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1503, quirk_no_flr);
 
+/* FLR may cause the SolidRun SNET DPU (rev 0x1) to hang */
+static void quirk_no_flr_snet(struct pci_dev *dev)
+{
+	if (dev->revision == 0x1)
+		quirk_no_flr(dev);
+}
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_SOLIDRUN, 0x1000, quirk_no_flr_snet);
+
 static void quirk_no_ext_tags(struct pci_dev *pdev)
 {
 	struct pci_host_bridge *bridge = pci_find_host_bridge(pdev->bus);
diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index b221c3c99320..c5558c45ab3a 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -330,7 +330,7 @@ static void virtscsi_handle_param_change(struct virtio_scsi *vscsi,
 	scsi_device_put(sdev);
 }
 
-static void virtscsi_rescan_hotunplug(struct virtio_scsi *vscsi)
+static int virtscsi_rescan_hotunplug(struct virtio_scsi *vscsi)
 {
 	struct scsi_device *sdev;
 	struct Scsi_Host *shost = virtio_scsi_host(vscsi->vdev);
@@ -338,6 +338,11 @@ static void virtscsi_rescan_hotunplug(struct virtio_scsi *vscsi)
 	int result, inquiry_len,
inq_result_len = 256; char *inq_result = kmalloc(inq_result_len, GFP_KERNEL); + if (!inq_result) { + kfree(inq_result); + return -ENOMEM; + } + shost_for_each_device(sdev, shost) { inquiry_len = sdev->inquiry_len ? sdev->inquiry_len : 36; @@ -366,6 +371,7 @@ static void virtscsi_rescan_hotunplug(struct virtio_scsi *vscsi) } kfree(inq_result); + return 0; } static void virtscsi_handle_event(struct work_struct *work) @@ -377,9 +383,13 @@ static void virtscsi_handle_event(struct work_struct *work) if (event->event & cpu_to_virtio32(vscsi->vdev, VIRTIO_SCSI_T_EVENTS_MISSED)) { + int ret; + event->event &= ~cpu_to_virtio32(vscsi->vdev, VIRTIO_SCSI_T_EVENTS_MISSED); - virtscsi_rescan_hotunplug(vscsi); + ret = virtscsi_rescan_hotunplug(vscsi); + if (ret) + return; scsi_scan_host(virtio_scsi_host(vscsi->vdev)); } diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig index 50f45d037611..cd6ad92f3f05 100644 --- a/drivers/vdpa/Kconfig +++ b/drivers/vdpa/Kconfig @@ -71,6 +71,18 @@ config MLX5_VDPA_NET be executed by the hardware. It also supports a variety of stateless offloads depending on the actual device used and firmware version. +config MLX5_VDPA_STEERING_DEBUG + bool "expose steering counters on debugfs" + select MLX5_VDPA + help + Expose RX steering counters in debugfs to aid in debugging. For each VLAN + or non VLAN interface, two hardware counters are added to the RX flow + table: one for unicast and one for multicast. + The counters counts the number of packets and bytes and exposes them in + debugfs. Once can read the counters using, e.g.: + cat /sys/kernel/debug/mlx5/mlx5_core.sf.1/vdpa-0/rx/untagged/ucast/packets + cat /sys/kernel/debug/mlx5/mlx5_core.sf.1/vdpa-0/rx/untagged/mcast/bytes + config VP_VDPA tristate "Virtio PCI bridge vDPA driver" select VIRTIO_PCI_LIB @@ -86,4 +98,22 @@ config ALIBABA_ENI_VDPA VDPA driver for Alibaba ENI (Elastic Network Interface) which is built upon virtio 0.9.5 specification. 
+ config SNET_VDPA + tristate "SolidRun's vDPA driver for SolidNET" + depends on PCI_MSI && PCI_IOV && (HWMON || HWMON=n) + + # This driver MAY create a HWMON device. + # Depending on (HWMON || HWMON=n) ensures that: + # If HWMON=n the driver can be compiled either as a module or built-in. + # If HWMON=y the driver can be compiled either as a module or built-in. + # If HWMON=m the driver is forced to be compiled as a module. + # By doing so, IS_ENABLED can be used instead of IS_REACHABLE + + help + vDPA driver for SolidNET DPU. + With this driver, the VirtIO dataplane can be + offloaded to a SolidNET DPU. + This driver includes a HW monitor device that + reads health values from the DPU. + endif # VDPA diff --git a/drivers/vdpa/Makefile b/drivers/vdpa/Makefile index 15665563a7f4..59396ff2a318 100644 --- a/drivers/vdpa/Makefile +++ b/drivers/vdpa/Makefile @@ -6,3 +6,4 @@ obj-$(CONFIG_IFCVF) += ifcvf/ obj-$(CONFIG_MLX5_VDPA) += mlx5/ obj-$(CONFIG_VP_VDPA) += virtio_pci/ obj-$(CONFIG_ALIBABA_ENI_VDPA) += alibaba/ +obj-$(CONFIG_SNET_VDPA) += solidrun/ diff --git a/drivers/vdpa/ifcvf/ifcvf_base.c b/drivers/vdpa/ifcvf/ifcvf_base.c index 3e4486bfa0b7..5563b3a773c7 100644 --- a/drivers/vdpa/ifcvf/ifcvf_base.c +++ b/drivers/vdpa/ifcvf/ifcvf_base.c @@ -10,11 +10,6 @@ #include "ifcvf_base.h" -struct ifcvf_adapter *vf_to_adapter(struct ifcvf_hw *hw) -{ - return container_of(hw, struct ifcvf_adapter, vf); -} - u16 ifcvf_set_vq_vector(struct ifcvf_hw *hw, u16 qid, int vector) { struct virtio_pci_common_cfg __iomem *cfg = hw->common_cfg; @@ -37,8 +32,6 @@ u16 ifcvf_set_config_vector(struct ifcvf_hw *hw, int vector) static void __iomem *get_cap_addr(struct ifcvf_hw *hw, struct virtio_pci_cap *cap) { - struct ifcvf_adapter *ifcvf; - struct pci_dev *pdev; u32 length, offset; u8 bar; @@ -46,17 +39,14 @@ static void __iomem *get_cap_addr(struct ifcvf_hw *hw, offset = le32_to_cpu(cap->offset); bar = cap->bar; - ifcvf= vf_to_adapter(hw); - pdev = ifcvf->pdev; - if (bar >= 
IFCVF_PCI_MAX_RESOURCE) { - IFCVF_DBG(pdev, + IFCVF_DBG(hw->pdev, "Invalid bar number %u to get capabilities\n", bar); return NULL; } - if (offset + length > pci_resource_len(pdev, bar)) { - IFCVF_DBG(pdev, + if (offset + length > pci_resource_len(hw->pdev, bar)) { + IFCVF_DBG(hw->pdev, "offset(%u) + len(%u) overflows bar%u's capability\n", offset, length, bar); return NULL; @@ -92,6 +82,7 @@ int ifcvf_init_hw(struct ifcvf_hw *hw, struct pci_dev *pdev) IFCVF_ERR(pdev, "Failed to read PCI capability list\n"); return -EIO; } + hw->pdev = pdev; while (pos) { ret = ifcvf_read_config_range(pdev, (u32 *)&cap, @@ -215,15 +206,13 @@ u64 ifcvf_get_hw_features(struct ifcvf_hw *hw) u64 ifcvf_get_features(struct ifcvf_hw *hw) { - return hw->hw_features; + return hw->dev_features; } int ifcvf_verify_min_features(struct ifcvf_hw *hw, u64 features) { - struct ifcvf_adapter *ifcvf = vf_to_adapter(hw); - if (!(features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM)) && features) { - IFCVF_ERR(ifcvf->pdev, "VIRTIO_F_ACCESS_PLATFORM is not negotiated\n"); + IFCVF_ERR(hw->pdev, "VIRTIO_F_ACCESS_PLATFORM is not negotiated\n"); return -EINVAL; } @@ -232,13 +221,11 @@ int ifcvf_verify_min_features(struct ifcvf_hw *hw, u64 features) u32 ifcvf_get_config_size(struct ifcvf_hw *hw) { - struct ifcvf_adapter *adapter; u32 net_config_size = sizeof(struct virtio_net_config); u32 blk_config_size = sizeof(struct virtio_blk_config); u32 cap_size = hw->cap_dev_config_size; u32 config_size; - adapter = vf_to_adapter(hw); /* If the onboard device config space size is greater than * the size of struct virtio_net/blk_config, only the spec * implementing contents size is returned, this is very @@ -253,7 +240,7 @@ u32 ifcvf_get_config_size(struct ifcvf_hw *hw) break; default: config_size = 0; - IFCVF_ERR(adapter->pdev, "VIRTIO ID %u not supported\n", hw->dev_type); + IFCVF_ERR(hw->pdev, "VIRTIO ID %u not supported\n", hw->dev_type); } return config_size; @@ -301,14 +288,11 @@ static void ifcvf_set_features(struct 
ifcvf_hw *hw, u64 features) static int ifcvf_config_features(struct ifcvf_hw *hw) { - struct ifcvf_adapter *ifcvf; - - ifcvf = vf_to_adapter(hw); ifcvf_set_features(hw, hw->req_features); ifcvf_add_status(hw, VIRTIO_CONFIG_S_FEATURES_OK); if (!(ifcvf_get_status(hw) & VIRTIO_CONFIG_S_FEATURES_OK)) { - IFCVF_ERR(ifcvf->pdev, "Failed to set FEATURES_OK status\n"); + IFCVF_ERR(hw->pdev, "Failed to set FEATURES_OK status\n"); return -EIO; } diff --git a/drivers/vdpa/ifcvf/ifcvf_base.h b/drivers/vdpa/ifcvf/ifcvf_base.h index f5563f665cc6..c20d1c40214e 100644 --- a/drivers/vdpa/ifcvf/ifcvf_base.h +++ b/drivers/vdpa/ifcvf/ifcvf_base.h @@ -19,6 +19,7 @@ #include <uapi/linux/virtio_blk.h> #include <uapi/linux/virtio_config.h> #include <uapi/linux/virtio_pci.h> +#include <uapi/linux/vdpa.h> #define N3000_DEVICE_ID 0x1041 #define N3000_SUBSYS_DEVICE_ID 0x001A @@ -38,9 +39,6 @@ #define IFCVF_DBG(pdev, fmt, ...) dev_dbg(&pdev->dev, fmt, ##__VA_ARGS__) #define IFCVF_INFO(pdev, fmt, ...) dev_info(&pdev->dev, fmt, ##__VA_ARGS__) -#define ifcvf_private_to_vf(adapter) \ - (&((struct ifcvf_adapter *)adapter)->vf) - /* all vqs and config interrupt has its own vector */ #define MSIX_VECTOR_PER_VQ_AND_CONFIG 1 /* all vqs share a vector, and config interrupt has a separate vector */ @@ -78,6 +76,8 @@ struct ifcvf_hw { u32 dev_type; u64 req_features; u64 hw_features; + /* provisioned device features */ + u64 dev_features; struct virtio_pci_common_cfg __iomem *common_cfg; void __iomem *dev_cfg; struct vring_info vring[IFCVF_MAX_QUEUES]; @@ -89,12 +89,13 @@ struct ifcvf_hw { u16 nr_vring; /* VIRTIO_PCI_CAP_DEVICE_CFG size */ u32 cap_dev_config_size; + struct pci_dev *pdev; }; struct ifcvf_adapter { struct vdpa_device vdpa; struct pci_dev *pdev; - struct ifcvf_hw vf; + struct ifcvf_hw *vf; }; struct ifcvf_vring_lm_cfg { @@ -109,6 +110,7 @@ struct ifcvf_lm_cfg { struct ifcvf_vdpa_mgmt_dev { struct vdpa_mgmt_dev mdev; + struct ifcvf_hw vf; struct ifcvf_adapter *adapter; struct pci_dev *pdev; }; 
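Note on the ifcvf hunks above: the hardware state (`struct ifcvf_hw`) moves out of the adapter and into the management device. The adapter keeps only a pointer to it, and `hw->pdev` replaces the removed `vf_to_adapter()` lookup. The following is a minimal user-space sketch of that ownership shape, together with the provisioned-feature check that `ifcvf_vdpa_dev_add()` performs later in this diff. All type and function names here (`pci_dev_stub`, `hw_state`, `adapter_attach`, `provision_features`) are hypothetical stand-ins for illustration, not kernel API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct pci_dev. */
struct pci_dev_stub {
	int devfn;
};

/* Roughly the shape of struct ifcvf_hw after the change (assumed subset). */
struct hw_state {
	uint64_t hw_features;      /* everything the hardware offers */
	uint64_t dev_features;     /* subset provisioned for this vDPA device */
	struct pci_dev_stub *pdev; /* back-pointer, so no container_of() is needed */
};

/* Roughly struct ifcvf_adapter: holds a pointer, not an embedded copy. */
struct adapter {
	struct pci_dev_stub *pdev;
	struct hw_state *vf;
};

/* Roughly struct ifcvf_vdpa_mgmt_dev: owns the hardware state. */
struct mgmt_dev {
	struct hw_state vf;
	struct adapter *adapter;
};

/* Wire a freshly allocated adapter to the state owned by the management device. */
void adapter_attach(struct mgmt_dev *mgmt, struct adapter *adapter)
{
	assert(mgmt->vf.pdev != NULL);
	adapter->pdev = mgmt->vf.pdev;
	adapter->vf = &mgmt->vf;
	mgmt->adapter = adapter;
}

/*
 * Reject any requested bit the hardware does not offer; otherwise record
 * the requested subset. With no request, all hardware features are used.
 */
int provision_features(struct hw_state *vf, int has_request, uint64_t requested)
{
	uint64_t features = vf->hw_features;

	if (has_request) {
		if (requested & ~features)
			return -EINVAL;
		features &= requested;
	}
	vf->dev_features = features;
	return 0;
}
```

In this sketch a rejected request leaves `dev_features` untouched, which matches the order of operations in the diff: the check happens before `vf->dev_features` is assigned.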
diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c index 44b29289aa19..7f78c47e40d6 100644 --- a/drivers/vdpa/ifcvf/ifcvf_main.c +++ b/drivers/vdpa/ifcvf/ifcvf_main.c @@ -69,10 +69,9 @@ static void ifcvf_free_irq_vectors(void *data) pci_free_irq_vectors(data); } -static void ifcvf_free_per_vq_irq(struct ifcvf_adapter *adapter) +static void ifcvf_free_per_vq_irq(struct ifcvf_hw *vf) { - struct pci_dev *pdev = adapter->pdev; - struct ifcvf_hw *vf = &adapter->vf; + struct pci_dev *pdev = vf->pdev; int i; for (i = 0; i < vf->nr_vring; i++) { @@ -83,10 +82,9 @@ static void ifcvf_free_per_vq_irq(struct ifcvf_adapter *adapter) } } -static void ifcvf_free_vqs_reused_irq(struct ifcvf_adapter *adapter) +static void ifcvf_free_vqs_reused_irq(struct ifcvf_hw *vf) { - struct pci_dev *pdev = adapter->pdev; - struct ifcvf_hw *vf = &adapter->vf; + struct pci_dev *pdev = vf->pdev; if (vf->vqs_reused_irq != -EINVAL) { devm_free_irq(&pdev->dev, vf->vqs_reused_irq, vf); @@ -95,20 +93,17 @@ static void ifcvf_free_vqs_reused_irq(struct ifcvf_adapter *adapter) } -static void ifcvf_free_vq_irq(struct ifcvf_adapter *adapter) +static void ifcvf_free_vq_irq(struct ifcvf_hw *vf) { - struct ifcvf_hw *vf = &adapter->vf; - if (vf->msix_vector_status == MSIX_VECTOR_PER_VQ_AND_CONFIG) - ifcvf_free_per_vq_irq(adapter); + ifcvf_free_per_vq_irq(vf); else - ifcvf_free_vqs_reused_irq(adapter); + ifcvf_free_vqs_reused_irq(vf); } -static void ifcvf_free_config_irq(struct ifcvf_adapter *adapter) +static void ifcvf_free_config_irq(struct ifcvf_hw *vf) { - struct pci_dev *pdev = adapter->pdev; - struct ifcvf_hw *vf = &adapter->vf; + struct pci_dev *pdev = vf->pdev; if (vf->config_irq == -EINVAL) return; @@ -123,12 +118,12 @@ static void ifcvf_free_config_irq(struct ifcvf_adapter *adapter) } } -static void ifcvf_free_irq(struct ifcvf_adapter *adapter) +static void ifcvf_free_irq(struct ifcvf_hw *vf) { - struct pci_dev *pdev = adapter->pdev; + struct pci_dev *pdev = vf->pdev; - 
ifcvf_free_vq_irq(adapter); - ifcvf_free_config_irq(adapter); + ifcvf_free_vq_irq(vf); + ifcvf_free_config_irq(vf); ifcvf_free_irq_vectors(pdev); } @@ -137,10 +132,9 @@ static void ifcvf_free_irq(struct ifcvf_adapter *adapter) * It returns the number of allocated vectors, negative * return value when fails. */ -static int ifcvf_alloc_vectors(struct ifcvf_adapter *adapter) +static int ifcvf_alloc_vectors(struct ifcvf_hw *vf) { - struct pci_dev *pdev = adapter->pdev; - struct ifcvf_hw *vf = &adapter->vf; + struct pci_dev *pdev = vf->pdev; int max_intr, ret; /* all queues and config interrupt */ @@ -160,10 +154,9 @@ static int ifcvf_alloc_vectors(struct ifcvf_adapter *adapter) return ret; } -static int ifcvf_request_per_vq_irq(struct ifcvf_adapter *adapter) +static int ifcvf_request_per_vq_irq(struct ifcvf_hw *vf) { - struct pci_dev *pdev = adapter->pdev; - struct ifcvf_hw *vf = &adapter->vf; + struct pci_dev *pdev = vf->pdev; int i, vector, ret, irq; vf->vqs_reused_irq = -EINVAL; @@ -190,15 +183,14 @@ static int ifcvf_request_per_vq_irq(struct ifcvf_adapter *adapter) return 0; err: - ifcvf_free_irq(adapter); + ifcvf_free_irq(vf); return -EFAULT; } -static int ifcvf_request_vqs_reused_irq(struct ifcvf_adapter *adapter) +static int ifcvf_request_vqs_reused_irq(struct ifcvf_hw *vf) { - struct pci_dev *pdev = adapter->pdev; - struct ifcvf_hw *vf = &adapter->vf; + struct pci_dev *pdev = vf->pdev; int i, vector, ret, irq; vector = 0; @@ -224,15 +216,14 @@ static int ifcvf_request_vqs_reused_irq(struct ifcvf_adapter *adapter) return 0; err: - ifcvf_free_irq(adapter); + ifcvf_free_irq(vf); return -EFAULT; } -static int ifcvf_request_dev_irq(struct ifcvf_adapter *adapter) +static int ifcvf_request_dev_irq(struct ifcvf_hw *vf) { - struct pci_dev *pdev = adapter->pdev; - struct ifcvf_hw *vf = &adapter->vf; + struct pci_dev *pdev = vf->pdev; int i, vector, ret, irq; vector = 0; @@ -265,29 +256,27 @@ static int ifcvf_request_dev_irq(struct ifcvf_adapter *adapter) return 0; err: - 
ifcvf_free_irq(adapter); + ifcvf_free_irq(vf); return -EFAULT; } -static int ifcvf_request_vq_irq(struct ifcvf_adapter *adapter) +static int ifcvf_request_vq_irq(struct ifcvf_hw *vf) { - struct ifcvf_hw *vf = &adapter->vf; int ret; if (vf->msix_vector_status == MSIX_VECTOR_PER_VQ_AND_CONFIG) - ret = ifcvf_request_per_vq_irq(adapter); + ret = ifcvf_request_per_vq_irq(vf); else - ret = ifcvf_request_vqs_reused_irq(adapter); + ret = ifcvf_request_vqs_reused_irq(vf); return ret; } -static int ifcvf_request_config_irq(struct ifcvf_adapter *adapter) +static int ifcvf_request_config_irq(struct ifcvf_hw *vf) { - struct pci_dev *pdev = adapter->pdev; - struct ifcvf_hw *vf = &adapter->vf; + struct pci_dev *pdev = vf->pdev; int config_vector, ret; if (vf->msix_vector_status == MSIX_VECTOR_PER_VQ_AND_CONFIG) @@ -320,17 +309,16 @@ static int ifcvf_request_config_irq(struct ifcvf_adapter *adapter) return 0; err: - ifcvf_free_irq(adapter); + ifcvf_free_irq(vf); return -EFAULT; } -static int ifcvf_request_irq(struct ifcvf_adapter *adapter) +static int ifcvf_request_irq(struct ifcvf_hw *vf) { - struct ifcvf_hw *vf = &adapter->vf; int nvectors, ret, max_intr; - nvectors = ifcvf_alloc_vectors(adapter); + nvectors = ifcvf_alloc_vectors(vf); if (nvectors <= 0) return -EFAULT; @@ -341,16 +329,16 @@ static int ifcvf_request_irq(struct ifcvf_adapter *adapter) if (nvectors == 1) { vf->msix_vector_status = MSIX_VECTOR_DEV_SHARED; - ret = ifcvf_request_dev_irq(adapter); + ret = ifcvf_request_dev_irq(vf); return ret; } - ret = ifcvf_request_vq_irq(adapter); + ret = ifcvf_request_vq_irq(vf); if (ret) return ret; - ret = ifcvf_request_config_irq(adapter); + ret = ifcvf_request_config_irq(vf); if (ret) return ret; @@ -358,9 +346,9 @@ static int ifcvf_request_irq(struct ifcvf_adapter *adapter) return 0; } -static int ifcvf_start_datapath(void *private) +static int ifcvf_start_datapath(struct ifcvf_adapter *adapter) { - struct ifcvf_hw *vf = ifcvf_private_to_vf(private); + struct ifcvf_hw *vf = 
adapter->vf; u8 status; int ret; @@ -374,9 +362,9 @@ static int ifcvf_start_datapath(void *private) return ret; } -static int ifcvf_stop_datapath(void *private) +static int ifcvf_stop_datapath(struct ifcvf_adapter *adapter) { - struct ifcvf_hw *vf = ifcvf_private_to_vf(private); + struct ifcvf_hw *vf = adapter->vf; int i; for (i = 0; i < vf->nr_vring; i++) @@ -389,7 +377,7 @@ static int ifcvf_stop_datapath(void *private) static void ifcvf_reset_vring(struct ifcvf_adapter *adapter) { - struct ifcvf_hw *vf = ifcvf_private_to_vf(adapter); + struct ifcvf_hw *vf = adapter->vf; int i; for (i = 0; i < vf->nr_vring; i++) { @@ -414,7 +402,7 @@ static struct ifcvf_hw *vdpa_to_vf(struct vdpa_device *vdpa_dev) { struct ifcvf_adapter *adapter = vdpa_to_adapter(vdpa_dev); - return &adapter->vf; + return adapter->vf; } static u64 ifcvf_vdpa_get_device_features(struct vdpa_device *vdpa_dev) @@ -479,7 +467,7 @@ static void ifcvf_vdpa_set_status(struct vdpa_device *vdpa_dev, u8 status) if ((status & VIRTIO_CONFIG_S_DRIVER_OK) && !(status_old & VIRTIO_CONFIG_S_DRIVER_OK)) { - ret = ifcvf_request_irq(adapter); + ret = ifcvf_request_irq(vf); if (ret) { status = ifcvf_get_status(vf); status |= VIRTIO_CONFIG_S_FAILED; @@ -511,7 +499,7 @@ static int ifcvf_vdpa_reset(struct vdpa_device *vdpa_dev) if (status_old & VIRTIO_CONFIG_S_DRIVER_OK) { ifcvf_stop_datapath(adapter); - ifcvf_free_irq(adapter); + ifcvf_free_irq(vf); } ifcvf_reset_vring(adapter); @@ -755,17 +743,37 @@ static int ifcvf_vdpa_dev_add(struct vdpa_mgmt_dev *mdev, const char *name, struct vdpa_device *vdpa_dev; struct pci_dev *pdev; struct ifcvf_hw *vf; + u64 device_features; int ret; ifcvf_mgmt_dev = container_of(mdev, struct ifcvf_vdpa_mgmt_dev, mdev); - if (!ifcvf_mgmt_dev->adapter) - return -EOPNOTSUPP; + vf = &ifcvf_mgmt_dev->vf; + pdev = vf->pdev; + adapter = vdpa_alloc_device(struct ifcvf_adapter, vdpa, + &pdev->dev, &ifc_vdpa_ops, 1, 1, NULL, false); + if (IS_ERR(adapter)) { + IFCVF_ERR(pdev, "Failed to allocate vDPA 
structure"); + return PTR_ERR(adapter); + } - adapter = ifcvf_mgmt_dev->adapter; - vf = &adapter->vf; - pdev = adapter->pdev; + ifcvf_mgmt_dev->adapter = adapter; + adapter->pdev = pdev; + adapter->vdpa.dma_dev = &pdev->dev; + adapter->vdpa.mdev = mdev; + adapter->vf = vf; vdpa_dev = &adapter->vdpa; + device_features = vf->hw_features; + if (config->mask & BIT_ULL(VDPA_ATTR_DEV_FEATURES)) { + if (config->device_features & ~device_features) { + IFCVF_ERR(pdev, "The provisioned features 0x%llx are not supported by this device with features 0x%llx\n", + config->device_features, device_features); + return -EINVAL; + } + device_features &= config->device_features; + } + vf->dev_features = device_features; + if (name) ret = dev_set_name(&vdpa_dev->dev, "%s", name); else @@ -781,7 +789,6 @@ static int ifcvf_vdpa_dev_add(struct vdpa_mgmt_dev *mdev, const char *name, return 0; } - static void ifcvf_vdpa_dev_del(struct vdpa_mgmt_dev *mdev, struct vdpa_device *dev) { struct ifcvf_vdpa_mgmt_dev *ifcvf_mgmt_dev; @@ -800,7 +807,6 @@ static int ifcvf_probe(struct pci_dev *pdev, const struct pci_device_id *id) { struct ifcvf_vdpa_mgmt_dev *ifcvf_mgmt_dev; struct device *dev = &pdev->dev; - struct ifcvf_adapter *adapter; struct ifcvf_hw *vf; u32 dev_type; int ret, i; @@ -831,20 +837,16 @@ static int ifcvf_probe(struct pci_dev *pdev, const struct pci_device_id *id) } pci_set_master(pdev); - - adapter = vdpa_alloc_device(struct ifcvf_adapter, vdpa, - dev, &ifc_vdpa_ops, 1, 1, NULL, false); - if (IS_ERR(adapter)) { - IFCVF_ERR(pdev, "Failed to allocate vDPA structure"); - return PTR_ERR(adapter); + ifcvf_mgmt_dev = kzalloc(sizeof(struct ifcvf_vdpa_mgmt_dev), GFP_KERNEL); + if (!ifcvf_mgmt_dev) { + IFCVF_ERR(pdev, "Failed to alloc memory for the vDPA management device\n"); + return -ENOMEM; } - vf = &adapter->vf; + vf = &ifcvf_mgmt_dev->vf; vf->dev_type = get_dev_type(pdev); vf->base = pcim_iomap_table(pdev); - - adapter->pdev = pdev; - adapter->vdpa.dma_dev = &pdev->dev; + vf->pdev = 
pdev; ret = ifcvf_init_hw(vf, pdev); if (ret) { @@ -858,16 +860,6 @@ static int ifcvf_probe(struct pci_dev *pdev, const struct pci_device_id *id) vf->hw_features = ifcvf_get_hw_features(vf); vf->config_size = ifcvf_get_config_size(vf); - ifcvf_mgmt_dev = kzalloc(sizeof(struct ifcvf_vdpa_mgmt_dev), GFP_KERNEL); - if (!ifcvf_mgmt_dev) { - IFCVF_ERR(pdev, "Failed to alloc memory for the vDPA management device\n"); - return -ENOMEM; - } - - ifcvf_mgmt_dev->mdev.ops = &ifcvf_vdpa_mgmt_dev_ops; - ifcvf_mgmt_dev->mdev.device = dev; - ifcvf_mgmt_dev->adapter = adapter; - dev_type = get_dev_type(pdev); switch (dev_type) { case VIRTIO_ID_NET: @@ -882,11 +874,11 @@ static int ifcvf_probe(struct pci_dev *pdev, const struct pci_device_id *id) goto err; } + ifcvf_mgmt_dev->mdev.ops = &ifcvf_vdpa_mgmt_dev_ops; + ifcvf_mgmt_dev->mdev.device = dev; ifcvf_mgmt_dev->mdev.max_supported_vqs = vf->nr_vring; ifcvf_mgmt_dev->mdev.supported_features = vf->hw_features; - - adapter->vdpa.mdev = &ifcvf_mgmt_dev->mdev; - + ifcvf_mgmt_dev->mdev.config_attr_mask = (1 << VDPA_ATTR_DEV_FEATURES); ret = vdpa_mgmtdev_register(&ifcvf_mgmt_dev->mdev); if (ret) { diff --git a/drivers/vdpa/mlx5/Makefile b/drivers/vdpa/mlx5/Makefile index f717978c83bf..e791394c33e3 100644 --- a/drivers/vdpa/mlx5/Makefile +++ b/drivers/vdpa/mlx5/Makefile @@ -1,4 +1,4 @@ subdir-ccflags-y += -I$(srctree)/drivers/vdpa/mlx5/core obj-$(CONFIG_MLX5_VDPA_NET) += mlx5_vdpa.o -mlx5_vdpa-$(CONFIG_MLX5_VDPA_NET) += net/mlx5_vnet.o core/resources.o core/mr.o +mlx5_vdpa-$(CONFIG_MLX5_VDPA_NET) += net/mlx5_vnet.o core/resources.o core/mr.o net/debug.o diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c index 0a1e0b0dc37e..03e543229791 100644 --- a/drivers/vdpa/mlx5/core/mr.c +++ b/drivers/vdpa/mlx5/core/mr.c @@ -503,7 +503,6 @@ void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev) else destroy_dma_mr(mvdev, mr); - memset(mr, 0, sizeof(*mr)); mr->initialized = false; out: mutex_unlock(&mr->mkey_mtx); diff --git 
a/drivers/vdpa/mlx5/core/resources.c b/drivers/vdpa/mlx5/core/resources.c index 9800f9bec225..d5a59c9035fb 100644 --- a/drivers/vdpa/mlx5/core/resources.c +++ b/drivers/vdpa/mlx5/core/resources.c @@ -213,7 +213,7 @@ int mlx5_vdpa_create_mkey(struct mlx5_vdpa_dev *mvdev, u32 *mkey, u32 *in, return err; mkey_index = MLX5_GET(create_mkey_out, lout, mkey_index); - *mkey |= mlx5_idx_to_mkey(mkey_index); + *mkey = mlx5_idx_to_mkey(mkey_index); return 0; } @@ -233,6 +233,7 @@ static int init_ctrl_vq(struct mlx5_vdpa_dev *mvdev) if (!mvdev->cvq.iotlb) return -ENOMEM; + spin_lock_init(&mvdev->cvq.iommu_lock); vringh_set_iotlb(&mvdev->cvq.vring, mvdev->cvq.iotlb, &mvdev->cvq.iommu_lock); return 0; diff --git a/drivers/vdpa/mlx5/net/debug.c b/drivers/vdpa/mlx5/net/debug.c new file mode 100644 index 000000000000..60d6ac68cdc4 --- /dev/null +++ b/drivers/vdpa/mlx5/net/debug.c @@ -0,0 +1,152 @@ +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB +/* Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved. 
*/ + +#include <linux/debugfs.h> +#include <linux/mlx5/fs.h> +#include "mlx5_vnet.h" + +static int tirn_show(struct seq_file *file, void *priv) +{ + struct mlx5_vdpa_net *ndev = file->private; + + seq_printf(file, "0x%x\n", ndev->res.tirn); + return 0; +} + +DEFINE_SHOW_ATTRIBUTE(tirn); + +void mlx5_vdpa_remove_tirn(struct mlx5_vdpa_net *ndev) +{ + if (ndev->debugfs) + debugfs_remove(ndev->res.tirn_dent); +} + +void mlx5_vdpa_add_tirn(struct mlx5_vdpa_net *ndev) +{ + ndev->res.tirn_dent = debugfs_create_file("tirn", 0444, ndev->rx_dent, + ndev, &tirn_fops); +} + +static int rx_flow_table_show(struct seq_file *file, void *priv) +{ + struct mlx5_vdpa_net *ndev = file->private; + + seq_printf(file, "0x%x\n", mlx5_flow_table_id(ndev->rxft)); + return 0; +} + +DEFINE_SHOW_ATTRIBUTE(rx_flow_table); + +void mlx5_vdpa_remove_rx_flow_table(struct mlx5_vdpa_net *ndev) +{ + if (ndev->debugfs) + debugfs_remove(ndev->rx_table_dent); +} + +void mlx5_vdpa_add_rx_flow_table(struct mlx5_vdpa_net *ndev) +{ + ndev->rx_table_dent = debugfs_create_file("table_id", 0444, ndev->rx_dent, + ndev, &rx_flow_table_fops); +} + +#if defined(CONFIG_MLX5_VDPA_STEERING_DEBUG) +static int packets_show(struct seq_file *file, void *priv) +{ + struct mlx5_vdpa_counter *counter = file->private; + u64 packets; + u64 bytes; + int err; + + err = mlx5_fc_query(counter->mdev, counter->counter, &packets, &bytes); + if (err) + return err; + + seq_printf(file, "0x%llx\n", packets); + return 0; +} + +static int bytes_show(struct seq_file *file, void *priv) +{ + struct mlx5_vdpa_counter *counter = file->private; + u64 packets; + u64 bytes; + int err; + + err = mlx5_fc_query(counter->mdev, counter->counter, &packets, &bytes); + if (err) + return err; + + seq_printf(file, "0x%llx\n", bytes); + return 0; +} + +DEFINE_SHOW_ATTRIBUTE(packets); +DEFINE_SHOW_ATTRIBUTE(bytes); + +static void add_counter_node(struct mlx5_vdpa_counter *counter, + struct dentry *parent) +{ + debugfs_create_file("packets", 0444, parent, 
counter, + &packets_fops); + debugfs_create_file("bytes", 0444, parent, counter, + &bytes_fops); +} + +void mlx5_vdpa_add_rx_counters(struct mlx5_vdpa_net *ndev, + struct macvlan_node *node) +{ + static const char *ut = "untagged"; + char vidstr[9]; + u16 vid; + + node->ucast_counter.mdev = ndev->mvdev.mdev; + node->mcast_counter.mdev = ndev->mvdev.mdev; + if (node->tagged) { + vid = key2vid(node->macvlan); + snprintf(vidstr, sizeof(vidstr), "0x%x", vid); + } else { + strcpy(vidstr, ut); + } + + node->dent = debugfs_create_dir(vidstr, ndev->rx_dent); + if (IS_ERR(node->dent)) { + node->dent = NULL; + return; + } + + node->ucast_counter.dent = debugfs_create_dir("ucast", node->dent); + if (IS_ERR(node->ucast_counter.dent)) + return; + + add_counter_node(&node->ucast_counter, node->ucast_counter.dent); + + node->mcast_counter.dent = debugfs_create_dir("mcast", node->dent); + if (IS_ERR(node->mcast_counter.dent)) + return; + + add_counter_node(&node->mcast_counter, node->mcast_counter.dent); +} + +void mlx5_vdpa_remove_rx_counters(struct mlx5_vdpa_net *ndev, + struct macvlan_node *node) +{ + if (node->dent && ndev->debugfs) + debugfs_remove_recursive(node->dent); +} +#endif + +void mlx5_vdpa_add_debugfs(struct mlx5_vdpa_net *ndev) +{ + struct mlx5_core_dev *mdev; + + mdev = ndev->mvdev.mdev; + ndev->debugfs = debugfs_create_dir(dev_name(&ndev->mvdev.vdev.dev), + mlx5_debugfs_get_dev_root(mdev)); + if (!IS_ERR(ndev->debugfs)) + ndev->rx_dent = debugfs_create_dir("rx", ndev->debugfs); +} + +void mlx5_vdpa_remove_debugfs(struct dentry *dbg) +{ + debugfs_remove_recursive(dbg); +} diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c index 3a6dbbc6440d..3a0e721aef05 100644 --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c @@ -18,15 +18,12 @@ #include <linux/mlx5/mlx5_ifc_vdpa.h> #include <linux/mlx5/mpfs.h> #include "mlx5_vdpa.h" +#include "mlx5_vnet.h" MODULE_AUTHOR("Eli Cohen <eli@mellanox.com>"); 
MODULE_DESCRIPTION("Mellanox VDPA driver"); MODULE_LICENSE("Dual BSD/GPL"); -#define to_mlx5_vdpa_ndev(__mvdev) \ - container_of(__mvdev, struct mlx5_vdpa_net, mvdev) -#define to_mvdev(__vdev) container_of((__vdev), struct mlx5_vdpa_dev, vdev) - #define VALID_FEATURES_MASK \ (BIT_ULL(VIRTIO_NET_F_CSUM) | BIT_ULL(VIRTIO_NET_F_GUEST_CSUM) | \ BIT_ULL(VIRTIO_NET_F_CTRL_GUEST_OFFLOADS) | BIT_ULL(VIRTIO_NET_F_MTU) | BIT_ULL(VIRTIO_NET_F_MAC) | \ @@ -50,14 +47,6 @@ MODULE_LICENSE("Dual BSD/GPL"); #define MLX5V_UNTAGGED 0x1000 -struct mlx5_vdpa_net_resources { - u32 tisn; - u32 tdn; - u32 tirn; - u32 rqtn; - bool valid; -}; - struct mlx5_vdpa_cq_buf { struct mlx5_frag_buf_ctrl fbc; struct mlx5_frag_buf frag_buf; @@ -146,38 +135,6 @@ static bool is_index_valid(struct mlx5_vdpa_dev *mvdev, u16 idx) return idx <= mvdev->max_idx; } -#define MLX5V_MACVLAN_SIZE 256 - -struct mlx5_vdpa_net { - struct mlx5_vdpa_dev mvdev; - struct mlx5_vdpa_net_resources res; - struct virtio_net_config config; - struct mlx5_vdpa_virtqueue *vqs; - struct vdpa_callback *event_cbs; - - /* Serialize vq resources creation and destruction. This is required - * since memory map might change and we need to destroy and create - * resources while driver in operational. 
- */ - struct rw_semaphore reslock; - struct mlx5_flow_table *rxft; - bool setup; - u32 cur_num_vqs; - u32 rqt_size; - bool nb_registered; - struct notifier_block nb; - struct vdpa_callback config_cb; - struct mlx5_vdpa_wq_ent cvq_ent; - struct hlist_head macvlan_hash[MLX5V_MACVLAN_SIZE]; -}; - -struct macvlan_node { - struct hlist_node hlist; - struct mlx5_flow_handle *ucast_rule; - struct mlx5_flow_handle *mcast_rule; - u64 macvlan; -}; - static void free_resources(struct mlx5_vdpa_net *ndev); static void init_mvqs(struct mlx5_vdpa_net *ndev); static int setup_driver(struct mlx5_vdpa_dev *mvdev); @@ -1431,36 +1388,85 @@ static int create_tir(struct mlx5_vdpa_net *ndev) err = mlx5_vdpa_create_tir(&ndev->mvdev, in, &ndev->res.tirn); kfree(in); + if (err) + return err; + + mlx5_vdpa_add_tirn(ndev); return err; } static void destroy_tir(struct mlx5_vdpa_net *ndev) { + mlx5_vdpa_remove_tirn(ndev); mlx5_vdpa_destroy_tir(&ndev->mvdev, ndev->res.tirn); } #define MAX_STEERING_ENT 0x8000 #define MAX_STEERING_GROUPS 2 +#if defined(CONFIG_MLX5_VDPA_STEERING_DEBUG) + #define NUM_DESTS 2 +#else + #define NUM_DESTS 1 +#endif + +static int add_steering_counters(struct mlx5_vdpa_net *ndev, + struct macvlan_node *node, + struct mlx5_flow_act *flow_act, + struct mlx5_flow_destination *dests) +{ +#if defined(CONFIG_MLX5_VDPA_STEERING_DEBUG) + int err; + + node->ucast_counter.counter = mlx5_fc_create(ndev->mvdev.mdev, false); + if (IS_ERR(node->ucast_counter.counter)) + return PTR_ERR(node->ucast_counter.counter); + + node->mcast_counter.counter = mlx5_fc_create(ndev->mvdev.mdev, false); + if (IS_ERR(node->mcast_counter.counter)) { + err = PTR_ERR(node->mcast_counter.counter); + goto err_mcast_counter; + } + + dests[1].type = MLX5_FLOW_DESTINATION_TYPE_COUNTER; + flow_act->action |= MLX5_FLOW_CONTEXT_ACTION_COUNT; + return 0; + +err_mcast_counter: + mlx5_fc_destroy(ndev->mvdev.mdev, node->ucast_counter.counter); + return err; +#else + return 0; +#endif +} + +static void 
remove_steering_counters(struct mlx5_vdpa_net *ndev, + struct macvlan_node *node) +{ +#if defined(CONFIG_MLX5_VDPA_STEERING_DEBUG) + mlx5_fc_destroy(ndev->mvdev.mdev, node->mcast_counter.counter); + mlx5_fc_destroy(ndev->mvdev.mdev, node->ucast_counter.counter); +#endif +} + static int mlx5_vdpa_add_mac_vlan_rules(struct mlx5_vdpa_net *ndev, u8 *mac, - u16 vid, bool tagged, - struct mlx5_flow_handle **ucast, - struct mlx5_flow_handle **mcast) + struct macvlan_node *node) { - struct mlx5_flow_destination dest = {}; + struct mlx5_flow_destination dests[NUM_DESTS] = {}; struct mlx5_flow_act flow_act = {}; - struct mlx5_flow_handle *rule; struct mlx5_flow_spec *spec; void *headers_c; void *headers_v; u8 *dmac_c; u8 *dmac_v; int err; + u16 vid; spec = kvzalloc(sizeof(*spec), GFP_KERNEL); if (!spec) return -ENOMEM; + vid = key2vid(node->macvlan); spec->match_criteria_enable = MLX5_MATCH_OUTER_HEADERS; headers_c = MLX5_ADDR_OF(fte_match_param, spec->match_criteria, outer_headers); headers_v = MLX5_ADDR_OF(fte_match_param, spec->match_value, outer_headers); @@ -1472,44 +1478,58 @@ static int mlx5_vdpa_add_mac_vlan_rules(struct mlx5_vdpa_net *ndev, u8 *mac, MLX5_SET(fte_match_set_lyr_2_4, headers_c, cvlan_tag, 1); MLX5_SET_TO_ONES(fte_match_set_lyr_2_4, headers_c, first_vid); } - if (tagged) { + if (node->tagged) { MLX5_SET(fte_match_set_lyr_2_4, headers_v, cvlan_tag, 1); MLX5_SET(fte_match_set_lyr_2_4, headers_v, first_vid, vid); } flow_act.action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST; - dest.type = MLX5_FLOW_DESTINATION_TYPE_TIR; - dest.tir_num = ndev->res.tirn; - rule = mlx5_add_flow_rules(ndev->rxft, spec, &flow_act, &dest, 1); - if (IS_ERR(rule)) - return PTR_ERR(rule); + dests[0].type = MLX5_FLOW_DESTINATION_TYPE_TIR; + dests[0].tir_num = ndev->res.tirn; + err = add_steering_counters(ndev, node, &flow_act, dests); + if (err) + goto out_free; + +#if defined(CONFIG_MLX5_VDPA_STEERING_DEBUG) + dests[1].counter_id = mlx5_fc_id(node->ucast_counter.counter); +#endif + 
node->ucast_rule = mlx5_add_flow_rules(ndev->rxft, spec, &flow_act, dests, NUM_DESTS); + if (IS_ERR(node->ucast_rule)) { + err = PTR_ERR(node->ucast_rule); + goto err_ucast; + } - *ucast = rule; +#if defined(CONFIG_MLX5_VDPA_STEERING_DEBUG) + dests[1].counter_id = mlx5_fc_id(node->mcast_counter.counter); +#endif memset(dmac_c, 0, ETH_ALEN); memset(dmac_v, 0, ETH_ALEN); dmac_c[0] = 1; dmac_v[0] = 1; - rule = mlx5_add_flow_rules(ndev->rxft, spec, &flow_act, &dest, 1); - kvfree(spec); - if (IS_ERR(rule)) { - err = PTR_ERR(rule); + node->mcast_rule = mlx5_add_flow_rules(ndev->rxft, spec, &flow_act, dests, NUM_DESTS); + if (IS_ERR(node->mcast_rule)) { + err = PTR_ERR(node->mcast_rule); goto err_mcast; } - - *mcast = rule; + kvfree(spec); + mlx5_vdpa_add_rx_counters(ndev, node); return 0; err_mcast: - mlx5_del_flow_rules(*ucast); + mlx5_del_flow_rules(node->ucast_rule); +err_ucast: + remove_steering_counters(ndev, node); +out_free: + kvfree(spec); return err; } static void mlx5_vdpa_del_mac_vlan_rules(struct mlx5_vdpa_net *ndev, - struct mlx5_flow_handle *ucast, - struct mlx5_flow_handle *mcast) + struct macvlan_node *node) { - mlx5_del_flow_rules(ucast); - mlx5_del_flow_rules(mcast); + mlx5_vdpa_remove_rx_counters(ndev, node); + mlx5_del_flow_rules(node->ucast_rule); + mlx5_del_flow_rules(node->mcast_rule); } static u64 search_val(u8 *mac, u16 vlan, bool tagged) @@ -1543,14 +1563,14 @@ static struct macvlan_node *mac_vlan_lookup(struct mlx5_vdpa_net *ndev, u64 valu return NULL; } -static int mac_vlan_add(struct mlx5_vdpa_net *ndev, u8 *mac, u16 vlan, bool tagged) // vlan -> vid +static int mac_vlan_add(struct mlx5_vdpa_net *ndev, u8 *mac, u16 vid, bool tagged) { struct macvlan_node *ptr; u64 val; u32 idx; int err; - val = search_val(mac, vlan, tagged); + val = search_val(mac, vid, tagged); if (mac_vlan_lookup(ndev, val)) return -EEXIST; @@ -1558,12 +1578,13 @@ static int mac_vlan_add(struct mlx5_vdpa_net *ndev, u8 *mac, u16 vlan, bool tagg if (!ptr) return -ENOMEM; - 
err = mlx5_vdpa_add_mac_vlan_rules(ndev, ndev->config.mac, vlan, tagged, - &ptr->ucast_rule, &ptr->mcast_rule); + ptr->tagged = tagged; + ptr->macvlan = val; + ptr->ndev = ndev; + err = mlx5_vdpa_add_mac_vlan_rules(ndev, ndev->config.mac, ptr); if (err) goto err_add; - ptr->macvlan = val; idx = hash_64(val, 8); hlist_add_head(&ptr->hlist, &ndev->macvlan_hash[idx]); return 0; @@ -1582,7 +1603,8 @@ static void mac_vlan_del(struct mlx5_vdpa_net *ndev, u8 *mac, u16 vlan, bool tag return; hlist_del(&ptr->hlist); - mlx5_vdpa_del_mac_vlan_rules(ndev, ptr->ucast_rule, ptr->mcast_rule); + mlx5_vdpa_del_mac_vlan_rules(ndev, ptr); + remove_steering_counters(ndev, ptr); kfree(ptr); } @@ -1595,7 +1617,8 @@ static void clear_mac_vlan_table(struct mlx5_vdpa_net *ndev) for (i = 0; i < MLX5V_MACVLAN_SIZE; i++) { hlist_for_each_entry_safe(pos, n, &ndev->macvlan_hash[i], hlist) { hlist_del(&pos->hlist); - mlx5_vdpa_del_mac_vlan_rules(ndev, pos->ucast_rule, pos->mcast_rule); + mlx5_vdpa_del_mac_vlan_rules(ndev, pos); + remove_steering_counters(ndev, pos); kfree(pos); } } @@ -1621,6 +1644,7 @@ static int setup_steering(struct mlx5_vdpa_net *ndev) mlx5_vdpa_warn(&ndev->mvdev, "failed to create flow table\n"); return PTR_ERR(ndev->rxft); } + mlx5_vdpa_add_rx_flow_table(ndev); err = mac_vlan_add(ndev, ndev->config.mac, 0, false); if (err) @@ -1629,6 +1653,7 @@ static int setup_steering(struct mlx5_vdpa_net *ndev) return 0; err_add: + mlx5_vdpa_remove_rx_flow_table(ndev); mlx5_destroy_flow_table(ndev->rxft); return err; } @@ -1636,6 +1661,7 @@ err_add: static void teardown_steering(struct mlx5_vdpa_net *ndev) { clear_mac_vlan_table(ndev); + mlx5_vdpa_remove_rx_flow_table(ndev); mlx5_destroy_flow_table(ndev->rxft); } @@ -2183,6 +2209,7 @@ static u64 get_supported_features(struct mlx5_core_dev *mdev) mlx_vdpa_features |= BIT_ULL(VIRTIO_NET_F_STATUS); mlx_vdpa_features |= BIT_ULL(VIRTIO_NET_F_MTU); mlx_vdpa_features |= BIT_ULL(VIRTIO_NET_F_CTRL_VLAN); + mlx_vdpa_features |= 
BIT_ULL(VIRTIO_NET_F_MAC); return mlx_vdpa_features; } @@ -2655,6 +2682,16 @@ static int mlx5_vdpa_set_map(struct vdpa_device *vdev, unsigned int asid, return err; } +static struct device *mlx5_get_vq_dma_dev(struct vdpa_device *vdev, u16 idx) +{ + struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev); + + if (is_ctrl_vq_idx(mvdev, idx)) + return &vdev->dev; + + return mvdev->vdev.dma_dev; +} + static void mlx5_vdpa_free(struct vdpa_device *vdev) { struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev); @@ -2870,6 +2907,7 @@ static const struct vdpa_config_ops mlx5_vdpa_ops = { .get_generation = mlx5_vdpa_get_generation, .set_map = mlx5_vdpa_set_map, .set_group_asid = mlx5_set_group_asid, + .get_vq_dma_dev = mlx5_get_vq_dma_dev, .free = mlx5_vdpa_free, .suspend = mlx5_vdpa_suspend, }; @@ -3009,6 +3047,8 @@ static int event_handler(struct notifier_block *nb, unsigned long event, void *p struct mlx5_vdpa_wq_ent *wqent; if (event == MLX5_EVENT_TYPE_PORT_CHANGE) { + if (!(ndev->mvdev.actual_features & BIT_ULL(VIRTIO_NET_F_STATUS))) + return NOTIFY_DONE; switch (eqe->sub_type) { case MLX5_PORT_CHANGE_SUBTYPE_DOWN: case MLX5_PORT_CHANGE_SUBTYPE_ACTIVE: @@ -3060,6 +3100,7 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name, struct mlx5_vdpa_dev *mvdev; struct mlx5_vdpa_net *ndev; struct mlx5_core_dev *mdev; + u64 device_features; u32 max_vqs; u16 mtu; int err; @@ -3068,6 +3109,24 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name, return -ENOSPC; mdev = mgtdev->madev->mdev; + device_features = mgtdev->mgtdev.supported_features; + if (add_config->mask & BIT_ULL(VDPA_ATTR_DEV_FEATURES)) { + if (add_config->device_features & ~device_features) { + dev_warn(mdev->device, + "The provisioned features 0x%llx are not supported by this device with features 0x%llx\n", + add_config->device_features, device_features); + return -EINVAL; + } + device_features &= add_config->device_features; + } + if (!(device_features & BIT_ULL(VIRTIO_F_VERSION_1) && 
+ device_features & BIT_ULL(VIRTIO_F_ACCESS_PLATFORM))) { + dev_warn(mdev->device, + "Must provision minimum features 0x%llx for this device", + BIT_ULL(VIRTIO_F_VERSION_1) | BIT_ULL(VIRTIO_F_ACCESS_PLATFORM)); + return -EOPNOTSUPP; + } + if (!(MLX5_CAP_DEV_VDPA_EMULATION(mdev, virtio_queue_type) & MLX5_VIRTIO_EMULATION_CAP_VIRTIO_QUEUE_TYPE_SPLIT)) { dev_warn(mdev->device, "missing support for split virtqueues\n"); @@ -3096,7 +3155,6 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name, if (IS_ERR(ndev)) return PTR_ERR(ndev); - ndev->mvdev.mlx_features = mgtdev->mgtdev.supported_features; ndev->mvdev.max_vqs = max_vqs; mvdev = &ndev->mvdev; mvdev->mdev = mdev; @@ -3118,20 +3176,26 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name, goto err_alloc; } - err = query_mtu(mdev, &mtu); - if (err) - goto err_alloc; + if (device_features & BIT_ULL(VIRTIO_NET_F_MTU)) { + err = query_mtu(mdev, &mtu); + if (err) + goto err_alloc; - ndev->config.mtu = cpu_to_mlx5vdpa16(mvdev, mtu); + ndev->config.mtu = cpu_to_mlx5vdpa16(mvdev, mtu); + } - if (get_link_state(mvdev)) - ndev->config.status |= cpu_to_mlx5vdpa16(mvdev, VIRTIO_NET_S_LINK_UP); - else - ndev->config.status &= cpu_to_mlx5vdpa16(mvdev, ~VIRTIO_NET_S_LINK_UP); + if (device_features & BIT_ULL(VIRTIO_NET_F_STATUS)) { + if (get_link_state(mvdev)) + ndev->config.status |= cpu_to_mlx5vdpa16(mvdev, VIRTIO_NET_S_LINK_UP); + else + ndev->config.status &= cpu_to_mlx5vdpa16(mvdev, ~VIRTIO_NET_S_LINK_UP); + } if (add_config->mask & (1 << VDPA_ATTR_DEV_NET_CFG_MACADDR)) { memcpy(ndev->config.mac, add_config->net.mac, ETH_ALEN); - } else { + /* No bother setting mac address in config if not going to provision _F_MAC */ + } else if ((add_config->mask & BIT_ULL(VDPA_ATTR_DEV_FEATURES)) == 0 || + device_features & BIT_ULL(VIRTIO_NET_F_MAC)) { err = mlx5_query_nic_vport_mac_address(mdev, 0, 0, config->mac); if (err) goto err_alloc; @@ -3142,11 +3206,26 @@ static int 
mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name, err = mlx5_mpfs_add_mac(pfmdev, config->mac); if (err) goto err_alloc; - - ndev->mvdev.mlx_features |= BIT_ULL(VIRTIO_NET_F_MAC); + } else if ((add_config->mask & BIT_ULL(VDPA_ATTR_DEV_FEATURES)) == 0) { + /* + * We used to clear _F_MAC feature bit if seeing + * zero mac address when device features are not + * specifically provisioned. Keep the behaviour + * so old scripts do not break. + */ + device_features &= ~BIT_ULL(VIRTIO_NET_F_MAC); + } else if (device_features & BIT_ULL(VIRTIO_NET_F_MAC)) { + /* Don't provision zero mac address for _F_MAC */ + mlx5_vdpa_warn(&ndev->mvdev, + "No mac address provisioned?\n"); + err = -EINVAL; + goto err_alloc; } - config->max_virtqueue_pairs = cpu_to_mlx5vdpa16(mvdev, max_vqs / 2); + if (device_features & BIT_ULL(VIRTIO_NET_F_MQ)) + config->max_virtqueue_pairs = cpu_to_mlx5vdpa16(mvdev, max_vqs / 2); + + ndev->mvdev.mlx_features = device_features; mvdev->vdev.dma_dev = &mdev->pdev->dev; err = mlx5_vdpa_alloc_resources(&ndev->mvdev); if (err) @@ -3178,6 +3257,7 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name, if (err) goto err_reg; + mlx5_vdpa_add_debugfs(ndev); mgtdev->ndev = ndev; return 0; @@ -3204,6 +3284,8 @@ static void mlx5_vdpa_dev_del(struct vdpa_mgmt_dev *v_mdev, struct vdpa_device * struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev); struct workqueue_struct *wq; + mlx5_vdpa_remove_debugfs(ndev->debugfs); + ndev->debugfs = NULL; if (ndev->nb_registered) { ndev->nb_registered = false; mlx5_notifier_unregister(mvdev->mdev, &ndev->nb); @@ -3243,7 +3325,8 @@ static int mlx5v_probe(struct auxiliary_device *adev, mgtdev->mgtdev.id_table = id_table; mgtdev->mgtdev.config_attr_mask = BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MACADDR) | BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MAX_VQP) | - BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MTU); + BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MTU) | + BIT_ULL(VDPA_ATTR_DEV_FEATURES); mgtdev->mgtdev.max_supported_vqs = 
MLX5_CAP_DEV_VDPA_EMULATION(mdev, max_num_virtio_queues) + 1; mgtdev->mgtdev.supported_features = get_supported_features(mdev); diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.h b/drivers/vdpa/mlx5/net/mlx5_vnet.h new file mode 100644 index 000000000000..c90a89e1de4d --- /dev/null +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.h @@ -0,0 +1,94 @@ +/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */ +/* Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved. */ + +#ifndef __MLX5_VNET_H__ +#define __MLX5_VNET_H__ + +#include "mlx5_vdpa.h" + +#define to_mlx5_vdpa_ndev(__mvdev) \ + container_of(__mvdev, struct mlx5_vdpa_net, mvdev) +#define to_mvdev(__vdev) container_of((__vdev), struct mlx5_vdpa_dev, vdev) + +struct mlx5_vdpa_net_resources { + u32 tisn; + u32 tdn; + u32 tirn; + u32 rqtn; + bool valid; + struct dentry *tirn_dent; +}; + +#define MLX5V_MACVLAN_SIZE 256 + +static inline u16 key2vid(u64 key) +{ + return (u16)(key >> 48) & 0xfff; +} + +struct mlx5_vdpa_net { + struct mlx5_vdpa_dev mvdev; + struct mlx5_vdpa_net_resources res; + struct virtio_net_config config; + struct mlx5_vdpa_virtqueue *vqs; + struct vdpa_callback *event_cbs; + + /* Serialize vq resources creation and destruction. This is required + * since memory map might change and we need to destroy and create + * resources while driver in operational. 
+ */ + struct rw_semaphore reslock; + struct mlx5_flow_table *rxft; + struct dentry *rx_dent; + struct dentry *rx_table_dent; + bool setup; + u32 cur_num_vqs; + u32 rqt_size; + bool nb_registered; + struct notifier_block nb; + struct vdpa_callback config_cb; + struct mlx5_vdpa_wq_ent cvq_ent; + struct hlist_head macvlan_hash[MLX5V_MACVLAN_SIZE]; + struct dentry *debugfs; +}; + +struct mlx5_vdpa_counter { + struct mlx5_fc *counter; + struct dentry *dent; + struct mlx5_core_dev *mdev; +}; + +struct macvlan_node { + struct hlist_node hlist; + struct mlx5_flow_handle *ucast_rule; + struct mlx5_flow_handle *mcast_rule; + u64 macvlan; + struct mlx5_vdpa_net *ndev; + bool tagged; +#if defined(CONFIG_MLX5_VDPA_STEERING_DEBUG) + struct dentry *dent; + struct mlx5_vdpa_counter ucast_counter; + struct mlx5_vdpa_counter mcast_counter; +#endif +}; + +void mlx5_vdpa_add_debugfs(struct mlx5_vdpa_net *ndev); +void mlx5_vdpa_remove_debugfs(struct dentry *dbg); +void mlx5_vdpa_add_rx_flow_table(struct mlx5_vdpa_net *ndev); +void mlx5_vdpa_remove_rx_flow_table(struct mlx5_vdpa_net *ndev); +void mlx5_vdpa_add_tirn(struct mlx5_vdpa_net *ndev); +void mlx5_vdpa_remove_tirn(struct mlx5_vdpa_net *ndev); +#if defined(CONFIG_MLX5_VDPA_STEERING_DEBUG) +void mlx5_vdpa_add_rx_counters(struct mlx5_vdpa_net *ndev, + struct macvlan_node *node); +void mlx5_vdpa_remove_rx_counters(struct mlx5_vdpa_net *ndev, + struct macvlan_node *node); +#else +static inline void mlx5_vdpa_add_rx_counters(struct mlx5_vdpa_net *ndev, + struct macvlan_node *node) {} +static inline void mlx5_vdpa_remove_rx_counters(struct mlx5_vdpa_net *ndev, + struct macvlan_node *node) {} +#endif + + +#endif /* __MLX5_VNET_H__ */ diff --git a/drivers/vdpa/solidrun/Makefile b/drivers/vdpa/solidrun/Makefile new file mode 100644 index 000000000000..c0aa3415bf7b --- /dev/null +++ b/drivers/vdpa/solidrun/Makefile @@ -0,0 +1,6 @@ +# SPDX-License-Identifier: GPL-2.0 +obj-$(CONFIG_SNET_VDPA) += snet_vdpa.o +snet_vdpa-$(CONFIG_SNET_VDPA) += 
snet_main.o +ifdef CONFIG_HWMON +snet_vdpa-$(CONFIG_SNET_VDPA) += snet_hwmon.o +endif diff --git a/drivers/vdpa/solidrun/snet_hwmon.c b/drivers/vdpa/solidrun/snet_hwmon.c new file mode 100644 index 000000000000..e695e36ff753 --- /dev/null +++ b/drivers/vdpa/solidrun/snet_hwmon.c @@ -0,0 +1,188 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * SolidRun DPU driver for control plane + * + * Copyright (C) 2022 SolidRun + * + * Author: Alvaro Karsz <alvaro.karsz@solid-run.com> + * + */ +#include <linux/hwmon.h> + +#include "snet_vdpa.h" + +/* Monitor offsets */ +#define SNET_MON_TMP0_IN_OFF 0x00 +#define SNET_MON_TMP0_MAX_OFF 0x08 +#define SNET_MON_TMP0_CRIT_OFF 0x10 +#define SNET_MON_TMP1_IN_OFF 0x18 +#define SNET_MON_TMP1_CRIT_OFF 0x20 +#define SNET_MON_CURR_IN_OFF 0x28 +#define SNET_MON_CURR_MAX_OFF 0x30 +#define SNET_MON_CURR_CRIT_OFF 0x38 +#define SNET_MON_PWR_IN_OFF 0x40 +#define SNET_MON_VOLT_IN_OFF 0x48 +#define SNET_MON_VOLT_CRIT_OFF 0x50 +#define SNET_MON_VOLT_LCRIT_OFF 0x58 + +static void snet_hwmon_read_reg(struct psnet *psnet, u32 reg, long *out) +{ + *out = psnet_read64(psnet, psnet->cfg.hwmon_off + reg); +} + +static umode_t snet_howmon_is_visible(const void *data, + enum hwmon_sensor_types type, + u32 attr, int channel) +{ + return 0444; +} + +static int snet_howmon_read(struct device *dev, enum hwmon_sensor_types type, + u32 attr, int channel, long *val) +{ + struct psnet *psnet = dev_get_drvdata(dev); + int ret = 0; + + switch (type) { + case hwmon_in: + switch (attr) { + case hwmon_in_lcrit: + snet_hwmon_read_reg(psnet, SNET_MON_VOLT_LCRIT_OFF, val); + break; + case hwmon_in_crit: + snet_hwmon_read_reg(psnet, SNET_MON_VOLT_CRIT_OFF, val); + break; + case hwmon_in_input: + snet_hwmon_read_reg(psnet, SNET_MON_VOLT_IN_OFF, val); + break; + default: + ret = -EOPNOTSUPP; + break; + } + break; + + case hwmon_power: + switch (attr) { + case hwmon_power_input: + snet_hwmon_read_reg(psnet, SNET_MON_PWR_IN_OFF, val); + break; + + default: + ret = 
-EOPNOTSUPP; + break; + } + break; + + case hwmon_curr: + switch (attr) { + case hwmon_curr_input: + snet_hwmon_read_reg(psnet, SNET_MON_CURR_IN_OFF, val); + break; + case hwmon_curr_max: + snet_hwmon_read_reg(psnet, SNET_MON_CURR_MAX_OFF, val); + break; + case hwmon_curr_crit: + snet_hwmon_read_reg(psnet, SNET_MON_CURR_CRIT_OFF, val); + break; + default: + ret = -EOPNOTSUPP; + break; + } + break; + + case hwmon_temp: + switch (attr) { + case hwmon_temp_input: + if (channel == 0) + snet_hwmon_read_reg(psnet, SNET_MON_TMP0_IN_OFF, val); + else + snet_hwmon_read_reg(psnet, SNET_MON_TMP1_IN_OFF, val); + break; + case hwmon_temp_max: + if (channel == 0) + snet_hwmon_read_reg(psnet, SNET_MON_TMP0_MAX_OFF, val); + else + ret = -EOPNOTSUPP; + break; + case hwmon_temp_crit: + if (channel == 0) + snet_hwmon_read_reg(psnet, SNET_MON_TMP0_CRIT_OFF, val); + else + snet_hwmon_read_reg(psnet, SNET_MON_TMP1_CRIT_OFF, val); + break; + + default: + ret = -EOPNOTSUPP; + break; + } + break; + + default: + ret = -EOPNOTSUPP; + break; + } + return ret; +} + +static int snet_hwmon_read_string(struct device *dev, + enum hwmon_sensor_types type, u32 attr, + int channel, const char **str) +{ + int ret = 0; + + switch (type) { + case hwmon_in: + *str = "main_vin"; + break; + case hwmon_power: + *str = "soc_pin"; + break; + case hwmon_curr: + *str = "soc_iin"; + break; + case hwmon_temp: + if (channel == 0) + *str = "power_stage_temp"; + else + *str = "ic_junction_temp"; + break; + default: + ret = -EOPNOTSUPP; + break; + } + return ret; +} + +static const struct hwmon_ops snet_hwmon_ops = { + .is_visible = snet_howmon_is_visible, + .read = snet_howmon_read, + .read_string = snet_hwmon_read_string +}; + +static const struct hwmon_channel_info *snet_hwmon_info[] = { + HWMON_CHANNEL_INFO(temp, HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_CRIT | HWMON_T_LABEL, + HWMON_T_INPUT | HWMON_T_CRIT | HWMON_T_LABEL), + HWMON_CHANNEL_INFO(power, HWMON_P_INPUT | HWMON_P_LABEL), + HWMON_CHANNEL_INFO(curr, 
HWMON_C_INPUT | HWMON_C_MAX | HWMON_C_CRIT | HWMON_C_LABEL), + HWMON_CHANNEL_INFO(in, HWMON_I_INPUT | HWMON_I_CRIT | HWMON_I_LCRIT | HWMON_I_LABEL), + NULL +}; + +static const struct hwmon_chip_info snet_hwmono_info = { + .ops = &snet_hwmon_ops, + .info = snet_hwmon_info, +}; + +/* Create an HW monitor device */ +void psnet_create_hwmon(struct pci_dev *pdev) +{ + struct device *hwmon; + struct psnet *psnet = pci_get_drvdata(pdev); + + snprintf(psnet->hwmon_name, SNET_NAME_SIZE, "snet_%s", pci_name(pdev)); + hwmon = devm_hwmon_device_register_with_info(&pdev->dev, psnet->hwmon_name, psnet, + &snet_hwmono_info, NULL); + /* The monitor is not mandatory, Just alert user in case of an error */ + if (IS_ERR(hwmon)) + SNET_WARN(pdev, "Failed to create SNET hwmon, error %ld\n", PTR_ERR(hwmon)); +} diff --git a/drivers/vdpa/solidrun/snet_main.c b/drivers/vdpa/solidrun/snet_main.c new file mode 100644 index 000000000000..68de727398ed --- /dev/null +++ b/drivers/vdpa/solidrun/snet_main.c @@ -0,0 +1,1111 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * SolidRun DPU driver for control plane + * + * Copyright (C) 2022 SolidRun + * + * Author: Alvaro Karsz <alvaro.karsz@solid-run.com> + * + */ +#include <linux/iopoll.h> + +#include "snet_vdpa.h" + +/* SNET DPU device ID */ +#define SNET_DEVICE_ID 0x1000 +/* SNET signature */ +#define SNET_SIGNATURE 0xD0D06363 +/* Max. 
config version that we can work with */ +#define SNET_CFG_VERSION 0x1 +/* Queue align */ +#define SNET_QUEUE_ALIGNMENT PAGE_SIZE +/* Kick value to notify that new data is available */ +#define SNET_KICK_VAL 0x1 +#define SNET_CONFIG_OFF 0x0 +/* ACK timeout for a message */ +#define SNET_ACK_TIMEOUT 2000000 +/* How long we are willing to wait for a SNET device */ +#define SNET_DETECT_TIMEOUT 5000000 +/* How long should we wait for the DPU to read our config */ +#define SNET_READ_CFG_TIMEOUT 3000000 +/* Size of configs written to the DPU */ +#define SNET_GENERAL_CFG_LEN 36 +#define SNET_GENERAL_CFG_VQ_LEN 40 + +enum snet_msg { + SNET_MSG_DESTROY = 1, +}; + +static struct snet *vdpa_to_snet(struct vdpa_device *vdpa) +{ + return container_of(vdpa, struct snet, vdpa); +} + +static int snet_wait_for_msg_ack(struct snet *snet) +{ + struct pci_dev *pdev = snet->pdev; + int ret; + u32 val; + + /* The DPU will clear the messages offset once messages + * are processed. + */ + ret = readx_poll_timeout(ioread32, snet->bar + snet->psnet->cfg.msg_off, + val, !val, 10, SNET_ACK_TIMEOUT); + if (ret) + SNET_WARN(pdev, "Timeout waiting for message ACK\n"); + + return ret; +} + +/* Sends a message to the DPU. + * If blocking is set, the function will return once the + * message was processed by the DPU (or timeout). 
+ */ +static int snet_send_msg(struct snet *snet, u32 msg, bool blocking) +{ + int ret = 0; + + /* Make sure the DPU acked last message before issuing a new one */ + ret = snet_wait_for_msg_ack(snet); + if (ret) + return ret; + + /* Write the message */ + snet_write32(snet, snet->psnet->cfg.msg_off, msg); + + if (blocking) + ret = snet_wait_for_msg_ack(snet); + else /* If non-blocking, flush the write by issuing a read */ + snet_read32(snet, snet->psnet->cfg.msg_off); + + return ret; +} + +static irqreturn_t snet_cfg_irq_hndlr(int irq, void *data) +{ + struct snet *snet = data; + /* Call callback if any */ + if (snet->cb.callback) + return snet->cb.callback(snet->cb.private); + + return IRQ_HANDLED; +} + +static irqreturn_t snet_vq_irq_hndlr(int irq, void *data) +{ + struct snet_vq *vq = data; + /* Call callback if any */ + if (vq->cb.callback) + return vq->cb.callback(vq->cb.private); + + return IRQ_HANDLED; +} + +static void snet_free_irqs(struct snet *snet) +{ + struct psnet *psnet = snet->psnet; + struct pci_dev *pdev; + u32 i; + + /* Which Device allcoated the IRQs? 
*/ + if (PSNET_FLAG_ON(psnet, SNET_CFG_FLAG_IRQ_PF)) + pdev = snet->pdev->physfn; + else + pdev = snet->pdev; + + /* Free config's IRQ */ + if (snet->cfg_irq != -1) { + devm_free_irq(&pdev->dev, snet->cfg_irq, snet); + snet->cfg_irq = -1; + } + /* Free VQ IRQs */ + for (i = 0; i < snet->cfg->vq_num; i++) { + if (snet->vqs[i] && snet->vqs[i]->irq != -1) { + devm_free_irq(&pdev->dev, snet->vqs[i]->irq, snet->vqs[i]); + snet->vqs[i]->irq = -1; + } + } + + /* IRQ vectors are freed when the pci remove callback is called */ +} + +static int snet_set_vq_address(struct vdpa_device *vdev, u16 idx, u64 desc_area, + u64 driver_area, u64 device_area) +{ + struct snet *snet = vdpa_to_snet(vdev); + /* save received parameters in vqueue sturct */ + snet->vqs[idx]->desc_area = desc_area; + snet->vqs[idx]->driver_area = driver_area; + snet->vqs[idx]->device_area = device_area; + + return 0; +} + +static void snet_set_vq_num(struct vdpa_device *vdev, u16 idx, u32 num) +{ + struct snet *snet = vdpa_to_snet(vdev); + /* save num in vqueue */ + snet->vqs[idx]->num = num; +} + +static void snet_kick_vq(struct vdpa_device *vdev, u16 idx) +{ + struct snet *snet = vdpa_to_snet(vdev); + /* not ready - ignore */ + if (!snet->vqs[idx]->ready) + return; + + iowrite32(SNET_KICK_VAL, snet->vqs[idx]->kick_ptr); +} + +static void snet_set_vq_cb(struct vdpa_device *vdev, u16 idx, struct vdpa_callback *cb) +{ + struct snet *snet = vdpa_to_snet(vdev); + + snet->vqs[idx]->cb.callback = cb->callback; + snet->vqs[idx]->cb.private = cb->private; +} + +static void snet_set_vq_ready(struct vdpa_device *vdev, u16 idx, bool ready) +{ + struct snet *snet = vdpa_to_snet(vdev); + + snet->vqs[idx]->ready = ready; +} + +static bool snet_get_vq_ready(struct vdpa_device *vdev, u16 idx) +{ + struct snet *snet = vdpa_to_snet(vdev); + + return snet->vqs[idx]->ready; +} + +static int snet_set_vq_state(struct vdpa_device *vdev, u16 idx, const struct vdpa_vq_state *state) +{ + struct snet *snet = vdpa_to_snet(vdev); + /* 
Setting the VQ state is not supported. + * If the asked state is the same as the initial one + * we can ignore it. + */ + if (SNET_HAS_FEATURE(snet, VIRTIO_F_RING_PACKED)) { + const struct vdpa_vq_state_packed *p = &state->packed; + + if (p->last_avail_counter == 1 && p->last_used_counter == 1 && + p->last_avail_idx == 0 && p->last_used_idx == 0) + return 0; + } else { + const struct vdpa_vq_state_split *s = &state->split; + + if (s->avail_index == 0) + return 0; + } + + return -EOPNOTSUPP; +} + +static int snet_get_vq_state(struct vdpa_device *vdev, u16 idx, struct vdpa_vq_state *state) +{ + /* Not supported */ + return -EOPNOTSUPP; +} + +static int snet_get_vq_irq(struct vdpa_device *vdev, u16 idx) +{ + struct snet *snet = vdpa_to_snet(vdev); + + return snet->vqs[idx]->irq; +} + +static u32 snet_get_vq_align(struct vdpa_device *vdev) +{ + return (u32)SNET_QUEUE_ALIGNMENT; +} + +static int snet_reset_dev(struct snet *snet) +{ + struct pci_dev *pdev = snet->pdev; + int ret = 0; + u32 i; + + /* If status is 0, nothing to do */ + if (!snet->status) + return 0; + + /* If DPU started, send a destroy message */ + if (snet->status & VIRTIO_CONFIG_S_DRIVER_OK) + ret = snet_send_msg(snet, SNET_MSG_DESTROY, true); + + /* Clear VQs */ + for (i = 0; i < snet->cfg->vq_num; i++) { + if (!snet->vqs[i]) + continue; + snet->vqs[i]->cb.callback = NULL; + snet->vqs[i]->cb.private = NULL; + snet->vqs[i]->desc_area = 0; + snet->vqs[i]->device_area = 0; + snet->vqs[i]->driver_area = 0; + snet->vqs[i]->ready = false; + } + + /* Clear config callback */ + snet->cb.callback = NULL; + snet->cb.private = NULL; + /* Free IRQs */ + snet_free_irqs(snet); + /* Reset status */ + snet->status = 0; + snet->dpu_ready = false; + + if (ret) + SNET_WARN(pdev, "Incomplete reset to SNET[%u] device\n", snet->sid); + else + SNET_DBG(pdev, "Reset SNET[%u] device\n", snet->sid); + + return 0; +} + +static int snet_reset(struct vdpa_device *vdev) +{ + struct snet *snet = vdpa_to_snet(vdev); + + return 
snet_reset_dev(snet); +} + +static size_t snet_get_config_size(struct vdpa_device *vdev) +{ + struct snet *snet = vdpa_to_snet(vdev); + + return (size_t)snet->cfg->cfg_size; +} + +static u64 snet_get_features(struct vdpa_device *vdev) +{ + struct snet *snet = vdpa_to_snet(vdev); + + return snet->cfg->features; +} + +static int snet_set_drv_features(struct vdpa_device *vdev, u64 features) +{ + struct snet *snet = vdpa_to_snet(vdev); + + snet->negotiated_features = snet->cfg->features & features; + return 0; +} + +static u64 snet_get_drv_features(struct vdpa_device *vdev) +{ + struct snet *snet = vdpa_to_snet(vdev); + + return snet->negotiated_features; +} + +static u16 snet_get_vq_num_max(struct vdpa_device *vdev) +{ + struct snet *snet = vdpa_to_snet(vdev); + + return (u16)snet->cfg->vq_size; +} + +static void snet_set_config_cb(struct vdpa_device *vdev, struct vdpa_callback *cb) +{ + struct snet *snet = vdpa_to_snet(vdev); + + snet->cb.callback = cb->callback; + snet->cb.private = cb->private; +} + +static u32 snet_get_device_id(struct vdpa_device *vdev) +{ + struct snet *snet = vdpa_to_snet(vdev); + + return snet->cfg->virtio_id; +} + +static u32 snet_get_vendor_id(struct vdpa_device *vdev) +{ + return (u32)PCI_VENDOR_ID_SOLIDRUN; +} + +static u8 snet_get_status(struct vdpa_device *vdev) +{ + struct snet *snet = vdpa_to_snet(vdev); + + return snet->status; +} + +static int snet_write_conf(struct snet *snet) +{ + u32 off, i, tmp; + int ret; + + /* No need to write the config twice */ + if (snet->dpu_ready) + return true; + + /* Snet data : + * + * General data: SNET_GENERAL_CFG_LEN bytes long + * 0 0x4 0x8 0xC 0x10 0x14 0x1C 0x24 + * | MAGIC NUMBER | CFG VER | SNET SID | NUMBER OF QUEUES | IRQ IDX | FEATURES | RSVD | + * + * For every VQ: SNET_GENERAL_CFG_VQ_LEN bytes long + * 0 0x4 0x8 + * | VQ SID AND QUEUE SIZE | IRQ Index | + * | DESC AREA | + * | DEVICE AREA | + * | DRIVER AREA | + * | RESERVED | + * + * Magic number should be written last, this is the DPU 
indication that the data is ready + */ + + /* Init offset */ + off = snet->psnet->cfg.host_cfg_off; + + /* Ignore magic number for now */ + off += 4; + snet_write32(snet, off, snet->psnet->negotiated_cfg_ver); + off += 4; + snet_write32(snet, off, snet->sid); + off += 4; + snet_write32(snet, off, snet->cfg->vq_num); + off += 4; + snet_write32(snet, off, snet->cfg_irq_idx); + off += 4; + snet_write64(snet, off, snet->negotiated_features); + off += 8; + /* Ignore reserved */ + off += 8; + /* Write VQs */ + for (i = 0 ; i < snet->cfg->vq_num ; i++) { + tmp = (i << 16) | (snet->vqs[i]->num & 0xFFFF); + snet_write32(snet, off, tmp); + off += 4; + snet_write32(snet, off, snet->vqs[i]->irq_idx); + off += 4; + snet_write64(snet, off, snet->vqs[i]->desc_area); + off += 8; + snet_write64(snet, off, snet->vqs[i]->device_area); + off += 8; + snet_write64(snet, off, snet->vqs[i]->driver_area); + off += 8; + /* Ignore reserved */ + off += 8; + } + + /* Clear snet messages address for this device */ + snet_write32(snet, snet->psnet->cfg.msg_off, 0); + /* Write magic number - data is ready */ + snet_write32(snet, snet->psnet->cfg.host_cfg_off, SNET_SIGNATURE); + + /* The DPU will ACK the config by clearing the signature */ + ret = readx_poll_timeout(ioread32, snet->bar + snet->psnet->cfg.host_cfg_off, + tmp, !tmp, 10, SNET_READ_CFG_TIMEOUT); + if (ret) { + SNET_ERR(snet->pdev, "Timeout waiting for the DPU to read the config\n"); + return false; + } + + /* set DPU flag */ + snet->dpu_ready = true; + + return true; +} + +static int snet_request_irqs(struct pci_dev *pdev, struct snet *snet) +{ + int ret, i, irq; + + /* Request config IRQ */ + irq = pci_irq_vector(pdev, snet->cfg_irq_idx); + ret = devm_request_irq(&pdev->dev, irq, snet_cfg_irq_hndlr, 0, + snet->cfg_irq_name, snet); + if (ret) { + SNET_ERR(pdev, "Failed to request IRQ\n"); + return ret; + } + snet->cfg_irq = irq; + + /* Request IRQ for every VQ */ + for (i = 0; i < snet->cfg->vq_num; i++) { + irq = pci_irq_vector(pdev, 
snet->vqs[i]->irq_idx);
+		ret = devm_request_irq(&pdev->dev, irq, snet_vq_irq_hndlr, 0,
+				       snet->vqs[i]->irq_name, snet->vqs[i]);
+		if (ret) {
+			SNET_ERR(pdev, "Failed to request IRQ\n");
+			return ret;
+		}
+		snet->vqs[i]->irq = irq;
+	}
+	return 0;
+}
+
+static void snet_set_status(struct vdpa_device *vdev, u8 status)
+{
+	struct snet *snet = vdpa_to_snet(vdev);
+	struct psnet *psnet = snet->psnet;
+	struct pci_dev *pdev = snet->pdev;
+	int ret;
+	bool pf_irqs;
+
+	if (status == snet->status)
+		return;
+
+	if ((status & VIRTIO_CONFIG_S_DRIVER_OK) &&
+	    !(snet->status & VIRTIO_CONFIG_S_DRIVER_OK)) {
+		/* Request IRQs */
+		pf_irqs = PSNET_FLAG_ON(psnet, SNET_CFG_FLAG_IRQ_PF);
+		ret = snet_request_irqs(pf_irqs ? pdev->physfn : pdev, snet);
+		if (ret)
+			goto set_err;
+
+		/* Write config to the DPU */
+		if (snet_write_conf(snet)) {
+			SNET_INFO(pdev, "Create SNET[%u] device\n", snet->sid);
+		} else {
+			snet_free_irqs(snet);
+			goto set_err;
+		}
+	}
+
+	/* Save the new status */
+	snet->status = status;
+	return;
+
+set_err:
+	snet->status |= VIRTIO_CONFIG_S_FAILED;
+}
+
+static void snet_get_config(struct vdpa_device *vdev, unsigned int offset,
+			    void *buf, unsigned int len)
+{
+	struct snet *snet = vdpa_to_snet(vdev);
+	void __iomem *cfg_ptr = snet->cfg->virtio_cfg + offset;
+	u8 *buf_ptr = buf;
+	u32 i;
+
+	/* check for offset error */
+	if (offset + len > snet->cfg->cfg_size)
+		return;
+
+	/* Write into buffer */
+	for (i = 0; i < len; i++)
+		*buf_ptr++ = ioread8(cfg_ptr + i);
+}
+
+static void snet_set_config(struct vdpa_device *vdev, unsigned int offset,
+			    const void *buf, unsigned int len)
+{
+	struct snet *snet = vdpa_to_snet(vdev);
+	void __iomem *cfg_ptr = snet->cfg->virtio_cfg + offset;
+	const u8 *buf_ptr = buf;
+	u32 i;
+
+	/* check for offset error */
+	if (offset + len > snet->cfg->cfg_size)
+		return;
+
+	/* Write into PCI BAR */
+	for (i = 0; i < len; i++)
+		iowrite8(*buf_ptr++, cfg_ptr + i);
+}
+
+static const struct vdpa_config_ops snet_config_ops = {
+	.set_vq_address = snet_set_vq_address,
+	.set_vq_num = snet_set_vq_num,
+	.kick_vq = snet_kick_vq,
+	.set_vq_cb = snet_set_vq_cb,
+	.set_vq_ready = snet_set_vq_ready,
+	.get_vq_ready = snet_get_vq_ready,
+	.set_vq_state = snet_set_vq_state,
+	.get_vq_state = snet_get_vq_state,
+	.get_vq_irq = snet_get_vq_irq,
+	.get_vq_align = snet_get_vq_align,
+	.reset = snet_reset,
+	.get_config_size = snet_get_config_size,
+	.get_device_features = snet_get_features,
+	.set_driver_features = snet_set_drv_features,
+	.get_driver_features = snet_get_drv_features,
+	.get_vq_num_min = snet_get_vq_num_max,
+	.get_vq_num_max = snet_get_vq_num_max,
+	.set_config_cb = snet_set_config_cb,
+	.get_device_id = snet_get_device_id,
+	.get_vendor_id = snet_get_vendor_id,
+	.get_status = snet_get_status,
+	.set_status = snet_set_status,
+	.get_config = snet_get_config,
+	.set_config = snet_set_config,
+};
+
+static int psnet_open_pf_bar(struct pci_dev *pdev, struct psnet *psnet)
+{
+	char name[50];
+	int ret, i, mask = 0;
+	/* We don't know which BAR will be used to communicate..
+	 * We will map every bar with len > 0.
+	 *
+	 * Later, we will discover the BAR and unmap all other BARs.
+	 */
+	for (i = 0; i < PCI_STD_NUM_BARS; i++) {
+		if (pci_resource_len(pdev, i))
+			mask |= (1 << i);
+	}
+
+	/* No BAR can be used.. */
+	if (!mask) {
+		SNET_ERR(pdev, "Failed to find a PCI BAR\n");
+		return -ENODEV;
+	}
+
+	snprintf(name, sizeof(name), "psnet[%s]-bars", pci_name(pdev));
+	ret = pcim_iomap_regions(pdev, mask, name);
+	if (ret) {
+		SNET_ERR(pdev, "Failed to request and map PCI BARs\n");
+		return ret;
+	}
+
+	for (i = 0; i < PCI_STD_NUM_BARS; i++) {
+		if (mask & (1 << i))
+			psnet->bars[i] = pcim_iomap_table(pdev)[i];
+	}
+
+	return 0;
+}
+
+static int snet_open_vf_bar(struct pci_dev *pdev, struct snet *snet)
+{
+	char name[50];
+	int ret;
+
+	snprintf(name, sizeof(name), "snet[%s]-bar", pci_name(pdev));
+	/* Request and map BAR */
+	ret = pcim_iomap_regions(pdev, BIT(snet->psnet->cfg.vf_bar), name);
+	if (ret) {
+		SNET_ERR(pdev, "Failed to request and map PCI BAR for a VF\n");
+		return ret;
+	}
+
+	snet->bar = pcim_iomap_table(pdev)[snet->psnet->cfg.vf_bar];
+
+	return 0;
+}
+
+static void snet_free_cfg(struct snet_cfg *cfg)
+{
+	u32 i;
+
+	if (!cfg->devs)
+		return;
+
+	/* Free devices */
+	for (i = 0; i < cfg->devices_num; i++) {
+		if (!cfg->devs[i])
+			break;
+
+		kfree(cfg->devs[i]);
+	}
+	/* Free pointers to devices */
+	kfree(cfg->devs);
+}
+
+/* Detect which BAR is used for communication with the device. */
+static int psnet_detect_bar(struct psnet *psnet, u32 off)
+{
+	unsigned long exit_time;
+	int i;
+
+	exit_time = jiffies + usecs_to_jiffies(SNET_DETECT_TIMEOUT);
+
+	/* SNET DPU will write SNET's signature when the config is ready. */
+	while (time_before(jiffies, exit_time)) {
+		for (i = 0; i < PCI_STD_NUM_BARS; i++) {
+			/* Is this BAR mapped? */
+			if (!psnet->bars[i])
+				continue;
+
+			if (ioread32(psnet->bars[i] + off) == SNET_SIGNATURE)
+				return i;
+		}
+		usleep_range(1000, 10000);
+	}
+
+	return -ENODEV;
+}
+
+static void psnet_unmap_unused_bars(struct pci_dev *pdev, struct psnet *psnet)
+{
+	int i, mask = 0;
+
+	for (i = 0; i < PCI_STD_NUM_BARS; i++) {
+		if (psnet->bars[i] && i != psnet->barno)
+			mask |= (1 << i);
+	}
+
+	if (mask)
+		pcim_iounmap_regions(pdev, mask);
+}
+
+/* Read SNET config from PCI BAR */
+static int psnet_read_cfg(struct pci_dev *pdev, struct psnet *psnet)
+{
+	struct snet_cfg *cfg = &psnet->cfg;
+	u32 i, off;
+	int barno;
+
+	/* Move to where the config starts */
+	off = SNET_CONFIG_OFF;
+
+	/* Find BAR used for communication */
+	barno = psnet_detect_bar(psnet, off);
+	if (barno < 0) {
+		SNET_ERR(pdev, "SNET config is not ready.\n");
+		return barno;
+	}
+
+	/* Save used BAR number and unmap all other BARs */
+	psnet->barno = barno;
+	SNET_DBG(pdev, "Using BAR number %d\n", barno);
+
+	psnet_unmap_unused_bars(pdev, psnet);
+
+	/* load config from BAR */
+	cfg->key = psnet_read32(psnet, off);
+	off += 4;
+	cfg->cfg_size = psnet_read32(psnet, off);
+	off += 4;
+	cfg->cfg_ver = psnet_read32(psnet, off);
+	off += 4;
+	/* The negotiated config version is the lower one between this driver's config
+	 * and the DPU's.
+	 */
+	psnet->negotiated_cfg_ver = min_t(u32, cfg->cfg_ver, SNET_CFG_VERSION);
+	SNET_DBG(pdev, "SNET config version %u\n", psnet->negotiated_cfg_ver);
+
+	cfg->vf_num = psnet_read32(psnet, off);
+	off += 4;
+	cfg->vf_bar = psnet_read32(psnet, off);
+	off += 4;
+	cfg->host_cfg_off = psnet_read32(psnet, off);
+	off += 4;
+	cfg->max_size_host_cfg = psnet_read32(psnet, off);
+	off += 4;
+	cfg->virtio_cfg_off = psnet_read32(psnet, off);
+	off += 4;
+	cfg->kick_off = psnet_read32(psnet, off);
+	off += 4;
+	cfg->hwmon_off = psnet_read32(psnet, off);
+	off += 4;
+	cfg->msg_off = psnet_read32(psnet, off);
+	off += 4;
+	cfg->flags = psnet_read32(psnet, off);
+	off += 4;
+	/* Ignore Reserved */
+	off += sizeof(cfg->rsvd);
+
+	cfg->devices_num = psnet_read32(psnet, off);
+	off += 4;
+	/* Allocate memory to hold pointer to the devices */
+	cfg->devs = kcalloc(cfg->devices_num, sizeof(void *), GFP_KERNEL);
+	if (!cfg->devs)
+		return -ENOMEM;
+
+	/* Load device configuration from BAR */
+	for (i = 0; i < cfg->devices_num; i++) {
+		cfg->devs[i] = kzalloc(sizeof(*cfg->devs[i]), GFP_KERNEL);
+		if (!cfg->devs[i]) {
+			snet_free_cfg(cfg);
+			return -ENOMEM;
+		}
+		/* Read device config */
+		cfg->devs[i]->virtio_id = psnet_read32(psnet, off);
+		off += 4;
+		cfg->devs[i]->vq_num = psnet_read32(psnet, off);
+		off += 4;
+		cfg->devs[i]->vq_size = psnet_read32(psnet, off);
+		off += 4;
+		cfg->devs[i]->vfid = psnet_read32(psnet, off);
+		off += 4;
+		cfg->devs[i]->features = psnet_read64(psnet, off);
+		off += 8;
+		/* Ignore Reserved */
+		off += sizeof(cfg->devs[i]->rsvd);
+
+		cfg->devs[i]->cfg_size = psnet_read32(psnet, off);
+		off += 4;
+
+		/* Is the config witten to the DPU going to be too big? */
+		if (SNET_GENERAL_CFG_LEN + SNET_GENERAL_CFG_VQ_LEN * cfg->devs[i]->vq_num >
+		    cfg->max_size_host_cfg) {
+			SNET_ERR(pdev, "Failed to read SNET config, the config is too big..\n");
+			snet_free_cfg(cfg);
+			return -EINVAL;
+		}
+	}
+	return 0;
+}
+
+static int psnet_alloc_irq_vector(struct pci_dev *pdev, struct psnet *psnet)
+{
+	int ret = 0;
+	u32 i, irq_num = 0;
+
+	/* Let's count how many IRQs we need, 1 for every VQ + 1 for config change */
+	for (i = 0; i < psnet->cfg.devices_num; i++)
+		irq_num += psnet->cfg.devs[i]->vq_num + 1;
+
+	ret = pci_alloc_irq_vectors(pdev, irq_num, irq_num, PCI_IRQ_MSIX);
+	if (ret != irq_num) {
+		SNET_ERR(pdev, "Failed to allocate IRQ vectors\n");
+		return ret;
+	}
+	SNET_DBG(pdev, "Allocated %u IRQ vectors from physical function\n", irq_num);
+
+	return 0;
+}
+
+static int snet_alloc_irq_vector(struct pci_dev *pdev, struct snet_dev_cfg *snet_cfg)
+{
+	int ret = 0;
+	u32 irq_num;
+
+	/* We want 1 IRQ for every VQ + 1 for config change events */
+	irq_num = snet_cfg->vq_num + 1;
+
+	ret = pci_alloc_irq_vectors(pdev, irq_num, irq_num, PCI_IRQ_MSIX);
+	if (ret <= 0) {
+		SNET_ERR(pdev, "Failed to allocate IRQ vectors\n");
+		return ret;
+	}
+
+	return 0;
+}
+
+static void snet_free_vqs(struct snet *snet)
+{
+	u32 i;
+
+	if (!snet->vqs)
+		return;
+
+	for (i = 0 ; i < snet->cfg->vq_num ; i++) {
+		if (!snet->vqs[i])
+			break;
+
+		kfree(snet->vqs[i]);
+	}
+	kfree(snet->vqs);
+}
+
+static int snet_build_vqs(struct snet *snet)
+{
+	u32 i;
+	/* Allocate the VQ pointers array */
+	snet->vqs = kcalloc(snet->cfg->vq_num, sizeof(void *), GFP_KERNEL);
+	if (!snet->vqs)
+		return -ENOMEM;
+
+	/* Allocate the VQs */
+	for (i = 0; i < snet->cfg->vq_num; i++) {
+		snet->vqs[i] = kzalloc(sizeof(*snet->vqs[i]), GFP_KERNEL);
+		if (!snet->vqs[i]) {
+			snet_free_vqs(snet);
+			return -ENOMEM;
+		}
+		/* Reset IRQ num */
+		snet->vqs[i]->irq = -1;
+		/* VQ serial ID */
+		snet->vqs[i]->sid = i;
+		/* Kick address - every VQ gets 4B */
+		snet->vqs[i]->kick_ptr = snet->bar + snet->psnet->cfg.kick_off +
+					 snet->vqs[i]->sid * 4;
+		/* Clear kick address for this VQ */
+		iowrite32(0, snet->vqs[i]->kick_ptr);
+	}
+	return 0;
+}
+
+static int psnet_get_next_irq_num(struct psnet *psnet)
+{
+	int irq;
+
+	spin_lock(&psnet->lock);
+	irq = psnet->next_irq++;
+	spin_unlock(&psnet->lock);
+
+	return irq;
+}
+
+static void snet_reserve_irq_idx(struct pci_dev *pdev, struct snet *snet)
+{
+	struct psnet *psnet = snet->psnet;
+	int i;
+
+	/* one IRQ for every VQ, and one for config changes */
+	snet->cfg_irq_idx = psnet_get_next_irq_num(psnet);
+	snprintf(snet->cfg_irq_name, SNET_NAME_SIZE, "snet[%s]-cfg[%d]",
+		 pci_name(pdev), snet->cfg_irq_idx);
+
+	for (i = 0; i < snet->cfg->vq_num; i++) {
+		/* Get next free IRQ ID */
+		snet->vqs[i]->irq_idx = psnet_get_next_irq_num(psnet);
+		/* Write IRQ name */
+		snprintf(snet->vqs[i]->irq_name, SNET_NAME_SIZE, "snet[%s]-vq[%d]",
+			 pci_name(pdev), snet->vqs[i]->irq_idx);
+	}
+}
+
+/* Find a device config based on virtual function id */
+static struct snet_dev_cfg *snet_find_dev_cfg(struct snet_cfg *cfg, u32 vfid)
+{
+	u32 i;
+
+	for (i = 0; i < cfg->devices_num; i++) {
+		if (cfg->devs[i]->vfid == vfid)
+			return cfg->devs[i];
+	}
+	/* Oppss.. no config found.. */
+	return NULL;
+}
+
+/* Probe function for a physical PCI function */
+static int snet_vdpa_probe_pf(struct pci_dev *pdev)
+{
+	struct psnet *psnet;
+	int ret = 0;
+	bool pf_irqs = false;
+
+	ret = pcim_enable_device(pdev);
+	if (ret) {
+		SNET_ERR(pdev, "Failed to enable PCI device\n");
+		return ret;
+	}
+
+	/* Allocate a PCI physical function device */
+	psnet = kzalloc(sizeof(*psnet), GFP_KERNEL);
+	if (!psnet)
+		return -ENOMEM;
+
+	/* Init PSNET spinlock */
+	spin_lock_init(&psnet->lock);
+
+	pci_set_master(pdev);
+	pci_set_drvdata(pdev, psnet);
+
+	/* Open SNET MAIN BAR */
+	ret = psnet_open_pf_bar(pdev, psnet);
+	if (ret)
+		goto free_psnet;
+
+	/* Try to read SNET's config from PCI BAR */
+	ret = psnet_read_cfg(pdev, psnet);
+	if (ret)
+		goto free_psnet;
+
+	/* If SNET_CFG_FLAG_IRQ_PF flag is set, we should use
+	 * PF MSI-X vectors
+	 */
+	pf_irqs = PSNET_FLAG_ON(psnet, SNET_CFG_FLAG_IRQ_PF);
+
+	if (pf_irqs) {
+		ret = psnet_alloc_irq_vector(pdev, psnet);
+		if (ret)
+			goto free_cfg;
+	}
+
+	SNET_DBG(pdev, "Enable %u virtual functions\n", psnet->cfg.vf_num);
+	ret = pci_enable_sriov(pdev, psnet->cfg.vf_num);
+	if (ret) {
+		SNET_ERR(pdev, "Failed to enable SR-IOV\n");
+		goto free_irq;
+	}
+
+	/* Create HW monitor device */
+	if (PSNET_FLAG_ON(psnet, SNET_CFG_FLAG_HWMON)) {
+#if IS_ENABLED(CONFIG_HWMON)
+		psnet_create_hwmon(pdev);
+#else
+		SNET_WARN(pdev, "Can't start HWMON, CONFIG_HWMON is not enabled\n");
+#endif
+	}
+
+	return 0;
+
+free_irq:
+	if (pf_irqs)
+		pci_free_irq_vectors(pdev);
+free_cfg:
+	snet_free_cfg(&psnet->cfg);
+free_psnet:
+	kfree(psnet);
+	return ret;
+}
+
+/* Probe function for a virtual PCI function */
+static int snet_vdpa_probe_vf(struct pci_dev *pdev)
+{
+	struct pci_dev *pdev_pf = pdev->physfn;
+	struct psnet *psnet = pci_get_drvdata(pdev_pf);
+	struct snet_dev_cfg *dev_cfg;
+	struct snet *snet;
+	u32 vfid;
+	int ret;
+	bool pf_irqs = false;
+
+	/* Get virtual function id.
+	 * (the DPU counts the VFs from 1)
+	 */
+	ret = pci_iov_vf_id(pdev);
+	if (ret < 0) {
+		SNET_ERR(pdev, "Failed to find a VF id\n");
+		return ret;
+	}
+	vfid = ret + 1;
+
+	/* Find the snet_dev_cfg based on vfid */
+	dev_cfg = snet_find_dev_cfg(&psnet->cfg, vfid);
+	if (!dev_cfg) {
+		SNET_WARN(pdev, "Failed to find a VF config..\n");
+		return -ENODEV;
+	}
+
+	/* Which PCI device should allocate the IRQs?
+	 * If the SNET_CFG_FLAG_IRQ_PF flag set, the PF device allocates the IRQs
+	 */
+	pf_irqs = PSNET_FLAG_ON(psnet, SNET_CFG_FLAG_IRQ_PF);
+
+	ret = pcim_enable_device(pdev);
+	if (ret) {
+		SNET_ERR(pdev, "Failed to enable PCI VF device\n");
+		return ret;
+	}
+
+	/* Request for MSI-X IRQs */
+	if (!pf_irqs) {
+		ret = snet_alloc_irq_vector(pdev, dev_cfg);
+		if (ret)
+			return ret;
+	}
+
+	/* Allocate vdpa device */
+	snet = vdpa_alloc_device(struct snet, vdpa, &pdev->dev, &snet_config_ops, 1, 1, NULL,
+				 false);
+	if (!snet) {
+		SNET_ERR(pdev, "Failed to allocate a vdpa device\n");
+		ret = -ENOMEM;
+		goto free_irqs;
+	}
+
+	/* Save pci device pointer */
+	snet->pdev = pdev;
+	snet->psnet = psnet;
+	snet->cfg = dev_cfg;
+	snet->dpu_ready = false;
+	snet->sid = vfid;
+	/* Reset IRQ value */
+	snet->cfg_irq = -1;
+
+	ret = snet_open_vf_bar(pdev, snet);
+	if (ret)
+		goto put_device;
+
+	/* Create a VirtIO config pointer */
+	snet->cfg->virtio_cfg = snet->bar + snet->psnet->cfg.virtio_cfg_off;
+
+	pci_set_master(pdev);
+	pci_set_drvdata(pdev, snet);
+
+	ret = snet_build_vqs(snet);
+	if (ret)
+		goto put_device;
+
+	/* Reserve IRQ indexes,
+	 * The IRQs may be requested and freed multiple times,
+	 * but the indexes won't change.
+	 */
+	snet_reserve_irq_idx(pf_irqs ? pdev_pf : pdev, snet);
+
+	/*set DMA device*/
+	snet->vdpa.dma_dev = &pdev->dev;
+
+	/* Register VDPA device */
+	ret = vdpa_register_device(&snet->vdpa, snet->cfg->vq_num);
+	if (ret) {
+		SNET_ERR(pdev, "Failed to register vdpa device\n");
+		goto free_vqs;
+	}
+
+	return 0;
+
+free_vqs:
+	snet_free_vqs(snet);
+put_device:
+	put_device(&snet->vdpa.dev);
+free_irqs:
+	if (!pf_irqs)
+		pci_free_irq_vectors(pdev);
+	return ret;
+}
+
+static int snet_vdpa_probe(struct pci_dev *pdev, const struct pci_device_id *id)
+{
+	if (pdev->is_virtfn)
+		return snet_vdpa_probe_vf(pdev);
+	else
+		return snet_vdpa_probe_pf(pdev);
+}
+
+static void snet_vdpa_remove_pf(struct pci_dev *pdev)
+{
+	struct psnet *psnet = pci_get_drvdata(pdev);
+
+	pci_disable_sriov(pdev);
+	/* If IRQs are allocated from the PF, we should free the IRQs */
+	if (PSNET_FLAG_ON(psnet, SNET_CFG_FLAG_IRQ_PF))
+		pci_free_irq_vectors(pdev);
+
+	snet_free_cfg(&psnet->cfg);
+	kfree(psnet);
+}
+
+static void snet_vdpa_remove_vf(struct pci_dev *pdev)
+{
+	struct snet *snet = pci_get_drvdata(pdev);
+	struct psnet *psnet = snet->psnet;
+
+	vdpa_unregister_device(&snet->vdpa);
+	snet_free_vqs(snet);
+	/* If IRQs are allocated from the VF, we should free the IRQs */
+	if (!PSNET_FLAG_ON(psnet, SNET_CFG_FLAG_IRQ_PF))
+		pci_free_irq_vectors(pdev);
+}
+
+static void snet_vdpa_remove(struct pci_dev *pdev)
+{
+	if (pdev->is_virtfn)
+		snet_vdpa_remove_vf(pdev);
+	else
+		snet_vdpa_remove_pf(pdev);
+}
+
+static struct pci_device_id snet_driver_pci_ids[] = {
+	{ PCI_DEVICE_SUB(PCI_VENDOR_ID_SOLIDRUN, SNET_DEVICE_ID,
+			 PCI_VENDOR_ID_SOLIDRUN, SNET_DEVICE_ID) },
+	{ 0 },
+};
+
+MODULE_DEVICE_TABLE(pci, snet_driver_pci_ids);
+
+static struct pci_driver snet_vdpa_driver = {
+	.name = "snet-vdpa-driver",
+	.id_table = snet_driver_pci_ids,
+	.probe = snet_vdpa_probe,
+	.remove = snet_vdpa_remove,
+};
+
+module_pci_driver(snet_vdpa_driver);
+
+MODULE_AUTHOR("Alvaro Karsz <alvaro.karsz@solid-run.com>");
+MODULE_DESCRIPTION("SolidRun vDPA driver");
+MODULE_LICENSE("GPL v2");
diff --git a/drivers/vdpa/solidrun/snet_vdpa.h b/drivers/vdpa/solidrun/snet_vdpa.h
new file mode 100644
index 000000000000..b7f34169053f
--- /dev/null
+++ b/drivers/vdpa/solidrun/snet_vdpa.h
@@ -0,0 +1,194 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * SolidRun DPU driver for control plane
+ *
+ * Copyright (C) 2022 SolidRun
+ *
+ * Author: Alvaro Karsz <alvaro.karsz@solid-run.com>
+ *
+ */
+#ifndef _SNET_VDPA_H_
+#define _SNET_VDPA_H_
+
+#include <linux/vdpa.h>
+#include <linux/pci.h>
+
+#define SNET_NAME_SIZE 256
+
+#define SNET_ERR(pdev, fmt, ...) dev_err(&(pdev)->dev, "%s"fmt, "snet_vdpa: ", ##__VA_ARGS__)
+#define SNET_WARN(pdev, fmt, ...) dev_warn(&(pdev)->dev, "%s"fmt, "snet_vdpa: ", ##__VA_ARGS__)
+#define SNET_INFO(pdev, fmt, ...) dev_info(&(pdev)->dev, "%s"fmt, "snet_vdpa: ", ##__VA_ARGS__)
+#define SNET_DBG(pdev, fmt, ...) dev_dbg(&(pdev)->dev, "%s"fmt, "snet_vdpa: ", ##__VA_ARGS__)
+#define SNET_HAS_FEATURE(s, f) ((s)->negotiated_features & BIT_ULL(f))
+/* VQ struct */
+struct snet_vq {
+	/* VQ callback */
+	struct vdpa_callback cb;
+	/* desc base address */
+	u64 desc_area;
+	/* device base address */
+	u64 device_area;
+	/* driver base address */
+	u64 driver_area;
+	/* Queue size */
+	u32 num;
+	/* Serial ID for VQ */
+	u32 sid;
+	/* is ready flag */
+	bool ready;
+	/* IRQ number */
+	u32 irq;
+	/* IRQ index, DPU uses this to parse data from MSI-X table */
+	u32 irq_idx;
+	/* IRQ name */
+	char irq_name[SNET_NAME_SIZE];
+	/* pointer to mapped PCI BAR register used by this VQ to kick */
+	void __iomem *kick_ptr;
+};
+
+struct snet {
+	/* vdpa device */
+	struct vdpa_device vdpa;
+	/* Config callback */
+	struct vdpa_callback cb;
+	/* array of virqueues */
+	struct snet_vq **vqs;
+	/* Used features */
+	u64 negotiated_features;
+	/* Device serial ID */
+	u32 sid;
+	/* device status */
+	u8 status;
+	/* boolean indicating if snet config was passed to the device */
+	bool dpu_ready;
+	/* IRQ number */
+	u32 cfg_irq;
+	/* IRQ index, DPU uses this to parse data from MSI-X table */
+	u32 cfg_irq_idx;
+	/* IRQ name */
+	char cfg_irq_name[SNET_NAME_SIZE];
+	/* BAR to access the VF */
+	void __iomem *bar;
+	/* PCI device */
+	struct pci_dev *pdev;
+	/* Pointer to snet pdev parent device */
+	struct psnet *psnet;
+	/* Pointer to snet config device */
+	struct snet_dev_cfg *cfg;
+};
+
+struct snet_dev_cfg {
+	/* Device ID following VirtIO spec. */
+	u32 virtio_id;
+	/* Number of VQs for this device */
+	u32 vq_num;
+	/* Size of every VQ */
+	u32 vq_size;
+	/* Virtual Function id */
+	u32 vfid;
+	/* Device features, following VirtIO spec */
+	u64 features;
+	/* Reserved for future usage */
+	u32 rsvd[6];
+	/* VirtIO device specific config size */
+	u32 cfg_size;
+	/* VirtIO device specific config address */
+	void __iomem *virtio_cfg;
+} __packed;
+
+struct snet_cfg {
+	/* Magic key */
+	u32 key;
+	/* Size of total config in bytes */
+	u32 cfg_size;
+	/* Config version */
+	u32 cfg_ver;
+	/* Number of Virtual Functions to create */
+	u32 vf_num;
+	/* BAR to use for the VFs */
+	u32 vf_bar;
+	/* Where should we write the SNET's config */
+	u32 host_cfg_off;
+	/* Max. allowed size for a SNET's config */
+	u32 max_size_host_cfg;
+	/* VirtIO config offset in BAR */
+	u32 virtio_cfg_off;
+	/* Offset in PCI BAR for VQ kicks */
+	u32 kick_off;
+	/* Offset in PCI BAR for HW monitoring */
+	u32 hwmon_off;
+	/* Offset in PCI BAR for SNET messages */
+	u32 msg_off;
+	/* Config general flags - enum snet_cfg_flags */
+	u32 flags;
+	/* Reserved for future usage */
+	u32 rsvd[6];
+	/* Number of snet devices */
+	u32 devices_num;
+	/* The actual devices */
+	struct snet_dev_cfg **devs;
+} __packed;
+
+/* SolidNET PCIe device, one device per PCIe physical function */
+struct psnet {
+	/* PCI BARs */
+	void __iomem *bars[PCI_STD_NUM_BARS];
+	/* Negotiated config version */
+	u32 negotiated_cfg_ver;
+	/* Next IRQ index to use in case when the IRQs are allocated from this device */
+	u32 next_irq;
+	/* BAR number used to communicate with the device */
+	u8 barno;
+	/* spinlock to protect data that can be changed by SNET devices */
+	spinlock_t lock;
+	/* Pointer to the device's config read from BAR */
+	struct snet_cfg cfg;
+	/* Name of monitor device */
+	char hwmon_name[SNET_NAME_SIZE];
+};
+
+enum snet_cfg_flags {
+	/* Create a HWMON device */
+	SNET_CFG_FLAG_HWMON = BIT(0),
+	/* USE IRQs from the physical function */
+	SNET_CFG_FLAG_IRQ_PF = BIT(1),
+};
+
+#define PSNET_FLAG_ON(p, f) ((p)->cfg.flags & (f))
+
+static inline u32 psnet_read32(struct psnet *psnet, u32 off)
+{
+	return ioread32(psnet->bars[psnet->barno] + off);
+}
+
+static inline u32 snet_read32(struct snet *snet, u32 off)
+{
+	return ioread32(snet->bar + off);
+}
+
+static inline void snet_write32(struct snet *snet, u32 off, u32 val)
+{
+	iowrite32(val, snet->bar + off);
+}
+
+static inline u64 psnet_read64(struct psnet *psnet, u32 off)
+{
+	u64 val;
+	/* 64bits are written in 2 halves, low part first */
+	val = (u64)psnet_read32(psnet, off);
+	val |= ((u64)psnet_read32(psnet, off + 4) << 32);
+	return val;
+}
+
+static inline void snet_write64(struct snet *snet, u32 off, u64 val)
+{
+	/* The DPU expects a 64bit integer in 2 halves, the low part first */
+	snet_write32(snet, off, (u32)val);
+	snet_write32(snet, off + 4, (u32)(val >> 32));
+}
+
+#if IS_ENABLED(CONFIG_HWMON)
+void psnet_create_hwmon(struct pci_dev *pdev);
+#endif
+
+#endif //_SNET_VDPA_H_
diff --git a/drivers/vdpa/vdpa.c b/drivers/vdpa/vdpa.c
index 8ef7aa1365cc..965e32529eb8 100644
--- a/drivers/vdpa/vdpa.c
+++ b/drivers/vdpa/vdpa.c
@@ -39,6 +39,11 @@ static int vdpa_dev_probe(struct device *d)
 	u32 max_num, min_num = 1;
 	int ret = 0;
+	d->dma_mask = &d->coherent_dma_mask;
+	ret = dma_set_mask_and_coherent(d, DMA_BIT_MASK(64));
+	if (ret)
+		return ret;
+
 	max_num = ops->get_vq_num_max(vdev);
 	if (ops->get_vq_num_min)
 		min_num = ops->get_vq_num_min(vdev);
@@ -460,12 +465,28 @@ static int vdpa_nl_mgmtdev_handle_fill(struct sk_buff *msg, const struct vdpa_mg
 	return 0;
 }
+static u64 vdpa_mgmtdev_get_classes(const struct vdpa_mgmt_dev *mdev,
+				    unsigned int *nclasses)
+{
+	u64 supported_classes = 0;
+	unsigned int n = 0;
+
+	for (int i = 0; mdev->id_table[i].device; i++) {
+		if (mdev->id_table[i].device > 63)
+			continue;
+		supported_classes |= BIT_ULL(mdev->id_table[i].device);
+		n++;
+	}
+	if (nclasses)
+		*nclasses = n;
+
+	return supported_classes;
+}
+
 static int vdpa_mgmtdev_fill(const struct vdpa_mgmt_dev *mdev, struct sk_buff *msg,
 			     u32 portid, u32 seq, int flags)
 {
-	u64 supported_classes = 0;
 	void *hdr;
-	int i = 0;
 	int err;
 	hdr = genlmsg_put(msg, portid, seq, &vdpa_nl_family, flags, VDPA_CMD_MGMTDEV_NEW);
@@ -475,14 +496,9 @@ static int vdpa_mgmtdev_fill(const struct vdpa_mgmt_dev *mdev, struct sk_buff *m
 	if (err)
 		goto msg_err;
-	while (mdev->id_table[i].device) {
-		if (mdev->id_table[i].device <= 63)
-			supported_classes |= BIT_ULL(mdev->id_table[i].device);
-		i++;
-	}
-
 	if (nla_put_u64_64bit(msg, VDPA_ATTR_MGMTDEV_SUPPORTED_CLASSES,
-			      supported_classes, VDPA_ATTR_UNSPEC)) {
+			      vdpa_mgmtdev_get_classes(mdev, NULL),
+			      VDPA_ATTR_UNSPEC)) {
 		err = -EMSGSIZE;
 		goto msg_err;
 	}
@@ -566,13 +582,25 @@ out:
 				 BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MTU) | \
 				 BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MAX_VQP))
+/*
+ * Bitmask for all per-device features: feature bits VIRTIO_TRANSPORT_F_START
+ * through VIRTIO_TRANSPORT_F_END are unset, i.e. 0xfffffc000fffffff for
+ * all 64bit features. If the features are extended beyond 64 bits, or new
+ * "holes" are reserved for other type of features than per-device, this
+ * macro would have to be updated.
+ */
+#define VIRTIO_DEVICE_F_MASK (~0ULL << (VIRTIO_TRANSPORT_F_END + 1) | \
+			      ((1ULL << VIRTIO_TRANSPORT_F_START) - 1))
+
 static int vdpa_nl_cmd_dev_add_set_doit(struct sk_buff *skb, struct genl_info *info)
 {
 	struct vdpa_dev_set_config config = {};
 	struct nlattr **nl_attrs = info->attrs;
 	struct vdpa_mgmt_dev *mdev;
+	unsigned int ncls = 0;
 	const u8 *macaddr;
 	const char *name;
+	u64 classes;
 	int err = 0;
 	if (!info->attrs[VDPA_ATTR_DEV_NAME])
@@ -601,8 +629,26 @@ static int vdpa_nl_cmd_dev_add_set_doit(struct sk_buff *skb, struct genl_info *i
 		config.mask |= BIT_ULL(VDPA_ATTR_DEV_NET_CFG_MAX_VQP);
 	}
 	if (nl_attrs[VDPA_ATTR_DEV_FEATURES]) {
+		u64 missing = 0x0ULL;
+
 		config.device_features =
 			nla_get_u64(nl_attrs[VDPA_ATTR_DEV_FEATURES]);
+		if (nl_attrs[VDPA_ATTR_DEV_NET_CFG_MACADDR] &&
+		    !(config.device_features & BIT_ULL(VIRTIO_NET_F_MAC)))
+			missing |= BIT_ULL(VIRTIO_NET_F_MAC);
+		if (nl_attrs[VDPA_ATTR_DEV_NET_CFG_MTU] &&
+		    !(config.device_features & BIT_ULL(VIRTIO_NET_F_MTU)))
+			missing |= BIT_ULL(VIRTIO_NET_F_MTU);
+		if (nl_attrs[VDPA_ATTR_DEV_NET_CFG_MAX_VQP] &&
+		    config.net.max_vq_pairs > 1 &&
+		    !(config.device_features & BIT_ULL(VIRTIO_NET_F_MQ)))
+			missing |= BIT_ULL(VIRTIO_NET_F_MQ);
+		if (missing) {
+			NL_SET_ERR_MSG_FMT_MOD(info->extack,
+					       "Missing features 0x%llx for provided attributes",
+					       missing);
+			return -EINVAL;
+		}
 		config.mask |= BIT_ULL(VDPA_ATTR_DEV_FEATURES);
 	}
@@ -622,13 +668,33 @@ static int vdpa_nl_cmd_dev_add_set_doit(struct sk_buff *skb, struct genl_info *i
 		err = PTR_ERR(mdev);
 		goto err;
 	}
+
 	if ((config.mask & mdev->config_attr_mask) != config.mask) {
-		NL_SET_ERR_MSG_MOD(info->extack,
-				   "All provided attributes are not supported");
+		NL_SET_ERR_MSG_FMT_MOD(info->extack,
+				       "Some provided attributes are not supported: 0x%llx",
+				       config.mask & ~mdev->config_attr_mask);
 		err = -EOPNOTSUPP;
 		goto err;
 	}
+	classes = vdpa_mgmtdev_get_classes(mdev, &ncls);
+	if (config.mask & VDPA_DEV_NET_ATTRS_MASK &&
+	    !(classes & BIT_ULL(VIRTIO_ID_NET))) {
+		NL_SET_ERR_MSG_MOD(info->extack,
+				   "Network class attributes provided on unsupported management device");
+		err = -EINVAL;
+		goto err;
+	}
+	if (!(config.mask & VDPA_DEV_NET_ATTRS_MASK) &&
+	    config.mask & BIT_ULL(VDPA_ATTR_DEV_FEATURES) &&
+	    classes & BIT_ULL(VIRTIO_ID_NET) && ncls > 1 &&
+	    config.device_features & VIRTIO_DEVICE_F_MASK) {
+		NL_SET_ERR_MSG_MOD(info->extack,
+				   "Management device supports multi-class while device features specified are ambiguous");
+		err = -EINVAL;
+		goto err;
+	}
+
 	err = mdev->ops->dev_add(mdev, name, &config);
 err:
 	up_write(&vdpa_dev_lock);
@@ -841,18 +907,25 @@ static int vdpa_dev_net_mac_config_fill(struct sk_buff *msg, u64 features,
 			sizeof(config->mac), config->mac);
 }
+static int vdpa_dev_net_status_config_fill(struct sk_buff *msg, u64 features,
+					   const struct virtio_net_config *config)
+{
+	u16 val_u16;
+
+	if ((features & BIT_ULL(VIRTIO_NET_F_STATUS)) == 0)
+		return 0;
+
+	val_u16 = __virtio16_to_cpu(true, config->status);
+	return nla_put_u16(msg, VDPA_ATTR_DEV_NET_STATUS, val_u16);
+}
+
 static int vdpa_dev_net_config_fill(struct vdpa_device *vdev, struct sk_buff *msg)
 {
 	struct virtio_net_config config = {};
 	u64 features_device;
-	u16 val_u16;
 	vdev->config->get_config(vdev, 0, &config, sizeof(config));
-	val_u16 = __virtio16_to_cpu(true, config.status);
-	if (nla_put_u16(msg, VDPA_ATTR_DEV_NET_STATUS, val_u16))
-		return -EMSGSIZE;
-
 	features_device = vdev->config->get_device_features(vdev);
 	if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_FEATURES, features_device,
@@ -865,6 +938,9 @@ static int vdpa_dev_net_config_fill(struct vdpa_device *vdev, struct sk_buff *ms
 	if (vdpa_dev_net_mac_config_fill(msg, features_device, &config))
 		return -EMSGSIZE;
+	if (vdpa_dev_net_status_config_fill(msg, features_device, &config))
+		return -EMSGSIZE;
+
 	return vdpa_dev_net_mq_config_fill(msg, features_device, &config);
 }
@@ -1011,7 +1087,7 @@ static int vdpa_dev_vendor_stats_fill(struct vdpa_device *vdev,
 	switch (device_id) {
 	case VIRTIO_ID_NET:
 		if (index > VIRTIO_NET_CTRL_MQ_VQ_PAIRS_MAX) {
-			NL_SET_ERR_MSG_MOD(info->extack, "queue index excceeds max value");
+			NL_SET_ERR_MSG_MOD(info->extack, "queue index exceeds max value");
 			err = -ERANGE;
 			break;
 		}
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index cb88891b44a8..6a0a65814626 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -17,7 +17,6 @@
 #include <linux/vringh.h>
 #include <linux/vdpa.h>
 #include <linux/vhost_iotlb.h>
-#include <linux/iova.h>
 #include <uapi/linux/vdpa.h>
 #include "vdpa_sim.h"
@@ -45,13 +44,6 @@ static struct vdpasim *vdpa_to_sim(struct vdpa_device *vdpa)
 	return container_of(vdpa, struct vdpasim, vdpa);
 }
-static struct vdpasim *dev_to_sim(struct device *dev)
-{
-	struct vdpa_device *vdpa = dev_to_vdpa(dev);
-
-	return vdpa_to_sim(vdpa);
-}
-
 static void vdpasim_vq_notify(struct vringh *vring)
 {
 	struct vdpasim_virtqueue *vq =
@@ -66,14 +58,16 @@ static void vdpasim_vq_notify(struct vringh *vring)
 static void vdpasim_queue_ready(struct vdpasim *vdpasim, unsigned int idx)
 {
 	struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
+	uint16_t last_avail_idx = vq->vring.last_avail_idx;
-	vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, false,
+	vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, true,
 			  (struct vring_desc *)(uintptr_t)vq->desc_addr,
 			  (struct vring_avail *)
 			  (uintptr_t)vq->driver_addr,
 			  (struct vring_used *)
 			  (uintptr_t)vq->device_addr);
+	vq->vring.last_avail_idx = last_avail_idx;
 	vq->vring.notify = vdpasim_vq_notify;
 }
@@ -104,8 +98,12 @@ static void vdpasim_do_reset(struct vdpasim *vdpasim)
 				 &vdpasim->iommu_lock);
 	}
-	for (i = 0; i < vdpasim->dev_attr.nas; i++)
+	for (i = 0; i < vdpasim->dev_attr.nas; i++) {
 		vhost_iotlb_reset(&vdpasim->iommu[i]);
+		vhost_iotlb_add_range(&vdpasim->iommu[i], 0, ULONG_MAX,
+				      0, VHOST_MAP_RW);
+		vdpasim->iommu_pt[i] = true;
+	}
 	vdpasim->running = true;
 	spin_unlock(&vdpasim->iommu_lock);
@@ -115,133 +113,6 @@ static void vdpasim_do_reset(struct vdpasim *vdpasim)
 	++vdpasim->generation;
 }
-static int dir_to_perm(enum dma_data_direction dir)
-{
-	int perm = -EFAULT;
-
-	switch (dir) {
-	case DMA_FROM_DEVICE:
-		perm = VHOST_MAP_WO;
-		break;
-	case DMA_TO_DEVICE:
-		perm = VHOST_MAP_RO;
-		break;
-	case DMA_BIDIRECTIONAL:
-		perm = VHOST_MAP_RW;
-		break;
-	default:
-		break;
-	}
-
-	return perm;
-}
-
-static dma_addr_t vdpasim_map_range(struct vdpasim *vdpasim, phys_addr_t paddr,
-				    size_t size, unsigned int perm)
-{
-	struct iova *iova;
-	dma_addr_t dma_addr;
-	int ret;
-
-	/* We set the limit_pfn to the maximum (ULONG_MAX - 1) */
-	iova = alloc_iova(&vdpasim->iova, size >> iova_shift(&vdpasim->iova),
-			  ULONG_MAX - 1, true);
-	if (!iova)
-		return DMA_MAPPING_ERROR;
-
-	dma_addr = iova_dma_addr(&vdpasim->iova, iova);
-
-	spin_lock(&vdpasim->iommu_lock);
-	ret = vhost_iotlb_add_range(&vdpasim->iommu[0], (u64)dma_addr,
-				    (u64)dma_addr + size - 1, (u64)paddr, perm);
-	spin_unlock(&vdpasim->iommu_lock);
-
-	if (ret) {
-		__free_iova(&vdpasim->iova, iova);
-		return DMA_MAPPING_ERROR;
-	}
-
-	return dma_addr;
-}
-
-static void vdpasim_unmap_range(struct vdpasim *vdpasim, dma_addr_t dma_addr,
-				size_t size)
-{
-	spin_lock(&vdpasim->iommu_lock);
-	vhost_iotlb_del_range(&vdpasim->iommu[0], (u64)dma_addr,
-			      (u64)dma_addr + size - 1);
-	spin_unlock(&vdpasim->iommu_lock);
-
-	free_iova(&vdpasim->iova, iova_pfn(&vdpasim->iova, dma_addr));
-}
-
-static dma_addr_t vdpasim_map_page(struct device *dev, struct page *page,
-				   unsigned long offset, size_t size,
-				   enum dma_data_direction dir,
-				   unsigned long attrs)
-{
-	struct vdpasim *vdpasim = dev_to_sim(dev);
-	phys_addr_t paddr = page_to_phys(page) + offset;
-	int perm = dir_to_perm(dir);
-
-	if (perm < 0)
-		return DMA_MAPPING_ERROR;
-
-	return vdpasim_map_range(vdpasim, paddr, size, perm);
-}
-
-static void vdpasim_unmap_page(struct device *dev, dma_addr_t dma_addr,
-			       size_t size, enum dma_data_direction dir,
-			       unsigned long attrs)
-{
-	struct vdpasim *vdpasim = dev_to_sim(dev);
-
-	vdpasim_unmap_range(vdpasim, dma_addr, size);
-}
-
-static void *vdpasim_alloc_coherent(struct device *dev, size_t size,
-				    dma_addr_t *dma_addr, gfp_t flag,
-				    unsigned long attrs)
-{
-	struct vdpasim *vdpasim = dev_to_sim(dev);
-	phys_addr_t paddr;
-	void *addr;
-
-	addr = kmalloc(size, flag);
-	if (!addr) {
-		*dma_addr = DMA_MAPPING_ERROR;
-		return NULL;
-	}
-
-	paddr = virt_to_phys(addr);
-
-	*dma_addr = vdpasim_map_range(vdpasim, paddr, size, VHOST_MAP_RW);
-	if (*dma_addr == DMA_MAPPING_ERROR) {
-		kfree(addr);
-		return NULL;
-	}
-
-	return addr;
-}
-
-static void vdpasim_free_coherent(struct device *dev, size_t size,
-				  void *vaddr, dma_addr_t dma_addr,
-				  unsigned long attrs)
-{
-	struct vdpasim *vdpasim = dev_to_sim(dev);
-
-	vdpasim_unmap_range(vdpasim, dma_addr, size);
-
-	kfree(vaddr);
-}
-
-static const struct dma_map_ops vdpasim_dma_ops = {
-	.map_page = vdpasim_map_page,
-	.unmap_page = vdpasim_unmap_page,
-	.alloc = vdpasim_alloc_coherent,
-	.free = vdpasim_free_coherent,
-};
-
 static const struct vdpa_config_ops vdpasim_config_ops;
 static const struct vdpa_config_ops vdpasim_batch_config_ops;
@@ -249,10 +120,14 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
 			       const struct vdpa_dev_set_config *config)
 {
 	const struct vdpa_config_ops *ops;
+	struct vdpa_device *vdpa;
 	struct vdpasim *vdpasim;
 	struct device *dev;
 	int i, ret = -ENOMEM;
+	if (!dev_attr->alloc_size)
+		return ERR_PTR(-EINVAL);
+
 	if (config->mask & BIT_ULL(VDPA_ATTR_DEV_FEATURES)) {
 		if (config->device_features &
 		    ~dev_attr->supported_features)
@@ -266,14 +141,16 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
 	else
 		ops = &vdpasim_config_ops;
-	vdpasim = vdpa_alloc_device(struct vdpasim, vdpa, NULL, ops,
-				    dev_attr->ngroups, dev_attr->nas,
-				    dev_attr->name, false);
-	if (IS_ERR(vdpasim)) {
-		ret = PTR_ERR(vdpasim);
+	vdpa = __vdpa_alloc_device(NULL, ops,
+				   dev_attr->ngroups, dev_attr->nas,
+				   dev_attr->alloc_size,
+				   dev_attr->name, false);
+	if (IS_ERR(vdpa)) {
+		ret = PTR_ERR(vdpa);
 		goto err_alloc;
 	}
+	vdpasim = vdpa_to_sim(vdpa);
 	vdpasim->dev_attr = *dev_attr;
 	INIT_WORK(&vdpasim->work, dev_attr->work_fn);
 	spin_lock_init(&vdpasim->lock);
@@ -283,7 +160,6 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
 	dev->dma_mask = &dev->coherent_dma_mask;
 	if (dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64)))
 		goto err_iommu;
-	set_dma_ops(dev, &vdpasim_dma_ops);
 	vdpasim->vdpa.mdev = dev_attr->mgmt_dev;
 	vdpasim->config = kzalloc(dev_attr->config_size, GFP_KERNEL);
@@ -300,6 +176,11 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
 	if (!vdpasim->iommu)
 		goto err_iommu;
+	vdpasim->iommu_pt = kmalloc_array(vdpasim->dev_attr.nas,
+					  sizeof(*vdpasim->iommu_pt), GFP_KERNEL);
+	if (!vdpasim->iommu_pt)
+		goto err_iommu;
+
 	for (i = 0; i < vdpasim->dev_attr.nas; i++)
 		vhost_iotlb_init(&vdpasim->iommu[i], max_iotlb_entries, 0);
@@ -311,13 +192,6 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
 		vringh_set_iotlb(&vdpasim->vqs[i].vring, &vdpasim->iommu[0],
 				 &vdpasim->iommu_lock);
-	ret = iova_cache_get();
-	if (ret)
-		goto err_iommu;
-
-	/* For simplicity we use an IOVA allocator with byte granularity */
-	init_iova_domain(&vdpasim->iova, 1, 0);
-
 	vdpasim->vdpa.dma_dev = dev;
 	return vdpasim;
@@ -356,6 +230,12 @@ static void vdpasim_kick_vq(struct vdpa_device *vdpa, u16 idx)
 	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
 	struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
+	if (!vdpasim->running &&
+	    (vdpasim->status & VIRTIO_CONFIG_S_DRIVER_OK)) {
+		vdpasim->pending_kick = true;
+		return;
+	}
+
 	if (vq->ready)
 		schedule_work(&vdpasim->work);
 }
@@ -418,6 +298,18 @@ static int vdpasim_get_vq_state(struct vdpa_device *vdpa, u16 idx,
 	return 0;
 }
+static int vdpasim_get_vq_stats(struct vdpa_device *vdpa, u16 idx,
+				struct sk_buff *msg,
+				struct netlink_ext_ack *extack)
+{
+	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+
+	if (vdpasim->dev_attr.get_stats)
+		return vdpasim->dev_attr.get_stats(vdpasim, idx,
+						   msg, extack);
+	return -EOPNOTSUPP;
+}
+
 static u32 vdpasim_get_vq_align(struct vdpa_device *vdpa)
 {
 	return VDPASIM_QUEUE_ALIGN;
@@ -526,6 +418,27 @@ static int vdpasim_suspend(struct vdpa_device *vdpa)
 	return 0;
 }
+static int vdpasim_resume(struct vdpa_device *vdpa)
+{
+	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+	int i;
+
+	spin_lock(&vdpasim->lock);
+	vdpasim->running = true;
+
+	if (vdpasim->pending_kick) {
+		/* Process pending descriptors */
+		for (i = 0; i < vdpasim->dev_attr.nvqs; ++i)
+			vdpasim_kick_vq(vdpa, i);
+
+		vdpasim->pending_kick = false;
+	}
+
+	spin_unlock(&vdpasim->lock);
+
+	return 0;
+}
+
 static size_t vdpasim_get_config_size(struct vdpa_device *vdpa)
 {
 	struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
@@ -621,6 +534,7 @@ static int vdpasim_set_map(struct vdpa_device *vdpa, unsigned int asid,
 	iommu = &vdpasim->iommu[asid];
 	vhost_iotlb_reset(iommu);
+	vdpasim->iommu_pt[asid] = false;
 	for (map = vhost_iotlb_itree_first(iotlb, start, last); map;
 	     map = vhost_iotlb_itree_next(map, start, last)) {
@@ -649,6 +563,10 @@ static int vdpasim_dma_map(struct vdpa_device *vdpa, unsigned int asid,
 		return -EINVAL;
 	spin_lock(&vdpasim->iommu_lock);
+	if (vdpasim->iommu_pt[asid]) {
+		vhost_iotlb_reset(&vdpasim->iommu[asid]);
+		vdpasim->iommu_pt[asid] = false;
+	}
 	ret = vhost_iotlb_add_range_ctx(&vdpasim->iommu[asid], iova,
 					iova + size - 1, pa, perm, opaque);
 	spin_unlock(&vdpasim->iommu_lock);
@@ -664,6 +582,11 @@ static int vdpasim_dma_unmap(struct vdpa_device *vdpa, unsigned int asid,
 	if (asid >= vdpasim->dev_attr.nas)
 		return -EINVAL;
+	if (vdpasim->iommu_pt[asid]) {
+		vhost_iotlb_reset(&vdpasim->iommu[asid]);
+		vdpasim->iommu_pt[asid] = false;
+	}
+
 	spin_lock(&vdpasim->iommu_lock);
 	vhost_iotlb_del_range(&vdpasim->iommu[asid], iova, iova + size - 1);
 	spin_unlock(&vdpasim->iommu_lock);
@@ -683,15 +606,11 @@ static void vdpasim_free(struct vdpa_device *vdpa)
 		vringh_kiov_cleanup(&vdpasim->vqs[i].in_iov);
 	}
-	if (vdpa_get_dma_dev(vdpa)) {
-		put_iova_domain(&vdpasim->iova);
-		iova_cache_put();
-	}
-
 	kvfree(vdpasim->buffer);
 	for (i = 0; i < vdpasim->dev_attr.nas; i++)
 		vhost_iotlb_reset(&vdpasim->iommu[i]);
 	kfree(vdpasim->iommu);
+	kfree(vdpasim->iommu_pt);
 	kfree(vdpasim->vqs);
 	kfree(vdpasim->config);
 }
@@ -704,6 +623,7 @@ static const struct vdpa_config_ops vdpasim_config_ops = {
 	.set_vq_ready = vdpasim_set_vq_ready,
 	.get_vq_ready = vdpasim_get_vq_ready,
 	.set_vq_state = vdpasim_set_vq_state,
+	.get_vendor_vq_stats = vdpasim_get_vq_stats,
 	.get_vq_state = vdpasim_get_vq_state,
 	.get_vq_align = vdpasim_get_vq_align,
 	.get_vq_group = vdpasim_get_vq_group,
@@ -718,6 +638,7 @@ static const struct vdpa_config_ops vdpasim_config_ops = {
 	.set_status = vdpasim_set_status,
 	.reset = vdpasim_reset,
 	.suspend = vdpasim_suspend,
+	.resume = vdpasim_resume,
 	.get_config_size = vdpasim_get_config_size,
 	.get_config = vdpasim_get_config,
 	.set_config = vdpasim_set_config,
@@ -737,6 +658,7 @@ static const struct vdpa_config_ops vdpasim_batch_config_ops = {
 	.set_vq_ready = vdpasim_set_vq_ready,
 	.get_vq_ready = vdpasim_get_vq_ready,
 	.set_vq_state = vdpasim_set_vq_state,
+	.get_vendor_vq_stats = vdpasim_get_vq_stats,
 	.get_vq_state = vdpasim_get_vq_state,
 	.get_vq_align = vdpasim_get_vq_align,
 	.get_vq_group = vdpasim_get_vq_group,
@@ -751,6 +673,7 @@ static const struct vdpa_config_ops vdpasim_batch_config_ops = {
 	.set_status = vdpasim_set_status,
 	.reset = vdpasim_reset,
 	.suspend = vdpasim_suspend,
+	.resume = vdpasim_resume,
 	.get_config_size =
vdpasim_get_config_size,
 	.get_config = vdpasim_get_config,
 	.set_config = vdpasim_set_config,
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h
index 0e78737dcc16..144858636c10 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.h
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h
@@ -37,6 +37,7 @@ struct vdpasim_dev_attr {
 	struct vdpa_mgmt_dev *mgmt_dev;
 	const char *name;
 	u64 supported_features;
+	size_t alloc_size;
 	size_t config_size;
 	size_t buffer_size;
 	int nvqs;
@@ -47,6 +48,9 @@ struct vdpasim_dev_attr {
 	work_func_t work_fn;
 	void (*get_config)(struct vdpasim *vdpasim, void *config);
 	void (*set_config)(struct vdpasim *vdpasim, const void *config);
+	int (*get_stats)(struct vdpasim *vdpasim, u16 idx,
+			 struct sk_buff *msg,
+			 struct netlink_ext_ack *extack);
 };

 /* State of each vdpasim device */
@@ -60,13 +64,14 @@ struct vdpasim {
 	/* virtio config according to device type */
 	void *config;
 	struct vhost_iotlb *iommu;
-	struct iova_domain iova;
+	bool *iommu_pt;
 	void *buffer;
 	u32 status;
 	u32 generation;
 	u64 features;
 	u32 groups;
 	bool running;
+	bool pending_kick;
 	/* spinlock to synchronize iommu table */
 	spinlock_t iommu_lock;
 };
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
index f745926237a8..5117959bed8a 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c
@@ -378,6 +378,7 @@ static int vdpasim_blk_dev_add(struct vdpa_mgmt_dev *mdev, const char *name,
 	dev_attr.nvqs = VDPASIM_BLK_VQ_NUM;
 	dev_attr.ngroups = VDPASIM_BLK_GROUP_NUM;
 	dev_attr.nas = VDPASIM_BLK_AS_NUM;
+	dev_attr.alloc_size = sizeof(struct vdpasim);
 	dev_attr.config_size = sizeof(struct virtio_blk_config);
 	dev_attr.get_config = vdpasim_blk_get_config;
 	dev_attr.work_fn = vdpasim_blk_work;
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
index 584b975a98a7..862f405362de 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c
@@ -15,6
+15,7 @@ #include <linux/etherdevice.h> #include <linux/vringh.h> #include <linux/vdpa.h> +#include <net/netlink.h> #include <uapi/linux/virtio_net.h> #include <uapi/linux/vdpa.h> @@ -27,6 +28,7 @@ #define VDPASIM_NET_FEATURES (VDPASIM_FEATURES | \ (1ULL << VIRTIO_NET_F_MAC) | \ + (1ULL << VIRTIO_NET_F_STATUS) | \ (1ULL << VIRTIO_NET_F_MTU) | \ (1ULL << VIRTIO_NET_F_CTRL_VQ) | \ (1ULL << VIRTIO_NET_F_CTRL_MAC_ADDR)) @@ -36,6 +38,34 @@ #define VDPASIM_NET_AS_NUM 2 #define VDPASIM_NET_GROUP_NUM 2 +struct vdpasim_dataq_stats { + struct u64_stats_sync syncp; + u64 pkts; + u64 bytes; + u64 drops; + u64 errors; + u64 overruns; +}; + +struct vdpasim_cq_stats { + struct u64_stats_sync syncp; + u64 requests; + u64 successes; + u64 errors; +}; + +struct vdpasim_net{ + struct vdpasim vdpasim; + struct vdpasim_dataq_stats tx_stats; + struct vdpasim_dataq_stats rx_stats; + struct vdpasim_cq_stats cq_stats; +}; + +static struct vdpasim_net *sim_to_net(struct vdpasim *vdpasim) +{ + return container_of(vdpasim, struct vdpasim_net, vdpasim); +} + static void vdpasim_net_complete(struct vdpasim_virtqueue *vq, size_t len) { /* Make sure data is wrote before advancing index */ @@ -96,9 +126,11 @@ static virtio_net_ctrl_ack vdpasim_handle_ctrl_mac(struct vdpasim *vdpasim, static void vdpasim_handle_cvq(struct vdpasim *vdpasim) { struct vdpasim_virtqueue *cvq = &vdpasim->vqs[2]; + struct vdpasim_net *net = sim_to_net(vdpasim); virtio_net_ctrl_ack status = VIRTIO_NET_ERR; struct virtio_net_ctrl_hdr ctrl; size_t read, write; + u64 requests = 0, errors = 0, successes = 0; int err; if (!(vdpasim->features & (1ULL << VIRTIO_NET_F_CTRL_VQ))) @@ -114,10 +146,13 @@ static void vdpasim_handle_cvq(struct vdpasim *vdpasim) if (err <= 0) break; + ++requests; read = vringh_iov_pull_iotlb(&cvq->vring, &cvq->in_iov, &ctrl, sizeof(ctrl)); - if (read != sizeof(ctrl)) + if (read != sizeof(ctrl)) { + ++errors; break; + } switch (ctrl.class) { case VIRTIO_NET_CTRL_MAC: @@ -127,6 +162,11 @@ static void 
vdpasim_handle_cvq(struct vdpasim *vdpasim) break; } + if (status == VIRTIO_NET_OK) + ++successes; + else + ++errors; + /* Make sure data is wrote before advancing index */ smp_wmb(); @@ -144,6 +184,12 @@ static void vdpasim_handle_cvq(struct vdpasim *vdpasim) cvq->cb(cvq->private); local_bh_enable(); } + + u64_stats_update_begin(&net->cq_stats.syncp); + net->cq_stats.requests += requests; + net->cq_stats.errors += errors; + net->cq_stats.successes += successes; + u64_stats_update_end(&net->cq_stats.syncp); } static void vdpasim_net_work(struct work_struct *work) @@ -151,8 +197,10 @@ static void vdpasim_net_work(struct work_struct *work) struct vdpasim *vdpasim = container_of(work, struct vdpasim, work); struct vdpasim_virtqueue *txq = &vdpasim->vqs[1]; struct vdpasim_virtqueue *rxq = &vdpasim->vqs[0]; + struct vdpasim_net *net = sim_to_net(vdpasim); ssize_t read, write; - int pkts = 0; + u64 tx_pkts = 0, rx_pkts = 0, tx_bytes = 0, rx_bytes = 0; + u64 rx_drops = 0, rx_overruns = 0, rx_errors = 0, tx_errors = 0; int err; spin_lock(&vdpasim->lock); @@ -171,14 +219,21 @@ static void vdpasim_net_work(struct work_struct *work) while (true) { err = vringh_getdesc_iotlb(&txq->vring, &txq->out_iov, NULL, &txq->head, GFP_ATOMIC); - if (err <= 0) + if (err <= 0) { + if (err) + ++tx_errors; break; + } + ++tx_pkts; read = vringh_iov_pull_iotlb(&txq->vring, &txq->out_iov, vdpasim->buffer, PAGE_SIZE); + tx_bytes += read; + if (!receive_filter(vdpasim, read)) { + ++rx_drops; vdpasim_net_complete(txq, 0); continue; } @@ -186,19 +241,25 @@ static void vdpasim_net_work(struct work_struct *work) err = vringh_getdesc_iotlb(&rxq->vring, NULL, &rxq->in_iov, &rxq->head, GFP_ATOMIC); if (err <= 0) { + ++rx_overruns; vdpasim_net_complete(txq, 0); break; } write = vringh_iov_push_iotlb(&rxq->vring, &rxq->in_iov, vdpasim->buffer, read); - if (write <= 0) + if (write <= 0) { + ++rx_errors; break; + } + + ++rx_pkts; + rx_bytes += write; vdpasim_net_complete(txq, 0); vdpasim_net_complete(rxq, 
write); - if (++pkts > 4) { + if (tx_pkts > 4) { schedule_work(&vdpasim->work); goto out; } @@ -206,6 +267,145 @@ static void vdpasim_net_work(struct work_struct *work) out: spin_unlock(&vdpasim->lock); + + u64_stats_update_begin(&net->tx_stats.syncp); + net->tx_stats.pkts += tx_pkts; + net->tx_stats.bytes += tx_bytes; + net->tx_stats.errors += tx_errors; + u64_stats_update_end(&net->tx_stats.syncp); + + u64_stats_update_begin(&net->rx_stats.syncp); + net->rx_stats.pkts += rx_pkts; + net->rx_stats.bytes += rx_bytes; + net->rx_stats.drops += rx_drops; + net->rx_stats.errors += rx_errors; + net->rx_stats.overruns += rx_overruns; + u64_stats_update_end(&net->rx_stats.syncp); +} + +static int vdpasim_net_get_stats(struct vdpasim *vdpasim, u16 idx, + struct sk_buff *msg, + struct netlink_ext_ack *extack) +{ + struct vdpasim_net *net = sim_to_net(vdpasim); + u64 rx_pkts, rx_bytes, rx_errors, rx_overruns, rx_drops; + u64 tx_pkts, tx_bytes, tx_errors, tx_drops; + u64 cq_requests, cq_successes, cq_errors; + unsigned int start; + int err = -EMSGSIZE; + + switch(idx) { + case 0: + do { + start = u64_stats_fetch_begin(&net->rx_stats.syncp); + rx_pkts = net->rx_stats.pkts; + rx_bytes = net->rx_stats.bytes; + rx_errors = net->rx_stats.errors; + rx_overruns = net->rx_stats.overruns; + rx_drops = net->rx_stats.drops; + } while (u64_stats_fetch_retry(&net->rx_stats.syncp, start)); + + if (nla_put_string(msg, VDPA_ATTR_DEV_VENDOR_ATTR_NAME, + "rx packets")) + break; + if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_VENDOR_ATTR_VALUE, + rx_pkts, VDPA_ATTR_PAD)) + break; + if (nla_put_string(msg, VDPA_ATTR_DEV_VENDOR_ATTR_NAME, + "rx bytes")) + break; + if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_VENDOR_ATTR_VALUE, + rx_bytes, VDPA_ATTR_PAD)) + break; + if (nla_put_string(msg, VDPA_ATTR_DEV_VENDOR_ATTR_NAME, + "rx errors")) + break; + if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_VENDOR_ATTR_VALUE, + rx_errors, VDPA_ATTR_PAD)) + break; + if (nla_put_string(msg, VDPA_ATTR_DEV_VENDOR_ATTR_NAME, + "rx 
overruns")) + break; + if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_VENDOR_ATTR_VALUE, + rx_overruns, VDPA_ATTR_PAD)) + break; + if (nla_put_string(msg, VDPA_ATTR_DEV_VENDOR_ATTR_NAME, + "rx drops")) + break; + if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_VENDOR_ATTR_VALUE, + rx_drops, VDPA_ATTR_PAD)) + break; + err = 0; + break; + case 1: + do { + start = u64_stats_fetch_begin(&net->tx_stats.syncp); + tx_pkts = net->tx_stats.pkts; + tx_bytes = net->tx_stats.bytes; + tx_errors = net->tx_stats.errors; + tx_drops = net->tx_stats.drops; + } while (u64_stats_fetch_retry(&net->tx_stats.syncp, start)); + + if (nla_put_string(msg, VDPA_ATTR_DEV_VENDOR_ATTR_NAME, + "tx packets")) + break; + if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_VENDOR_ATTR_VALUE, + tx_pkts, VDPA_ATTR_PAD)) + break; + if (nla_put_string(msg, VDPA_ATTR_DEV_VENDOR_ATTR_NAME, + "tx bytes")) + break; + if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_VENDOR_ATTR_VALUE, + tx_bytes, VDPA_ATTR_PAD)) + break; + if (nla_put_string(msg, VDPA_ATTR_DEV_VENDOR_ATTR_NAME, + "tx errors")) + break; + if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_VENDOR_ATTR_VALUE, + tx_errors, VDPA_ATTR_PAD)) + break; + if (nla_put_string(msg, VDPA_ATTR_DEV_VENDOR_ATTR_NAME, + "tx drops")) + break; + if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_VENDOR_ATTR_VALUE, + tx_drops, VDPA_ATTR_PAD)) + break; + err = 0; + break; + case 2: + do { + start = u64_stats_fetch_begin(&net->cq_stats.syncp); + cq_requests = net->cq_stats.requests; + cq_successes = net->cq_stats.successes; + cq_errors = net->cq_stats.errors; + } while (u64_stats_fetch_retry(&net->cq_stats.syncp, start)); + + if (nla_put_string(msg, VDPA_ATTR_DEV_VENDOR_ATTR_NAME, + "cvq requests")) + break; + if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_VENDOR_ATTR_VALUE, + cq_requests, VDPA_ATTR_PAD)) + break; + if (nla_put_string(msg, VDPA_ATTR_DEV_VENDOR_ATTR_NAME, + "cvq successes")) + break; + if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_VENDOR_ATTR_VALUE, + cq_successes, VDPA_ATTR_PAD)) + break; + if 
(nla_put_string(msg, VDPA_ATTR_DEV_VENDOR_ATTR_NAME, + "cvq errors")) + break; + if (nla_put_u64_64bit(msg, VDPA_ATTR_DEV_VENDOR_ATTR_VALUE, + cq_errors, VDPA_ATTR_PAD)) + break; + err = 0; + break; + default: + err = -EINVAL; + break; + } + + return err; } static void vdpasim_net_get_config(struct vdpasim *vdpasim, void *config) @@ -242,6 +442,7 @@ static int vdpasim_net_dev_add(struct vdpa_mgmt_dev *mdev, const char *name, const struct vdpa_dev_set_config *config) { struct vdpasim_dev_attr dev_attr = {}; + struct vdpasim_net *net; struct vdpasim *simdev; int ret; @@ -252,9 +453,11 @@ static int vdpasim_net_dev_add(struct vdpa_mgmt_dev *mdev, const char *name, dev_attr.nvqs = VDPASIM_NET_VQ_NUM; dev_attr.ngroups = VDPASIM_NET_GROUP_NUM; dev_attr.nas = VDPASIM_NET_AS_NUM; + dev_attr.alloc_size = sizeof(struct vdpasim_net); dev_attr.config_size = sizeof(struct virtio_net_config); dev_attr.get_config = vdpasim_net_get_config; dev_attr.work_fn = vdpasim_net_work; + dev_attr.get_stats = vdpasim_net_get_stats; dev_attr.buffer_size = PAGE_SIZE; simdev = vdpasim_create(&dev_attr, config); @@ -267,6 +470,12 @@ static int vdpasim_net_dev_add(struct vdpa_mgmt_dev *mdev, const char *name, if (ret) goto reg_err; + net = sim_to_net(simdev); + + u64_stats_init(&net->tx_stats.syncp); + u64_stats_init(&net->rx_stats.syncp); + u64_stats_init(&net->cq_stats.syncp); + return 0; reg_err: diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 4c538b30fd76..07181cd8d52e 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -73,7 +73,8 @@ enum { VHOST_NET_FEATURES = VHOST_FEATURES | (1ULL << VHOST_NET_F_VIRTIO_NET_HDR) | (1ULL << VIRTIO_NET_F_MRG_RXBUF) | - (1ULL << VIRTIO_F_ACCESS_PLATFORM) + (1ULL << VIRTIO_F_ACCESS_PLATFORM) | + (1ULL << VIRTIO_F_RING_RESET) }; enum { @@ -1645,7 +1646,7 @@ static int vhost_net_set_features(struct vhost_net *n, u64 features) goto out_unlock; if ((features & (1ULL << VIRTIO_F_ACCESS_PLATFORM))) { - if (vhost_init_device_iotlb(&n->dev, 
true)) + if (vhost_init_device_iotlb(&n->dev)) goto out_unlock; } diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c index d5ecb8876fc9..b244e7c0f514 100644 --- a/drivers/vhost/scsi.c +++ b/drivers/vhost/scsi.c @@ -2105,7 +2105,7 @@ static ssize_t vhost_scsi_tpg_attrib_fabric_prot_type_show( struct vhost_scsi_tpg *tpg = container_of(se_tpg, struct vhost_scsi_tpg, se_tpg); - return sprintf(page, "%d\n", tpg->tv_fabric_prot_type); + return sysfs_emit(page, "%d\n", tpg->tv_fabric_prot_type); } CONFIGFS_ATTR(vhost_scsi_tpg_attrib_, fabric_prot_type); @@ -2215,7 +2215,7 @@ static ssize_t vhost_scsi_tpg_nexus_show(struct config_item *item, char *page) mutex_unlock(&tpg->tv_tpg_mutex); return -ENODEV; } - ret = snprintf(page, PAGE_SIZE, "%s\n", + ret = sysfs_emit(page, "%s\n", tv_nexus->tvn_se_sess->se_node_acl->initiatorname); mutex_unlock(&tpg->tv_tpg_mutex); @@ -2440,7 +2440,7 @@ static void vhost_scsi_drop_tport(struct se_wwn *wwn) static ssize_t vhost_scsi_wwn_version_show(struct config_item *item, char *page) { - return sprintf(page, "TCM_VHOST fabric module %s on %s/%s" + return sysfs_emit(page, "TCM_VHOST fabric module %s on %s/%s" "on "UTS_RELEASE"\n", VHOST_SCSI_VERSION, utsname()->sysname, utsname()->machine); } diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c index bc8e7fb1e635..42c955a5b211 100644 --- a/drivers/vhost/test.c +++ b/drivers/vhost/test.c @@ -333,13 +333,10 @@ static long vhost_test_ioctl(struct file *f, unsigned int ioctl, return -EFAULT; return 0; case VHOST_SET_FEATURES: - printk(KERN_ERR "1\n"); if (copy_from_user(&features, featurep, sizeof features)) return -EFAULT; - printk(KERN_ERR "2\n"); if (features & ~VHOST_FEATURES) return -EOPNOTSUPP; - printk(KERN_ERR "3\n"); return vhost_test_set_features(n, features); case VHOST_RESET_OWNER: return vhost_test_reset_owner(n); diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c index 315f9ba47ff2..dc12dbd5b43b 100644 --- a/drivers/vhost/vdpa.c +++ b/drivers/vhost/vdpa.c @@ -359,6 
+359,14 @@ static bool vhost_vdpa_can_suspend(const struct vhost_vdpa *v)
 	return ops->suspend;
 }

+static bool vhost_vdpa_can_resume(const struct vhost_vdpa *v)
+{
+	struct vdpa_device *vdpa = v->vdpa;
+	const struct vdpa_config_ops *ops = vdpa->config;
+
+	return ops->resume;
+}
+
 static long vhost_vdpa_get_features(struct vhost_vdpa *v, u64 __user *featurep)
 {
 	struct vdpa_device *vdpa = v->vdpa;
@@ -498,6 +506,21 @@ static long vhost_vdpa_suspend(struct vhost_vdpa *v)
 	return ops->suspend(vdpa);
 }

+/* After a successful return of this ioctl the device resumes processing
+ * virtqueue descriptors. The device becomes fully operational the same way it
+ * was before it was suspended.
+ */
+static long vhost_vdpa_resume(struct vhost_vdpa *v)
+{
+	struct vdpa_device *vdpa = v->vdpa;
+	const struct vdpa_config_ops *ops = vdpa->config;
+
+	if (!ops->resume)
+		return -EOPNOTSUPP;
+
+	return ops->resume(vdpa);
+}
+
 static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
 				   void __user *argp)
 {
@@ -606,11 +629,15 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
 		if (copy_from_user(&features, featurep, sizeof(features)))
 			return -EFAULT;
 		if (features & ~(VHOST_VDPA_BACKEND_FEATURES |
-				 BIT_ULL(VHOST_BACKEND_F_SUSPEND)))
+				 BIT_ULL(VHOST_BACKEND_F_SUSPEND) |
+				 BIT_ULL(VHOST_BACKEND_F_RESUME)))
 			return -EOPNOTSUPP;
 		if ((features & BIT_ULL(VHOST_BACKEND_F_SUSPEND)) &&
 		     !vhost_vdpa_can_suspend(v))
 			return -EOPNOTSUPP;
+		if ((features & BIT_ULL(VHOST_BACKEND_F_RESUME)) &&
+		     !vhost_vdpa_can_resume(v))
+			return -EOPNOTSUPP;
 		vhost_set_backend_features(&v->vdev, features);
 		return 0;
 	}
@@ -662,6 +689,8 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
 		features = VHOST_VDPA_BACKEND_FEATURES;
 		if (vhost_vdpa_can_suspend(v))
 			features |= BIT_ULL(VHOST_BACKEND_F_SUSPEND);
+		if (vhost_vdpa_can_resume(v))
+			features |= BIT_ULL(VHOST_BACKEND_F_RESUME);
 		if (copy_to_user(featurep, &features, sizeof(features)))
 			r = -EFAULT;
 		break;
@@ -677,6 +706,9 @@ static long
vhost_vdpa_unlocked_ioctl(struct file *filep,
 	case VHOST_VDPA_SUSPEND:
 		r = vhost_vdpa_suspend(v);
 		break;
+	case VHOST_VDPA_RESUME:
+		r = vhost_vdpa_resume(v);
+		break;
 	default:
 		r = vhost_dev_ioctl(&v->vdev, cmd, argp);
 		if (r == -ENOIOCTLCMD)
@@ -1119,8 +1151,11 @@ static int vhost_vdpa_alloc_domain(struct vhost_vdpa *v)
 	if (!bus)
 		return -EFAULT;

-	if (!device_iommu_capable(dma_dev, IOMMU_CAP_CACHE_COHERENCY))
+	if (!device_iommu_capable(dma_dev, IOMMU_CAP_CACHE_COHERENCY)) {
+		dev_warn_once(&v->dev,
+			      "Failed to allocate domain, device is not IOMMU cache coherent capable\n");
 		return -ENOTSUPP;
+	}

 	v->domain = iommu_domain_alloc(bus);
 	if (!v->domain)
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 43c9770b86e5..f11bdbe4c2c5 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1730,7 +1730,7 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *arg
 }
 EXPORT_SYMBOL_GPL(vhost_vring_ioctl);

-int vhost_init_device_iotlb(struct vhost_dev *d, bool enabled)
+int vhost_init_device_iotlb(struct vhost_dev *d)
 {
 	struct vhost_iotlb *niotlb, *oiotlb;
 	int i;
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 790b296271f1..1647b750169c 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -222,7 +222,7 @@ ssize_t vhost_chr_read_iter(struct vhost_dev *dev, struct iov_iter *to,
 			    int noblock);
 ssize_t vhost_chr_write_iter(struct vhost_dev *dev,
 			     struct iov_iter *from);
-int vhost_init_device_iotlb(struct vhost_dev *d, bool enabled);
+int vhost_init_device_iotlb(struct vhost_dev *d);

 void vhost_iotlb_map_free(struct vhost_iotlb *iotlb,
 			  struct vhost_iotlb_map *map);
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 1f3b89c885cc..c8e6087769a1 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -793,7 +793,7 @@ static int vhost_vsock_set_features(struct vhost_vsock *vsock, u64 features)
 	}

 	if ((features & (1ULL << VIRTIO_F_ACCESS_PLATFORM))) {
-		if
(vhost_init_device_iotlb(&vsock->dev, true)) + if (vhost_init_device_iotlb(&vsock->dev)) goto err; } diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c index 723c4e29e1d3..41144b5246a8 100644 --- a/drivers/virtio/virtio_ring.c +++ b/drivers/virtio/virtio_ring.c @@ -202,6 +202,9 @@ struct vring_virtqueue { /* DMA, allocation, and size information */ bool we_own_ring; + /* Device used for doing DMA */ + struct device *dma_dev; + #ifdef DEBUG /* They're supposed to lock for us. */ unsigned int in_use; @@ -219,7 +222,8 @@ static struct virtqueue *__vring_new_virtqueue(unsigned int index, bool context, bool (*notify)(struct virtqueue *), void (*callback)(struct virtqueue *), - const char *name); + const char *name, + struct device *dma_dev); static struct vring_desc_extra *vring_alloc_desc_extra(unsigned int num); static void vring_free(struct virtqueue *_vq); @@ -297,10 +301,11 @@ size_t virtio_max_dma_size(struct virtio_device *vdev) EXPORT_SYMBOL_GPL(virtio_max_dma_size); static void *vring_alloc_queue(struct virtio_device *vdev, size_t size, - dma_addr_t *dma_handle, gfp_t flag) + dma_addr_t *dma_handle, gfp_t flag, + struct device *dma_dev) { if (vring_use_dma_api(vdev)) { - return dma_alloc_coherent(vdev->dev.parent, size, + return dma_alloc_coherent(dma_dev, size, dma_handle, flag); } else { void *queue = alloc_pages_exact(PAGE_ALIGN(size), flag); @@ -330,10 +335,11 @@ static void *vring_alloc_queue(struct virtio_device *vdev, size_t size, } static void vring_free_queue(struct virtio_device *vdev, size_t size, - void *queue, dma_addr_t dma_handle) + void *queue, dma_addr_t dma_handle, + struct device *dma_dev) { if (vring_use_dma_api(vdev)) - dma_free_coherent(vdev->dev.parent, size, queue, dma_handle); + dma_free_coherent(dma_dev, size, queue, dma_handle); else free_pages_exact(queue, PAGE_ALIGN(size)); } @@ -341,11 +347,11 @@ static void vring_free_queue(struct virtio_device *vdev, size_t size, /* * The DMA ops on various arches are rather 
gnarly right now, and * making all of the arch DMA ops work on the vring device itself - * is a mess. For now, we use the parent device for DMA ops. + * is a mess. */ static inline struct device *vring_dma_dev(const struct vring_virtqueue *vq) { - return vq->vq.vdev->dev.parent; + return vq->dma_dev; } /* Map one sg entry. */ @@ -1032,11 +1038,12 @@ err_state: } static void vring_free_split(struct vring_virtqueue_split *vring_split, - struct virtio_device *vdev) + struct virtio_device *vdev, struct device *dma_dev) { vring_free_queue(vdev, vring_split->queue_size_in_bytes, vring_split->vring.desc, - vring_split->queue_dma_addr); + vring_split->queue_dma_addr, + dma_dev); kfree(vring_split->desc_state); kfree(vring_split->desc_extra); @@ -1046,7 +1053,8 @@ static int vring_alloc_queue_split(struct vring_virtqueue_split *vring_split, struct virtio_device *vdev, u32 num, unsigned int vring_align, - bool may_reduce_num) + bool may_reduce_num, + struct device *dma_dev) { void *queue = NULL; dma_addr_t dma_addr; @@ -1061,7 +1069,8 @@ static int vring_alloc_queue_split(struct vring_virtqueue_split *vring_split, for (; num && vring_size(num, vring_align) > PAGE_SIZE; num /= 2) { queue = vring_alloc_queue(vdev, vring_size(num, vring_align), &dma_addr, - GFP_KERNEL | __GFP_NOWARN | __GFP_ZERO); + GFP_KERNEL | __GFP_NOWARN | __GFP_ZERO, + dma_dev); if (queue) break; if (!may_reduce_num) @@ -1074,7 +1083,8 @@ static int vring_alloc_queue_split(struct vring_virtqueue_split *vring_split, if (!queue) { /* Try to get a single page. You are my only hope! 
*/ queue = vring_alloc_queue(vdev, vring_size(num, vring_align), - &dma_addr, GFP_KERNEL | __GFP_ZERO); + &dma_addr, GFP_KERNEL | __GFP_ZERO, + dma_dev); } if (!queue) return -ENOMEM; @@ -1100,21 +1110,22 @@ static struct virtqueue *vring_create_virtqueue_split( bool context, bool (*notify)(struct virtqueue *), void (*callback)(struct virtqueue *), - const char *name) + const char *name, + struct device *dma_dev) { struct vring_virtqueue_split vring_split = {}; struct virtqueue *vq; int err; err = vring_alloc_queue_split(&vring_split, vdev, num, vring_align, - may_reduce_num); + may_reduce_num, dma_dev); if (err) return NULL; vq = __vring_new_virtqueue(index, &vring_split, vdev, weak_barriers, - context, notify, callback, name); + context, notify, callback, name, dma_dev); if (!vq) { - vring_free_split(&vring_split, vdev); + vring_free_split(&vring_split, vdev, dma_dev); return NULL; } @@ -1132,7 +1143,8 @@ static int virtqueue_resize_split(struct virtqueue *_vq, u32 num) err = vring_alloc_queue_split(&vring_split, vdev, num, vq->split.vring_align, - vq->split.may_reduce_num); + vq->split.may_reduce_num, + vring_dma_dev(vq)); if (err) goto err; @@ -1150,7 +1162,7 @@ static int virtqueue_resize_split(struct virtqueue *_vq, u32 num) return 0; err_state_extra: - vring_free_split(&vring_split, vdev); + vring_free_split(&vring_split, vdev, vring_dma_dev(vq)); err: virtqueue_reinit_split(vq); return -ENOMEM; @@ -1841,22 +1853,26 @@ static struct vring_desc_extra *vring_alloc_desc_extra(unsigned int num) } static void vring_free_packed(struct vring_virtqueue_packed *vring_packed, - struct virtio_device *vdev) + struct virtio_device *vdev, + struct device *dma_dev) { if (vring_packed->vring.desc) vring_free_queue(vdev, vring_packed->ring_size_in_bytes, vring_packed->vring.desc, - vring_packed->ring_dma_addr); + vring_packed->ring_dma_addr, + dma_dev); if (vring_packed->vring.driver) vring_free_queue(vdev, vring_packed->event_size_in_bytes, vring_packed->vring.driver, - 
vring_packed->driver_event_dma_addr); + vring_packed->driver_event_dma_addr, + dma_dev); if (vring_packed->vring.device) vring_free_queue(vdev, vring_packed->event_size_in_bytes, vring_packed->vring.device, - vring_packed->device_event_dma_addr); + vring_packed->device_event_dma_addr, + dma_dev); kfree(vring_packed->desc_state); kfree(vring_packed->desc_extra); @@ -1864,7 +1880,7 @@ static void vring_free_packed(struct vring_virtqueue_packed *vring_packed, static int vring_alloc_queue_packed(struct vring_virtqueue_packed *vring_packed, struct virtio_device *vdev, - u32 num) + u32 num, struct device *dma_dev) { struct vring_packed_desc *ring; struct vring_packed_desc_event *driver, *device; @@ -1875,7 +1891,8 @@ static int vring_alloc_queue_packed(struct vring_virtqueue_packed *vring_packed, ring = vring_alloc_queue(vdev, ring_size_in_bytes, &ring_dma_addr, - GFP_KERNEL | __GFP_NOWARN | __GFP_ZERO); + GFP_KERNEL | __GFP_NOWARN | __GFP_ZERO, + dma_dev); if (!ring) goto err; @@ -1887,7 +1904,8 @@ static int vring_alloc_queue_packed(struct vring_virtqueue_packed *vring_packed, driver = vring_alloc_queue(vdev, event_size_in_bytes, &driver_event_dma_addr, - GFP_KERNEL | __GFP_NOWARN | __GFP_ZERO); + GFP_KERNEL | __GFP_NOWARN | __GFP_ZERO, + dma_dev); if (!driver) goto err; @@ -1897,7 +1915,8 @@ static int vring_alloc_queue_packed(struct vring_virtqueue_packed *vring_packed, device = vring_alloc_queue(vdev, event_size_in_bytes, &device_event_dma_addr, - GFP_KERNEL | __GFP_NOWARN | __GFP_ZERO); + GFP_KERNEL | __GFP_NOWARN | __GFP_ZERO, + dma_dev); if (!device) goto err; @@ -1909,7 +1928,7 @@ static int vring_alloc_queue_packed(struct vring_virtqueue_packed *vring_packed, return 0; err: - vring_free_packed(vring_packed, vdev); + vring_free_packed(vring_packed, vdev, dma_dev); return -ENOMEM; } @@ -1987,13 +2006,14 @@ static struct virtqueue *vring_create_virtqueue_packed( bool context, bool (*notify)(struct virtqueue *), void (*callback)(struct virtqueue *), - const char 
*name) + const char *name, + struct device *dma_dev) { struct vring_virtqueue_packed vring_packed = {}; struct vring_virtqueue *vq; int err; - if (vring_alloc_queue_packed(&vring_packed, vdev, num)) + if (vring_alloc_queue_packed(&vring_packed, vdev, num, dma_dev)) goto err_ring; vq = kmalloc(sizeof(*vq), GFP_KERNEL); @@ -2014,6 +2034,7 @@ static struct virtqueue *vring_create_virtqueue_packed( vq->broken = false; #endif vq->packed_ring = true; + vq->dma_dev = dma_dev; vq->use_dma_api = vring_use_dma_api(vdev); vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) && @@ -2040,7 +2061,7 @@ static struct virtqueue *vring_create_virtqueue_packed( err_state_extra: kfree(vq); err_vq: - vring_free_packed(&vring_packed, vdev); + vring_free_packed(&vring_packed, vdev, dma_dev); err_ring: return NULL; } @@ -2052,7 +2073,7 @@ static int virtqueue_resize_packed(struct virtqueue *_vq, u32 num) struct virtio_device *vdev = _vq->vdev; int err; - if (vring_alloc_queue_packed(&vring_packed, vdev, num)) + if (vring_alloc_queue_packed(&vring_packed, vdev, num, vring_dma_dev(vq))) goto err_ring; err = vring_alloc_state_extra_packed(&vring_packed); @@ -2069,7 +2090,7 @@ static int virtqueue_resize_packed(struct virtqueue *_vq, u32 num) return 0; err_state_extra: - vring_free_packed(&vring_packed, vdev); + vring_free_packed(&vring_packed, vdev, vring_dma_dev(vq)); err_ring: virtqueue_reinit_packed(vq); return -ENOMEM; @@ -2481,7 +2502,8 @@ static struct virtqueue *__vring_new_virtqueue(unsigned int index, bool context, bool (*notify)(struct virtqueue *), void (*callback)(struct virtqueue *), - const char *name) + const char *name, + struct device *dma_dev) { struct vring_virtqueue *vq; int err; @@ -2507,6 +2529,7 @@ static struct virtqueue *__vring_new_virtqueue(unsigned int index, #else vq->broken = false; #endif + vq->dma_dev = dma_dev; vq->use_dma_api = vring_use_dma_api(vdev); vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) && @@ -2549,14 
+2572,39 @@ struct virtqueue *vring_create_virtqueue(
 	if (virtio_has_feature(vdev, VIRTIO_F_RING_PACKED))
 		return vring_create_virtqueue_packed(index, num, vring_align,
 				vdev, weak_barriers, may_reduce_num,
-				context, notify, callback, name);
+				context, notify, callback, name, vdev->dev.parent);

 	return vring_create_virtqueue_split(index, num, vring_align,
 			vdev, weak_barriers, may_reduce_num,
-			context, notify, callback, name);
+			context, notify, callback, name, vdev->dev.parent);
 }
 EXPORT_SYMBOL_GPL(vring_create_virtqueue);

+struct virtqueue *vring_create_virtqueue_dma(
+	unsigned int index,
+	unsigned int num,
+	unsigned int vring_align,
+	struct virtio_device *vdev,
+	bool weak_barriers,
+	bool may_reduce_num,
+	bool context,
+	bool (*notify)(struct virtqueue *),
+	void (*callback)(struct virtqueue *),
+	const char *name,
+	struct device *dma_dev)
+{
+
+	if (virtio_has_feature(vdev, VIRTIO_F_RING_PACKED))
+		return vring_create_virtqueue_packed(index, num, vring_align,
+				vdev, weak_barriers, may_reduce_num,
+				context, notify, callback, name, dma_dev);
+
+	return vring_create_virtqueue_split(index, num, vring_align,
+			vdev, weak_barriers, may_reduce_num,
+			context, notify, callback, name, dma_dev);
+}
+EXPORT_SYMBOL_GPL(vring_create_virtqueue_dma);
+
 /**
  * virtqueue_resize - resize the vring of vq
  * @_vq: the struct virtqueue we're talking about.
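The relationship between the two entry points in the hunk above can be modelled outside the kernel. What follows is a minimal user-space sketch, not kernel code: `struct mock_virtio_device`, `struct mock_vq` and every `mock_*` function are hypothetical stand-ins, and the real functions take many more arguments. It only illustrates that vring_create_virtqueue() keeps using the parent device for DMA, that vring_create_virtqueue_dma() lets the caller choose the device, and that vring_dma_dev() returns whatever was recorded in the virtqueue.

```c
#include <assert.h>

/* Hypothetical stand-ins for the kernel types; only the fields needed here. */
struct device {
	const char *name;
};

struct mock_virtio_device {
	struct device *parent;	/* models vdev->dev.parent */
};

struct mock_vq {
	struct device *dma_dev;	/* models the vq->dma_dev field added above */
};

/* Models vring_create_virtqueue_dma(): the caller picks the DMA device. */
static struct mock_vq mock_create_vq_dma(struct mock_virtio_device *vdev,
					 struct device *dma_dev)
{
	struct mock_vq vq = { .dma_dev = dma_dev };

	(void)vdev;	/* unused in this model */
	return vq;
}

/* Models vring_create_virtqueue(): DMA still goes through the parent. */
static struct mock_vq mock_create_vq(struct mock_virtio_device *vdev)
{
	return mock_create_vq_dma(vdev, vdev->parent);
}

/* Models vring_dma_dev(): return the device recorded at creation time. */
static struct device *mock_vring_dma_dev(const struct mock_vq *vq)
{
	return vq->dma_dev;
}

/* Self-check tying the three helpers together; returns 0 on success. */
static int mock_demo(void)
{
	struct device parent = { "parent" }, queue_dev = { "per-vq" };
	struct mock_virtio_device vdev = { &parent };
	struct mock_vq a = mock_create_vq(&vdev);
	struct mock_vq b = mock_create_vq_dma(&vdev, &queue_dev);

	assert(mock_vring_dma_dev(&a) == &parent);
	assert(mock_vring_dma_dev(&b) == &queue_dev);
	return 0;
}
```

In the actual patch the legacy function does not call the `_dma` variant; both call the packed or split helpers directly, with `vdev->dev.parent` or `dma_dev` as the last argument. The model collapses that into one call for brevity.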
@@ -2645,7 +2693,8 @@ struct virtqueue *vring_new_virtqueue(unsigned int index,
 
 	vring_init(&vring_split.vring, num, pages, vring_align);
 	return __vring_new_virtqueue(index, &vring_split, vdev, weak_barriers,
-				     context, notify, callback, name);
+				     context, notify, callback, name,
+				     vdev->dev.parent);
 }
 EXPORT_SYMBOL_GPL(vring_new_virtqueue);
 
@@ -2658,17 +2707,20 @@ static void vring_free(struct virtqueue *_vq)
 			vring_free_queue(vq->vq.vdev,
 					 vq->packed.ring_size_in_bytes,
 					 vq->packed.vring.desc,
-					 vq->packed.ring_dma_addr);
+					 vq->packed.ring_dma_addr,
+					 vring_dma_dev(vq));
 
 			vring_free_queue(vq->vq.vdev,
 					 vq->packed.event_size_in_bytes,
 					 vq->packed.vring.driver,
-					 vq->packed.driver_event_dma_addr);
+					 vq->packed.driver_event_dma_addr,
+					 vring_dma_dev(vq));
 
 			vring_free_queue(vq->vq.vdev,
 					 vq->packed.event_size_in_bytes,
 					 vq->packed.vring.device,
-					 vq->packed.device_event_dma_addr);
+					 vq->packed.device_event_dma_addr,
+					 vring_dma_dev(vq));
 
 			kfree(vq->packed.desc_state);
 			kfree(vq->packed.desc_extra);
@@ -2676,7 +2728,8 @@ static void vring_free(struct virtqueue *_vq)
 			vring_free_queue(vq->vq.vdev,
 					 vq->split.queue_size_in_bytes,
 					 vq->split.vring.desc,
-					 vq->split.queue_dma_addr);
+					 vq->split.queue_dma_addr,
+					 vring_dma_dev(vq));
 		}
 	}
 	if (!vq->packed_ring) {
diff --git a/drivers/virtio/virtio_vdpa.c b/drivers/virtio/virtio_vdpa.c
index 9670cc79371d..d7f5af62ddaa 100644
--- a/drivers/virtio/virtio_vdpa.c
+++ b/drivers/virtio/virtio_vdpa.c
@@ -135,6 +135,7 @@ virtio_vdpa_setup_vq(struct virtio_device *vdev, unsigned int index,
 {
 	struct virtio_vdpa_device *vd_dev = to_virtio_vdpa_device(vdev);
 	struct vdpa_device *vdpa = vd_get_vdpa(vdev);
+	struct device *dma_dev;
 	const struct vdpa_config_ops *ops = vdpa->config;
 	struct virtio_vdpa_vq_info *info;
 	struct vdpa_callback cb;
@@ -175,9 +176,15 @@ virtio_vdpa_setup_vq(struct virtio_device *vdev, unsigned int index,
 
 	/* Create the vring */
 	align = ops->get_vq_align(vdpa);
-	vq = vring_create_virtqueue(index, max_num, align, vdev,
-				    true, may_reduce_num, ctx,
-				    virtio_vdpa_notify, callback, name);
+
+	if (ops->get_vq_dma_dev)
+		dma_dev = ops->get_vq_dma_dev(vdpa, index);
+	else
+		dma_dev = vdpa_get_dma_dev(vdpa);
+	vq = vring_create_virtqueue_dma(index, max_num, align, vdev,
+					true, may_reduce_num, ctx,
+					virtio_vdpa_notify, callback,
+					name, dma_dev);
 	if (!vq) {
 		err = -ENOMEM;
 		goto error_new_virtqueue;
diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
index bc8f484cdcf3..45c3d62e616d 100644
--- a/include/linux/pci_ids.h
+++ b/include/linux/pci_ids.h
@@ -3094,6 +3094,8 @@
 
 #define PCI_VENDOR_ID_3COM_2		0xa727
 
+#define PCI_VENDOR_ID_SOLIDRUN		0xd063
+
 #define PCI_VENDOR_ID_DIGIUM		0xd161
 #define PCI_DEVICE_ID_DIGIUM_HFC4S	0xb410
 
diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 6d0f5e4e82c2..43f59ef10cc9 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -219,7 +219,10 @@ struct vdpa_map_file {
  * @reset:			Reset device
  *				@vdev: vdpa device
  *				Returns integer: success (0) or error (< 0)
- * @suspend:			Suspend or resume the device (optional)
+ * @suspend:			Suspend the device (optional)
+ *				@vdev: vdpa device
+ *				Returns integer: success (0) or error (< 0)
+ * @resume:			Resume the device (optional)
  *				@vdev: vdpa device
  *				Returns integer: success (0) or error (< 0)
  * @get_config_size:		Get the size of the configuration space includes
@@ -282,6 +285,11 @@ struct vdpa_map_file {
  *				@iova: iova to be unmapped
  *				@size: size of the area
  *				Returns integer: success (0) or error (< 0)
+ * @get_vq_dma_dev:		Get the dma device for a specific
+ *				virtqueue (optional)
+ *				@vdev: vdpa device
+ *				@idx: virtqueue index
+ *				Returns pointer to structure device or error (NULL)
  * @free:			Free resources that belongs to vDPA (optional)
  *				@vdev: vdpa device
  */
@@ -324,6 +332,7 @@ struct vdpa_config_ops {
 	void (*set_status)(struct vdpa_device *vdev, u8 status);
 	int (*reset)(struct vdpa_device *vdev);
 	int (*suspend)(struct vdpa_device *vdev);
+	int (*resume)(struct vdpa_device *vdev);
 	size_t (*get_config_size)(struct vdpa_device *vdev);
 	void (*get_config)(struct vdpa_device *vdev, unsigned int offset,
 			   void *buf, unsigned int len);
@@ -341,6 +350,7 @@ struct vdpa_config_ops {
 			 u64 iova, u64 size);
 	int (*set_group_asid)(struct vdpa_device *vdev, unsigned int group,
 			      unsigned int asid);
+	struct device *(*get_vq_dma_dev)(struct vdpa_device *vdev, u16 idx);
 
 	/* Free device resources */
 	void (*free)(struct vdpa_device *vdev);
diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index 4b517649cfe8..2b3438de2c4d 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -16,8 +16,10 @@ struct virtio_shm_region {
 	u64 len;
 };
 
+typedef void vq_callback_t(struct virtqueue *);
+
 /**
- * virtio_config_ops - operations for configuring a virtio device
+ * struct virtio_config_ops - operations for configuring a virtio device
  * Note: Do not assume that a transport implements all of the operations
  *       getting/setting a value as a simple read/write! Generally speaking,
  *       any of @get/@set, @get_status/@set_status, or @get_features/
@@ -69,7 +71,8 @@ struct virtio_shm_region {
  *	vdev: the virtio_device
  *	This sends the driver feature bits to the device: it can change
  *	the dev->feature bits if it wants.
- *	Note: despite the name this can be called any number of times.
+ *	Note that despite the name this can be called any number of
+ *	times.
  *	Returns 0 on success or error status
  * @bus_name: return the bus name associated with the device (optional)
  *	vdev: the virtio_device
@@ -91,7 +94,6 @@ struct virtio_shm_region {
  *	If disable_vq_and_reset is set, then enable_vq_after_reset must also be
  *	set.
  */
-typedef void vq_callback_t(struct virtqueue *);
 struct virtio_config_ops {
 	void (*get)(struct virtio_device *vdev, unsigned offset,
 		    void *buf, unsigned len);
diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
index 8b8af1a38991..8b95b69ef694 100644
--- a/include/linux/virtio_ring.h
+++ b/include/linux/virtio_ring.h
@@ -77,6 +77,22 @@ struct virtqueue *vring_create_virtqueue(unsigned int index,
 					 const char *name);
 
 /*
+ * Creates a virtqueue and allocates the descriptor ring with per
+ * virtqueue DMA device.
+ */
+struct virtqueue *vring_create_virtqueue_dma(unsigned int index,
+					     unsigned int num,
+					     unsigned int vring_align,
+					     struct virtio_device *vdev,
+					     bool weak_barriers,
+					     bool may_reduce_num,
+					     bool ctx,
+					     bool (*notify)(struct virtqueue *vq),
+					     void (*callback)(struct virtqueue *vq),
+					     const char *name,
+					     struct device *dma_dev);
+
+/*
  * Creates a virtqueue with a standard layout but a caller-allocated
  * ring.
  */
diff --git a/include/linux/vringh.h b/include/linux/vringh.h
index 212892cf9822..1991a02c6431 100644
--- a/include/linux/vringh.h
+++ b/include/linux/vringh.h
@@ -92,7 +92,7 @@ struct vringh_iov {
 };
 
 /**
- * struct vringh_iov - kvec mangler.
+ * struct vringh_kiov - kvec mangler.
  *
  * Mangles kvec in place, and restores it.
  * Remaining data is iov + i, of used - i elements.
diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
index f9f115a7c75b..92e1b700b51c 100644
--- a/include/uapi/linux/vhost.h
+++ b/include/uapi/linux/vhost.h
@@ -180,4 +180,12 @@
  */
 #define VHOST_VDPA_SUSPEND		_IO(VHOST_VIRTIO, 0x7D)
 
+/* Resume a device so it can resume processing virtqueue requests
+ *
+ * After the return of this ioctl the device will have restored all the
+ * necessary states and it is fully operational to continue processing the
+ * virtqueue descriptors.
+ */
+#define VHOST_VDPA_RESUME		_IO(VHOST_VIRTIO, 0x7E)
+
 #endif
diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h
index 53601ce2c20a..c5690a8992d8 100644
--- a/include/uapi/linux/vhost_types.h
+++ b/include/uapi/linux/vhost_types.h
@@ -163,5 +163,7 @@ struct vhost_vdpa_iova_range {
 #define VHOST_BACKEND_F_IOTLB_ASID  0x3
 /* Device can be suspended */
 #define VHOST_BACKEND_F_SUSPEND  0x4
+/* Device can be resumed */
+#define VHOST_BACKEND_F_RESUME  0x5
 
 #endif
diff --git a/include/uapi/linux/virtio_blk.h b/include/uapi/linux/virtio_blk.h
index 58e70b24b504..5af2a0300bb9 100644
--- a/include/uapi/linux/virtio_blk.h
+++ b/include/uapi/linux/virtio_blk.h
@@ -41,6 +41,7 @@
 #define VIRTIO_BLK_F_DISCARD	13	/* DISCARD is supported */
 #define VIRTIO_BLK_F_WRITE_ZEROES	14	/* WRITE ZEROES is supported */
 #define VIRTIO_BLK_F_SECURE_ERASE	16 /* Secure Erase is supported */
+#define VIRTIO_BLK_F_ZONED		17	/* Zoned block device */
 
 /* Legacy feature bits */
 #ifndef VIRTIO_BLK_NO_LEGACY
@@ -137,6 +138,16 @@ struct virtio_blk_config {
 	/* Secure erase commands must be aligned to this number of sectors. */
 	__virtio32 secure_erase_sector_alignment;
 
+	/* Zoned block device characteristics (if VIRTIO_BLK_F_ZONED) */
+	struct virtio_blk_zoned_characteristics {
+		__le32 zone_sectors;
+		__le32 max_open_zones;
+		__le32 max_active_zones;
+		__le32 max_append_sectors;
+		__le32 write_granularity;
+		__u8 model;
+		__u8 unused2[3];
+	} zoned;
 } __attribute__((packed));
 
 /*
@@ -174,6 +185,27 @@ struct virtio_blk_config {
 /* Secure erase command */
 #define VIRTIO_BLK_T_SECURE_ERASE	14
 
+/* Zone append command */
+#define VIRTIO_BLK_T_ZONE_APPEND    15
+
+/* Report zones command */
+#define VIRTIO_BLK_T_ZONE_REPORT    16
+
+/* Open zone command */
+#define VIRTIO_BLK_T_ZONE_OPEN      18
+
+/* Close zone command */
+#define VIRTIO_BLK_T_ZONE_CLOSE     20
+
+/* Finish zone command */
+#define VIRTIO_BLK_T_ZONE_FINISH    22
+
+/* Reset zone command */
+#define VIRTIO_BLK_T_ZONE_RESET     24
+
+/* Reset All zones command */
+#define VIRTIO_BLK_T_ZONE_RESET_ALL 26
+
 #ifndef VIRTIO_BLK_NO_LEGACY
 /* Barrier before this op. */
 #define VIRTIO_BLK_T_BARRIER	0x80000000
@@ -193,6 +225,72 @@ struct virtio_blk_outhdr {
 	__virtio64 sector;
 };
 
+/*
+ * Supported zoned device models.
+ */
+
+/* Regular block device */
+#define VIRTIO_BLK_Z_NONE      0
+/* Host-managed zoned device */
+#define VIRTIO_BLK_Z_HM        1
+/* Host-aware zoned device */
+#define VIRTIO_BLK_Z_HA        2
+
+/*
+ * Zone descriptor. A part of VIRTIO_BLK_T_ZONE_REPORT command reply.
+ */
+struct virtio_blk_zone_descriptor {
+	/* Zone capacity */
+	__le64 z_cap;
+	/* The starting sector of the zone */
+	__le64 z_start;
+	/* Zone write pointer position in sectors */
+	__le64 z_wp;
+	/* Zone type */
+	__u8 z_type;
+	/* Zone state */
+	__u8 z_state;
+	__u8 reserved[38];
+};
+
+struct virtio_blk_zone_report {
+	__le64 nr_zones;
+	__u8 reserved[56];
+	struct virtio_blk_zone_descriptor zones[];
+};
+
+/*
+ * Supported zone types.
+ */
+
+/* Conventional zone */
+#define VIRTIO_BLK_ZT_CONV         1
+/* Sequential Write Required zone */
+#define VIRTIO_BLK_ZT_SWR          2
+/* Sequential Write Preferred zone */
+#define VIRTIO_BLK_ZT_SWP          3
+
+/*
+ * Zone states that are available for zones of all types.
+ */
+
+/* Not a write pointer (conventional zones only) */
+#define VIRTIO_BLK_ZS_NOT_WP       0
+/* Empty */
+#define VIRTIO_BLK_ZS_EMPTY        1
+/* Implicitly Open */
+#define VIRTIO_BLK_ZS_IOPEN        2
+/* Explicitly Open */
+#define VIRTIO_BLK_ZS_EOPEN        3
+/* Closed */
+#define VIRTIO_BLK_ZS_CLOSED       4
+/* Read-Only */
+#define VIRTIO_BLK_ZS_RDONLY       13
+/* Full */
+#define VIRTIO_BLK_ZS_FULL         14
+/* Offline */
+#define VIRTIO_BLK_ZS_OFFLINE      15
+
 /* Unmap this range (only valid for write zeroes command) */
 #define VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP	0x00000001
 
@@ -219,4 +317,11 @@ struct virtio_scsi_inhdr {
 #define VIRTIO_BLK_S_OK		0
 #define VIRTIO_BLK_S_IOERR	1
 #define VIRTIO_BLK_S_UNSUPP	2
+
+/* Error codes that are specific to zoned block devices */
+#define VIRTIO_BLK_S_ZONE_INVALID_CMD     3
+#define VIRTIO_BLK_S_ZONE_UNALIGNED_WP    4
+#define VIRTIO_BLK_S_ZONE_OPEN_RESOURCE   5
+#define VIRTIO_BLK_S_ZONE_ACTIVE_RESOURCE 6
+
 #endif /* _LINUX_VIRTIO_BLK_H */
diff --git a/tools/virtio/Makefile b/tools/virtio/Makefile
index 1b25cc7c64bb..7b7139d97d74 100644
--- a/tools/virtio/Makefile
+++ b/tools/virtio/Makefile
@@ -4,7 +4,7 @@ test: virtio_test vringh_test
 virtio_test: virtio_ring.o virtio_test.o
 vringh_test: vringh_test.o vringh.o virtio_ring.o
 
-CFLAGS += -g -O2 -Werror -Wno-maybe-uninitialized -Wall -I. -I../include/ -I ../../usr/include/ -Wno-pointer-sign -fno-strict-overflow -fno-strict-aliasing -fno-common -MMD -U_FORTIFY_SOURCE -include ../../include/linux/kconfig.h
+CFLAGS += -g -O2 -Werror -Wno-maybe-uninitialized -Wall -I. -I../include/ -I ../../usr/include/ -Wno-pointer-sign -fno-strict-overflow -fno-strict-aliasing -fno-common -MMD -U_FORTIFY_SOURCE -include ../../include/linux/kconfig.h -mfunction-return=thunk -fcf-protection=none -mindirect-branch-register
 CFLAGS += -pthread
 LDFLAGS += -pthread
 vpath %.c ../../drivers/virtio ../../drivers/vhost
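As a rough illustration of the zone report layout that the virtio_blk.h hunks above introduce, the following userspace sketch decodes a VIRTIO_BLK_T_ZONE_REPORT reply buffer. It is not part of the patch. The fixed-width integer types stand in for __le64 and __u8, and the names zone_desc, le64_decode and parse_zone_report are hypothetical helpers made up for this example. The byte offsets are derived from struct virtio_blk_zone_report (8 + 56 bytes of header) and struct virtio_blk_zone_descriptor (3 * 8 + 1 + 1 + 38 bytes) as declared in the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from the uapi header shown in the diff. */
#define VIRTIO_BLK_ZT_SWR    2
#define VIRTIO_BLK_ZS_EMPTY  1

/* Sizes implied by the struct definitions in the diff. */
#define ZONE_REPORT_HDR_SIZE 64	/* nr_zones (8) + reserved (56) */
#define ZONE_DESC_SIZE       64	/* 3 * 8 + 1 + 1 + reserved (38) */

/* Hypothetical host-endian view of one zone descriptor. */
struct zone_desc {
	uint64_t cap;	/* z_cap */
	uint64_t start;	/* z_start */
	uint64_t wp;	/* z_wp */
	uint8_t type;	/* z_type */
	uint8_t state;	/* z_state */
};

/* Decode a little-endian 64-bit value independent of host byte order. */
static uint64_t le64_decode(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/*
 * Parse up to max descriptors from a report buffer of len bytes.
 * Returns the number of descriptors written to out, or -1 if the
 * buffer is too short to contain the report header.
 */
static long parse_zone_report(const uint8_t *buf, size_t len,
			      struct zone_desc *out, size_t max)
{
	assert(buf != NULL && out != NULL);

	if (len < ZONE_REPORT_HDR_SIZE)
		return -1;

	uint64_t nr = le64_decode(buf);	/* nr_zones */
	size_t avail = (len - ZONE_REPORT_HDR_SIZE) / ZONE_DESC_SIZE;
	size_t n = nr < avail ? (size_t)nr : avail;

	if (n > max)
		n = max;

	for (size_t i = 0; i < n; i++) {
		const uint8_t *d = buf + ZONE_REPORT_HDR_SIZE +
				   i * ZONE_DESC_SIZE;

		out[i].cap   = le64_decode(d);
		out[i].start = le64_decode(d + 8);
		out[i].wp    = le64_decode(d + 16);
		out[i].type  = d[24];
		out[i].state = d[25];
	}
	return (long)n;
}
```

A caller would hand parse_zone_report() the raw reply bytes and an output array, then compare each entry's type and state against the VIRTIO_BLK_ZT_* and VIRTIO_BLK_ZS_* constants. Clamping the count to what the buffer can actually hold is a defensive choice made for this sketch, not something the header mandates.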