summaryrefslogtreecommitdiff
path: root/drivers/hv
AgeCommit message (Collapse)Author
2015-05-24Drivers: hv: vss: convert to hv_utils_transportVitaly Kuznetsov
Convert to hv_utils_transport to support both netlink and /dev/vmbus/hv_vss communication methods. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Tested-by: Alex Ng <alexng@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-24Drivers: hv: util: introduce hv_utils_transport abstractionVitaly Kuznetsov
The intention is to make KVP/VSS drivers work through misc char devices. Introduce an abstraction for kernel/userspace communication to make the migration smoother. Transport operational mode (netlink or char device) is determined by the first received message. To support driver upgrades the switch from netlink to chardev operational mode is supported. Every hv_util daemon is supposed to register 2 callbacks: 1) on_msg() to get notified when the userspace daemon sent a message; 2) on_reset() to get notified when the userspace daemon drops the connection. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Tested-by: Alex Ng <alexng@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-24Drivers: hv: fcopy: set .owner reference for file operationsVitaly Kuznetsov
Get an additional reference otherwise a crash is observed when hv_utils module is being unloaded while fcopy daemon is still running. .owner gives us an additional reference when someone holds a descriptor for the device. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Tested-by: Alex Ng <alexng@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-24Drivers: hv: fcopy: switch to using the hvutil_device_state state machineVitaly Kuznetsov
Switch to using the hvutil_device_state state machine from using 3 different state variables: fcopy_transaction.active, opened, and in_hand_shake. State transitions are: -> HVUTIL_DEVICE_INIT when driver loads or on device release -> HVUTIL_READY if the handshake was successful -> HVUTIL_HOSTMSG_RECEIVED when there is a non-negotiation message from the host -> HVUTIL_USERSPACE_REQ after userspace daemon read the message -> HVUTIL_USERSPACE_RECV after/if userspace has replied -> HVUTIL_READY after we respond to the host -> HVUTIL_DEVICE_DYING on driver unload In hv_fcopy_onchannelcallback() process ICMSGTYPE_NEGOTIATE messages even when the userspace daemon is disconnected, otherwise we can make the host think we don't support FCOPY and disable the service completely. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Tested-by: Alex Ng <alexng@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-24Drivers: hv: vss: switch to using the hvutil_device_state state machineVitaly Kuznetsov
Switch to using the hvutil_device_state state machine from using kvp_transaction.active. State transitions are: -> HVUTIL_DEVICE_INIT when driver loads or on device release -> HVUTIL_READY if the handshake was successful -> HVUTIL_HOSTMSG_RECEIVED when there is a non-negotiation message from the host -> HVUTIL_USERSPACE_REQ after we sent the message to the userspace daemon -> HVUTIL_USERSPACE_RECV after/if the userspace daemon has replied -> HVUTIL_READY after we respond to the host -> HVUTIL_DEVICE_DYING on driver unload In hv_vss_onchannelcallback() process ICMSGTYPE_NEGOTIATE messages even when the userspace daemon is disconnected, otherwise we can make the host think we don't support VSS and disable the service completely. Unfortunately there is no good way we can figure out that the userspace daemon has died (unless we start treating all timeouts as such), add a protection against processing new VSS_OP_REGISTER messages while being in the middle of a transaction (HVUTIL_USERSPACE_REQ or HVUTIL_USERSPACE_RECV state). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Tested-by: Alex Ng <alexng@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-24Drivers: hv: kvp: switch to using the hvutil_device_state state machineVitaly Kuznetsov
Switch to using the hvutil_device_state state machine from using 2 different state variables: kvp_transaction.active and in_hand_shake. State transitions are: -> HVUTIL_DEVICE_INIT when driver loads or on device release -> HVUTIL_READY if the handshake was successful -> HVUTIL_HOSTMSG_RECEIVED when there is a non-negotiation message from the host -> HVUTIL_USERSPACE_REQ after we sent the message to the userspace daemon -> HVUTIL_USERSPACE_RECV after/if the userspace daemon has replied -> HVUTIL_READY after we respond to the host -> HVUTIL_DEVICE_DYING on driver unload In hv_kvp_onchannelcallback() process ICMSGTYPE_NEGOTIATE messages even when the userspace daemon is disconnected, otherwise we can make the host think we don't support KVP and disable the service completely. Unfortunately there is no good way we can figure out that the userspace daemon has died (unless we start treating all timeouts as such). In case the daemon restarts we skip the negotiation procedure (so the daemon is supposed to has the same version). This behavior is unchanged from in_handshake approach. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Tested-by: Alex Ng <alexng@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-24Drivers: hv: util: introduce state machine for util driversVitaly Kuznetsov
KVP/VSS/FCOPY drivers work in fully serialized mode: we wait till userspace daemon registers, wait for a message from the host, send this message to the daemon, get the reply, send it back to host, wait for another message. Introduce enum hvutil_device_state to represend this state in all 3 drivers. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Tested-by: Alex Ng <alexng@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-24Drivers: hv: fcopy: rename fcopy_work -> fcopy_timeout_workVitaly Kuznetsov
'fcopy_work' (and fcopy_work_func) is a misnomer as it sounds like we expect this useful work to happen and in reality it is just an emergency escape when timeout happens. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Tested-by: Alex Ng <alexng@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-24Drivers: hv: kvp: rename kvp_work -> kvp_timeout_workVitaly Kuznetsov
'kvp_work' (and kvp_work_func) is a misnomer as it sounds like we expect this useful work to happen and in reality it is just an emergency escape when timeout happens. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Tested-by: Alex Ng <alexng@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-24Drivers: hv: vss: process deferred messages when we complete the transactionVitaly Kuznetsov
In theory, the host is not supposed to issue any requests before be reply to the previous one. In KVP we, however, support the following scenarios: 1) A message was received before userspace daemon registered; 2) A message was received while the previous one is still being processed. In VSS we support only the former. Add support for the later, use hv_poll_channel() to do the job. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Tested-by: Alex Ng <alexng@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-24Drivers: hv: fcopy: process deferred messages when we complete the transactionVitaly Kuznetsov
In theory, the host is not supposed to issue any requests before be reply to the previous one. In KVP we, however, support the following scenarios: 1) A message was received before userspace daemon registered; 2) A message was received while the previous one is still being processed. In FCOPY we support only the former. Add support for the later, use hv_poll_channel() to do the job. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Tested-by: Alex Ng <alexng@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-24Drivers: hv: kvp: move poll_channel() to hyperv_vmbus.hVitaly Kuznetsov
Move poll_channel() to hyperv_vmbus.h and make it inline and rename it to hv_poll_channel() so it can be reused in other hv_util modules. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Tested-by: Alex Ng <alexng@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-24Drivers: hv: kvp: reset kvp_contextVitaly Kuznetsov
We set kvp_context when we want to postpone receiving a packet from vmbus due to the previous transaction being unfinished. We, however, never reset this state, all consequent kvp_respond_to_host() calls will result in poll_channel() calling hv_kvp_onchannelcallback(). This doesn't cause real issues as: 1) Host is supposed to serialize transactions as well 2) If no message is pending vmbus_recvpacket() will return 0 recvlen. This is just a cleanup. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Tested-by: Alex Ng <alexng@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-24Drivers: hv: util: move kvp/vss function declarations to hyperv_vmbus.hVitaly Kuznetsov
These declarations are internal to hv_util module and hv_fcopy_* declarations already reside there. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Tested-by: Alex Ng <alexng@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-03Drivers: hv: hv_balloon: correctly handle num_pages>INT_MAX caseVitaly Kuznetsov
balloon_wrk.num_pages is __u32 and it comes from host in struct dm_balloon where it is also __u32. We, however, use 'int' in balloon_up() and in case we happen to receive num_pages>INT_MAX request we'll end up allocating zero pages as 'num_pages < alloc_unit' check in alloc_balloon_pages() will pass. Change num_pages type to unsigned int. In real life ballooning request come with num_pages in [512, 32768] range so this is more a future-proof/cleanup. Reported-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-03Drivers: hv: hv_balloon: correctly handle val.freeram<num_pages caseVitaly Kuznetsov
'Drivers: hv: hv_balloon: refuse to balloon below the floor' fix does not correctly handle the case when val.freeram < num_pages as val.freeram is __kernel_ulong_t and the 'val.freeram - num_pages' value will be a huge positive value instead of being negative. Usually host doesn't ask us to balloon more than val.freeram but in case he have a memory hog started after we post the last pressure report we can get into troubles. Suggested-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-03hv_vmbus: Add gradually increased delay for retries in vmbus_post_msg()Haiyang Zhang
Most of the retries can be done within a millisecond successfully, so we sleep 1ms before the first retry, then gradually increase the retry interval to 2^n with max value of 2048ms. Doing so, we will have shorter overall delay time, because most of the cases succeed within 1-2 attempts. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-03Drivers: hv: hv_balloon: survive ballooning request with num_pages=0Vitaly Kuznetsov
... and simplify alloc_balloon_pages() interface by removing redundant alloc_error from it. If we happen to enter balloon_up() with balloon_wrk.num_pages = 0 we will enter infinite 'while (!done)' loop as alloc_balloon_pages() will be always returning 0 and not setting alloc_error. We will also be sending a meaningless message to the host on every iteration. The 'alloc_unit == 1 && alloc_error -> num_ballooned == 0' change and alloc_error elimination requires a special comment. We do alloc_balloon_pages() with 2 different alloc_unit values and there are 4 different alloc_balloon_pages() results, let's check them all. alloc_unit = 512: 1) num_ballooned = 0, alloc_error = 0: we do 'alloc_unit=1' and retry pre- and post-patch. 2) num_ballooned > 0, alloc_error = 0: we check 'num_ballooned == num_pages' and act accordingly, pre- and post-patch. 3) num_ballooned > 0, alloc_error > 0: we report this chunk and remain within the loop, no changes here. 4) num_ballooned = 0, alloc_error > 0: we do 'alloc_unit=1' and retry pre- and post-patch. alloc_unit = 1: 1) num_ballooned = 0, alloc_error = 0: this can happen in two cases: when we passed 'num_pages=0' to alloc_balloon_pages() or when there was no space in bl_resp to place a single response. The second option is not possible as bl_resp is of PAGE_SIZE size and single response 'union dm_mem_page_range' is 8 bytes, but the first one is (in theory, I think that Hyper-V host never places such requests). Pre-patch code loops forever, post-patch code sends a reply with more_pages = 0 and finishes. 2) num_ballooned > 0, alloc_error = 0: we ran out of space in bl_resp, we report partial success and remain within the loop, no changes pre- and post-patch. 3) num_ballooned > 0, alloc_error > 0: pre-patch code finishes, post-patch code does one more try and if there is no progress (we finish with 'num_ballooned = 0') we finish. So we try a bit harder with this patch. 4) num_ballooned = 0, alloc_error > 0: both pre- and post-patch code enter 'more_pages = 0' branch and finish. So this patch has two real effects: 1) We reply with an empty response to 'num_pages=0' request. 2) We try a bit harder on alloc_unit=1 allocations (and reply with an empty tail reply in case we fail). An empty reply should be supported by host as we were able to send it even with pre-patch code when we were not able to allocate a single page. Suggested-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-03Drivers: hv: hv_balloon: eliminate jumps in piecewiese linear floor functionVitaly Kuznetsov
Commit 79208c57da53 ("Drivers: hv: hv_balloon: Make adjustments in computing the floor") was inacurate as it introduced a jump in our piecewiese linear 'floor' function: At 2048MB we have: Left limit: 104 + 2048/8 = 360 Right limit: 256 + 2048/16 = 384 (so the right value is 232) We now have to make an adjustment at 8192 boundary: 232 + 8192/16 = 744 512 + 8192/32 = 768 (so the right value is 488) Suggested-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-03Drivers: hv: hv_balloon: do not online pages in offline blocksVitaly Kuznetsov
Currently we add memory in 128Mb blocks but the request from host can be aligned differently. In such case we add a partially backed block and when this block goes online we skip onlining pages which are not backed (hv_online_page() callback serves this purpose). When we receive next request for the same host add region we online pages which were not backed before with hv_bring_pgs_online(). However, we don't check if the the block in question was onlined and online this tail unconditionally. This is bad as we avoid all online_pages() logic: these pages are not accounted, we don't send notifications (and hv_balloon is not the only receiver of them),... And, first of all, nobody asked as to online these pages. Solve the issue by checking if the last previously backed page was onlined and onlining the tail only in case it was. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-03hv: remove the per-channel workqueueDexuan Cui
It's not necessary any longer, since we can safely run the blocking message handlers in vmbus_connection.work_queue now. Signed-off-by: Dexuan Cui <decui@microsoft.com> Cc: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-03hv: don't schedule new works in vmbus_onoffer()/vmbus_onoffer_rescind()Dexuan Cui
Since the 2 fucntions can safely run in vmbus_connection.work_queue without hang, we don't need to schedule new work items into the per-channel workqueue. Actally we can even remove the per-channel workqueue now -- we'll do it in the next patch. Signed-off-by: Dexuan Cui <decui@microsoft.com> Cc: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-04-03hv: run non-blocking message handlers in the dispatch taskletDexuan Cui
A work item in vmbus_connection.work_queue can sleep, waiting for a new host message (usually it is some kind of "completion" message). Currently the new message will be handled in the same workqueue, but since work items in the workqueue is serialized, we actually have no chance to handle the new message if the current work item is sleeping -- as as result, the current work item will hang forever. K. Y. has posted the below fix to resolve the issue: Drivers: hv: vmbus: Perform device register in the per-channel work element Actually we can simplify the fix by directly running non-blocking message handlers in the dispatch tasklet (inspired by K. Y.). This patch is the fundamental change. The following 2 patches will simplify the message offering and rescind-offering handling a lot. Signed-off-by: Dexuan Cui <decui@microsoft.com> Cc: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-26Drivers: hv: vmbus: Don't wait after requesting offersK. Y. Srinivasan
Don't wait after sending request for offers to the host. This wait is unnecessary and simply adds 5 seconds to the boot time. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Cc: <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-25Drivers: hv: vmbus: Fix a siganlling host signalling issueK. Y. Srinivasan
Handle the case when the write to the ringbuffer fails. In this case, unconditionally signal the host. Since we may have deferred signalling the host based on the kick_q parameter, signalling the host unconditionally in this case deals with the issue. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-25Drivers: hv: vmbus: Export the vmbus_sendpacket_pagebuffer_ctl()K. Y. Srinivasan
Export the vmbus_sendpacket_pagebuffer_ctl() interface. This export will be used by the netvsc driver. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-25Drivers: hv: vmbus: Fix a bug in rescind processing in vmbus_close_internal()K. Y. Srinivasan
When a channel has been rescinded, the close operation is a noop. Restructure the code so we deal with the rescind condition after we properly cleanup the channel. I would like to thank Dexuan Cui <decui@microsoft.com> for observing this problem. The current code leaks memory when the channel is rescinded. The current char-next branch is broken and this patch fixes the bug. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-25hv: vmbus: missing curly braces in vmbus_process_offer()Dan Carpenter
The indenting makes it clear that there were curly braces intended here. Fixes: 2dd37cb81580 ('Drivers: hv: vmbus: Handle both rescind and offer messages in the same context') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-25Drivers: hv: vmbus: Correcting truncation error for constant ↵Nick Meier
HV_CRASH_CTL_CRASH_NOTIFY HV_CRASH_CTL_CRASH_NOTIFY is a 64 bit number. Depending on the usage context, the value may be truncated. This patch is in response from the following email from Wu Fengguang <fengguang.wu@intel.com>: From: Wu Fengguang <fengguang.wu@intel.com> Subject: [char-misc:char-misc-testing 25/45] drivers/hv/vmbus_drv.c:67:9: sparse: constant 0x8000000000000000 is so big it is unsigned long tree: git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git char-misc-testing head: b3de8e3719e582f3182bb504295e4a8e43c8c96f commit: 96c1d0581d00f7abe033350edb021a9d947d8d81 [25/45] Drivers: hv: vmbus: Add support for VMBus panic notifier handler reproduce: # apt-get install sparse git checkout 96c1d0581d00f7abe033350edb021a9d947d8d81 make ARCH=x86_64 allmodconfig make C=1 CF=-D__CHECK_ENDIAN__ sparse warnings: (new ones prefixed by >>) drivers/hv/vmbus_drv.c:67:9: sparse: constant 0x8000000000000000 is so big it is unsigned long ... Signed-off-by: Nick Meier <nmeier@microsoft.com> Reported-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-25Drivers: hv: hv_balloon: don't lose memory when onlining order is not naturalVitaly Kuznetsov
Memory blocks can be onlined in random order. When this order is not natural some memory pages are not onlined because of the redundant check in hv_online_page(). Here is a real world scenario: 1) Host tries to hot-add the following (process_hot_add): pg_start=rg_start=0x48000, pfn_cnt=111616, rg_size=262144 2) This results in adding 4 memory blocks: [ 109.057866] init_memory_mapping: [mem 0x48000000-0x4fffffff] [ 114.102698] init_memory_mapping: [mem 0x50000000-0x57ffffff] [ 119.168039] init_memory_mapping: [mem 0x58000000-0x5fffffff] [ 124.233053] init_memory_mapping: [mem 0x60000000-0x67ffffff] The last one is incomplete but we have special has->covered_end_pfn counter to avoid onlining non-backed frames and hv_bring_pgs_online() function to bring them online later on. 3) Now we have 4 offline memory blocks: /sys/devices/system/memory/memory9-12 $ for f in /sys/devices/system/memory/memory*/state; do echo $f `cat $f`; done | grep -v onlin /sys/devices/system/memory/memory10/state offline /sys/devices/system/memory/memory11/state offline /sys/devices/system/memory/memory12/state offline /sys/devices/system/memory/memory9/state offline 4) We bring them online in non-natural order: $grep MemTotal /proc/meminfo MemTotal: 966348 kB $echo online > /sys/devices/system/memory/memory12/state && grep MemTotal /proc/meminfo MemTotal: 1019596 kB $echo online > /sys/devices/system/memory/memory11/state && grep MemTotal /proc/meminfo MemTotal: 1150668 kB $echo online > /sys/devices/system/memory/memory9/state && grep MemTotal /proc/meminfo MemTotal: 1150668 kB As you can see memory9 block gives us zero additional memory. We can also observe a huge discrepancy between host- and guest-reported memory sizes. The root cause of the issue is the redundant pg >= covered_start_pfn check (and covered_start_pfn advancing) in hv_online_page(). When upper memory block in being onlined before the lower one (memory12 and memory11 in the above case) we advance the covered_start_pfn pointer and all memory9 pages do not pass the check. If the assumption that host always gives us requests in sequential order and pg_start always equals rg_start when the first request for the new HA region is received (that's the case in my testing) is correct than we can get rid of covered_start_pfn and pg >= start_pfn check in hv_online_page() is sufficient. The current char-next branch is broken and this patch fixes the bug. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-25Drivers: hv: hv_balloon: keep locks balanced on add_memory() failureVitaly Kuznetsov
When add_memory() fails the following BUG is observed: [ 743.646107] hv_balloon: hot_add memory failed error is -17 [ 743.679973] [ 743.680930] ===================================== [ 743.680930] [ BUG: bad unlock balance detected! ] [ 743.680930] 3.19.0-rc5_bug1131426+ #552 Not tainted [ 743.680930] ------------------------------------- [ 743.680930] kworker/0:2/255 is trying to release lock (&dm_device.ha_region_mutex) at: [ 743.680930] [<ffffffff81aae5fe>] mutex_unlock+0xe/0x10 [ 743.680930] but there are no more locks to release! This happens as we don't acquire ha_region_mutex and hot_add_req() expects us to as it does unconditional mutex_unlock(). Acquire the lock on the error path. The current char-next branch is broken and this patch fixes the bug. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-25Drivers: hv: vmbus: Perform device register in the per-channel work elementK. Y. Srinivasan
This patch is a continuation of the rescind handling cleanup work. We cannot block in the global message handling work context especially if we are blocking waiting for the host to wake us up. I would like to thank Dexuan Cui <decui@microsoft.com> for observing this problem. The current char-next branch is broken and this patch fixes the bug. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-01mei: bus: () can be statickbuild test robot
drivers/hv/vmbus_drv.c:51:5: sparse: symbol 'hyperv_panic_event' was not declared. Should it be static? drivers/hv/vmbus_drv.c:51:5: sparse: symbol 'hyperv_panic_event' was not declared. Should it be static? drivers/hv/vmbus_drv.c:51:5: sparse: symbol 'hyperv_panic_event' was not declared. Should it be static? drivers/hv/vmbus_drv.c:51:5: sparse: symbol 'hyperv_panic_event' was not declared. Should it be static? Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-01Drivers: hv: vmbus: Suport an API to send packet with additional controlK. Y. Srinivasan
Implement an API that gives additional control on the what VMBUS flags will be set as well as if the host needs to be signalled. This API will be useful for clients that want to batch up requests to the host. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-01Drivers: hv: vmbus: Suport an API to send pagebuffers with additional controlK. Y. Srinivasan
Implement an API for sending pagebuffers that gives more control to the client in terms of setting the vmbus flags as well as deciding when to notify the host. This will be useful for enabling batch processing. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-01Drivers: hv: vmbus: Use a round-robin algorithm for picking the outgoing channelK. Y. Srinivasan
The current algorithm for picking an outgoing channel was not distributing the load well. Implement a simple round-robin scheme to ensure good distribution of the outgoing traffic. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-01Drivers: hv: vmbus: Add support for VMBus panic notifier handlerNick Meier
Hyper-V allows a guest to notify the Hyper-V host that a panic condition occured. This notification can include up to five 64 bit values. These 64 bit values are written into crash MSRs. Once the data has been written into the crash MSRs, the host is then notified by writing into a Crash Control MSR. On the Hyper-V host, the panic notification data is captured in the Windows Event log as a 18590 event. Crash MSRs are defined in appendix H of the Hypervisor Top Level Functional Specification. At the time of this patch, v4.0 is the current functional spec. The URL for the v4.0 document is: http://download.microsoft.com/download/A/B/4/AB43A34E-BDD0-4FA6-BDEF-79EEF16E880B/Hypervisor Top Level Functional Specification v4.0.docx Signed-off-by: Nick Meier <nmeier@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-01Drivers: hv: hv_balloon: refuse to balloon below the floorVitaly Kuznetsov
When host asks us to balloon up we need to be sure we're not committing suicide by overballooning. Use already existent 'floor' metric as our lowest possible value for free ram. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-01Drivers: hv: hv_balloon: report offline pages as being usedVitaly Kuznetsov
When hot-added memory pages are not brought online or when some memory blocks are sent offline the subsequent ballooning process kills the guest with OOM killer. This happens as we don't report these pages as neither used nor free and apparently host algorithm considers them as being unused. Keep track of all online/offline operations and report all currently offline pages as being used so host won't try to balloon them out. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-01Drivers: hv: hv_balloon: eliminate the trylock path in ↵Vitaly Kuznetsov
acquire/release_region_mutex When many memory regions are being added and automatically onlined the following lockup is sometimes observed: INFO: task udevd:1872 blocked for more than 120 seconds. ... Call Trace: [<ffffffff816ec0bc>] schedule_timeout+0x22c/0x350 [<ffffffff816eb98f>] wait_for_common+0x10f/0x160 [<ffffffff81067650>] ? default_wake_function+0x0/0x20 [<ffffffff816eb9fd>] wait_for_completion+0x1d/0x20 [<ffffffff8144cb9c>] hv_memory_notifier+0xdc/0x120 [<ffffffff816f298c>] notifier_call_chain+0x4c/0x70 ... When several memory blocks are going online simultaneously we got several hv_memory_notifier() trying to acquire the ha_region_mutex. When this mutex is being held by hot_add_req() all these competing acquire_region_mutex() do mutex_trylock, fail, and queue themselves into wait_for_completion(..). However when we do complete() from release_region_mutex() only one of them wakes up. This could be solved by changing complete() -> complete_all() memory onlining can be delayed as well, in that case we can still get several hv_memory_notifier() runners at the same time trying to grab the mutex. Only one of them will succeed and the others will hang for forever as complete() is not being called. We don't see this issue often because we have 5sec onlining timeout in hv_mem_hot_add() and usually all udev events arrive in this time frame. Get rid of the trylock path, waiting on the mutex is supposed to provide the required serialization. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-01Drivers: hv: vmbus: Get rid of some unnecessary messagesK. Y. Srinivasan
Currently we log messages when either we are not able to map an ID to a channel or when the channel does not have a callback associated (in the channel interrupt handling path). These messages don't add any value, get rid of them. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-01Drivers: hv: util: On device remove, close the channel after de-initializing ↵K. Y. Srinivasan
the service When the offer is rescinded, vmbus_close() can free up the channel; deinitialize the service before closing the channel. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-01Drivers: hv: vmbus: Remove the channel from the channel list(s) on failureK. Y. Srinivasan
Properly rollback state in vmbus_pocess_offer() in the failure paths. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-01Drivers: hv: vmbus: Handle both rescind and offer messages in the same contextK. Y. Srinivasan
Execute both ressind and offer messages in the same work context. This serializes these operations and naturally addresses the various corner cases. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-01Drivers: hv: vmbus: Introduce a function to remove a rescinded offerK. Y. Srinivasan
In response to a rescind message, we need to remove the channel and the corresponding device. Cleanup this code path by factoring out the code to remove a channel. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-01Drivers: hv: vmbus: Properly handle child device removeK. Y. Srinivasan
Handle the case when the device may be removed when the device has no driver attached to it. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-01Drivers: hv: vmbus: Add support for the NetworkDirect GUIDK. Y. Srinivasan
NetworkDirect is a service that supports guest RDMA. Define the GUID for this service. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-01Drivers: hv: vmbus: Fix a bug in the error path in vmbus_open()K. Y. Srinivasan
Correctly rollback state if the failure occurs after we have handed over the ownership of the buffer to the host. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-01hv: hv_balloon: match var type to return type of wait_for_completionNicholas Mc Guire
return type of wait_for_completion_timeout is unsigned long not int, this patch changes the type of t from int to unsigned long. Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-01hv: channel_mgmt: match var type to return type of wait_for_completionNicholas Mc Guire
return type of wait_for_completion_timeout is unsigned long not int, this patch changes the type of t from int to unsigned long. Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>