summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2015-05-31ebpf: allow bpf_ktime_get_ns_proto also for networkingDaniel Borkmann
As this is already exported from tracing side via commit d9847d310ab4 ("tracing: Allow BPF programs to call bpf_ktime_get_ns()"), we might as well want to move it to the core, so also networking users can make use of it, e.g. to measure diffs for certain flows from ingress/egress. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-31bpf: add missing rcu protection when releasing programs from prog_arrayAlexei Starovoitov
Normally the program attachment place (like sockets, qdiscs) takes care of rcu protection and calls bpf_prog_put() after a grace period. The programs stored inside prog_array may not be attached anywhere, so prog_array needs to take care of preserving rcu protection. Otherwise bpf_tail_call() will race with bpf_prog_put(). To solve that introduce bpf_prog_put_rcu() helper function and use it in 3 places where unattached program can decrement refcnt: closing program fd, deleting/replacing program in prog_array. Fixes: 04fd61ab36ec ("bpf: allow bpf programs to tail-call other bpf programs") Reported-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30net/mlx4: Add EQ poolMatan Barak
Previously, mlx4_en allocated EQs and used them exclusively. This affected RoCE performance, as applications which are events sensitive were limited to use only the legacy EQs. Change that by introducing an EQ pool. This pool is managed by mlx4_core. EQs are assigned to ports (when there are limited number of EQs, multiple ports could be assigned to the same EQs). An exception to this rule is the ASYNC EQ which handles various events. Legacy EQs are completely removed as all EQs could be shared. When a consumer (mlx4_ib/mlx4_en) requests an EQ, it asks for EQ serving on a specific port. The core driver calculates which EQ should be assigned to that request. Because IRQs are shared between IB and Ethernet modules, their names only include the PCI device BDF address. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Ido Shamay <idos@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30net/mlx5: Extend mlx5_core to support ConnectX-4 Ethernet functionalityAmir Vadai
This is the Ethernet part of the driver for the Mellanox ConnectX(R)-4 Single/Dual-Port Adapter supporting 100Gb/s with VPI. The driver extends the existing mlx5 driver with Ethernet functionality. This patch contains the driver entry points but does not include transmit and receive (see the previous patch in the series) routines. It also adds the option MLX5_CORE_EN to Kconfig to enable/disable the Ethernet functionality. Currently, Kconfig is programmed to make Ethernet and Infiniband functionality mutally exclusive. Also changed MLX5_INFINIBAND to be depandant on MLX5_CORE instead of selecting it, since MLX5_CORE could be selected without MLX5_INFINIBAND being selected. Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30net/mlx5: Ethernet resource handling filesAmir Vadai
This patch contains the resource handling files: - flow_table.c: This file contains the code to handle the low level API to configure hardware flow table. It is separated from the flow_table_en.c, because it will be used in the future by Raw Ethernet QP in mlx5_ib too. - en_flow_table.[ch]: Ethernet flow steering handling. The flow table object contain a mapping between flow specs and TIRs. This mechanism will be used also to configure e-switch in the future, when SR-IOV support will be added. - transobj.[ch] - Low level functions to create/modify/destroy the transport objects: RQ/SQ/TIR/TIS - vport.[ch] - Handle attributes of a virtual port (vPort) in the embedded switch. Currently this switch is a passthrough, until SR-IOV support will be added. Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30net/mlx5_core: Set/Query port MTU commandsSaeed Mahameed
Introduce set/Query low level functions to access MTU in hardware. To be used by the netdev. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30net/mlx5_core: Modify CQ moderation parametersRana Shahout
Introduce mlx5_core_modify_cq_moderation() to be used by the netdev, to set hardware coalescing. Signed-off-by: Rana Shahout <ranas@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30net/mlx5_core: Implement get/set port statusRana Shahout
Implemet get/set port status low level functions to be exposed by the netdev. Signed-off-by: Rana Shahout <ranas@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30net/mlx5_core: Implement access functions of ptys register fieldsSaeed Mahameed
Those registers will be used by the ethtool to set/get settings. Signed-off-by: Rana Shahout <ranas@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30net/mlx5_core: New device capabilities handlingSaeed Mahameed
- Query all supported types of dev caps on driver load. - Store the Cap data outbox per cap type into driver private data. - Introduce new Macros to access/dump stored caps (using the auto generated data types). - Obsolete SW representation of dev caps (no need for SW copy for each cap). - Modify IB driver to use new macros for checking caps. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30net/mlx5_core: HW data structs/types definitions cleanupSaeed Mahameed
mlx5_ifc.h was heavily modified here since it is now generated by a script from the device specification (PRM rev 0.25). This specification is backward compatible to existing hardware. Some structures/fields were added here in order to enable the Ethernet functionality of the driver. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30net/mlx5_core: Set irq affinity hintsSaeed Mahameed
Preparation for upcoming ethernet driver. - Move msix array from eq_table struct to priv since its not related to eq_table - Intorduce irq_info struct to hold all irq information - Move name from mlx5_eq to irq_info struct since it is irq property. - Set IRQ affinity hints Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Rana Shahout <ranas@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30net/mlx5_core,mlx5_ib: Do not use vmap() on coherent memoryAmir Vadai
As David Daney pointed in mlx4_core driver [1], mlx5_core is also misusing the DMA-API. This patch is removing the code that vmap() memory allocated by dma_alloc_coherent(). After this patch, users of this drivers might fail allocating resources on memory fragmeneted systems. This will be fixed later on. [1] - https://patchwork.ozlabs.org/patch/458531/ CC: David Daney <david.daney@cavium.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30if_vlan: fix vlaue -> value typoVivien Didelot
Fixes "vlaue" for "value" in include/linux/if_vlan.h. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-30stmmac: add phy-handle support to the platform layerMathieu Olivari
On stmmac driver, PHY specification in device-tree was done using the non-standard property "snps,phy-addr". Specifying a PHY on a different MDIO bus that the one within the stmmac controller doesn't seem to be possible when device-tree is used. This change adds support for the phy-handle property, as specified in Documentation/devicetree/bindings/net/ethernet.txt. Signed-off-by: Mathieu Olivari <mathieu@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-29Merge tag 'xfs-for-linus-4.1-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs Pull xfs fixes from Dave Chinner: "This is a little larger than I'd like late in the release cycle, but all the fixes are for regressions introduced in the 4.1-rc1 merge, or are needed back in -stable kernels fairly quickly as they are filesystem corruption or userspace visible correctness issues. Changes in this update: - regression fix for new rename whiteout code - regression fixes for new superblock generic per-cpu counter code - fix for incorrect error return sign introduced in 3.17 - metadata corruption fixes that need to go back to -stable kernels" * tag 'xfs-for-linus-4.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: xfs: fix broken i_nlink accounting for whiteout tmpfile inode xfs: xfs_iozero can return positive errno xfs: xfs_attr_inactive leaves inconsistent attr fork state behind xfs: extent size hints can round up extents past MAXEXTLEN xfs: inode and free block counters need to use __percpu_counter_compare percpu_counter: batch size aware __percpu_counter_compare() xfs: use percpu_counter_read_positive for mp->m_icount
2015-05-29Merge tag 'fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull fixes for cpumask and modules from Rusty Russell: "** NOW WITH TESTING! ** Two fixes which got lost in my recent distraction. One is a weird cpumask function which needed to be rewritten, the other is a module bug which is cc:stable" * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: cpumask_set_cpu_local_first => cpumask_local_spread, lament module: Call module notifier on failure after complete_formation()
2015-05-29percpu_counter: batch size aware __percpu_counter_compare()Dave Chinner
XFS uses non-stanard batch sizes for avoiding frequent global counter updates on it's allocated inode counters, as they increment or decrement in batches of 64 inodes. Hence the standard percpu counter batch of 32 means that the counter is effectively a global counter. Currently Xfs uses a batch size of 128 so that it doesn't take the global lock on every single modification. However, Xfs also needs to compare accurately against zero, which means we need to use percpu_counter_compare(), and that has a hard-coded batch size of 32, and hence will spuriously fail to detect when it is supposed to use precise comparisons and hence the accounting goes wrong. Add __percpu_counter_compare() to take a custom batch size so we can use it sanely in XFS and factor percpu_counter_compare() to use it. Signed-off-by: Dave Chinner <dchinner@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-05-28block: discard bdi_unregister() in favour of bdi_destroy()NeilBrown
bdi_unregister() now contains very little functionality. It contains a "WARN_ON" if bdi->dev is NULL. This warning is of no real consequence as bdi->dev isn't needed by anything else in the function, and it triggers if blk_cleanup_queue() -> bdi_destroy() is called before bdi_unregister, which happens since Commit: 6cd18e711dd8 ("block: destroy bdi before blockdev is unregistered.") So this isn't wanted. It also calls bdi_set_min_ratio(). This needs to be called after writes through the bdi have all been flushed, and before the bdi is destroyed. Calling it early is better than calling it late as it frees up a global resource. Calling it immediately after bdi_wb_shutdown() in bdi_destroy() perfectly fits these requirements. So bdi_unregister() can be discarded with the important content moved to bdi_destroy(), as can the writeback_bdi_unregister event which is already not used. Reported-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org (v4.0) Fixes: c4db59d31e39 ("fs: don't reassign dirty inodes to default_backing_dev_info") Fixes: 6cd18e711dd8 ("block: destroy bdi before blockdev is unregistered.") Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Nicholas Moulin <nicholas.w.moulin@linux.intel.com> Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2015-05-28cpumask_set_cpu_local_first => cpumask_local_spread, lamentRusty Russell
da91309e0a7e (cpumask: Utility function to set n'th cpu...) created a genuinely weird function. I never saw it before, it went through DaveM. (He only does this to make us other maintainers feel better about our own mistakes.) cpumask_set_cpu_local_first's purpose is say "I need to spread things across N online cpus, choose the ones on this numa node first"; you call it in a loop. It can fail. One of the two callers ignores this, the other aborts and fails the device open. It can fail in two ways: allocating the off-stack cpumask, or through a convoluted codepath which AFAICT can only occur if cpu_online_mask changes. Which shouldn't happen, because if cpu_online_mask can change while you call this, it could return a now-offline cpu anyway. It contains a nonsensical test "!cpumask_of_node(numa_node)". This was drawn to my attention by Geert, who said this causes a warning on Sparc. It sets a single bit in a cpumask instead of returning a cpu number, because that's what the callers want. It could be made more efficient by passing the previous cpu rather than an index, but that would be more invasive to the callers. Fixes: da91309e0a7e8966d916a74cce42ed170fde06bf Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (then rebased) Tested-by: Amir Vadai <amirv@mellanox.com> Acked-by: Amir Vadai <amirv@mellanox.com> Acked-by: David S. Miller <davem@davemloft.net>
2015-05-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Don't use MMIO on certain iwlwifi devices otherwise we get a firmware crash. 2) Don't corrupt the GRO lists of mac80211 contexts by doing sends via timer interrupt, from Johannes Berg. 3) SKB tailroom is miscalculated in AP_VLAN crypto code, from Michal Kazior. 4) Fix fw_status memory leak in iwlwifi, from Haim Dreyfuss. 5) Fix use after free in iwl_mvm_d0i3_enable_tx(), from Eliad Peller. 6) JIT'ing of large BPF programs is broken on x86, from Alexei Starovoitov. 7) EMAC driver ethtool register dump size is miscalculated, from Ivan Mikhaylov. 8) Fix PHY initial link mode when autonegotiation is disabled in amd-xgbe, from Tom Lendacky. 9) Fix NULL deref on SOCK_DEAD socket in AF_UNIX and CAIF protocols, from Mark Salyzyn. 10) credit_bytes not initialized properly in xen-netback, from Ross Lagerwall. 11) Fallback from MSI-X to INTx interrupts not handled properly in mlx4 driver, fix from Benjamin Poirier. 12) Perform ->attach() after binding dev->qdisc in packet scheduler, otherwise we can crash. From Cong WANG. 13) Don't clobber data in sctp_v4_map_v6(). From Jason Gunthorpe. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (30 commits) sctp: Fix mangled IPv4 addresses on a IPv6 listening socket net_sched: invoke ->attach() after setting dev->qdisc xen-netfront: properly destroy queues when removing device mlx4_core: Fix fallback from MSI-X to INTx xen/netback: Properly initialize credit_bytes net: netxen: correct sysfs bin attribute return code tools: bpf_jit_disasm: fix segfault on disabled debugging log output unix/caif: sk_socket can disappear when state is unlocked amd-xgbe-phy: Fix initial mode when autoneg is disabled net: dp83640: fix improper double spin locking. net: dp83640: reinforce locking rules. net: dp83640: fix broken calibration routine. net: stmmac: create one debugfs dir per net-device net/ibm/emac: fix size of emac dump memory areas x86: bpf_jit: fix compilation of large bpf programs net: phy: bcm7xxx: Fix 7425 PHY ID and flags iwlwifi: mvm: avoid use-after-free on iwl_mvm_d0i3_enable_tx() iwlwifi: mvm: clean net-detect info if device was reset during suspend iwlwifi: mvm: take the UCODE_DOWN reference when resuming iwlwifi: mvm: BT Coex - duplicate the command if sent ASYNC ...
2015-05-27pci: Add Cavium PCI vendor idSunil Goutham
This vendor id will be used by network (vNIC), USB (xHCI), SATA (AHCI), GPIO, I2C, MMC and maybe other drivers for ThunderX SoC. Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-27perf/x86: Fix event/group validationPeter Zijlstra
Commit 43b4578071c0 ("perf/x86: Reduce stack usage of x86_schedule_events()") violated the rule that 'fake' scheduling; as used for event/group validation; should not change the event state. This went mostly un-noticed because repeated calls of x86_pmu::get_event_constraints() would give the same result. And x86_pmu::put_event_constraints() would mostly not do anything. Commit e979121b1b15 ("perf/x86/intel: Implement cross-HT corruption bug workaround") made the situation much worse by actually setting the event->hw.constraint value to NULL, so when validation and actual scheduling interact we get NULL ptr derefs. Fix it by removing the constraint pointer from the event and move it back to an array, this time in cpuc instead of on the stack. validate_group() x86_schedule_events() event->hw.constraint = c; # store <context switch> perf_task_event_sched_in() ... x86_schedule_events(); event->hw.constraint = c2; # store ... put_event_constraints(event); # assume failure to schedule intel_put_event_constraints() event->hw.constraint = NULL; <context switch end> c = event->hw.constraint; # read -> NULL if (!test_bit(hwc->idx, c->idxmsk)) # <- *BOOM* NULL deref This in particular is possible when the event in question is a cpu-wide event and group-leader, where the validate_group() tries to add an event to the group. Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Hunter <ahh@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 43b4578071c0 ("perf/x86: Reduce stack usage of x86_schedule_events()") Fixes: e979121b1b15 ("perf/x86/intel: Implement cross-HT corruption bug workaround") Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-27net: phy: Add phy_interface_is_rgmii helperFlorian Fainelli
RGMII interfaces come in 4 different flavors that the PHY library needs to care about: regular RGMII (no delays), RGMII with either RX or TX delay, and both. In order to avoid errors of checking only for one type of RGMII interface and miss the 3 others, introduce a convenience function which tests for all values. Suggested-by: David S. Miller <davem@davemloft.net> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-26drivers: of/base: move of_init to driver_initSudeep Holla
Commit 5590f3196b29 ("drivers/core/of: Add symlink to device-tree from devices with an OF node") adds the symlink `of_node` for each device pointing to it's device tree node while creating/initialising it. However the devicetree sysfs is created and setup in of_init which is executed at core_initcall level. For all the devices created before of_init, the following error is thrown: "Error -2(-ENOENT) creating of_node link" Like many other components in driver model, initialize the sysfs support for OF/devicetree from driver_init so that it's ready before any devices are created. Fixes: 5590f3196b29 ("drivers/core/of: Add symlink to device-tree from devices with an OF node") Suggested-by: Rob Herring <robh+dt@kernel.org> Cc: Grant Likely <grant.likely@linaro.org> Cc: Pawel Moll <pawel.moll@arm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Tested-by: Robert Schwebel <r.schwebel@pengutronix.de> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-05-26bcma: add module_bcma_driver()Hauke Mehrtens
This makes it possible to save some lines of code in drivers with an simple bcma driver registration. Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
2015-05-25net: phy: bcm7xxx: Fix 7425 PHY ID and flagsFlorian Fainelli
While adding support for 7425 PHY in the 7xxx PHY driver, the ID that was used was actually coming from an external PHY: a BCM5461x. Fix this by using the proper ID for the internal 7425 PHY and set the PHY_IS_INTERNAL flag, otherwise consumers of this PHY driver would not be able to properly identify it as such. Fixes: d068b02cfdfc2 ("net: phy: add BCM7425 and BCM7429 PHYs") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Petri Gynther <pgynther@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-25net: make skb_splice_bits more configureableHannes Frederic Sowa
Prepare skb_splice_bits to be able to deal with AF_UNIX sockets. AF_UNIX sockets don't use lock_sock/release_sock and thus we have to use a callback to make the locking and unlocking configureable. Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-25net: skbuff: add skb_append_pagefrags and use itHannes Frederic Sowa
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-23Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Thomas Gleixner: "One more fix from the timer departement: - Handle division of negative nanosecond values proper on 32bit. A recent cleanup wrecked the sign handling of the dividend and dropped the check for negative divisors" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: ktime: Fix ktime_divns to do signed division
2015-05-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/cadence/macb.c drivers/net/phy/phy.c include/linux/skbuff.h net/ipv4/tcp.c net/switchdev/switchdev.c Switchdev was a case of RTNH_H_{EXTERNAL --> OFFLOAD} renaming overlapping with net-next changes of various sorts. phy.c was a case of two changes, one adding a local variable to a function whilst the second was removing one. tcp.c overlapped a deadlock fix with the addition of new tcp_info statistic values. macb.c involved the addition of two zyncq device entries. skbuff.h involved adding back ipv4_daddr to nf_bridge_info whilst net-next changes put two other existing members of that struct into a union. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Don't leak ipvs->sysctl_tbl, from Tommi Rentala. 2) Fix neighbour table entry leak in rocker driver, from Ying Xue. 3) Do not emit bonding notifications for unregistered interfaces, from Nicolas Dichtel. 4) Set ipv6 flow label properly when in TIME_WAIT state, from Florent Fourcot. 5) Fix regression in ipv6 multicast filter test, from Henning Rogge. 6) do_replace() in various footables netfilter modules is missing a check for 0 counters in the datastructure provided by the user. Fix from Dave Jones, and found with trinity. 7) Fix RCU bug in packet scheduler classifier module unloads, from Daniel Borkmann. 8) Avoid deadlock in tcp_get_info() by using u64_sync. From Eric Dumzaet. 9) Input packet processing can race with inetdev_destroy() teardown, fix potential OOPS in ip_error() by explicitly testing whether the inetdev is still attached. From Eric W Biederman. 10) MLDv2 parser in bridge multicast code breaks too early while parsing. Fix from Thadeu Lima de Souza Cascardo. 11) Asking for settings on non-zero PHYID doesn't work because we do not import the command structure from the user and use the PHYID provided there. Fix from Arun Parameswaran. 12) Fix UDP checksums with IPV6 RAW sockets, from Vlad Yasevich. 13) Missing NF_TABLES depends for TPROXY etc can cause build failures, fix from Florian Westphal. 14) Fix netfilter conntrack to handle RFC5961 challenge ACKs properly, from Jesper Dangaard Brouer. 15) If netlink autobind retry fails, we have to reset the sockets portid back to zero. From Herbert Xu. 16) VXLAN netns exit code unregisters using wrong device, from John W Linville. 17) Add some USB device IDs to ath3k and btusb bluetooth drivers, from Dmitry Tunin and Wen-chien Jesse Sung. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits) bridge: fix lockdep splat net: core: 'ethtool' issue with querying phy settings bridge: fix parsing of MLDv2 reports ARM: zynq: DT: Use the zynq binding with macb net: macb: Disable half duplex gigabit on Zynq net: macb: Document zynq gem dt binding ipv4: fill in table id when replacing a route cdc_ncm: Fix tx_bytes statistics ipv4: Avoid crashing in ip_error tcp: fix a potential deadlock in tcp_get_info() net: sched: fix call_rcu() race on classifier module unloads net: phy: Make sure phy_start() always re-enables the phy interrupts ipv6: fix ECMP route replacement ipv6: do not delete previously existing ECMP routes if add fails Revert "netfilter: bridge: query conntrack about skb dnat" netfilter: ensure number of counters is >0 in do_replace() netfilter: nfnetlink_{log,queue}: Register pernet in first place tcp: don't over-send F-RTO probes tcp: only undo on partial ACKs in CA_Loss net/ipv6/udp: Fix ipv6 multicast socket filter regression ...
2015-05-22Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "Three small fixes that have been picked up the last few weeks. Specifically: - Fix a memory corruption issue in NVMe with malignant user constructed request. From Christoph. - Kill (now) unused blk_queue_bio(), dm was changed to not need this anymore. From Mike Snitzer. - Always use blk_schedule_flush_plug() from the io_schedule() path when flushing a plug, fixing a !TASK_RUNNING warning with md. From Shaohua" * 'for-linus' of git://git.kernel.dk/linux-block: sched: always use blk_schedule_flush_plug in io_schedule_out nvme: fix kernel memory corruption with short INQUIRY buffers block: remove export for blk_queue_bio
2015-05-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contain Netfilter fixes for your net tree, they are: 1) Fix a race in nfnetlink_log and nfnetlink_queue that can lead to a crash. This problem is due to wrong order in the per-net registration and netlink socket events. Patch from Francesco Ruggeri. 2) Make sure that counters that userspace pass us are higher than 0 in all the x_tables frontends. Discovered via Trinity, patch from Dave Jones. 3) Revert a patch for br_netfilter to rely on the conntrack status bits. This breaks stateless IPv6 NAT transformations. Patch from Florian Westphal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-22tcp: fix a potential deadlock in tcp_get_info()Eric Dumazet
Taking socket spinlock in tcp_get_info() can deadlock, as inet_diag_dump_icsk() holds the &hashinfo->ehash_locks[i], while packet processing can use the reverse locking order. We could avoid this locking for TCP_LISTEN states, but lockdep would certainly get confused as all TCP sockets share same lockdep classes. [ 523.722504] ====================================================== [ 523.728706] [ INFO: possible circular locking dependency detected ] [ 523.734990] 4.1.0-dbg-DEV #1676 Not tainted [ 523.739202] ------------------------------------------------------- [ 523.745474] ss/18032 is trying to acquire lock: [ 523.750002] (slock-AF_INET){+.-...}, at: [<ffffffff81669d44>] tcp_get_info+0x2c4/0x360 [ 523.758129] [ 523.758129] but task is already holding lock: [ 523.763968] (&(&hashinfo->ehash_locks[i])->rlock){+.-...}, at: [<ffffffff816bcb75>] inet_diag_dump_icsk+0x1d5/0x6c0 [ 523.774661] [ 523.774661] which lock already depends on the new lock. [ 523.774661] [ 523.782850] [ 523.782850] the existing dependency chain (in reverse order) is: [ 523.790326] -> #1 (&(&hashinfo->ehash_locks[i])->rlock){+.-...}: [ 523.796599] [<ffffffff811126bb>] lock_acquire+0xbb/0x270 [ 523.802565] [<ffffffff816f5868>] _raw_spin_lock+0x38/0x50 [ 523.808628] [<ffffffff81665af8>] __inet_hash_nolisten+0x78/0x110 [ 523.815273] [<ffffffff816819db>] tcp_v4_syn_recv_sock+0x24b/0x350 [ 523.822067] [<ffffffff81684d41>] tcp_check_req+0x3c1/0x500 [ 523.828199] [<ffffffff81682d09>] tcp_v4_do_rcv+0x239/0x3d0 [ 523.834331] [<ffffffff816842fe>] tcp_v4_rcv+0xa8e/0xc10 [ 523.840202] [<ffffffff81658fa3>] ip_local_deliver_finish+0x133/0x3e0 [ 523.847214] [<ffffffff81659a9a>] ip_local_deliver+0xaa/0xc0 [ 523.853440] [<ffffffff816593b8>] ip_rcv_finish+0x168/0x5c0 [ 523.859624] [<ffffffff81659db7>] ip_rcv+0x307/0x420 Lets use u64_sync infrastructure instead. As a bonus, 64bit arches get optimized, as these are nop for them. Fixes: 0df48c26d841 ("tcp: add tcpi_bytes_acked to tcp_info") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-21tcp: add tcpi_segs_in and tcpi_segs_out to tcp_infoMarcelo Ricardo Leitner
This patch tracks the total number of inbound and outbound segments on a TCP socket. One may use this number to have an idea on connection quality when compared against the retransmissions. RFC4898 named these : tcpEStatsPerfSegsIn and tcpEStatsPerfSegsOut These are a 32bit field each and can be fetched both from TCP_INFO getsockopt() if one has a handle on a TCP socket, or from inet_diag netlink facility (iproute2/ss patch will follow) Note that tp->segs_out was placed near tp->snd_nxt for good data locality and minimal performance impact, while tp->segs_in was placed near tp->bytes_received for the same reason. Join work with Eric Dumazet. Note that received SYN are accounted on the listener, but sent SYNACK are not accounted. Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-21Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid Pull HID fixes from Jiri Kosina: "Bugfixes for HID subsystem that should go in 4.1. Important highlights: - the patch that extended support for HID++ protocol for TK820 touchpad turns out to be causing regressions due to firmware issues; patch reverting back to basic support from Benjamin Tissoires - Wacom driver can oops for devices that report non-touch data on touch interfaces. Fix from Ping Cheng - gpiolib is not mandatory for i2c-hid, so the driver shouldn't fail if gpiolib is not enabled. Fix from Mika Westerberg" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: HID: wacom: fix an Oops caused by wacom_wac_finger_count_touches HID: usbhid: Add HID_QUIRK_NOGET for Aten DVI KVM switch HID: hid-sensor-hub: Fix debug lock warning Revert "HID: logitech-hidpp: support combo keyboard touchpad TK820" HID: i2c-hid: Do not fail probing if gpiolib is not enabled
2015-05-21Merge tag 'clk-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk fixes from Michael Turquette: "The first set of clk fixes for 4.1 are all driver bugs, with the exception of a single locking fix in the core code. All driver fixes are for code that was merged recently. The Samsung stuff is mostly fixes around suspend/resume, the Qualcomm fixes are for invalid hardware configuration data and the Silicon Labs patches are fixes following their move away from platform_data to Device Tree" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: si5351: Do not pass struct clk in platform_data clk: si5351: Mention clock-names in the binding documentation clk: add missing lock when call clk_core_enable in clk_set_parent clk: exynos5420: Restore GATE_BUS_TOP on suspend clk: qcom: Fix MSM8916 gfx3d_clk_src configuration clk: qcom: Fix MSM8916 venus divider value clk: exynos5433: Fix wrong PMS value of exynos5433_pll_rates clk: exynos5433: Fix wrong parent clock of sclk_apollo clock clk: exynos5433: Fix CLK_PCLK_MONOTONIC_CNT clk register assignment clk: exynos5433: Fix wrong offset of PCLK_MSCL_SECURE_SMMU_JPEG clk: Use CONFIG_ARCH_EXYNOS instead of CONFIG_ARCH_EXYNOS5433
2015-05-21bpf: allow bpf programs to tail-call other bpf programsAlexei Starovoitov
introduce bpf_tail_call(ctx, &jmp_table, index) helper function which can be used from BPF programs like: int bpf_prog(struct pt_regs *ctx) { ... bpf_tail_call(ctx, &jmp_table, index); ... } that is roughly equivalent to: int bpf_prog(struct pt_regs *ctx) { ... if (jmp_table[index]) return (*jmp_table[index])(ctx); ... } The important detail that it's not a normal call, but a tail call. The kernel stack is precious, so this helper reuses the current stack frame and jumps into another BPF program without adding extra call frame. It's trivially done in interpreter and a bit trickier in JITs. In case of x64 JIT the bigger part of generated assembler prologue is common for all programs, so it is simply skipped while jumping. Other JITs can do similar prologue-skipping optimization or do stack unwind before jumping into the next program. bpf_tail_call() arguments: ctx - context pointer jmp_table - one of BPF_MAP_TYPE_PROG_ARRAY maps used as the jump table index - index in the jump table Since all BPF programs are idenitified by file descriptor, user space need to populate the jmp_table with FDs of other BPF programs. If jmp_table[index] is empty the bpf_tail_call() doesn't jump anywhere and program execution continues as normal. New BPF_MAP_TYPE_PROG_ARRAY map type is introduced so that user space can populate this jmp_table array with FDs of other bpf programs. Programs can share the same jmp_table array or use multiple jmp_tables. The chain of tail calls can form unpredictable dynamic loops therefore tail_call_cnt is used to limit the number of calls and currently is set to 32. Use cases: Acked-by: Daniel Borkmann <daniel@iogearbox.net> ========== - simplify complex programs by splitting them into a sequence of small programs - dispatch routine For tracing and future seccomp the program may be triggered on all system calls, but processing of syscall arguments will be different. It's more efficient to implement them as: int syscall_entry(struct seccomp_data *ctx) { bpf_tail_call(ctx, &syscall_jmp_table, ctx->nr /* syscall number */); ... default: process unknown syscall ... } int sys_write_event(struct seccomp_data *ctx) {...} int sys_read_event(struct seccomp_data *ctx) {...} syscall_jmp_table[__NR_write] = sys_write_event; syscall_jmp_table[__NR_read] = sys_read_event; For networking the program may call into different parsers depending on packet format, like: int packet_parser(struct __sk_buff *skb) { ... parse L2, L3 here ... __u8 ipproto = load_byte(skb, ... offsetof(struct iphdr, protocol)); bpf_tail_call(skb, &ipproto_jmp_table, ipproto); ... default: process unknown protocol ... } int parse_tcp(struct __sk_buff *skb) {...} int parse_udp(struct __sk_buff *skb) {...} ipproto_jmp_table[IPPROTO_TCP] = parse_tcp; ipproto_jmp_table[IPPROTO_UDP] = parse_udp; - for TC use case, bpf_tail_call() allows to implement reclassify-like logic - bpf_map_update_elem/delete calls into BPF_MAP_TYPE_PROG_ARRAY jump table are atomic, so user space can build chains of BPF programs on the fly Implementation details: ======================= - high performance of bpf_tail_call() is the goal. It could have been implemented without JIT changes as a wrapper on top of BPF_PROG_RUN() macro, but with two downsides: . all programs would have to pay performance penalty for this feature and tail call itself would be slower, since mandatory stack unwind, return, stack allocate would be done for every tailcall. . tailcall would be limited to programs running preempt_disabled, since generic 'void *ctx' doesn't have room for 'tail_call_cnt' and it would need to be either global per_cpu variable accessed by helper and by wrapper or global variable protected by locks. In this implementation x64 JIT bypasses stack unwind and jumps into the callee program after prologue. - bpf_prog_array_compatible() ensures that prog_type of callee and caller are the same and JITed/non-JITed flag is the same, since calling JITed program from non-JITed is invalid, since stack frames are different. Similarly calling kprobe type program from socket type program is invalid. - jump table is implemented as BPF_MAP_TYPE_PROG_ARRAY to reuse 'map' abstraction, its user space API and all of verifier logic. It's in the existing arraymap.c file, since several functions are shared with regular array map. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-20Revert "netfilter: bridge: query conntrack about skb dnat"Florian Westphal
This reverts commit c055d5b03bb4cb69d349d787c9787c0383abd8b2. There are two issues: 'dnat_took_place' made me think that this is related to -j DNAT/MASQUERADE. But thats only one part of the story. This is also relevant for SNAT when we undo snat translation in reverse/reply direction. Furthermore, I originally wanted to do this mainly to avoid storing ipv6 addresses once we make DNAT/REDIRECT work for ipv6 on bridges. However, I forgot about SNPT/DNPT which is stateless. So we can't escape storing address for ipv6 anyway. Might as well do it for ipv4 too. Reported-and-tested-by: Bernhard Thaler <bernhard.thaler@wvnet.at> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-05-18Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next. Briefly speaking, cleanups and minor fixes for ipset from Jozsef Kadlecsik and Serget Popovich, more incremental updates to make br_netfilter a better place from Florian Westphal, ARP support to the x_tables mark match / target from and context Zhang Chunyu and the addition of context to know that the x_tables runs through nft_compat. More specifically, they are: 1) Fix sparse warning in ipset/ip_set_hash_ipmark.c when fetching the IPSET_ATTR_MARK netlink attribute, from Jozsef Kadlecsik. 2) Rename STREQ macro to STRNCMP in ipset, also from Jozsef. 3) Use skb->network_header to calculate the transport offset in ip_set_get_ip{4,6}_port(). From Alexander Drozdov. 4) Reduce memory consumption per element due to size miscalculation, this patch and follow up patches from Sergey Popovich. 5) Expand nomatch field from 1 bit to 8 bits to allow to simplify mtype_data_reset_flags(), also from Sergey. 6) Small clean for ipset macro trickery. 7) Fix error reporting when both ip_set_get_hostipaddr4() and ip_set_get_extensions() from per-set uadt functions. 8) Simplify IPSET_ATTR_PORT netlink attribute validation. 9) Introduce HOST_MASK instead of hardcoded 32 in ipset. 10) Return true/false instead of 0/1 in functions that return boolean in the ipset code. 11) Validate maximum length of the IPSET_ATTR_COMMENT netlink attribute. 12) Allow to dereference from ext_*() ipset macros. 13) Get rid of incorrect definitions of HKEY_DATALEN. 14) Include linux/netfilter/ipset/ip_set.h in the x_tables set match. 15) Reduce nf_bridge_info size in br_netfilter, from Florian Westphal. 16) Release nf_bridge_info after POSTROUTING since this is only needed from the physdev match, also from Florian. 17) Reduce size of ipset code by deinlining ip_set_put_extensions(), from Denys Vlasenko. 18) Oneliner to add ARP support to the x_tables mark match/target, from Zhang Chunyu. 19) Add context to know if the x_tables extension runs from nft_compat, to address minor problems with three existing extensions. 20) Correct return value in several seqfile *_show() functions in the netfilter tree, from Joe Perches. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-17net: fix two sparse errorsEric Dumazet
First one in __skb_checksum_validate_complete() fixes the following (and other callers) make C=2 CF=-D__CHECK_ENDIAN__ net/ipv4/tcp_ipv4.o CHECK net/ipv4/tcp_ipv4.c include/linux/skbuff.h:3052:24: warning: incorrect type in return expression (different base types) include/linux/skbuff.h:3052:24: expected restricted __sum16 include/linux/skbuff.h:3052:24: got int Second is fixing gso_make_checksum() : CHECK net/ipv4/gre_offload.c include/linux/skbuff.h:3360:14: warning: incorrect type in assignment (different base types) include/linux/skbuff.h:3360:14: expected unsigned short [unsigned] [usertype] csum include/linux/skbuff.h:3360:14: got restricted __sum16 include/linux/skbuff.h:3365:16: warning: incorrect type in return expression (different base types) include/linux/skbuff.h:3365:16: expected restricted __sum16 include/linux/skbuff.h:3365:16: got unsigned short [unsigned] [usertype] csum Fixes: 5a21232983aa7 ("net: Support for csum_bad in skbuff") Fixes: 7e2b10c1e52ca ("net: Support for multiple checksums with gso") Signed-off-by: Eric Dumazet <edumazet@google.com> CC: Tom Herbert <tom@herbertland.com> Acked-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-16Merge tag 'tty-4.1-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial fixes from Greg KH: "Here's some TTY and serial driver fixes for reported issues. All of these have been in linux-next successfully" * tag 'tty-4.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: pty: Fix input race when closing tty/n_gsm.c: fix a memory leak when gsmtty is removed Revert "serial/amba-pl011: Leave the TX IRQ alone when the UART is not open" serial: omap: Fix error handling in probe earlycon: Revert log warnings
2015-05-16rhashtable: Add cap on number of elements in hash tableHerbert Xu
We currently have no limit on the number of elements in a hash table. This is a problem because some users (tipc) set a ceiling on the maximum table size and when that is reached the hash table may degenerate. Others may encounter OOM when growing and if we allow insertions when that happens the hash table perofrmance may also suffer. This patch adds a new paramater insecure_max_entries which becomes the cap on the table. If unset it defaults to max_size * 2. If it is also zero it means that there is no cap on the number of elements in the table. However, the table will grow whenever the utilisation hits 100% and if that growth fails, you will get ENOMEM on insertion. As allowing oversubscription is potentially dangerous, the name contains the word insecure. Note that the cap is not a hard limit. This is done for performance reasons as enforcing a hard limit will result in use of atomic ops that are heavier than the ones we currently use. The reasoning is that we're only guarding against a gross over- subscription of the table, rather than a small breach of the limit. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-15Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Two fixes: a suspend/resume related regression fix, and an RT priority boosting fix" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/core: Fix regression in cpuset_cpu_inactive() for suspend sched: Handle priority boosted tasks proper in setscheduler()
2015-05-15netfilter: x_tables: add context to know if extension runs from nft_compatPablo Neira Ayuso
Currently, we have four xtables extensions that cannot be used from the xt over nft compat layer. The problem is that they need real access to the full blown xt_entry to validate that the rule comes with the right dependencies. This check was introduced to overcome the lack of sufficient userspace dependency validation in iptables. To resolve this problem, this patch introduces a new field to the xt_tgchk_param structure that tell us if the extension is run from nft_compat context. The three affected extensions are: 1) CLUSTERIP, this target has been superseded by xt_cluster. So just bail out by returning -EINVAL. 2) TCPMSS. Relax the checking when used from nft_compat. If used with the wrong configuration, it will corrupt !syn packets by adding TCP MSS option. 3) ebt_stp. Relax the check to make sure it uses the reserved destination MAC address for STP. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Tested-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
2015-05-14mdio-gpio: Propagate mii_bus.phy_ignore_ta_maskBert Vermeulen
This also changes mii_bus.phy_mask to u32 for consistency. Signed-off-by: Bert Vermeulen <bert@biot.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-14test_bpf: add tests related to BPF_MAXINSNSDaniel Borkmann
Couple of torture test cases related to the bug fixed in 0b59d8806a31 ("ARM: net: delegate filter to kernel interpreter when imm_offset() return value can't fit into 12bits."). I've added a helper to allocate and fill the insn space. Output on x86_64 from my laptop: test_bpf: #233 BPF_MAXINSNS: Maximum possible literals jited:0 7 PASS test_bpf: #234 BPF_MAXINSNS: Single literal jited:0 8 PASS test_bpf: #235 BPF_MAXINSNS: Run/add until end jited:0 11553 PASS test_bpf: #236 BPF_MAXINSNS: Too many instructions PASS test_bpf: #237 BPF_MAXINSNS: Very long jump jited:0 9 PASS test_bpf: #238 BPF_MAXINSNS: Ctx heavy transformations jited:0 20329 20398 PASS test_bpf: #239 BPF_MAXINSNS: Call heavy transformations jited:0 32178 32475 PASS test_bpf: #240 BPF_MAXINSNS: Jump heavy test jited:0 10518 PASS test_bpf: #233 BPF_MAXINSNS: Maximum possible literals jited:1 4 PASS test_bpf: #234 BPF_MAXINSNS: Single literal jited:1 4 PASS test_bpf: #235 BPF_MAXINSNS: Run/add until end jited:1 1625 PASS test_bpf: #236 BPF_MAXINSNS: Too many instructions PASS test_bpf: #237 BPF_MAXINSNS: Very long jump jited:1 8 PASS test_bpf: #238 BPF_MAXINSNS: Ctx heavy transformations jited:1 3301 3174 PASS test_bpf: #239 BPF_MAXINSNS: Call heavy transformations jited:1 24107 23491 PASS test_bpf: #240 BPF_MAXINSNS: Jump heavy test jited:1 8651 PASS Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Nicolas Schichan <nschichan@freebox.fr> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-14uidgid: make uid_valid and gid_valid work with !CONFIG_MULTIUSERJosh Triplett
{u,g}id_valid call {u,g}id_eq, which calls __k{u,g}id_val on both arguments and compares. With !CONFIG_MULTIUSER, __k{u,g}id_val return a constant 0, which makes {u,g}id_valid always return false. Change {u,g}id_valid to compare their argument against -1 instead. That produces identical results in the normal CONFIG_MULTIUSER=y case, but with !CONFIG_MULTIUSER will make {u,g}id_valid constant-fold into "return true;" rather than "return false;". This fixes uses of devpts without CONFIG_MULTIUSER. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Reported-by: Fengguang Wu <fengguang.wu@intel.com>, Cc: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-05-14gfp: add __GFP_NOACCOUNTVladimir Davydov
Not all kmem allocations should be accounted to memcg. The following patch gives an example when accounting of a certain type of allocations to memcg can effectively result in a memory leak. This patch adds the __GFP_NOACCOUNT flag which if passed to kmalloc and friends will force the allocation to go through the root cgroup. It will be used by the next patch. Note, since in case of kmemleak enabled each kmalloc implies yet another allocation from the kmemleak_object cache, we add __GFP_NOACCOUNT to gfp_kmemleak_mask. Alternatively, we could introduce a per kmem cache flag disabling accounting for all allocations of a particular kind, but (a) we would not be able to bypass accounting for kmalloc then and (b) a kmem cache with this flag set could not be merged with a kmem cache without this flag, which would increase the number of global caches and therefore fragmentation even if the memory cgroup controller is not used. Despite its generic name, currently __GFP_NOACCOUNT disables accounting only for kmem allocations while user page allocations are always charged. To catch abusing of this flag, a warning is issued on an attempt of passing it to mem_cgroup_try_charge. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Tejun Heo <tj@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Greg Thelen <gthelen@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: <stable@vger.kernel.org> [4.0.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>