commit 6f457819e8343ae097b7122a57694d26d4ce13b0 Author: Greg Kroah-Hartman Date: Sat Oct 21 17:07:27 2017 +0200 Linux 3.18.77 commit 3cde552980e99d537ff88352202932f8c0bc12e4 Author: Greg Kroah-Hartman Date: Thu Oct 19 15:28:08 2017 +0200 Revert "tty: goldfish: Fix a parameter of a call to free_irq" This reverts commit 0961072120f3e40fe98c2bb49c45549ca3f042dc which is commit 1a5c2d1de7d35f5eb9793266237903348989502b upstream. Ben writes: This fixes a bug introduced in 4.6 by commit 465893e18878 "tty: goldfish: support platform_device with id -1". For earlier kernel versions, it *introduces* a bug. So let's drop it. Reported-by: Ben Hutchings Cc: Christophe JAILLET Cc: Sasha Levin Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org commit e524bfab417cb2c109eb9b8e900f98a944dbf494 Author: Varun Prakash Date: Fri Jan 20 16:44:33 2017 +0530 target/iscsi: Fix unsolicited data seq_end_offset calculation [ Upstream commit 4d65491c269729a1e3b375c45e73213f49103d33 ] In case of unsolicited data for the first sequence seq_end_offset must be set to minimum of total data length and FirstBurstLength, so do not add cmd->write_data_done to the min of total data length and FirstBurstLength. This patch avoids that with ImmediateData=Yes, InitialR2T=No, MaxXmitDataSegmentLength < FirstBurstLength that a WRITE command with IO size above FirstBurstLength triggers sequence error messages, for example Set following parameters on target (linux-4.8.12) ImmediateData = Yes InitialR2T = No MaxXmitDataSegmentLength = 8k FirstBurstLength = 64k Log in from Open iSCSI initiator and execute dd if=/dev/zero of=/dev/sdb bs=128k count=1 oflag=direct Error messages on target Command ITT: 0x00000035 with Offset: 65536, Length: 8192 outside of Sequence 73728:131072 while DataSequenceInOrder=Yes. Command ITT: 0x00000035, received DataSN: 0x00000001 higher than expected 0x00000000. Unable to perform within-command recovery while ERL=0. Signed-off-by: Varun Prakash [ bvanassche: Use min() instead of open-coding it / edited patch description ] Signed-off-by: Bart Van Assche Signed-off-by: Nicholas Bellinger Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 1405b8e763e6634f0c7503c47184509ba3911387 Author: Dmitry V. Levin Date: Thu Feb 16 18:04:29 2017 +0300 uapi: fix linux/mroute6.h userspace compilation errors [ Upstream commit 72aa107df6a275cf03359934ca5799a2be7a1bf7 ] Include to fix the following linux/mroute6.h userspace compilation errors: /usr/include/linux/mroute6.h:80:22: error: field 'mf6cc_origin' has incomplete type struct sockaddr_in6 mf6cc_origin; /* Origin of mcast */ /usr/include/linux/mroute6.h:81:22: error: field 'mf6cc_mcastgrp' has incomplete type struct sockaddr_in6 mf6cc_mcastgrp; /* Group in question */ /usr/include/linux/mroute6.h:91:22: error: field 'src' has incomplete type struct sockaddr_in6 src; /usr/include/linux/mroute6.h:92:22: error: field 'grp' has incomplete type struct sockaddr_in6 grp; /usr/include/linux/mroute6.h:132:18: error: field 'im6_src' has incomplete type struct in6_addr im6_src, im6_dst; /usr/include/linux/mroute6.h:132:27: error: field 'im6_dst' has incomplete type struct in6_addr im6_src, im6_dst; Signed-off-by: Dmitry V. Levin Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit d090846a538dc4081140b4b76fbbe6b806611f7f Author: Dmitry V. Levin Date: Thu Feb 16 18:05:45 2017 +0300 uapi: fix linux/rds.h userspace compilation errors [ Upstream commit feb0869d90e51ce8b6fd8a46588465b1b5a26d09 ] Consistently use types from linux/types.h to fix the following linux/rds.h userspace compilation errors: /usr/include/linux/rds.h:106:2: error: unknown type name 'uint8_t' uint8_t name[32]; /usr/include/linux/rds.h:107:2: error: unknown type name 'uint64_t' uint64_t value; /usr/include/linux/rds.h:117:2: error: unknown type name 'uint64_t' uint64_t next_tx_seq; /usr/include/linux/rds.h:118:2: error: unknown type name 'uint64_t' uint64_t next_rx_seq; /usr/include/linux/rds.h:121:2: error: unknown type name 'uint8_t' uint8_t transport[TRANSNAMSIZ]; /* null term ascii */ /usr/include/linux/rds.h:122:2: error: unknown type name 'uint8_t' uint8_t flags; /usr/include/linux/rds.h:129:2: error: unknown type name 'uint64_t' uint64_t seq; /usr/include/linux/rds.h:130:2: error: unknown type name 'uint32_t' uint32_t len; /usr/include/linux/rds.h:135:2: error: unknown type name 'uint8_t' uint8_t flags; /usr/include/linux/rds.h:139:2: error: unknown type name 'uint32_t' uint32_t sndbuf; /usr/include/linux/rds.h:144:2: error: unknown type name 'uint32_t' uint32_t rcvbuf; /usr/include/linux/rds.h:145:2: error: unknown type name 'uint64_t' uint64_t inum; /usr/include/linux/rds.h:153:2: error: unknown type name 'uint64_t' uint64_t hdr_rem; /usr/include/linux/rds.h:154:2: error: unknown type name 'uint64_t' uint64_t data_rem; /usr/include/linux/rds.h:155:2: error: unknown type name 'uint32_t' uint32_t last_sent_nxt; /usr/include/linux/rds.h:156:2: error: unknown type name 'uint32_t' uint32_t last_expected_una; /usr/include/linux/rds.h:157:2: error: unknown type name 'uint32_t' uint32_t last_seen_una; /usr/include/linux/rds.h:164:2: error: unknown type name 'uint8_t' uint8_t src_gid[RDS_IB_GID_LEN]; /usr/include/linux/rds.h:165:2: error: unknown type name 'uint8_t' uint8_t dst_gid[RDS_IB_GID_LEN]; /usr/include/linux/rds.h:167:2: error: unknown type name 'uint32_t' uint32_t max_send_wr; /usr/include/linux/rds.h:168:2: error: unknown type name 'uint32_t' uint32_t max_recv_wr; /usr/include/linux/rds.h:169:2: error: unknown type name 'uint32_t' uint32_t max_send_sge; /usr/include/linux/rds.h:170:2: error: unknown type name 'uint32_t' uint32_t rdma_mr_max; /usr/include/linux/rds.h:171:2: error: unknown type name 'uint32_t' uint32_t rdma_mr_size; /usr/include/linux/rds.h:212:9: error: unknown type name 'uint64_t' typedef uint64_t rds_rdma_cookie_t; /usr/include/linux/rds.h:215:2: error: unknown type name 'uint64_t' uint64_t addr; /usr/include/linux/rds.h:216:2: error: unknown type name 'uint64_t' uint64_t bytes; /usr/include/linux/rds.h:221:2: error: unknown type name 'uint64_t' uint64_t cookie_addr; /usr/include/linux/rds.h:222:2: error: unknown type name 'uint64_t' uint64_t flags; /usr/include/linux/rds.h:228:2: error: unknown type name 'uint64_t' uint64_t cookie_addr; /usr/include/linux/rds.h:229:2: error: unknown type name 'uint64_t' uint64_t flags; /usr/include/linux/rds.h:234:2: error: unknown type name 'uint64_t' uint64_t flags; /usr/include/linux/rds.h:240:2: error: unknown type name 'uint64_t' uint64_t local_vec_addr; /usr/include/linux/rds.h:241:2: error: unknown type name 'uint64_t' uint64_t nr_local; /usr/include/linux/rds.h:242:2: error: unknown type name 'uint64_t' uint64_t flags; /usr/include/linux/rds.h:243:2: error: unknown type name 'uint64_t' uint64_t user_token; /usr/include/linux/rds.h:248:2: error: unknown type name 'uint64_t' uint64_t local_addr; /usr/include/linux/rds.h:249:2: error: unknown type name 'uint64_t' uint64_t remote_addr; /usr/include/linux/rds.h:252:4: error: unknown type name 'uint64_t' uint64_t compare; /usr/include/linux/rds.h:253:4: error: unknown type name 'uint64_t' uint64_t swap; /usr/include/linux/rds.h:256:4: error: unknown type name 'uint64_t' uint64_t add; /usr/include/linux/rds.h:259:4: error: unknown type name 'uint64_t' uint64_t compare; /usr/include/linux/rds.h:260:4: error: unknown type name 'uint64_t' uint64_t swap; /usr/include/linux/rds.h:261:4: error: unknown type name 'uint64_t' uint64_t compare_mask; /usr/include/linux/rds.h:262:4: error: unknown type name 'uint64_t' uint64_t swap_mask; /usr/include/linux/rds.h:265:4: error: unknown type name 'uint64_t' uint64_t add; /usr/include/linux/rds.h:266:4: error: unknown type name 'uint64_t' uint64_t nocarry_mask; /usr/include/linux/rds.h:269:2: error: unknown type name 'uint64_t' uint64_t flags; /usr/include/linux/rds.h:270:2: error: unknown type name 'uint64_t' uint64_t user_token; /usr/include/linux/rds.h:274:2: error: unknown type name 'uint64_t' uint64_t user_token; /usr/include/linux/rds.h:275:2: error: unknown type name 'int32_t' int32_t status; Signed-off-by: Dmitry V. Levin Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit ffc669a3af00c8c3a9ceeb1b30a55f89cf38d2ce Author: Dan Carpenter Date: Tue Feb 21 21:46:37 2017 +0300 scsi: scsi_dh_emc: return success in clariion_std_inquiry() [ Upstream commit 4d7d39a18b8b81511f0b893b7d2203790bf8a58b ] We accidentally return an uninitialized variable on success. Fixes: b6ff1b14cdf4 ("[SCSI] scsi_dh: Update EMC handler") Signed-off-by: Dan Carpenter Reviewed-by: Hannes Reinecke Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit b370a0378091e31f8c6a4eb708dfd22be314f7c0 Author: Eric Ren Date: Wed Feb 22 15:40:41 2017 -0800 ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock [ Upstream commit 439a36b8ef38657f765b80b775e2885338d72451 ] We are in the situation that we have to avoid recursive cluster locking, but there is no way to check if a cluster lock has been taken by a precess already. Mostly, we can avoid recursive locking by writing code carefully. However, we found that it's very hard to handle the routines that are invoked directly by vfs code. For instance: const struct inode_operations ocfs2_file_iops = { .permission = ocfs2_permission, .get_acl = ocfs2_iop_get_acl, .set_acl = ocfs2_iop_set_acl, }; Both ocfs2_permission() and ocfs2_iop_get_acl() call ocfs2_inode_lock(PR): do_sys_open may_open inode_permission ocfs2_permission ocfs2_inode_lock() <=== first time generic_permission get_acl ocfs2_iop_get_acl ocfs2_inode_lock() <=== recursive one A deadlock will occur if a remote EX request comes in between two of ocfs2_inode_lock(). Briefly describe how the deadlock is formed: On one hand, OCFS2_LOCK_BLOCKED flag of this lockres is set in BAST(ocfs2_generic_handle_bast) when downconvert is started on behalf of the remote EX lock request. Another hand, the recursive cluster lock (the second one) will be blocked in in __ocfs2_cluster_lock() because of OCFS2_LOCK_BLOCKED. But, the downconvert never complete, why? because there is no chance for the first cluster lock on this node to be unlocked - we block ourselves in the code path. The idea to fix this issue is mostly taken from gfs2 code. 1. introduce a new field: struct ocfs2_lock_res.l_holders, to keep track of the processes' pid who has taken the cluster lock of this lock resource; 2. introduce a new flag for ocfs2_inode_lock_full: OCFS2_META_LOCK_GETBH; it means just getting back disk inode bh for us if we've got cluster lock. 3. export a helper: ocfs2_is_locked_by_me() is used to check if we have got the cluster lock in the upper code path. The tracking logic should be used by some of the ocfs2 vfs's callbacks, to solve the recursive locking issue cuased by the fact that vfs routines can call into each other. The performance penalty of processing the holder list should only be seen at a few cases where the tracking logic is used, such as get/set acl. You may ask what if the first time we got a PR lock, and the second time we want a EX lock? fortunately, this case never happens in the real world, as far as I can see, including permission check, (get|set)_(acl|attr), and the gfs2 code also do so. [sfr@canb.auug.org.au remove some inlines] Link: http://lkml.kernel.org/r/20170117100948.11657-2-zren@suse.com Signed-off-by: Eric Ren Reviewed-by: Junxiao Bi Reviewed-by: Joseph Qi Cc: Stephen Rothwell Cc: Mark Fasheh Cc: Joel Becker Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 7cab7260362320e26243925c60d731552a0280a7 Author: Milan Broz Date: Thu Feb 23 08:38:26 2017 +0100 crypto: xts - Add ECB dependency [ Upstream commit 12cb3a1c4184f891d965d1f39f8cfcc9ef617647 ] Since the commit f1c131b45410a202eb45cc55980a7a9e4e4b4f40 crypto: xts - Convert to skcipher the XTS mode is based on ECB, so the mode must select ECB otherwise it can fail to initialize. Signed-off-by: Milan Broz Signed-off-by: Herbert Xu Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit bcd17067a2311847ea05bc62aae0ba297e740e0d Author: Majd Dibbiny Date: Thu Feb 23 12:02:43 2017 +0200 net/mlx4_core: Fix VF overwrite of module param which disables DMFS on new probed PFs [ Upstream commit 95f1ba9a24af9769f6e20dfe9a77c863f253f311 ] In the VF driver, module parameter mlx4_log_num_mgm_entry_size was mistakenly overwritten -- and in a manner which overrode the device-managed flow steering option encoded in the parameter. log_num_mgm_entry_size is a global module parameter which affects all ConnectX-3 PFs installed on that host. If a VF changes log_num_mgm_entry_size, this will affect all PFs which are probed subsequent to the change (by disabling DMFS for those PFs). Fixes: 3c439b5586e9 ("mlx4_core: Allow choosing flow steering mode") Signed-off-by: Majd Dibbiny Reviewed-by: Jack Morgenstein Signed-off-by: Tariq Toukan Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 2d9e4b59c8fa8e0983bff7432ddf26c42762cf6f Author: Robbie Ko Date: Thu Jan 5 16:24:55 2017 +0800 Btrfs: send, fix failure to rename top level inode due to name collision [ Upstream commit 4dd9920d991745c4a16f53a8f615f706fbe4b3f7 ] Under certain situations, an incremental send operation can fail due to a premature attempt to create a new top level inode (a direct child of the subvolume/snapshot root) whose name collides with another inode that was removed from the send snapshot. Consider the following example scenario. Parent snapshot: . (ino 256, gen 8) |---- a1/ (ino 257, gen 9) |---- a2/ (ino 258, gen 9) Send snapshot: . (ino 256, gen 3) |---- a2/ (ino 257, gen 7) In this scenario, when receiving the incremental send stream, the btrfs receive command fails like this (ran in verbose mode, -vv argument): rmdir a1 mkfile o257-7-0 rename o257-7-0 -> a2 ERROR: rename o257-7-0 -> a2 failed: Is a directory What happens when computing the incremental send stream is: 1) An operation to remove the directory with inode number 257 and generation 9 is issued. 2) An operation to create the inode with number 257 and generation 7 is issued. This creates the inode with an orphanized name of "o257-7-0". 3) An operation rename the new inode 257 to its final name, "a2", is issued. This is incorrect because inode 258, which has the same name and it's a child of the same parent (root inode 256), was not yet processed and therefore no rmdir operation for it was yet issued. The rename operation is issued because we fail to detect that the name of the new inode 257 collides with inode 258, because their parent, a subvolume/snapshot root (inode 256) has a different generation in both snapshots. So fix this by ignoring the generation value of a parent directory that matches a root inode (number 256) when we are checking if the name of the inode currently being processed collides with the name of some other inode that was not yet processed. We can achieve this scenario of different inodes with the same number but different generation values either by mounting a filesystem with the inode cache option (-o inode_cache) or by creating and sending snapshots across different filesystems, like in the following example: $ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt $ mkdir /mnt/a1 $ mkdir /mnt/a2 $ btrfs subvolume snapshot -r /mnt /mnt/snap1 $ btrfs send /mnt/snap1 -f /tmp/1.snap $ umount /mnt $ mkfs.btrfs -f /dev/sdc $ mount /dev/sdc /mnt $ touch /mnt/a2 $ btrfs subvolume snapshot -r /mnt /mnt/snap2 $ btrfs receive /mnt -f /tmp/1.snap # Take note that once the filesystem is created, its current # generation has value 7 so the inode from the second snapshot has # a generation value of 7. And after receiving the first snapshot # the filesystem is at a generation value of 10, because the call to # create the second snapshot bumps the generation to 8 (the snapshot # creation ioctl does a transaction commit), the receive command calls # the snapshot creation ioctl to create the first snapshot, which bumps # the filesystem's generation to 9, and finally when the receive # operation finishes it calls an ioctl to transition the first snapshot # (snap1) from RW mode to RO mode, which does another transaction commit # and bumps the filesystem's generation to 10. $ rm -f /tmp/1.snap $ btrfs send /mnt/snap1 -f /tmp/1.snap $ btrfs send -p /mnt/snap1 /mnt/snap2 -f /tmp/2.snap $ umount /mnt $ mkfs.btrfs -f /dev/sdd $ mount /dev/sdd /mnt $ btrfs receive /mnt /tmp/1.snap # Receive of snapshot snap2 used to fail. $ btrfs receive /mnt /tmp/2.snap Signed-off-by: Robbie Ko Reviewed-by: Filipe Manana [Rewrote changelog to be more precise and clear] Signed-off-by: Filipe Manana Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 21bf570719a5e5c9c8082601175fa32be83e43d5 Author: Christophe JAILLET Date: Tue Feb 21 07:34:00 2017 +0100 iio: adc: xilinx: Fix error handling [ Upstream commit ca1c39ef76376b67303d01f94fe98bb68bb3861a ] Reorder error handling labels in order to match the way resources have been allocated. Signed-off-by: Christophe JAILLET Acked-by: Lars-Peter Clausen Signed-off-by: Jonathan Cameron Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 9a5043741109f5318638cc79de0a1a28b78b25b9 Author: Jarno Rajahalme Date: Thu Feb 23 17:08:54 2017 -0800 netfilter: nf_ct_expect: Change __nf_ct_expect_check() return value. [ Upstream commit 4b86c459c7bee3acaf92f0e2b4c6ac803eaa1a58 ] Commit 4dee62b1b9b4 ("netfilter: nf_ct_expect: nf_ct_expect_insert() returns void") inadvertently changed the successful return value of nf_ct_expect_related_report() from 0 to 1 due to __nf_ct_expect_check() returning 1 on success. Prevent this regression in the future by changing the return value of __nf_ct_expect_check() to 0 on success. Signed-off-by: Jarno Rajahalme Acked-by: Joe Stringer Signed-off-by: Pablo Neira Ayuso Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit ef0423167be6c11d68bfcbd1190f508727f1d042 Author: Franck Demathieu Date: Thu Feb 23 10:48:55 2017 +0100 irqchip/crossbar: Fix incorrect type of local variables [ Upstream commit b28ace12661fbcfd90959c1e84ff5a85113a82a1 ] The max and entry variables are unsigned according to the dt-bindings. Fix following 3 sparse issues (-Wtypesign): drivers/irqchip/irq-crossbar.c:222:52: warning: incorrect type in argument 3 (different signedness) drivers/irqchip/irq-crossbar.c:222:52: expected unsigned int [usertype] *out_value drivers/irqchip/irq-crossbar.c:222:52: got int * drivers/irqchip/irq-crossbar.c:245:56: warning: incorrect type in argument 4 (different signedness) drivers/irqchip/irq-crossbar.c:245:56: expected unsigned int [usertype] *out_value drivers/irqchip/irq-crossbar.c:245:56: got int * drivers/irqchip/irq-crossbar.c:263:56: warning: incorrect type in argument 4 (different signedness) drivers/irqchip/irq-crossbar.c:263:56: expected unsigned int [usertype] *out_value drivers/irqchip/irq-crossbar.c:263:56: got int * Signed-off-by: Franck Demathieu Cc: marc.zyngier@arm.com Cc: jason@lakedaemon.net Link: http://lkml.kernel.org/r/20170223094855.6546-1-fdemathieu@gmail.com Signed-off-by: Thomas Gleixner Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 5961adefd0f361deaf925d3c4bff80bc9e87de50 Author: Arnd Bergmann Date: Wed Mar 1 10:15:29 2017 +0100 watchdog: kempld: fix gcc-4.3 build [ Upstream commit 3736d4eb6af37492aeded7fec0072dedd959c842 ] gcc-4.3 can't decide whether the constant value in kempld_prescaler[PRESCALER_21] is built-time constant or not, and gets confused by the logic in do_div(): drivers/watchdog/kempld_wdt.o: In function `kempld_wdt_set_stage_timeout': kempld_wdt.c:(.text.kempld_wdt_set_stage_timeout+0x130): undefined reference to `__aeabi_uldivmod' This adds a call to ACCESS_ONCE() to force it to not consider it to be constant, and leaves the more efficient normal case in place for modern compilers, using an #ifdef to annotate why we do this hack. Signed-off-by: Arnd Bergmann Signed-off-by: Guenter Roeck Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 981199076d96effa8f26321372240a2dbaa3561e Author: Peter Zijlstra Date: Wed Mar 1 16:23:30 2017 +0100 locking/lockdep: Add nest_lock integrity test [ Upstream commit 7fb4a2cea6b18dab56d609530d077f168169ed6b ] Boqun reported that hlock->references can overflow. Add a debug test for that to generate a clear error when this happens. Without this, lockdep is likely to report a mysterious failure on unlock. Reported-by: Boqun Feng Signed-off-by: Peter Zijlstra (Intel) Cc: Andrew Morton Cc: Chris Wilson Cc: Linus Torvalds Cc: Nicolai Hähnle Cc: Paul E. McKenney Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman commit 4a8a916d932e719fba2725ea144e5c5803ee6cb0 Author: Greg Kroah-Hartman Date: Thu Oct 19 14:55:29 2017 +0200 Revert "bsg-lib: don't free job in bsg_prepare_job" This reverts commit d9100405a20a71dd620843e0380e38fc50731108 which was commit f507b54dccfd8000c517d740bc45f20c74532d18 upstream. Ben reports: That function doesn't exist here (it was introduced in 4.13). Instead, this backport has modified bsg_create_job(), creating a leak. Please revert this on the 3.18, 4.4 and 4.9 stable branches. So I'm dropping it from here. Reported-by: Ben Hutchings Cc: Christoph Hellwig Cc: Ming Lei Cc: Jens Axboe Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org commit bc8a5a45208d335de143643e51358c8299bce0f3 Author: Christoph Paasch Date: Tue Sep 26 17:38:50 2017 -0700 net: Set sk_prot_creator when cloning sockets to the right proto [ Upstream commit 9d538fa60bad4f7b23193c89e843797a1cf71ef3 ] sk->sk_prot and sk->sk_prot_creator can differ when the app uses IPV6_ADDRFORM (transforming an IPv6-socket to an IPv4-one). Which is why sk_prot_creator is there to make sure that sk_prot_free() does the kmem_cache_free() on the right kmem_cache slab. Now, if such a socket gets transformed back to a listening socket (using connect() with AF_UNSPEC) we will allocate an IPv4 tcp_sock through sk_clone_lock() when a new connection comes in. But sk_prot_creator will still point to the IPv6 kmem_cache (as everything got copied in sk_clone_lock()). When freeing, we will thus put this memory back into the IPv6 kmem_cache although it was allocated in the IPv4 cache. I have seen memory corruption happening because of this. With slub-debugging and MEMCG_KMEM enabled this gives the warning "cache_from_obj: Wrong slab cache. TCPv6 but object is from TCP" A C-program to trigger this: void main(void) { int fd = socket(AF_INET6, SOCK_STREAM, IPPROTO_TCP); int new_fd, newest_fd, client_fd; struct sockaddr_in6 bind_addr; struct sockaddr_in bind_addr4, client_addr1, client_addr2; struct sockaddr unsp; int val; memset(&bind_addr, 0, sizeof(bind_addr)); bind_addr.sin6_family = AF_INET6; bind_addr.sin6_port = ntohs(42424); memset(&client_addr1, 0, sizeof(client_addr1)); client_addr1.sin_family = AF_INET; client_addr1.sin_port = ntohs(42424); client_addr1.sin_addr.s_addr = inet_addr("127.0.0.1"); memset(&client_addr2, 0, sizeof(client_addr2)); client_addr2.sin_family = AF_INET; client_addr2.sin_port = ntohs(42421); client_addr2.sin_addr.s_addr = inet_addr("127.0.0.1"); memset(&unsp, 0, sizeof(unsp)); unsp.sa_family = AF_UNSPEC; bind(fd, (struct sockaddr *)&bind_addr, sizeof(bind_addr)); listen(fd, 5); client_fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP); connect(client_fd, (struct sockaddr *)&client_addr1, sizeof(client_addr1)); new_fd = accept(fd, NULL, NULL); close(fd); val = AF_INET; setsockopt(new_fd, SOL_IPV6, IPV6_ADDRFORM, &val, sizeof(val)); connect(new_fd, &unsp, sizeof(unsp)); memset(&bind_addr4, 0, sizeof(bind_addr4)); bind_addr4.sin_family = AF_INET; bind_addr4.sin_port = ntohs(42421); bind(new_fd, (struct sockaddr *)&bind_addr4, sizeof(bind_addr4)); listen(new_fd, 5); client_fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP); connect(client_fd, (struct sockaddr *)&client_addr2, sizeof(client_addr2)); newest_fd = accept(new_fd, NULL, NULL); close(new_fd); close(client_fd); close(new_fd); } As far as I can see, this bug has been there since the beginning of the git-days. Signed-off-by: Christoph Paasch Reviewed-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit b0763909b4538894bb47614656c75f2a233c40d2 Author: Willem de Bruijn Date: Tue Sep 26 12:19:37 2017 -0400 packet: in packet_do_bind, test fanout with bind_lock held [ Upstream commit 4971613c1639d8e5f102c4e797c3bf8f83a5a69e ] Once a socket has po->fanout set, it remains a member of the group until it is destroyed. The prot_hook must be constant and identical across sockets in the group. If fanout_add races with packet_do_bind between the test of po->fanout and taking the lock, the bind call may make type or dev inconsistent with that of the fanout group. Hold po->bind_lock when testing po->fanout to avoid this race. I had to introduce artificial delay (local_bh_enable) to actually observe the race. Fixes: dc99f600698d ("packet: Add fanout support.") Signed-off-by: Willem de Bruijn Reviewed-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 5eaafe649af94c12164bb43f81d85cbc7cfe6778 Author: Sabrina Dubroca Date: Tue Sep 26 16:16:43 2017 +0200 l2tp: fix race condition in l2tp_tunnel_delete [ Upstream commit 62b982eeb4589b2e6d7c01a90590e3a4c2b2ca19 ] If we try to delete the same tunnel twice, the first delete operation does a lookup (l2tp_tunnel_get), finds the tunnel, calls l2tp_tunnel_delete, which queues it for deletion by l2tp_tunnel_del_work. The second delete operation also finds the tunnel and calls l2tp_tunnel_delete. If the workqueue has already fired and started running l2tp_tunnel_del_work, then l2tp_tunnel_delete will queue the same tunnel a second time, and try to free the socket again. Add a dead flag to prevent firing the workqueue twice. Then we can remove the check of queue_work's result that was meant to prevent that race but doesn't. Reproducer: ip l2tp add tunnel tunnel_id 3000 peer_tunnel_id 4000 local 192.168.0.2 remote 192.168.0.1 encap udp udp_sport 5000 udp_dport 6000 ip l2tp add session name l2tp1 tunnel_id 3000 session_id 1000 peer_session_id 2000 ip link set l2tp1 up ip l2tp del tunnel tunnel_id 3000 ip l2tp del tunnel tunnel_id 3000 Fixes: f8ccac0e4493 ("l2tp: put tunnel socket release on a workqueue") Reported-by: Jianlin Shi Signed-off-by: Sabrina Dubroca Acked-by: Guillaume Nault Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 88db37f8e4de97702da5f53c09cdbff0382f9bdd Author: Ridge Kennedy Date: Wed Feb 22 14:59:49 2017 +1300 l2tp: Avoid schedule while atomic in exit_net [ Upstream commit 12d656af4e3d2781b9b9f52538593e1717e7c979 ] While destroying a network namespace that contains a L2TP tunnel a "BUG: scheduling while atomic" can be observed. Enabling lockdep shows that this is happening because l2tp_exit_net() is calling l2tp_tunnel_closeall() (via l2tp_tunnel_delete()) from within an RCU critical section. l2tp_exit_net() takes rcu_read_lock_bh() << list_for_each_entry_rcu() >> l2tp_tunnel_delete() l2tp_tunnel_closeall() __l2tp_session_unhash() synchronize_rcu() << Illegal inside RCU critical section >> BUG: sleeping function called from invalid context in_atomic(): 1, irqs_disabled(): 0, pid: 86, name: kworker/u16:2 INFO: lockdep is turned off. CPU: 2 PID: 86 Comm: kworker/u16:2 Tainted: G W O 4.4.6-at1 #2 Hardware name: Xen HVM domU, BIOS 4.6.1-xs125300 05/09/2016 Workqueue: netns cleanup_net 0000000000000000 ffff880202417b90 ffffffff812b0013 ffff880202410ac0 ffffffff81870de8 ffff880202417bb8 ffffffff8107aee8 ffffffff81870de8 0000000000000c51 0000000000000000 ffff880202417be0 ffffffff8107b024 Call Trace: [] dump_stack+0x85/0xc2 [] ___might_sleep+0x148/0x240 [] __might_sleep+0x44/0x80 [] synchronize_sched+0x2d/0xe0 [] ? trace_hardirqs_on+0xd/0x10 [] ? __local_bh_enable_ip+0x6b/0xc0 [] ? _raw_spin_unlock_bh+0x30/0x40 [] __l2tp_session_unhash+0x172/0x220 [] ? __l2tp_session_unhash+0x87/0x220 [] l2tp_tunnel_closeall+0x9b/0x140 [] l2tp_tunnel_delete+0x14/0x60 [] l2tp_exit_net+0x110/0x270 [] ? l2tp_exit_net+0x9c/0x270 [] ops_exit_list.isra.6+0x33/0x60 [] cleanup_net+0x1b6/0x280 ... This bug can easily be reproduced with a few steps: $ sudo unshare -n bash # Create a shell in a new namespace # ip link set lo up # ip addr add 127.0.0.1 dev lo # ip l2tp add tunnel remote 127.0.0.1 local 127.0.0.1 tunnel_id 1 \ peer_tunnel_id 1 udp_sport 50000 udp_dport 50000 # ip l2tp add session name foo tunnel_id 1 session_id 1 \ peer_session_id 1 # ip link set foo up # exit # Exit the shell, in turn exiting the namespace $ dmesg ... [942121.089216] BUG: scheduling while atomic: kworker/u16:3/13872/0x00000200 ... To fix this, move the call to l2tp_tunnel_closeall() out of the RCU critical section, and instead call it from l2tp_tunnel_del_work(), which is running from the l2tp_wq workqueue. Fixes: 2b551c6e7d5b ("l2tp: close sessions before initiating tunnel delete") Signed-off-by: Ridge Kennedy Acked-by: Guillaume Nault Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 6132b4ff8fbd56613a8342e75a8b82b6b011a837 Author: Alexey Kodanev Date: Tue Sep 26 15:14:29 2017 +0300 vti: fix use after free in vti_tunnel_xmit/vti6_tnl_xmit [ Upstream commit 36f6ee22d2d66046e369757ec6bbe1c482957ba6 ] When running LTP IPsec tests, KASan might report: BUG: KASAN: use-after-free in vti_tunnel_xmit+0xeee/0xff0 [ip_vti] Read of size 4 at addr ffff880dc6ad1980 by task swapper/0/0 ... Call Trace: dump_stack+0x63/0x89 print_address_description+0x7c/0x290 kasan_report+0x28d/0x370 ? vti_tunnel_xmit+0xeee/0xff0 [ip_vti] __asan_report_load4_noabort+0x19/0x20 vti_tunnel_xmit+0xeee/0xff0 [ip_vti] ? vti_init_net+0x190/0x190 [ip_vti] ? save_stack_trace+0x1b/0x20 ? save_stack+0x46/0xd0 dev_hard_start_xmit+0x147/0x510 ? icmp_echo.part.24+0x1f0/0x210 __dev_queue_xmit+0x1394/0x1c60 ... Freed by task 0: save_stack_trace+0x1b/0x20 save_stack+0x46/0xd0 kasan_slab_free+0x70/0xc0 kmem_cache_free+0x81/0x1e0 kfree_skbmem+0xb1/0xe0 kfree_skb+0x75/0x170 kfree_skb_list+0x3e/0x60 __dev_queue_xmit+0x1298/0x1c60 dev_queue_xmit+0x10/0x20 neigh_resolve_output+0x3a8/0x740 ip_finish_output2+0x5c0/0xe70 ip_finish_output+0x4ba/0x680 ip_output+0x1c1/0x3a0 xfrm_output_resume+0xc65/0x13d0 xfrm_output+0x1e4/0x380 xfrm4_output_finish+0x5c/0x70 Can be fixed if we get skb->len before dst_output(). Fixes: b9959fd3b0fa ("vti: switch to new ip tunnel code") Fixes: 22e1b23dafa8 ("vti6: Support inter address family tunneling.") Signed-off-by: Alexey Kodanev Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit ba554fc7dc2b4189ccfc412853c578cf0d613c3b Author: Meng Xu Date: Tue Sep 19 21:49:55 2017 -0400 isdn/i4l: fetch the ppp_write buffer in one shot [ Upstream commit 02388bf87f72e1d47174cd8f81c34443920eb5a0 ] In isdn_ppp_write(), the header (i.e., protobuf) of the buffer is fetched twice from userspace. The first fetch is used to peek at the protocol of the message and reset the huptimer if necessary; while the second fetch copies in the whole buffer. However, given that buf resides in userspace memory, a user process can race to change its memory content across fetches. By doing so, we can either avoid resetting the huptimer for any type of packets (by first setting proto to PPP_LCP and later change to the actual type) or force resetting the huptimer for LCP packets. This patch changes this double-fetch behavior into two single fetches decided by condition (lp->isdn_device < 0 || lp->isdn_channel <0). A more detailed discussion can be found at https://marc.info/?l=linux-kernel&m=150586376926123&w=2 Signed-off-by: Meng Xu Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit e4ffdf9ead59a909f2824a4270356909d6d64380 Author: Willem de Bruijn Date: Thu Sep 14 17:14:41 2017 -0400 packet: hold bind lock when rebinding to fanout hook [ Upstream commit 008ba2a13f2d04c947adc536d19debb8fe66f110 ] Packet socket bind operations must hold the po->bind_lock. This keeps po->running consistent with whether the socket is actually on a ptype list to receive packets. fanout_add unbinds a socket and its packet_rcv/tpacket_rcv call, then binds the fanout object to receive through packet_rcv_fanout. Make it hold the po->bind_lock when testing po->running and rebinding. Else, it can race with other rebind operations, such as that in packet_set_ring from packet_rcv to tpacket_rcv. Concurrent updates can result in a socket being added to a fanout group twice, causing use-after-free KASAN bug reports, among others. Reported independently by both trinity and syzkaller. Verified that the syzkaller reproducer passes after this patch. Fixes: dc99f600698d ("packet: Add fanout support.") Reported-by: nixioaming Signed-off-by: Willem de Bruijn Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 63692f8b33bd8167169e600350b1d5a3102a71ad Author: Edward Cree Date: Fri Sep 15 14:37:38 2017 +0100 bpf/verifier: reject BPF_ALU64|BPF_END [ Upstream commit e67b8a685c7c984e834e3181ef4619cd7025a136 ] Neither ___bpf_prog_run nor the JITs accept it. Also adds a new test case. Fixes: 17a5267067f3 ("bpf: verifier (add verifier core)") Signed-off-by: Edward Cree Acked-by: Alexei Starovoitov Acked-by: Daniel Borkmann Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit db27397942aa3f3c7b02a28d52878468923834bd Author: Dan Carpenter Date: Thu Sep 14 02:00:54 2017 +0300 sctp: potential read out of bounds in sctp_ulpevent_type_enabled() [ Upstream commit fa5f7b51fc3080c2b195fa87c7eca7c05e56f673 ] This code causes a static checker warning because Smatch doesn't trust anything that comes from skb->data. I've reviewed this code and I do think skb->data can be controlled by the user here. The sctp_event_subscribe struct has 13 __u8 fields and we want to see if ours is non-zero. sn_type can be any value in the 0-USHRT_MAX range. We're subtracting SCTP_SN_TYPE_BASE which is 1 << 15 so we could read either before the start of the struct or after the end. This is a very old bug and it's surprising that it would go undetected for so long but my theory is that it just doesn't have a big impact so it would be hard to notice. Signed-off-by: Dan Carpenter Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit 1512129452b835f342c7e0bffac71172d0ac39f3 Author: Jan Kara Date: Thu Aug 11 12:38:55 2016 -0400 ext4: avoid deadlock when expanding inode size [ Upstream commit 2e81a4eeedcaa66e35f58b81e0755b87057ce392 ] When we need to move xattrs into external xattr block, we call ext4_xattr_block_set() from ext4_expand_extra_isize_ea(). That may end up calling ext4_mark_inode_dirty() again which will recurse back into the inode expansion code leading to deadlocks. Protect from recursion using EXT4_STATE_NO_EXPAND inode flag and move its management into ext4_expand_extra_isize_ea() since its manipulation is safe there (due to xattr_sem) from possible races with ext4_xattr_set_handle() which plays with it as well. Signed-off-by: Jan Kara Signed-off-by: Theodore Ts'o Signed-off-by: Greg Kroah-Hartman commit 9b7e3d75f8c3a704d8bba9ee9049e41371489815 Author: Harry Wentland Date: Mon Dec 7 13:55:52 2015 -0500 drm/dp/mst: save vcpi with payloads commit 6cecdf7a161d2b909dc7c8979176bbc4f0669968 upstream. This makes it possibly for drivers to find the associated mst_port by looking at the payload allocation table. Signed-off-by: Harry Wentland Reviewed-by: Alex Deucher Link: http://patchwork.freedesktop.org/patch/msgid/1449514552-10236-3-git-send-email-harry.wentland@amd.com Signed-off-by: Daniel Vetter Cc: Kai Heng Feng Signed-off-by: Greg Kroah-Hartman commit 608c3080b00944769a3cf898d5e54dca64511dea Author: Sebastian Andrzej Siewior Date: Fri Aug 5 15:37:39 2016 +0200 x86/mm: Disable preemption during CR3 read+write commit 5cf0791da5c162ebc14b01eb01631cfa7ed4fa6e upstream. There's a subtle preemption race on UP kernels: Usually current->mm (and therefore mm->pgd) stays the same during the lifetime of a task so it does not matter if a task gets preempted during the read and write of the CR3. But then, there is this scenario on x86-UP: TaskA is in do_exit() and exit_mm() sets current->mm = NULL followed by: -> mmput() -> exit_mmap() -> tlb_finish_mmu() -> tlb_flush_mmu() -> tlb_flush_mmu_tlbonly() -> tlb_flush() -> flush_tlb_mm_range() -> __flush_tlb_up() -> __flush_tlb() -> __native_flush_tlb() At this point current->mm is NULL but current->active_mm still points to the "old" mm. Let's preempt taskA _after_ native_read_cr3() by taskB. TaskB has its own mm so CR3 has changed. Now preempt back to taskA. TaskA has no ->mm set so it borrows taskB's mm and so CR3 remains unchanged. Once taskA gets active it continues where it was interrupted and that means it writes its old CR3 value back. Everything is fine because userland won't need its memory anymore. Now the fun part: Let's preempt taskA one more time and get back to taskB. This time switch_mm() won't do a thing because oldmm (->active_mm) is the same as mm (as per context_switch()). So we remain with a bad CR3 / PGD and return to userland. The next thing that happens is handle_mm_fault() with an address for the execution of its code in userland. handle_mm_fault() realizes that it has a PTE with proper rights so it returns doing nothing. But the CPU looks at the wrong PGD and insists that something is wrong and faults again. And again. And one more time… This pagefault circle continues until the scheduler gets tired of it and puts another task on the CPU. It gets little difficult if the task is a RT task with a high priority. The system will either freeze or it gets fixed by the software watchdog thread which usually runs at RT-max prio. But waiting for the watchdog will increase the latency of the RT task which is no good. Fix this by disabling preemption across the critical code section. Signed-off-by: Sebastian Andrzej Siewior Acked-by: Peter Zijlstra (Intel) Acked-by: Rik van Riel Acked-by: Andy Lutomirski Cc: Borislav Petkov Cc: Borislav Petkov Cc: Brian Gerst Cc: Denys Vlasenko Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Linus Torvalds Cc: Mel Gorman Cc: Peter Zijlstra Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: linux-mm@kvack.org Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1470404259-26290-1-git-send-email-bigeasy@linutronix.de [ Prettified the changelog. ] Signed-off-by: Ingo Molnar Cc: Bernhard Kaindl Signed-off-by: Greg Kroah-Hartman