commit 670d6552eda8ff0c5f396d3d6f0174237917c66c Author: Greg Kroah-Hartman Date: Wed Mar 24 11:05:06 2021 +0100 Linux 4.14.227 Tested-by: Jon Hunter Tested-by: Guenter Roeck Tested-by: Jason Self Tested-by: Linux Kernel Functional Testing Tested-by: Hulk Robot Link: https://lore.kernel.org/r/20210322121920.053255560@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman commit 2d5025afb05b0bddaec0fea8ac12d3fb3c3d9c74 Author: Thomas Gleixner Date: Wed Mar 17 15:38:52 2021 +0100 genirq: Disable interrupts for force threaded handlers commit 81e2073c175b887398e5bca6c004efa89983f58d upstream. With interrupt force threading all device interrupt handlers are invoked from kernel threads. Contrary to hard interrupt context the invocation only disables bottom halfs, but not interrupts. This was an oversight back then because any code like this will have an issue: thread(irq_A) irq_handler(A) spin_lock(&foo->lock); interrupt(irq_B) irq_handler(B) spin_lock(&foo->lock); This has been triggered with networking (NAPI vs. hrtimers) and console drivers where printk() happens from an interrupt which interrupted the force threaded handler. Now people noticed and started to change the spin_lock() in the handler to spin_lock_irqsave() which affects performance or add IRQF_NOTHREAD to the interrupt request which in turn breaks RT. Fix the root cause and not the symptom and disable interrupts before invoking the force threaded handler which preserves the regular semantics and the usefulness of the interrupt force threading as a general debugging tool. For not RT this is not changing much, except that during the execution of the threaded handler interrupts are delayed until the handler returns. Vs. scheduling and softirq processing there is no difference. For RT kernels there is no issue. Fixes: 8d32a307e4fa ("genirq: Provide forced interrupt threading") Reported-by: Johan Hovold Signed-off-by: Thomas Gleixner Reviewed-by: Johan Hovold Acked-by: Sebastian Andrzej Siewior Link: https://lore.kernel.org/r/20210317143859.513307808@linutronix.de Signed-off-by: Greg Kroah-Hartman commit fe778c33aaa9c6ca0d9e90c16d07dd01cd3d8785 Author: Shijie Luo Date: Fri Mar 12 01:50:51 2021 -0500 ext4: fix potential error in ext4_do_update_inode commit 7d8bd3c76da1d94b85e6c9b7007e20e980bfcfe6 upstream. If set_large_file = 1 and errors occur in ext4_handle_dirty_metadata(), the error code will be overridden, go to out_brelse to avoid this situation. Signed-off-by: Shijie Luo Link: https://lore.kernel.org/r/20210312065051.36314-1-luoshijie1@huawei.com Cc: stable@kernel.org Reviewed-by: Jan Kara Signed-off-by: Theodore Ts'o Signed-off-by: Greg Kroah-Hartman commit 470f69cb3742e4c774ff2c89f6dbc691ba702926 Author: zhangyi (F) Date: Fri Mar 5 20:05:08 2021 +0800 ext4: do not try to set xattr into ea_inode if value is empty commit 6b22489911b726eebbf169caee52fea52013fbdd upstream. Syzbot report a warning that ext4 may create an empty ea_inode if set an empty extent attribute to a file on the file system which is no free blocks left. WARNING: CPU: 6 PID: 10667 at fs/ext4/xattr.c:1640 ext4_xattr_set_entry+0x10f8/0x1114 fs/ext4/xattr.c:1640 ... Call trace: ext4_xattr_set_entry+0x10f8/0x1114 fs/ext4/xattr.c:1640 ext4_xattr_block_set+0x1d0/0x1b1c fs/ext4/xattr.c:1942 ext4_xattr_set_handle+0x8a0/0xf1c fs/ext4/xattr.c:2390 ext4_xattr_set+0x120/0x1f0 fs/ext4/xattr.c:2491 ext4_xattr_trusted_set+0x48/0x5c fs/ext4/xattr_trusted.c:37 __vfs_setxattr+0x208/0x23c fs/xattr.c:177 ... Now, ext4 try to store extent attribute into an external inode if ext4_xattr_block_set() return -ENOSPC, but for the case of store an empty extent attribute, store the extent entry into the extent attribute block is enough. A simple reproduce below. fallocate test.img -l 1M mkfs.ext4 -F -b 2048 -O ea_inode test.img mount test.img /mnt dd if=/dev/zero of=/mnt/foo bs=2048 count=500 setfattr -n "user.test" /mnt/foo Reported-by: syzbot+98b881fdd8ebf45ab4ae@syzkaller.appspotmail.com Fixes: 9c6e7853c531 ("ext4: reserve space for xattr entries/names") Cc: stable@kernel.org Signed-off-by: zhangyi (F) Link: https://lore.kernel.org/r/20210305120508.298465-1-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o Signed-off-by: Greg Kroah-Hartman commit 1b46d1d9d02e529f6f2856795d8bae1669253a39 Author: zhangyi (F) Date: Wed Mar 3 21:17:02 2021 +0800 ext4: find old entry again if failed to rename whiteout commit b7ff91fd030dc9d72ed91b1aab36e445a003af4f upstream. If we failed to add new entry on rename whiteout, we cannot reset the old->de entry directly, because the old->de could have moved from under us during make indexed dir. So find the old entry again before reset is needed, otherwise it may corrupt the filesystem as below. /dev/sda: Entry '00000001' in ??? (12) has deleted/unused inode 15. CLEARED. /dev/sda: Unattached inode 75 /dev/sda: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY. Fixes: 6b4b8e6b4ad ("ext4: fix bug for rename with RENAME_WHITEOUT") Cc: stable@vger.kernel.org Signed-off-by: zhangyi (F) Link: https://lore.kernel.org/r/20210303131703.330415-1-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o Signed-off-by: Greg Kroah-Hartman commit c6159407348eb567958a87123a9a68d48228bf42 Author: Oleg Nesterov Date: Mon Feb 1 18:47:09 2021 +0100 x86: Introduce TS_COMPAT_RESTART to fix get_nr_restart_syscall() commit 8c150ba2fb5995c84a7a43848250d444a3329a7d upstream. The comment in get_nr_restart_syscall() says: * The problem is that we can get here when ptrace pokes * syscall-like values into regs even if we're not in a syscall * at all. Yes, but if not in a syscall then the status & (TS_COMPAT|TS_I386_REGS_POKED) check below can't really help: - TS_COMPAT can't be set - TS_I386_REGS_POKED is only set if regs->orig_ax was changed by 32bit debugger; and even in this case get_nr_restart_syscall() is only correct if the tracee is 32bit too. Suppose that a 64bit debugger plays with a 32bit tracee and * Tracee calls sleep(2) // TS_COMPAT is set * User interrupts the tracee by CTRL-C after 1 sec and does "(gdb) call func()" * gdb saves the regs by PTRACE_GETREGS * does PTRACE_SETREGS to set %rip='func' and %orig_rax=-1 * PTRACE_CONT // TS_COMPAT is cleared * func() hits int3. * Debugger catches SIGTRAP. * Restore original regs by PTRACE_SETREGS. * PTRACE_CONT get_nr_restart_syscall() wrongly returns __NR_restart_syscall==219, the tracee calls ia32_sys_call_table[219] == sys_madvise. Add the sticky TS_COMPAT_RESTART flag which survives after return to user mode. It's going to be removed in the next step again by storing the information in the restart block. As a further cleanup it might be possible to remove also TS_I386_REGS_POKED with that. Test-case: $ cvs -d :pserver:anoncvs:anoncvs@sourceware.org:/cvs/systemtap co ptrace-tests $ gcc -o erestartsys-trap-debuggee ptrace-tests/tests/erestartsys-trap-debuggee.c --m32 $ gcc -o erestartsys-trap-debugger ptrace-tests/tests/erestartsys-trap-debugger.c -lutil $ ./erestartsys-trap-debugger Unexpected: retval 1, errno 22 erestartsys-trap-debugger: ptrace-tests/tests/erestartsys-trap-debugger.c:421 Fixes: 609c19a385c8 ("x86/ptrace: Stop setting TS_COMPAT in ptrace code") Reported-by: Jan Kratochvil Signed-off-by: Oleg Nesterov Signed-off-by: Thomas Gleixner Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210201174709.GA17895@redhat.com Signed-off-by: Greg Kroah-Hartman commit 400e3df46a76dd8f301677daddf4c761b9248ff7 Author: Oleg Nesterov Date: Mon Feb 1 18:46:49 2021 +0100 x86: Move TS_COMPAT back to asm/thread_info.h commit 66c1b6d74cd7035e85c426f0af4aede19e805c8a upstream. Move TS_COMPAT back to asm/thread_info.h, close to TS_I386_REGS_POKED. It was moved to asm/processor.h by b9d989c7218a ("x86/asm: Move the thread_info::status field to thread_struct"), then later 37a8f7c38339 ("x86/asm: Move 'status' from thread_struct to thread_info") moved the 'status' field back but TS_COMPAT was forgotten. Preparatory patch to fix the COMPAT case for get_nr_restart_syscall() Fixes: 609c19a385c8 ("x86/ptrace: Stop setting TS_COMPAT in ptrace code") Signed-off-by: Oleg Nesterov Signed-off-by: Thomas Gleixner Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210201174649.GA17880@redhat.com Signed-off-by: Greg Kroah-Hartman commit 591d6c21e0d2acb01a35b39061642f21cf6b7254 Author: Oleg Nesterov Date: Mon Feb 1 18:46:41 2021 +0100 kernel, fs: Introduce and use set_restart_fn() and arch_set_restart_data() commit 5abbe51a526253b9f003e9a0a195638dc882d660 upstream. Preparation for fixing get_nr_restart_syscall() on X86 for COMPAT. Add a new helper which sets restart_block->fn and calls a dummy arch_set_restart_data() helper. Fixes: 609c19a385c8 ("x86/ptrace: Stop setting TS_COMPAT in ptrace code") Signed-off-by: Oleg Nesterov Signed-off-by: Thomas Gleixner Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210201174641.GA17871@redhat.com Signed-off-by: Greg Kroah-Hartman commit d20c7a7b2490f98aad0c5a97910d32978e7845c5 Author: Thomas Gleixner Date: Thu Mar 18 20:26:47 2021 +0100 x86/ioapic: Ignore IRQ2 again commit a501b048a95b79e1e34f03cac3c87ff1e9f229ad upstream. Vitaly ran into an issue with hotplugging CPU0 on an Amazon instance where the matrix allocator claimed to be out of vectors. He analyzed it down to the point that IRQ2, the PIC cascade interrupt, which is supposed to be not ever routed to the IO/APIC ended up having an interrupt vector assigned which got moved during unplug of CPU0. The underlying issue is that IRQ2 for various reasons (see commit af174783b925 ("x86: I/O APIC: Never configure IRQ2" for details) is treated as a reserved system vector by the vector core code and is not accounted as a regular vector. The Amazon BIOS has an routing entry of pin2 to IRQ2 which causes the IO/APIC setup to claim that interrupt which is granted by the vector domain because there is no sanity check. As a consequence the allocation counter of CPU0 underflows which causes a subsequent unplug to fail with: [ ... ] CPU 0 has 4294967295 vectors, 589 available. Cannot disable CPU There is another sanity check missing in the matrix allocator, but the underlying root cause is that the IO/APIC code lost the IRQ2 ignore logic during the conversion to irqdomains. For almost 6 years nobody complained about this wreckage, which might indicate that this requirement could be lifted, but for any system which actually has a PIC IRQ2 is unusable by design so any routing entry has no effect and the interrupt cannot be connected to a device anyway. Due to that and due to history biased paranoia reasons restore the IRQ2 ignore logic and treat it as non existent despite a routing entry claiming otherwise. Fixes: d32932d02e18 ("x86/irq: Convert IOAPIC to use hierarchical irqdomain interfaces") Reported-by: Vitaly Kuznetsov Signed-off-by: Thomas Gleixner Tested-by: Vitaly Kuznetsov Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210318192819.636943062@linutronix.de Signed-off-by: Greg Kroah-Hartman commit 403fdabcc1bcd0d31f9fcb9b9b2e831214ab2192 Author: Kan Liang Date: Fri Mar 12 05:21:37 2021 -0800 perf/x86/intel: Fix a crash caused by zero PEBS status commit d88d05a9e0b6d9356e97129d4ff9942d765f46ea upstream. A repeatable crash can be triggered by the perf_fuzzer on some Haswell system. https://lore.kernel.org/lkml/7170d3b-c17f-1ded-52aa-cc6d9ae999f4@maine.edu/ For some old CPUs (HSW and earlier), the PEBS status in a PEBS record may be mistakenly set to 0. To minimize the impact of the defect, the commit was introduced to try to avoid dropping the PEBS record for some cases. It adds a check in the intel_pmu_drain_pebs_nhm(), and updates the local pebs_status accordingly. However, it doesn't correct the PEBS status in the PEBS record, which may trigger the crash, especially for the large PEBS. It's possible that all the PEBS records in a large PEBS have the PEBS status 0. If so, the first get_next_pebs_record_by_bit() in the __intel_pmu_pebs_event() returns NULL. The at = NULL. Since it's a large PEBS, the 'count' parameter must > 1. The second get_next_pebs_record_by_bit() will crash. Besides the local pebs_status, correct the PEBS status in the PEBS record as well. Fixes: 01330d7288e0 ("perf/x86: Allow zero PEBS status with only single active event") Reported-by: Vince Weaver Suggested-by: Peter Zijlstra (Intel) Signed-off-by: Kan Liang Signed-off-by: Peter Zijlstra (Intel) Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/1615555298-140216-1-git-send-email-kan.liang@linux.intel.com Signed-off-by: Greg Kroah-Hartman commit 6d7724c9c507d5b526991dcdef861c6b28c45eb2 Author: Tyrel Datwyler Date: Mon Mar 15 15:48:21 2021 -0600 PCI: rpadlpar: Fix potential drc_name corruption in store functions commit cc7a0bb058b85ea03db87169c60c7cfdd5d34678 upstream. Both add_slot_store() and remove_slot_store() try to fix up the drc_name copied from the store buffer by placing a NUL terminator at nbyte + 1 or in place of a '\n' if present. However, the static buffer that we copy the drc_name data into is not zeroed and can contain anything past the n-th byte. This is problematic if a '\n' byte appears in that buffer after nbytes and the string copied into the store buffer was not NUL terminated to start with as the strchr() search for a '\n' byte will mark this incorrectly as the end of the drc_name string resulting in a drc_name string that contains garbage data after the n-th byte. Additionally it will cause us to overwrite that '\n' byte on the stack with NUL, potentially corrupting data on the stack. The following debugging shows an example of the drmgr utility writing "PHB 4543" to the add_slot sysfs attribute, but add_slot_store() logging a corrupted string value. drmgr: drmgr: -c phb -a -s PHB 4543 -d 1 add_slot_store: drc_name = PHB 4543°|<82>!, rc = -19 Fix this by using strscpy() instead of memcpy() to ensure the string is NUL terminated when copied into the static drc_name buffer. Further, since the string is now NUL terminated the code only needs to change '\n' to '\0' when present. Cc: stable@vger.kernel.org Signed-off-by: Tyrel Datwyler [mpe: Reformat change log and add mention of possible stack corruption] Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20210315214821.452959-1-tyreld@linux.ibm.com Signed-off-by: Greg Kroah-Hartman commit 521f802e20bdfe0e76ef68042ef9d22f61bc39f2 Author: Ye Xiang Date: Wed Mar 3 14:36:14 2021 +0800 iio: hid-sensor-temperature: Fix issues of timestamp channel commit 141e7633aa4d2838d1f6ad5c74cccc53547c16ac upstream. This patch fixes 2 issues of timestamp channel: 1. This patch ensures that there is sufficient space and correct alignment for the timestamp. 2. Correct the timestamp channel scan index. Fixes: 59d0f2da3569 ("iio: hid: Add temperature sensor support") Signed-off-by: Ye Xiang Cc: Link: https://lore.kernel.org/r/20210303063615.12130-4-xiang.ye@intel.com Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman commit 4a418afad258f348e8fcc31f8c463e47f4391adb Author: Ye Xiang Date: Sat Jan 30 18:25:30 2021 +0800 iio: hid-sensor-prox: Fix scale not correct issue commit d68c592e02f6f49a88e705f13dfc1883432cf300 upstream. Currently, the proxy sensor scale is zero because it just return the exponent directly. To fix this issue, this patch use hid_sensor_format_scale to process the scale first then return the output. Fixes: 39a3a0138f61 ("iio: hid-sensors: Added Proximity Sensor Driver") Signed-off-by: Ye Xiang Link: https://lore.kernel.org/r/20210130102530.31064-1-xiang.ye@intel.com Cc: Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman commit 8e97b9cae1e64b36707acc8bd4bc96f26f940b21 Author: Ye Xiang Date: Wed Mar 3 14:36:12 2021 +0800 iio: hid-sensor-humidity: Fix alignment issue of timestamp channel commit 37e89e574dc238a4ebe439543c5ab4fbb2f0311b upstream. This patch ensures that, there is sufficient space and correct alignment for the timestamp. Fixes: d7ed89d5aadf ("iio: hid: Add humidity sensor support") Signed-off-by: Ye Xiang Cc: Link: https://lore.kernel.org/r/20210303063615.12130-2-xiang.ye@intel.com Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman commit 7ffb3fa5b774a8931a86b7ee047387329eaf95d9 Author: Dinghao Liu Date: Mon Mar 1 16:04:21 2021 +0800 iio: gyro: mpu3050: Fix error handling in mpu3050_trigger_handler commit 6dbbbe4cfd398704b72b21c1d4a5d3807e909d60 upstream. There is one regmap_bulk_read() call in mpu3050_trigger_handler that we have caught its return value bug lack further handling. Check and terminate the execution flow just like the other three regmap_bulk_read() calls in this function. Fixes: 3904b28efb2c7 ("iio: gyro: Add driver for the MPU-3050 gyroscope") Signed-off-by: Dinghao Liu Reviewed-by: Linus Walleij Link: https://lore.kernel.org/r/20210301080421.13436-1-dinghao.liu@zju.edu.cn Cc: Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman commit 9a429caa80b044915df720b36ee92c4416911513 Author: Dan Carpenter Date: Tue Feb 16 22:42:13 2021 +0300 iio: adis16400: Fix an error code in adis16400_initial_setup() commit a71266e454b5df10d019b06f5ebacd579f76be28 upstream. This is to silence a new Smatch warning: drivers/iio/imu/adis16400.c:492 adis16400_initial_setup() warn: sscanf doesn't return error codes If the condition "if (st->variant->flags & ADIS16400_HAS_SLOW_MODE) {" is false then we return 1 instead of returning 0 and probe will fail. Fixes: 72a868b38bdd ("iio: imu: check sscanf return value") Signed-off-by: Dan Carpenter Cc: Link: https://lore.kernel.org/r/YCwgFb3JVG6qrlQ+@mwanda Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman commit cfdaec38d6045def9a4bc3a0cf96d0f089871361 Author: Jonathan Albrieux Date: Wed Jan 13 16:18:07 2021 +0100 iio:adc:qcom-spmi-vadc: add default scale to LR_MUX2_BAT_ID channel commit 7d200b283aa049fcda0d43dd6e03e9e783d2799c upstream. Checking at both msm8909-pm8916.dtsi and msm8916.dtsi from downstream it is indicated that "batt_id" channel has to be scaled with the default function: chan@31 { label = "batt_id"; reg = <0x31>; qcom,decimation = <0>; qcom,pre-div-channel-scaling = <0>; qcom,calibration-type = "ratiometric"; qcom,scale-function = <0>; qcom,hw-settle-time = <0xb>; qcom,fast-avg-setup = <0>; }; Change LR_MUX2_BAT_ID scaling accordingly. Signed-off-by: Jonathan Albrieux Acked-by: Bjorn Andersson Fixes: 7c271eea7b8a ("iio: adc: spmi-vadc: Changes to support different scaling") Link: https://lore.kernel.org/r/20210113151808.4628-2-jonathan.albrieux@gmail.com Cc: Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman commit 5171d595e362e934807b3e4dc98827458d91d330 Author: Jonathan Cameron Date: Sun Jan 24 19:50:34 2021 +0000 iio:adc:stm32-adc: Add HAS_IOMEM dependency commit 121875b28e3bd7519a675bf8ea2c2e793452c2bd upstream. Seems that there are config combinations in which this driver gets enabled and hence selects the MFD, but with out HAS_IOMEM getting pulled in via some other route. MFD is entirely contained in an if HAS_IOMEM block, leading to the build issue in this bugzilla. https://bugzilla.kernel.org/show_bug.cgi?id=209889 Cc: Signed-off-by: Jonathan Cameron Link: https://lore.kernel.org/r/20210124195034.22576-1-jic23@kernel.org Signed-off-by: Greg Kroah-Hartman commit 27b028b9c373815f43a59409159da4500aec2a3a Author: Jim Lin Date: Thu Mar 11 14:42:41 2021 +0800 usb: gadget: configfs: Fix KASAN use-after-free commit 98f153a10da403ddd5e9d98a3c8c2bb54bb5a0b6 upstream. When gadget is disconnected, running sequence is like this. . composite_disconnect . Call trace: usb_string_copy+0xd0/0x128 gadget_config_name_configuration_store+0x4 gadget_config_name_attr_store+0x40/0x50 configfs_write_file+0x198/0x1f4 vfs_write+0x100/0x220 SyS_write+0x58/0xa8 . configfs_composite_unbind . configfs_composite_bind In configfs_composite_bind, it has "cn->strings.s = cn->configuration;" When usb_string_copy is invoked. it would allocate memory, copy input string, release previous pointed memory space, and use new allocated memory. When gadget is connected, host sends down request to get information. Call trace: usb_gadget_get_string+0xec/0x168 lookup_string+0x64/0x98 composite_setup+0xa34/0x1ee8 If gadget is disconnected and connected quickly, in the failed case, cn->configuration memory has been released by usb_string_copy kfree but configfs_composite_bind hasn't been run in time to assign new allocated "cn->configuration" pointer to "cn->strings.s". When "strlen(s->s) of usb_gadget_get_string is being executed, the dangling memory is accessed, "BUG: KASAN: use-after-free" error occurs. Cc: stable@vger.kernel.org Signed-off-by: Jim Lin Signed-off-by: Macpaul Lin Link: https://lore.kernel.org/r/1615444961-13376-1-git-send-email-macpaul.lin@mediatek.com Signed-off-by: Greg Kroah-Hartman commit e93575764f70a5e59813d02a8f7cf7e1497d8185 Author: Macpaul Lin Date: Thu Jun 18 17:13:38 2020 +0800 USB: replace hardcode maximum usb string length by definition commit 81c7462883b0cc0a4eeef0687f80ad5b5baee5f6 upstream. Replace hardcoded maximum USB string length (126 bytes) by definition "USB_MAX_STRING_LEN". Signed-off-by: Macpaul Lin Acked-by: Alan Stern Link: https://lore.kernel.org/r/1592471618-29428-1-git-send-email-macpaul.lin@mediatek.com Signed-off-by: Greg Kroah-Hartman commit 8ca5de14229b084c5eee88fad56e894c6c1bd059 Author: Alan Stern Date: Wed Mar 17 15:06:54 2021 -0400 usb-storage: Add quirk to defeat Kindle's automatic unload commit 546aa0e4ea6ed81b6c51baeebc4364542fa3f3a7 upstream. Matthias reports that the Amazon Kindle automatically removes its emulated media if it doesn't receive another SCSI command within about one second after a SYNCHRONIZE CACHE. It does so even when the host has sent a PREVENT MEDIUM REMOVAL command. The reason for this behavior isn't clear, although it's not hard to make some guesses. At any rate, the results can be unexpected for anyone who tries to access the Kindle in an unusual fashion, and in theory they can lead to data loss (for example, if one file is closed and synchronized while other files are still in the middle of being written). To avoid such problems, this patch creates a new usb-storage quirks flag telling the driver always to issue a REQUEST SENSE following a SYNCHRONIZE CACHE command, and adds an unusual_devs entry for the Kindle with the flag set. This is sufficient to prevent the Kindle from doing its automatic unload, without interfering with proper operation. Another possible way to deal with this would be to increase the frequency of TEST UNIT READY polling that the kernel normally carries out for removable-media storage devices. However that would increase the overall load on the system and it is not as reliable, because the user can override the polling interval. Changing the driver's behavior is safer and has minimal overhead. CC: Reported-and-tested-by: Matthias Schwarzott Signed-off-by: Alan Stern Link: https://lore.kernel.org/r/20210317190654.GA497856@rowland.harvard.edu Signed-off-by: Greg Kroah-Hartman commit 2fd9191081769137d08b15e423cfbd2c3859c991 Author: Sagi Grimberg Date: Mon Mar 15 14:04:27 2021 -0700 nvme-rdma: fix possible hang when failing to set io queues [ Upstream commit c4c6df5fc84659690d4391d1fba155cd94185295 ] We only setup io queues for nvme controllers, and it makes absolutely no sense to allow a controller (re)connect without any I/O queues. If we happen to fail setting the queue count for any reason, we should not allow this to be a successful reconnect as I/O has no chance in going through. Instead just fail and schedule another reconnect. Reported-by: Chao Leng Fixes: 711023071960 ("nvme-rdma: add a NVMe over Fabrics RDMA host driver") Signed-off-by: Sagi Grimberg Reviewed-by: Chao Leng Signed-off-by: Christoph Hellwig Signed-off-by: Sasha Levin commit b6b7158beb0f21d79707d43238a31c109808d0e2 Author: Dan Carpenter Date: Fri Mar 12 10:42:11 2021 +0300 scsi: lpfc: Fix some error codes in debugfs commit 19f1bc7edf0f97186810e13a88f5b62069d89097 upstream. If copy_from_user() or kstrtoull() fail then the correct behavior is to return a negative error code. Link: https://lore.kernel.org/r/YEsbU/UxYypVrC7/@mwanda Fixes: f9bb2da11db8 ("[SCSI] lpfc 8.3.27: T10 additions for SLI4") Signed-off-by: Dan Carpenter Signed-off-by: Martin K. Petersen Signed-off-by: Greg Kroah-Hartman commit 84cee271cfdbea847c184a78cf82f57d225ce76e Author: Pavel Skripkin Date: Mon Mar 1 02:22:40 2021 +0300 net/qrtr: fix __netdev_alloc_skb call commit 093b036aa94e01a0bea31a38d7f0ee28a2749023 upstream. syzbot found WARNING in __alloc_pages_nodemask()[1] when order >= MAX_ORDER. It was caused by a huge length value passed from userspace to qrtr_tun_write_iter(), which tries to allocate skb. Since the value comes from the untrusted source there is no need to raise a warning in __alloc_pages_nodemask(). [1] WARNING in __alloc_pages_nodemask+0x5f8/0x730 mm/page_alloc.c:5014 Call Trace: __alloc_pages include/linux/gfp.h:511 [inline] __alloc_pages_node include/linux/gfp.h:524 [inline] alloc_pages_node include/linux/gfp.h:538 [inline] kmalloc_large_node+0x60/0x110 mm/slub.c:3999 __kmalloc_node_track_caller+0x319/0x3f0 mm/slub.c:4496 __kmalloc_reserve net/core/skbuff.c:150 [inline] __alloc_skb+0x4e4/0x5a0 net/core/skbuff.c:210 __netdev_alloc_skb+0x70/0x400 net/core/skbuff.c:446 netdev_alloc_skb include/linux/skbuff.h:2832 [inline] qrtr_endpoint_post+0x84/0x11b0 net/qrtr/qrtr.c:442 qrtr_tun_write_iter+0x11f/0x1a0 net/qrtr/tun.c:98 call_write_iter include/linux/fs.h:1901 [inline] new_sync_write+0x426/0x650 fs/read_write.c:518 vfs_write+0x791/0xa30 fs/read_write.c:605 ksys_write+0x12d/0x250 fs/read_write.c:658 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Reported-by: syzbot+80dccaee7c6630fa9dcf@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin Acked-by: Alexander Lobakin Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman commit a1507d7f825dd45758f569c340a3b2fc7bfb5220 Author: Daniel Kobras Date: Sat Feb 27 00:04:37 2021 +0100 sunrpc: fix refcount leak for rpc auth modules commit f1442d6349a2e7bb7a6134791bdc26cb776c79af upstream. If an auth module's accept op returns SVC_CLOSE, svc_process_common() enters a call path that does not call svc_authorise() before leaving the function, and thus leaks a reference on the auth module's refcount. Hence, make sure calls to svc_authenticate() and svc_authorise() are paired for all call paths, to make sure rpc auth modules can be unloaded. Signed-off-by: Daniel Kobras Fixes: 4d712ef1db05 ("svcauth_gss: Close connection when dropping an incoming message") Link: https://lore.kernel.org/linux-nfs/3F1B347F-B809-478F-A1E9-0BE98E22B0F0@oracle.com/T/#t Signed-off-by: Chuck Lever Signed-off-by: Greg Kroah-Hartman commit 28e89394dde121a05ef2139d85eaf4fe54018a96 Author: Timo Rothenpieler Date: Tue Feb 23 00:36:19 2021 +0100 svcrdma: disable timeouts on rdma backchannel commit 6820bf77864d5894ff67b5c00d7dba8f92011e3d upstream. This brings it in line with the regular tcp backchannel, which also has all those timeouts disabled. Prevents the backchannel from timing out, getting some async operations like server side copying getting stuck indefinitely on the client side. Signed-off-by: Timo Rothenpieler Fixes: 5d252f90a800 ("svcrdma: Add class for RDMA backwards direction transport") Signed-off-by: Chuck Lever Signed-off-by: Greg Kroah-Hartman commit 39e2d3ebcabf60b8a694281fa306899bdb77612b Author: Joe Korty Date: Fri Feb 26 09:38:20 2021 -0500 NFSD: Repair misuse of sv_lock in 5.10.16-rt30. commit c7de87ff9dac5f396f62d584f3908f80ddc0e07b upstream. [ This problem is in mainline, but only rt has the chops to be able to detect it. ] Lockdep reports a circular lock dependency between serv->sv_lock and softirq_ctl.lock on system shutdown, when using a kernel built with CONFIG_PREEMPT_RT=y, and a nfs mount exists. This is due to the definition of spin_lock_bh on rt: local_bh_disable(); rt_spin_lock(lock); which forces a softirq_ctl.lock -> serv->sv_lock dependency. This is not a problem as long as _every_ lock of serv->sv_lock is a: spin_lock_bh(&serv->sv_lock); but there is one of the form: spin_lock(&serv->sv_lock); This is what is causing the circular dependency splat. The spin_lock() grabs the lock without first grabbing softirq_ctl.lock via local_bh_disable. If later on in the critical region, someone does a local_bh_disable, we get a serv->sv_lock -> softirq_ctrl.lock dependency established. Deadlock. Fix is to make serv->sv_lock be locked with spin_lock_bh everywhere, no exceptions. [ OK ] Stopped target NFS client services. Stopping Logout off all iSCSI sessions on shutdown... Stopping NFS server and services... [ 109.442380] [ 109.442385] ====================================================== [ 109.442386] WARNING: possible circular locking dependency detected [ 109.442387] 5.10.16-rt30 #1 Not tainted [ 109.442389] ------------------------------------------------------ [ 109.442390] nfsd/1032 is trying to acquire lock: [ 109.442392] ffff994237617f60 ((softirq_ctrl.lock).lock){+.+.}-{2:2}, at: __local_bh_disable_ip+0xd9/0x270 [ 109.442405] [ 109.442405] but task is already holding lock: [ 109.442406] ffff994245cb00b0 (&serv->sv_lock){+.+.}-{0:0}, at: svc_close_list+0x1f/0x90 [ 109.442415] [ 109.442415] which lock already depends on the new lock. [ 109.442415] [ 109.442416] [ 109.442416] the existing dependency chain (in reverse order) is: [ 109.442417] [ 109.442417] -> #1 (&serv->sv_lock){+.+.}-{0:0}: [ 109.442421] rt_spin_lock+0x2b/0xc0 [ 109.442428] svc_add_new_perm_xprt+0x42/0xa0 [ 109.442430] svc_addsock+0x135/0x220 [ 109.442434] write_ports+0x4b3/0x620 [ 109.442438] nfsctl_transaction_write+0x45/0x80 [ 109.442440] vfs_write+0xff/0x420 [ 109.442444] ksys_write+0x4f/0xc0 [ 109.442446] do_syscall_64+0x33/0x40 [ 109.442450] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 109.442454] [ 109.442454] -> #0 ((softirq_ctrl.lock).lock){+.+.}-{2:2}: [ 109.442457] __lock_acquire+0x1264/0x20b0 [ 109.442463] lock_acquire+0xc2/0x400 [ 109.442466] rt_spin_lock+0x2b/0xc0 [ 109.442469] __local_bh_disable_ip+0xd9/0x270 [ 109.442471] svc_xprt_do_enqueue+0xc0/0x4d0 [ 109.442474] svc_close_list+0x60/0x90 [ 109.442476] svc_close_net+0x49/0x1a0 [ 109.442478] svc_shutdown_net+0x12/0x40 [ 109.442480] nfsd_destroy+0xc5/0x180 [ 109.442482] nfsd+0x1bc/0x270 [ 109.442483] kthread+0x194/0x1b0 [ 109.442487] ret_from_fork+0x22/0x30 [ 109.442492] [ 109.442492] other info that might help us debug this: [ 109.442492] [ 109.442493] Possible unsafe locking scenario: [ 109.442493] [ 109.442493] CPU0 CPU1 [ 109.442494] ---- ---- [ 109.442495] lock(&serv->sv_lock); [ 109.442496] lock((softirq_ctrl.lock).lock); [ 109.442498] lock(&serv->sv_lock); [ 109.442499] lock((softirq_ctrl.lock).lock); [ 109.442501] [ 109.442501] *** DEADLOCK *** [ 109.442501] [ 109.442501] 3 locks held by nfsd/1032: [ 109.442503] #0: ffffffff93b49258 (nfsd_mutex){+.+.}-{3:3}, at: nfsd+0x19a/0x270 [ 109.442508] #1: ffff994245cb00b0 (&serv->sv_lock){+.+.}-{0:0}, at: svc_close_list+0x1f/0x90 [ 109.442512] #2: ffffffff93a81b20 (rcu_read_lock){....}-{1:2}, at: rt_spin_lock+0x5/0xc0 [ 109.442518] [ 109.442518] stack backtrace: [ 109.442519] CPU: 0 PID: 1032 Comm: nfsd Not tainted 5.10.16-rt30 #1 [ 109.442522] Hardware name: Supermicro X9DRL-3F/iF/X9DRL-3F/iF, BIOS 3.2 09/22/2015 [ 109.442524] Call Trace: [ 109.442527] dump_stack+0x77/0x97 [ 109.442533] check_noncircular+0xdc/0xf0 [ 109.442546] __lock_acquire+0x1264/0x20b0 [ 109.442553] lock_acquire+0xc2/0x400 [ 109.442564] rt_spin_lock+0x2b/0xc0 [ 109.442570] __local_bh_disable_ip+0xd9/0x270 [ 109.442573] svc_xprt_do_enqueue+0xc0/0x4d0 [ 109.442577] svc_close_list+0x60/0x90 [ 109.442581] svc_close_net+0x49/0x1a0 [ 109.442585] svc_shutdown_net+0x12/0x40 [ 109.442588] nfsd_destroy+0xc5/0x180 [ 109.442590] nfsd+0x1bc/0x270 [ 109.442595] kthread+0x194/0x1b0 [ 109.442600] ret_from_fork+0x22/0x30 [ 109.518225] nfsd: last server has exited, flushing export cache [ OK ] Stopped NFSv4 ID-name mapping service. [ OK ] Stopped GSSAPI Proxy Daemon. [ OK ] Stopped NFS Mount Daemon. [ OK ] Stopped NFS status monitor for NFSv2/3 locking.. Fixes: 719f8bcc883e ("svcrpc: fix xpt_list traversal locking on shutdown") Signed-off-by: Joe Korty Signed-off-by: Chuck Lever Signed-off-by: Greg Kroah-Hartman commit 09537286477e32b47bd920d810f8056989bd04da Author: Sagi Grimberg Date: Mon Mar 15 15:34:51 2021 -0700 nvmet: don't check iosqes,iocqes for discovery controllers commit d218a8a3003e84ab136e69a4e30dd4ec7dab2d22 upstream. From the base spec, Figure 78: "Controller Configuration, these fields are defined as parameters to configure an "I/O Controller (IOC)" and not to configure a "Discovery Controller (DC). ... If the controller does not support I/O queues, then this field shall be read-only with a value of 0h Just perform this check for I/O controllers. Fixes: a07b4970f464 ("nvmet: add a generic NVMe target") Reported-by: Belanger, Martin Signed-off-by: Sagi Grimberg Reviewed-by: Chaitanya Kulkarni Signed-off-by: Christoph Hellwig Signed-off-by: Greg Kroah-Hartman commit 0fbf41006d8c850963049c35563e7775fe7c2164 Author: Filipe Manana Date: Thu Mar 11 14:31:05 2021 +0000 btrfs: fix race when cloning extent buffer during rewind of an old root commit dbcc7d57bffc0c8cac9dac11bec548597d59a6a5 upstream. While resolving backreferences, as part of a logical ino ioctl call or fiemap, we can end up hitting a BUG_ON() when replaying tree mod log operations of a root, triggering a stack trace like the following: ------------[ cut here ]------------ kernel BUG at fs/btrfs/ctree.c:1210! invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 1 PID: 19054 Comm: crawl_335 Tainted: G W 5.11.0-2d11c0084b02-misc-next+ #89 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 RIP: 0010:__tree_mod_log_rewind+0x3b1/0x3c0 Code: 05 48 8d 74 10 (...) RSP: 0018:ffffc90001eb70b8 EFLAGS: 00010297 RAX: 0000000000000000 RBX: ffff88812344e400 RCX: ffffffffb28933b6 RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff88812344e42c RBP: ffffc90001eb7108 R08: 1ffff11020b60a20 R09: ffffed1020b60a20 R10: ffff888105b050f9 R11: ffffed1020b60a1f R12: 00000000000000ee R13: ffff8880195520c0 R14: ffff8881bc958500 R15: ffff88812344e42c FS: 00007fd1955e8700(0000) GS:ffff8881f5600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007efdb7928718 CR3: 000000010103a006 CR4: 0000000000170ee0 Call Trace: btrfs_search_old_slot+0x265/0x10d0 ? lock_acquired+0xbb/0x600 ? btrfs_search_slot+0x1090/0x1090 ? free_extent_buffer.part.61+0xd7/0x140 ? free_extent_buffer+0x13/0x20 resolve_indirect_refs+0x3e9/0xfc0 ? lock_downgrade+0x3d0/0x3d0 ? __kasan_check_read+0x11/0x20 ? add_prelim_ref.part.11+0x150/0x150 ? lock_downgrade+0x3d0/0x3d0 ? __kasan_check_read+0x11/0x20 ? lock_acquired+0xbb/0x600 ? __kasan_check_write+0x14/0x20 ? do_raw_spin_unlock+0xa8/0x140 ? rb_insert_color+0x30/0x360 ? prelim_ref_insert+0x12d/0x430 find_parent_nodes+0x5c3/0x1830 ? resolve_indirect_refs+0xfc0/0xfc0 ? lock_release+0xc8/0x620 ? fs_reclaim_acquire+0x67/0xf0 ? lock_acquire+0xc7/0x510 ? lock_downgrade+0x3d0/0x3d0 ? lockdep_hardirqs_on_prepare+0x160/0x210 ? lock_release+0xc8/0x620 ? fs_reclaim_acquire+0x67/0xf0 ? lock_acquire+0xc7/0x510 ? poison_range+0x38/0x40 ? unpoison_range+0x14/0x40 ? trace_hardirqs_on+0x55/0x120 btrfs_find_all_roots_safe+0x142/0x1e0 ? find_parent_nodes+0x1830/0x1830 ? btrfs_inode_flags_to_xflags+0x50/0x50 iterate_extent_inodes+0x20e/0x580 ? tree_backref_for_extent+0x230/0x230 ? lock_downgrade+0x3d0/0x3d0 ? read_extent_buffer+0xdd/0x110 ? lock_downgrade+0x3d0/0x3d0 ? __kasan_check_read+0x11/0x20 ? lock_acquired+0xbb/0x600 ? __kasan_check_write+0x14/0x20 ? _raw_spin_unlock+0x22/0x30 ? __kasan_check_write+0x14/0x20 iterate_inodes_from_logical+0x129/0x170 ? iterate_inodes_from_logical+0x129/0x170 ? btrfs_inode_flags_to_xflags+0x50/0x50 ? iterate_extent_inodes+0x580/0x580 ? __vmalloc_node+0x92/0xb0 ? init_data_container+0x34/0xb0 ? init_data_container+0x34/0xb0 ? kvmalloc_node+0x60/0x80 btrfs_ioctl_logical_to_ino+0x158/0x230 btrfs_ioctl+0x205e/0x4040 ? __might_sleep+0x71/0xe0 ? btrfs_ioctl_get_supported_features+0x30/0x30 ? getrusage+0x4b6/0x9c0 ? __kasan_check_read+0x11/0x20 ? lock_release+0xc8/0x620 ? __might_fault+0x64/0xd0 ? lock_acquire+0xc7/0x510 ? lock_downgrade+0x3d0/0x3d0 ? lockdep_hardirqs_on_prepare+0x210/0x210 ? lockdep_hardirqs_on_prepare+0x210/0x210 ? __kasan_check_read+0x11/0x20 ? do_vfs_ioctl+0xfc/0x9d0 ? ioctl_file_clone+0xe0/0xe0 ? lock_downgrade+0x3d0/0x3d0 ? lockdep_hardirqs_on_prepare+0x210/0x210 ? __kasan_check_read+0x11/0x20 ? lock_release+0xc8/0x620 ? __task_pid_nr_ns+0xd3/0x250 ? lock_acquire+0xc7/0x510 ? __fget_files+0x160/0x230 ? __fget_light+0xf2/0x110 __x64_sys_ioctl+0xc3/0x100 do_syscall_64+0x37/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7fd1976e2427 Code: 00 00 90 48 8b 05 (...) RSP: 002b:00007fd1955e5cf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007fd1955e5f40 RCX: 00007fd1976e2427 RDX: 00007fd1955e5f48 RSI: 00000000c038943b RDI: 0000000000000004 RBP: 0000000001000000 R08: 0000000000000000 R09: 00007fd1955e6120 R10: 0000557835366b00 R11: 0000000000000246 R12: 0000000000000004 R13: 00007fd1955e5f48 R14: 00007fd1955e5f40 R15: 00007fd1955e5ef8 Modules linked in: ---[ end trace ec8931a1c36e57be ]--- (gdb) l *(__tree_mod_log_rewind+0x3b1) 0xffffffff81893521 is in __tree_mod_log_rewind (fs/btrfs/ctree.c:1210). 1205 * the modification. as we're going backwards, we do the 1206 * opposite of each operation here. 1207 */ 1208 switch (tm->op) { 1209 case MOD_LOG_KEY_REMOVE_WHILE_FREEING: 1210 BUG_ON(tm->slot < n); 1211 fallthrough; 1212 case MOD_LOG_KEY_REMOVE_WHILE_MOVING: 1213 case MOD_LOG_KEY_REMOVE: 1214 btrfs_set_node_key(eb, &tm->key, tm->slot); Here's what happens to hit that BUG_ON(): 1) We have one tree mod log user (through fiemap or the logical ino ioctl), with a sequence number of 1, so we have fs_info->tree_mod_seq == 1; 2) Another task is at ctree.c:balance_level() and we have eb X currently as the root of the tree, and we promote its single child, eb Y, as the new root. Then, at ctree.c:balance_level(), we call: tree_mod_log_insert_root(eb X, eb Y, 1); 3) At tree_mod_log_insert_root() we create tree mod log elements for each slot of eb X, of operation type MOD_LOG_KEY_REMOVE_WHILE_FREEING each with a ->logical pointing to ebX->start. These are placed in an array named tm_list. Lets assume there are N elements (N pointers in eb X); 4) Then, still at tree_mod_log_insert_root(), we create a tree mod log element of operation type MOD_LOG_ROOT_REPLACE, ->logical set to ebY->start, ->old_root.logical set to ebX->start, ->old_root.level set to the level of eb X and ->generation set to the generation of eb X; 5) Then tree_mod_log_insert_root() calls tree_mod_log_free_eb() with tm_list as argument. After that, tree_mod_log_free_eb() calls __tree_mod_log_insert() for each member of tm_list in reverse order, from highest slot in eb X, slot N - 1, to slot 0 of eb X; 6) __tree_mod_log_insert() sets the sequence number of each given tree mod log operation - it increments fs_info->tree_mod_seq and sets fs_info->tree_mod_seq as the sequence number of the given tree mod log operation. This means that for the tm_list created at tree_mod_log_insert_root(), the element corresponding to slot 0 of eb X has the highest sequence number (1 + N), and the element corresponding to the last slot has the lowest sequence number (2); 7) Then, after inserting tm_list's elements into the tree mod log rbtree, the MOD_LOG_ROOT_REPLACE element is inserted, which gets the highest sequence number, which is N + 2; 8) Back to ctree.c:balance_level(), we free eb X by calling btrfs_free_tree_block() on it. Because eb X was created in the current transaction, has no other references and writeback did not happen for it, we add it back to the free space cache/tree; 9) Later some other task T allocates the metadata extent from eb X, since it is marked as free space in the space cache/tree, and uses it as a node for some other btree; 10) The tree mod log user task calls btrfs_search_old_slot(), which calls get_old_root(), and finally that calls __tree_mod_log_oldest_root() with time_seq == 1 and eb_root == eb Y; 11) First iteration of the while loop finds the tree mod log element with sequence number N + 2, for the logical address of eb Y and of type MOD_LOG_ROOT_REPLACE; 12) Because the operation type is MOD_LOG_ROOT_REPLACE, we don't break out of the loop, and set root_logical to point to tm->old_root.logical which corresponds to the logical address of eb X; 13) On the next iteration of the while loop, the call to tree_mod_log_search_oldest() returns the smallest tree mod log element for the logical address of eb X, which has a sequence number of 2, an operation type of MOD_LOG_KEY_REMOVE_WHILE_FREEING and corresponds to the old slot N - 1 of eb X (eb X had N items in it before being freed); 14) We then break out of the while loop and return the tree mod log operation of type MOD_LOG_ROOT_REPLACE (eb Y), and not the one for slot N - 1 of eb X, to get_old_root(); 15) At get_old_root(), we process the MOD_LOG_ROOT_REPLACE operation and set "logical" to the logical address of eb X, which was the old root. We then call tree_mod_log_search() passing it the logical address of eb X and time_seq == 1; 16) Then before calling tree_mod_log_search(), task T adds a key to eb X, which results in adding a tree mod log operation of type MOD_LOG_KEY_ADD to the tree mod log - this is done at ctree.c:insert_ptr() - but after adding the tree mod log operation and before updating the number of items in eb X from 0 to 1... 17) The task at get_old_root() calls tree_mod_log_search() and gets the tree mod log operation of type MOD_LOG_KEY_ADD just added by task T. Then it enters the following if branch: if (old_root && tm && tm->op != MOD_LOG_KEY_REMOVE_WHILE_FREEING) { (...) } (...) Calls read_tree_block() for eb X, which gets a reference on eb X but does not lock it - task T has it locked. Then it clones eb X while it has nritems set to 0 in its header, before task T sets nritems to 1 in eb X's header. From hereupon we use the clone of eb X which no other task has access to; 18) Then we call __tree_mod_log_rewind(), passing it the MOD_LOG_KEY_ADD mod log operation we just got from tree_mod_log_search() in the previous step and the cloned version of eb X; 19) At __tree_mod_log_rewind(), we set the local variable "n" to the number of items set in eb X's clone, which is 0. Then we enter the while loop, and in its first iteration we process the MOD_LOG_KEY_ADD operation, which just decrements "n" from 0 to (u32)-1, since "n" is declared with a type of u32. At the end of this iteration we call rb_next() to find the next tree mod log operation for eb X, that gives us the mod log operation of type MOD_LOG_KEY_REMOVE_WHILE_FREEING, for slot 0, with a sequence number of N + 1 (steps 3 to 6); 20) Then we go back to the top of the while loop and trigger the following BUG_ON(): (...) switch (tm->op) { case MOD_LOG_KEY_REMOVE_WHILE_FREEING: BUG_ON(tm->slot < n); fallthrough; (...) Because "n" has a value of (u32)-1 (4294967295) and tm->slot is 0. Fix this by taking a read lock on the extent buffer before cloning it at ctree.c:get_old_root(). This should be done regardless of the extent buffer having been freed and reused, as a concurrent task might be modifying it (while holding a write lock on it). Reported-by: Zygo Blaxell Link: https://lore.kernel.org/linux-btrfs/20210227155037.GN28049@hungrycats.org/ Fixes: 834328a8493079 ("Btrfs: tree mod log's old roots could still be part of the tree") CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Filipe Manana Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman commit bb1256630524bcd57886b2ae76294313d65de744 Author: Arnaldo Carvalho de Melo Date: Tue Dec 5 10:14:42 2017 -0300 tools build feature: Check if pthread_barrier_t is available commit 25ab5abf5b141d7fd13eed506c7458aa04749c29 upstream. As 'perf bench futex wake-parallel" will use this, which is not available in older systems such as versions of the android NDK used in my container build tests (r12b and r15c at the moment). Cc: Adrian Hunter Cc: David Ahern Cc: Davidlohr Bueso Cc: James Yang Cc: Kim Phillips Cc: Namhyung Kim Cc: Wang Nan Link: https://lkml.kernel.org/n/tip-1i7iv54in4wj08lwo55b0pzv@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo Cc: Guenter Roeck Signed-off-by: Greg Kroah-Hartman commit 635a002105dd3b03eadcd8401cbbb7054bbf1b69 Author: Changbin Du Date: Tue Jan 28 23:29:38 2020 +0800 perf: Make perf able to build with latest libbfd commit 0ada120c883d4f1f6aafd01cf0fbb10d8bbba015 upstream. libbfd has changed the bfd_section_* macros to inline functions bfd_section_ since 2019-09-18. See below two commits: o http://www.sourceware.org/ml/gdb-cvs/2019-09/msg00064.html o https://www.sourceware.org/ml/gdb-cvs/2019-09/msg00072.html This fix make perf able to build with both old and new libbfd. Signed-off-by: Changbin Du Acked-by: Jiri Olsa Cc: Peter Zijlstra Link: http://lore.kernel.org/lkml/20200128152938.31413-1-changbin.du@gmail.com Signed-off-by: Arnaldo Carvalho de Melo Cc: Guenter Roeck Signed-off-by: Greg Kroah-Hartman commit aa8a33764bd53cc2ac9a6c341b5d10fd94145b48 Author: Arnaldo Carvalho de Melo Date: Thu Jun 13 12:04:19 2019 -0300 tools build: Check if gettid() is available before providing helper commit 4541a8bb13a86e504416a13360c8dc64d2fd612a upstream. Laura reported that the perf build failed in fedora when we got a glibc that provides gettid(), which I reproduced using fedora rawhide with the glibc-devel-2.29.9000-26.fc31.x86_64 package. Add a feature check to avoid providing a gettid() helper in such systems. On a fedora rawhide system with this patch applied we now get: [root@7a5f55352234 perf]# grep gettid /tmp/build/perf/FEATURE-DUMP feature-gettid=1 [root@7a5f55352234 perf]# cat /tmp/build/perf/feature/test-gettid.make.output [root@7a5f55352234 perf]# ldd /tmp/build/perf/feature/test-gettid.bin linux-vdso.so.1 (0x00007ffc6b1f6000) libc.so.6 => /lib64/libc.so.6 (0x00007f04e0a74000) /lib64/ld-linux-x86-64.so.2 (0x00007f04e0c47000) [root@7a5f55352234 perf]# nm /tmp/build/perf/feature/test-gettid.bin | grep -w gettid U gettid@@GLIBC_2.30 [root@7a5f55352234 perf]# While on a fedora:29 system: [acme@quaco perf]$ grep gettid /tmp/build/perf/FEATURE-DUMP feature-gettid=0 [acme@quaco perf]$ cat /tmp/build/perf/feature/test-gettid.make.output test-gettid.c: In function ‘main’: test-gettid.c:8:9: error: implicit declaration of function ‘gettid’; did you mean ‘getgid’? [-Werror=implicit-function-declaration] return gettid(); ^~~~~~ getgid cc1: all warnings being treated as errors [acme@quaco perf]$ Reported-by: Laura Abbott Tested-by: Laura Abbott Acked-by: Jiri Olsa Cc: Adrian Hunter Cc: Florian Weimer Cc: Namhyung Kim Cc: Stephane Eranian Link: https://lkml.kernel.org/n/tip-yfy3ch53agmklwu9o7rlgf9c@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Greg Kroah-Hartman commit 45c4df37e6c920242e63f736e48fc42cf0d69298 Author: Arnaldo Carvalho de Melo Date: Wed Nov 21 17:42:00 2018 -0300 tools build feature: Check if eventfd() is available commit 11c6cbe706f218a8dc7e1f962f12b3a52ddd33a9 upstream. A new 'perf bench epoll' will use this, and to disable it for older systems, add a feature test for this API. This is just a simple program that if successfully compiled, means that the feature is present, at least at the library level, in a build that sets the output directory to /tmp/build/perf (using O=/tmp/build/perf), we end up with: $ ls -la /tmp/build/perf/feature/test-eventfd* -rwxrwxr-x. 1 acme acme 8176 Nov 21 15:58 /tmp/build/perf/feature/test-eventfd.bin -rw-rw-r--. 1 acme acme 588 Nov 21 15:58 /tmp/build/perf/feature/test-eventfd.d -rw-rw-r--. 1 acme acme 0 Nov 21 15:58 /tmp/build/perf/feature/test-eventfd.make.output $ ldd /tmp/build/perf/feature/test-eventfd.bin linux-vdso.so.1 (0x00007fff3bf3f000) libc.so.6 => /lib64/libc.so.6 (0x00007fa984061000) /lib64/ld-linux-x86-64.so.2 (0x00007fa984417000) $ grep eventfd -A 2 -B 2 /tmp/build/perf/FEATURE-DUMP feature-dwarf=1 feature-dwarf_getlocations=1 feature-eventfd=1 feature-fortify-source=1 feature-sync-compare-and-swap=1 $ The main thing here is that in the end we'll have -DHAVE_EVENTFD in CFLAGS, and then the 'perf bench' entry needing that API can be selectively pruned. Cc: Adrian Hunter Cc: Andrew Morton Cc: David Ahern Cc: Davidlohr Bueso Cc: Jason Baron Cc: Jiri Olsa Cc: Namhyung Kim Cc: Wang Nan Link: https://lkml.kernel.org/n/tip-wkeldwob7dpx6jvtuzl8164k@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo Cc: Guenter Roeck Signed-off-by: Greg Kroah-Hartman commit e95a27359babea91956331539849e730378e505f Author: Arnaldo Carvalho de Melo Date: Mon Nov 19 16:56:22 2018 -0300 tools build feature: Check if get_current_dir_name() is available commit 8feb8efef97a134933620071e0b6384cb3238b4e upstream. As the namespace support code will use this, which is not available in some non _GNU_SOURCE libraries such as Android's bionic used in my container build tests (r12b and r15c at the moment). Cc: Adrian Hunter Cc: David Ahern Cc: Jiri Olsa Cc: Namhyung Kim Cc: Wang Nan Link: https://lkml.kernel.org/n/tip-x56ypm940pwclwu45d7jfj47@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo Cc: Guenter Roeck Signed-off-by: Greg Kroah-Hartman commit 3c9decd4fc68d06e2b26e598ad26473a7e4aad09 Author: Jiri Olsa Date: Sun Jan 12 20:22:59 2020 +0100 perf tools: Use %define api.pure full instead of %pure-parser commit fc8c0a99223367b071c83711259d754b6bb7a379 upstream. bison deprecated the "%pure-parser" directive in favor of "%define api.pure full". The api.pure got introduced in bison 2.3 (Oct 2007), so it seems safe to use it without any version check. Signed-off-by: Jiri Olsa Cc: Adrian Hunter Cc: Clark Williams Cc: Jiri Olsa Cc: Namhyung Kim Cc: Ravi Bangoria Cc: Thomas Gleixner Link: http://lore.kernel.org/lkml/20200112192259.GA35080@krava Signed-off-by: Arnaldo Carvalho de Melo Cc: Guenter Roeck Signed-off-by: Greg Kroah-Hartman commit 866f290149fb46cbabe897fc64f569abed4b348e Author: Rafael J. Wysocki Date: Fri Mar 19 15:47:25 2021 +0100 Revert "PM: runtime: Update device status before letting suppliers suspend" commit 0cab893f409c53634d0d818fa414641cbcdb0dab upstream. Revert commit 44cc89f76464 ("PM: runtime: Update device status before letting suppliers suspend") that introduced a race condition into __rpm_callback() which allowed a concurrent rpm_resume() to run and resume the device prematurely after its status had been changed to RPM_SUSPENDED by __rpm_callback(). Fixes: 44cc89f76464 ("PM: runtime: Update device status before letting suppliers suspend") Link: https://lore.kernel.org/linux-pm/24dfb6fc-5d54-6ee2-9195-26428b7ecf8a@intel.com/ Reported-by: Adrian Hunter Cc: 4.10+ # 4.10+ Signed-off-by: Rafael J. Wysocki Reviewed-by: Ulf Hansson Signed-off-by: Greg Kroah-Hartman commit c49e70a5e7f24b8be2969e5ccf9ba6acd779abbb Author: Piotr Krysiuk Date: Tue Mar 16 09:47:02 2021 +0100 bpf: Prohibit alu ops for pointer types not defining ptr_limit commit f232326f6966cf2a1d1db7bc917a4ce5f9f55f76 upstream. The purpose of this patch is to streamline error propagation and in particular to propagate retrieve_ptr_limit() errors for pointer types that are not defining a ptr_limit such that register-based alu ops against these types can be rejected. The main rationale is that a gap has been identified by Piotr in the existing protection against speculatively out-of-bounds loads, for example, in case of ctx pointers, unprivileged programs can still perform pointer arithmetic. This can be abused to execute speculatively out-of-bounds loads without restrictions and thus extract contents of kernel memory. Fix this by rejecting unprivileged programs that attempt any pointer arithmetic on unprotected pointer types. The two affected ones are pointer to ctx as well as pointer to map. Field access to a modified ctx' pointer is rejected at a later point in time in the verifier, and 7c6967326267 ("bpf: Permit map_ptr arithmetic with opcode add and offset 0") only relevant for root-only use cases. Risk of unprivileged program breakage is considered very low. Fixes: 7c6967326267 ("bpf: Permit map_ptr arithmetic with opcode add and offset 0") Fixes: b2157399cc98 ("bpf: prevent out-of-bounds speculation") Signed-off-by: Piotr Krysiuk Co-developed-by: Daniel Borkmann Signed-off-by: Daniel Borkmann Acked-by: Alexei Starovoitov Signed-off-by: Greg Kroah-Hartman commit 4d8880bd8255e599d5f203036242fbe3948d70b4 Author: Florian Fainelli Date: Mon Feb 22 14:30:10 2021 -0800 net: dsa: b53: Support setting learning on port commit f9b3827ee66cfcf297d0acd6ecf33653a5f297ef upstream. Add support for being able to set the learning attribute on port, and make sure that the standalone ports start up with learning disabled. We can remove the code in bcm_sf2 that configured the ports learning attribute because we want the standalone ports to have learning disabled by default and port 7 cannot be bridged, so its learning attribute will not change past its initial configuration. Signed-off-by: Florian Fainelli Reviewed-by: Vladimir Oltean Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman commit b4aa37d9632de0c61fe0ea14569f9debf56d4e38 Author: Piotr Krysiuk Date: Tue Mar 16 09:47:02 2021 +0100 bpf: Add sanity check for upper ptr_limit commit 1b1597e64e1a610c7a96710fc4717158e98a08b3 upstream. Given we know the max possible value of ptr_limit at the time of retrieving the latter, add basic assertions, so that the verifier can bail out if anything looks odd and reject the program. Nothing triggered this so far, but it also does not hurt to have these. Signed-off-by: Piotr Krysiuk Co-developed-by: Daniel Borkmann Signed-off-by: Daniel Borkmann Acked-by: Alexei Starovoitov Signed-off-by: Greg Kroah-Hartman commit 59ce8e5ecf2dc409736f564d80dd25f7101739b3 Author: Piotr Krysiuk Date: Tue Mar 16 08:26:25 2021 +0100 bpf: Simplify alu_limit masking for pointer arithmetic commit b5871dca250cd391885218b99cc015aca1a51aea upstream. Instead of having the mov32 with aux->alu_limit - 1 immediate, move this operation to retrieve_ptr_limit() instead to simplify the logic and to allow for subsequent sanity boundary checks inside retrieve_ptr_limit(). This avoids in future that at the time of the verifier masking rewrite we'd run into an underflow which would not sign extend due to the nature of mov32 instruction. Signed-off-by: Piotr Krysiuk Co-developed-by: Daniel Borkmann Signed-off-by: Daniel Borkmann Acked-by: Alexei Starovoitov Signed-off-by: Greg Kroah-Hartman commit 92df5a174c6e7a3078e81b919c4d859d4f9f3f2e Author: Piotr Krysiuk Date: Tue Mar 16 08:20:16 2021 +0100 bpf: Fix off-by-one for area size in creating mask to left commit 10d2bb2e6b1d8c4576c56a748f697dbeb8388899 upstream. retrieve_ptr_limit() computes the ptr_limit for registers with stack and map_value type. ptr_limit is the size of the memory area that is still valid / in-bounds from the point of the current position and direction of the operation (add / sub). This size will later be used for masking the operation such that attempting out-of-bounds access in the speculative domain is redirected to remain within the bounds of the current map value. When masking to the right the size is correct, however, when masking to the left, the size is off-by-one which would lead to an incorrect mask and thus incorrect arithmetic operation in the non-speculative domain. Piotr found that if the resulting alu_limit value is zero, then the BPF_MOV32_IMM() from the fixup_bpf_calls() rewrite will end up loading 0xffffffff into AX instead of sign-extending to the full 64 bit range, and as a result, this allows abuse for executing speculatively out-of- bounds loads against 4GB window of address space and thus extracting the contents of kernel memory via side-channel. Fixes: 979d63d50c0c ("bpf: prevent out of bounds speculation on pointer arithmetic") Signed-off-by: Piotr Krysiuk Co-developed-by: Daniel Borkmann Signed-off-by: Daniel Borkmann Acked-by: Alexei Starovoitov Signed-off-by: Greg Kroah-Hartman commit 95080b446ad686c6e0c3d19a12cb5e6692ecb77a Author: Jan Kara Date: Wed Mar 17 17:44:14 2021 +0100 ext4: check journal inode extents more carefully commit ce9f24cccdc019229b70a5c15e2b09ad9c0ab5d1 upstream. Currently, system zones just track ranges of block, that are "important" fs metadata (bitmaps, group descriptors, journal blocks, etc.). This however complicates how extent tree (or indirect blocks) can be checked for inodes that actually track such metadata - currently the journal inode but arguably we should be treating quota files or resize inode similarly. We cannot run __ext4_ext_check() on such metadata inodes when loading their extents as that would immediately trigger the validity checks and so we just hack around that and special-case the journal inode. This however leads to a situation that a journal inode which has extent tree of depth at least one can have invalid extent tree that gets unnoticed until ext4_cache_extents() crashes. To overcome this limitation, track inode number each system zone belongs to (0 is used for zones not belonging to any inode). We can then verify inode number matches the expected one when verifying extent tree and thus avoid the false errors. With this there's no need to to special-case journal inode during extent tree checking anymore so remove it. Fixes: 0a944e8a6c66 ("ext4: don't perform block validity checks on the journal inode") Reported-by: Wolfgang Frisch Reviewed-by: Lukas Czerner Signed-off-by: Jan Kara Link: https://lore.kernel.org/r/20200728130437.7804-4-jack@suse.cz Signed-off-by: Theodore Ts'o Signed-off-by: Greg Kroah-Hartman commit 495729f8e3f719b25637d3618303ae8b8322d4ea Author: Jan Kara Date: Wed Mar 17 17:44:13 2021 +0100 ext4: don't allow overlapping system zones commit bf9a379d0980e7413d94cb18dac73db2bfc5f470 upstream. Currently, add_system_zone() just silently merges two added system zones that overlap. However the overlap should not happen and it generally suggests that some unrelated metadata overlap which indicates the fs is corrupted. We should have caught such problems earlier (e.g. in ext4_check_descriptors()) but add this check as another line of defense. In later patch we also use this for stricter checking of journal inode extent tree. Reviewed-by: Lukas Czerner Signed-off-by: Jan Kara Link: https://lore.kernel.org/r/20200728130437.7804-3-jack@suse.cz Signed-off-by: Theodore Ts'o Signed-off-by: Greg Kroah-Hartman commit da4933175692d1764f3e42027cc6cfeb79647f6c Author: Jan Kara Date: Wed Mar 17 17:44:12 2021 +0100 ext4: handle error of ext4_setup_system_zone() on remount commit d176b1f62f242ab259ff665a26fbac69db1aecba upstream. ext4_setup_system_zone() can fail. Handle the failure in ext4_remount(). Reviewed-by: Lukas Czerner Signed-off-by: Jan Kara Link: https://lore.kernel.org/r/20200728130437.7804-2-jack@suse.cz Signed-off-by: Theodore Ts'o Signed-off-by: Greg Kroah-Hartman