ChangeSet@1.1608.1.64, 2004-03-12 13:54:32-05:00, jgarzik@redhat.com
  Merge redhat.com:/spare/repo/linux-2.5 into redhat.com:/spare/repo/libata-2.5

ChangeSet@1.1608.76.43, 2004-03-12 13:52:04-05:00, jgarzik@redhat.com
  Add Promise SX8 (carmel) block driver.

ChangeSet@1.1608.81.2, 2004-03-12 13:02:56-05:00, jgarzik@redhat.com
  [wireless prism54] remove WIRELESS_EXT ifdefs

ChangeSet@1.1608.81.1, 2004-03-12 12:55:33-05:00, jgarzik@redhat.com
  [wireless] Add new Prism54 wireless driver.

ChangeSet@1.1608.78.99, 2004-03-12 09:08:58-08:00, torvalds@ppc970.osdl.org
  Revert attribute_used changes in module.h. They were wrong.

  Cset exclude: akpm@osdl.org|ChangeSet|20040312161945|47751

ChangeSet@1.1608.78.98, 2004-03-12 08:25:56-08:00, akpm@osdl.org
  [PATCH] slab: avoid higher-order allocations

  From: Manfred Spraul

  At present slab is using 2-order allocations for the size-2048 cache. Of course, this can affect networking quite seriously. The patch ensures that slab will never use more than a 1-order allocation for objects which have a size of less than 2*PAGE_SIZE.

ChangeSet@1.1608.78.97, 2004-03-12 08:25:47-08:00, akpm@osdl.org
  [PATCH] vmscan: add lru_to_page() helper

  From: Nick Piggin

  Add a little helper macro for a common list extraction operation in vmscan.c

ChangeSet@1.1608.78.96, 2004-03-12 08:25:36-08:00, akpm@osdl.org
  [PATCH] vm: balance inactive zone refill rates

  The current refill logic in refill_inactive_zone() takes an arbitrarily large number of pages and chops it down to SWAP_CLUSTER_MAX*4, regardless of the size of the zone. This has the effect of reducing the amount of refilling of large zones proportionately much more than of small zones.

  We made this change in May 2003 and I'm damned if I remember why. Let's put it back so we don't truncate the refill count and see what happens.

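The proportionality point in the refill entry above can be seen with a little arithmetic. This is only a sketch: toy_clamp_refill() and toy_percent_kept() are hypothetical names, and the cap of 128 used in the note below is an assumed illustrative value (SWAP_CLUSTER_MAX*4 with SWAP_CLUSTER_MAX taken to be 32), not something stated in the changelog.

```c
#include <assert.h>

/* Hypothetical stand-in for the old behaviour: chop the requested
 * refill count down to a fixed cap, whatever the zone size. */
static unsigned long toy_clamp_refill(unsigned long wanted, unsigned long cap)
{
	return wanted > cap ? cap : wanted;
}

/* Percentage of the requested refill that survives the clamp
 * (integer arithmetic, rounds down). */
static unsigned long toy_percent_kept(unsigned long wanted, unsigned long cap)
{
	assert(wanted != 0);
	return toy_clamp_refill(wanted, cap) * 100 / wanted;
}
```

With a cap of 128, a small zone asking for 200 pages keeps 64% of its request, while a large zone asking for 10000 pages keeps about 1%. That is the imbalance the entry describes.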
ChangeSet@1.1608.78.95, 2004-03-12 08:25:24-08:00, akpm@osdl.org
  [PATCH] fix vm-batch-inactive-scanning.patch

  - prevent nr_scan_inactive from going negative
  - compare `count' with SWAP_CLUSTER_MAX, not `max_scan'
  - Use ">= SWAP_CLUSTER_MAX", not "> SWAP_CLUSTER_MAX".

ChangeSet@1.1608.78.94, 2004-03-12 08:25:12-08:00, akpm@osdl.org
  [PATCH] vmscan: batch up inactive list scanning work

  From: Nick Piggin

  Use a "refill_counter" for inactive list scanning, similar to the one used for active list scanning. This batches up scanning now that we precisely balance ratios, and don't round up the amount to be done.

  No observed benefits, but I imagine it would lower the acquisition frequency of the lru locks in some cases, and make codepaths more efficient in general due to cache niceness.

ChangeSet@1.1608.78.93, 2004-03-12 08:25:03-08:00, akpm@osdl.org
  [PATCH] vmscan: less throttling of page allocators and kswapd

  This is just a random unsubstantiated tuning tweak: don't immediately throttle page allocators and kswapd when the going is getting heavier: scan a bit more of the LRU before throttling.

ChangeSet@1.1608.78.92, 2004-03-12 08:24:52-08:00, akpm@osdl.org
  [PATCH] fix the kswapd zone scanning algorithm

  This removes a vestige of the old algorithm. We don't want to skip zones if all_zones_ok is true: we've already precalculated which zones need scanning and this just stops us from ever performing kswapd reclaim from the DMA zone.

ChangeSet@1.1608.78.91, 2004-03-12 08:24:40-08:00, akpm@osdl.org
  [PATCH] kswapd: fix lumpy page reclaim

  As kswapd is now scanning zones in the highmem->normal->dma direction it can get into competition with the page allocator: kswapd keeps on trying to free pages from highmem, then moves on to lowmem. By the time kswapd has done proportional scanning in lowmem, someone has come in and allocated a few pages from highmem. So kswapd goes back and frees some highmem, then some lowmem again. But nobody has allocated any lowmem yet.

  So we keep on and on scanning lowmem in response to highmem page allocations.

  With a simple `dd' on a 1G box we get:

   r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy wa id
   0  3      0  59340   4628 922348    0    0     4 28188 1072   808  0 10 46 44
   0  3      0  29932   4660 951760    0    0     0 30752 1078   441  1  6 30 64
   0  3      0  57568   4556 924052    0    0     0 30748 1075   478  0  8 43 49
   0  3      0  29664   4584 952176    0    0     0 30752 1075   472  0  6 34 60
   0  3      0   5304   4620 976280    0    0     4 40484 1073   456  1  7 52 41
   0  3      0 104856   4508 877112    0    0     0 18452 1074    97  0  7 67 26
   0  3      0  70768   4540 911488    0    0     0 35876 1078   746  0  7 34 59
   1  2      0  42544   4568 939680    0    0     0 21524 1073   556  0  5 43 51
   0  3      0   5520   4608 976428    0    0     4 37924 1076   836  0  7 41 51
   0  2      0   4848   4632 976812    0    0    32 12308 1092    94  0  1 33 66

  Simple fix: go back to scanning the zones in the dma->normal->highmem direction so we meet the page allocator in the middle somewhere.

   r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy wa id
   1  3      0   5152   3468 976548    0    0     4 37924 1071   650  0  8 64 28
   1  2      0   4888   3496 976588    0    0     0 23576 1075   726  0  6 66 27
   0  3      0   5336   3532 976348    0    0     0 31264 1072   708  0  8 60 32
   0  3      0   6168   3560 975504    0    0     0 40992 1072   683  0  6 63 31
   0  3      0   4560   3580 976844    0    0     0 18448 1073   233  0  4 59 37
   0  3      0   5840   3624 975712    0    0     4 26660 1072   800  1  8 46 45
   0  3      0   4816   3648 976640    0    0     0 40992 1073   526  0  6 47 47
   0  3      0   5456   3672 976072    0    0     0 19984 1070   320  0  5 60 35

ChangeSet@1.1608.78.90, 2004-03-12 08:24:29-08:00, akpm@osdl.org
  [PATCH] kswapd: avoid unnecessary reclaiming from higher zones

  Currently kswapd walks across all zones in dma->normal->highmem order, performing proportional scanning until all zones are OK. This means that pressure against ZONE_NORMAL causes unnecessary reclaim of ZONE_HIGHMEM.

  To fix that up we change kswapd so that it walks the zones in the high->normal->dma direction, skipping zones which are OK. Once it encounters a zone which needs some reclaim kswapd will perform proportional scanning against that zone as well as all the succeeding lower zones.

  We scan the lower zones even if they have sufficient free pages. This is because

  a) the lower zone may be above pages_high, but because of the incremental min, the lower zone may still not be eligible for allocations. That's bad because cache in that lower zone will then not be scanned at the correct rate.

  b) pages in this lower zone are usable for allocations against the higher zone. So we do want to scan all the relevant zones at an equal rate.

ChangeSet@1.1608.78.89, 2004-03-12 08:24:20-08:00, akpm@osdl.org
  [PATCH] vmscan: avoid bogus throttling

  - If max_scan evaluates to zero due to a very small inactive list and high `priority' numbers, we don't want to throttle yet.
  - In balance_pgdat(), we may end up not scanning any pages because all zones happened to be above pages_high. Avoid throttling in this case too.

ChangeSet@1.1608.78.88, 2004-03-12 08:24:10-08:00, akpm@osdl.org
  [PATCH] Balance inter-zone scan rates

  When page reclaim is working out how many pages to scan in a zone (max_scan) it presently rounds that number up if it looks too small, for work batching.

  Problem is, this can result in excessive scanning against small zones which have few inactive pages. So remove it.

  Note that it is possible for max_scan to be zero. That's OK: it'll become non-zero as the priority increases.

ChangeSet@1.1608.78.87, 2004-03-12 08:24:01-08:00, akpm@osdl.org
  [PATCH] vmscan: drive everything via nr_to_scan

  Page reclaim is currently a bit schizo: sometimes we say "go and scan this many pages and tell me how many pages were freed" and at other times we say "go and scan this many pages, but stop if you freed this many".

  It makes the logic harder to control and to understand. This patch converts everything into the "go and scan this many pages and tell me how many pages were freed" model.

  It doesn't seem to affect performance much either way.

ChangeSet@1.1608.78.86, 2004-03-12 08:23:50-08:00, akpm@osdl.org
  [PATCH] vmscan: zone balancing fix

  We currently have a problem with the balancing of reclaim between zones: much more reclaim happens against highmem than against lowmem.

  This patch partially fixes this by changing the direct reclaim path so it does not bail out of the zone walk after having reclaimed sufficient pages from highmem: go on to reclaim from lowmem regardless of how many pages we reclaimed from highmem.

ChangeSet@1.1608.78.85, 2004-03-12 08:23:38-08:00, akpm@osdl.org
  [PATCH] vm: scan slab in response to highmem scanning

  The patch which went in six months or so back which said "only reclaim slab if we're scanning lowmem pagecache" was wrong. I must have been asleep at the time.

  We do need to scan slab in response to highmem page reclaim as well. Because all the math is based around the total amount of memory in the machine, and we know that if we're performing highmem page reclaim then the lower zones have no free memory.

ChangeSet@1.1608.78.84, 2004-03-12 08:23:28-08:00, akpm@osdl.org
  [PATCH] vmscan: fix calculation of number of pages scanned

  From: Nick Piggin

  The logic which calculates the number of pages which were scanned is mucked up. Fix.

ChangeSet@1.1608.78.83, 2004-03-12 08:23:19-08:00, akpm@osdl.org
  [PATCH] vm: shrink slab evenly in try_to_free_pages()

  From: Nick Piggin

  In try_to_free_pages(), put even pressure on the slab even if we have reclaimed enough pages from the LRU.

ChangeSet@1.1608.78.82, 2004-03-12 08:23:10-08:00, akpm@osdl.org
  [PATCH] shrink_slab: math precision fix

  From: Nick Piggin

  In shrink_slab(), do the multiply before the divide to avoid losing precision.

ChangeSet@1.1608.78.81, 2004-03-12 08:23:01-08:00, akpm@osdl.org
  [PATCH] vmscan: preserve page referenced info in refill_inactive()

  From: Nick Piggin

  If refill_inactive_zone() is running in its dont-reclaim-mapped-memory mode we are tossing away the referenced information on active mapped pages.

  So put that info back if we're not going to deactivate the page.

ChangeSet@1.1608.78.80, 2004-03-12 08:22:50-08:00, akpm@osdl.org
  [PATCH] kswapd throttling fixes

  The logic in balance_pgdat() is all bollixed up.

  - the incoming arg `nr_pages' should be used to determine if we're being asked to free a specific number of pages, not `to_free'.
  - local variable `to_free' is not appropriate for the determination of whether we failed to bring all zones to appropriate free pages levels.

  Fix this by correctly calculating `all_zones_ok' and then use all_zones_ok to determine whether we need to throttle kswapd.

  So the logic now is:

	for (increasing priority) {

		all_zones_ok = 1;

		for (all zones) {

			to_reclaim = number of pages to try to reclaim
				     from this zone;
			max_scan = number of pages to scan in this pass
				   (gets larger as `priority' decreases)
			/*
			 * set `reclaimed' to the number of pages which were
			 * actually freed up
			 */
			reclaimed = scan(max_scan pages);
			reclaimed += shrink_slab();

			to_free -= reclaimed;	/* for the `nr_pages>0' case */

			/*
			 * If this scan failed to reclaim `to_reclaim' or more
			 * pages, we're getting into trouble.  Need to scan
			 * some more, and throttle kswapd.  Note that this
			 * zone may now have sufficient free pages due to
			 * freeing activity by some other process.  That's
			 * OK - we'll pick that info up on the next pass
			 * through the loop.
			 */
			if (reclaimed < to_reclaim)
				all_zones_ok = 0;
		}
		if (to_free > 0)
			continue;	/* swsusp: need to do more work */
		if (all_zones_ok)
			break;		/* kswapd is done */
		/*
		 * OK, kswapd is getting into trouble.  Take a nap, then take
		 * another pass across the zones.
		 */
		blk_congestion_wait();
	}

ChangeSet@1.1608.78.79, 2004-03-12 08:22:39-08:00, akpm@osdl.org
  [PATCH] mm/vmscan.c: remove unused priority argument.

  From: Nikita Danilov

  Now that decision to reclaim mapped memory is taken on the basis of zone->prev_priority, priority argument is no longer needed.

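The "shrink_slab: math precision fix" entry further up (ChangeSet@1.1608.78.82) comes down to integer rounding. A minimal sketch, assuming a proportional calculation of the general form scanned/total*objects; the function and parameter names are hypothetical and are not taken from mm/vmscan.c:

```c
#include <assert.h>

/* Hypothetical helper: divide first.  scanned / total truncates to 0
 * whenever scanned < total, so the whole result collapses to 0. */
static unsigned long long share_divide_first(unsigned long long scanned,
					     unsigned long long total,
					     unsigned long long objects)
{
	assert(total != 0);
	return (scanned / total) * objects;
}

/* Hypothetical helper: multiply first, so precision is kept until the
 * final division.  The caller must make sure scanned * objects cannot
 * overflow 64 bits. */
static unsigned long long share_multiply_first(unsigned long long scanned,
					       unsigned long long total,
					       unsigned long long objects)
{
	assert(total != 0);
	return (scanned * objects) / total;
}
```

For scanned=3, total=1000, objects=500 the divide-first form yields 0 while the multiply-first form yields 1, so small amounts of scanning still translate into some slab pressure.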
ChangeSet@1.1608.78.78, 2004-03-12 08:22:27-08:00, akpm@osdl.org
  [PATCH] Narrow blk_congestion_wait races

  From: Nick Piggin

  The addition of the smp_mb and the other change is to try to close the window for races a bit. Obviously they can still happen, it's a racy interface and it doesn't matter much.

ChangeSet@1.1608.78.77, 2004-03-12 08:22:15-08:00, akpm@osdl.org
  [PATCH] return remaining jiffies from blk_congestion_wait()

  Teach blk_congestion_wait() to return the number of jiffies remaining. This is for debug, but it is also nicely consistent.

ChangeSet@1.1608.78.76, 2004-03-12 08:22:06-08:00, akpm@osdl.org
  [PATCH] vm: per-zone vmscan instrumentation

  To check on zone balancing, split the /proc/vmstat pgsteal, pgreclaim, pgalloc and pgscan stats into per-zone counters.

  Additionally, split the pgscan stats into pgscan_direct and pgscan_kswapd to see who's doing how much scanning.

  And add a metric for the number of slab objects which were scanned.

ChangeSet@1.1608.78.75, 2004-03-12 08:21:56-08:00, akpm@osdl.org
  [PATCH] synclink.c update

  From: Paul Fulghum

  * Track driver API changes
  * Remove cast (kernel janitor)

ChangeSet@1.1608.78.74, 2004-03-12 08:21:45-08:00, akpm@osdl.org
  [PATCH] synclink_cs.c update

  From: Paul Fulghum

  * Track driver API changes
  * Remove cast (kernel janitor)

ChangeSet@1.1608.78.73, 2004-03-12 08:21:34-08:00, akpm@osdl.org
  [PATCH] synclinkmp.c update

  From: Paul Fulghum

  Patch for synclinkmp.c

  * Track driver API changes
  * Remove cast (kernel janitor)
  * Replace page_free call with kfree (to match kmalloc allocation)

ChangeSet@1.1608.78.72, 2004-03-12 08:21:24-08:00, akpm@osdl.org
  [PATCH] Add barriers to avoid race in mempool_alloc/free

  From: Chris Mason

  mempool_alloc() and mempool_free() check pool->curr_nr without any locks held. This can lead to skipping a wakeup when there are people waiting, and sleeping when there are free elements in the pool.

  I can't trigger this reliably, but sooner or later someone on ppc is probably going to hit it.

ChangeSet@1.1608.78.71, 2004-03-12 08:21:15-08:00, akpm@osdl.org
  [PATCH] m68k: interrupt management cleanups

  From: Geert Uytterhoeven

  M68k interrupt management: rename routines to not confuse them with syscalls

  - sys_{request,free}_irq() -> cpu_{request,free}_irq()
  - q40_sys_default_handler[] -> q40_default_handler
  - sys_default_handler() -> default_handler()

ChangeSet@1.1608.78.70, 2004-03-12 08:21:04-08:00, akpm@osdl.org
  [PATCH] m68k: Macintosh IDE fixes

  From: Geert Uytterhoeven

  Mac IDE: Make sure the core IDE driver doesn't try to request the MMIO ports a second time, since this will fail.

ChangeSet@1.1608.78.69, 2004-03-12 08:20:52-08:00, akpm@osdl.org
  [PATCH] Apollo fb sysfsification

  From: Geert Uytterhoeven

  Apollo fb: Add sysfs support (from James Simmons)

ChangeSet@1.1608.78.68, 2004-03-12 08:20:42-08:00, akpm@osdl.org
  [PATCH] m68k: Amiga Framemaster II fb sysfsification

  From: Geert Uytterhoeven

  Amiga Framemaster II fb: Add sysfs support (from James Simmons)

ChangeSet@1.1608.78.67, 2004-03-12 08:20:33-08:00, akpm@osdl.org
  [PATCH] m68k: __test_and_set_bit()

  From: Geert Uytterhoeven

  Add missing implementation for non-atomic __test_and_set_bit()

ChangeSet@1.1608.78.66, 2004-03-12 08:20:22-08:00, akpm@osdl.org
  [PATCH] fbdev: monitor detection fixes

  From: James Simmons, Kronos

  Various fixes and enhancements to the monitor hardware detection code. The only driver that uses it is the radeon driver.

  The old EDID parsing code was very verbose; half of the patch addresses this (i.e. print lots of stuff iff DEBUG). The other big change is the FB_MODE_IS_* stuff: we really need a way to know the origin of a video mode. In this way we can select a video mode that comes from EDID instead of VESA or GTF.

  Drivers other than radeonfb won't be affected because they cannot (yet) get EDID from the monitor and don't use EDID related code.

ChangeSet@1.1608.78.65, 2004-03-12 08:20:08-08:00, akpm@osdl.org
  [PATCH] Fix NULL pointer dereference in blkmtd.c

  From: Michel Marti

  The blkmtd driver oopses in add_device(). The following trivial patch fixes this.

ChangeSet@1.1608.78.64, 2004-03-12 08:19:57-08:00, akpm@osdl.org
  [PATCH] fix raid0 readahead size

  From: Arjan van de Ven

  Readahead of raid0 was suboptimal; it read only 1 stride ahead. The problem with this is that while it will keep all spindles busy, it will not actually manage to make larger IO's, e.g. each disk would just do the chunk size IO. Doing at least 2 chunks is more than appropriate so that each spindle will get a chance to merge IO's.

  (Neil fixed raid6 and raid6 too)

ChangeSet@1.1608.78.63, 2004-03-12 08:19:45-08:00, akpm@osdl.org
  [PATCH] module.h __attribute_used__ fix

  From: Rusty Russell

  Someone added __attribute_used__ throughout module.h, but didn't remove the ", unused". Looks like some arch/gcc combos still consider it unused, and discard the fn.

ChangeSet@1.1608.78.62, 2004-03-12 08:19:34-08:00, akpm@osdl.org
  [PATCH] Fix CONFIG_NVRAM dependencies

  From: Geert Uytterhoeven

  Make CONFIG_NVRAM depend on the prerequisites that are explicitly checked for in drivers/char/nvram.c, or on CONFIG_GENERIC_NVRAM (for PPC).

ChangeSet@1.1608.78.61, 2004-03-12 08:19:26-08:00, akpm@osdl.org
  [PATCH] Applicom warning

  From: Geert Uytterhoeven

  Add missing include (needed for struct inode)

ChangeSet@1.1608.78.60, 2004-03-12 08:19:15-08:00, akpm@osdl.org
  [PATCH] Disable Macintosh device drivers for all but PPC || MAC

  From: Marc-Christian Petersen

  The attached patch is needed to stop showing us "Macintosh device drivers" for all architectures via menuconfig || xconfig || gconfig. It's only necessary for PPC and/or MAC.

  ACKed by benh.

ChangeSet@1.1608.78.59, 2004-03-12 08:19:04-08:00, akpm@osdl.org
  [PATCH] add nowarn to a few pte chain allocators

  From: Arjan van de Ven

  Several of the pte_chain_alloc() allocators that use GFP_ATOMIC have a fallback for failure that sleeps; they thus need to not warn on failure. Seen during a big fork on a busy system.

ChangeSet@1.1608.78.58, 2004-03-12 08:18:53-08:00, akpm@osdl.org
  [PATCH] cciss: init section fix

  From: "Randy.Dunlap"

  cciss_scsi_detect() can be called after init (for TAPE support).

ChangeSet@1.1608.78.57, 2004-03-12 08:18:43-08:00, akpm@osdl.org
  [PATCH] EDD: Get Legacy Parameters

  From: Matt Domsch

  Patch below from Patrick J. LoPresti and myself. Patrick describes:

  Why this patch?

  The problem is that the legacy BIOS interface (INT13/AH=08) for querying the disk geometry returns different values than the extended INT13 interface which the EDD code currently uses. This is because the legacy interface only provides a 10-bit cylinder field, so modern BIOSes "lie" about the head/sector counts in order to make more of the disk visible within the first 1024 cylinders.

  Many non-Linux applications, including the stock Windows boot loader, DOS fdisk, etc., rely upon the legacy interface and geometry. So it is useful to be able to obtain the legacy values from a running Linux kernel.

  What this patch does is to add new entries under /sys/firmware/edd/int13_devXX named "legacy_cylinders", "legacy_heads", and "legacy_sectors". These provide the geometry given by the legacy INT13/AH=08 BIOS interface, just like the current "default_cylinders" etc. provide the geometry given by the INT13/AH=48 interface.

  Without this patch, I cannot use Linux to partition a drive and install Windows, which happens to be my application.

   - Pat
     http://unattended.sourceforge.net/

  In addition, this adds two buggy BIOS workarounds in the EDD int13 calls as suggested by Ralf Brown's interrupt list.

  I'm also interested in moving this code out of arch/i386/kernel/edd.c and include/asm-i386/edd.h, as I believe it is applicable on x86-64 as well. However, there's no good place under drivers/ to put edd.c when it's not tied to a bus, but to several CPU architectures and their firmwares... Maybe a new directory drivers/firmware?

ChangeSet@1.1608.78.56, 2004-03-12 08:18:34-08:00, akpm@osdl.org
  [PATCH] wavfront.c needs syscalls.h

  sound/oss/wavfront.c: In function `wavefront_download_firmware':
  sound/oss/wavfront.c:2524: warning: implicit declaration of function `sys_open'
  sound/oss/wavfront.c:2533: warning: implicit declaration of function `sys_read'
  sound/oss/wavfront.c:2582: warning: implicit declaration of function `sys_close'

ChangeSet@1.1608.78.55, 2004-03-12 08:18:23-08:00, akpm@osdl.org
  [PATCH] Fix reading the last block on a bdev

  From: Chris Mason

  This patch fixes a problem we're hitting on ia64 with page sizes > 4k. When the page size is greater than the block size, and parts of the page fall past the end of the device, readpage will fail because blkdev_get_block returns -EIO for blocks past i_size.

  The attached patch changes blkdev_get_block to return holes when reading past the end of the device, which allows us to read that last valid 4k block and then fill the rest of the page with zeros. Writes will still fail with -EIO.

ChangeSet@1.1608.78.54, 2004-03-12 08:18:12-08:00, akpm@osdl.org
  [PATCH] Fix rootfs on ramdisk

  From: vda

  Add a missing test for the "root=/dev/ram" kernel boot option. It's just an alias for /dev/ram0, but it worked in 2.4...

ChangeSet@1.1608.78.53, 2004-03-12 08:18:00-08:00, akpm@osdl.org
  [PATCH] current_is_keventd() speedup

  From: Srivatsa Vaddagiri

  current_is_keventd() doesn't need to search across all the CPUs to identify itself.

ChangeSet@1.1608.78.52, 2004-03-12 08:17:51-08:00, akpm@osdl.org
  [PATCH] Fix and harden validate_mm

  From: Andi Kleen

  I was debugging some code that corrupted the vma rb lists and for that I fixed validate_mm to not be recursive and do some more checks. It's slower now, but that shouldn't be a problem. Also make it non static to allow easier checks elsewhere.

ChangeSet@1.1608.78.51, 2004-03-12 08:17:41-08:00, akpm@osdl.org
  [PATCH] fadvise(POSIX_FADV_DONTNEED) fixups

  From: WU Fengguang

  - In sys_fadvise64_64(): if the start and/or end offsets do not fall on page boundaries, preserve the partial pages. The thinking here is that it is better to preserve needed memory than to not shoot down unneeded memory.
  - In invalidate_mapping_pages(): we were invalidating an entire pagevec's worth of pages each time around, even if that went beyond the part of the file which the caller asked to be invalidated. Fix that up.

ChangeSet@1.1608.78.50, 2004-03-12 08:17:30-08:00, akpm@osdl.org
  [PATCH] AMD ELAN Kconfig fix

  From: Adrian Bunk

  - remove an MELAN entry that was forgotten in the i386 processor selection menu
  - s/CONFIG_MELAN/CONFIG_X86_ELAN/ was missing in module.h

ChangeSet@1.1608.78.49, 2004-03-12 08:17:21-08:00, akpm@osdl.org
  [PATCH] watchdog: moduleparam-patches

  From: Wim Van Sebroeck

  Convert last set of watchdog drivers to new moduleparam system.

ChangeSet@1.1608.78.48, 2004-03-12 08:17:10-08:00, akpm@osdl.org
  [PATCH] Remove arbitrary #acl entries limits on ext[23] when reading

  From: Andreas Gruenbacher

  Remove the arbitrary limit of 32 ACL entries on ext[23] when reading from disk. This change is backward compatible; we need to have this change in to be able to also allow writing big ACLs.

  The second patch that removes the ACL entry limit for writes is not included. I don't want to push that patch now, because large ACLs would cause 2.4 and current 2.6 kernels to fail. My plan is to remove the second limit later, in a half-year or year or so.

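The partial-page rule in the fadvise(POSIX_FADV_DONTNEED) entry above (ChangeSet@1.1608.78.51) can be illustrated with a little index arithmetic. This is a sketch only: TOY_PAGE_SIZE, struct page_range and full_pages() are hypothetical names, the byte range is assumed to be half-open [start, end), and none of it is copied from sys_fadvise64_64().

```c
#include <assert.h>

#define TOY_PAGE_SIZE 4096ULL	/* assumed page size, for illustration only */

/* A run of whole pages: index of the first page and how many follow. */
struct page_range {
	unsigned long long first;
	unsigned long long count;
};

/*
 * Return only the pages that lie entirely inside [start, end).
 * The start is rounded UP and the end is rounded DOWN, so a page that
 * is only partly covered at either edge is preserved.
 * Assumes start + TOY_PAGE_SIZE - 1 does not overflow.
 */
static struct page_range full_pages(unsigned long long start,
				    unsigned long long end)
{
	struct page_range r = { 0, 0 };
	unsigned long long first, past_last;

	assert(start <= end);
	first = (start + TOY_PAGE_SIZE - 1) / TOY_PAGE_SIZE;
	past_last = end / TOY_PAGE_SIZE;

	r.first = first;
	if (past_last > first)
		r.count = past_last - first;
	return r;
}
```

A range from byte 100 to byte 12388 covers pages 0 through 3 in part, but only pages 1 and 2 in full, so only those two would be dropped in this model.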
ChangeSet@1.1608.78.47, 2004-03-12 08:16:59-08:00, akpm@osdl.org
  [PATCH] Enable i810 fb on x86-64

  From: Andi Kleen

  i810fb most likely is needed on x86-64 too because there are Intel chipsets for it now. So far it was only linked on i386; fix this.

ChangeSet@1.1608.78.46, 2004-03-12 08:16:50-08:00, akpm@osdl.org
  [PATCH] /proc data corruption check

  From: Arjan van de Ven

  If someone removes a /proc directory which still has subdirectories it will lead to very nasty things (dentries remaining on hash chains etc etc etc). The BUG_ON in the patch below will catch this nasty situation.

ChangeSet@1.1608.78.45, 2004-03-12 08:16:39-08:00, akpm@osdl.org
  [PATCH] Remove unneeded unlock in ipc/sem.c

  From: Manfred Spraul

  sem_revalidate checks that a semaphore array didn't disappear while the code was running without the semaphore array spinlock. If the array disappeared, then it will return without holding a lock. find_undo calls sem_revalidate and then sem_unlock, even if sem_revalidate failed. The sem_unlock call must be removed.

  Mingming Cao reported a spinlock deadlock with sysv semaphores. A superfluous unlock doesn't explain the deadlock, but it's obviously a bug.

ChangeSet@1.1608.78.44, 2004-03-12 08:16:28-08:00, akpm@osdl.org
  [PATCH] kbuild: fix usage with directories containing '.o'

  From: Sam Ravnborg
  From: Daniel Mack, me

  modpost unconditionally searched for ".o" assuming this is always the suffix of the module. This fails in two cases:

  a) when building external modules where any directory includes ".o" in the name. One example is a directory named: .../cvs.alsa.org/...

  b) when someone names a kernel directory so it contains ".o". One example is drivers/scsi/aic.ok/...

  Case b) was triggered by renaming the directory for aic7xxx, and modifying Makefile and Kconfig. This caused make modules to fail.

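The failure mode in the modpost entry above is easy to reproduce in a few lines. The sketch below is not modpost code: both helper names are hypothetical, and they only contrast "first occurrence of .o anywhere in the path" with "path ends in .o".

```c
#include <string.h>

/* Hypothetical naive version: cut at the FIRST ".o" found anywhere in
 * the path.  Returns the length of the prefix, or -1 if ".o" is absent. */
static int stem_len_first_match(const char *path)
{
	const char *p = strstr(path, ".o");

	return p ? (int)(p - path) : -1;
}

/* Hypothetical fixed version: only accept ".o" as the trailing suffix.
 * Returns the length without the suffix, or -1 if the path does not
 * end in ".o". */
static int stem_len_suffix(const char *path)
{
	size_t n = strlen(path);

	if (n >= 2 && strcmp(path + n - 2, ".o") == 0)
		return (int)(n - 2);
	return -1;
}
```

For a path such as cvs.alsa.org/snd.o, the first-match helper stops inside ".org" and returns 8, while the suffix helper returns 16, the length of cvs.alsa.org/snd.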
ChangeSet@1.1608.78.43, 2004-03-12 08:16:17-08:00, akpm@osdl.org
  [PATCH] loop setup race fix

  From: Chris Mason

  There's a race in loopback setup; it's easiest to trigger with one or more procs doing loopback mounts at the same time. The problem is that fs/block_dev.c:do_open() only calls bd_set_size on the first open.

  Picture two procs:

  proc1: mount -o loop file1 mnt1
  proc2: mount -o loop file2 mnt2

  proc1                                   proc2
  open /dev/loop0 # bd_openers now 1
  do_open
    bd_set_size(bdev, 0) # loop unbound, so bdev size is 0
                                          open /dev/loop0 # bd_openers now 2
  loop_set_fd # disk capacity now correct, but
              # bdev not updated
  mount /dev/loop0 /mnt
    do_open

  Because bd_openers != 0 for the last do_open, bd_set_size is not called again and a size of 0 is used. This eventually leads to an oops when the loop device is unmounted, because fsync_bdev calls block_write_full_page, which decides every page on the block device is outside i_size and unmaps them.

  When ext2 or reiserfs try to sync a metadata buffer, we get an oops because the buffers are no longer mapped. The patch below changes loop_set_fd and loop_clr_fd to also manipulate the size of the block device, which fixes things for me.

ChangeSet@1.1608.78.42, 2004-03-12 08:16:06-08:00, akpm@osdl.org
  [PATCH] LOOP_CHANGE_FD ioctl

  From: Arjan van de Ven

  The patch below (written by Al Viro) solves a nasty chicken-and-egg issue for operating system installers (well, at least anaconda, but the problem domain is not exclusive to that).

  The basic problem is this:

  - The small first stage installer locates the image file of the second stage installer (which has X and all the graphical stuff); this image can be on the same CD, but it can come via NFS, http or ftp or ... as well.
  - The first stage installer loop-back mounts this image and gives control to the second stage installer by calling some binary there.
  - The graphical installer then asks the user all those questions and starts installing packages.
    Again the packages can come from the CD but also from NFS or http or ...

  Now in case of a CD install, once all requested packages from the first CD are installed, the installer wants to unmount and eject the CD and prompt the user to put CD 2 in... EXCEPT that the unmount can't work since the installer is actually running from a loopback mount of this CD.

  The solution is a "LOOP_CHANGE_FD" ioctl, where basically the installer copies the image to the harddisk (which can only be done late, since the target harddisk is only mkfs'd late) and then magically switches the backing store FD from underneath the loop device to the one on the target harddisk (and thus unbusying the CD mount).

  This is obviously only allowed if the size of the new image is identical and if the loop image is read-only in the first place. It's the responsibility of root to make sure the contents are the same (but that's of the give-root-enough-rope kind).

ChangeSet@1.1608.78.41, 2004-03-12 08:15:54-08:00, akpm@osdl.org
  [PATCH] kbuild: Cause `make clean' to remove more files

  From: Sam Ravnborg

  Make the difference between 'make clean' and 'make distclean/mrproper' more explicit.

  make clean now removes all generated files except .config* and .version. The result is much easier to understand now.

  make clean deletes all generated files (except .config* and .version).
  make mrproper deletes configuration and all temporary files left by patch, editors and the like.

  Example output:

  > make mrproper
    CLEAN   init
    CLEAN   usr
    CLEAN   scripts/kconfig
    CLEAN   scripts
    CLEAN   .tmp_versions include/config
    CLEAN   include/asm-i386/asm_offsets.h include/linux/autoconf.h include/linux/version.h include/asm .tmp_versions
    CLEAN   .version .config

  From the list of files/directories deleted during make clean, removed all references that are no longer relevant for the current kernel.

ChangeSet@1.1608.78.40, 2004-03-12 08:15:45-08:00, akpm@osdl.org
  [PATCH] Fix elf mapping of the zero page

  From: William Lee Irwin III

  Use PAGE_SIZE rather than 4096 so that mmap() granularity is honored by whatever non-i386 architectures use MMAP_PAGE_ZERO.

ChangeSet@1.1608.78.39, 2004-03-12 08:15:34-08:00, akpm@osdl.org
  [PATCH] compiler.h scoping fixes

  From: Ville Nuorvala

  There are a few kernel-only things in compiler.h which should have been placed inside __KERNEL__.

ChangeSet@1.1608.78.38, 2004-03-12 08:15:19-08:00, akpm@osdl.org
  [PATCH] Redundant unplug_timer deletion

  From: "Chen, Kenneth W"

  The only path to get to the del_timer call in __generic_unplug_device() is when blk_remove_plug() returns 1, and in that case it has already removed the unplug_timer.

  Patch to remove this redundant call.

ChangeSet@1.1608.78.37, 2004-03-12 08:15:08-08:00, akpm@osdl.org
  [PATCH] NUMA-aware zonelist builder

  From:

  The attached patch is a NUMA-aware zonelist builder patch, which sorts the zonelist so that near nodes come first and far nodes last. In lse-tech and linux-ia64, where most of the NUMA people reside, no objections have been raised so far.

  The patch adds a NUMA-specific version of build_zonelists which calls find_next_best_node to select the next-nearest node to add to the zonelist.

  The patch has no effect on flat NUMA platforms.

ChangeSet@1.1608.78.36, 2004-03-12 08:14:55-08:00, akpm@osdl.org
  [PATCH] kbuild: Remove CFLAGS assignment in i386/mach-*/Makefile

  From: Sam Ravnborg

  The EXTRA_CFLAGS assignments in the following files are a left-over from the early 2.5 days where the source was not compiled from the root of the source tree. Removing these wrong assignments fixes http://bugme.osdl.org/show_bug.cgi?id=2210

  A script named 'kernel' in the .. directory no longer halts compilation.

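The nearest-first ordering described in the NUMA-aware zonelist builder entry above (ChangeSet@1.1608.78.37) can be sketched as a repeated "pick the closest unused node" loop. This is a toy model under stated assumptions: toy_build_node_order(), TOY_MAX_NODES and the flat distance array are hypothetical, and the real find_next_best_node may weigh other factors that the changelog does not describe.

```c
#include <assert.h>

#define TOY_MAX_NODES 8		/* hypothetical limit for this sketch */

/*
 * distance[n] is the (assumed) distance from the local node to node n.
 * Fill order[] with all node ids, nearest first.  Ties go to the
 * lower-numbered node because the comparison is strict.
 */
static void toy_build_node_order(const int *distance, int nr_nodes, int *order)
{
	int used[TOY_MAX_NODES] = { 0 };
	int i, n;

	assert(nr_nodes > 0 && nr_nodes <= TOY_MAX_NODES);

	for (i = 0; i < nr_nodes; i++) {
		int best = -1;

		/* Select the next-nearest node that is not yet listed. */
		for (n = 0; n < nr_nodes; n++) {
			if (used[n])
				continue;
			if (best < 0 || distance[n] < distance[best])
				best = n;
		}
		used[best] = 1;
		order[i] = best;
	}
}
```

With distances {20, 10, 40, 20} as seen from node 1, the resulting order is 1, 0, 3, 2: the local node first, then the two equally distant neighbours, then the farthest node.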
ChangeSet@1.1608.78.35, 2004-03-12 08:14:43-08:00, akpm@osdl.org
  [PATCH] UDF filesystem update

  From: Ben Fennema

  - added udf 2.5 #defines
  - fixed prealloc discard race
  - fixed several bugs in inode_getblk
  - added S_IFSOCK support
  - fix unicode encoding bug
  - change partition allocation from kmalloc to vmalloc for large allocations

ChangeSet@1.1608.78.34, 2004-03-12 08:14:32-08:00, akpm@osdl.org
  [PATCH] selinux: clean up binary mount data

  From: James Morris

  selinux is currently inspecting the filesystem name ("nfs" vs "coda" vs whatever) to work out whether it needs to handle binary mount data.

  Eliminate all that by adding a flag to file_system_type.fs_flags.

ChangeSet@1.1608.78.33, 2004-03-12 08:14:23-08:00, akpm@osdl.org
  [PATCH] dm: stripe width fix

  dm-stripe.c: The stripe width must be at least the page size.

ChangeSet@1.1608.78.32, 2004-03-12 08:14:14-08:00, akpm@osdl.org
  [PATCH] dm: list targets cmd

  From: Joe Thornber

  List targets ioctl. [Patrick Caulfield]

ChangeSet@1.1608.78.31, 2004-03-12 08:14:02-08:00, akpm@osdl.org
  [PATCH] dm: default queue limits

  From: Joe Thornber

  Fill in missing queue limitations when table is complete instead of enforcing the "default" limits on every dm device. Problem noticed by Mike Christie. [Christophe Saout]

ChangeSet@1.1608.78.30, 2004-03-12 08:13:51-08:00, akpm@osdl.org
  [PATCH] dm: list_for_each_entry audit

  From: Joe Thornber

  Audit for list_for_each_*entry*

ChangeSet@1.1608.78.29, 2004-03-12 08:13:40-08:00, akpm@osdl.org
  [PATCH] dm: endio method

  From: Joe Thornber

  Add an endio method to targets. This method is allowed to request another shot at failed ios (think multipath). Context can be passed between the map method and the endio method.

ChangeSet@1.1608.78.28, 2004-03-12 08:13:26-08:00, akpm@osdl.org
  [PATCH] Allow X86_MCE_NONFATAL to be a module

  From: Herbert Xu

  By allowing X86_MCE_NONFATAL to be a module, it can be included in distribution kernels without upsetting those with strange hardware.

ChangeSet@1.1608.78.27, 2004-03-12 08:13:16-08:00, akpm@osdl.org [PATCH] i386 very early memory detection cleanup patch From: "H. Peter Anvin" This patch cleans up the very early memory setup on the i386 platform. In particular, it removes the hard-coded 8 MB limit completely by dynamically creating the early-boot pagetables rather than having them hard coded. While I was at it, I changed head.S so that it always sets up a local GDT; this means among other things that SMP and VISWS are no longer special cases, and is conceptually cleaner to boot. The VISWS people have confirmed it works on VISWS. It also uses a separate entrypoint for non-boot processors since this is completely kernel-internal anyway. This eliminates the need to set %bx on boot. (If you think this is a bad idea I can eliminate this change; it just seemed cleaner to me to do it this way.) Additionally, zero bss with rep;stosl rather than rep;stosb.
ChangeSet@1.1608.78.26, 2004-03-12 08:13:07-08:00, akpm@osdl.org [PATCH] genrtc: cleanups From: "Randy.Dunlap" From: Luiz Fernando Capitulino Remove ifdef/endif in rtc_generic_init(). Use returned error code.
ChangeSet@1.1608.78.25, 2004-03-12 08:12:57-08:00, akpm@osdl.org [PATCH] remove __io_virt_debug From: Brian Gerst Drivers should all be converted to use ioremap() or isa_*() by now.
ChangeSet@1.1608.78.24, 2004-03-12 08:12:47-08:00, akpm@osdl.org [PATCH] teach /proc/kmsg about O_NONBLOCK If there's nothing available and the file is O_NONBLOCK, return -EAGAIN. This is a bit grubby - really we should push the file* down into do_syslog() and handle it inside the spinlock.
ChangeSet@1.1608.78.23, 2004-03-12 08:12:38-08:00, akpm@osdl.org [PATCH] time interpolator fix From: john stultz In developing the ia64-cyclone patch, which implements a cyclone based time interpolator, I found the following bug which could cause time inconsistencies.
In update_wall_time_one_tick(), which is called each timer interrupt, we call time_interpolator_update(delta_nsec) where delta_nsec is approximately NSEC_PER_SEC/HZ. This directly correlates with the changes to xtime which occur in update_wall_time_one_tick(). However, in update_wall_time(), on a second overflow, we again call time_interpolator_update(NSEC_PER_SEC). But while the components of xtime are being changed, the overall value of xtime is not (nsec is decremented by NSEC_PER_SEC and sec is incremented). Thus this call to time_interpolator_update is incorrect. This patch removes the incorrect call to time_interpolator_update and was found to resolve the time inconsistencies I had seen while developing the ia64-cyclone patch.
ChangeSet@1.1608.78.22, 2004-03-12 08:12:26-08:00, akpm@osdl.org [PATCH] fb_console_init fix From: James Simmons This patch stops fb_console_init from being called twice. I still need to fix set_con2fb, but this helps and is still important to get in.
ChangeSet@1.1608.78.21, 2004-03-12 08:12:15-08:00, akpm@osdl.org [PATCH] read-only support for UFS2 From: Niraj Kumar This patch adds read-only support for the ufs2 variant of the ufs filesystem (used in FreeBSD 5.x). For filesystem specific tools, see http://ufs-linux.sourceforge.com .
ChangeSet@1.1608.78.20, 2004-03-12 08:12:05-08:00, akpm@osdl.org [PATCH] adaptive lazy readahead From: Suparna Bhattacharya From: Ram Pai Pipelined readahead behaviour is suitable for sequential reads, but not for large random reads (typical of database workloads), where lazy readahead provides a big performance boost. One option (suggested by Andrew Morton) would be to have the application pass hints to turn off readahead by setting the readahead window to zero using posix_fadvise64(POSIX_FADV_RANDOM), and to special-case that in do_generic_mapping_read to completely bypass the readahead logic and instead read in all the pages needed directly. This was the idea I started with.
But then I thought, can we do a still better job? How about adapting the readahead algorithm to lazy-read or non-lazy-read based on past i/o patterns? The overall idea is to keep track of the average number of contiguous pages accessed in a file. If the average at any given time is above ra->pages, the pattern is sequential. If not, the pattern is random. If the pattern is sequential, do non-lazy readahead (read as soon as the first page in the active window is touched); otherwise do lazy readahead. I have studied the behaviour of this patch using my user-level simulator. It adapts pretty well. Note from Suparna: This appears to bring streaming AIO read performance for large (64KB) random AIO reads back to sane values (since the lazy readahead backout in the mainline).
ChangeSet@1.1608.78.19, 2004-03-12 08:11:54-08:00, akpm@osdl.org [PATCH] readdir() cleanups From: cramfs and freevxfs explicitly mark themselves readonly (as other r/o fs do). afs marked noatime (ACKed by maintainer). Filesystems that do not do update_atime() in their ->readdir() had been explicitly marked nodiratime. NOTE: cifs, coda and ncpfs almost certainly need full noatime as we currently have in nfs and afs. update_atime() call shifted to callers of ->readdir() and out of ->readdir() instances. Bugs caught: dcache_readdir() updated atime only if it reached EOF. bfs_readdir() - ditto. qnx4_readdir() - ditto.
ChangeSet@1.1608.78.18, 2004-03-12 08:11:41-08:00, akpm@osdl.org [PATCH] Clean up sys_ioperm stubs From: Brian Gerst Remove stubs for sys_ioperm for non-x86 arches, using sys_ni_syscall instead where applicable. Support for sys_ioperm is unconditionally no for non-x86 arches.
ChangeSet@1.1608.78.17, 2004-03-12 08:11:31-08:00, akpm@osdl.org [PATCH] ppc64: fix initialisation of NUMA arrays From: Anton Blanchard We were hitting problems on machines with cpu_possible != cpu_online when NUMA was enabled.
The debug checks would trip during scheduler init because we iterate through all possible cpus whereas we only set up NUMA information for online cpus. Longer term we should have a cpu_up hook which sets up its NUMA information but for now we initialise all possible cpus and memory to node 0.
ChangeSet@1.1608.78.16, 2004-03-12 08:11:20-08:00, akpm@osdl.org [PATCH] print kernel version in oops messages From: Arjan van de Ven Unfortunately a large portion of the oops reports lack the basic information about what kernel version the oops is for; it's trivial to just print this in the oops as well to improve the usefulness of bug reports...
ChangeSet@1.1608.78.15, 2004-03-12 07:58:24-08:00, benh@kernel.crashing.org [PATCH] ppc32: Fix G5 config space access lockup Fix a typo in the code that prevents lockup on config space access to sleeping devices on ppc32/G5. Please apply.
ChangeSet@1.1608.78.14, 2004-03-12 07:57:24-08:00, anton@samba.org [PATCH] fix ppc64 in kernel syscalls Thanks to some great debugging work by Olaf Hering and Marcus Meissner it has been noticed that the current ppc64 syscall code is corrupting 4 bytes past errno. Why we even bothered to set errno beats me, it's unusable in the kernel. Since we had to reinstate the inline syscall code we can go back to using it for those few syscalls that we call. Especially now with Randy's syscall prototype cleanup we should be calling them directly but we can do that sometime later.
ChangeSet@1.1608.78.13, 2004-03-12 07:56:51-08:00, axboe@suse.de [PATCH] CDROMREADAUDIO dma support This small patch builds on top of the blk_rq_map_user() patch just sent, and enables us to easily support DMA for CDROMREADAUDIO cdda extraction. It's quite amazing how much cool stuff you can do with the new block layer :-) Patch has intelligent fall back from multi frame dma to single frame dma, and further to old-style pio ripping in case of hardware problems.
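The fallback chain mentioned in the CDROMREADAUDIO entry (multi frame dma, then single frame dma, then pio) can be pictured with a small userspace sketch. None of this is the cdrom driver's code: enum xfer_mode, read_fn, MULTI_FRAMES, read_audio() and the two fake drives are hypothetical, and making the downgrade sticky across calls is an assumption of the sketch rather than something the changelog states.

```c
#include <assert.h>

#define MULTI_FRAMES 8   /* hypothetical frames per multi-frame DMA request */

enum xfer_mode { XFER_DMA_MULTI, XFER_DMA_SINGLE, XFER_PIO };

/* Hypothetical low-level read: transfer 'nframes' frames using 'mode'.
 * Returns 0 on success, nonzero on a hardware error. */
typedef int (*read_fn)(enum xfer_mode mode, int nframes);

/* Read 'nframes' frames, stepping down one transfer mode whenever the
 * current mode fails. '*mode' keeps the downgraded mode for later calls.
 * Returns 0 on success, -1 if even PIO fails. */
static int read_audio(read_fn rd, int nframes, enum xfer_mode *mode)
{
    assert(rd && mode && nframes >= 0);
    while (nframes > 0) {
        int chunk = 1;   /* single-frame DMA and PIO move one frame */

        if (*mode == XFER_DMA_MULTI)
            chunk = nframes < MULTI_FRAMES ? nframes : MULTI_FRAMES;
        if (rd(*mode, chunk) == 0) {
            nframes -= chunk;
            continue;
        }
        if (*mode == XFER_PIO)
            return -1;                        /* nothing left to fall back to */
        *mode = (enum xfer_mode)(*mode + 1);  /* step down, retry same frames */
    }
    return 0;
}

/* Fake drives for the example. */
static int drive_no_multi(enum xfer_mode m, int n)
{
    (void)n;
    return m == XFER_DMA_MULTI ? -1 : 0;   /* multi-frame DMA always fails */
}

static int drive_pio_only(enum xfer_mode m, int n)
{
    (void)n;
    return m == XFER_PIO ? 0 : -1;         /* every DMA attempt fails */
}
```

A caller would start with the mode set to XFER_DMA_MULTI and let read_audio() lower it as errors appear, so a drive with broken DMA pays for the failed attempts only once.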
ChangeSet@1.1608.78.12, 2004-03-12 07:56:40-08:00, axboe@suse.de [PATCH] user data -> request mapping This patch allows you to map a request with user data for io, similarly to what you can do with bio_map_user() already to a bio. However, this goes one step further and populates the request so the user only has to fill in the cdb (almost) and put it on the queue for execution. Patch converts sg_io() to use it, next patch I'll send adapts cdrom layer to use it for zero copy cdda dma extraction.
ChangeSet@1.1608.78.11, 2004-03-12 07:56:29-08:00, sfr@canb.auug.org.au [PATCH] fix PPC64 iSeries virtual console devices While playing with udev, I discovered that the virtual console devices on iSeries had their minor numbers off by one, i.e. /dev/tty1 was minor 2! This fixes it.
ChangeSet@1.1608.76.39, 2004-03-11 20:48:48-08:00, jbarnes@sgi.com [PATCH] ia64: fix misc. sn2 warnings This patch fixes a few warnings that have cropped up in the sn2 code:
- hwgfs function prototype mismatch
- pconn uninitialized in pciio.c
- printk formatting fixes in pcibr_dvr.c
- kill volatile qualifier in pcibr_intr.c
ChangeSet@1.1608.76.38, 2004-03-11 20:44:38-08:00, willy@debian.org [PATCH] ia64: Convert to use the generic drivers/Kconfig mechanism.
ChangeSet@1.1608.76.37, 2004-03-10 23:27:39-08:00, kaneshige.kenji@jp.fujitsu.com ia64: don't unmask iosapic interrupts by default In the ia64 kernel, IOSAPIC RTEs for PCI interrupts are unmasked at boot time, before device drivers are installed. I think this is very dangerous. If a PCI device without a device driver generates interrupts, they are generated repeatedly because the interrupt requests are never cleared. I think RTEs for PCI interrupts should be unmasked by the device driver. This patch fixes the problem.
ChangeSet@1.1608.78.10, 2004-03-10 21:02:29-08:00, ak@suse.de [PATCH] Fix a 64bit bug in kobject module request From Takashi Iwai: kobj_lookup had a 64bit bug, which caused a request for an unknown character device to burn CPU instead of failing quickly.
ChangeSet@1.1608.78.9, 2004-03-10 21:02:19-08:00, ak@suse.de [PATCH] x86-64 merge for 2.6.4 The biggest new feature is fixed 32bit vsyscall (SYSCALL+SYSENTER) support, mostly from Jakub Jelinek. This increases 32bit syscall performance greatly (latency halved and better). The SYSENTER for Intel support required some infrastructure changes, but seems to work now too. The 64bit vsyscall vtime() just references xtime.tv_sec now. This should make it a lot faster too. A fix for some Intel IA32e systems. Also a few long standing bugs in NMI like exception handlers were fixed. And a lot of other bug fixes. Full changelog:
- Clean up 32bit address room limit handling, fix 3gb personality
- Move memcpy_{from,to}io export to ksyms.c file. This seems to work around a toolchain bug (Andreas Gruenbacher)
- Update defconfig
- ACPI merges from i386 (SBF should work now, acpi=strict)
- Implement mmconfig support based on i386 code (untested)
- Fix i386/x86-64 pci source file sharing
- Implement ptrace access for 32bit vsyscall page
- Always initialize all 32bit SYSENTER/SYSCALL MSRs.
- Export run time cache line size to generic kernel
- Remove explicit CPUID in ia32 syscall code
- Fill in most of boot_cpu_data early
- Remove unused PER_LINUX32 setup
- Fix syscall trace in fast 32bit calls (Suresh B. Siddha)
- Tighten first line of the oops again.
- Set up ptrace registers correctly for debug,ss,double fault exceptions
- Fix 64bit bug in sys_time64
- Optimize time syscall/vsyscall to only read xtime
- Fix csum_partial_copy_nocheck
- Remove last traces of FPU emulation
- Check properly for rescheduling in exceptions with own stack
- Harden exception stack entries (#SS,#NMI,#MC,#DF,#DB) against bogus GS
- Use an exception stack for machine checks
- Handle TIF_SINGLESTEP properly in kernel exit
- Add exception stack for debug handler
- Disable X86_HT for Opteron optimized builds because it pulls in ACPI_BOOT
- Fix CONFIG_ACPI_BOOT compilation without CONFIG_ACPI
- Fix eflags handling in SYSENTER path (Jakub Jelinek)
- Use atomic counter for enable/disable_hlt
- Support 32bit SYSENTER vsyscall too (Jakub Jelinek)
- Don't redefine Dprintk
- Change some cpu/apic id arrays to char
- Support arbitrary cpu<->apicid in hard_smp_processor_id (Suresh B. Siddha)
- Move K8 erratum #100 workaround into slow path of page fault handler.
- Fix 32bit cdrom direct access ioctls (Jens Axboe)
- Enable 32bit vsyscalls by default
- Fix 32bit vsyscalls (Jakub Jelinek)
ChangeSet@1.1608.80.1, 2004-03-10 23:31:03-05:00, jejb@mulgrave.(none) Merge qboosh/emoore conflict
ChangeSet@1.1608.78.8, 2004-03-10 20:10:29-08:00, benh@kernel.crashing.org [PATCH] G5 temperature control update This makes the temperature control code more robust, putting less pressure on i2c, and works around occasional misconfiguration of the ADC chips leading to incorrect temperature readings.
ChangeSet@1.1611, 2004-03-10 23:10:00-05:00, jgarzik@redhat.com Merge redhat.com:/spare/repo/linux-2.5 into redhat.com:/spare/repo/netdev-2.6/netpoll
ChangeSet@1.1608.78.7, 2004-03-10 18:52:41-08:00, torvalds@ppc970.osdl.org Linux 2.6.4 TAG: v2.6.4