Skip to content

Commit

Permalink
Merge tag 'mm-hotfixes-stable-2024-04-18-14-41' of git://git.kernel.o…
Browse files Browse the repository at this point in the history
…rg/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "15 hotfixes. 9 are cc:stable and the remainder address post-6.8 issues
  or aren't considered suitable for backporting.

  There are a significant number of fixups for this cycle's page_owner
  changes (series "page_owner: print stacks and their outstanding
  allocations"). Apart from that, singleton changes all over, mainly in
  MM"

* tag 'mm-hotfixes-stable-2024-04-18-14-41' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  nilfs2: fix OOB in nilfs_set_de_type
  MAINTAINERS: update Naoya Horiguchi's email address
  fork: defer linking file vma until vma is fully initialized
  mm/shmem: inline shmem_is_huge() for disabled transparent hugepages
  mm,page_owner: defer enablement of static branch
  Squashfs: check the inode number is not the invalid value of zero
  mm,swapops: update check in is_pfn_swap_entry for hwpoison entries
  mm/memory-failure: fix deadlock when hugetlb_optimize_vmemmap is enabled
  mm/userfaultfd: allow hugetlb change protection upon poison entry
  mm,page_owner: fix printing of stack records
  mm,page_owner: fix accounting of pages when migrating
  mm,page_owner: fix refcount imbalance
  mm,page_owner: update metadata for tail pages
  userfaultfd: change src_folio after ensuring it's unpinned in UFFDIO_MOVE
  mm/madvise: make MADV_POPULATE_(READ|WRITE) handle VM_FAULT_RETRY properly
  • Loading branch information
torvalds committed Apr 19, 2024
2 parents 2668e3a + c4a7dc9 commit 54c2354
Show file tree
Hide file tree
Showing 16 changed files with 280 additions and 223 deletions.
3 changes: 2 additions & 1 deletion .mailmap
Original file line number Diff line number Diff line change
Expand Up @@ -446,7 +446,8 @@ Mythri P K <[email protected]>
Nadav Amit <[email protected]> <[email protected]>
Nadav Amit <[email protected]> <[email protected]>
Nadia Yvette Chambers <[email protected]> William Lee Irwin III <[email protected]>
Naoya Horiguchi <[email protected]> <[email protected]>
Naoya Horiguchi <[email protected]> <[email protected]>
Naoya Horiguchi <[email protected]> <[email protected]>
Nathan Chancellor <[email protected]> <[email protected]>
Neeraj Upadhyay <[email protected]> <[email protected]>
Neil Armstrong <[email protected]> <[email protected]>
Expand Down
73 changes: 38 additions & 35 deletions Documentation/mm/page_owner.rst
Original file line number Diff line number Diff line change
Expand Up @@ -24,10 +24,10 @@ fragmentation statistics can be obtained through gfp flag information of
each page. It is already implemented and activated if page owner is
enabled. Other usages are more than welcome.

It can also be used to show all the stacks and their outstanding
allocations, which gives us a quick overview of where the memory is going
without the need to screen through all the pages and match the allocation
and free operation.
It can also be used to show all the stacks and their current number of
allocated base pages, which gives us a quick overview of where the memory
is going without the need to screen through all the pages and match the
allocation and free operation.

page owner is disabled by default. So, if you'd like to use it, you need
to add "page_owner=on" to your boot cmdline. If the kernel is built
Expand Down Expand Up @@ -75,42 +75,45 @@ Usage

cat /sys/kernel/debug/page_owner_stacks/show_stacks > stacks.txt
cat stacks.txt
prep_new_page+0xa9/0x120
get_page_from_freelist+0x7e6/0x2140
__alloc_pages+0x18a/0x370
new_slab+0xc8/0x580
___slab_alloc+0x1f2/0xaf0
__slab_alloc.isra.86+0x22/0x40
kmem_cache_alloc+0x31b/0x350
__khugepaged_enter+0x39/0x100
dup_mmap+0x1c7/0x5ce
copy_process+0x1afe/0x1c90
kernel_clone+0x9a/0x3c0
__do_sys_clone+0x66/0x90
do_syscall_64+0x7f/0x160
entry_SYSCALL_64_after_hwframe+0x6c/0x74
stack_count: 234
post_alloc_hook+0x177/0x1a0
get_page_from_freelist+0xd01/0xd80
__alloc_pages+0x39e/0x7e0
allocate_slab+0xbc/0x3f0
___slab_alloc+0x528/0x8a0
kmem_cache_alloc+0x224/0x3b0
sk_prot_alloc+0x58/0x1a0
sk_alloc+0x32/0x4f0
inet_create+0x427/0xb50
__sock_create+0x2e4/0x650
inet_ctl_sock_create+0x30/0x180
igmp_net_init+0xc1/0x130
ops_init+0x167/0x410
setup_net+0x304/0xa60
copy_net_ns+0x29b/0x4a0
create_new_namespaces+0x4a1/0x820
nr_base_pages: 16
...
...
echo 7000 > /sys/kernel/debug/page_owner_stacks/count_threshold
cat /sys/kernel/debug/page_owner_stacks/show_stacks> stacks_7000.txt
cat stacks_7000.txt
prep_new_page+0xa9/0x120
get_page_from_freelist+0x7e6/0x2140
__alloc_pages+0x18a/0x370
alloc_pages_mpol+0xdf/0x1e0
folio_alloc+0x14/0x50
filemap_alloc_folio+0xb0/0x100
page_cache_ra_unbounded+0x97/0x180
filemap_fault+0x4b4/0x1200
__do_fault+0x2d/0x110
do_pte_missing+0x4b0/0xa30
__handle_mm_fault+0x7fa/0xb70
handle_mm_fault+0x125/0x300
do_user_addr_fault+0x3c9/0x840
exc_page_fault+0x68/0x150
asm_exc_page_fault+0x22/0x30
stack_count: 8248
post_alloc_hook+0x177/0x1a0
get_page_from_freelist+0xd01/0xd80
__alloc_pages+0x39e/0x7e0
alloc_pages_mpol+0x22e/0x490
folio_alloc+0xd5/0x110
filemap_alloc_folio+0x78/0x230
page_cache_ra_order+0x287/0x6f0
filemap_get_pages+0x517/0x1160
filemap_read+0x304/0x9f0
xfs_file_buffered_read+0xe6/0x1d0 [xfs]
xfs_file_read_iter+0x1f0/0x380 [xfs]
__kernel_read+0x3b9/0x730
kernel_read_file+0x309/0x4d0
__do_sys_finit_module+0x381/0x730
do_syscall_64+0x8d/0x150
entry_SYSCALL_64_after_hwframe+0x62/0x6a
nr_base_pages: 20824
...

cat /sys/kernel/debug/page_owner > page_owner_full.txt
Expand Down
2 changes: 1 addition & 1 deletion MAINTAINERS
Original file line number Diff line number Diff line change
Expand Up @@ -10024,7 +10024,7 @@ F: drivers/media/platform/st/sti/hva

HWPOISON MEMORY FAILURE HANDLING
M: Miaohe Lin <[email protected]>
R: Naoya Horiguchi <naoya.horiguchi@nec.com>
R: Naoya Horiguchi <nao.horiguchi@gmail.com>
L: [email protected]
S: Maintained
F: mm/hwpoison-inject.c
Expand Down
2 changes: 1 addition & 1 deletion fs/nilfs2/dir.c
Original file line number Diff line number Diff line change
Expand Up @@ -240,7 +240,7 @@ nilfs_filetype_table[NILFS_FT_MAX] = {

#define S_SHIFT 12
static unsigned char
nilfs_type_by_mode[S_IFMT >> S_SHIFT] = {
nilfs_type_by_mode[(S_IFMT >> S_SHIFT) + 1] = {
[S_IFREG >> S_SHIFT] = NILFS_FT_REG_FILE,
[S_IFDIR >> S_SHIFT] = NILFS_FT_DIR,
[S_IFCHR >> S_SHIFT] = NILFS_FT_CHRDEV,
Expand Down
5 changes: 4 additions & 1 deletion fs/squashfs/inode.c
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,10 @@ static int squashfs_new_inode(struct super_block *sb, struct inode *inode,
gid_t i_gid;
int err;

inode->i_ino = le32_to_cpu(sqsh_ino->inode_number);
if (inode->i_ino == 0)
return -EINVAL;

err = squashfs_get_id(sb, le16_to_cpu(sqsh_ino->uid), &i_uid);
if (err)
return err;
Expand All @@ -58,7 +62,6 @@ static int squashfs_new_inode(struct super_block *sb, struct inode *inode,

i_uid_write(inode, i_uid);
i_gid_write(inode, i_gid);
inode->i_ino = le32_to_cpu(sqsh_ino->inode_number);
inode_set_mtime(inode, le32_to_cpu(sqsh_ino->mtime), 0);
inode_set_atime(inode, inode_get_mtime_sec(inode), 0);
inode_set_ctime(inode, inode_get_mtime_sec(inode), 0);
Expand Down
9 changes: 9 additions & 0 deletions include/linux/shmem_fs.h
Original file line number Diff line number Diff line change
Expand Up @@ -110,8 +110,17 @@ extern struct page *shmem_read_mapping_page_gfp(struct address_space *mapping,
extern void shmem_truncate_range(struct inode *inode, loff_t start, loff_t end);
int shmem_unuse(unsigned int type);

#ifdef CONFIG_TRANSPARENT_HUGEPAGE
extern bool shmem_is_huge(struct inode *inode, pgoff_t index, bool shmem_huge_force,
struct mm_struct *mm, unsigned long vm_flags);
#else
static __always_inline bool shmem_is_huge(struct inode *inode, pgoff_t index, bool shmem_huge_force,
struct mm_struct *mm, unsigned long vm_flags)
{
return false;
}
#endif

#ifdef CONFIG_SHMEM
extern unsigned long shmem_swap_usage(struct vm_area_struct *vma);
#else
Expand Down
65 changes: 33 additions & 32 deletions include/linux/swapops.h
Original file line number Diff line number Diff line change
Expand Up @@ -390,6 +390,35 @@ static inline bool is_migration_entry_dirty(swp_entry_t entry)
}
#endif /* CONFIG_MIGRATION */

#ifdef CONFIG_MEMORY_FAILURE

/*
* Support for hardware poisoned pages
*/
static inline swp_entry_t make_hwpoison_entry(struct page *page)
{
BUG_ON(!PageLocked(page));
return swp_entry(SWP_HWPOISON, page_to_pfn(page));
}

static inline int is_hwpoison_entry(swp_entry_t entry)
{
return swp_type(entry) == SWP_HWPOISON;
}

#else

static inline swp_entry_t make_hwpoison_entry(struct page *page)
{
return swp_entry(0, 0);
}

static inline int is_hwpoison_entry(swp_entry_t swp)
{
return 0;
}
#endif

typedef unsigned long pte_marker;

#define PTE_MARKER_UFFD_WP BIT(0)
Expand Down Expand Up @@ -483,16 +512,17 @@ static inline struct folio *pfn_swap_entry_folio(swp_entry_t entry)

/*
* A pfn swap entry is a special type of swap entry that always has a pfn stored
* in the swap offset. They are used to represent unaddressable device memory
* and to restrict access to a page undergoing migration.
* in the swap offset. They can either be used to represent unaddressable device
* memory, to restrict access to a page undergoing migration or to represent a
* pfn which has been hwpoisoned and unmapped.
*/
static inline bool is_pfn_swap_entry(swp_entry_t entry)
{
/* Make sure the swp offset can always store the needed fields */
BUILD_BUG_ON(SWP_TYPE_SHIFT < SWP_PFN_BITS);

return is_migration_entry(entry) || is_device_private_entry(entry) ||
is_device_exclusive_entry(entry);
is_device_exclusive_entry(entry) || is_hwpoison_entry(entry);
}

struct page_vma_mapped_walk;
Expand Down Expand Up @@ -561,35 +591,6 @@ static inline int is_pmd_migration_entry(pmd_t pmd)
}
#endif /* CONFIG_ARCH_ENABLE_THP_MIGRATION */

#ifdef CONFIG_MEMORY_FAILURE

/*
* Support for hardware poisoned pages
*/
static inline swp_entry_t make_hwpoison_entry(struct page *page)
{
BUG_ON(!PageLocked(page));
return swp_entry(SWP_HWPOISON, page_to_pfn(page));
}

static inline int is_hwpoison_entry(swp_entry_t entry)
{
return swp_type(entry) == SWP_HWPOISON;
}

#else

static inline swp_entry_t make_hwpoison_entry(struct page *page)
{
return swp_entry(0, 0);
}

static inline int is_hwpoison_entry(swp_entry_t swp)
{
return 0;
}
#endif

static inline int non_swap_entry(swp_entry_t entry)
{
return swp_type(entry) >= MAX_SWAPFILES;
Expand Down
33 changes: 17 additions & 16 deletions kernel/fork.c
Original file line number Diff line number Diff line change
Expand Up @@ -714,6 +714,23 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
} else if (anon_vma_fork(tmp, mpnt))
goto fail_nomem_anon_vma_fork;
vm_flags_clear(tmp, VM_LOCKED_MASK);
/*
* Copy/update hugetlb private vma information.
*/
if (is_vm_hugetlb_page(tmp))
hugetlb_dup_vma_private(tmp);

/*
* Link the vma into the MT. After using __mt_dup(), memory
* allocation is not necessary here, so it cannot fail.
*/
vma_iter_bulk_store(&vmi, tmp);

mm->map_count++;

if (tmp->vm_ops && tmp->vm_ops->open)
tmp->vm_ops->open(tmp);

file = tmp->vm_file;
if (file) {
struct address_space *mapping = file->f_mapping;
Expand All @@ -730,25 +747,9 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
i_mmap_unlock_write(mapping);
}

/*
* Copy/update hugetlb private vma information.
*/
if (is_vm_hugetlb_page(tmp))
hugetlb_dup_vma_private(tmp);

/*
* Link the vma into the MT. After using __mt_dup(), memory
* allocation is not necessary here, so it cannot fail.
*/
vma_iter_bulk_store(&vmi, tmp);

mm->map_count++;
if (!(tmp->vm_flags & VM_WIPEONFORK))
retval = copy_page_range(tmp, mpnt);

if (tmp->vm_ops && tmp->vm_ops->open)
tmp->vm_ops->open(tmp);

if (retval) {
mpnt = vma_next(&vmi);
goto loop_out;
Expand Down
54 changes: 32 additions & 22 deletions mm/gup.c
Original file line number Diff line number Diff line change
Expand Up @@ -1206,6 +1206,22 @@ static long __get_user_pages(struct mm_struct *mm,

/* first iteration or cross vma bound */
if (!vma || start >= vma->vm_end) {
/*
* MADV_POPULATE_(READ|WRITE) wants to handle VMA
* lookups+error reporting differently.
*/
if (gup_flags & FOLL_MADV_POPULATE) {
vma = vma_lookup(mm, start);
if (!vma) {
ret = -ENOMEM;
goto out;
}
if (check_vma_flags(vma, gup_flags)) {
ret = -EINVAL;
goto out;
}
goto retry;
}
vma = gup_vma_lookup(mm, start);
if (!vma && in_gate_area(mm, start)) {
ret = get_gate_page(mm, start & PAGE_MASK,
Expand Down Expand Up @@ -1685,35 +1701,35 @@ long populate_vma_page_range(struct vm_area_struct *vma,
}

/*
* faultin_vma_page_range() - populate (prefault) page tables inside the
* given VMA range readable/writable
* faultin_page_range() - populate (prefault) page tables inside the
* given range readable/writable
*
* This takes care of mlocking the pages, too, if VM_LOCKED is set.
*
* @vma: target vma
* @mm: the mm to populate page tables in
* @start: start address
* @end: end address
* @write: whether to prefault readable or writable
* @locked: whether the mmap_lock is still held
*
* Returns either number of processed pages in the vma, or a negative error
* code on error (see __get_user_pages()).
* Returns either number of processed pages in the MM, or a negative error
* code on error (see __get_user_pages()). Note that this function reports
* errors related to VMAs, such as incompatible mappings, as expected by
* MADV_POPULATE_(READ|WRITE).
*
* vma->vm_mm->mmap_lock must be held. The range must be page-aligned and
* covered by the VMA. If it's released, *@locked will be set to 0.
* The range must be page-aligned.
*
* mm->mmap_lock must be held. If it's released, *@locked will be set to 0.
*/
long faultin_vma_page_range(struct vm_area_struct *vma, unsigned long start,
unsigned long end, bool write, int *locked)
long faultin_page_range(struct mm_struct *mm, unsigned long start,
unsigned long end, bool write, int *locked)
{
struct mm_struct *mm = vma->vm_mm;
unsigned long nr_pages = (end - start) / PAGE_SIZE;
int gup_flags;
long ret;

VM_BUG_ON(!PAGE_ALIGNED(start));
VM_BUG_ON(!PAGE_ALIGNED(end));
VM_BUG_ON_VMA(start < vma->vm_start, vma);
VM_BUG_ON_VMA(end > vma->vm_end, vma);
mmap_assert_locked(mm);

/*
Expand All @@ -1725,19 +1741,13 @@ long faultin_vma_page_range(struct vm_area_struct *vma, unsigned long start,
* a poisoned page.
* !FOLL_FORCE: Require proper access permissions.
*/
gup_flags = FOLL_TOUCH | FOLL_HWPOISON | FOLL_UNLOCKABLE;
gup_flags = FOLL_TOUCH | FOLL_HWPOISON | FOLL_UNLOCKABLE |
FOLL_MADV_POPULATE;
if (write)
gup_flags |= FOLL_WRITE;

/*
* We want to report -EINVAL instead of -EFAULT for any permission
* problems or incompatible mappings.
*/
if (check_vma_flags(vma, gup_flags))
return -EINVAL;

ret = __get_user_pages(mm, start, nr_pages, gup_flags,
NULL, locked);
ret = __get_user_pages_locked(mm, start, nr_pages, NULL, locked,
gup_flags);
lru_add_drain();
return ret;
}
Expand Down
Loading

0 comments on commit 54c2354

Please sign in to comment.