Merge tag 'folio-5.16' of git://git.infradead.org/users/willy/pagecache
Pull memory folios from Matthew Wilcox:
 "Add memory folios, a new type to represent either order-0 pages or the
  head page of a compound page. This should be enough infrastructure to
  support filesystems converting from pages to folios.

  The point of all this churn is to allow filesystems and the page cache
  to manage memory in larger chunks than PAGE_SIZE. The original plan
  was to use compound pages like THP does, but I ran into problems with
  some functions expecting only a head page while others expect the
  precise page containing a particular byte.

  The folio type allows a function to declare that it's expecting only a
  head page. Almost incidentally, this allows us to remove various calls
  to VM_BUG_ON(PageTail(page)) and compound_head().

  This converts just parts of the core MM and the page cache. For 5.17,
  we intend to convert various filesystems (XFS and AFS are ready; other
  filesystems may make it) and also convert more of the MM and page
  cache to folios. For 5.18, multi-page folios should be ready.

  The multi-page folios offer some improvement to some workloads. The
  80% win is real, but appears to be an artificial benchmark (postgres
  startup, which isn't a serious workload). Real workloads (eg building
  the kernel, running postgres in a steady state, etc) seem to benefit
  between 0-10%. I haven't heard of any performance losses as a result
  of this series. Nobody has done any serious performance tuning; I
  imagine that tweaking the readahead algorithm could provide some more
  interesting wins. There are also other places where we could choose to
  create large folios and currently do not, such as writes that are
  larger than PAGE_SIZE.

  I'd like to thank all my reviewers who've offered review/ack tags:
  Christoph Hellwig, David Howells, Jan Kara, Jeff Layton, Johannes
  Weiner, Kirill A. Shutemov, Michal Hocko, Mike Rapoport, Vlastimil
  Babka, William Kucharski, Yu Zhao and Zi Yan.

  I'd also like to thank those who gave feedback I incorporated but
  haven't offered up review tags for this part of the series: Nick
  Piggin, Mel Gorman, Ming Lei, Darrick Wong, Ted Ts'o, John Hubbard,
  Hugh Dickins, and probably a few others who I forget"
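
To make the head-page guarantee concrete, here is a minimal sketch of the idiom the message describes. The helper names touch_page() and touch_folio() are hypothetical, not from this series; compound_head(), SetPageDirty() and folio_set_dirty() are real kernel APIs.

/* Page API: a caller may hand us a tail page, so every helper
 * must normalise defensively, and that is easy to forget. */
static void touch_page(struct page *page)
{
	page = compound_head(page);
	SetPageDirty(page);
}

/* Folio API: the type guarantees this is never a tail page, so the
 * compound_head() call and VM_BUG_ON(PageTail(page)) disappear. */
static void touch_folio(struct folio *folio)
{
	folio_set_dirty(folio);
}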

* tag 'folio-5.16' of git://git.infradead.org/users/willy/pagecache: (90 commits)
  mm/writeback: Add folio_write_one
  mm/filemap: Add FGP_STABLE
  mm/filemap: Add filemap_get_folio
  mm/filemap: Convert mapping_get_entry to return a folio
  mm/filemap: Add filemap_add_folio()
  mm/filemap: Add filemap_alloc_folio
  mm/page_alloc: Add folio allocation functions
  mm/lru: Add folio_add_lru()
  mm/lru: Convert __pagevec_lru_add_fn to take a folio
  mm: Add folio_evictable()
  mm/workingset: Convert workingset_refault() to take a folio
  mm/filemap: Add readahead_folio()
  mm/filemap: Add folio_mkwrite_check_truncate()
  mm/filemap: Add i_blocks_per_folio()
  mm/writeback: Add folio_redirty_for_writepage()
  mm/writeback: Add folio_account_redirty()
  mm/writeback: Add folio_clear_dirty_for_io()
  mm/writeback: Add folio_cancel_dirty()
  mm/writeback: Add folio_account_cleaned()
  mm/writeback: Add filemap_dirty_folio()
  ...
torvalds committed Nov 1, 2021
2 parents 8bb7eca + 121703c commit 49f8275
Showing 74 changed files with 2,914 additions and 1,703 deletions.
6 changes: 6 additions & 0 deletions Documentation/core-api/cachetlb.rst
@@ -326,6 +326,12 @@ maps this page at its virtual address.
dirty. Again, see sparc64 for examples of how
to deal with this.

``void flush_dcache_folio(struct folio *folio)``
This function is called under the same circumstances as
flush_dcache_page(). It allows the architecture to
optimise for flushing the entire folio of pages instead
of flushing one page at a time.

``void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
unsigned long user_vaddr, void *dst, void *src, int len)``
``void copy_from_user_page(struct vm_area_struct *vma, struct page *page,
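
Given these documented semantics, a plausible generic fallback for architectures that lack an optimised implementation is simply to flush each constituent page; the sketch below is consistent with the documentation above, not a claim about the exact fallback code added elsewhere in this series:

void flush_dcache_folio(struct folio *folio)
{
	long i, nr = folio_nr_pages(folio);

	/* correct but unoptimised: one page at a time */
	for (i = 0; i < nr; i++)
		flush_dcache_page(folio_page(folio, i));
}
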
5 changes: 5 additions & 0 deletions Documentation/core-api/mm-api.rst
@@ -95,6 +95,11 @@ More Memory Management Functions
.. kernel-doc:: mm/mempolicy.c
.. kernel-doc:: include/linux/mm_types.h
:internal:
.. kernel-doc:: include/linux/mm_inline.h
.. kernel-doc:: include/linux/page-flags.h
.. kernel-doc:: include/linux/mm.h
:internal:
.. kernel-doc:: include/linux/page_ref.h
.. kernel-doc:: include/linux/mmzone.h
.. kernel-doc:: mm/util.c
:functions: folio_mapping
2 changes: 2 additions & 0 deletions Documentation/filesystems/netfs_library.rst
@@ -524,3 +524,5 @@ Note that these methods are passed a pointer to the cache resource structure,
not the read request structure as they could be used in other situations where
there isn't a read request structure as well, such as writing dirty data to the
cache.

.. kernel-doc:: include/linux/netfs.h
1 change: 1 addition & 0 deletions arch/arc/include/asm/cacheflush.h
@@ -36,6 +36,7 @@ void __flush_dcache_page(phys_addr_t paddr, unsigned long vaddr);
#define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1

void flush_dcache_page(struct page *page);
void flush_dcache_folio(struct folio *folio);

void dma_cache_wback_inv(phys_addr_t start, unsigned long sz);
void dma_cache_inv(phys_addr_t start, unsigned long sz);
1 change: 1 addition & 0 deletions arch/arm/include/asm/cacheflush.h
@@ -290,6 +290,7 @@ extern void flush_cache_page(struct vm_area_struct *vma, unsigned long user_addr
*/
#define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
extern void flush_dcache_page(struct page *);
void flush_dcache_folio(struct folio *folio);

#define ARCH_IMPLEMENTS_FLUSH_KERNEL_VMAP_RANGE 1
static inline void flush_kernel_vmap_range(void *addr, int size)
1 change: 1 addition & 0 deletions arch/m68k/include/asm/cacheflush_mm.h
@@ -250,6 +250,7 @@ static inline void __flush_page_to_ram(void *vaddr)

#define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
#define flush_dcache_page(page) __flush_page_to_ram(page_address(page))
void flush_dcache_folio(struct folio *folio);
#define flush_dcache_mmap_lock(mapping) do { } while (0)
#define flush_dcache_mmap_unlock(mapping) do { } while (0)
#define flush_icache_page(vma, page) __flush_page_to_ram(page_address(page))
2 changes: 2 additions & 0 deletions arch/mips/include/asm/cacheflush.h
@@ -61,6 +61,8 @@ static inline void flush_dcache_page(struct page *page)
SetPageDcacheDirty(page);
}

void flush_dcache_folio(struct folio *folio);

#define flush_dcache_mmap_lock(mapping) do { } while (0)
#define flush_dcache_mmap_unlock(mapping) do { } while (0)

1 change: 1 addition & 0 deletions arch/nds32/include/asm/cacheflush.h
@@ -27,6 +27,7 @@ void flush_cache_vunmap(unsigned long start, unsigned long end);

#define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
void flush_dcache_page(struct page *page);
void flush_dcache_folio(struct folio *folio);
void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
unsigned long vaddr, void *dst, void *src, int len);
void copy_from_user_page(struct vm_area_struct *vma, struct page *page,
3 changes: 2 additions & 1 deletion arch/nios2/include/asm/cacheflush.h
@@ -28,7 +28,8 @@ extern void flush_cache_range(struct vm_area_struct *vma, unsigned long start,
extern void flush_cache_page(struct vm_area_struct *vma, unsigned long vmaddr,
unsigned long pfn);
#define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
extern void flush_dcache_page(struct page *page);
void flush_dcache_page(struct page *page);
void flush_dcache_folio(struct folio *folio);

extern void flush_icache_range(unsigned long start, unsigned long end);
extern void flush_icache_page(struct vm_area_struct *vma, struct page *page);
3 changes: 2 additions & 1 deletion arch/parisc/include/asm/cacheflush.h
@@ -49,7 +49,8 @@ void invalidate_kernel_vmap_range(void *vaddr, int size);
#define flush_cache_vunmap(start, end) flush_cache_all()

#define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
extern void flush_dcache_page(struct page *page);
void flush_dcache_page(struct page *page);
void flush_dcache_folio(struct folio *folio);

#define flush_dcache_mmap_lock(mapping) xa_lock_irq(&mapping->i_pages)
#define flush_dcache_mmap_unlock(mapping) xa_unlock_irq(&mapping->i_pages)
3 changes: 2 additions & 1 deletion arch/sh/include/asm/cacheflush.h
@@ -42,7 +42,8 @@ extern void flush_cache_page(struct vm_area_struct *vma,
extern void flush_cache_range(struct vm_area_struct *vma,
unsigned long start, unsigned long end);
#define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
extern void flush_dcache_page(struct page *page);
void flush_dcache_page(struct page *page);
void flush_dcache_folio(struct folio *folio);
extern void flush_icache_range(unsigned long start, unsigned long end);
#define flush_icache_user_range flush_icache_range
extern void flush_icache_page(struct vm_area_struct *vma,
5 changes: 4 additions & 1 deletion arch/xtensa/include/asm/cacheflush.h
@@ -120,7 +120,8 @@ void flush_cache_page(struct vm_area_struct*,
#define flush_cache_vunmap(start,end) flush_cache_all()

#define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1
extern void flush_dcache_page(struct page*);
void flush_dcache_page(struct page *);
void flush_dcache_folio(struct folio *);

void local_flush_cache_range(struct vm_area_struct *vma,
unsigned long start, unsigned long end);
@@ -137,7 +138,9 @@ void local_flush_cache_page(struct vm_area_struct *vma,
#define flush_cache_vunmap(start,end) do { } while (0)

#define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 0
#define ARCH_IMPLEMENTS_FLUSH_DCACHE_FOLIO
#define flush_dcache_page(page) do { } while (0)
static inline void flush_dcache_folio(struct folio *folio) { }

#define flush_icache_range local_flush_icache_range
#define flush_cache_page(vma, addr, pfn) do { } while (0)
9 changes: 5 additions & 4 deletions fs/afs/write.c
@@ -861,7 +861,8 @@ int afs_fsync(struct file *file, loff_t start, loff_t end, int datasync)
*/
vm_fault_t afs_page_mkwrite(struct vm_fault *vmf)
{
struct page *page = thp_head(vmf->page);
struct folio *folio = page_folio(vmf->page);
struct page *page = &folio->page;
struct file *file = vmf->vma->vm_file;
struct inode *inode = file_inode(file);
struct afs_vnode *vnode = AFS_FS_I(inode);
@@ -884,7 +885,7 @@ vm_fault_t afs_page_mkwrite(struct vm_fault *vmf)
goto out;
#endif

if (wait_on_page_writeback_killable(page))
if (folio_wait_writeback_killable(folio))
goto out;

if (lock_page_killable(page) < 0)
@@ -894,8 +895,8 @@ vm_fault_t afs_page_mkwrite(struct vm_fault *vmf)
* details the portion of the page we need to write back and we might
* need to redirty the page if there's a problem.
*/
if (wait_on_page_writeback_killable(page) < 0) {
unlock_page(page);
if (folio_wait_writeback_killable(folio) < 0) {
folio_unlock(folio);
goto out;
}

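The afs hunks show the transitional idiom for fault handlers: resolve the folio from vmf->page once with page_folio(), switch the writeback waits and unlocking to folio operations, and keep a struct page alias for calls not yet converted. A condensed sketch of the same shape, where example_page_mkwrite() is hypothetical and only the folio calls come from this series:

static vm_fault_t example_page_mkwrite(struct vm_fault *vmf)
{
	struct folio *folio = page_folio(vmf->page);

	/* give up if a fatal signal arrives while we wait */
	if (folio_wait_writeback_killable(folio))
		return VM_FAULT_RETRY;

	folio_lock(folio);
	folio_mark_dirty(folio);
	/* keep the folio locked; VM_FAULT_LOCKED tells the caller */
	return VM_FAULT_LOCKED;
}
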
16 changes: 8 additions & 8 deletions fs/cachefiles/rdwr.c
@@ -25,20 +25,20 @@ static int cachefiles_read_waiter(wait_queue_entry_t *wait, unsigned mode,
struct cachefiles_object *object;
struct fscache_retrieval *op = monitor->op;
struct wait_page_key *key = _key;
struct page *page = wait->private;
struct folio *folio = wait->private;

ASSERT(key);

_enter("{%lu},%u,%d,{%p,%u}",
monitor->netfs_page->index, mode, sync,
key->page, key->bit_nr);
key->folio, key->bit_nr);

if (key->page != page || key->bit_nr != PG_locked)
if (key->folio != folio || key->bit_nr != PG_locked)
return 0;

_debug("--- monitor %p %lx ---", page, page->flags);
_debug("--- monitor %p %lx ---", folio, folio->flags);

if (!PageUptodate(page) && !PageError(page)) {
if (!folio_test_uptodate(folio) && !folio_test_error(folio)) {
/* unlocked, not uptodate and not erronous? */
_debug("page probably truncated");
}
@@ -107,7 +107,7 @@ static int cachefiles_read_reissue(struct cachefiles_object *object,
put_page(backpage2);

INIT_LIST_HEAD(&monitor->op_link);
add_page_wait_queue(backpage, &monitor->monitor);
folio_add_wait_queue(page_folio(backpage), &monitor->monitor);

if (trylock_page(backpage)) {
ret = -EIO;
@@ -294,7 +294,7 @@ static int cachefiles_read_backing_file_one(struct cachefiles_object *object,
get_page(backpage);
monitor->back_page = backpage;
monitor->monitor.private = backpage;
add_page_wait_queue(backpage, &monitor->monitor);
folio_add_wait_queue(page_folio(backpage), &monitor->monitor);
monitor = NULL;

/* but the page may have been read before the monitor was installed, so
@@ -548,7 +548,7 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,
get_page(backpage);
monitor->back_page = backpage;
monitor->monitor.private = backpage;
add_page_wait_queue(backpage, &monitor->monitor);
folio_add_wait_queue(page_folio(backpage), &monitor->monitor);
monitor = NULL;

/* but the page may have been read before the monitor was
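
All three cachefiles call sites follow the same recipe: the wait key now names a folio, monitors are registered with folio_add_wait_queue(page_folio(backpage), ...), and the waiter compares folios instead of pages. A stripped-down waiter matching the converted key layout; example_read_waiter() is hypothetical and omits the real callback's wakeup bookkeeping:

static int example_read_waiter(wait_queue_entry_t *wait, unsigned mode,
			       int sync, void *_key)
{
	struct wait_page_key *key = _key;
	struct folio *folio = wait->private;

	/* ignore wakeups for other folios or other page-flag bits */
	if (key->folio != folio || key->bit_nr != PG_locked)
		return 0;
	return 1;
}
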
2 changes: 1 addition & 1 deletion fs/io_uring.c
@@ -3356,7 +3356,7 @@ static int io_read_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
}

/*
* This is our waitqueue callback handler, registered through lock_page_async()
* This is our waitqueue callback handler, registered through __folio_lock_async()
* when we initially tried to do the IO with the iocb armed our waitqueue.
* This gets called when the page is unlocked, and we generally expect that to
* happen when the page IO is completed and the page is now uptodate. This will
1 change: 1 addition & 0 deletions fs/jfs/jfs_metapage.c
@@ -13,6 +13,7 @@
#include <linux/buffer_head.h>
#include <linux/mempool.h>
#include <linux/seq_file.h>
#include <linux/writeback.h>
#include "jfs_incore.h"
#include "jfs_superblock.h"
#include "jfs_filsys.h"
6 changes: 6 additions & 0 deletions include/asm-generic/cacheflush.h
@@ -49,9 +49,15 @@ static inline void flush_cache_page(struct vm_area_struct *vma,
static inline void flush_dcache_page(struct page *page)
{
}

static inline void flush_dcache_folio(struct folio *folio) { }
#define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 0
#define ARCH_IMPLEMENTS_FLUSH_DCACHE_FOLIO
#endif

#ifndef ARCH_IMPLEMENTS_FLUSH_DCACHE_FOLIO
void flush_dcache_folio(struct folio *folio);
#endif

#ifndef flush_dcache_mmap_lock
static inline void flush_dcache_mmap_lock(struct address_space *mapping)
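
The macro acts as an opt-out: leave it undefined and this header declares the generic out-of-line flush_dcache_folio(); define it, as the no-cache xtensa configuration above does, and the architecture must supply its own. A sketch of the two options for a hypothetical architecture's cacheflush.h:

/* Option A: do nothing; the generic flush_dcache_folio() declared
 * by <asm-generic/cacheflush.h> is used. */

/* Option B: provide your own and suppress the generic declaration. */
#define ARCH_IMPLEMENTS_FLUSH_DCACHE_FOLIO
static inline void flush_dcache_folio(struct folio *folio)
{
	/* hypothetical arch whose d-cache is coherent: nothing to do */
}
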
6 changes: 3 additions & 3 deletions include/linux/backing-dev.h
@@ -64,20 +64,20 @@ static inline bool bdi_has_dirty_io(struct backing_dev_info *bdi)
return atomic_long_read(&bdi->tot_write_bandwidth);
}

static inline void __add_wb_stat(struct bdi_writeback *wb,
static inline void wb_stat_mod(struct bdi_writeback *wb,
enum wb_stat_item item, s64 amount)
{
percpu_counter_add_batch(&wb->stat[item], amount, WB_STAT_BATCH);
}

static inline void inc_wb_stat(struct bdi_writeback *wb, enum wb_stat_item item)
{
__add_wb_stat(wb, item, 1);
wb_stat_mod(wb, item, 1);
}

static inline void dec_wb_stat(struct bdi_writeback *wb, enum wb_stat_item item)
{
__add_wb_stat(wb, item, -1);
wb_stat_mod(wb, item, -1);
}

static inline s64 wb_stat(struct bdi_writeback *wb, enum wb_stat_item item)
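
The rename from __add_wb_stat() to wb_stat_mod() advertises that callers may now pass any delta, not just the +1/-1 of the inc/dec wrappers, which is exactly what folio-sized accounting needs. In the usage sketch below, example_account_cleaned() is hypothetical, while wb_stat_mod() and WB_RECLAIMABLE are real:

static void example_account_cleaned(struct folio *folio,
				    struct bdi_writeback *wb)
{
	long nr = folio_nr_pages(folio);

	/* one call covers every page of the folio */
	wb_stat_mod(wb, WB_RECLAIMABLE, -nr);
}
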
9 changes: 5 additions & 4 deletions include/linux/flex_proportions.h
@@ -83,9 +83,10 @@ struct fprop_local_percpu {

int fprop_local_init_percpu(struct fprop_local_percpu *pl, gfp_t gfp);
void fprop_local_destroy_percpu(struct fprop_local_percpu *pl);
void __fprop_inc_percpu(struct fprop_global *p, struct fprop_local_percpu *pl);
void __fprop_inc_percpu_max(struct fprop_global *p, struct fprop_local_percpu *pl,
int max_frac);
void __fprop_add_percpu(struct fprop_global *p, struct fprop_local_percpu *pl,
long nr);
void __fprop_add_percpu_max(struct fprop_global *p,
struct fprop_local_percpu *pl, int max_frac, long nr);
void fprop_fraction_percpu(struct fprop_global *p,
struct fprop_local_percpu *pl, unsigned long *numerator,
unsigned long *denominator);
@@ -96,7 +97,7 @@ void fprop_inc_percpu(struct fprop_global *p, struct fprop_local_percpu *pl)
unsigned long flags;

local_irq_save(flags);
__fprop_inc_percpu(p, pl);
__fprop_add_percpu(p, pl, 1);
local_irq_restore(flags);
}

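As with the writeback stats, the flex_proportions primitives grow from "inc" to "add": fprop_inc_percpu() becomes a thin wrapper around __fprop_add_percpu(p, pl, 1), so an n-page folio writeout can be counted as n completion events in one call. A sketch mirroring the wrapper's irq discipline, with example_writeout_add() being hypothetical:

static void example_writeout_add(struct fprop_global *p,
				 struct fprop_local_percpu *pl, long nr)
{
	unsigned long flags;

	/* same irq-safe pattern as fprop_inc_percpu(), but nr events */
	local_irq_save(flags);
	__fprop_add_percpu(p, pl, nr);
	local_irq_restore(flags);
}
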
22 changes: 16 additions & 6 deletions include/linux/gfp.h
@@ -520,15 +520,11 @@ static inline void arch_free_page(struct page *page, int order) { }
#ifndef HAVE_ARCH_ALLOC_PAGE
static inline void arch_alloc_page(struct page *page, int order) { }
#endif
#ifndef HAVE_ARCH_MAKE_PAGE_ACCESSIBLE
static inline int arch_make_page_accessible(struct page *page)
{
return 0;
}
#endif

struct page *__alloc_pages(gfp_t gfp, unsigned int order, int preferred_nid,
nodemask_t *nodemask);
struct folio *__folio_alloc(gfp_t gfp, unsigned int order, int preferred_nid,
nodemask_t *nodemask);

unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
nodemask_t *nodemask, int nr_pages,
@@ -570,6 +566,15 @@ __alloc_pages_node(int nid, gfp_t gfp_mask, unsigned int order)
return __alloc_pages(gfp_mask, order, nid, NULL);
}

static inline
struct folio *__folio_alloc_node(gfp_t gfp, unsigned int order, int nid)
{
VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
VM_WARN_ON((gfp & __GFP_THISNODE) && !node_online(nid));

return __folio_alloc(gfp, order, nid, NULL);
}

/*
* Allocate pages, preferring the node given as nid. When nid == NUMA_NO_NODE,
* prefer the current CPU's closest node. Otherwise node must be valid and
@@ -586,6 +591,7 @@ static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,

#ifdef CONFIG_NUMA
struct page *alloc_pages(gfp_t gfp, unsigned int order);
struct folio *folio_alloc(gfp_t gfp, unsigned order);
extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order,
struct vm_area_struct *vma, unsigned long addr,
int node, bool hugepage);
Expand All @@ -596,6 +602,10 @@ static inline struct page *alloc_pages(gfp_t gfp_mask, unsigned int order)
{
return alloc_pages_node(numa_node_id(), gfp_mask, order);
}
static inline struct folio *folio_alloc(gfp_t gfp, unsigned int order)
{
return __folio_alloc_node(gfp, order, numa_node_id());
}
#define alloc_pages_vma(gfp_mask, order, vma, addr, node, false)\
alloc_pages(gfp_mask, order)
#define alloc_hugepage_vma(gfp_mask, vma, addr, order) \
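
The new entry points mirror alloc_pages()/__alloc_pages(); order still counts pages, so folio_alloc(gfp, 2) returns one refcounted unit of four contiguous pages. A minimal usage sketch, where example_alloc() is hypothetical and folio_alloc()/folio_put() come from this series:

static struct folio *example_alloc(void)
{
	/* order-2: a 16KiB folio on a system with 4KiB pages */
	struct folio *folio = folio_alloc(GFP_KERNEL, 2);

	if (!folio)
		return NULL;
	/* ... use it; release with folio_put(folio) when done ... */
	return folio;
}
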
11 changes: 11 additions & 0 deletions include/linux/highmem-internal.h
@@ -73,6 +73,12 @@ static inline void *kmap_local_page(struct page *page)
return __kmap_local_page_prot(page, kmap_prot);
}

static inline void *kmap_local_folio(struct folio *folio, size_t offset)
{
struct page *page = folio_page(folio, offset / PAGE_SIZE);
return __kmap_local_page_prot(page, kmap_prot) + offset % PAGE_SIZE;
}

static inline void *kmap_local_page_prot(struct page *page, pgprot_t prot)
{
return __kmap_local_page_prot(page, prot);
@@ -171,6 +177,11 @@ static inline void *kmap_local_page(struct page *page)
return page_address(page);
}

static inline void *kmap_local_folio(struct folio *folio, size_t offset)
{
return page_address(&folio->page) + offset;
}

static inline void *kmap_local_page_prot(struct page *page, pgprot_t prot)
{
return kmap_local_page(page);
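
Note the signature difference from kmap_local_page(): the folio variant takes a byte offset into the whole folio and resolves the right constituent page itself (offset / PAGE_SIZE picks the page, offset % PAGE_SIZE the byte within it, as the HIGHMEM implementation above shows). A usage sketch with a hypothetical example_poke():

static void example_poke(struct folio *folio, size_t offset, char c)
{
	/* maps only the page containing 'offset', even on HIGHMEM */
	char *addr = kmap_local_folio(folio, offset);

	*addr = c;
	kunmap_local(addr);
}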