mm: merge populate and nopage into fault (fixes nonlinear)
Nonlinear mappings are (AFAIKS) simply a virtual memory concept that encodes
the virtual address -> file offset differently from linear mappings.

->populate is a layering violation because the filesystem/pagecache code
should not need to know anything about the virtual memory mapping.  The hitch
here is that the ->nopage handler didn't pass down enough information (i.e.
pgoff).  But it is more logical to pass pgoff rather than have the ->nopage
function calculate it itself anyway (because that's a similar layering
violation).
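
For illustration, the calculation that each ->nopage handler had to repeat can
be sketched in userspace C.  The formula matches the one removed from
gfs2_sharewrite_nopage in this patch; the sketch_* names and the fixed page
shift of 12 are assumptions made for the example, not kernel definitions.

```c
#include <assert.h>

/* Hypothetical stand-in for PAGE_SHIFT; 4 KiB pages are assumed here. */
#define SKETCH_PAGE_SHIFT 12

/* Hypothetical, cut-down stand-in for struct vm_area_struct. */
struct sketch_vma {
	unsigned long vm_start;	/* first virtual address of the mapping */
	unsigned long vm_end;	/* first address past the mapping */
	unsigned long vm_pgoff;	/* file offset of vm_start, in pages */
};

/*
 * Linear mapping: the file page offset follows directly from the
 * faulting address.  A nonlinear mapping cannot be decoded this way,
 * which is why the caller should pass pgoff down instead.
 */
static unsigned long sketch_linear_pgoff(const struct sketch_vma *vma,
					 unsigned long address)
{
	assert(address >= vma->vm_start && address < vma->vm_end);
	return ((address - vma->vm_start) >> SKETCH_PAGE_SHIFT) + vma->vm_pgoff;
}
```

With pgoff handed to the handler, this arithmetic lives in one place in the
fault path rather than in every filesystem.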

Having the populate handler install the pte itself is likewise a nasty thing
to be doing.

This patch introduces a new fault handler that replaces ->nopage and
->populate and (later) ->nopfn.  Most of the old mechanism is still in place,
so there is a lot of duplication that can be removed, and nice cleanups that
can be made, once everyone switches over.
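
A rough userspace sketch of the new handler contract, assuming the fault_data
layout this patch adds to include/linux/mm.h: the handler looks the page up by
pgoff, returns it, and reports the outcome through 'type'; on error it returns
NULL and 'type' carries the error.  Every sketch_* name, the enum values and
the four-page "cache" are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the VM_FAULT_* codes. */
enum { SKETCH_FAULT_MINOR, SKETCH_FAULT_MAJOR, SKETCH_FAULT_SIGBUS };

/* Hypothetical stand-in for struct page. */
struct sketch_page {
	unsigned long index;
};

/* Mirrors the shape of struct fault_data from this patch. */
struct sketch_fault_data {
	unsigned long address;
	unsigned long pgoff;
	unsigned int flags;
	int type;
};

#define SKETCH_NPAGES 4
static struct sketch_page sketch_cache[SKETCH_NPAGES];

/*
 * Look up the page by file offset only.  The handler never decodes the
 * virtual address, so the same code serves linear and nonlinear mappings.
 */
static struct sketch_page *sketch_fault(struct sketch_fault_data *fdata)
{
	assert(fdata != NULL);

	if (fdata->pgoff >= SKETCH_NPAGES) {
		fdata->type = SKETCH_FAULT_SIGBUS;	/* beyond end of "file" */
		return NULL;
	}

	sketch_cache[fdata->pgoff].index = fdata->pgoff;
	fdata->type = SKETCH_FAULT_MINOR;
	return &sketch_cache[fdata->pgoff];
}
```

Installing the pte stays with the caller, which is the part ->populate used to
do itself.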

The rationale for doing this in the first place is that nonlinear mappings are
subject to the pagefault vs invalidate/truncate race too, and it seemed stupid
to duplicate the synchronisation logic rather than just consolidate the two.

After this patch, MAP_NONBLOCK no longer sets up ptes for pages present in
pagecache.  Seems like a fringe functionality anyway.

NOPAGE_REFAULT is removed.  This should be implemented with ->fault, and no
users have hit mainline yet.

[[email protected]: cleanup]
[[email protected]: doc. fixes for readahead]
[[email protected]: build fix]
Signed-off-by: Nick Piggin <[email protected]>
Signed-off-by: Randy Dunlap <[email protected]>
Cc: Mark Fasheh <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Nick Piggin authored and Linus Torvalds committed Jul 19, 2007
1 parent d00806b commit 54cb882
Showing 20 changed files with 394 additions and 272 deletions.
27 changes: 27 additions & 0 deletions Documentation/feature-removal-schedule.txt
@@ -135,6 +135,33 @@ Who: Greg Kroah-Hartman <[email protected]>

---------------------------

What: filemap_nopage, filemap_populate
When: April 2007
Why: These legacy interfaces no longer have any callers in the kernel and
any functionality provided can be provided with filemap_fault. The
removal schedule is short because they are a big maintenance burden
and have some bugs.
Who: Nick Piggin <[email protected]>

---------------------------

What: vm_ops.populate, install_page
When: April 2007
Why: These legacy interfaces no longer have any callers in the kernel and
any functionality provided can be provided with vm_ops.fault.
Who: Nick Piggin <[email protected]>

---------------------------

What: vm_ops.nopage
When: February 2008, provided in-kernel callers have been converted
Why: This interface is replaced by vm_ops.fault, but it has been around
forever, is used by a lot of drivers, and doesn't cost much to
maintain.
Who: Nick Piggin <[email protected]>

---------------------------

What: Interrupt only SA_* flags
When: September 2007
Why: The interrupt related SA_* flags are replaced by IRQF_* to move them
2 changes: 2 additions & 0 deletions Documentation/filesystems/Locking
@@ -510,12 +510,14 @@ More details about quota locking can be found in fs/dquot.c.
prototypes:
void (*open)(struct vm_area_struct*);
void (*close)(struct vm_area_struct*);
struct page *(*fault)(struct vm_area_struct*, struct fault_data *);
struct page *(*nopage)(struct vm_area_struct*, unsigned long, int *);

locking rules:
BKL mmap_sem
open: no yes
close: no yes
fault: no yes
nopage: no yes

================================================================================
2 changes: 1 addition & 1 deletion fs/gfs2/ops_address.c
@@ -251,7 +251,7 @@ static int gfs2_readpage(struct file *file, struct page *page)
if (file) {
gf = file->private_data;
if (test_bit(GFF_EXLOCK, &gf->f_flags))
/* gfs2_sharewrite_nopage has grabbed the ip->i_gl already */
/* gfs2_sharewrite_fault has grabbed the ip->i_gl already */
goto skip_lock;
}
gfs2_holder_init(ip->i_gl, LM_ST_SHARED, GL_ATIME|LM_FLAG_TRY_1CB, &gh);
2 changes: 1 addition & 1 deletion fs/gfs2/ops_file.c
@@ -364,7 +364,7 @@ static int gfs2_mmap(struct file *file, struct vm_area_struct *vma)
else
vma->vm_ops = &gfs2_vm_ops_private;

vma->vm_flags |= VM_CAN_INVALIDATE;
vma->vm_flags |= VM_CAN_INVALIDATE|VM_CAN_NONLINEAR;

gfs2_glock_dq_uninit(&i_gh);

36 changes: 19 additions & 17 deletions fs/gfs2/ops_vm.c
@@ -27,13 +27,13 @@
#include "trans.h"
#include "util.h"

static struct page *gfs2_private_nopage(struct vm_area_struct *area,
unsigned long address, int *type)
static struct page *gfs2_private_fault(struct vm_area_struct *vma,
struct fault_data *fdata)
{
struct gfs2_inode *ip = GFS2_I(area->vm_file->f_mapping->host);
struct gfs2_inode *ip = GFS2_I(vma->vm_file->f_mapping->host);

set_bit(GIF_PAGED, &ip->i_flags);
return filemap_nopage(area, address, type);
return filemap_fault(vma, fdata);
}

static int alloc_page_backing(struct gfs2_inode *ip, struct page *page)
@@ -104,16 +104,14 @@ static int alloc_page_backing(struct gfs2_inode *ip, struct page *page)
return error;
}

static struct page *gfs2_sharewrite_nopage(struct vm_area_struct *area,
unsigned long address, int *type)
static struct page *gfs2_sharewrite_fault(struct vm_area_struct *vma,
struct fault_data *fdata)
{
struct file *file = area->vm_file;
struct file *file = vma->vm_file;
struct gfs2_file *gf = file->private_data;
struct gfs2_inode *ip = GFS2_I(file->f_mapping->host);
struct gfs2_holder i_gh;
struct page *result = NULL;
unsigned long index = ((address - area->vm_start) >> PAGE_CACHE_SHIFT) +
area->vm_pgoff;
int alloc_required;
int error;

@@ -124,23 +122,27 @@ static struct page *gfs2_sharewrite_nopage(struct vm_area_struct *area,
set_bit(GIF_PAGED, &ip->i_flags);
set_bit(GIF_SW_PAGED, &ip->i_flags);

error = gfs2_write_alloc_required(ip, (u64)index << PAGE_CACHE_SHIFT,
PAGE_CACHE_SIZE, &alloc_required);
if (error)
error = gfs2_write_alloc_required(ip,
(u64)fdata->pgoff << PAGE_CACHE_SHIFT,
PAGE_CACHE_SIZE, &alloc_required);
if (error) {
fdata->type = VM_FAULT_OOM; /* XXX: are these right? */
goto out;
}

set_bit(GFF_EXLOCK, &gf->f_flags);
result = filemap_nopage(area, address, type);
result = filemap_fault(vma, fdata);
clear_bit(GFF_EXLOCK, &gf->f_flags);
if (!result || result == NOPAGE_OOM)
if (!result)
goto out;

if (alloc_required) {
error = alloc_page_backing(ip, result);
if (error) {
if (area->vm_flags & VM_CAN_INVALIDATE)
if (vma->vm_flags & VM_CAN_INVALIDATE)
unlock_page(result);
page_cache_release(result);
fdata->type = VM_FAULT_OOM;
result = NULL;
goto out;
}
@@ -154,10 +156,10 @@ static struct page *gfs2_sharewrite_nopage(struct vm_area_struct *area,
}

struct vm_operations_struct gfs2_vm_ops_private = {
.nopage = gfs2_private_nopage,
.fault = gfs2_private_fault,
};

struct vm_operations_struct gfs2_vm_ops_sharewrite = {
.nopage = gfs2_sharewrite_nopage,
.fault = gfs2_sharewrite_fault,
};

23 changes: 12 additions & 11 deletions fs/ncpfs/mmap.c
@@ -25,8 +25,8 @@
/*
* Fill in the supplied page for mmap
*/
static struct page* ncp_file_mmap_nopage(struct vm_area_struct *area,
unsigned long address, int *type)
static struct page* ncp_file_mmap_fault(struct vm_area_struct *area,
struct fault_data *fdata)
{
struct file *file = area->vm_file;
struct dentry *dentry = file->f_path.dentry;
@@ -40,15 +40,17 @@ static struct page* ncp_file_mmap_nopage(struct vm_area_struct *area,

page = alloc_page(GFP_HIGHUSER); /* ncpfs has nothing against high pages
as long as recvmsg and memset works on it */
if (!page)
return page;
if (!page) {
fdata->type = VM_FAULT_OOM;
return NULL;
}
pg_addr = kmap(page);
address &= PAGE_MASK;
pos = address - area->vm_start + (area->vm_pgoff << PAGE_SHIFT);
pos = fdata->pgoff << PAGE_SHIFT;

count = PAGE_SIZE;
if (address + PAGE_SIZE > area->vm_end) {
count = area->vm_end - address;
if (fdata->address + PAGE_SIZE > area->vm_end) {
WARN_ON(1); /* shouldn't happen? */
count = area->vm_end - fdata->address;
}
/* what we can read in one go */
bufsize = NCP_SERVER(inode)->buffer_size;
@@ -91,15 +93,14 @@ static struct page* ncp_file_mmap_nopage(struct vm_area_struct *area,
* fetches from the network, here the analogue of disk.
* -- wli
*/
if (type)
*type = VM_FAULT_MAJOR;
fdata->type = VM_FAULT_MAJOR;
count_vm_event(PGMAJFAULT);
return page;
}

static struct vm_operations_struct ncp_file_mmap =
{
.nopage = ncp_file_mmap_nopage,
.fault = ncp_file_mmap_fault,
};


2 changes: 1 addition & 1 deletion fs/ocfs2/aops.c
@@ -232,7 +232,7 @@ static int ocfs2_readpage(struct file *file, struct page *page)
* might now be discovering a truncate that hit on another node.
* block_read_full_page->get_block freaks out if it is asked to read
* beyond the end of a file, so we check here. Callers
* (generic_file_read, fault->nopage) are clever enough to check i_size
* (generic_file_read, vm_ops->fault) are clever enough to check i_size
* and notice that the page they just read isn't needed.
*
* XXX sys_readahead() seems to get that wrong?
17 changes: 8 additions & 9 deletions fs/ocfs2/mmap.c
@@ -60,24 +60,23 @@ static inline int ocfs2_vm_op_unblock_sigs(sigset_t *oldset)
return sigprocmask(SIG_SETMASK, oldset, NULL);
}

static struct page *ocfs2_nopage(struct vm_area_struct * area,
unsigned long address,
int *type)
static struct page *ocfs2_fault(struct vm_area_struct *area,
struct fault_data *fdata)
{
struct page *page = NOPAGE_SIGBUS;
struct page *page = NULL;
sigset_t blocked, oldset;
int ret;

mlog_entry("(area=%p, address=%lu, type=%p)\n", area, address,
type);
mlog_entry("(area=%p, page offset=%lu)\n", area, fdata->pgoff);

ret = ocfs2_vm_op_block_sigs(&blocked, &oldset);
if (ret < 0) {
fdata->type = VM_FAULT_SIGBUS;
mlog_errno(ret);
goto out;
}

page = filemap_nopage(area, address, type);
page = filemap_fault(area, fdata);

ret = ocfs2_vm_op_unblock_sigs(&oldset);
if (ret < 0)
@@ -209,7 +208,7 @@ static int ocfs2_page_mkwrite(struct vm_area_struct *vma, struct page *page)
}

static struct vm_operations_struct ocfs2_file_vm_ops = {
.nopage = ocfs2_nopage,
.fault = ocfs2_fault,
.page_mkwrite = ocfs2_page_mkwrite,
};

@@ -226,7 +225,7 @@ int ocfs2_mmap(struct file *file, struct vm_area_struct *vma)
ocfs2_meta_unlock(file->f_dentry->d_inode, lock_level);
out:
vma->vm_ops = &ocfs2_file_vm_ops;
vma->vm_flags |= VM_CAN_INVALIDATE;
vma->vm_flags |= VM_CAN_INVALIDATE | VM_CAN_NONLINEAR;
return 0;
}

23 changes: 11 additions & 12 deletions fs/xfs/linux-2.6/xfs_file.c
@@ -213,18 +213,19 @@ xfs_file_fsync(

#ifdef CONFIG_XFS_DMAPI
STATIC struct page *
xfs_vm_nopage(
struct vm_area_struct *area,
unsigned long address,
int *type)
xfs_vm_fault(
struct vm_area_struct *vma,
struct fault_data *fdata)
{
struct inode *inode = area->vm_file->f_path.dentry->d_inode;
struct inode *inode = vma->vm_file->f_path.dentry->d_inode;
bhv_vnode_t *vp = vn_from_inode(inode);

ASSERT_ALWAYS(vp->v_vfsp->vfs_flag & VFS_DMI);
if (XFS_SEND_MMAP(XFS_VFSTOM(vp->v_vfsp), area, 0))
if (XFS_SEND_MMAP(XFS_VFSTOM(vp->v_vfsp), vma, 0)) {
fdata->type = VM_FAULT_SIGBUS;
return NULL;
return filemap_nopage(area, address, type);
}
return filemap_fault(vma, fdata);
}
#endif /* CONFIG_XFS_DMAPI */

@@ -310,7 +311,7 @@ xfs_file_mmap(
struct vm_area_struct *vma)
{
vma->vm_ops = &xfs_file_vm_ops;
vma->vm_flags |= VM_CAN_INVALIDATE;
vma->vm_flags |= VM_CAN_INVALIDATE | VM_CAN_NONLINEAR;

#ifdef CONFIG_XFS_DMAPI
if (vn_from_inode(filp->f_path.dentry->d_inode)->v_vfsp->vfs_flag & VFS_DMI)
@@ -465,14 +466,12 @@ const struct file_operations xfs_dir_file_operations = {
};

static struct vm_operations_struct xfs_file_vm_ops = {
.nopage = filemap_nopage,
.populate = filemap_populate,
.fault = filemap_fault,
};

#ifdef CONFIG_XFS_DMAPI
static struct vm_operations_struct xfs_dmapi_file_vm_ops = {
.nopage = xfs_vm_nopage,
.populate = filemap_populate,
.fault = xfs_vm_fault,
#ifdef HAVE_VMOP_MPROTECT
.mprotect = xfs_vm_mprotect,
#endif
41 changes: 34 additions & 7 deletions include/linux/mm.h
@@ -173,6 +173,7 @@ extern unsigned int kobjsize(const void *objp);
* In this case, do_no_page must
* return with the page locked.
*/
#define VM_CAN_NONLINEAR 0x10000000 /* Has ->fault & does nonlinear pages */

#ifndef VM_STACK_DEFAULT_FLAGS /* arch can override this */
#define VM_STACK_DEFAULT_FLAGS VM_DATA_DEFAULT_FLAGS
@@ -196,6 +197,25 @@ extern unsigned int kobjsize(const void *objp);
*/
extern pgprot_t protection_map[16];

#define FAULT_FLAG_WRITE 0x01
#define FAULT_FLAG_NONLINEAR 0x02

/*
* fault_data is filled in by the pagefault handler and passed to the
* vma's ->fault function. That function is responsible for filling in
* 'type', which is the type of fault if a page is returned, or the type
* of error if NULL is returned.
*
* pgoff should be used in favour of address, if possible. If pgoff is
* used, one may set VM_CAN_NONLINEAR in the vma->vm_flags to get
* nonlinear mapping support.
*/
struct fault_data {
unsigned long address;
pgoff_t pgoff;
unsigned int flags;
int type;
};

/*
* These are the virtual MM functions - opening of an area, closing and
@@ -205,9 +225,15 @@ extern pgprot_t protection_map[16];
struct vm_operations_struct {
void (*open)(struct vm_area_struct * area);
void (*close)(struct vm_area_struct * area);
struct page * (*nopage)(struct vm_area_struct * area, unsigned long address, int *type);
unsigned long (*nopfn)(struct vm_area_struct * area, unsigned long address);
int (*populate)(struct vm_area_struct * area, unsigned long address, unsigned long len, pgprot_t prot, unsigned long pgoff, int nonblock);
struct page *(*fault)(struct vm_area_struct *vma,
struct fault_data *fdata);
struct page *(*nopage)(struct vm_area_struct *area,
unsigned long address, int *type);
unsigned long (*nopfn)(struct vm_area_struct *area,
unsigned long address);
int (*populate)(struct vm_area_struct *area, unsigned long address,
unsigned long len, pgprot_t prot, unsigned long pgoff,
int nonblock);

/* notification that a previously read-only page is about to become
* writable, if an error is returned it will cause a SIGBUS */
@@ -661,7 +687,6 @@ static inline int page_mapped(struct page *page)
*/
#define NOPAGE_SIGBUS (NULL)
#define NOPAGE_OOM ((struct page *) (-1))
#define NOPAGE_REFAULT ((struct page *) (-2)) /* Return to userspace, rerun */

/*
* Error return values for the *_nopfn functions
@@ -1110,9 +1135,11 @@ extern void truncate_inode_pages_range(struct address_space *,
loff_t lstart, loff_t lend);

/* generic vm_area_ops exported for stackable file systems */
extern struct page *filemap_nopage(struct vm_area_struct *, unsigned long, int *);
extern int filemap_populate(struct vm_area_struct *, unsigned long,
unsigned long, pgprot_t, unsigned long, int);
extern struct page *filemap_fault(struct vm_area_struct *, struct fault_data *);
extern struct page * __deprecated_for_modules
filemap_nopage(struct vm_area_struct *, unsigned long, int *);
extern int __deprecated_for_modules filemap_populate(struct vm_area_struct *,
unsigned long, unsigned long, pgprot_t, unsigned long, int);

/* mm/page-writeback.c */
int write_one_page(struct page *page, int wait);