Skip to content

Commit

Permalink
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel…
Browse files Browse the repository at this point in the history
…/git/viro/vfs

Pull vfs updates from Al Viro:
 "All kinds of stuff this time around; some more notable parts:

   - RCU'd vfsmounts handling
   - new primitives for coredump handling
   - files_lock is gone
   - Bruce's delegations handling series
   - exportfs fixes

  plus misc stuff all over the place"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (101 commits)
  ecryptfs: ->f_op is never NULL
  locks: break delegations on any attribute modification
  locks: break delegations on link
  locks: break delegations on rename
  locks: helper functions for delegation breaking
  locks: break delegations on unlink
  namei: minor vfs_unlink cleanup
  locks: implement delegations
  locks: introduce new FL_DELEG lock flag
  vfs: take i_mutex on renamed file
  vfs: rename I_MUTEX_QUOTA now that it's not used for quotas
  vfs: don't use PARENT/CHILD lock classes for non-directories
  vfs: pull ext4's double-i_mutex-locking into common code
  exportfs: fix quadratic behavior in filehandle lookup
  exportfs: better variable name
  exportfs: move most of reconnect_path to helper function
  exportfs: eliminate unused "noprogress" counter
  exportfs: stop retrying once we race with rename/remove
  exportfs: clear DISCONNECTED on all parents sooner
  exportfs: more detailed comment for path_reconnect
  ...
  • Loading branch information
torvalds committed Nov 13, 2013
2 parents f023029 + bdd3536 commit 9bc9ccd
Show file tree
Hide file tree
Showing 159 changed files with 2,099 additions and 2,491 deletions.
31 changes: 22 additions & 9 deletions Documentation/filesystems/directory-locking
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,10 @@
kinds of locks - per-inode (->i_mutex) and per-filesystem
(->s_vfs_rename_mutex).

When taking the i_mutex on multiple non-directory objects, we
always acquire the locks in order by increasing address. We'll call
that "inode pointer" order in the following.

For our purposes all operations fall in 5 classes:

1) read access. Locking rules: caller locks directory we are accessing.
Expand All @@ -12,8 +16,9 @@ kinds of locks - per-inode (->i_mutex) and per-filesystem
locks victim and calls the method.

4) rename() that is _not_ cross-directory. Locking rules: caller locks
the parent, finds source and target, if target already exists - locks it
and then calls the method.
the parent and finds source and target. If target already exists, lock
it. If source is a non-directory, lock it. If that means we need to
lock both, lock them in inode pointer order.

5) link creation. Locking rules:
* lock parent
Expand All @@ -30,7 +35,9 @@ rules:
fail with -ENOTEMPTY
* if new parent is equal to or is a descendent of source
fail with -ELOOP
* if target exists - lock it.
* If target exists, lock it. If source is a non-directory, lock
it. In case that means we need to lock both source and target,
do so in inode pointer order.
* call the method.


Expand All @@ -56,19 +63,25 @@ objects - A < B iff A is an ancestor of B.
renames will be blocked on filesystem lock and we don't start changing
the order until we had acquired all locks).

(3) any operation holds at most one lock on non-directory object and
that lock is acquired after all other locks. (Proof: see descriptions
of operations).
(3) locks on non-directory objects are acquired only after locks on
directory objects, and are acquired in inode pointer order.
(Proof: all operations but renames take lock on at most one
non-directory object, except renames, which take locks on source and
target in inode pointer order in the case they are not directories.)

Now consider the minimal deadlock. Each process is blocked on
attempt to acquire some lock and already holds at least one lock. Let's
consider the set of contended locks. First of all, filesystem lock is
not contended, since any process blocked on it is not holding any locks.
Thus all processes are blocked on ->i_mutex.

Non-directory objects are not contended due to (3). Thus link
creation can't be a part of deadlock - it can't be blocked on source
and it means that it doesn't hold any locks.
By (3), any process holding a non-directory lock can only be
waiting on another non-directory lock with a larger address. Therefore
the process holding the "largest" such lock can always make progress, and
non-directory objects are not included in the set of contended locks.

Thus link creation can't be a part of deadlock - it can't be
blocked on source and it means that it doesn't hold any locks.

Any contended object is either held by cross-directory rename or
has a child that is also contended. Indeed, suppose that it is held by
Expand Down
8 changes: 8 additions & 0 deletions Documentation/filesystems/porting
Original file line number Diff line number Diff line change
Expand Up @@ -455,3 +455,11 @@ in your dentry operations instead.
vfs_follow_link has been removed. Filesystems must use nd_set_link
from ->follow_link for normal symlinks, or nd_jump_link for magic
/proc/<pid> style links.
--
[mandatory]
iget5_locked()/ilookup5()/ilookup5_nowait() test() callback used to be
called with both ->i_lock and inode_hash_lock held; the former is *not*
taken anymore, so verify that your callbacks do not rely on it (none
of the in-tree instances did). inode_hash_lock is still held,
of course, so they are still serialized wrt removal from inode hash,
as well as wrt set() callback of iget5_locked().
2 changes: 1 addition & 1 deletion arch/arm64/kernel/signal32.c
Original file line number Diff line number Diff line change
Expand Up @@ -122,7 +122,7 @@ static inline int get_sigset_t(sigset_t *set,
return 0;
}

int copy_siginfo_to_user32(compat_siginfo_t __user *to, siginfo_t *from)
int copy_siginfo_to_user32(compat_siginfo_t __user *to, const siginfo_t *from)
{
int err;

Expand Down
12 changes: 4 additions & 8 deletions arch/ia64/kernel/elfcore.c
Original file line number Diff line number Diff line change
Expand Up @@ -11,8 +11,7 @@ Elf64_Half elf_core_extra_phdrs(void)
return GATE_EHDR->e_phnum;
}

int elf_core_write_extra_phdrs(struct file *file, loff_t offset, size_t *size,
unsigned long limit)
int elf_core_write_extra_phdrs(struct coredump_params *cprm, loff_t offset)
{
const struct elf_phdr *const gate_phdrs =
(const struct elf_phdr *) (GATE_ADDR + GATE_EHDR->e_phoff);
Expand All @@ -35,15 +34,13 @@ int elf_core_write_extra_phdrs(struct file *file, loff_t offset, size_t *size,
phdr.p_offset += ofs;
}
phdr.p_paddr = 0; /* match other core phdrs */
*size += sizeof(phdr);
if (*size > limit || !dump_write(file, &phdr, sizeof(phdr)))
if (!dump_emit(cprm, &phdr, sizeof(phdr)))
return 0;
}
return 1;
}

int elf_core_write_extra_data(struct file *file, size_t *size,
unsigned long limit)
int elf_core_write_extra_data(struct coredump_params *cprm)
{
const struct elf_phdr *const gate_phdrs =
(const struct elf_phdr *) (GATE_ADDR + GATE_EHDR->e_phoff);
Expand All @@ -54,8 +51,7 @@ int elf_core_write_extra_data(struct file *file, size_t *size,
void *addr = (void *)gate_phdrs[i].p_vaddr;
size_t memsz = PAGE_ALIGN(gate_phdrs[i].p_memsz);

*size += memsz;
if (*size > limit || !dump_write(file, addr, memsz))
if (!dump_emit(cprm, addr, memsz))
return 0;
break;
}
Expand Down
2 changes: 1 addition & 1 deletion arch/ia64/kernel/signal.c
Original file line number Diff line number Diff line change
Expand Up @@ -105,7 +105,7 @@ restore_sigcontext (struct sigcontext __user *sc, struct sigscratch *scr)
}

int
copy_siginfo_to_user (siginfo_t __user *to, siginfo_t *from)
copy_siginfo_to_user (siginfo_t __user *to, const siginfo_t *from)
{
if (!access_ok(VERIFY_WRITE, to, sizeof(siginfo_t)))
return -EFAULT;
Expand Down
2 changes: 1 addition & 1 deletion arch/mips/kernel/signal32.c
Original file line number Diff line number Diff line change
Expand Up @@ -314,7 +314,7 @@ SYSCALL_DEFINE3(32_sigaction, long, sig, const struct compat_sigaction __user *,
return ret;
}

int copy_siginfo_to_user32(compat_siginfo_t __user *to, siginfo_t *from)
int copy_siginfo_to_user32(compat_siginfo_t __user *to, const siginfo_t *from)
{
int err;

Expand Down
2 changes: 1 addition & 1 deletion arch/parisc/kernel/signal32.c
Original file line number Diff line number Diff line change
Expand Up @@ -319,7 +319,7 @@ copy_siginfo_from_user32 (siginfo_t *to, compat_siginfo_t __user *from)
}

int
copy_siginfo_to_user32 (compat_siginfo_t __user *to, siginfo_t *from)
copy_siginfo_to_user32 (compat_siginfo_t __user *to, const siginfo_t *from)
{
compat_uptr_t addr;
compat_int_t val;
Expand Down
2 changes: 1 addition & 1 deletion arch/parisc/kernel/signal32.h
Original file line number Diff line number Diff line change
Expand Up @@ -34,7 +34,7 @@ struct compat_ucontext {

/* ELF32 signal handling */

int copy_siginfo_to_user32 (compat_siginfo_t __user *to, siginfo_t *from);
int copy_siginfo_to_user32 (compat_siginfo_t __user *to, const siginfo_t *from);
int copy_siginfo_from_user32 (siginfo_t *to, compat_siginfo_t __user *from);

/* In a deft move of uber-hackery, we decide to carry the top half of all
Expand Down
3 changes: 2 additions & 1 deletion arch/powerpc/include/asm/spu.h
Original file line number Diff line number Diff line change
Expand Up @@ -235,14 +235,15 @@ extern long spu_sys_callback(struct spu_syscall_block *s);

/* syscalls implemented in spufs */
struct file;
struct coredump_params;
struct spufs_calls {
long (*create_thread)(const char __user *name,
unsigned int flags, umode_t mode,
struct file *neighbor);
long (*spu_run)(struct file *filp, __u32 __user *unpc,
__u32 __user *ustatus);
int (*coredump_extra_notes_size)(void);
int (*coredump_extra_notes_write)(struct file *file, loff_t *foffset);
int (*coredump_extra_notes_write)(struct coredump_params *cprm);
void (*notify_spus_active)(void);
struct module *owner;
};
Expand Down
2 changes: 1 addition & 1 deletion arch/powerpc/kernel/signal_32.c
Original file line number Diff line number Diff line change
Expand Up @@ -893,7 +893,7 @@ static long restore_tm_user_regs(struct pt_regs *regs,
#endif

#ifdef CONFIG_PPC64
int copy_siginfo_to_user32(struct compat_siginfo __user *d, siginfo_t *s)
int copy_siginfo_to_user32(struct compat_siginfo __user *d, const siginfo_t *s)
{
int err;

Expand Down
5 changes: 3 additions & 2 deletions arch/powerpc/platforms/cell/spu_syscalls.c
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@
#include <linux/module.h>
#include <linux/syscalls.h>
#include <linux/rcupdate.h>
#include <linux/binfmts.h>

#include <asm/spu.h>

Expand Down Expand Up @@ -126,7 +127,7 @@ int elf_coredump_extra_notes_size(void)
return ret;
}

int elf_coredump_extra_notes_write(struct file *file, loff_t *foffset)
int elf_coredump_extra_notes_write(struct coredump_params *cprm)
{
struct spufs_calls *calls;
int ret;
Expand All @@ -135,7 +136,7 @@ int elf_coredump_extra_notes_write(struct file *file, loff_t *foffset)
if (!calls)
return 0;

ret = calls->coredump_extra_notes_write(file, foffset);
ret = calls->coredump_extra_notes_write(cprm);

spufs_calls_put(calls);

Expand Down
89 changes: 25 additions & 64 deletions arch/powerpc/platforms/cell/spufs/coredump.c
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,8 @@
#include <linux/gfp.h>
#include <linux/list.h>
#include <linux/syscalls.h>
#include <linux/coredump.h>
#include <linux/binfmts.h>

#include <asm/uaccess.h>

Expand All @@ -48,44 +50,6 @@ static ssize_t do_coredump_read(int num, struct spu_context *ctx, void *buffer,
return ++ret; /* count trailing NULL */
}

/*
* These are the only things you should do on a core-file: use only these
* functions to write out all the necessary info.
*/
static int spufs_dump_write(struct file *file, const void *addr, int nr, loff_t *foffset)
{
unsigned long limit = rlimit(RLIMIT_CORE);
ssize_t written;

if (*foffset + nr > limit)
return -EIO;

written = file->f_op->write(file, addr, nr, &file->f_pos);
*foffset += written;

if (written != nr)
return -EIO;

return 0;
}

static int spufs_dump_align(struct file *file, char *buf, loff_t new_off,
loff_t *foffset)
{
int rc, size;

size = min((loff_t)PAGE_SIZE, new_off - *foffset);
memset(buf, 0, size);

rc = 0;
while (rc == 0 && new_off > *foffset) {
size = min((loff_t)PAGE_SIZE, new_off - *foffset);
rc = spufs_dump_write(file, buf, size, foffset);
}

return rc;
}

static int spufs_ctx_note_size(struct spu_context *ctx, int dfd)
{
int i, sz, total = 0;
Expand Down Expand Up @@ -165,10 +129,10 @@ int spufs_coredump_extra_notes_size(void)
}

static int spufs_arch_write_note(struct spu_context *ctx, int i,
struct file *file, int dfd, loff_t *foffset)
struct coredump_params *cprm, int dfd)
{
loff_t pos = 0;
int sz, rc, nread, total = 0;
int sz, rc, total = 0;
const int bufsz = PAGE_SIZE;
char *name;
char fullname[80], *buf;
Expand All @@ -186,42 +150,39 @@ static int spufs_arch_write_note(struct spu_context *ctx, int i,
en.n_descsz = sz;
en.n_type = NT_SPU;

rc = spufs_dump_write(file, &en, sizeof(en), foffset);
if (rc)
goto out;
if (!dump_emit(cprm, &en, sizeof(en)))
goto Eio;

rc = spufs_dump_write(file, fullname, en.n_namesz, foffset);
if (rc)
goto out;
if (!dump_emit(cprm, fullname, en.n_namesz))
goto Eio;

rc = spufs_dump_align(file, buf, roundup(*foffset, 4), foffset);
if (rc)
goto out;
if (!dump_align(cprm, 4))
goto Eio;

do {
nread = do_coredump_read(i, ctx, buf, bufsz, &pos);
if (nread > 0) {
rc = spufs_dump_write(file, buf, nread, foffset);
if (rc)
goto out;
total += nread;
rc = do_coredump_read(i, ctx, buf, bufsz, &pos);
if (rc > 0) {
if (!dump_emit(cprm, buf, rc))
goto Eio;
total += rc;
}
} while (nread == bufsz && total < sz);
} while (rc == bufsz && total < sz);

if (nread < 0) {
rc = nread;
if (rc < 0)
goto out;
}

rc = spufs_dump_align(file, buf, roundup(*foffset - total + sz, 4),
foffset);

if (!dump_skip(cprm,
roundup(cprm->written - total + sz, 4) - cprm->written))
goto Eio;
out:
free_page((unsigned long)buf);
return rc;
Eio:
free_page((unsigned long)buf);
return -EIO;
}

int spufs_coredump_extra_notes_write(struct file *file, loff_t *foffset)
int spufs_coredump_extra_notes_write(struct coredump_params *cprm)
{
struct spu_context *ctx;
int fd, j, rc;
Expand All @@ -233,7 +194,7 @@ int spufs_coredump_extra_notes_write(struct file *file, loff_t *foffset)
return rc;

for (j = 0; spufs_coredump_read[j].name != NULL; j++) {
rc = spufs_arch_write_note(ctx, j, file, fd, foffset);
rc = spufs_arch_write_note(ctx, j, cprm, fd);
if (rc) {
spu_release_saved(ctx);
return rc;
Expand Down
3 changes: 2 additions & 1 deletion arch/powerpc/platforms/cell/spufs/spufs.h
Original file line number Diff line number Diff line change
Expand Up @@ -247,12 +247,13 @@ extern const struct spufs_tree_descr spufs_dir_debug_contents[];

/* system call implementation */
extern struct spufs_calls spufs_calls;
struct coredump_params;
long spufs_run_spu(struct spu_context *ctx, u32 *npc, u32 *status);
long spufs_create(struct path *nd, struct dentry *dentry, unsigned int flags,
umode_t mode, struct file *filp);
/* ELF coredump callbacks for writing SPU ELF notes */
extern int spufs_coredump_extra_notes_size(void);
extern int spufs_coredump_extra_notes_write(struct file *file, loff_t *foffset);
extern int spufs_coredump_extra_notes_write(struct coredump_params *cprm);

extern const struct file_operations spufs_context_fops;

Expand Down
2 changes: 1 addition & 1 deletion arch/s390/kernel/compat_signal.c
Original file line number Diff line number Diff line change
Expand Up @@ -49,7 +49,7 @@ typedef struct
__u32 gprs_high[NUM_GPRS];
} rt_sigframe32;

int copy_siginfo_to_user32(compat_siginfo_t __user *to, siginfo_t *from)
int copy_siginfo_to_user32(compat_siginfo_t __user *to, const siginfo_t *from)
{
int err;

Expand Down
2 changes: 1 addition & 1 deletion arch/sparc/kernel/signal32.c
Original file line number Diff line number Diff line change
Expand Up @@ -68,7 +68,7 @@ struct rt_signal_frame32 {
/* __siginfo_rwin_t * */u32 rwin_save;
} __attribute__((aligned(8)));

int copy_siginfo_to_user32(compat_siginfo_t __user *to, siginfo_t *from)
int copy_siginfo_to_user32(compat_siginfo_t __user *to, const siginfo_t *from)
{
int err;

Expand Down
2 changes: 1 addition & 1 deletion arch/tile/kernel/compat_signal.c
Original file line number Diff line number Diff line change
Expand Up @@ -49,7 +49,7 @@ struct compat_rt_sigframe {
struct compat_ucontext uc;
};

int copy_siginfo_to_user32(struct compat_siginfo __user *to, siginfo_t *from)
int copy_siginfo_to_user32(struct compat_siginfo __user *to, const siginfo_t *from)
{
int err;

Expand Down
Loading

0 comments on commit 9bc9ccd

Please sign in to comment.