forked from torvalds/linux
Commit
Remove the recently-added LINUX_FADV_ASYNC_WRITE and LINUX_FADV_WRITE_WAIT fadvise() additions; do it in a new sys_sync_file_range() syscall instead. Reasons:

- It's more flexible. Things which would require two or three syscalls with fadvise() can be done in a single syscall.

- Using fadvise() in this manner is something not covered by POSIX.

The patch wires up the syscall for x86.

The syscall is implemented in the new fs/sync.c. The intention is that we can move sys_fsync(), sys_fdatasync() and perhaps sys_sync() into there later.

Documentation for the syscall is in fs/sync.c.

A test app (sync_file_range.c) is in http://www.zip.com.au/~akpm/linux/patches/stuff/ext3-tools.tar.gz.

The do_sync_file_range() function, exported to GPL modules, is for knfsd: "A COMMIT can say NFS_DATA_SYNC or NFS_FILE_SYNC. I can skip the ->fsync call for NFS_DATA_SYNC which is hopefully the more common."

Note: the `async' writeout mode SYNC_FILE_RANGE_WRITE will turn synchronous if the queue is congested. This is trivial to fix: add a new flag bit, set wbc->nonblocking. But I'm not sure that we want to expose implementation details down to that level.

Note: we can sync an fd which wasn't opened for writing. The same is true of fsync() and fdatasync().

Note: the code takes some care to handle attempts to sync file contents beyond the 16TB offset on 32-bit machines. It makes such attempts appear to succeed, for best 32-bit/64-bit compatibility. Perhaps it should make such requests fail...

Cc: Nick Piggin <[email protected]>
Cc: Michael Kerrisk <[email protected]>
Cc: Ulrich Drepper <[email protected]>
Cc: Neil Brown <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Andrew Morton authored and Linus Torvalds committed on Mar 31, 2006
1 parent d6dfd13, commit f79e2ab
Showing 8 changed files with 177 additions and 28 deletions.
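The commit message says one sync_file_range() call can do what would otherwise take two or three fadvise() calls. Below is a minimal user-space sketch of that. It assumes a current Linux system whose C library provides the sync_file_range() wrapper (glibc declares it and the SYNC_FILE_RANGE_* flags in <fcntl.h> under _GNU_SOURCE). The helper names and the scratch-file path are hypothetical. As the fs/sync.c comment warns, none of these calls write file metadata, so they cannot replace fsync() or fdatasync() for durability.

```c
#define _GNU_SOURCE /* exposes sync_file_range() and SYNC_FILE_RANGE_* in glibc */
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Start writeout of dirty pages in the range and return without waiting.
 * Asynchronous flush only: not a data-integrity operation. */
static int range_start_write(int fd, off_t offset, off_t nbytes)
{
	return sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE);
}

/* Wait for in-flight writeout, write what is still dirty, then wait again:
 * the write-for-data-integrity combination, issued as a single call.
 * nbytes == 0 means "from offset out to EOF". Metadata is NOT written. */
static int range_write_and_wait(int fd, off_t offset, off_t nbytes)
{
	return sync_file_range(fd, offset, nbytes,
			       SYNC_FILE_RANGE_WAIT_BEFORE |
			       SYNC_FILE_RANGE_WRITE |
			       SYNC_FILE_RANGE_WAIT_AFTER);
}

/* Usage: dirty a scratch file (hypothetical path), then sync it. */
static int sfr_demo(void)
{
	char path[] = "/tmp/sfr-demo-XXXXXX";
	int fd = mkstemp(path);
	int ret = -1;

	if (fd < 0)
		return -1;
	if (write(fd, "hello\n", 6) == 6 &&
	    range_start_write(fd, 0, 6) == 0 &&
	    range_write_and_wait(fd, 0, 0) == 0)
		ret = 0;
	close(fd);
	unlink(path);
	return ret;
}
```

The first call only queues writeback. The second waits for any I/O already in flight, writes whatever is still dirty, and waits for that too, all without a second syscall.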
@@ -0,0 +1,164 @@

```c
/*
 * High-level sync()-related operations
 */

#include <linux/kernel.h>
#include <linux/file.h>
#include <linux/fs.h>
#include <linux/module.h>
#include <linux/writeback.h>
#include <linux/syscalls.h>
#include <linux/linkage.h>
#include <linux/pagemap.h>

#define VALID_FLAGS (SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE| \
			SYNC_FILE_RANGE_WAIT_AFTER)
```
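VALID_FLAGS is the whitelist that sys_sync_file_range() uses to reject unknown flag bits with -EINVAL. The user-space sketch below mirrors that check and also lists the phases do_sync_file_range() would run for a given mask. The flags are defined outside this hunk, so the values 1, 2 and 4 for WAIT_BEFORE, WRITE and WAIT_AFTER are an assumption to check against your headers. sfr_plan() and enum sfr_phase are hypothetical names.

```c
#include <errno.h>
#include <stddef.h>

/* Assumed values of SYNC_FILE_RANGE_WAIT_BEFORE / _WRITE / _WAIT_AFTER. */
#define SFR_WAIT_BEFORE 1u
#define SFR_WRITE       2u
#define SFR_WAIT_AFTER  4u
#define SFR_VALID_FLAGS (SFR_WAIT_BEFORE | SFR_WRITE | SFR_WAIT_AFTER)

enum sfr_phase { PHASE_WAIT_BEFORE, PHASE_WRITE, PHASE_WAIT_AFTER };

/*
 * Reject unknown bits the way the VALID_FLAGS test does, then fill out[]
 * with the phases do_sync_file_range() runs, in the order it runs them.
 * Returns the number of phases (0..3), or -EINVAL for unknown bits.
 */
static int sfr_plan(unsigned int flags, enum sfr_phase out[3])
{
	int n = 0;

	if (flags & ~SFR_VALID_FLAGS)
		return -EINVAL;
	if (flags & SFR_WAIT_BEFORE)
		out[n++] = PHASE_WAIT_BEFORE;
	if (flags & SFR_WRITE)
		out[n++] = PHASE_WRITE;
	if (flags & SFR_WAIT_AFTER)
		out[n++] = PHASE_WAIT_AFTER;
	return n;
}
```

Because the phases are tested in a fixed sequence, the order in which a caller ORs the bits together cannot change what runs first. A flags value of 0 is accepted and does nothing.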
```c
/*
 * sys_sync_file_range() permits finely controlled syncing over a segment of
 * a file in the range offset .. (offset+nbytes-1) inclusive. If nbytes is
 * zero then sys_sync_file_range() will operate from offset out to EOF.
 *
 * The flag bits are:
 *
 * SYNC_FILE_RANGE_WAIT_BEFORE: wait upon writeout of all pages in the range
 * before performing the write.
 *
 * SYNC_FILE_RANGE_WRITE: initiate writeout of all those dirty pages in the
 * range which are not presently under writeback.
 *
 * SYNC_FILE_RANGE_WAIT_AFTER: wait upon writeout of all pages in the range
 * after performing the write.
 *
 * Useful combinations of the flag bits are:
 *
 * SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE: ensures that all pages
 * in the range which were dirty on entry to sys_sync_file_range() are placed
 * under writeout. This is a start-write-for-data-integrity operation.
 *
 * SYNC_FILE_RANGE_WRITE: start writeout of all dirty pages in the range which
 * are not presently under writeout. This is an asynchronous flush-to-disk
 * operation. Not suitable for data integrity operations.
 *
 * SYNC_FILE_RANGE_WAIT_BEFORE (or SYNC_FILE_RANGE_WAIT_AFTER): wait for
 * completion of writeout of all pages in the range. This will be used after an
 * earlier SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE operation to wait
 * for that operation to complete and to return the result.
 *
 * SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE|SYNC_FILE_RANGE_WAIT_AFTER:
 * a traditional sync() operation. This is a write-for-data-integrity operation
 * which will ensure that all pages in the range which were dirty on entry to
 * sys_sync_file_range() are committed to disk.
 *
 *
 * SYNC_FILE_RANGE_WAIT_BEFORE and SYNC_FILE_RANGE_WAIT_AFTER will detect any
 * I/O errors or ENOSPC conditions and will return those to the caller, after
 * clearing the EIO and ENOSPC flags in the address_space.
 *
 * It should be noted that none of these operations write out the file's
 * metadata. So unless the application is strictly performing overwrites of
 * already-instantiated disk blocks, there are no guarantees here that the data
 * will be available after a crash.
 */
asmlinkage long sys_sync_file_range(int fd, loff_t offset, loff_t nbytes,
					int flags)
{
	int ret;
	struct file *file;
	loff_t endbyte;			/* inclusive */
	int fput_needed;
	umode_t i_mode;

	ret = -EINVAL;
	if (flags & ~VALID_FLAGS)
		goto out;

	endbyte = offset + nbytes;

	if ((s64)offset < 0)
		goto out;
	if ((s64)endbyte < 0)
		goto out;
	if (endbyte < offset)
		goto out;

	if (sizeof(pgoff_t) == 4) {
		if (offset >= (0x100000000ULL << PAGE_CACHE_SHIFT)) {
			/*
			 * The range starts outside a 32 bit machine's
			 * pagecache addressing capabilities. Let it "succeed"
			 */
			ret = 0;
			goto out;
		}
		if (endbyte >= (0x100000000ULL << PAGE_CACHE_SHIFT)) {
			/*
			 * Out to EOF
			 */
			nbytes = 0;
		}
	}

	if (nbytes == 0)
		endbyte = -1;
	else
		endbyte--;		/* inclusive */

	ret = -EBADF;
	file = fget_light(fd, &fput_needed);
	if (!file)
		goto out;

	i_mode = file->f_dentry->d_inode->i_mode;
	ret = -ESPIPE;
	if (!S_ISREG(i_mode) && !S_ISBLK(i_mode) && !S_ISDIR(i_mode) &&
			!S_ISLNK(i_mode))
		goto out_put;

	ret = do_sync_file_range(file, offset, endbyte, flags);
out_put:
	fput_light(file, fput_needed);
out:
	return ret;
}
```
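sys_sync_file_range() converts (offset, nbytes) into the inclusive endbyte passed to do_sync_file_range(), with -1 meaning "out to EOF". The sketch below reproduces that arithmetic in portable user-space C so the edge cases can be exercised without a kernel. It assumes 4 KiB pages, so the 32-bit limit 0x100000000ULL << 12 is 2^44 bytes, the 16TB mentioned in the commit message. The names sfr_compute_range() and enum sfr_range are hypothetical, and the pgoff_is_32bit parameter stands in for the sizeof(pgoff_t) == 4 test.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT_ASSUMED 12 /* assumption: 4 KiB pages */

/* Hypothetical result codes for this sketch. */
enum sfr_range { SFR_RANGE_OK, SFR_RANGE_EINVAL, SFR_RANGE_NOOP };

static enum sfr_range sfr_compute_range(int64_t offset, int64_t nbytes,
					bool pgoff_is_32bit, int64_t *endbyte)
{
	/* Add in unsigned arithmetic so overflow wraps instead of being
	 * undefined; a wrapped sum goes negative and is rejected below,
	 * as the kernel's (s64)endbyte < 0 test intends. */
	int64_t end = (int64_t)((uint64_t)offset + (uint64_t)nbytes);

	if (offset < 0 || end < 0 || end < offset)
		return SFR_RANGE_EINVAL;	/* -EINVAL in the kernel */

	if (pgoff_is_32bit) {
		/* 2^32 page indices times 2^12 bytes per page = 16 TB. */
		const int64_t limit =
			(int64_t)(0x100000000ULL << PAGE_SHIFT_ASSUMED);

		if (offset >= limit)
			return SFR_RANGE_NOOP;	/* starts out of reach: "succeed" */
		if (end >= limit)
			nbytes = 0;		/* clamp to "out to EOF" */
	}

	*endbyte = (nbytes == 0) ? -1 : end - 1;	/* inclusive; -1 = EOF */
	return SFR_RANGE_OK;
}
```

For example, offset 0 with nbytes 4096 gives endbyte 4095, and a negative nbytes that moves the end before the start is rejected. With a 32-bit pgoff_t, a range starting at or beyond 2^44 bytes reports success without syncing anything, which is the behaviour the commit message questions.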
```c
/*
 * `endbyte' is inclusive
 */
int do_sync_file_range(struct file *file, loff_t offset, loff_t endbyte,
			int flags)
{
	int ret;
	struct address_space *mapping;

	mapping = file->f_mapping;
	if (!mapping) {
		ret = -EINVAL;
		goto out;
	}

	ret = 0;
	if (flags & SYNC_FILE_RANGE_WAIT_BEFORE) {
		ret = wait_on_page_writeback_range(mapping,
					offset >> PAGE_CACHE_SHIFT,
					endbyte >> PAGE_CACHE_SHIFT);
		if (ret < 0)
			goto out;
	}

	if (flags & SYNC_FILE_RANGE_WRITE) {
		ret = __filemap_fdatawrite_range(mapping, offset, endbyte,
						WB_SYNC_NONE);
		if (ret < 0)
			goto out;
	}

	if (flags & SYNC_FILE_RANGE_WAIT_AFTER) {
		ret = wait_on_page_writeback_range(mapping,
					offset >> PAGE_CACHE_SHIFT,
					endbyte >> PAGE_CACHE_SHIFT);
	}
out:
	return ret;
}
EXPORT_SYMBOL_GPL(do_sync_file_range);
```
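do_sync_file_range() passes byte offsets straight to __filemap_fdatawrite_range(), but shifts them into page indices for wait_on_page_writeback_range(). The sketch below shows that byte-to-page mapping. It assumes 4 KiB pages and a 64-bit pgoff_t, and byte_range_to_pages() is a hypothetical name. In the kernel code, an endbyte of -1 becomes the all-ones page index: with the arithmetic shift GCC performs, -1 >> PAGE_CACHE_SHIFT stays -1, and converting that to the unsigned pgoff_t yields the maximum value. The sketch handles that case explicitly, because right-shifting a negative value is implementation-defined in C.

```c
#include <stdint.h>

#define PAGE_SHIFT_ASSUMED 12 /* assumption: 4 KiB pages */

/*
 * Map an inclusive byte range [offset, endbyte] to the inclusive page-index
 * range that the WAIT phases operate on. offset must be non-negative (the
 * syscall has already rejected negative offsets); endbyte == -1 means EOF.
 */
static void byte_range_to_pages(int64_t offset, int64_t endbyte,
				uint64_t *first, uint64_t *last)
{
	*first = (uint64_t)offset >> PAGE_SHIFT_ASSUMED;
	*last = (endbyte < 0) ? UINT64_MAX	/* "out to EOF" */
			      : (uint64_t)endbyte >> PAGE_SHIFT_ASSUMED;
}
```

Partial pages at either end are covered whole: bytes 100..5000 map to pages 0..1, because writeback and the wait for it operate on entire pages.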