-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimisation of directory type checking in file list #1381
base: dev
Are you sure you want to change the base?
Conversation
… directory-based image formats
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just found a place in ProgramSettings.cc:338 which uses casacore::ImageOpener to check whether the positional folder argument is a CASA image to be opened. This should also use GuessImageDirectoryType.
@@ -10,6 +10,7 @@ | |||
#include <fstream> | |||
#include <regex> | |||
|
|||
#include "Casacore.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this include necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Asking because I am looking at a file list bug and might include File.h in Casacore.h.
This is an attempt to improve the performance of generating a file list for directories with lots of subdirectories. The main performance bottlenecks in the current code are:
There are three options for displaying file lists: filtering by content, filtering by file extension, and no filtering. Filtering by content additionally slows down the file list generation because it reads the signature of each ordinary file to determine the type. Guessing the type from the file extension speeds up the processing of ordinary files, but not subdirectories, which are always checked with casacore's
ImageOpener
. Turning off filtering entirely has no additional performance impact on the image file list (because detection of directory-based images is still required), but speeds up the region file list (because if no filtering is required, all directories can be shown without additional checks).This PR is an attempt to add a less expensive heuristic for directory-based images, to be used when the option to filter by content is not selected. It performs almost the same checks as
ImageOpener
, but only for directories, by looking for files inside the directory withfs::exists
, and without distinguishing between different CASA image subtypes.The existing code assumes that there are directory image formats which we do not support, and handles them differently. However, it's clear from the
ImageOpener
code that the GIPSY format is a pair of files, not a directory (soImageOpener
would never return that type for a directory), and theCAIPS
andNEWSTAR
types are obsolete and never returned byImageOpener
(at all). So I have removed this option from the code, and not implemented it in the alternative code.The result: the alternative code appears to be slightly faster, but I don't know if it's faster enough for it to make sense to add it as an alternative to the casacore code. If this implementation is sufficient for our purposes (e.g. we don't need to read the
table.info
file because we don't need do distinguish between CASA sub-types here), then perhaps we should replace the casacore check with this (for a modest speed improvement in all cases).Other optimizations we discussed:
I think we're planning to use the last option as our long-term solution, and I would suggest applying that strategy to all files: instead of loading file information up-front when generating the file list, we could initially return just a bare list of files and directories, and then return information for lists of files and directories as the frontend requests them.
Checklist