Remove IOFILTER_READ_METADATA workaround
The IOFILTER_READ_METADATA environment variable was used to tell the
I/O filter that the reader wanted to read the JSON metadata (rather
than the actual dataset data). That was a workaround: we just wanted to
programmatically retrieve data from the first chunk, but HDF5 lacked an
API to obtain the file offset of a given chunk number.

Now that HDF5 provides H5Dget_num_chunks() and H5Dget_chunk_info(),
that limitation is gone, and the implementation of the public method
HDF5_Handler::extractUDFMetadata is much simpler.
lucasvr committed May 26, 2022
1 parent bc7bf28 commit 6e518d7
Showing 2 changed files with 26 additions and 31 deletions.
7 changes: 1 addition & 6 deletions src/io_filter.cpp
@@ -186,12 +186,7 @@ static size_t
 H5Z_udf_filter_callback(unsigned int flags, size_t cd_nelmts,
     const unsigned int *cd_values, size_t nbytes, size_t *buf_size, void **buf)
 {
-    if (flags & H5Z_FLAG_REVERSE && getenv("IOFILTER_READ_METADATA") != NULL)
-    {
-        std::string json_string((const char *) *buf);
-        *buf_size = json_string.size();
-    }
-    else if (flags & H5Z_FLAG_REVERSE)
+    if (flags & H5Z_FLAG_REVERSE)
     {
         std::string json_string((const char *) *buf);
         json jas = json::parse(json_string);
50 changes: 25 additions & 25 deletions src/libudf.cpp
@@ -339,31 +339,31 @@ bool HDF5_Handler::extractUDFMetadata(std::string &json_payload)
     if (hdf5_datatype < 0 && extractInfo() == false)
         return false;
 
-    // Note: ideally, we'd like to resort to H5Dget_offset() to get the dataset
-    // offset relative to the beginning of the file and then read the UDF metadata
-    // by ourselves. However, H5Dget_offset does not work with chunked datasets,
-    // which we use to store UDFs.
-    // An alternative would be to retrieve the underlying file handle with
-    // H5Fget_vfd_handle() and then read the file offset at /proc/self/fdinfo/N,
-    // but does not work either: the file offset is reported as 0 after the handle
-    // is obtained (based on observation, not code inspection).
-    // So, we introduce a semantic modification to our I/O filter read path: if
-    // there's an environment variable named "IOFILTER_READ_METADATA", then the I/O
-    // filter returns the JSON metadata associated with the dataset -- otherwise
-    // the standard read operation executes. Not an ideal solution, but it works.
-
-    char *rdata = new char[1024*1024];
-    os::setEnvironmentVariable("IOFILTER_READ_METADATA", "1");
-    herr_t ret = H5Dread(dset_id, hdf5_datatype, H5S_ALL, H5S_ALL, H5P_DEFAULT, rdata);
-    os::clearEnvironmentVariable("IOFILTER_READ_METADATA");
-    if (ret < 0)
-    {
-        delete[] rdata;
-        FAIL("error reading UDF dataset metadata");
-    }
-    json_payload.assign(rdata, strlen(rdata));
-    delete[] rdata;
-    return true;
+    // Sanity check: make sure we have a valid data chunk in the file
+    hsize_t nchunks = 0;
+    if (H5Sselect_all(space_id) < 0)
+        FAIL("failed to select the full extent of the dataset\n");
+    if (H5Dget_num_chunks(dset_id, space_id, &nchunks) < 0)
+        FAIL("failed to retrieve the number of chunks in the dataset\n");
+    if (nchunks == 0)
+        FAIL("dataset has no chunks\n");
+
+    // Locate the metadata, which is a NULL-terminated string allocated at the
+    // beginning of the first (and only) data chunk created by the UDF filter.
+    haddr_t file_offset = 0;
+    if (H5Dget_chunk_info(dset_id, space_id, 0, NULL, NULL, &file_offset, NULL) < 0)
+        FAIL("failed to retrieve information from chunk #0\n");
+
+    // Play safe and use a different file handle to extract the file data (as
+    // opposed to reusing the VFD file handle). We just don't want to make
+    // assumptions about the VFD driver.
+    std::ifstream file(hdf5_file, std::ifstream::in | std::ifstream::binary);
+    if (! file.is_open())
+        FAIL("failed to open %s\n", hdf5_file.c_str());
+    file.seekg(file_offset);
+    std::getline(file, json_payload);
+
+    return json_payload.length() > 0;
 }
 
 ///////////////////////
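
The final step of the new extractUDFMetadata, reading a string that starts at a known byte offset of the file, can be sketched in isolation without HDF5. This is only an illustration under stated assumptions: `read_cstring_at` is a hypothetical helper that does not exist in the repository, and the offset is assumed to come from a call such as H5Dget_chunk_info(). Unlike the std::getline call in the diff, which uses the default newline delimiter, this sketch passes '\0' as the delimiter so that reading stops at the string terminator even if the payload contains newlines.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical helper (not part of the commit): read the NUL-terminated
// string that starts at byte `offset` of the file at `path`.
// Returns true and fills `out` if at least one byte was read.
bool read_cstring_at(const std::string &path, std::streamoff offset, std::string &out)
{
    assert(offset >= 0);  // a chunk address is never negative

    // Open an independent handle, mirroring the commit's choice not to
    // reuse the HDF5 VFD file handle.
    std::ifstream file(path, std::ios::in | std::ios::binary);
    if (!file.is_open())
        return false;

    file.seekg(offset);
    if (!file)
        return false;

    // Assumption: the payload ends at the first NUL byte. If no NUL is
    // found, everything up to the end of the file is returned.
    std::getline(file, out, '\0');
    return !out.empty();
}
```

A caller would pass the chunk address obtained from HDF5 as `offset` and then hand `out` to the JSON parser. Whether the payload can actually contain newlines depends on how the filter serializes it, which this page does not show.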
