Reading object metadata #10

rkoo19 · 2021-11-15T17:11:05Z

I was wondering if there is a way to also fetch an object's metadata when reading the object itself. I am trying to use ImageNet to train an image classification model, similar to what is done in s3_imagenet_example.py, but I want to store each image's class as metadata on the object itself.

rkoo19 · 2021-11-15T17:40:47Z

So, if I use a map-style S3Dataset, I want to fetch the object itself from my S3 bucket and also fetch a piece of metadata associated with that object.

rkoo19 · 2021-11-15T17:45:28Z

I was reading the class definition for S3Dataset and saw that it fetches an object by its filename but does nothing with metadata. If this isn't already supported, I would like to modify the procedure for getting an object from S3 so that it also fetches the object's metadata. I hope this makes sense, and I would appreciate any help I could get!
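For reference, S3's GetObject response carries user-defined metadata (stored as `x-amz-meta-*` headers) alongside the object body, so a single request can return both. The sketch below is one possible map-style dataset built on boto3's `get_object` rather than on the plugin's S3Dataset. It assumes each image's class was uploaded as user metadata under a hypothetical key `class`; the class name `S3ObjectWithMetadataDataset` is also hypothetical, and the S3 client is injectable so the logic can be exercised without AWS access.

```python
from typing import Any, Dict, List, Optional, Tuple


class S3ObjectWithMetadataDataset:
    """Hypothetical map-style dataset returning (object bytes, label) pairs.

    Assumes the label was stored as user metadata under `label_key`
    (e.g. uploaded with put_object(..., Metadata={"class": "n01440764"})).
    """

    def __init__(self, bucket: str, keys: List[str],
                 label_key: str = "class", client: Optional[Any] = None):
        self.bucket = bucket
        self.keys = list(keys)
        self.label_key = label_key
        self._client = client  # if None, a boto3 client is created on first use

    def _get_client(self) -> Any:
        if self._client is None:
            import boto3  # deferred: only needed when no client is injected
            self._client = boto3.client("s3")
        return self._client

    def __len__(self) -> int:
        return len(self.keys)

    def __getitem__(self, idx: int) -> Tuple[bytes, str]:
        key = self.keys[idx]
        # One GetObject call returns both the body and the user metadata.
        response = self._get_client().get_object(Bucket=self.bucket, Key=key)
        data = response["Body"].read()
        # boto3 strips the x-amz-meta- prefix and lowercases metadata keys.
        metadata: Dict[str, str] = response.get("Metadata", {})
        return data, metadata[self.label_key]
```

The returned bytes can be decoded with `PIL.Image.open(io.BytesIO(data))`, and the label string mapped to a class index with whatever lookup table the training script already uses.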

johnbensnyder · 2021-11-16T02:34:39Z

How is the object metadata stored? One possibility might be to use the S3BaseClass to write a custom method of reading the file object from S3, then use the filename to read metadata from some other source. For example, here's the setup I use to read an image and annotations for the COCO dataset.

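import io
import os

from PIL import Image

# _pywrap_s3_io is provided by the S3 plugin. These are methods of a Dataset
# subclass that also defines self.handler, self.root, self.coco, self.ids,
# self.build_target and self._transforms.

def _load_image(self, image_id):
    # Create the S3 handler lazily, on first use.
    if self.handler is None:
        self.handler = _pywrap_s3_io.S3Init()
    filename = os.path.join(self.root, self.coco.loadImgs(image_id)[0]["file_name"])
    fileobj = self.handler.s3_read(filename)  # raw object bytes
    return Image.open(io.BytesIO(fileobj)).convert("RGB")

def _load_target(self, image_id):
    # Annotations come from the local COCO index, not from S3.
    return self.coco.loadAnns(self.coco.getAnnIds(image_id))

def __getitem__(self, idx):
    image_id = self.ids[idx]
    img = self._load_image(image_id)
    anno = self._load_target(image_id)
    target = self.build_target(anno, img.size)
    if self._transforms is not None:
        img, target = self._transforms(img, target)
    return img, target, idx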
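The same pattern (fetch the object from S3, look its label up elsewhere by filename) can be sketched for the ImageNet case with a label manifest. This is an illustration only: the CSV format (`key,label` per row), `load_label_manifest`, and `ManifestLabeledDataset` are hypothetical, and `read_bytes` stands in for whatever fetches the object (for example the plugin's `handler.s3_read`).

```python
import csv
import io
from typing import Callable, Dict, List, Tuple


def load_label_manifest(text: str) -> Dict[str, int]:
    """Parse a hypothetical CSV manifest of "key,label" rows into a dict."""
    reader = csv.reader(io.StringIO(text))
    return {row[0]: int(row[1]) for row in reader if row}


class ManifestLabeledDataset:
    """Hypothetical map-style dataset pairing each object with a manifest label."""

    def __init__(self, keys: List[str], labels: Dict[str, int],
                 read_bytes: Callable[[str], bytes]):
        # Fail early if any object has no label, rather than mid-epoch.
        missing = [k for k in keys if k not in labels]
        if missing:
            raise KeyError(f"no label for {len(missing)} key(s), e.g. {missing[0]!r}")
        self.keys = list(keys)
        self.labels = labels
        self.read_bytes = read_bytes

    def __len__(self) -> int:
        return len(self.keys)

    def __getitem__(self, idx: int) -> Tuple[bytes, int]:
        key = self.keys[idx]
        # Object bytes come from S3; the label is an in-memory lookup.
        return self.read_bytes(key), self.labels[key]
```

Compared with storing the class as object metadata, a manifest keeps labels editable without rewriting objects in S3 and lets missing labels be caught before training starts.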

ydaiming · 2022-03-19T00:20:47Z

@johnbensnyder Thanks for helping on this issue!
@rkoo19

We're upstreaming the amazon-s3-plugin-for-pytorch into the torchdata package (pytorch/data#318).
We're dropping support for this plugin.

The current S3 plugin doesn't have this feature, and neither do the new S3 IO datapipes. We'll backlog this feature request and add it to the new S3 IO datapipes.
