Reading object metadata #10

Open
rkoo19 opened this issue Nov 15, 2021 · 4 comments

Comments


rkoo19 commented Nov 15, 2021

I was wondering if there is a way to also fetch an object's metadata when reading the object itself. I am trying to use ImageNet to train an image classification model, similar to what is done in s3_imagenet_example.py, but I want to attach the image class as metadata on the object itself.

Author

rkoo19 commented Nov 15, 2021

So, if I use the map-style S3Dataset, I want to fetch the object itself from my S3 bucket and also a piece of metadata associated with that object.

Author

rkoo19 commented Nov 15, 2021

I was reading the class definition for S3Dataset, and I saw that when getting an object, it uses a filename to fetch the object but does nothing with metadata. If there is not already a way to do so, I would like to modify the procedure for getting an object from S3 so that it also fetches the metadata associated with the object. I hope this makes sense, and I would appreciate any help I could get!
[Attached screenshot: Screen Shot 2021-11-15 at 12 43 14 PM]


johnbensnyder commented Nov 16, 2021

How is the object metadata stored? One possibility might be to use the S3BaseClass to write a custom method of reading the file object from S3, then use the filename to read metadata from some other source. For example, here's the setup I use to read an image and annotations for the COCO dataset.

# Methods of a dataset class; assumes `io`, `os`, `PIL.Image`, and `_pywrap_s3_io` are imported.
def _load_image(self, image_id):
    # Create the S3 handler on first use.
    if self.handler is None:
        self.handler = _pywrap_s3_io.S3Init()
    filename = os.path.join(self.root, self.coco.loadImgs(image_id)[0]["file_name"])
    # Read the raw bytes from S3 and decode them as an RGB image.
    fileobj = self.handler.s3_read(filename)
    return Image.open(io.BytesIO(fileobj)).convert("RGB")

def _load_target(self, image_id):
    # Annotations are looked up through the COCO index held in self.coco.
    return self.coco.loadAnns(self.coco.getAnnIds(image_id))

def __getitem__(self, idx):
    image_id = self.ids[idx]
    img = self._load_image(image_id)
    anno = self._load_target(image_id)
    target = self.build_target(anno, img.size)
    if self._transforms is not None:
        img, target = self._transforms(img, target)
    return img, target, idx
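
If the class label is instead stored as S3 user-defined metadata on the object itself, the same pattern could be sketched roughly as below. This is only a minimal sketch, not part of this plugin: it assumes boto3 is installed and configured and uses it in place of the plugin's handler, and `make_boto3_fetch`, `ObjectWithMetadataDataset`, and the `class` metadata key are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (object bytes, user-defined metadata) for one S3 key
FetchResult = Tuple[bytes, Dict[str, str]]


def make_boto3_fetch(bucket: str) -> Callable[[str], FetchResult]:
    """Build a fetch function backed by boto3 (assumed installed and configured)."""
    import boto3  # imported lazily so the dataset class has no hard dependency

    client = boto3.client("s3")

    def fetch(key: str) -> FetchResult:
        # get_object returns the body stream and the user-defined metadata
        # in the same response, so one request covers both.
        response = client.get_object(Bucket=bucket, Key=key)
        return response["Body"].read(), response["Metadata"]

    return fetch


class ObjectWithMetadataDataset:
    """Hypothetical map-style dataset returning (object bytes, label) pairs."""

    def __init__(
        self,
        keys: List[str],
        fetch: Callable[[str], FetchResult],
        label_field: str = "class",  # assumed metadata key holding the label
    ) -> None:
        self.keys = keys
        self.fetch = fetch
        self.label_field = label_field

    def __len__(self) -> int:
        return len(self.keys)

    def __getitem__(self, idx: int) -> Tuple[bytes, Optional[str]]:
        data, metadata = self.fetch(self.keys[idx])
        # The label is None when the object carries no such metadata entry.
        return data, metadata.get(self.label_field)
```

Usage would be along the lines of `ObjectWithMetadataDataset(keys, make_boto3_fetch("my-bucket"))`, where the bucket name is a placeholder. Passing the fetch function in keeps the dataset independent of any particular S3 client, so it can be tried out with an in-memory stand-in before pointing it at a real bucket.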

Contributor

ydaiming commented Mar 19, 2022

@johnbensnyder Thanks for helping on this issue!
@rkoo19

We're upstreaming the amazon-s3-plugin-for-pytorch into the torchdata package (pytorch/data#318).
We're dropping support for this plugin.

The current S3 plugin doesn't have this feature, and neither do the new S3 IO datapipes. We'll add this feature request to the backlog and implement it in the new S3 IO datapipes.
