[ERROR]: Training different data -> IndexError: index 3 is out of bounds for axis 2 with size 3 #277
Comments
Hi @williamobrein, thanks for putting this in. I'll have a look and get back to you as soon as I can. -N
In response to your questions:
Can you give any more information about the imagery you're using? If I had to guess, the above error is likely due to the number of channels in your imagery not matching what the preprocessing expects.
Thank you very much for the quick reply.
My dataset is 3-channel (RGB). Does the SpaceNet dataset have RGBA channels, so the 4th channel is alpha? If the model's input accepts 3 channels, I think I need to remove your preprocessing step. Am I right? Can you help with this? Or is there another way you would suggest I follow? Below is the information for a tile after the tiling step.
Hi @williamobrein,
Yes! It should be!
My apologies for the confusion - I thought it was delineated there, but looking again I see I'm incorrect. Yes, it's BGR (see the conversion sketch after this comment). There's a fix for this on a development branch that hasn't been merged into master yet.
That's correct - the only thing you'll need to do is calculate the mean and standard deviation for each channel (then divide by the bit depth, because that's how the normalization values are expressed in the config).
Yes, TIF works! That should've been specified there, sorry it's unclear.
Actually the 4th channel in the SpaceNet Atlanta dataset is Near-IR, which gets dropped.
Yes, you need to remove the step that drops the 4th channel, since your imagery only has three channels to begin with.
If this addresses these issues, let me know (or just close the issue). Thanks!
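Not from the thread itself, but to illustrate the manual RGB-to-BGR reordering mentioned above, here is a minimal sketch; it assumes the tiles can be read with scikit-image, and the directory name is a placeholder:

```python
import glob

import numpy as np
import skimage.io

# Hypothetical tile directory; adjust to wherever your RGB tiles live.
for tile_path in glob.glob("tiles/*.tif"):
    img = skimage.io.imread(tile_path)            # (H, W, 3), channel order R, G, B
    bgr = np.ascontiguousarray(img[:, :, ::-1])   # reverse the channel axis -> B, G, R
    skimage.io.imsave(tile_path, bgr)             # overwrite in place (or save elsewhere)
```

Overwriting in place is just for brevity; writing the converted tiles to a separate directory is the safer choice.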
Sorry for the late reply, I had work to do. Thank you for the answers. I can manually convert to BGR for now, but I would appreciate it if the fix gets merged into the master branch as soon as possible! I've solved all my previous questions, thanks for your help, but I have some new ones. I'll calculate the mean and standard deviation for each channel, then divide by the bit depth, and start the training. After that I'll let you know and close the issue. Don't worry about it!
My values are as follows, just for one channel.
When I divide by the bit depth (my data is 24-bit, so the bit depth is 2^24?), I get this result.
Is there a particular format for writing these values, or should I write the result directly? Your values seem to follow a specific format. Can you correct me if I'm wrong? How exactly does this calculation work? Can you give me an example?
Apologies, I wasn't super clear there: so, in your case for the mean, you would divide the value you computed by the maximum of the bit range. Do your images use the full 24-bit range? That is, are there pixel values approaching 16777216? If not, you could truncate to 16-bit (maximum pixel value 65535) or 8-bit (maximum value 255) and re-save the images, which would make them substantially smaller files. 24-bit images are indeed huge.
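Not from the original thread, but here is a rough sketch of that per-channel mean/std calculation across many tiles. It assumes 16-bit-per-channel imagery (divisor 65535) and tiles readable with scikit-image; the path is a placeholder, and accumulating running sums is just one way to avoid loading every tile at once:

```python
import glob

import numpy as np
import skimage.io

BIT_DEPTH_MAX = 65535.0  # assumed 16 bits per channel; use 255.0 for 8-bit imagery

# Accumulate per-channel sums so all tiles never need to be in memory at once.
pixel_sum = np.zeros(3, dtype=np.float64)
pixel_sq_sum = np.zeros(3, dtype=np.float64)
channel_max = np.zeros(3, dtype=np.float64)
n_pixels = 0

for tile_path in glob.glob("tiles/*.tif"):                   # hypothetical tile directory
    img = skimage.io.imread(tile_path).astype(np.float64)    # (H, W, 3)
    pixel_sum += img.sum(axis=(0, 1))
    pixel_sq_sum += (img ** 2).sum(axis=(0, 1))
    channel_max = np.maximum(channel_max, img.max(axis=(0, 1)))
    n_pixels += img.shape[0] * img.shape[1]

mean = pixel_sum / n_pixels
std = np.sqrt(pixel_sq_sum / n_pixels - mean ** 2)

# Divide by the per-channel bit depth before putting the values into the config.
print("channel means:", mean / BIT_DEPTH_MAX)
print("channel stds: ", std / BIT_DEPTH_MAX)
print("channel maxes:", channel_max)  # useful for checking whether the full range is used
```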
We'd love to accommodate more image types as part of solaris.
Yes, this is something we're exploring (see #163). Development time has limited our ability to implement it so far, however...another area where a community contributor would be welcome to step in.
Resolution, re-scaling, and how they impact model performance remains to some degree an open question in the field. If you're trying to directly use the pre-trained weights from XD_XD's model without any fine-tuning, you will want pixel size to be very similar to the SpaceNet Atlanta Data (0.5 m/px). If you're fine-tuning the model weights using your own dataset, or re-training completely, it shouldn't matter as much.
I don't have a great answer for this beyond what I said in response to the last question. My personal opinion: since re-sampling will almost always result in some loss of information from the source data, it should be avoided when possible; however, I'm not aware of any studies that have directly examined the validity of my assumption. If you do resample, I recommend bilinear or bicubic resampling. For example, in the SpaceNet Atlanta dataset, all of the different collects were re-sampled to 0.5 m/px using bilinear resampling to ensure consistency within the dataset.
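As an illustration only (not from the thread): a sketch of resampling a GeoTIFF to 0.5 m/px with bilinear interpolation using rasterio, assuming a projected CRS in metres; the file names are placeholders:

```python
import rasterio
from rasterio.enums import Resampling
from rasterio.warp import calculate_default_transform, reproject

TARGET_RES = 0.5  # metres per pixel, matching SpaceNet Atlanta

with rasterio.open("input.tif") as src:  # hypothetical input file
    transform, width, height = calculate_default_transform(
        src.crs, src.crs, src.width, src.height, *src.bounds,
        resolution=TARGET_RES,
    )
    profile = src.profile.copy()
    profile.update(transform=transform, width=width, height=height)

    with rasterio.open("resampled_0p5m.tif", "w", **profile) as dst:
        for band in range(1, src.count + 1):
            reproject(
                source=rasterio.band(src, band),
                destination=rasterio.band(dst, band),
                src_transform=src.transform,
                src_crs=src.crs,
                dst_transform=transform,
                dst_crs=src.crs,
                resampling=Resampling.bilinear,  # or Resampling.cubic for bicubic
            )
```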
Yes we did - we included the full SpaceNet Atlanta training set, which includes non-building tiles. Particularly if your testing dataset is likely to include non-building tiles, this can be valuable.
Thanks, that's what I did, nice to have it confirmed.
Why do we give the maximum pixel value, standard deviation, and mean value? I've never seen anything like that in traditional object detection models. Can you explain, or do you have a source to suggest?
Is the SpaceNet Atlanta data not RGB? How can you use 16 bits? I think you would need 24 bits (R[8]G[8]B[8] = 24). What's the difference?
There are several TIF files in the SpaceNet Atlanta dataset. How did you find their mean value and standard deviation? Did you merge them into one piece, or do you have a different method? I have a lot of TIF files, and I don't know how to calculate the standard deviation and mean of each part separately and then combine them. Merging them all into one file takes a long time and requires a lot of processing power.
Yes, you are right! 24-bit images are too large. I will consider your suggestions on this; I'm going to check my images and make edits.
I understand you very well. I don't have time to work on it anytime soon, but I can open an issue like you said. That would be useful for anyone who wants to improve it.
I understand, I'm going to do some research on this. I'll let you know if I find anything.
Thank you, that's what I thought. I just wanted to get your opinion. Maybe if I train completely from scratch, I might try it.
Actually, I didn't resample because I thought there would be data loss, but I wanted to ask in case you know of an academic resource. My data is consistent in this regard, so I don't need any resampling at this time.
What is the purpose of including non-building images? While the model learns from images of buildings, doesn't it also learn about the rest of the scene? I'm confused here. Is the logic different from object recognition?
A fairly common practice for many computer vision models is to either z-score pixel intensities or normalize them to a 0-1 range. In this case, we're using the albumentations library to run z-scoring. Normalization is important to achieve consistent performance across images from different sensors/collects. For example, SpaceNet Atlanta's pixel intensities are mostly between 0 and 1200; if you provided images that ranged in pixel values from 100 to 200, the model would likely have no idea how to generate valid predictions. We show this in the 4th notebook in the Solaris FOSS4G tutorial.
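Not from the thread, but a small sketch of what that z-scoring looks like with albumentations; the mean/std values below are placeholders, not the actual xdxd_spacenet4 numbers:

```python
import albumentations as A
import numpy as np

# Placeholder per-channel statistics, already divided by the bit depth (here 16-bit).
CHANNEL_MEANS = (0.006, 0.007, 0.008)
CHANNEL_STDS = (0.004, 0.004, 0.005)

# Albumentations applies (img - mean * max_pixel_value) / (std * max_pixel_value),
# i.e. a per-channel z-score after rescaling by the sensor's maximum pixel value.
normalize = A.Normalize(mean=CHANNEL_MEANS, std=CHANNEL_STDS, max_pixel_value=65535.0)

tile = np.random.randint(0, 1200, size=(512, 512, 3)).astype(np.uint16)  # fake 16-bit tile
normalized = normalize(image=tile)["image"]
print(normalized.mean(axis=(0, 1)), normalized.std(axis=(0, 1)))
```

This is why the config carries all three numbers: the max pixel value rescales intensities, and the per-channel mean and standard deviation then z-score each band.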
Apologies, I was providing the per-channel bit depth. Each channel (R, G, B, and near-IR) is encoded as a 16-bit value. It looks like I misinterpreted your description of your image - I took 24-bit to mean 24 bits per channel.
Though I haven't explored this in great detail personally, my expectation is that during training, the model learns the distribution of the number of building pixels per image to some degree. As U-Nets utilize both whole-image information (at the middle layers) and fine-grained information (at the beginning and end), I could envision a model learning that it should never predict zero building pixels if the training set it's provided never has zero building pixels. Generally, best practices recommend matching your training and testing datasets' distributions to one another, and if your test set includes building-free images, we believe that training should too. Segmentation models do indeed work differently from object detection models, which generally produce proposals and classifications and then use NMS to filter out bad predictions. I could envision these two handling variation between training and testing set distributions differently.
I understand. I need to study Albumentations. Thanks for the lead.
I know that normalization is important in image detection, but I don't know about the z-score. I think I need to figure that out. I'll check the notebooks. Thank you!
Now everything is clear. I'm going to review my images, convert them to 16 bits per channel, and start training that way. Thank you!
I thought the images that contain buildings would already include other areas, I mean areas without buildings. I guess that's how traditional object recognition works, though it varies depending on the model. But what you say makes sense. Of course, there will be parts of my test set that do not include buildings. I'm going to edit my data based on what we discussed! I think we can close this. Thanks again for everything!
Can we run the pre-processing through solaris as well?
Do you mean pre-processing in terms of tiling, or image augmentation before it's fed into the model? Either way, the present answer is no, but an enterprising user would be welcome to make a PR. If you want to encourage that, I'd recommend creating a new issue here for it - I'm going to close this one since we've moved fairly far afield from the original question.
Hello, first of all thank you for developing solaris. I've been working on object detection for a long time, but I'm new to GitHub, so I'm sorry for my mistakes! I tried to train with my own data but got an error:
IndexError: index 3 is out of bounds for axis 2 with size 3
As you mentioned in the documentation, I divided the satellite image (in TIF format) into tiles, then divided the geojson files in the same way. I did the mask creation process, again creating my masks (footprint masks) in TIF format. Then I created the training and test CSV files as you specified, and I edited the configuration file of the pre-trained model xdxd_spacenet4.
Error message
When I run this command, I get the error above.
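The actual command wasn't preserved here, but for reference, a training run from a config file looks roughly like this with the solaris Python API; the config path is a placeholder, and details may differ between solaris versions:

```python
import solaris as sol

# Hypothetical path to the edited copy of the xdxd_spacenet4 config.
config = sol.utils.config.parse("xdxd_spacenet4_custom.yml")

trainer = sol.nets.train.Trainer(config)
trainer.train()
```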
How can I solve this problem? Anybody have any ideas? Thanks in advance.
What should I do?
I have some questions. It would be very helpful if you could answer them.
Environment information
solaris version: 0.1.3