Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash on high density collections #39

Closed
danielrode opened this issue Sep 24, 2024 · 11 comments
Closed

Crash on high density collections #39

danielrode opened this issue Sep 24, 2024 · 11 comments

Comments

@danielrode
Copy link

I am using wrench to create a density raster for a very high-density LiDAR collection using pdal_wrench density. The collection I am working with reaches densities of 200 points/meter^2 in some places. (I know this because I used lidR to create a density raster previously, but we are in the process of converting parts of our workflow to use PDAL.) I wanted to create a 20-meter raster, however, I kept getting the following error:

0terminate called after throwing an instance of 'pdal::pdal_error'
what(): writers.gdal: Unable to write block for for raster './export/tiles/994_43.tif'.
Unable to convert data for raster type as requested: 35017 -> short
./gen-density-raster.sh: line 24: 43971 Aborted (core dumped) /usr/lib64/qgis/pdal_wrench density "${args[@]}"

After some troubleshooting and looking at the point clouds in QGIS, I finally realized that GDAL/PDAL was crashing once it reached the higher density areas of the collection. Assuming the worst case for a single pixel (a value of 200) and then multiplying this by 20x20 gives us 80,000. This is well above 32,767 (the maximum value of a short integer, which is what GDAL is using here to store raster values).

For now, I worked around this by creating a higher resolution raster (using 1-meter pixels instead of 20-) and will down-sample later.

It would be nice if PDAL Wrench could handle creating coarse resolution density rasters from these kinds of high-resolution LiDAR collections (perhaps by providing the option to use higher bit values for rasters), or at least provide a clearer error message when the value is too large to be saved.

@hobu
Copy link
Member

hobu commented Sep 24, 2024

which is what GDAL is using here to store raster values

Wrench is using an int16_t to store the counts. This could probably be bumped to an int32_t without impact, especially if the output TIFF is being compressed with a common compression encoding.

@danielrode
Copy link
Author

Wrench crashed again, even with the 1-meter pixels setting. It ran for over 8 hours before crashing with this message:

0/usr/include/c++/14/bits/stl_vector.h:1127: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator [with _Tp = unsigned char; _Alloc = std::allocator; reference = unsigned char&; size_type = long unsigned int]: Assertion '__n < this->size()' failed.
./gen-density-raster.sh: line 31: 71023 Aborted (core dumped) /usr/lib64/qgis/pdal_wrench density "${args[@]}"

I am not sure if this is also a int16 vs int32 problem. I know it is not a RAM issue because my system still had over 25GB of RAM available (out of 128GB) at the time Wrench crashed.

@abellgithub
Copy link

It's very hard to know what might be going on without the full command and perhaps the input file.

Also, an assertion suggests you're running in debug mode. Can you provide information about the version or PDAL/wrench that you're using and how it was built, if you know?

@abellgithub
Copy link

Also, things can get pretty ugly if you don't provide bounds for the output. I'm not sure how wrench decides this. Without this, memory often has to be copied over and over again. This is slow. Having some idea about the scale of your input and output is very helpful to analyzing what might be going on.

@danielrode
Copy link
Author

I tried running pdal_wrench --version and pdal_wrench --help, but neither would show version information.

I am using the version of Wrench that comes bundled with QGIS 3.34:

image

Here's the command I ran last (in Bash) that produced the error mentioned in my last post:

args=(
  --input=collection.vpc
  --resolution=1
  --output=./export/tiles.tif
  --tile-size=100
  --tile-origin-x=0
  --tile-origin-y=0
)
/usr/lib64/qgis/pdal_wrench density "${args[@]}"

collection.vpc points to a LiDAR collection of about 9,000 COPC files; the collection is almost 2TB. I do not have time to sift through the collection at the moment, but if I find out which specific tile is causing trouble, I will post that here later.

For now, I am just going to work around this by running a bunch of vanilla PDAL instances in parallel, each on one .copc.laz file.

As for scale, this was the output Wrench gave when it started its run:

grid 114360x168446
tiles 1145 1686

The collection uses meters as its projection. I just tried to run pdal info to get the collection bounds, but got another crash:

$ pdal --version
pdal 2.6.3 (git-version: Release)
$ pdal info --metadata collection.vpc
/usr/include/c++/14/bits/basic_string.h:1269: std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::reference std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::operator [with _CharT = char; _Traits = std::char_traits; _Alloc = std::allocator; reference = char&; size_type = long unsigned int]: Assertion '__pos <= size()' failed.

Just eyeballing it in QGIS, the collection appears to span an area of 120 km by 170 km. The collection does not fully fill in this area though (there was no LiDAR collected for about 2/3 of the area).

@abellgithub
Copy link

114360 * 168446 = 19,263,484,560. That's 19GB. Then each cell is 4 bytes, so you're up to almost 80GB. Also, if PDAL doesn't know the size of the output buffer at start time, it will have to move data around. This means allocating for both source and destination, so you could think that maybe you need 160GB to do this. It seems likely you're running out of memory. It's a very large problem and you should probably break it into pieces.

@danielrode
Copy link
Author

Like I mentioned above, I had 25GB of RAM still available when Wrench crashed (I run a RAM usage monitor that polls every 2 seconds). I suppose it is possible that Wrench memory usage spiked and then crashed within that 2 second window, but I do not know enough about the internal workings of the code to know how likely that is.

Is it possible to specify the size of the output buffer at start time? If so, how?

Either way, I agree, breaking up the processing into batches is probably going to be my best path forward.

@abellgithub
Copy link

abellgithub commented Sep 27, 2024

If you need a buffer of 50GB, a single allocation may fail if you only have 25GB.

I don't know how you specify the size using wrench.

@danielrode
Copy link
Author

danielrode commented Sep 30, 2024

After running vanilla PDAL individually on each tile, I found the problem tile that was causing the most recent error I encountered. This particular error is not a memory issue since I reproduced it again when running a single PDAL instance on just that one tile and there were no other memory intensive programs running at the time; this particular COPC LAZ tile is only 250 MB.

Command:

pdal pipeline --stream --stdin <<EOF
[
"import/tile______.copc.laz",
{
  "type": "writers.gdal",
  "resolution": 1,
  "binmode": true,
  "dimension": "X",
  "data_type":"uint16_t",
  "output_type": ["count"],
  "gdaldriver": "GTiff",
  "filename": "tmp.tif"
}
]
EOF

Error:

/usr/include/c++/14/bits/stl_vector.h:1127: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator [with _Tp = unsigned char; _Alloc = std::allocator; reference = unsigned char&; size_type = long unsigned int]: Assertion '__n < this->size()' failed.
Aborted (core dumped)

image

All other tiles for the collection appear to have rasterized successfully. I opened up this particular tile in QGIS to inspect it and I noticed a 25-meter by 180-meter void in the center of the tile where the point density is very very low (only a few points scattered about). This appears to be a water body that scattered most of the laser pulses and so virtually no data was recorded there.

I suspect that this particular crash is happening because GDAL does not like being passed pixels in the center of a tile that contain no points. What are some workarounds for this? And if this is the case, why doesn't GDAL complain when processing edge tiles? It would be nice if PDAL could anticipate this kind of lidar input and provide an informative error message such as "error: No points found for pixel X,Y (cannot process tile)", or simply tell GDAL to mark those pixels as "No Data" as it does for edge tiles.

@abellgithub
Copy link

I don't think there is any way to know what's going on without the data. If you would like to provide the data I will see what I can learn.

@danielrode
Copy link
Author

I did some more investigation and found that the original LAZ file from our vendor for that tile worked fine. I just converted that original to COPC again (with PDAL) and that version worked too. I am guessing that the earlier PDAL conversion I did to COPC awhile back must have corrupted that one tile, but in such a way that lidR did not mind, so I did not catch the corruption until now. I am going to close this issue and open a new one just for the original int 16 vs int 32 issue I ran into.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants