openterrain/spark-chunker

spark-chunker

Usage (local)

Installation

brew install apache-spark
pip install -r requirements.txt

Running

spark-submit chunk.py

Create a tile index with the bounding box of each constituent file. This will be used to determine which spherical Mercator tiles need to be created.

aws s3 ls ned-13arcsec.openterrain.org/4326/ | \
  awk '{print "/vsicurl/http://s3.amazonaws.com/ned-13arcsec.openterrain.org/4326/" $4}' | \
  xargs gdaltindex ned-13arcsec.shp

It's also handy to have the index around as GeoJSON. (We could have created it as GeoJSON in the first place, but the Shapefile is suitable as input for gdalbuildvrt, so we'll keep it around too.)

ogr2ogr -f GeoJSON ned-13arcsec.json ned-13arcsec.shp
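
As a rough illustration of what the GeoJSON index contains, the sketch below reads it with the standard library and maps each source file to its bounding box. It assumes each feature is a Polygon footprint and that the file path is stored in a `location` property (the default field name used by gdaltindex); `index_bounds` is a hypothetical helper, not part of chunk.py.

```python
import json


def index_bounds(geojson_text):
    """Map each indexed file path to its (west, south, east, north) bounds.

    Assumes Polygon footprints and a `location` property per feature.
    """
    collection = json.loads(geojson_text)
    bounds = {}
    for feature in collection["features"]:
        # The first ring of a GeoJSON Polygon is its exterior ring.
        ring = feature["geometry"]["coordinates"][0]
        lons = [point[0] for point in ring]
        lats = [point[1] for point in ring]
        path = feature["properties"]["location"]
        bounds[path] = (min(lons), min(lats), max(lons), max(lats))
    return bounds
```

Given the file produced above, `index_bounds(open("ned-13arcsec.json").read())` would return one entry per constituent file.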

Create a VRT containing all constituent files. This is used when chunking, as it allows source files that overlap the same tile to be chunked together in a single pass, rather than having each file written to that tile separately.

gdalbuildvrt -resolution highest ned-13arcsec.vrt ned-13arcsec.shp

The VRT needs to be manually tweaked so that the NoDataValue for the entire group matches the NODATA values of the individual source files.
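
The manual tweak could also be scripted. The following is a minimal sketch, assuming the VRT has the usual layout in which each `VRTRasterBand` element holds a `NoDataValue` child and each source element holds a `NODATA` child; `sync_band_nodata` is a hypothetical helper, not something shipped with this repository. Bands whose sources disagree (or declare no value) are left untouched for manual review.

```python
import xml.etree.ElementTree as ET


def sync_band_nodata(vrt_xml):
    """Return VRT XML with each band's NoDataValue copied from its sources."""
    root = ET.fromstring(vrt_xml)
    for band in root.iter("VRTRasterBand"):
        # Collect the distinct per-source NODATA values declared in this band.
        values = {el.text.strip() for el in band.iter("NODATA") if el.text}
        if len(values) != 1:
            continue  # nothing declared, or sources disagree: do not guess
        value = values.pop()
        node = band.find("NoDataValue")
        if node is None:
            # Create the band-level element if the VRT lacks one.
            node = ET.Element("NoDataValue")
            band.insert(0, node)
        node.text = value
    return ET.tostring(root, encoding="unicode")
```

To apply it, read ned-13arcsec.vrt as text, pass it through `sync_band_nodata`, and write the result back out.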

Using the GeoJSON index and mercantile, we can produce a list of tiles at our target zoom (determined with the help of get_zoom() in chunk.py, as it's dependent on the source resolution and the target chunk size):

jq -rc '.features[]' ned-13arcsec.json | mercantile tiles 11 > tiles.txt
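
To make the relationship between source resolution, chunk size, and zoom concrete, here is a sketch of one way the zoom could be estimated and the covering tiles enumerated. It is not the get_zoom() from chunk.py and not mercantile's implementation: `estimate_zoom`, `tile_index`, and `tiles_for_bbox` are hypothetical names, and the 2048-pixel chunk size and roughly 10.3 m source resolution (1/3 arc-second at the equator) in the usage note are assumptions. Unlike mercantile, this sketch does not nudge bounding boxes inward, so a box that ends exactly on a tile edge also picks up the neighboring tile.

```python
import math

# Equatorial circumference used by spherical Mercator (EPSG:3857), in meters.
EARTH_CIRCUMFERENCE_M = 2 * math.pi * 6378137.0
MAX_LAT = 85.0511  # approximate latitude limit of the spherical Mercator square


def estimate_zoom(resolution_m, chunk_size):
    """Smallest zoom whose chunk pixels are at least as fine as the source."""
    return math.ceil(math.log2(EARTH_CIRCUMFERENCE_M / (chunk_size * resolution_m)))


def tile_index(lon, lat, zoom):
    """Return the (x, y) of the XYZ tile containing a lon/lat point."""
    n = 2 ** zoom
    lat = max(min(lat, MAX_LAT), -MAX_LAT)
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so points on the eastern or southern edge stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_for_bbox(west, south, east, north, zoom):
    """Yield (x, y, zoom) for every tile touching the bounding box."""
    x_min, y_min = tile_index(west, north, zoom)  # top-left corner
    x_max, y_max = tile_index(east, south, zoom)  # bottom-right corner
    for x in range(x_min, x_max + 1):
        for y in range(y_min, y_max + 1):
            yield (x, y, zoom)
```

Under those assumptions, `estimate_zoom(10.3, 2048)` evaluates to 11, which matches the zoom passed to mercantile above, and `tiles_for_bbox(-123.0, 37.0, -122.0, 38.0, 11)` lists the tiles touching one 1° by 1° source file.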

Then, to actually run the chunking task:

spark-submit chunk.py tiles.txt
