#!/usr/bin/env python
u"""
nsidc_subset_altimetry.py
Written by Tyler Sutterley (05/2020)

Program to acquire subset altimetry datafiles from the NSIDC API:
https://wiki.earthdata.nasa.gov/display/EL/How+To+Access+Data+With+Python
https://nsidc.org/support/faq/what-options-are-available-bulk-downloading-data-https-earthdata-login-enabled
https://nsidc.org/support/how/2018-agu-tutorial
https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html
http://www.voidspace.org.uk/python/articles/authentication.shtml#base64

Register with the NASA Earthdata Login system:
https://urs.earthdata.nasa.gov
Add NSIDC_DATAPOOL_OPS to your NASA Earthdata Applications:
https://urs.earthdata.nasa.gov/oauth/authorize?client_id=_JLuwMHxb2xX6NwYTb4dRA

CALLING SEQUENCE:
    python nsidc_subset_altimetry.py -T 2018-11-23T00:00:00,2018-11-23T23:59:59
        -B -50.33333,68.56667,-49.33333,69.56667 --version=001 -F NetCDF4-CF
        --user=<username> -V ATL03
    where <username> is your NASA Earthdata username

INPUTS:
    GLAH12: GLAS/ICESat L2 Antarctic and Greenland Ice Sheet Altimetry Data
    ILATM2: Airborne Topographic Mapper Icessn Elevation, Slope, and Roughness
    ILATM1B: Airborne Topographic Mapper QFIT Elevation
    ILVIS1B: Land, Vegetation and Ice Sensor Geolocated Return Energy Waveforms
    ILVIS2: Geolocated Land, Vegetation and Ice Sensor Surface Elevation Product
    ATL03: Global Geolocated Photon Data
    ATL04: Normalized Relative Backscatter
    ATL06: Land Ice Height
    ATL07: Sea Ice Height
    ATL08: Land and Vegetation Height
    ATL09: Atmospheric Layer Characteristics
    ATL10: Sea Ice Freeboard
    ATL12: Ocean Surface Height
    ATL13: Inland Water Surface Height

COMMAND LINE OPTIONS:
    --help: list the command line options
    -D X, --directory=X: working data directory
    -U X, --user=X: username for NASA Earthdata Login
    -N X, --netrc=X: path to .netrc file for alternative authentication
    --version: version of the dataset to use
    -B X, --bbox=X: bounding box (lonmin,latmin,lonmax,latmax)
    -P X, --polygon=X: georeferenced file containing a set of polygons
    -T X, --time=X: time range (comma-separated start and end)
    -F X, --format=X: output data format (TABULAR_ASCII, NetCDF4)
    -M X, --mode=X: local permissions mode of the files processed
    -V, --verbose: verbose output of processing
    -Z, --unzip: unzip dataset from the NSIDC subsetting service

PYTHON DEPENDENCIES:
    lxml: Pythonic XML and HTML processing library using libxml2/libxslt
        http://lxml.de/
        https://github.com/lxml/lxml
    fiona: Python wrapper for vector data access functions from the OGR library
        https://fiona.readthedocs.io/en/latest/manual.html
    geopandas: Python tools for geographic data
        http://geopandas.readthedocs.io/
    shapely: PostGIS-ish operations outside a database context for Python
        http://toblerity.org/shapely/index.html

PROGRAM DEPENDENCIES:
    read_shapefile.py: reads ESRI shapefiles for spatial coordinates
    read_kml_file.py: reads kml/kmz files for spatial coordinates
    read_geojson_file.py: reads GeoJSON files for spatial coordinates

UPDATE HISTORY:
    Updated 05/2020: added netrc option for alternative authentication
    Updated 03/2020: simplified polygon extension if statements
        raise exception if polygon file extension is not presently available
    Updated 09/2019: added ssl context to urlopen headers
    Updated 07/2019: can use specific identifiers within a georeferenced file
    Updated 06/2019: added polygon option to subset using a georeferenced file
        added read functions for kml/kmz georeferenced files
    Written 01/2019
"""
from __future__ import print_function

import sys
import os
import io
import re
import ssl
import time
import netrc
import getopt
import shutil
import base64
import getpass
import zipfile
import builtins
import posixpath
import lxml.etree
import shapely.geometry
import dateutil.parser
from subsetting_tools.read_shapefile import read_shapefile
from subsetting_tools.read_kml_file import read_kml_file
from subsetting_tools.read_geojson_file import read_geojson_file

#-- python 2/3 compatibility for urllib and cookie handling
if sys.version_info[0] == 2:
    from cookielib import CookieJar
    import urllib2
else:
    from http.cookiejar import CookieJar
    import urllib.request as urllib2
#-- PURPOSE: check internet connection
def check_connection():
    #-- attempt to connect to https host for NSIDC
    try:
        HOST = 'https://n5eil01u.ecs.nsidc.org/'
        urllib2.urlopen(HOST,timeout=20,context=ssl.SSLContext())
    except urllib2.URLError:
        raise RuntimeError('Check internet connection')
    else:
        return True
#-- PURPOSE: program to acquire subsetted NSIDC data
def nsidc_subset_altimetry(filepath, PRODUCT, VERSION, USER='', PASSWORD='',
    BBOX=None, POLYGON=None, TIME=None, FORMAT=None, MODE=None, CLOBBER=False,
    VERBOSE=False, UNZIP=False):

    #-- https://docs.python.org/3/howto/urllib2.html#id5
    #-- create a password manager
    password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
    #-- Add the username and password for NASA Earthdata Login system
    password_mgr.add_password(None, 'https://urs.earthdata.nasa.gov',
        USER, PASSWORD)
    #-- Encode username/password for request authorization headers
    base64_string = base64.b64encode('{0}:{1}'.format(USER,PASSWORD).encode())
    #-- Create cookie jar for storing cookies. This is used to store and return
    #-- the session cookie given to us by the data server (otherwise it will
    #-- just keep sending us back to Earthdata Login to authenticate).
    cookie_jar = CookieJar()
    #-- create "opener" (OpenerDirector instance)
    opener = urllib2.build_opener(
        urllib2.HTTPBasicAuthHandler(password_mgr),
        urllib2.HTTPSHandler(context=ssl.SSLContext()),
        urllib2.HTTPCookieProcessor(cookie_jar))
    #-- add Authorization header to opener
    authorization_header = "Basic {0}".format(base64_string.decode())
    opener.addheaders = [("Authorization", authorization_header)]
    #-- Now all calls to urllib2.urlopen use our opener.
    urllib2.install_opener(opener)
    #-- All calls to urllib2.urlopen will now use handler
    #-- Make sure not to include the protocol in with the URL, or
    #-- HTTPPasswordMgrWithDefaultRealm will be confused.

    #-- compile lxml xml parser
    parser = lxml.etree.XMLParser(recover=True, remove_blank_text=True)

    #-- product and version flags
    product_flag = '?short_name={0}'.format(PRODUCT)
    version_flag = '&version={0}'.format(VERSION) if VERSION else ''

    #-- if using time start and end to temporally subset data
    if TIME:
        #-- verify that start and end times are in ISO format
        start_time = dateutil.parser.parse(TIME[0]).isoformat()
        end_time = dateutil.parser.parse(TIME[1]).isoformat()
        time_flag = '&time={0},{1}'.format(start_time, end_time)
        temporal_flag = '&temporal={0},{1}'.format(start_time, end_time)
    else:
        time_flag = ''
        temporal_flag = ''

    #-- spatially subset data using bounding box or polygon file
    if BBOX:
        #-- if using a bounding box to spatially subset data
        #-- min_lon,min_lat,max_lon,max_lat
        bounds_flag = '&bounding_box={0:f},{1:f},{2:f},{3:f}'.format(*BBOX)
        spatial_flag = '&bbox={0:f},{1:f},{2:f},{3:f}'.format(*BBOX)
    elif POLYGON:
        #-- read shapefile or kml/kmz file
        fileBasename,fileExtension = os.path.splitext(POLYGON)
        #-- extract file name and subsetter indices lists
        match_object = re.match(r'(.*?)(\[(.*?)\])?$',POLYGON)
        FILE = os.path.expanduser(match_object.group(1))
        #-- read specific variables of interest
        v = match_object.group(3).split(',') if match_object.group(2) else None
        #-- get MultiPolygon object from input spatial file
        if fileExtension in ('.shp','.zip'):
            #-- if reading a shapefile or a zipped directory with a shapefile
            ZIP = (fileExtension == '.zip')
            m = read_shapefile(FILE, VARIABLES=v, ZIP=ZIP)
        elif fileExtension in ('.kml','.kmz'):
            #-- if reading a keyhole markup language file (can be compressed)
            KMZ = (fileExtension == '.kmz')
            m = read_kml_file(FILE, VARIABLES=v, KMZ=KMZ)
        elif fileExtension in ('.json','.geojson'):
            #-- if reading a GeoJSON file
            m = read_geojson_file(FILE, VARIABLES=v)
        else:
            raise IOError('Unlisted polygon type ({0})'.format(fileExtension))
        #-- calculate the bounds of the MultiPolygon object
        bounds_flag = '&bounding_box={0:f},{1:f},{2:f},{3:f}'.format(*m.bounds)
        #-- calculate the convex hull of the MultiPolygon object for subsetting
        #-- the NSIDC api requires polygons to be in counter-clockwise order
        X,Y = shapely.geometry.polygon.orient(m.convex_hull,sign=1).exterior.xy
        #-- coordinate order for polygon flag is lon1,lat1,lon2,lat2,...
        polygon_flag = ','.join(['{0:f},{1:f}'.format(x,y) for x,y in zip(X,Y)])
        spatial_flag = '&polygon={0}'.format(polygon_flag)
    else:
        #-- do not spatially subset data
        bounds_flag = ''
        spatial_flag = ''

    #-- if changing the output format
    format_flag = '&format={0}'.format(FORMAT) if FORMAT else ''

    #-- get dictionary of granules for temporal and spatial subset
    HOST = posixpath.join('https://cmr.earthdata.nasa.gov','search','granules')
    page_size,page_num = (10,1)
    granules = {}
    FLAG = True
    #-- reduce to a set number of files per page and then iterate through pages
    while FLAG:
        #-- flags for page size and page number
        size_flag = '&page_size={0:d}'.format(page_size)
        num_flag = '&page_num={0:d}'.format(page_num)
        #-- url for page
        remote_url = ''.join([HOST,product_flag,version_flag,bounds_flag,
            temporal_flag,size_flag,num_flag])
        #-- Create and submit request. There are a wide range of exceptions
        #-- that can be thrown here, including HTTPError and URLError.
        request = urllib2.Request(remote_url)
        tree = lxml.etree.parse(urllib2.urlopen(request, timeout=20), parser)
        root = tree.getroot()
        #-- total number of hits for subset (not just on this page)
        hits = int(tree.find('hits').text)
        #-- extract references on page
        references = [i for i in tree.iter('reference',root.nsmap)]
        #-- check flag
        FLAG = (len(references) > 0)
        for reference in references:
            name = reference.find('name',root.nsmap).text
            id = reference.find('id',root.nsmap).text
            location = reference.find('location',root.nsmap).text
            revision_id = reference.find('revision-id',root.nsmap).text
            #-- read CMR location to get filename
            req = urllib2.Request(location)
            #-- parse CMR location url
            tr = lxml.etree.parse(urllib2.urlopen(req, timeout=20), parser)
            r = tr.getroot()
            f,=tr.xpath('.//gmd:fileIdentifier/gmx:FileName',namespaces=r.nsmap)
            #-- create list of id, CMR location, revision and file
            granules[name] = [id,location,revision_id,f.text]
        #-- add to page number if valid page
        page_num += 1 if FLAG else 0

    #-- for each page of data
    for p in range(1,page_num):
        #-- flags for page size and page number
        size_flag = '&page_size={0:d}'.format(page_size)
        num_flag = '&page_num={0:d}'.format(p)
        #-- remote https server for page of NSIDC Data
        HOST = posixpath.join('https://n5eil02u.ecs.nsidc.org','egi','request')
        remote_url = ''.join([HOST,product_flag,version_flag,bounds_flag,
            spatial_flag,time_flag,format_flag,size_flag,num_flag])
        #-- timestamp for local file
        today = time.strftime('%Y-%m-%dT%H-%M-%S',time.localtime())
        #-- download as either zipped file (default) or unzip to a directory
        if UNZIP:
            #-- Create and submit request. There are a wide range of exceptions
            #-- that can be thrown here, including HTTPError and URLError.
            request = urllib2.Request(remote_url)
            response = urllib2.urlopen(request)
            #-- read to BytesIO object
            fid = io.BytesIO(response.read())
            #-- use zipfile to extract contents from bytes
            remote_data = zipfile.ZipFile(fid)
            subdir = '{0}_{1}'.format(PRODUCT,today)
            print('{0} -->\n'.format(remote_url)) if VERBOSE else None
            #-- extract each member and convert permissions to MODE
            for member in remote_data.filelist:
                local_file = os.path.join(filepath,subdir,member.filename)
                print('\t{0}\n'.format(local_file)) if VERBOSE else None
                remote_data.extract(member, path=os.path.join(filepath,subdir))
                os.chmod(local_file, MODE)
            #-- close the zipfile object
            remote_data.close()
        else:
            #-- print files transferred if VERBOSE
            local_zip=os.path.join(filepath,'{0}_{1}.zip'.format(PRODUCT,today))
            args = (remote_url,local_zip)
            print('{0} -->\n\t{1}\n'.format(*args)) if VERBOSE else None
            #-- Create and submit request. There are a wide range of exceptions
            #-- that can be thrown here, including HTTPError and URLError.
            request = urllib2.Request(remote_url)
            response = urllib2.urlopen(request)
            #-- copy contents to local file using chunked transfer encoding
            #-- transfer should work properly with ascii and binary data formats
            CHUNK = 16 * 1024
            with open(local_zip, 'wb') as f:
                shutil.copyfileobj(response, f, CHUNK)
            #-- keep remote modification time of file and local access time
            # os.utime(local_zip, (os.stat(local_zip).st_atime, remote_mtime))
            #-- convert permissions to MODE
            os.chmod(local_zip, MODE)
#-- PURPOSE: help module to describe the optional input parameters
def usage():
    print('\nHelp: {0}'.format(os.path.basename(sys.argv[0])))
    print(' -U X, --user=X\t\tUsername for NASA Earthdata Login')
    print(' -N X, --netrc=X\t\tPath to .netrc file for authentication')
    print(' -D X, --directory=X\tWorking data directory')
    print(' --version\t\tVersion of the dataset to use')
    print(' -B X, --bbox=X\t\tBounding box (lonmin,latmin,lonmax,latmax)')
    print(' -P X, --polygon=X\tGeoreferenced file containing a set of polygons')
    print(' -T X, --time=X\t\tTime range (comma-separated start and end)')
    print(' -F X, --format=X\tOutput data format (TABULAR_ASCII, NetCDF4)')
    print(' -M X, --mode=X\t\tPermission mode of files processed')
    print(' -V, --verbose\t\tVerbose output of processing')
    print(' -Z, --unzip\t\tUnzip dataset from NSIDC subsetting service\n')
#-- Main program that calls nsidc_subset_altimetry()
def main():
    #-- Read the system arguments listed after the program
    short_options = 'hU:N:D:B:P:T:F:M:VZ'
    long_options = ['help','version=','bbox=','polygon=','time=','format=',
        'user=','netrc=','directory=','mode=','verbose','unzip']
    optlist,arglist = getopt.getopt(sys.argv[1:],short_options,long_options)

    #-- command line parameters
    VERSION = None
    BBOX = None
    POLYGON = None
    TIME = None
    FORMAT = None
    USER = ''
    NETRC = None
    #-- working data directory
    DIRECTORY = os.getcwd()
    #-- permissions mode of the local directories and files (number in octal)
    MODE = 0o775
    VERBOSE = False
    UNZIP = False
    for opt, arg in optlist:
        if opt in ("-h","--help"):
            usage()
            sys.exit()
        elif opt in ("-U","--user"):
            USER = arg
        elif opt in ("-N","--netrc"):
            NETRC = os.path.expanduser(arg)
        elif opt in ("-D","--directory"):
            DIRECTORY = os.path.expanduser(arg)
        elif opt in ("--version",):
            VERSION = arg
        elif opt in ("-B","--bbox"):
            BBOX = [float(i) for i in arg.split(',')]
        elif opt in ("-P","--polygon"):
            POLYGON = os.path.expanduser(arg)
        elif opt in ("-T","--time"):
            TIME = arg.split(',')
        elif opt in ("-F","--format"):
            FORMAT = arg
        elif opt in ("-M","--mode"):
            MODE = int(arg, 8)
        elif opt in ("-V","--verbose"):
            VERBOSE = True
        elif opt in ("-Z","--unzip"):
            UNZIP = True

    #-- Products for the NSIDC subsetter
    P = {}
    #-- ICESat/GLAS
    P['GLAH12'] = 'GLAS/ICESat L2 Antarctic and Greenland Ice Sheet Altimetry'
    #-- Operation IceBridge
    P['ILATM2'] = 'IceBridge Airborne Topographic Mapper Icessn Product'
    P['ILATM1B'] = 'IceBridge Airborne Topographic Mapper QFIT Elevation'
    P['ILVIS1B'] = 'IceBridge LVIS Geolocated Return Energy Waveforms'
    P['ILVIS2'] = 'IceBridge Land, Vegetation and Ice Sensor Elevation Product'
    #-- ICESat-2/ATLAS
    P['ATL03'] = 'Global Geolocated Photon Data'
    P['ATL04'] = 'Normalized Relative Backscatter'
    P['ATL06'] = 'Land Ice Height'
    P['ATL07'] = 'Sea Ice Height'
    P['ATL08'] = 'Land and Vegetation Height'
    P['ATL09'] = 'Atmospheric Layer Characteristics'
    P['ATL10'] = 'Sea Ice Freeboard'
    P['ATL12'] = 'Ocean Surface Height'
    P['ATL13'] = 'Inland Water Surface Height'

    #-- enter dataset to transfer as system argument
    if not arglist:
        for key,val in P.items():
            print('{0}: {1}'.format(key, val))
        raise Exception('No System Arguments Listed')

    #-- NASA Earthdata hostname
    HOST = 'urs.earthdata.nasa.gov'
    #-- get authentication
    if not USER and not NETRC:
        #-- check that NASA Earthdata credentials were entered
        USER = builtins.input('Username for {0}: '.format(HOST))
        #-- enter password securely from command-line
        PASSWORD = getpass.getpass('Password for {0}@{1}: '.format(USER,HOST))
    elif NETRC:
        USER,LOGIN,PASSWORD = netrc.netrc(NETRC).authenticators(HOST)
    else:
        #-- enter password securely from command-line
        PASSWORD = getpass.getpass('Password for {0}@{1}: '.format(USER,HOST))

    #-- recursively create directory if presently non-existent
    os.makedirs(DIRECTORY) if not os.access(DIRECTORY, os.F_OK) else None

    #-- check internet connection before attempting to run program
    if check_connection():
        #-- check that each data product entered was correctly typed
        for p in arglist:
            if p not in P.keys():
                raise IOError('Incorrect Data Product Entered ({0})'.format(p))
            #-- run program for product
            nsidc_subset_altimetry(DIRECTORY, p, VERSION, USER=USER,
                PASSWORD=PASSWORD, BBOX=BBOX, TIME=TIME, FORMAT=FORMAT,
                POLYGON=POLYGON, MODE=MODE, VERBOSE=VERBOSE,
                UNZIP=UNZIP)
#-- run main program
if __name__ == '__main__':
    main()
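As a standalone check of the query construction inside `nsidc_subset_altimetry`, the CMR granule-search URL can be rebuilt in isolation. This is a minimal sketch using the same flag-building expressions as the function above; the product, version, bounding box, and time range below are hypothetical example values, not program defaults.

```python
#-- sketch: rebuild the CMR granule-search URL the same way the
#-- while-loop in nsidc_subset_altimetry assembles it
import posixpath

#-- hypothetical example inputs
PRODUCT, VERSION = 'ATL06', '003'
BBOX = [-50.33333, 68.56667, -49.33333, 69.56667]
start_time, end_time = '2018-11-23T00:00:00', '2018-11-23T23:59:59'

#-- flags mirror the function body: short_name, version, bounding box,
#-- temporal range, and pagination parameters
HOST = posixpath.join('https://cmr.earthdata.nasa.gov','search','granules')
product_flag = '?short_name={0}'.format(PRODUCT)
version_flag = '&version={0}'.format(VERSION) if VERSION else ''
bounds_flag = '&bounding_box={0:f},{1:f},{2:f},{3:f}'.format(*BBOX)
temporal_flag = '&temporal={0},{1}'.format(start_time, end_time)
size_flag = '&page_size={0:d}'.format(10)
num_flag = '&page_num={0:d}'.format(1)
remote_url = ''.join([HOST,product_flag,version_flag,bounds_flag,
    temporal_flag,size_flag,num_flag])
print(remote_url)
```

Note that `{0:f}` formatting expands each bound to six decimal places (e.g. `-50.333330`), which CMR accepts; no request is issued here, so this can be run offline.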