pip install pyncml
pip install git+https://github.com/kwilcox/pyncml.git
Supported:
- Adding things
  - Attributes:
    <attribute name="some_new_attribute" type="string" value="some_standard_name" />
- Renaming things
  - Variables:
    <variable name="new_var" orgName="old_var" />
  - Attributes:
    <attribute name="new_attr" orgName="old_attr" />
  - Dimensions:
    <dimension name="new_dim" orgName="old_dim" />
- Removing things
  - Variables:
    <remove name="some_variable" type="variable" />
  - Attributes:
    <remove name="some_attribute" type="attribute" />
- Aggregating things
  - Scans:
    <scan location="some_directory/foo/bar/" suffix=".nc" subdirs="true" />
Not supported:
- Adding variables (could be implemented in the future)
- Groups (could be implemented in the future)
- Setting actual data values on variables (could be implemented in the future)
- Creating a file from scratch (could be implemented in the future)
- Removing dimensions (not implemented in the C library)
- Aggregation scans that use the dateFormatMark attribute (most likely will never be implemented)
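The supported elements can be combined in a single NcML document. The following is a minimal sketch that assembles one with the standard library's xml.etree.ElementTree. The helper name build_ncml is hypothetical and not part of pyncml, and the element values are the placeholder names used in the list above.

```python
import xml.etree.ElementTree as ET

NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"


def build_ncml():
    """Return an NcML string that adds, renames, and removes things.

    Hypothetical helper for illustration only; not part of pyncml.
    """
    # Serialize the NcML namespace as the default (unprefixed) namespace
    ET.register_namespace("", NCML_NS)
    root = ET.Element("{%s}netcdf" % NCML_NS)

    def child(tag, **attrs):
        # Append a namespaced child element carrying the given XML attributes
        return ET.SubElement(root, "{%s}%s" % (NCML_NS, tag), attrs)

    # Add a new global attribute
    child("attribute", name="some_new_attribute", type="string",
          value="some_standard_name")
    # Rename a variable
    child("variable", name="new_var", orgName="old_var")
    # Remove a variable
    child("remove", name="some_variable", type="variable")

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_ncml())
```

Because the apply function accepts an NcML string, the value returned by build_ncml() could be passed as its ncml argument.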
The apply function takes in a path to the input_file NetCDF file, an ncml object (string, file path, or python etree object), and an optional output_file. If an output_file is not specified, the input_file will be edited in place. The object returned from the apply function is a netcdf4-python object, ready to be used.
Any location attributes in the NcML are ignored and the NcML is applied against the file specified as the input_file.
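To make the three accepted ncml forms concrete (string, file path, or etree object), here is a rough sketch of how such an argument could be normalized to a parsed element. This is an assumption for illustration, written against the standard library's xml.etree.ElementTree. The helper load_ncml is hypothetical and is not how pyncml is known to do it.

```python
import os
import xml.etree.ElementTree as ET


def load_ncml(ncml):
    """Return a parsed root element for any of the three input forms.

    Hypothetical helper; pyncml's own handling may differ.
    """
    # Already a parsed element: use it unchanged
    if isinstance(ncml, ET.Element):
        return ncml
    # A string naming an existing file: parse the file
    if isinstance(ncml, str) and os.path.isfile(ncml):
        return ET.parse(ncml).getroot()
    # Otherwise treat the string as NcML content
    return ET.fromstring(ncml)


if __name__ == "__main__":
    root = load_ncml('<netcdf><attribute name="a" value="b" /></netcdf>')
    print(root.tag, len(root))
```

Checking for an existing file before parsing the string as XML keeps the call site simple: the caller passes whatever it has and gets an element back.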
# Edit the input file in place (no output_file given)
import pyncml
netcdf = '/some/file/path/in.nc'
ncml = '/some/file/path/foo.ncml'
nc = pyncml.apply(input_file=netcdf, ncml=ncml)
# Write the result to a new file and leave the input file untouched
import pyncml
netcdf = '/some/file/path/in.nc'
out = '/some/file/path/out.nc'
ncml = '/some/file/path/foo.ncml'
nc = pyncml.apply(input_file=netcdf, ncml=ncml, output_file=out)
# Pass the NcML as a string instead of a file path
netcdf = '/some/file/path/in.nc'
out = '/some/file/path/out.nc'
ncml = """<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
<attribute name="new_attribute" value="works" />
<attribute name="new_history" orgName="history" />
<attribute name="new_file_format" orgName="file_format" value="New Format" />
<remove name="source" type="attribute" />
</netcdf>
"""
import pyncml
nc = pyncml.apply(input_file=netcdf, ncml=ncml, output_file=out)
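The NcML string in the example above mixes several kinds of edits. As a reading aid, the sketch below parses that same document with the standard library and lists what each element asks for. The function describe_operations and its wording are hypothetical. It reflects one reading of the NcML elements shown in this document (name plus value sets an attribute, orgName renames, remove deletes) and is not pyncml code.

```python
import xml.etree.ElementTree as ET

NCML = """<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <attribute name="new_attribute" value="works" />
  <attribute name="new_history" orgName="history" />
  <attribute name="new_file_format" orgName="file_format" value="New Format" />
  <remove name="source" type="attribute" />
</netcdf>"""


def describe_operations(ncml_string):
    """Return one human-readable line per top-level NcML element.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(ncml_string)
    ops = []
    for el in root:
        tag = el.tag.split("}")[-1]  # drop the namespace part of the tag
        name = el.get("name")
        org = el.get("orgName")
        value = el.get("value")
        if tag == "remove":
            ops.append("remove %s %s" % (el.get("type"), name))
        elif org is not None and value is not None:
            ops.append("rename %s %s to %s and set value" % (tag, org, name))
        elif org is not None:
            ops.append("rename %s %s to %s" % (tag, org, name))
        else:
            ops.append("set %s %s" % (tag, name))
    return ops


if __name__ == "__main__":
    for line in describe_operations(NCML):
        print(line)
```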
# Pass the NcML as an etree object
import pyncml
netcdf = '/some/file/path/in.nc'
out = '/some/file/path/out.nc'
ncml = pyncml.etree.fromstring("""<?xml version="1.0" encoding="UTF-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
<attribute name="new_attribute" value="works" />
</netcdf>
""")
nc = pyncml.apply(input_file=netcdf, ncml=ncml, output_file=out)
The scan function takes an ncml object (string, file path, or python etree object). The object returned from the scan function is a metadata object describing the scan aggregation; it is not a netcdf4-python object of the aggregation. You can create a netcdf4-python object from the scan aggregation (example below).
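To give the scan element's location, suffix, and subdirs attributes a concrete meaning, here is a rough standard-library sketch of the file-discovery step they imply. It is an assumption about the behavior, not pyncml's implementation, and the function name find_scan_members is hypothetical.

```python
import os


def find_scan_members(location, suffix=".nc", subdirs=True):
    """Return sorted paths under location whose names end with suffix.

    Hypothetical helper mirroring <scan location=... suffix=... subdirs=... />.
    """
    matches = []
    for root, _dirs, files in os.walk(location):
        for name in files:
            if name.endswith(suffix):
                matches.append(os.path.join(root, name))
        if not subdirs:
            # os.walk yields the top directory first, so stop after it
            break
    return sorted(matches)


if __name__ == "__main__":
    print(find_scan_members(".", suffix=".nc", subdirs=True))
```

The real scan function additionally opens each member to read its time range and standard names, which this sketch does not attempt.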
import pyncml
ncml = '/some/file/path/foo.ncml'
agg = pyncml.scan(ncml=ncml)
print(agg.starting)
2014-06-20 00:00:00+00:00
print(agg.ending)
2014-07-19 23:00:00+00:00
print(agg.timevar_name)
u'time'
print(agg.standard_names)
[
u'time',
u'projection_y_coordinate',
u'projection_x_coordinate',
u'eastward_wind_velocity'
]
print(agg.members)  # These are already sorted by the 'starting' date
[
{
'starting': datetime.datetime(2014, 6, 20, 0, 0, tzinfo=<UTC>),
'ending': datetime.datetime(2014, 6, 20, 0, 0, tzinfo=<UTC>),
'path': '/path/to/aggregation/defined/in/ncml/first_member.nc',
'standard_names': [u'time',
u'projection_y_coordinate',
u'projection_x_coordinate',
u'eastward_wind_velocity'],
},
{
'starting': datetime.datetime(2014, 6, 20, 1, 0, tzinfo=<UTC>),
'ending': datetime.datetime(2014, 6, 20, 1, 0, tzinfo=<UTC>),
'path': '/path/to/aggregation/defined/in/ncml/second_member.nc',
'standard_names': [u'time',
u'projection_y_coordinate',
u'projection_x_coordinate',
u'eastward_wind_velocity'],
},
...
]
Note: Building a netCDF4.MFDataset from the scan members will not work with aggregations whose members overlap in time!
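One way to check for overlapping members up front is to compare each member's starting time with the previous member's ending time. The sketch below uses plain dictionaries with starting, ending, and path keys as stand-ins for the scan members. The function find_overlaps is hypothetical, and treating a shared timestamp as an overlap is an assumption.

```python
from datetime import datetime, timezone


def find_overlaps(members):
    """Return (earlier_path, later_path) pairs of adjacent overlapping members.

    An empty list means no two members overlap. Hypothetical helper.
    """
    ordered = sorted(members, key=lambda m: m["starting"])
    overlaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        # Assumption: a member starting at or before the previous member's
        # ending time counts as overlapping
        if cur["starting"] <= prev["ending"]:
            overlaps.append((prev["path"], cur["path"]))
    return overlaps


if __name__ == "__main__":
    utc = timezone.utc
    members = [
        {"path": "first_member.nc",
         "starting": datetime(2014, 6, 20, 0, tzinfo=utc),
         "ending": datetime(2014, 6, 20, 0, tzinfo=utc)},
        {"path": "second_member.nc",
         "starting": datetime(2014, 6, 20, 1, tzinfo=utc),
         "ending": datetime(2014, 6, 20, 1, tzinfo=utc)},
    ]
    print(find_overlaps(members))
```

Because the members are sorted by starting time, any overlap anywhere in the aggregation shows up as at least one overlapping adjacent pair, so an empty result is enough to rule overlaps out.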
import netCDF4
import pyncml
ncml = '/some/file/path/foo.ncml'
agg = pyncml.scan(ncml=ncml)
files = [f.path for f in agg.members]
# Use a separate name so the scan metadata in agg stays available
nc = netCDF4.MFDataset(files)
time = nc.variables.get(agg.timevar_name)
print(time)
<class 'netCDF4._Variable'>
float64 time('time',)
long_name: date time
units: hours since 1970-01-01 00:00:00
_CoordinateAxisType: Time
unlimited dimensions = ('time',)
current size = (14,)
print(time[:])
[ 389784. 389785. 389786. 389787. 389788. 389789. 389790. 389791.
389792. 389793. 390500. 390501. 390502. 390503.]
print(netCDF4.num2date(time[:], units=time.units))
[datetime.datetime(2014, 6, 20, 0, 0) datetime.datetime(2014, 6, 20, 1, 0)
datetime.datetime(2014, 6, 20, 2, 0) datetime.datetime(2014, 6, 20, 3, 0)
datetime.datetime(2014, 6, 20, 4, 0) datetime.datetime(2014, 6, 20, 5, 0)
datetime.datetime(2014, 6, 20, 6, 0) datetime.datetime(2014, 6, 20, 7, 0)
datetime.datetime(2014, 6, 20, 8, 0) datetime.datetime(2014, 6, 20, 9, 0)
datetime.datetime(2014, 7, 19, 20, 0)
datetime.datetime(2014, 7, 19, 21, 0)
datetime.datetime(2014, 7, 19, 22, 0)
datetime.datetime(2014, 7, 19, 23, 0)]
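For the units shown above (hours since 1970-01-01 00:00:00), the raw time values can be cross-checked against the converted dates with nothing but the standard library. The sketch below assumes that exact units string and the standard calendar. The helper name hours_to_datetime is hypothetical.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def hours_to_datetime(hours):
    """Convert 'hours since 1970-01-01 00:00:00' to a naive datetime."""
    return EPOCH + timedelta(hours=float(hours))


if __name__ == "__main__":
    # First and last raw values from the time variable printed above
    print(hours_to_datetime(389784.0))  # 2014-06-20 00:00:00
    print(hours_to_datetime(390503.0))  # 2014-07-19 23:00:00
```

The two results match the first and last entries of the num2date output, which is a quick sanity check that the aggregation spans the expected range.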