pyncml

A simple Python library to apply NcML logic to NetCDF files

Installation

Stable

pip install pyncml

Development

pip install git+https://github.com/kwilcox/pyncml.git

Supported

  • Adding things

    • Attributes: <attribute name="some_new_attribute" type="string" value="some_standard_name" />

  • Renaming things

    • Variables: <variable name="new_var" orgName="old_var" />
    • Attributes: <attribute name="new_attr" orgName="old_attr" />
    • Dimensions: <dimension name="new_dim" orgName="old_dim" />
  • Removing things

    • Variables: <remove name="some_variable" type="variable" />
    • Attributes: <remove name="some_attribute" type="attribute" />
  • Aggregating things

    • Scans: <scan location="some_directory/foo/bar/" suffix=".nc" subdirs="true" />
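
To make the supported operations above concrete, here is a minimal sketch that reads an NcML string and lists which add, rename, and remove operations it requests. It uses only the standard library's xml.etree.ElementTree and is not pyncml's own implementation; the function name summarize_ncml is hypothetical.

```python
import xml.etree.ElementTree as ET


def summarize_ncml(ncml_string):
    """Return (operation, kind, name) tuples for the elements in an NcML string.

    Illustrative only: this mirrors the "Supported" list, not pyncml internals.
    """
    root = ET.fromstring(ncml_string)
    operations = []
    for element in root.iter():
        # ElementTree reports tags as "{namespace}tag"; keep only the tag part.
        kind = element.tag.split("}")[-1]
        name = element.get("name")
        if kind == "remove":
            operations.append(("remove", element.get("type"), name))
        elif kind in ("variable", "attribute", "dimension"):
            if element.get("orgName") is not None:
                operations.append(("rename", kind, name))
            elif kind == "attribute":
                # An attribute without orgName is treated here as an addition.
                operations.append(("add", kind, name))
    return operations


example = """<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
    <attribute name="new_attribute" value="works" />
    <attribute name="new_history" orgName="history" />
    <variable name="new_var" orgName="old_var" />
    <remove name="source" type="attribute" />
</netcdf>"""

for operation in summarize_ncml(example):
    print(operation)
```

A summary like this can be a quick sanity check on an NcML document before it is applied to a file.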

Not supported

  • Adding variables (could be implemented in the future)
  • Groups (could be implemented in the future)
  • Setting actual data values on variables (could be implemented in the future)
  • Creating a file from scratch (could be implemented in the future)
  • Removing Dimensions (not implemented in the C library)
  • Aggregation scans that utilize the dateFormatMark attribute (most likely will never be implemented)

Usage

Apply

The apply function takes the path to a NetCDF file (input_file), an ncml object (an NcML string, a path to an NcML file, or a Python etree object), and an optional output_file. If output_file is not specified, input_file is edited in place. The function returns a netcdf4-python object, ready to be used.

Any location attributes in the NcML are ignored and the NcML is applied against the file specified as the input_file.

Editing a file in place

import pyncml
netcdf = '/some/file/path/in.nc'
ncml   = '/some/file/path/foo.ncml'
nc = pyncml.apply(input_file=netcdf, ncml=ncml)

Using an NcML file

import pyncml
netcdf = '/some/file/path/in.nc'
out    = '/some/file/path/out.nc'
ncml   = '/some/file/path/foo.ncml'
nc = pyncml.apply(input_file=netcdf, ncml=ncml, output_file=out)

Using an NcML string

import pyncml
netcdf = '/some/file/path/in.nc'
out    = '/some/file/path/out.nc'
ncml   = """<?xml version="1.0" encoding="UTF-8"?>
         <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
             <attribute name="new_attribute" value="works" />
             <attribute name="new_history" orgName="history" />
             <attribute name="new_file_format" orgName="file_format" value="New Format" />
             <remove name="source" type="attribute" />
         </netcdf>
         """
nc = pyncml.apply(input_file=netcdf, ncml=ncml, output_file=out)

Using an etree object

import pyncml
netcdf = '/some/file/path/in.nc'
out    = '/some/file/path/out.nc'
ncml   = pyncml.etree.fromstring("""<?xml version="1.0" encoding="UTF-8"?>
         <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
             <attribute name="new_attribute" value="works" />
         </netcdf>
         """)
nc = pyncml.apply(input_file=netcdf, ncml=ncml, output_file=out)
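
When the NcML is generated by a program rather than written by hand, it may be easier to build it with an XML API than with string formatting. The sketch below uses the standard library's xml.etree.ElementTree to produce an NcML string of the kind shown above. The helper name build_ncml and its parameters are hypothetical, and the code does not call pyncml itself.

```python
import xml.etree.ElementTree as ET

NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"


def build_ncml(new_attributes, removed_attributes):
    """Build an NcML string that adds and removes global attributes.

    new_attributes: dict mapping attribute name to string value.
    removed_attributes: iterable of attribute names to remove.
    """
    # Serialize the NcML namespace as the default namespace (no prefix).
    # Note that register_namespace changes module-wide state.
    ET.register_namespace("", NCML_NS)
    root = ET.Element("{%s}netcdf" % NCML_NS)
    for name, value in sorted(new_attributes.items()):
        ET.SubElement(root, "{%s}attribute" % NCML_NS, name=name, value=value)
    for name in removed_attributes:
        ET.SubElement(root, "{%s}remove" % NCML_NS, name=name, type="attribute")
    return ET.tostring(root, encoding="unicode")


ncml_string = build_ncml({"new_attribute": "works"}, ["source"])
print(ncml_string)
```

Because apply accepts an NcML string, the result could then be passed as the ncml argument in the same way as the hand-written string example.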

Scan

The scan function takes in an ncml object (string, file path, or Python etree object). The object returned from the scan function is a metadata object describing the scan aggregation; it is not a netcdf4-python object of the aggregation. You can create a netcdf4-python object from the scan aggregation (example below).
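
For orientation, the sketch below shows what an NcML document containing a scan element can look like and how its location, suffix, and subdirs attributes might be resolved to a list of candidate files. It is an illustration using only the standard library, not pyncml's implementation. The function name list_scan_files is hypothetical, and treating a missing subdirs attribute as "true" is an assumption.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

NCML_NS = "{http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2}"


def list_scan_files(ncml_string):
    """Return the sorted file paths matched by every <scan> element."""
    root = ET.fromstring(ncml_string)
    matches = []
    for scan in root.iter(NCML_NS + "scan"):
        location = scan.get("location")
        suffix = scan.get("suffix", "")
        # Assumption: a missing subdirs attribute means "descend".
        recurse = scan.get("subdirs", "true").lower() == "true"
        for dirpath, dirnames, filenames in os.walk(location):
            if not recurse:
                dirnames[:] = []  # stop os.walk from descending further
            for filename in filenames:
                if filename.endswith(suffix):
                    matches.append(os.path.join(dirpath, filename))
    return sorted(matches)


# Demonstrate against a throwaway directory tree.
with tempfile.TemporaryDirectory() as directory:
    os.mkdir(os.path.join(directory, "sub"))
    for relative in ("a.nc", "b.txt", os.path.join("sub", "c.nc")):
        open(os.path.join(directory, relative), "w").close()
    ncml = """<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
      <aggregation dimName="time" type="joinExisting">
        <scan location="%s" suffix=".nc" subdirs="true" />
      </aggregation>
    </netcdf>""" % directory
    found = [os.path.relpath(path, directory) for path in list_scan_files(ncml)]

print(found)
```

The real scan function goes further than this: as the metadata below shows, it also reports the time range and standard names of each member.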

Obtaining aggregation metadata

import pyncml

ncml = '/some/file/path/foo.ncml'
agg = pyncml.scan(ncml=ncml)

print(agg.starting)
2014-06-20 00:00:00+00:00

print(agg.ending)
2014-07-19 23:00:00+00:00

print(agg.timevar_name)
u'time'

print(agg.standard_names)
[
  u'time',
  u'projection_y_coordinate',
  u'projection_x_coordinate',
  u'eastward_wind_velocity'
]

print(agg.members)  # These are already sorted by the 'starting' date
[
  {
    'starting':       datetime.datetime(2014, 6, 20, 0, 0, tzinfo=<UTC>),
    'ending':         datetime.datetime(2014, 6, 20, 0, 0, tzinfo=<UTC>),
    'path':           '/path/to/aggregation/defined/in/ncml/first_member.nc',
    'standard_names': [u'time',
                       u'projection_y_coordinate',
                       u'projection_x_coordinate',
                       u'eastward_wind_velocity'],
  },
  {
    'starting':       datetime.datetime(2014, 6, 20, 1, 0, tzinfo=<UTC>),
    'ending':         datetime.datetime(2014, 6, 20, 1, 0, tzinfo=<UTC>),
    'path':           '/path/to/aggregation/defined/in/ncml/second_member.nc',
    'standard_names': [u'time',
                       u'projection_y_coordinate',
                       u'projection_x_coordinate',
                       u'eastward_wind_velocity'],
  },
  ...
]
Creating netcdf4-python Aggregation object

Note: This will not work with aggregations whose members overlap in time!

import netCDF4
import pyncml

ncml = '/some/file/path/foo.ncml'
meta = pyncml.scan(ncml=ncml)
files = [ f.path for f in meta.members ]
agg = netCDF4.MFDataset(files)
# The time variable name comes from the scan metadata, not from the MFDataset
time = agg.variables.get(meta.timevar_name)

print(time)
<class 'netCDF4._Variable'>
float64 time('time',)
    long_name: date time
    units: hours since 1970-01-01 00:00:00
    _CoordinateAxisType: Time
unlimited dimensions = ('time',)
current size = (14,)

print(time[:])
[ 389784.  389785.  389786.  389787.  389788.  389789.  389790.  389791.
  389792.  389793.  390500.  390501.  390502.  390503.]

print(netCDF4.num2date(time[:], units=time.units))
[datetime.datetime(2014, 6, 20, 0, 0) datetime.datetime(2014, 6, 20, 1, 0)
 datetime.datetime(2014, 6, 20, 2, 0) datetime.datetime(2014, 6, 20, 3, 0)
 datetime.datetime(2014, 6, 20, 4, 0) datetime.datetime(2014, 6, 20, 5, 0)
 datetime.datetime(2014, 6, 20, 6, 0) datetime.datetime(2014, 6, 20, 7, 0)
 datetime.datetime(2014, 6, 20, 8, 0) datetime.datetime(2014, 6, 20, 9, 0)
 datetime.datetime(2014, 7, 19, 20, 0)
 datetime.datetime(2014, 7, 19, 21, 0)
 datetime.datetime(2014, 7, 19, 22, 0)
 datetime.datetime(2014, 7, 19, 23, 0)]
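
Since the aggregation object will not work when members overlap in time, it can help to check the scan metadata first. The sketch below is one possible check, assuming each member exposes starting, ending, and path as in the examples above. The function name find_overlaps is hypothetical, and the members here are stand-in objects rather than real pyncml results.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def find_overlaps(members):
    """Return (earlier_path, later_path) pairs whose time ranges overlap.

    Each member is compared with the earlier member that ends latest, so a
    long member that spans several later ones is still detected.
    """
    ordered = sorted(members, key=lambda member: member.starting)
    overlaps = []
    latest = None  # member with the latest 'ending' seen so far
    for current in ordered:
        # Sharing a single timestamp counts as an overlap here.
        if latest is not None and current.starting <= latest.ending:
            overlaps.append((latest.path, current.path))
        if latest is None or current.ending > latest.ending:
            latest = current
    return overlaps


def hour(value):
    """Build a UTC datetime on 2014-06-20 at the given hour (demo helper)."""
    return datetime(2014, 6, 20, value, 0, tzinfo=timezone.utc)


# Stand-in members; real ones would come from pyncml.scan(...).members.
members = [
    SimpleNamespace(starting=hour(0), ending=hour(1), path="a.nc"),
    SimpleNamespace(starting=hour(1), ending=hour(2), path="b.nc"),
    SimpleNamespace(starting=hour(3), ending=hour(3), path="c.nc"),
]

print(find_overlaps(members))
```

An empty result suggests the member files can be handed to netCDF4.MFDataset as in the example above.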
