cso_file module

Create and access file with satellite data extract.

Methods

The following methods are defined:

Class hierarchy

The classes are defined according to the following hierarchy:

Methods and classes

cso_file.CheckDir(filename, dmode=None)

Check if filename has a directory path; if so, create that directory if it does not exist yet. Optional dmode (default 0o777) is used for creation of directories.
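
The behaviour described above could be sketched as follows. This is not the module's own code: the function name check_dir, its return value, and the fallback from dmode=None to 0o777 are assumptions made for illustration.

```python
import os
import tempfile


def check_dir(filename, dmode=None):
    """Create the directory part of `filename` if it is missing (hypothetical sketch)."""
    dname = os.path.dirname(filename)
    # a bare file name has no directory part, so there is nothing to create
    if dname and not os.path.isdir(dname):
        # assumption: None falls back to the documented default 0o777
        os.makedirs(dname, mode=0o777 if dmode is None else dmode, exist_ok=True)
    return dname


# usage: prepare the output directory below a temporary root
root = tempfile.mkdtemp()
target = os.path.join(root, "2007", "06", "example.nc")
created = check_dir(target)
```

Note that the mode passed to os.makedirs is still subject to the umask of the process.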

cso_file.Pack_DataArray(da, dtype='i2')

Set encoding parameters of xarray.DataArray object da; on return, the argument is changed in-place.

Argument dtype defines the data type in which the array will be written; by default this is a short integer (i2, 2 bytes) instead of a float. Attributes are added that define how to decode the values:

short pressure(z) ;
pressure:add_offset   = 53257.7493494103 ;
pressure:scale_factor = 1.50569706659902 ;
pressure:_FillValue   = 32767s ;

The actual values are recalculated using the formula:

values = add_offset + scale_factor * values__short

The packing parameters are computed using:

scale_factor = (vmax - vmin)/( fmax - fmin)
add_offset   = vmin - scale_factor * fmin

where [vmin,vmax] is the range of input values, and [fmin,fmax] the range of possible values of the stored data type.

The scale_factor must not become 0.0, since it is used as divisor during encoding. It would become zero when all elements of the input have the same value (vmax equals vmin); in that case, the following parameters are used instead, which lead to packed values that are all zero:

scale_factor = 1.0
add_offset   = vmin
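
The packing formulas above can be tried out with a small stand-alone sketch. It uses plain Python lists instead of an xarray.DataArray. The function names are hypothetical, and the usable range [-32767, 32766] (keeping 32767 free for the _FillValue) is an assumption; the module may choose its range differently.

```python
FILL_VALUE = 32767            # _FillValue as shown in the example header
FMIN, FMAX = -32767, 32766    # assumed usable range of the short integer


def packing_parameters(values, fmin=FMIN, fmax=FMAX):
    """Return (scale_factor, add_offset) for a list of floats."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        # constant input: avoid a zero scale_factor, all packed values become 0
        return 1.0, vmin
    scale_factor = (vmax - vmin) / (fmax - fmin)
    add_offset = vmin - scale_factor * fmin
    return scale_factor, add_offset


def pack(values, scale_factor, add_offset):
    """Encode floats as integers; this is where scale_factor is the divisor."""
    return [round((v - add_offset) / scale_factor) for v in values]


def unpack(packed, scale_factor, add_offset):
    """Decode: values = add_offset + scale_factor * values__short."""
    return [add_offset + scale_factor * p for p in packed]


pressures = [101325.0, 85000.0, 50000.0, 5000.0]
sf, ao = packing_parameters(pressures)
packed = pack(pressures, sf, ao)
restored = unpack(packed, sf, ao)
```

The restored values differ from the input by at most about half a scale_factor, which is the price of storing 2 bytes per value.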

class cso_file.CSO_File(filename=None)

Bases: object

Storage for satellite data extract.

If optional filename is provided, the Read() method is called to read data from a file. If the optional varname list is specified too, only selected variables will be read.

To create a new track, initialize without arguments and add new variables using the AddVariable() method. Only certain dimension names and combinations are supported.

Close()

Close dataset.

AddDim(dname, length)

Add dimension.

Arguments:

  • dname : (str) dimension name to be created

  • length : (int) dimension length

GetDimLength(dname)

Return dimension length.

GetShape(dnames)

Return shape following specified dimension names.

AddCoord(cname, da)

Add coordinate variable.

Arguments:

  • cname : coordinate name to be created

  • da : xarray.DataArray object

GetCoords(cnames)

Return dict with coordinate variables as specified in cnames.

AddTimeCoord(times, bounds=None)

Add time coordinate variable.

Arguments:

  • times : time stamp series created with for example pandas.date_range() method

Optional arguments:

  • bounds=(times1,times2) : tuple or list with two time series that define the time bounds

AddVariable(vname, da)

Add variable field.

Arguments:

  • vname : variable name to be created

  • da : xarray.DataArray object

AddDataVariable(vname, dnames, dtype='f4', attrs={}, values=None)

Add variable with specified name and dimensions.

Arguments:

  • vname : variable name to be created

  • dnames : dimension names, e.g. ('time','station')

Optional arguments:

  • dtype : datatype keyword

  • attrs : dict with variable attributes

  • values : array with data values

Write(filename, unlimited_dims=None, attrs=None, history=[], packed=True, complevel=1)

Write data to provided filename.

Optional arguments:

  • attrs : dictionary with global attributes

  • history : list of str values that describe how the content was created; this will be added to the global history attribute

  • unlimited_dims : names of unlimited dimensions

Optional arguments to disable file size reduction measures:

  • packed : by default float arrays are written as short-integers accompanied with add_offset and scale_factor attributes; set to False to keep original data types.

  • complevel : zlib compression level, default 1 (of 9) to have decent compression without too much overhead; set to 0 to have no compression at all

The following example shows the impact of packing and compression on file sizes; in this example, the combination of packing and deflation reduces the file size by about 86%:

301.009.424  S5p_RPRO_GLYOX_01130.nc
111.648.738  S5p_RPRO_GLYOX_01130__complevel1.nc
104.134.232  S5p_RPRO_GLYOX_01130__complevel9.nc
153.334.891  S5p_RPRO_GLYOX_01130__packed.nc
 41.020.139  S5p_RPRO_GLYOX_01130__packed_complevel1.nc
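
The relative reductions follow directly from the sizes listed above. The short calculation below only restates that table; the dictionary keys are labels chosen here for illustration.

```python
# file sizes in bytes, copied from the listing above
sizes = {
    "original": 301009424,
    "complevel1": 111648738,
    "complevel9": 104134232,
    "packed": 153334891,
    "packed_complevel1": 41020139,
}

reference = sizes["original"]
# reduction in percent relative to the unpacked, uncompressed file
reduction = {name: 100.0 * (1.0 - size / reference) for name, size in sizes.items()}

for name, pct in reduction.items():
    print(f"{name:20s} {pct:5.1f} %")
```

In this example, raising the compression level from 1 to 9 gains only a few percent, while packing roughly halves the size before any compression is applied.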

GetDims(varname)

Return variable dimension names.

GetData(varname)

Return data values and units for named variable.

GetTrack(da)

Extract variable on track grid.

Arguments:

  • da : xarray.DataArray or variable name

Return values:

  • xx : corner longitudes (nimage+1,npixel+1)

  • yy : corner latitudes (nimage+1,npixel+1)

  • values : masked array with values at pixel locations (nimage,npixel)

  • units : str units

The corner locations are created using the following variables that are expected to be present:

  • track_longitude_bounds : 3D array of longitudes with dimensions ('scan','pixel','corner')

  • track_latitude_bounds : idem for latitudes

  • pixel : compress index that defines the ('scan','pixel') location
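
How a compress index maps the 1D pixel list onto the ('scan','pixel') grid can be sketched as below. The sketch assumes the CF "compression by gathering" convention, in which the index equals iscan * npixel + ipixel; the function name unpack_track is hypothetical, and None stands in for a masked value.

```python
def unpack_track(pixel_index, values, nscan, npixel):
    """Scatter 1D pixel values onto a (nscan, npixel) grid; None marks empty cells."""
    grid = [[None] * npixel for _ in range(nscan)]
    for index, value in zip(pixel_index, values):
        # assumed convention: last dimension varies fastest
        iscan, ipixel = divmod(index, npixel)
        grid[iscan][ipixel] = value
    return grid


# three selected pixels on a track of 2 scan lines with 3 pixels each
grid = unpack_track([1, 3, 4], [0.5, 0.7, 0.9], nscan=2, npixel=3)
```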

Example usage to obtain retrieval from data file that includes the track information:

xx,yy,vcd_retr,vcd_units = dfile.GetTrack( 'vcd' )

Example usage for simulated retrieval from state file:

xx,yy,vcd_retr,vcd_units = dfile.GetTrack( sfile.ds['vcd'] )

GetTimeRange()

Return time range of pixels in file. This requires that a variable named time is present.

GetTimeAverage(freq=None)

Return average of first and last time value.

Optional arguments:

  • freq : round time to given frequency, for example ‘60min’.
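
The midpoint and the optional rounding can be illustrated with the standard library only. This is a sketch, not the module's implementation: here freq is a datetime.timedelta rather than a string such as ‘60min’, and rounding ties upward is an assumption.

```python
from datetime import datetime, timedelta


def time_average(t1, t2, freq=None):
    """Return the midpoint of t1 and t2, optionally rounded to a multiple of freq."""
    mid = t1 + (t2 - t1) / 2
    if freq is not None:
        # round to the nearest multiple of freq counted from midnight
        midnight = mid.replace(hour=0, minute=0, second=0, microsecond=0)
        steps = (mid - midnight + freq / 2) // freq
        mid = midnight + steps * freq
    return mid


t1 = datetime(2007, 6, 1, 0, 45)
t2 = datetime(2007, 6, 1, 2, 23)
exact = time_average(t1, t2)
hourly = time_average(t1, t2, freq=timedelta(minutes=60))
```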

GetAttr(name, quiet=False)

Return global attribute with provided name. If the attribute is not present an error is raised, unless quiet=True.

SelectPixels(rcf, rckey, indent='')

Apply filters specified in rcfile.

Arguments:

  • rcf : RcFile object with settings

  • rckey : basename for rcfile keys, e.g. “s5p” for the example below

Return values:

  • selected : boolean array with shape of track (pixel) which is True if a pixel passed all checks

  • history : list of character strings describing the selections

Example configuration:

! Specify a list of filter names.
! For each name, specify the variable whose values are used for testing,
! the type of test, and optionally some thresholds or other settings for this type.
! The units of the thresholds should match with the units in the variables,
! the expected units have to be defined too.
! The examples below show possible types and their settings.

! filters:
s5p.filters                        :  lons lats albedo quality detected valid

! select range of values:
s5p.filter.lons.var                :  longitude
s5p.filter.lons.type               :  minmax
s5p.filter.lons.minmax             :  -15.0 35.0
s5p.filter.lons.units              :  degrees_east

! select range of values:
s5p.filter.lats.var                :  latitude
s5p.filter.lats.type               :  minmax
s5p.filter.lats.minmax             :  35.0 70.0
s5p.filter.lats.units              :  degrees_north

! select up to a maximum:
s5p.filter.albedo.var              :  albedo
s5p.filter.albedo.type             :  max
s5p.filter.albedo.max              :  0.3
s5p.filter.albedo.units            :  1

! select from a minimum:
s5p.filter.quality.var             :  qa_value
s5p.filter.quality.type            :  min
s5p.filter.quality.min             :  0.8
s5p.filter.quality.units           :  1

! select on flag:
s5p.filter.detected.var            :  detection_flag
s5p.filter.detected.type           :  equal
s5p.filter.detected.value          :  1

! select only values with data (no "_FillValue"):
s5p.filter.valid.var               :  vmr
s5p.filter.valid.type              :  valid
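
The filter types listed in this configuration could be evaluated along the following lines. The sketch works on plain lists, with None standing in for a _FillValue; the function name apply_filter, the inclusive thresholds, and the wording of the history lines are assumptions and not taken from the module.

```python
def apply_filter(values, ftype, **settings):
    """Return a list of booleans, True where the value passes the test."""
    if ftype == "minmax":
        vmin, vmax = settings["minmax"]
        return [v is not None and vmin <= v <= vmax for v in values]
    if ftype == "min":
        return [v is not None and v >= settings["min"] for v in values]
    if ftype == "max":
        return [v is not None and v <= settings["max"] for v in values]
    if ftype == "equal":
        return [v == settings["value"] for v in values]
    if ftype == "valid":
        return [v is not None for v in values]
    raise ValueError(f"unsupported filter type: {ftype}")


pixels = {
    "longitude": [-20.0, 5.0, 10.0, 30.0],
    "qa_value": [0.9, 0.5, 0.95, None],
}
filters = [
    ("longitude", "minmax", {"minmax": (-15.0, 35.0)}),
    ("qa_value", "min", {"min": 0.8}),
]

# a pixel is selected only if it passes every filter
selected = [True] * 4
history = []
for var, ftype, settings in filters:
    passed = apply_filter(pixels[var], ftype, **settings)
    selected = [s and p for s, p in zip(selected, passed)]
    history.append(f"selected {sum(passed)} of {len(passed)} pixels on '{var}' ({ftype})")
```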

class cso_file.CSO_Listing(filename=None, indent='')

Bases: object

Storage for table with file properties. The records describe properties of CSO_File files, for example the time range of the pixels included. This could be used to quickly scan an archive for filenames with pixels within a requested interval.

The content of the table file looks like:

filename;start_time;end_time
2007/06/RAL-IASI-CH4_20070601T003855.nc;2007-06-01 00:45:36.827000000;2007-06-01 02:23:57.567000000
2007/06/RAL-IASI-CH4_20070601T022359.nc;2007-06-01 02:23:59.512000000;2007-06-01 04:05:57.328000000
...
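
A table in this layout can be read with the standard library alone, as sketched below. The module itself appears to hold the table in pandas (GetValues returns a pandas.Series); the helper parse_time and the list-of-dicts layout are assumptions made for illustration.

```python
import csv
import io
from datetime import datetime

text = """filename;start_time;end_time
2007/06/RAL-IASI-CH4_20070601T003855.nc;2007-06-01 00:45:36.827000000;2007-06-01 02:23:57.567000000
2007/06/RAL-IASI-CH4_20070601T022359.nc;2007-06-01 02:23:59.512000000;2007-06-01 04:05:57.328000000
"""


def parse_time(stamp):
    # the table stores 9 fractional digits; datetime holds microseconds only,
    # so the last three digits are dropped
    return datetime.strptime(stamp[:26], "%Y-%m-%d %H:%M:%S.%f")


records = []
for row in csv.DictReader(io.StringIO(text), delimiter=";"):
    row["start_time"] = parse_time(row["start_time"])
    row["end_time"] = parse_time(row["end_time"])
    records.append(row)
```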

Optional arguments:

  • filename : listing file read into table

Save(filename, dmode=None, indent='')

Write table to file.

GetRecord(irec)

Return record for corresponding 0-based row index.

GetValues(name)

Return pandas.Series object with all values for provided column name.

keys()

Return list of column names.

Cleanup(indent='')

Remove records from table if filename is not present anymore.

Check(filename, indent='')

Raise exception if no record for filename is present.

Update(filename, csf, tr=None, xcolumns=[], indent='')

Add record (or replace) for filename with information from the csf CSO_File object.

The time range of pixels is read from the csf object, unless the datetime.datetime tuple tr=(t1,t2) is supplied.

The optional xcolumns specify a list of global attributes to be added as extra columns.

UpdateRecord(filename, data, indent='')

Add record (or replace) for filename with columns and values defined in the data dict.

Select(tr=None, method='overlap', expr=None, blacklist=[], indent='', **kwargs)

Return CSO_Listing object with selection of records.

Optional arguments:

  • tr=(t1,t2) : select records that could be assigned to this timerange; requires the method keyword:

    • method='overlap' (default) : select records that have pixels overlapping with interval

    • method='middle' : select records for which the middle of start_time and end_time lies within the interval (t1,t2]

  • name=value arguments select records based on column values

  • The expression expr provides a list of ‘;’-separated selection expressions with templates %{..} for the column names:

    (%{processor_version} == '020400') & (%{processing} == 'RPRO') ; ...
    

    This is evaluated after the previously described selections. The result should be None or exactly one record. Files listed in the optional blacklist are skipped.
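
The difference between the two time range methods can be shown with a minimal sketch. Records are plain dicts with start_time and end_time keys here, and the overlap test includes both interval ends; both choices are assumptions, as is the function name select.

```python
from datetime import datetime


def select(records, tr, method="overlap"):
    """Return the records assigned to the time range tr=(t1,t2)."""
    t1, t2 = tr
    selected = []
    for rec in records:
        start, end = rec["start_time"], rec["end_time"]
        if method == "overlap":
            # some part of [start,end] falls inside [t1,t2]
            keep = start <= t2 and end >= t1
        elif method == "middle":
            # midpoint of the record lies within (t1,t2]
            mid = start + (end - start) / 2
            keep = t1 < mid <= t2
        else:
            raise ValueError(f"unsupported method: {method}")
        if keep:
            selected.append(rec)
    return selected


records = [
    {"filename": "A.nc", "start_time": datetime(2007, 6, 1, 0, 45, 36),
     "end_time": datetime(2007, 6, 1, 2, 23, 57)},
    {"filename": "B.nc", "start_time": datetime(2007, 6, 1, 2, 23, 59),
     "end_time": datetime(2007, 6, 1, 4, 5, 57)},
]
tr = (datetime(2007, 6, 1, 2, 0), datetime(2007, 6, 1, 4, 0))
by_overlap = [r["filename"] for r in select(records, tr, method="overlap")]
by_middle = [r["filename"] for r in select(records, tr, method="middle")]
```

With method='middle' every record is assigned to at most one of a series of adjacent intervals, which avoids processing the same file twice; method='overlap' may return the same record for neighbouring intervals.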

Append(lst, path='')

Append content of other listing.

Sort(by='filename')

Sort listing table by filename or other key.