cso_s5p module

The cso_s5p module provides classes to convert S5p data into a CSO format.

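Class hierarchy

The classes are defined according to the following hierarchy:

  • object
      • S5p_File
  • CSO_File
      • CSO_S5p_File
  • UtopyaRc
      • CSO_S5p_Convert
      • CSO_S5p_Download
      • CSO_S5p_Listing
      • CSO_S5p_Download_Listing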

Classes

class cso_s5p.S5p_File(filename)

Bases: object

Base class to access data in S5p file.

Example of variables:

PRODUCT/
    dimensions:
        scanline = 3245 ;
        ground_pixel = 450 ;
        corner = 4 ;
        time = 1 ;
        layer = 34 ;
    variables:
        int time(time) ;
        float latitude(time, scanline, ground_pixel) ;
                        units = "degrees_north" ;
        float longitude(time, scanline, ground_pixel) ;
                        units = "degrees_east" ;
        float nitrogendioxide_tropospheric_column(time, scanline, ground_pixel) ;
                        units = "mol m-2" ;
                        multiplication_factor_to_convert_to_molecules_percm2 = 6.02214e+19f ;
        float nitrogendioxide_tropospheric_column_precision(time, scanline, ground_pixel) ;
                        units = "mol m-2" ;
                        multiplication_factor_to_convert_to_molecules_percm2 = 6.022141e+19f ;
        float averaging_kernel(time, scanline, ground_pixel, layer) ;
                        units = "1" ;
        ubyte qa_value(time, scanline, ground_pixel) ;
                        units = "1" ;
                        long_name = "data quality value" ;
                        comment = "A continuous quality descriptor, varying between 0 (no data) and 1 (full quality data). Recommend to ignore data with qa_value < 0.5" ;
    SUPPORT_DATA/
        GEOLOCATIONS/
            variables:
                float solar_zenith_angle(time, scanline, ground_pixel) ;
                            units = "degree" ;
                float latitude_bounds(time, scanline, ground_pixel, corner) ;
                            units = "degrees_north" ;
                float longitude_bounds(time, scanline, ground_pixel, corner) ;
                            units = "degrees_east" ;
        INPUT_DATA/
            variables:
                float surface_altitude(time, scanline, ground_pixel) ;
                            units = "m" ;
                float surface_pressure(time, scanline, ground_pixel) ;
                            units = "Pa" ;
                float surface_albedo(time, scanline, ground_pixel) ;
                            units = "1" ;
                float cloud_fraction_crb(time, scanline, ground_pixel) ;
                            units = "1" ;
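The multiplication_factor_to_convert_to_molecules_percm2 attribute in this example follows from Avogadro's number and the change of area unit (1 m2 = 10^4 cm2). A minimal sketch of that arithmetic, assuming NumPy and made-up column values:

```python
import numpy as np

AVOGADRO = 6.02214076e23   # molecules per mol
CM2_PER_M2 = 1.0e4         # 1 m2 = 10^4 cm2

# factor from mol m-2 to molecules cm-2; agrees with the file attribute to ~6 digits
factor = AVOGADRO / CM2_PER_M2

# hypothetical tropospheric NO2 columns in mol m-2
vcd_mol_m2 = np.array([1.0e-4, 5.0e-5], dtype=np.float32)
vcd_molec_cm2 = vcd_mol_m2 * factor   # about 6.0e15 and 3.0e15 molecules cm-2

print(f"{factor:.6e}")   # 6.022141e+19
```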

Arguments:

  • filename : name of input file

GetDim(dimname)

Return length of dimension.

GetData(path, units=None)

Extract data array from input file and perform some adhoc fixes:

  • If no units attribute is present yet, insert one; its value is then taken from the units argument.

  • Convert data to target units if necessary.

  • Add a long_name attribute if not present yet.

Arguments:

  • path : variable path in input

Optional arguments:

  • units : target units

Return values:

  • da : xarray.DataArray object
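The fixes listed above amount to a few dictionary and array operations. A minimal sketch, assuming plain NumPy arrays with an attribute dictionary instead of an xarray.DataArray, and a small hypothetical conversion table; the function name get_data and the table are illustrative only and do not reflect the module's own unit handling:

```python
import numpy as np

# hypothetical conversion table: (from_units, to_units) -> multiplication factor
CONVERSIONS = {
    ("Pa", "hPa"): 0.01,
    ("mol m-2", "molec cm-2"): 6.02214076e23 / 1.0e4,
}


def get_data(values, attrs, units=None, name="unknown"):
    """Apply GetData-style fixes to an array and its attributes (sketch)."""
    attrs = dict(attrs)          # work on a copy of the attributes
    values = np.asarray(values)
    if "units" not in attrs:
        if units is None:
            raise ValueError(f"no units for '{name}'; provide the 'units' argument")
        attrs["units"] = units   # assume the data is already in these units
    elif units is not None and attrs["units"] != units:
        key = (attrs["units"], units)
        if key not in CONVERSIONS:
            raise ValueError(f"no conversion from '{attrs['units']}' to '{units}'")
        values = values * CONVERSIONS[key]
        attrs["units"] = units
    attrs.setdefault("long_name", name)   # only added if not present yet
    return values, attrs


sp, sp_attrs = get_data([101325.0], {"units": "Pa"}, units="hPa", name="surface_pressure")
```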

SelectPixels(rcf, rckey, indent='')

Apply filters specified in rcfile.

Arguments:

  • rcf : RcFile object with settings

  • rckey : basename for rcfile keys, e.g. “s5p” for the example below

Return values:

  • selected : boolean array with shape of track (time,scan,pixel) which is True if a pixel passed all checks;

  • history : list of character strings describing the selections.

Example configuration:

! Specify a list of filter names.
! For each name, specify the variable which values are used for testing,
! the type test, and eventually some thresholds or other settings for this type.
! The units of the thresholds should match with the units in the variables,
! the expected units have to be defined too.
! The examples below show possible types and their settings.

! filters:
s5p.filters                        :  lons lats albedo valid

! select range of values:
s5p.filter.lons.var                :  PRODUCT/longitude
s5p.filter.lons.type               :  minmax
s5p.filter.lons.minmax             :  -15.0 35.0
s5p.filter.lons.units              :  degrees_east

! select above a minimum:
s5p.filter.lats.var                :  PRODUCT/latitude
s5p.filter.lats.type               :  minmax
s5p.filter.lats.minmax             :  35.0 70.0
s5p.filter.lats.units              :  degrees_north

! select below a maximum:
s5p.filter.albedo.var              :  Data Fields/SurfaceAlbedo
s5p.filter.albedo.type             :  max
s5p.filter.albedo.max              :  0.3
s5p.filter.albedo.units            :  1

! select only values with data (no "_FillValue"):
s5p.filter.valid.var               :  Data Fields/NO2RetrievalTroposphericVerticalColumn
s5p.filter.valid.type              :  valid
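A minimal sketch of how the filter types in this example could be combined into the selected mask and the history list. It assumes NumPy arrays of track shape in which fill values have already been replaced by NaN (as xarray does when decoding _FillValue), inclusive bounds, and settings passed as a plain dictionary instead of an RcFile object; the units check is left out:

```python
import numpy as np


def select_pixels(data, filters):
    """Combine filters into one boolean mask (sketch, not the module's code).

    data    : dict mapping variable path -> float array of track shape, NaN if missing
    filters : dict mapping filter name -> settings ('var', 'type', thresholds)
    """
    selected = None
    history = []
    for name, flt in filters.items():
        values = data[flt["var"]]
        valid = np.isfinite(values)
        if flt["type"] == "minmax":
            vmin, vmax = flt["minmax"]
            ok = valid & (values >= vmin) & (values <= vmax)
            history.append(f"{name}: {vmin} <= {flt['var']} <= {vmax}")
        elif flt["type"] == "max":
            ok = valid & (values <= flt["max"])
            history.append(f"{name}: {flt['var']} <= {flt['max']}")
        elif flt["type"] == "valid":
            ok = valid
            history.append(f"{name}: valid {flt['var']}")
        else:
            raise ValueError(f"unsupported filter type '{flt['type']}'")
        # a pixel is selected only if it passes all filters
        selected = ok if selected is None else (selected & ok)
    return selected, history


# the filters from the example configuration, as a plain dictionary
filters = {
    "lons": {"var": "PRODUCT/longitude", "type": "minmax", "minmax": (-15.0, 35.0)},
    "lats": {"var": "PRODUCT/latitude", "type": "minmax", "minmax": (35.0, 70.0)},
    "albedo": {"var": "Data Fields/SurfaceAlbedo", "type": "max", "max": 0.3},
    "valid": {"var": "Data Fields/NO2RetrievalTroposphericVerticalColumn", "type": "valid"},
}
```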

class cso_s5p.CSO_S5p_File(filename=None)

Bases: CSO_File

Storage for CSO satellite file filled from S5p data.

AddSelection(sfile, selected, rcf, rcbase, indent='')

Add selected S5p pixels to satellite extract file.

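Arguments:

  • sfile : S5p_File object with the input data

  • selected : boolean array with the selected pixels, for example as returned by SelectPixels

  • rcf : RcFile object with settings

  • rcbase : basename for rcfile keys

The first setting that is read is a list with the names of the variables to be created in the target file: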

<rcbase>.output.vars    :  longitude corner_longitudes \
                           latitude corner_latitudes \
                           vcd  ...

For each variable, a series of settings should be specified that describe what the variable should look like and how to create it.

The first setting is a list of dimension names that define the shape of the variable. Supported dimensions are:

  • pixel : selected pixels

  • corner : number of footprint corners (usually 4)

  • layer : number of layers in atmospheric profile (layers in kernel)

  • layeri : number of layer interfaces in atmospheric profile (layer+1)

  • retr : number of layers in retrieval product (1 for columns)

  • retr0 : same as retr, used for matrix dimensions (retr,retr0) to avoid repeated dimensions, which the CF checker complains about

  • track_scan : original scan index in 2D track

  • track_pixel : original ground pixel in 2D track

For a 1D variable with values per pixel the dimension setting is therefore:

<rcbase>.output.var.longitude.dims    :   pixel
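For 1D variables, the pixel dimension enumerates the pixels selected in the 2D track. A minimal sketch of this mapping, assuming NumPy and a made-up selection mask of track shape (time,scanline,ground_pixel); it also shows how the track_scan and track_pixel indices could be derived:

```python
import numpy as np

# hypothetical selection mask with track shape (time, scanline, ground_pixel)
selected = np.zeros((1, 3, 4), dtype=bool)
selected[0, 1, 2] = True
selected[0, 2, 0] = True

# indices of the selected pixels, in row-major order of the track
_, track_scan, track_pixel = np.nonzero(selected)

# a field with track shape becomes a 1D 'pixel' variable
longitude = np.arange(12, dtype=np.float32).reshape(1, 3, 4)
pixel_longitude = longitude[selected]                 # shape (pixel,)

# fields with an extra trailing dimension keep it: (pixel, corner)
longitude_bounds = np.zeros((1, 3, 4, 4), dtype=np.float32)
pixel_longitude_bounds = longitude_bounds[selected]   # shape (pixel, corner)
```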

For most variables it is sufficient to provide only the name of the original variable from which the data should be read:

<rcbase>.output.var.longitude.from    :   Geolocation Fields/Longitude

Some variables require special processing. For these variables a key ‘special’ is used to select the appropriate conversion. The following specials are currently implemented:

  • longitude_bounds : longitude bounds per pixel; if a pixel covers the date line, its corner values are reset to values outside [-180,+180] to ensure that the footprint remains convex with the center inside:

    <rcbase>.output.var.longitude_bounds.from            :   PRODUCT/SUPPORT_DATA/GEOLOCATIONS/longitude_bounds
    <rcbase>.output.var.longitude_bounds.special         :   longitude_bounds
    
  • track_longitude : longitudes at centers of original 2D track; requires a .from setting

  • track_latitude : latitudes at centers of original 2D track; requires a .from setting

  • track_longitude_bounds : longitude bounds of original 2D track; if a pixel covers the date line, its corner values are reset to values outside [-180,+180] to ensure that the footprint remains convex with the center inside

  • track_latitude_bounds : latitude bounds of original 2D track; requires a .from setting

  • ground_pixel : index of ground pixel in original 2D track; requires a .from setting

  • sum : create a variable as the sum over layers; requires a .from setting

  • time : create time stamps per pixel from a reference time and a time delta; requires settings:

    <rcbase>.output.var.time.tref      :   PRODUCT/time
    <rcbase>.output.var.time.dt        :   PRODUCT/delta_time
    
  • hybounds_to_pressure : form pressure from a hybrid sigma-pressure coordinate, where the available hybrid coefficients have shape ('layer',2); requires settings:

    <rcbase>.output.var.pressure.sp       :   PRODUCT/SUPPORT_DATA/INPUT_DATA/surface_pressure
    <rcbase>.output.var.pressure.hyab     :   PRODUCT/tm5_constant_a
    <rcbase>.output.var.pressure.hybb     :   PRODUCT/tm5_constant_b
    <rcbase>.output.var.pressure.units    :   Pa
    
  • hymid_to_pressure : form pressure from a hybrid sigma-pressure coordinate, where the available hybrid coefficients are valid for the middle of the layers and therefore have shape ('layer'); requires settings:

    <rcbase>.output.var.pressure.sp       :   PRODUCT/SUPPORT_DATA/INPUT_DATA/surface_pressure
    <rcbase>.output.var.pressure.hyam     :   PRODUCT/SUPPORT_DATA/INPUT_DATA/hyam
    <rcbase>.output.var.pressure.hybm     :   PRODUCT/SUPPORT_DATA/INPUT_DATA/hybm
    <rcbase>.output.var.pressure.units    :   Pa
    
  • pmid_to_pressure : interpolate and extrapolate from pressure at mid of layers to boundaries; requires settings:

    <rcbase>.output.var.pressure.pmid     :   PRODUCT/SUPPORT_DATA/INPUT_DATA/pressure
    <rcbase>.output.var.pressure.units    :   Pa
    
  • sp_pmid_to_pressure : interpolate from pressure at mid of layers to boundaries, use surface pressure and zero for first and last; requires settings:

    <rcbase>.output.var.pressure.sp       :   PRODUCT/SUPPORT_DATA/INPUT_DATA/surface_pressure
    <rcbase>.output.var.pressure.pmid     :   PRODUCT/SUPPORT_DATA/INPUT_DATA/pressure
    <rcbase>.output.var.pressure.units    :   Pa
    
  • sp_dp_to_pressure : form pressure from surface pressure and a constant pressure step per layer (top at zero); requires settings:

    <rcbase>.output.var.pressure.sp       :   PRODUCT/SUPPORT_DATA/INPUT_DATA/surface_pressure
    <rcbase>.output.var.pressure.dp       :   PRODUCT/SUPPORT_DATA/INPUT_DATA/pressure_interval
    <rcbase>.output.var.pressure.units    :   Pa
    
  • pbottom_to_pressure : form pressure from pressures at bottom of layer, use zero for top; requires settings:

    <rcbase>.output.var.pressure.pbottom  :   PRODUCT/SUPPORT_DATA/INPUT_DATA/bottom_pressure
    <rcbase>.output.var.pressure.units    :   Pa
    
  • salt_altmid_to_altitude : interpolate from altitude at mid of layers to boundaries, use surface altitude for first, and extrapolation for last; requires settings:

    <rcbase>.output.var.altitude.salt       :   PRODUCT/SUPPORT_DATA/INPUT_DATA/surface_altitude
    <rcbase>.output.var.altitude.altmid     :   PRODUCT/SUPPORT_DATA/INPUT_DATA/altitude
    <rcbase>.output.var.altitude.units      :   m
    
  • kernel_trop : create averaging kernel for tropospheric column as the original kernel times the ratio between total-air-mass-factor and tropospheric-air-mass-factor:

    \[K_{\mathrm{tropo}} = K\,\frac{AMF_{\mathrm{total}}}{AMF_{\mathrm{tropo}}}\]

    Required settings:

    <rcbase>.output.var.kernel.avk        :   PRODUCT/averaging_kernel
    <rcbase>.output.var.kernel.amf        :   PRODUCT/air_mass_factor_total
    <rcbase>.output.var.kernel.amft       :   PRODUCT/air_mass_factor_troposphere
    <rcbase>.output.var.kernel.troplayer  :   PRODUCT/tm5_tropopause_layer_index
    

    The variable specified by troplayer is used to reset the kernel values in the layers above the tropopause (stratosphere) to zero.

  • kernel_by_dh : convert a kernel in m (per layer) to a unitless kernel using division by a constant layer height:

    \[A ~=~ A_m~/~dh\]

    The layer thickness dh is taken from a layer coordinate that defines the middle of each layer on a regular grid; it is checked that the coordinate has a regular spacing.

    Required settings:

    <rcbase>.output.var.kernel.avk        :   PRODUCT/averaging_kernel
    <rcbase>.output.var.kernel.layer      :   PRODUCT/layer
    
  • square : create a variable as the square of the input; requires a .from setting.
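Several of the specials above reduce to short array formulas. A minimal sketch of three of them (hybounds_to_pressure, kernel_trop and kernel_by_dh), assuming NumPy arrays with a leading pixel dimension, hybrid bounds ordered (bottom, top) per layer with layers from the surface upward, and a 0-based tropopause layer index above which the kernel is zeroed; these conventions are assumptions for illustration, not taken from the module:

```python
import numpy as np


def hybounds_to_pressure(sp, hyab, hybb):
    """Interface pressures (pixel, layer+1) from hybrid bounds with shape (layer, 2).

    Assumes bound 0 is the bottom and bound 1 the top of each layer,
    with layers ordered from the surface upward and sharing their interfaces.
    """
    a = np.concatenate([hyab[:, 0], hyab[-1:, 1]])   # (layer+1,)
    b = np.concatenate([hybb[:, 0], hybb[-1:, 1]])
    return a[None, :] + b[None, :] * sp[:, None]     # p = a + b * sp


def kernel_trop(avk, amf, amft, troplayer):
    """Tropospheric kernel K * AMF_total / AMF_tropo, zero above the tropopause.

    avk has shape (pixel, layer); amf, amft and troplayer have shape (pixel,).
    Assumes troplayer is the 0-based index of the highest tropospheric layer.
    """
    kernel = avk * (amf / amft)[:, None]
    layer = np.arange(avk.shape[1])
    return np.where(layer[None, :] <= troplayer[:, None], kernel, 0.0)


def kernel_by_dh(avk_m, layer_mid):
    """Unitless kernel A = A_m / dh for a regularly spaced layer coordinate."""
    dh = np.diff(layer_mid)
    if not np.allclose(dh, dh[0]):
        raise ValueError("layer coordinate does not have a regular spacing")
    return avk_m / abs(dh[0])
```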

Optionally swap layers, for example to have profiles in upward direction (surface to top) rather than downward (top to bottom):

<rcbase>.output.var.longitude.swap_layers  :  True

Optionally provide a target data type; by default the original data type from the input file is used:

<rcbase>.output.var.longitude.dtype   :  f4

Optionally provide target units too. In the (unlikely) case that the original variable has no units attribute, this setting is required to define the (assumed) units. If the provided units are different from the units in the original variable, the data is converted to the provided units:

<rcbase>.output.var.longitude.units   :   degrees_east

Optionally provide a dictionary with attributes to be added. If an attribute value is None, the attribute is removed if it is present in the input; this is sometimes needed to avoid complaints from the CF compliance checker:

!~ skip some attributes, cf-checker complains ...
<rcbase>.output.var.qa_value.attrs         :   { 'valid_min' : None, 'valid_max' : None }
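A minimal sketch of how such an attrs setting could be interpreted, assuming the value is a Python literal dictionary that is parsed with ast.literal_eval; the function name apply_attrs is hypothetical and the module's actual parsing may differ:

```python
import ast


def apply_attrs(attrs, setting):
    """Apply an 'attrs' setting: None removes an attribute, other values set it."""
    extra = ast.literal_eval(setting)   # safely parse "{ 'valid_min' : None, ... }"
    result = dict(attrs)
    for key, value in extra.items():
        if value is None:
            result.pop(key, None)       # remove if present, ignore otherwise
        else:
            result[key] = value
    return result
```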

ResetLongitudeBounds(data)

Reset longitudes in a (scan,pixel,corner) array such that they form a continuous series without jumps when passing the date line.
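A minimal sketch of such a reset, assuming NumPy and bounds in [-180,180] along the last (corner) axis. It uses the first corner of each pixel as reference and shifts corners that are more than 180 degrees away by 360 degrees; referencing the pixel center instead would be an equally plausible choice, and the module may do either:

```python
import numpy as np


def reset_longitude_bounds(lon_bounds):
    """Make corner longitudes of each pixel continuous across the date line.

    lon_bounds: array (..., corner) in degrees; result may exceed [-180, 180].
    """
    ref = lon_bounds[..., :1]          # first corner of each pixel as reference
    diff = lon_bounds - ref
    out = lon_bounds.copy()
    out[diff > 180.0] -= 360.0         # corner far east of reference: move west
    out[diff < -180.0] += 360.0        # corner far west of reference: move east
    return out
```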

class cso_s5p.CSO_S5p_Convert(rcfile, rcbase='', env={}, indent='')

Bases: UtopyaRc

Convert raw S5p observations to CSO format. During conversion, a variable and pixel selection can be applied.

This version will also download source files if they are not available yet, and (optionally) remove them after conversion. This is useful when storage capacity is limited and the entire archive of source files cannot be permanently mirrored.

Arguments:

  • rcfile, rcbase, env : settings file, prefix for keys, and environment dictionary

A time range is read to select the files to be converted:

<rcbase>.timerange.start        :  2018-06-01 00:00
<rcbase>.timerange.end          :  2018-06-03 23:59

The input files are searched in a table created by an inquire class, for example CSO_DataSpace_Inquire or CSO_PAL_Inquire. These have scanned the archives to examine which processings and versions are available, and stored the result in a csv file. Specify the name of the csv file; this might contain templates for a date that is taken from another key:

! listing of available source files,
! created by for example 'inquire' job:
<rcbase>.inquire.file                   :  /data/Copernicus/S5p/Copernicus_S5P_NO2_%Y-%m-%d.csv
!! date used in filename, leave empty for today:
!<rcbase>.inquire.filedate               :  2022-01-28

Specify the directory where the input files are to be searched, or where to download them to if not present yet. A flag is used to decide whether downloaded files should be removed immediately after conversion:

! target dir for downloads:
<rcbase>.input.dir                      :  /data/Copernicus/S5P/%{processing}/NO2/%Y/%m

! remove downloaded input files after convert?
<rcbase>.downloads.cleanup              :  False

The input files keep the same name as used in the DataSpace archive, for example:

/data/Copernicus/S5P/OFFL/NO2/2018/07/S5P_OFFL_L2__NO2____20180701T005930_20180701T024100_03698_01_010002_20180707T022838.nc
                                                            start_time      end_time      orbit

Provide a ‘;’-separated list of selection expressions to decide whether a particular orbit file should be processed. If more than one file is available for a particular orbit (from “OFFL” and “RPRO” processing), the file with the first match will be used. The expressions should include templates ‘%{header}’ for the column values. Example to select files from collection ‘03’, preferably from processing ‘RPRO’ but otherwise from ‘OFFL’:

<rcbase>.selection           :  (%{collection} == '03') and (%{processing} == 'RPRO') ; \
                                (%{collection} == '03') and (%{processing} == 'OFFL')
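The "first match wins" rule can be sketched as follows, assuming listing rows as dictionaries with hypothetical column names filename, orbit, processing and collection (for example read with csv.DictReader). The expressions are evaluated with eval() after substituting the templates, which is acceptable for a sketch but should not be fed untrusted input:

```python
def select_files(rows, selection):
    """Pick one file per orbit; earlier expressions take precedence (sketch)."""
    expressions = [e.strip() for e in selection.split(";") if e.strip()]
    chosen = {}
    for expr in expressions:
        for row in rows:
            if row["orbit"] in chosen:
                continue                     # orbit already covered by a better match
            test = expr
            for key, value in row.items():   # replace '%{column}' by a quoted value
                test = test.replace("%{" + key + "}", repr(value))
            if eval(test, {"__builtins__": {}}):
                chosen[row["orbit"]] = row["filename"]
    return chosen


selection = ("(%{collection} == '03') and (%{processing} == 'RPRO') ; "
             "(%{collection} == '03') and (%{processing} == 'OFFL')")
```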

Sometimes a file cannot be converted, for example because it is corrupted or could not be downloaded at all. Specify an (optional) blacklist:

! skip some input files:
<rcbase>.blacklist         :  S5P_PAL__L2__NO2____20190806T022006_20190806T040136_09388_01_020301_20211110T020511.nc

By default the conversion will stop if a file is corrupted or could not be downloaded. To let the conversion first process all files, an option is available to create a so-called error file instead. An error file has the same name as the target file of the conversion, but with extension .err instead of .nc. It contains a text that describes what is wrong with the source file, for example that it cannot be opened. Enable the creation of error files with the following flag:

! enable error files for missing or corrupted input files?
<rcbase>.create-error-files    :  True

If this flag is enabled, and an error file is found instead of the target file, the conversion will simply skip this target and will not try to download the source file again.

If an input file should be converted, it is read into an S5p_File object. The SelectPixels method is called to select pixels based on criteria defined in the settings; see its documentation for how to configure the pixel selection. This method also returns a history line that describes the selection, which will be added as an attribute to the output file.

If no pixels are selected, for example because an orbit is outside the target domain, an informative message is written to a so-called message file. A message file has the same name as the target file of the conversion, but with extension .msg instead of .nc. If this file is present, the conversion will simply skip this target and will for example not try to download the source file again.
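The rules for existing output, error and message files can be summarized in a small decision function. A minimal sketch using pathlib; the function name target_status, its return labels and the order of the checks are assumptions for illustration:

```python
from pathlib import Path


def target_status(target, renew=False, error_files=True):
    """Decide what to do with one conversion target (sketch)."""
    target = Path(target)
    if target.exists() and not renew:
        return "keep"          # output exists and should not be renewed
    if error_files and target.with_suffix(".err").exists():
        return "skip-error"    # earlier attempt failed; do not download again
    if target.with_suffix(".msg").exists():
        return "skip-empty"    # earlier attempt selected no pixels
    return "convert"
```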

The output file is created as a CSO_S5p_File object. Its AddSelection method is called with the input object as argument, and this will copy the selected pixels for the variables specified in the settings. The Write method creates the file.

The name of the output files should be specified with a directory and filename; the latter could include a template for the orbit number:

! output filename:
! - times are taken from mid of selection, rounded to hours
! - replace templates with column values of listing, for example:
!      %{orbit}, %{processing}, ...
<rcbase>.output.filename       :  /Scratch/CSO/S5p/RPRO/NO2/Europe/%Y/%m/S5p_RPRO_NO2_%{orbit}.nc
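A minimal sketch of how such a template could be expanded, following the comments in the example: the time is the middle of the selected pixel times rounded to the nearest hour, and '%{column}' templates are replaced before the time templates are filled with strftime. The function name output_filename and the argument layout are hypothetical:

```python
from datetime import datetime, timedelta


def output_filename(template, t_start, t_end, columns):
    """Expand an output filename template (sketch)."""
    tmid = t_start + (t_end - t_start) / 2
    # round to the nearest hour
    tmid = (tmid + timedelta(minutes=30)).replace(minute=0, second=0, microsecond=0)
    fname = template
    for key, value in columns.items():   # listing columns first, e.g. %{orbit}
        fname = fname.replace("%{" + key + "}", str(value))
    return tmid.strftime(fname)          # then time templates such as %Y/%m


name = output_filename(
    "/Scratch/CSO/S5p/RPRO/NO2/Europe/%Y/%m/S5p_RPRO_NO2_%{orbit}.nc",
    datetime(2018, 7, 1, 0, 59, 30), datetime(2018, 7, 1, 2, 41, 0),
    {"orbit": "03698"},
)
```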

Optionally define a creation mode for the (parent) directories:

! directory creation mode:
<rcbase>.dmode                         :  0o775

A flag is read to decide if existing output files should be renewed or kept:

<rcbase>.renew                  :  True

Global attributes for the target file should be specified with:

! global attributes:
<rcbase>.output.attrs               :  format Conventions author institution email
!
<rcbase>.output.attr.format         :  1.0
<rcbase>.output.attr.Conventions    :  CF-1.7
<rcbase>.output.attr.author         :  T. Emplate
<rcbase>.output.attr.institution    :  CSO
<rcbase>.output.attr.email          :  t.emplate@cso.org

To reduce file size, by default all variables are written to file as short integers (2 bytes) accompanied by add_offset and scale_factor attributes. A flag is available to disable packing. In addition, zlib compression is enabled. The default compression level is 1 (out of 9); set the following flag to a higher level for stronger compression (at the expense of computation time), or to 0 to disable compression:

! pack floats as shorts:
<rcbase>.output.packed              :  True
! zlib compression level (default 1, 0 for no compression):
<rcbase>.output.complevel           :  1
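Packed values follow the CF convention unpacked = packed × scale_factor + add_offset. A minimal sketch of how the two attributes could be chosen for one variable, assuming NumPy, NaN for missing values, and -32768 reserved as fill value; how the module actually picks them is not described here:

```python
import numpy as np


def pack_int16(values):
    """Pack floats into int16 with scale_factor and add_offset (sketch)."""
    fill = np.int16(-32768)                  # reserved for missing values (assumption)
    vmin, vmax = np.nanmin(values), np.nanmax(values)
    nsteps = 2**16 - 2                       # usable packed range -32767 .. 32767
    scale_factor = (vmax - vmin) / nsteps if vmax > vmin else 1.0
    add_offset = (vmax + vmin) / 2.0
    packed = np.round((values - add_offset) / scale_factor)
    packed = np.where(np.isnan(values), fill, packed).astype(np.int16)
    return packed, scale_factor, add_offset, fill
```

When writing with xarray, such values would typically be passed through the encoding argument of to_netcdf (keys dtype, scale_factor, add_offset, _FillValue, zlib and complevel).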

For testing, an (optional) whitelist could be provided with output filenames (no path); if defined, only the listed files will be created:

! testing: create only these files:
<rcbase>.whitelist         :  S5p_RPRO_NO2_123456.nc

class cso_s5p.CSO_S5p_Download(rcfile, rcbase='', env={}, indent='')

Bases: UtopyaRc

Download S5p observations. This class is a simplified version of the CSO_S5p_Convert class, but without the conversion part.

Arguments:

  • rcfile, rcbase, env : settings file, prefix for keys, and environment dictionary

A time range is read to select the files to be downloaded:

<rcbase>.timerange.start        :  2018-06-01 00:00
<rcbase>.timerange.end          :  2018-06-03 23:59

The input files are searched in a table created by an inquire class, for example CSO_DataSpace_Inquire or CSO_PAL_Inquire. These have scanned the archives to examine which processings and versions are available, and stored the result in a csv file. Specify the name of the csv file; this might contain templates for a date that is taken from another key:

! listing of available source files,
! created by for example 'inquire' job:
<rcbase>.inquire.file                   :  /data/Copernicus/S5p/Copernicus_S5P_NO2_%Y-%m-%d.csv
!! date used in filename, leave empty for today:
!<rcbase>.inquire.filedate               :  2022-01-28

Specify the directory where the input files are to be searched, or where to download them to if not present yet:

! target dir for downloads:
<rcbase>.input.dir                      :  /data/Copernicus/S5P/%{processing}/NO2/%Y/%m

The input files keep the same name as used in the DataSpace archive, for example:

/data/Copernicus/S5P/OFFL/NO2/2018/07/S5P_OFFL_L2__NO2____20180701T005930_20180701T024100_03698_01_010002_20180707T022838.nc
                                                            start_time      end_time      orbit

Provide a ‘;’-separated list of selection criteria to decide if a particular orbit file should be processed. If more than one file is available for a particular orbit (from “OFFL” and “RPRO” processing), the file with the first match will be used. The expressions should include templates ‘%{header}’ for the column values. Example to select files from collection ‘03’, preferably from processing ‘RPRO’ but otherwise from ‘OFFL’:

<rcbase>.selection           :  (%{collection} == '03') and (%{processing} == 'RPRO') ; \
                                (%{collection} == '03') and (%{processing} == 'OFFL')

Sometimes a file cannot be converted, for example because it is corrupted or could not be downloaded at all. Specify an (optional) blacklist:

! skip some input files:
<rcbase>.blacklist         :  S5P_PAL__L2__NO2____20190806T022006_20190806T040136_09388_01_020301_20211110T020511.nc

class cso_s5p.CSO_S5p_Listing(rcfile, rcbase='', env={}, indent='')

Bases: UtopyaRc

Create listing file for converted orbit files.

A listing file contains the names of the converted orbit files, and the time range of pixels in the file:

filename                     ;start_time                   ;end_time                     ;orbit
2018/06/S5p_RPRO_NO2_03272.nc;2018-06-01T01:32:46.673000000;2018-06-01T01:36:12.948000000;03272
2018/06/S5p_RPRO_NO2_03273.nc;2018-06-01T03:12:53.649000000;2018-06-01T03:17:43.082000000;03273
2018/06/S5p_RPRO_NO2_03274.nc;2018-06-01T04:52:43.586000000;2018-06-01T04:59:12.377000000;03274
:

This file will be used by the observation operator to select orbits with pixels valid for a desired time range.
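A minimal sketch of such a selection, assuming the ';'-separated layout shown above with padded header names and nanosecond time stamps (truncated to microseconds so that datetime.fromisoformat accepts them); the function name orbits_in_range is hypothetical:

```python
import csv
import io
from datetime import datetime


def orbits_in_range(listing_text, t1, t2):
    """Return filenames whose pixel time range overlaps [t1, t2] (sketch)."""
    reader = csv.DictReader(io.StringIO(listing_text), delimiter=";")
    selected = []
    for row in reader:
        row = {k.strip(): v.strip() for k, v in row.items()}   # remove padding
        start = datetime.fromisoformat(row["start_time"][:26])  # keep microseconds
        end = datetime.fromisoformat(row["end_time"][:26])
        if start <= t2 and end >= t1:                           # intervals overlap
            selected.append(row["filename"])
    return selected
```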

In the settings, define the name of the file to be created:

! csv file that will hold records per file with:
! - timerange of pixels in file
! - orbit number
<rcbase>.file        :  /Scratch/CSO/S5p/RPRO/NO2/Europe/listing.csv

Optionally define a creation mode for the (parent) directories:

! directory creation mode:
<rcbase>.dmode                         :  0o775

An existing listing file is not replaced, unless the following flag is set:

! renew table?
<rcbase>.renew           :  True

Orbit files are searched within a timerange:

<rcbase>.timerange.start        :  2018-06-01 00:00
<rcbase>.timerange.end          :  2018-06-03 23:59

Specify filename filters to search for orbit files; the patterns are relative to the base directory of the listing file, and might contain templates for the time values. Multiple patterns could be defined; if more than one file is found for a certain orbit number, the first match is used. This can be exploited to create a listing that combines reprocessed data with near-real-time data:

<rcbase>.patterns            :  RPRO/NO2/Europe/%Y/%m/S5p_*.nc \
                                OFFL/NO2/Europe/%Y/%m/S5p_*.nc
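The precedence rule for multiple patterns can be sketched as follows, assuming that the orbit number is the last group of digits before '.nc' in the converted filenames; the function name find_orbit_files is hypothetical:

```python
import glob
import os
import re


def find_orbit_files(basedir, patterns, t):
    """Collect one file per orbit; earlier patterns take precedence (sketch)."""
    found = {}
    for pattern in patterns:
        # fill time templates, then search relative to the listing directory
        for path in sorted(glob.glob(os.path.join(basedir, t.strftime(pattern)))):
            match = re.search(r"_(\d+)\.nc$", path)
            if match is None:
                continue
            # keep the first file found for this orbit
            found.setdefault(match.group(1), os.path.relpath(path, basedir))
    return found
```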

Usually the time range is read from the file, but in case the file does not have a time coordinate, the following flag can be used to force the use of the time that matches the filename:

! adhoc: use the time for which file is valid as timerange;
! this is used for the synthetic S4 data that have no time record ...
<rcbase>.use_t              :  True

The orbit column in the listing is extra, and is read from global attributes; the list of extra columns is defined with:

! extra columns to be added, read from global attributes:
<rcbase>.xcolumns           :  orbit

class cso_s5p.CSO_S5p_Download_Listing(rcfile, rcbase='', env={}, indent='')

Bases: UtopyaRc

Create listing file for files downloaded from S5P data portals.

A listing file contains the names of the downloaded orbit files, the time range of pixels in the file, and other information extracted from the filenames:

filename                                                                                               ;mission;processing;product_id;start_time         ;end_time           ;orbit;collection;processor_version;processing_time
RPRO/CH4/2018/04/S5P_RPRO_L2__CH4____20180430T001851_20180430T020219_02818_01_010301_20190513T141133.nc;S5P    ;RPRO      ;L2__CH4___;2018-04-30T00:18:51;2018-04-30T02:02:19;02818;01        ;010301           ;2019-05-13T14:11:33
RPRO/CH4/2018/04/S5P_RPRO_L2__CH4____20180430T020021_20180430T034349_02819_01_010301_20190513T135953.nc;S5P    ;RPRO      ;L2__CH4___;2018-04-30T02:00:21;2018-04-30T03:43:49;02819;01        ;010301           ;2019-05-13T13:59:53
:
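The extra columns can be recovered from the filename itself. A minimal sketch, assuming the field layout seen in the example filenames (including 'PAL_' as a 4-character processing code); the regular expression is inferred from these examples, not from an official filename specification:

```python
import re
from datetime import datetime

# field layout inferred from the example filenames
S5P_NAME = re.compile(
    r"(?P<mission>S5P)_(?P<processing>[A-Z_]{4})_(?P<product_id>.{10})_"
    r"(?P<start_time>\d{8}T\d{6})_(?P<end_time>\d{8}T\d{6})_"
    r"(?P<orbit>\d{5})_(?P<collection>\d{2})_(?P<processor_version>\d{6})_"
    r"(?P<processing_time>\d{8}T\d{6})\.nc$"
)


def parse_s5p_name(path):
    """Return the listing columns encoded in an S5P filename, or None (sketch)."""
    match = S5P_NAME.search(path)
    if match is None:
        return None
    rec = match.groupdict()
    for key in ("start_time", "end_time", "processing_time"):
        rec[key] = datetime.strptime(rec[key], "%Y%m%dT%H%M%S").isoformat()
    rec["processing"] = rec["processing"].rstrip("_")   # 'PAL_' -> 'PAL'
    return rec
```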

This file could be used to scan for available versions and how they were produced.

In the settings, define the name of the file to be created:

! csv file that will hold records per file with:
! - timerange of pixels in file
! - orbit number
! time templates are replaced with today's date
<rcbase>.file        :  /Scratch/Copernicus/S5p/listing-CH4__%Y-%m-%d.csv

An existing listing file is not replaced, unless the following flag is set:

! renew table?
<rcbase>.renew           :  True

Orbit files are searched within a timerange:

<rcbase>.timerange.start        :  2018-06-01 00:00
<rcbase>.timerange.end          :  2018-06-03 23:59

Specify filename filters to search for orbit files; the patterns are relative to the base directory of the listing file, and might contain templates for the time values. Multiple patterns could be defined; if more than one file is found for a certain orbit number, the first match is used. This can be exploited to create a listing that combines reprocessed data with near-real-time data:

<rcbase>.patterns            :  RPRO/CH4/%Y/%m/S5p_*.nc \
                                OFFL/CH4/%Y/%m/S5p_*.nc