cso_s5p module

The cso_s5p module provides classes to convert S5p data into a CSO format.

Class hierarchy

The classes are defined according to the following hierarchy:

  • object

      • S5p_File

  • cso_file.CSO_File

      • CSO_S5p_File

  • utopya_rc.UtopyaRc

      • CSO_S5p_Convert

      • CSO_S5p_Listing

Classes

class cso_s5p.S5p_File(filename)

Bases: object

Base class to access data in S5p file.

Example of variables:

PRODUCT/
    dimensions:
        scanline = 3245 ;
        ground_pixel = 450 ;
        corner = 4 ;
        time = 1 ;
        layer = 34 ;
    variables:
        int time(time) ;
        float latitude (time, scanline, ground_pixel) ;
              units = "degrees_north" ;
        float longitude(time, scanline, ground_pixel) ;
              units = "degrees_east" ;
        float nitrogendioxide_tropospheric_column(time, scanline, ground_pixel) ;
              units = "mol m-2" ;
              multiplication_factor_to_convert_to_molecules_percm2 = 6.02214e+19f ;
        float nitrogendioxide_tropospheric_column_precision(time, scanline, ground_pixel) ;
              units = "mol m-2" ;
              multiplication_factor_to_convert_to_molecules_percm2 = 6.022141e+19f ;
        float averaging_kernel(time, scanline, ground_pixel, layer) ;
              units = "1" ;
        ubyte qa_value(time, scanline, ground_pixel) ;
              units = "1" ;
              long_name = "data quality value" ;
              comment = "A continuous quality descriptor, varying between 0 (no data) and 1 (full quality data). Recommend to ignore data with qa_value < 0.5" ;
    SUPPORT_DATA/
        GEOLOCATIONS/
            variables:
                float solar_zenith_angle(time, scanline, ground_pixel) ;
                      units = "degree" ;
                float latitude_bounds   (time, scanline, ground_pixel, corner) ;
                      units = "degrees_north" ;
                float longitude_bounds  (time, scanline, ground_pixel, corner) ;
                      units = "degrees_east" ;
        INPUT_DATA/
            variables:
                float surface_altitude(time, scanline, ground_pixel) ;
                      units = "m" ;
                float surface_pressure(time, scanline, ground_pixel) ;
                      units = "Pa" ;
                float surface_albedo(time, scanline, ground_pixel) ;
                      units = "1" ;
                float cloud_fraction_crb(time, scanline, ground_pixel) ;
                      units = "1" ;

Arguments:

  • filename : name of input file

GetData(path, units=None)

Extract data array from input file and perform some ad hoc fixes:

  • If no units attribute is present yet, insert one with the value taken from the units argument.

  • Convert data to target units if necessary.

  • Add a long_name attribute if not present yet.

Arguments:

  • path : variable path in input

Optional arguments:

  • units : target units

Return values:

  • da : xarray.DataArray object
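The fixes listed above can be sketched on a plain list of values and an attribute dictionary. This is an illustration only, not the module's code: the function `apply_fixes`, the `CONVERSIONS` table, and the target unit label "molecules cm-2" are hypothetical. The single conversion factor is the one quoted in the `multiplication_factor_to_convert_to_molecules_percm2` attribute of the example listing, and raising an error when no units are available at all is an assumption.

```python
# Hypothetical conversion table: (source units, target units) -> factor.
CONVERSIONS = {
    ("mol m-2", "molecules cm-2"): 6.02214e19,
}

def apply_fixes(values, attrs, name, units=None):
    """Return (values, attrs) with units and long_name filled in."""
    attrs = dict(attrs)  # do not modify the caller's dictionary
    if "units" not in attrs:
        # no units in the file: take them from the argument
        if units is None:
            raise ValueError(f"no units for '{name}' in file or argument")
        attrs["units"] = units
    elif units is not None and attrs["units"] != units:
        # units differ: convert the data to the target units
        factor = CONVERSIONS[(attrs["units"], units)]
        values = [v * factor for v in values]
        attrs["units"] = units
    # add a long_name only if the file did not provide one
    attrs.setdefault("long_name", name)
    return values, attrs

values, attrs = apply_fixes(
    [1.0e-5, 2.0e-5],
    {"units": "mol m-2"},
    "nitrogendioxide_tropospheric_column",
    units="molecules cm-2",
)
```

The real method returns an xarray.DataArray; the list and dictionary used here only stand in for its values and attrs.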

SelectPixels(rcf, rckey, indent='')

Apply filters specified in rcfile.

Arguments:

  • rcf : RcFile object with settings

  • rckey : basename for rcfile keys, e.g. “s5p” for the example below

Return values:

  • selected : boolean array with shape of track (time,scan,pixel) which is True if a pixel passed all checks;

  • history : list of character strings describing the selections.

Example configuration:

! Specify a list of filter names.
! For each name, specify the variable whose values are used for testing,
! the type of test, and possibly some thresholds or other settings for this type.
! The units of the thresholds should match with the units in the variables,
! the expected units have to be defined too.
! The examples below show possible types and their settings.

! filters:
s5p.filters                        :  lons lats albedo valid

! select range of values:
s5p.filter.lons.var                :  PRODUCT/longitude
s5p.filter.lons.type               :  minmax
s5p.filter.lons.minmax             :  -15.0 35.0
s5p.filter.lons.units              :  degrees_east

! select range of values:
s5p.filter.lats.var                :  PRODUCT/latitude
s5p.filter.lats.type               :  minmax
s5p.filter.lats.minmax             :  35.0 70.0
s5p.filter.lats.units              :  degrees_north

! select below a maximum:
s5p.filter.albedo.var              :  PRODUCT/SUPPORT_DATA/INPUT_DATA/surface_albedo
s5p.filter.albedo.type             :  max
s5p.filter.albedo.max              :  0.3
s5p.filter.albedo.units            :  1

! select only values with data (no "_FillValue"):
s5p.filter.valid.var               :  PRODUCT/nitrogendioxide_tropospheric_column
s5p.filter.valid.type              :  valid
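
How the three filter types of the example configuration combine into one selection can be sketched as follows. This is a simplified illustration on flat lists, not the module's implementation: the function `apply_filter` is hypothetical, `None` stands in for a `_FillValue` entry, the thresholds are assumed to be inclusive, and the wording of the history lines is made up.

```python
def apply_filter(values, ftype, **settings):
    """Return one boolean per pixel for a single filter."""
    if ftype == "minmax":
        vmin, vmax = settings["minmax"]
        return [v is not None and vmin <= v <= vmax for v in values]
    if ftype == "max":
        return [v is not None and v <= settings["max"] for v in values]
    if ftype == "valid":
        return [v is not None for v in values]
    raise ValueError(f"unsupported filter type '{ftype}'")

# Four pixels; None stands in for a "_FillValue" entry.
lons = [-20.0, 5.0, 10.0, 20.0]
albedo = [0.1, 0.5, 0.2, 0.1]
vcd = [1.0e-5, 2.0e-5, None, 3.0e-5]

filters = [
    ("lons", lons, "minmax", {"minmax": (-15.0, 35.0)}),
    ("albedo", albedo, "max", {"max": 0.3}),
    ("valid", vcd, "valid", {}),
]

selected = [True] * len(lons)
history = []
for name, values, ftype, settings in filters:
    mask = apply_filter(values, ftype, **settings)
    # a pixel is kept only if it passes every filter
    selected = [s and m for s, m in zip(selected, mask)]
    history.append(f"filter '{name}' ({ftype}) passed {sum(mask)} of {len(mask)} pixels")
```

Only the last pixel passes all three tests here. The real method works on arrays with the shape of the track.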

class cso_s5p.CSO_S5p_File(filename=None)

Bases: cso_file.CSO_File

Storage for CSO satellite file filled from S5p data.

AddSelection(sfile, selected, rcf, rcbase, indent='')

Add selected S5p pixels to satellite extract file.

Arguments:

The first setting that is read is a list with variable names to be created in the target file:

<rcbase>.output.vars    :  longitude corner_longitudes \
                           latitude corner_latitudes \
                           vcd  ...

For each variable, a series of settings should be specified that describe what the variable should look like and how to create it.

The first setting is a list of dimension names that define the shape of the variable. Supported dimensions are:

  • pixel : selected pixels

  • corner : number of footprint bounds (probably 4)

  • layer : number of layers in atmospheric profile (layers in kernel)

  • layeri : number of layer interfaces in atmospheric profile (layer+1)

  • retr : number of layers in retrieval product (1 for columns)

  • retr0 : same as retr, used for matrix dimensions (retr,retr0) to avoid repeated dimensions, which the cf-checker complains about

  • track_scan : original scan index in 2D track

  • track_pixel : original ground pixel in 2D track

For a 1D variable with values per pixel the dimension setting is therefore:

<rcbase>.output.var.longitude.dims    :   pixel

For most variables it is sufficient to provide only the name of the original variable from which the data should be read:

<rcbase>.output.var.longitude.from    :   PRODUCT/longitude

For some variables a special processing needs to be done. For these variables a key ‘special’ is used which will enable the correct conversion. The following specials are currently implemented:

  • track_longitude : longitudes at centers of original 2D track; requires a .from setting

  • track_latitude : latitudes at centers of original 2D track; requires a .from setting

  • track_longitude_bounds : longitude bounds at centers of original 2D track; requires a .from setting

  • track_latitude_bounds : latitude bounds at centers of original 2D track; requires a .from setting

  • ground_pixel : index of ground pixel in original 2D track; requires a .from setting

  • sum : create a variable as the sum over layers; requires a .from setting

  • square : create a variable as the square of the input; requires a .from setting

  • time : create time stamps per pixel from a reference time and a time delta; requires settings:

    <rcbase>.output.var.time.tref      :   PRODUCT/time
    <rcbase>.output.var.time.dt        :   PRODUCT/delta_time
    
  • hybounds_to_pressure : form pressure from hybrid sigma pressure coordinate, where the available hybrid coefficients have shape ('layer',2); requires settings:

    <rcbase>.output.var.pressure.sp       :   PRODUCT/SUPPORT_DATA/INPUT_DATA/surface_pressure
    <rcbase>.output.var.pressure.hyab     :   PRODUCT/tm5_constant_a
    <rcbase>.output.var.pressure.hybb     :   PRODUCT/tm5_constant_b
    <rcbase>.output.var.pressure.units    :   Pa
    
  • hymid_to_pressure : form pressure from hybrid sigma pressure coordinate, where the available hybrid coefficients are valid for the middle of the layers and therefore have shape ('layer'); requires settings:

    <rcbase>.output.var.pressure.sp       :   PRODUCT/SUPPORT_DATA/INPUT_DATA/surface_pressure
    <rcbase>.output.var.pressure.hyam     :   PRODUCT/SUPPORT_DATA/INPUT_DATA/hyam
    <rcbase>.output.var.pressure.hybm     :   PRODUCT/SUPPORT_DATA/INPUT_DATA/hybm
    <rcbase>.output.var.pressure.units    :   Pa
    
  • sp_dp_to_pressure : form pressure from surface pressure and a constant pressure step per layer (top at zero); requires settings:

    <rcbase>.output.var.pressure.sp       :   PRODUCT/SUPPORT_DATA/INPUT_DATA/surface_pressure
    <rcbase>.output.var.pressure.dp       :   PRODUCT/SUPPORT_DATA/INPUT_DATA/pressure_interval
    <rcbase>.output.var.pressure.units    :   Pa
    
  • kernel_trop : create averaging kernel for tropospheric column as the original kernel times the ratio between total-air-mass-factor and tropospheric-air-mass-factor:

    \[K_{tropo} ~=~ K~ AMF_{total}~/~{AMF}_{tropo}\]

    Required settings:

    <rcbase>.output.var.kernel.avk        :   PRODUCT/averaging_kernel
    <rcbase>.output.var.kernel.amf        :   PRODUCT/air_mass_factor_total
    <rcbase>.output.var.kernel.amft       :   PRODUCT/air_mass_factor_troposphere
    <rcbase>.output.var.kernel.troplayer  :   PRODUCT/tm5_tropopause_layer_index
    

    The variable specified by troplayer is used to reset the higher layers of the kernel (stratosphere) to zero.

  • kernel_by_dh : convert a kernel in m (per layer) to a unitless kernel using division by a constant layer height:

    \[A ~=~ A_m~/~dh\]

    The layer thickness dh is taken from a layers coordinate that defines the middle of a layer for a regular grid; it is checked that the coordinate defines a regular spacing.

    Required settings:

    <rcbase>.output.var.kernel.avk        :   PRODUCT/averaging_kernel
    <rcbase>.output.var.kernel.layer      :   PRODUCT/layer
    
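
Three of the specials described above can be sketched for a single pixel in plain Python. This is a minimal illustration under stated assumptions, not the module's code: the function names simply mirror the special names, the time offsets are assumed to be in milliseconds, the hybrid coefficients follow the common form p = a + b * sp, and `troplayer` is assumed to be the zero-based index of the highest tropospheric layer with layers ordered from the surface upward.

```python
from datetime import datetime, timedelta

def pixel_times(tref, dt_ms):
    """Time stamps from a reference time and offsets (assumed milliseconds)."""
    return [tref + timedelta(milliseconds=ms) for ms in dt_ms]

def hybounds_to_pressure(sp, hyab, hybb):
    """Pressure (Pa) at both bounds of each layer: p = a + b * sp.

    hyab and hybb have shape (layer, 2); sp is the surface pressure in Pa.
    """
    return [[a + b * sp for a, b in zip(arow, brow)]
            for arow, brow in zip(hyab, hybb)]

def kernel_trop(avk, amf, amft, troplayer):
    """Scale the kernel by amf/amft and zero the layers above troplayer."""
    return [k * amf / amft if i <= troplayer else 0.0
            for i, k in enumerate(avk)]

times = pixel_times(datetime(2018, 6, 1), [0, 1500])
pressure = hybounds_to_pressure(
    100000.0,
    hyab=[[0.0, 2000.0], [2000.0, 0.0]],
    hybb=[[1.0, 0.5], [0.5, 0.0]],
)
kernel = kernel_trop([0.5, 1.0, 1.5], amf=2.0, amft=1.0, troplayer=1)
```

With these made-up numbers the two layers span 100000 to 52000 Pa and 52000 to 0 Pa, and the third kernel layer is set to zero.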

Optionally provide target units too. In the (unlikely) case that the original variable has no units attribute, this setting is required to define the (assumed) units. If the provided units are different from the units in the original variable, the data is converted to the provided units:

<rcbase>.output.var.longitude.units   :   degrees_east

Optionally provide a dictionary with attributes to be added. If the attribute value is None, the attribute is removed if present from the input; this is sometimes needed if the CF compliance checker complains:

!~ skip some attributes, cf-checker complains ...
<rcbase>.output.var.qa_value.attrs         :   { 'valid_min' : None, 'valid_max' : None }
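
The effect of such an attrs setting can be sketched as follows. The helper `update_attrs` is hypothetical, and parsing the setting with `ast.literal_eval` is an assumption made for this illustration; the module may evaluate the value differently.

```python
import ast

def update_attrs(attrs, setting):
    """Apply an attrs setting: None removes a key, other values set it."""
    result = dict(attrs)
    for key, value in ast.literal_eval(setting.strip()).items():
        if value is None:
            # remove the attribute if the input file provided it
            result.pop(key, None)
        else:
            result[key] = value
    return result

attrs = {"units": "1", "valid_min": 0, "valid_max": 100}
attrs = update_attrs(attrs, "{ 'valid_min' : None, 'valid_max' : None }")
```

After the call only the units attribute is left.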

class cso_s5p.CSO_S5p_Convert(rcfile, rcbase='', env={}, indent='')

Bases: utopya_rc.UtopyaRc

Convert raw S5p observations to CSO format. During conversion, a variable and pixel selection can be applied.

This version will also download source files if they are not available yet, and (optionally) remove them after conversion. This is useful when storage capacity is limited and the entire archive of source files cannot be permanently mirrored.

Arguments:

  • rcfile, rcbase, env : settings file, prefix for keys, and environment dictionary

A time range is read to select the files to be converted:

<rcbase>.timerange.start        :  2018-06-01 00:00
<rcbase>.timerange.end          :  2018-06-03 23:59

The input files are searched in a table created by an inquire class, for example CSO_SciHub_Inquire or CSO_PAL_Inquire. These have scanned the archives to examine which processings and versions are available, and stored the result in a csv file. Specify the name of the csv file; this might contain templates for a date that is taken from another key:

! listing of available source files,
! created by 'inquire-s5phub' job:
<rcbase>.inquire.file                   :  /data/Copernicus/S5p/Copernicus_S5P_NO2_%Y-%m-%d.csv
!! date used in filename, leave empty for today:
!<rcbase>.inquire.filedate               :  2022-01-28

Specify the directory where the input files are to be searched, or where to download them to if not present yet. A flag is used to decide whether downloaded files should be removed immediately after conversion:

! target dir for downloads:
<rcbase>.input.dir                      :  /data/Copernicus/S5P/%{processing}/NO2/%Y/%m

! remove downloaded input files after convert?
<rcbase>.downloads.cleanup              :  False
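
The templates in these settings could be expanded as in the sketch below. The function `expand` is hypothetical. It fills the `%{...}` keys before the time templates, because `strftime` does not know the `%{...}` form and its handling of unknown directives differs between platforms.

```python
import re
from datetime import datetime

def expand(template, t, **record):
    """Fill '%{key}' templates from record, then time templates from t."""
    # replace the '%{...}' keys first; strftime does not know them
    text = re.sub(r"%\{(\w+)\}", lambda m: str(record[m.group(1)]), template)
    return t.strftime(text)

input_dir = expand(
    "/data/Copernicus/S5P/%{processing}/NO2/%Y/%m",
    datetime(2018, 7, 1),
    processing="OFFL",
)
listing = expand(
    "/data/Copernicus/S5p/Copernicus_S5P_NO2_%Y-%m-%d.csv",
    datetime(2022, 1, 28),
)
```

Here `input_dir` becomes /data/Copernicus/S5P/OFFL/NO2/2018/07 and `listing` ends in Copernicus_S5P_NO2_2022-01-28.csv.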

The input files keep the same name as used in the SciHub archive, for example:

/data/Copernicus/S5P/OFFL/NO2/2018/07/S5P_OFFL_L2__NO2____20180701T005930_20180701T024100_03698_01_010002_20180707T022838.nc
                                                          start_time      end_time        orbit
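
The fields marked above can be extracted from a file name with a regular expression. This is a sketch that assumes all input files follow the layout of the example; the pattern and the helper `parse_name` are not part of the module.

```python
import re
from datetime import datetime

# Assumed layout, based on the example file name:
#   S5P_<processing>_L2__<product>_<start>_<end>_<orbit>_...
PATTERN = re.compile(
    r"S5P_(?P<processing>\w{4})_L2__(?P<product>\w+?)_+"
    r"(?P<start>\d{8}T\d{6})_(?P<end>\d{8}T\d{6})_(?P<orbit>\d{5})_"
)

def parse_name(filename):
    """Return processing, time range, and orbit number of an input file."""
    m = PATTERN.search(filename)
    if m is None:
        raise ValueError(f"unexpected filename: {filename}")
    fmt = "%Y%m%dT%H%M%S"
    return {
        # short processing names are padded with '_', e.g. 'PAL_'
        "processing": m.group("processing").rstrip("_"),
        "start_time": datetime.strptime(m.group("start"), fmt),
        "end_time": datetime.strptime(m.group("end"), fmt),
        "orbit": m.group("orbit"),
    }

info = parse_name(
    "/data/Copernicus/S5P/OFFL/NO2/2018/07/"
    "S5P_OFFL_L2__NO2____20180701T005930_20180701T024100_03698_01_010002_20180707T022838.nc"
)
```

For the example file this gives orbit 03698 with a time range from 00:59:30 to 02:41:00 on 1 July 2018.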

Specify which processings should be converted; other processings listed in the inquired table are ignored:

! list of processings:
<rcbase>.processings                    :  OFFL RPRO

Similarly, specify a list of processor versions:

! list of processor versions, empty for all:
<rcbase>.processor_versions             :  020301

Sometimes a file cannot be converted, for example because it is corrupted or could not be downloaded at all. Specify an (optional) blacklist:

! skip some input files:
<rcbase>.blacklist         :  S5P_PAL__L2__NO2____20190806T022006_20190806T040136_09388_01_020301_20211110T020511.nc
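
Taken together, the processings, processor versions, and blacklist settings act as a filter on the records of the inquire table. A minimal sketch, in which the function `select_records` and the record keys `filename`, `processing`, and `processor_version` are hypothetical (the actual column names of the csv file are not given here):

```python
def select_records(records, processings, versions, blacklist):
    """Keep records that match the requested processings and versions."""
    selected = []
    for rec in records:
        if rec["processing"] not in processings:
            continue
        # an empty version list means: accept all versions
        if versions and rec["processor_version"] not in versions:
            continue
        if rec["filename"] in blacklist:
            continue
        selected.append(rec)
    return selected

records = [
    {"filename": "a.nc", "processing": "OFFL", "processor_version": "020301"},
    {"filename": "b.nc", "processing": "NRTI", "processor_version": "020301"},
    {"filename": "c.nc", "processing": "RPRO", "processor_version": "010202"},
    {"filename": "d.nc", "processing": "RPRO", "processor_version": "020301"},
]
kept = select_records(records, ["OFFL", "RPRO"], ["020301"], ["d.nc"])
```

With these made-up records only a.nc remains: b.nc has another processing, c.nc another version, and d.nc is blacklisted.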

If an input file should be converted, it is read into an S5p_File object. The SelectPixels method is called to select pixels based on criteria defined in the settings; see its documentation for how to configure the pixel selection. This method also returns history lines that describe the selection, which will be added as an attribute to the output file.

The output file is created as a CSO_S5p_File object. Its AddSelection method is called with the input object as argument, and this will copy the selected pixels for variables specified in the settings. The Write method creates the file.

The name of the output files should be specified with a directory and filename; the latter can include a template for the orbit number:

! output filename:
! - times are taken from mid of selection, rounded to hours
! - replace templates with column values of listing, for example:
!      %{orbit}, %{processing}, ...
<rcbase>.output.filename       :  /Scratch/CSO/S5p/RPRO/NO2/Europe/%Y/%m/S5p_RPRO_NO2_%{orbit}.nc

A flag is read to decide if existing output files should be renewed or kept:

<rcbase>.renew                  :  True              

Global attributes for the target file should be specified with:

! global attributes:
<rcbase>.output.attrs               :  format Conventions author institution email
!
<rcbase>.output.attr.format         :  1.0
<rcbase>.output.attr.Conventions    :  CF-1.7
<rcbase>.output.attr.author         :  T. Emplate
<rcbase>.output.attr.institution    :  CSO
<rcbase>.output.attr.email          :  t.emplate@cso.org
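
The two-level layout of these settings, a list of names followed by one key per name, can be read as in the sketch below. A plain dictionary stands in for the parsed settings file, and the helper `read_global_attrs` is hypothetical.

```python
def read_global_attrs(settings, rcbase):
    """Collect the global attributes named in '<rcbase>.output.attrs'."""
    names = settings[f"{rcbase}.output.attrs"].split()
    return {name: settings[f"{rcbase}.output.attr.{name}"] for name in names}

# A plain dictionary stands in for the parsed settings file.
settings = {
    "s5p.output.attrs": "format Conventions author",
    "s5p.output.attr.format": "1.0",
    "s5p.output.attr.Conventions": "CF-1.7",
    "s5p.output.attr.author": "T. Emplate",
}
attrs = read_global_attrs(settings, "s5p")
```

Only attributes named in the list are written, so a value can stay in the settings file without being used.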

For testing an (optional) whitelist could be provided with output filenames (no path); if defined, only the listed files will be created:

! testing: create only these files:
<rcbase>.whitelist         :  S5p_RPRO_NO2_123456.nc

class cso_s5p.CSO_S5p_Listing(rcfile, rcbase='', env={}, indent='')

Bases: utopya_rc.UtopyaRc

Create listing file for converted orbit files.

A listing file contains the names of the converted orbit files, and the time range of pixels in the file:

filename                     ;start_time                   ;end_time                     ;orbit
2018/06/S5p_RPRO_NO2_03272.nc;2018-06-01T01:32:46.673000000;2018-06-01T01:36:12.948000000;03272
2018/06/S5p_RPRO_NO2_03273.nc;2018-06-01T03:12:53.649000000;2018-06-01T03:17:43.082000000;03273
2018/06/S5p_RPRO_NO2_03274.nc;2018-06-01T04:52:43.586000000;2018-06-01T04:59:12.377000000;03274
:

This file will be used by the observation operator to select orbits with pixels valid for a desired time range.
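
Such a selection could look like the following sketch, which reads the example listing with the standard csv module. The function names are hypothetical and the overlap test is an assumption about what "valid for a desired time range" means. The time stamps are cut to microseconds because `strptime` accepts at most six fractional digits.

```python
import csv
import io
from datetime import datetime

LISTING = """\
filename                     ;start_time                   ;end_time                     ;orbit
2018/06/S5p_RPRO_NO2_03272.nc;2018-06-01T01:32:46.673000000;2018-06-01T01:36:12.948000000;03272
2018/06/S5p_RPRO_NO2_03273.nc;2018-06-01T03:12:53.649000000;2018-06-01T03:17:43.082000000;03273
2018/06/S5p_RPRO_NO2_03274.nc;2018-06-01T04:52:43.586000000;2018-06-01T04:59:12.377000000;03274
"""

def parse_time(text):
    # keep microseconds only; the listing stores nanoseconds
    return datetime.strptime(text.strip()[:26], "%Y-%m-%dT%H:%M:%S.%f")

def select_orbits(listing, t1, t2):
    """Return the files whose pixel time range overlaps [t1, t2]."""
    reader = csv.reader(io.StringIO(listing), delimiter=";")
    header = [name.strip() for name in next(reader)]
    selected = []
    for row in reader:
        rec = dict(zip(header, (field.strip() for field in row)))
        if parse_time(rec["start_time"]) <= t2 and parse_time(rec["end_time"]) >= t1:
            selected.append(rec["filename"])
    return selected

files = select_orbits(LISTING, datetime(2018, 6, 1, 3, 0), datetime(2018, 6, 1, 4, 0))
```

For the window from 03:00 to 04:00 only orbit 03273 is selected.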

In the settings, define the name of the file to be created:

! csv file that will hold records per file with:
! - timerange of pixels in file
! - orbit number
<rcbase>.file        :  /Scratch/CSO/S5p/RPRO/NO2/Europe/listing.csv

An existing listing file is not replaced, unless the following flag is set:

! renew table?
<rcbase>.renew           :  True

Orbit files are searched within a timerange:

<rcbase>.timerange.start        :  2018-06-01 00:00
<rcbase>.timerange.end          :  2018-06-03 23:59

Specify filename filters to search for orbit files; the patterns are relative to the basedir of the listing file, and might contain templates for the time values. Multiple patterns can be defined; if more than one file is found for a certain orbit number, the first match is used. This can be exploited to create a listing that combines reprocessed data with near-real-time data:

<rcbase>.patterns            :  RPRO/NO2/Europe/%Y/%m/S5p_*.nc \
                                OFFL/NO2/Europe/%Y/%m/S5p_*.nc
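
The "first match is used" rule can be sketched with the standard fnmatch module. In this illustration the time templates are assumed to be expanded already, and the orbit number is taken from the end of the file name purely as a stand-in; the function `first_match_per_orbit` is hypothetical.

```python
import fnmatch
import re

def first_match_per_orbit(filenames, patterns):
    """Map each orbit number to the file matched by the earliest pattern."""
    chosen = {}
    for pattern in patterns:
        for name in sorted(fnmatch.filter(filenames, pattern)):
            # assumption: the orbit number is the last field of the file name
            match = re.search(r"_(\d+)\.nc$", name)
            if match:
                # keep an earlier match if this orbit was already found
                chosen.setdefault(match.group(1), name)
    return chosen

filenames = [
    "OFFL/NO2/Europe/2018/06/S5p_OFFL_NO2_03272.nc",
    "OFFL/NO2/Europe/2018/06/S5p_OFFL_NO2_03273.nc",
    "RPRO/NO2/Europe/2018/06/S5p_RPRO_NO2_03272.nc",
]
patterns = [
    "RPRO/NO2/Europe/2018/06/S5p_*.nc",
    "OFFL/NO2/Europe/2018/06/S5p_*.nc",
]
chosen = first_match_per_orbit(filenames, patterns)
```

Orbit 03272 is taken from the RPRO file because that pattern comes first, while orbit 03273 falls back to the OFFL file.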

Usually the time range is read from the file, but if the file does not have a time coordinate, the following flag can be used to force the use of the time that matches the filename:

! ad hoc: use the time for which file is valid as timerange;
! this is used for the synthetic S4 data that have no time record ...
<rcbase>.use_t              :  True

The orbit column in the listing is extra, and is read from global attributes; the list of extra columns is defined with:

! extra columns to be added, read from global attributes:
<rcbase>.xcolumns           :  orbit