Sentinel-5p CHOCHO data processing
This chapter describes the tasks performed for processing Sentinel-5p CHOCHO (glyoxal) data.
Product description
The product is described in:
- Lerot, C., Hendrick, F., Van Roozendael, M., Alvarado, L. M. A., Richter, A., De Smedt, I., Theys, N., Vlietinck, J., Yu, H., Van Gent, J., Stavrakou, T., Muller, J.-F., Valks, P., Loyola, D., Irie, H., Kumar, V., Wagner, T., Schreier, S. F., Sinha, V., Wang, T., Wang, P., and Retscher, C.: Glyoxal tropospheric column retrievals from TROPOMI - multi-satellite intercomparison and ground-based validation, Atmos. Meas. Tech., 14, 7775-7807, https://doi.org/10.5194/amt-14-7775-2021, 2021.
Data is provided via two archives:
The GLYRETRO project website. A (draft) Product User Manual is also available from here:
- Christophe Lerot: GLYoxal Retrievals from TROPOMI (GLYRETRO) - Product User Manual.
This data needs to be downloaded manually, see below.
The latest data is provided via the S5P-PAL Data Portal (see also the cso_pal module for a more detailed description of this portal). This portal provides an updated PUM:
- S5P Glyoxal Product User Manual
The CSO tools can inquire and download data from this archive.
Features:
The retrieval product is a column density (mol/m2), which will be treated by CSO as a profile with \(n_r=1\) layers:
\[\mathbf{y}_r\]
The simulation of a retrieval product from a model state does not require an a priori profile, and is denoted with:
\[\mathbf{y}_s ~=~ \mathbf{A}\ \mathbf{H}\mathbf{x}\]
where:
- \(\mathbf{y}_s\) is the simulated retrieval (mol/m2) defined on \(n_r=1\) layers;
- \(\mathbf{A}\) is the tropospheric averaging kernel matrix with shape \((n_r,n_a)\), with \(n_a\) the number of a priori layers;
- \(\mathbf{x}\) is the atmospheric state, which consists of a 3D array of CHOCHO concentrations;
- \(\mathbf{H}\) extracts a simulated profile from the state using horizontal and vertical interpolation; the result should be defined on the \(n_a\) a priori layers and have the units of the retrieval product (mol/m2).
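The shapes in these definitions can be checked with a small numerical sketch. It evaluates \(\mathbf{y}_s = \mathbf{A}\,\mathbf{H}\mathbf{x}\) for a single pixel, assuming that \(\mathbf{H}\mathbf{x}\) has already been computed as partial columns (mol/m2) on \(n_a=4\) a priori layers. The kernel and profile values are made up for illustration, and simulate_retrieval is a hypothetical helper, not part of CSO.

```python
import numpy as np

def simulate_retrieval(kernel, hx):
    """Return y_s = A (H x) for a single pixel (hypothetical helper).

    kernel : (n_r, n_a) averaging kernel matrix A
    hx     : (n_a,) simulated profile H x, in retrieval units (mol/m2)
    """
    kernel = np.asarray(kernel, dtype=float)
    hx = np.asarray(hx, dtype=float)
    if kernel.shape[1] != hx.shape[0]:
        raise ValueError("kernel has %d columns but profile has %d layers"
                         % (kernel.shape[1], hx.shape[0]))
    return kernel @ hx  # shape (n_r,)

# illustrative values for n_a = 4 a priori layers (not taken from a real file)
A = np.array([[0.6, 0.9, 1.1, 1.2]])               # shape (1, 4), so n_r = 1
hx = np.array([2.0e-6, 1.5e-6, 0.5e-6, 0.1e-6])    # partial columns in mol/m2
ys = simulate_retrieval(A, hx)
print(ys.shape, ys)
```

Because \(n_r=1\), the simulated retrieval is a vector with a single element, consistent with treating the column as a one-layer profile.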
In case \(\mathbf{x}\) is the true atmospheric state, the retrieval error is quantified by the retrieval error covariance \(\mathbf{R}\) (for this scalar product, a variance):
\[\mathbf{y}_r ~-~ \mathbf{A}\ \mathbf{H}\mathbf{x}^{true} ~\sim~ \mathcal{N}\left(\mathbf{o},\mathbf{R}\right)\]
The retrieval status and quality are indicated by the qa_value. Section 5.4 of the PUM recommends a minimum value of 0.5, which excludes cloudy scenes and other problematic retrievals.
Some orbit files were found to contain pixels with an undefined PRODUCT/delta_time value while the qa_value was 1. For these pixels, the pixel coordinates (PRODUCT/longitude, etc.) are undefined as well.
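A minimal NumPy sketch of this screening, assuming that undefined values have already been converted to NaN (for example by replacing the netCDF _FillValue) and that all arrays have been broadcast to the pixel shape (in S5p files delta_time is usually stored per scan line rather than per pixel). The function name usable_pixels is hypothetical; in CSO the actual selection is configured through the pixel filters of the conversion task.

```python
import numpy as np

def usable_pixels(qa_value, delta_time, longitude, latitude, qa_min=0.5):
    """Mask of pixels passing the recommended quality threshold and having
    a defined observation time and position (hypothetical helper)."""
    qa = np.asarray(qa_value, dtype=float)
    mask = qa >= qa_min  # PUM recommendation: qa_value >= 0.5
    for values in (delta_time, longitude, latitude):
        # drop pixels with undefined time or position, even if qa_value == 1
        mask &= np.isfinite(np.asarray(values, dtype=float))
    return mask

qa = [1.0, 1.0, 0.3, 0.8]
dt = [100.0, np.nan, 200.0, 300.0]
lon = [5.0, np.nan, 6.0, 7.0]
lat = [50.0, np.nan, 51.0, 52.0]
print(usable_pixels(qa, dt, lon, lat))
```

The second pixel mimics the problem described above: a qa_value of 1, but no defined time or coordinates.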
Download GlyRetro data
As of August 2022, the best way to obtain Level-2 S5p/CHOCHO data was to:
- browse to the GLYRETRO project website;
- under the Data dropdown, select Request data for download;
- register for download to obtain a user name and password;
- download data in chunks of days; this was tested with 5 days per request and at most 6 requests at a time.
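When planning the requests, the period of interest can be split into 5-day chunks with a few lines of Python. This is only an illustration (day_chunks is a hypothetical helper); the requests themselves are made through the project website.

```python
from datetime import date, timedelta

def day_chunks(first, last, ndays=5):
    """Yield inclusive (start, end) date pairs covering first..last in blocks of ndays."""
    start = first
    while start <= last:
        end = min(start + timedelta(days=ndays - 1), last)
        yield start, end
        start = end + timedelta(days=1)

for t1, t2 in day_chunks(date(2018, 1, 1), date(2018, 1, 12)):
    print(t1, t2)
```

With at most 6 requests at a time, the resulting chunks can then be submitted in batches of 6.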
A download returns a zip file with multiple orbit files. The orbit files are organized in sub-directories per day:
2018/01/01/S5P_RPRO_L2__CHOCHO___20180101T004140_20180101T022310_01130_01_010000_20210201.nc
:
Note that the official S5p filename formatting rules require exactly 10 characters for the product identifier; the current product uses a 12-character key L2__CHOCHO__ instead.
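Because of the longer key, splitting the filename on underscores does not give the fields directly. The sketch below uses a regular expression whose field layout is inferred from the example names in this chapter; it accepts both the standard 10-character key and the 12-character variant. It is an illustration only, not the parser used by CSO, and parse_s5p_filename is a hypothetical name.

```python
import re

# The product key is matched non-greedily and followed by one or more
# underscores, so 'L2__CHOCHO' is returned whether 1 or 3 underscores follow.
S5P_NAME = re.compile(
    r"S5P_(?P<processing>[A-Z_]{4})_(?P<product_id>L2__\w+?)_+"
    r"(?P<start_time>\d{8}T\d{6})_(?P<end_time>\d{8}T\d{6})_"
    r"(?P<orbit>\d{5})_(?P<collection>\d{2})_(?P<processor_version>\d{6})_"
    r"(?P<processing_time>\d{8}(?:T\d{6})?)\.nc$"
)

def parse_s5p_filename(path):
    """Return the filename fields as a dict of strings (hypothetical helper)."""
    name = path.rsplit("/", 1)[-1]  # drop the yyyy/mm/dd sub-directories
    match = S5P_NAME.match(name)
    if match is None:
        raise ValueError("not an S5p orbit file name: %s" % name)
    return match.groupdict()

info = parse_s5p_filename(
    "2018/01/01/S5P_RPRO_L2__CHOCHO___20180101T004140_20180101T022310_01130_01_010000_20210201.nc")
print(info["orbit"], info["collection"], info["processor_version"])
```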
CSO processing
(See the Tutorial chapter for an introduction to CSO scripts and configuration.)
An example configuration of the CSO processing of the S5p/CHOCHO data is available via the following settings:
-
Top-level settings that configure the job-tree with various sub-tasks. This is a generic file that can be used for multiple S5p products; edit it to select the CHOCHO processing.
config/Copernicus/cso-user-settings.rc
User-specific settings such as the work directory.
config/Copernicus/cso-s5p-chocho.rc
Specific settings for CHOCHO product.
Start the job-tree using:
./bin/cso config/Copernicus/cso.rc
Selected sub-steps in the processing are described below.
Inquire archives
The data files might have been created in different processing streams and/or with different processor versions. It is therefore useful to first inquire the archives (the downloaded files, or the PAL archive) to see which processor versions are available for a certain period.
The processing stream is identified by a 4-character key:
- OFFL: offline processing, available within weeks after observation;
- RPRO: re-processing of all previously made observations;
- PAL_: processed data stored on the Product Algorithm Laboratory portal.
The portals provide data files created with the same retrieval algorithm, but the most recent data (latest processor version) might be available on only one of them. It is therefore necessary to first inquire both archives to see which data is available where, and what the version numbers are.
The CSO_DataSpace_Inquire class is available to inquire the Copernicus DataSpace. The settings used by this class allow selection on, for example, time range and intersection area.
The result is a csv file with columns for keywords such as orbit number and processor version, as well as the filename of the data and the url that should be used to actually download the data:
filename ;start_time ;end_time ;mission;processing;product_id;orbit;collection;processor_version;processing_time
2020/01/01/S5P_OFFL_L2__CHOCHO___20200101T005246_20200101T023416_11487_01_010000_20210128.nc;2020-01-01 00:52:46;2020-01-01 02:34:16;S5P ;OFFL ;L2__CHOCHO;11487;01 ;010000 ;2021-01-28
2020/01/01/S5P_OFFL_L2__CHOCHO___20200101T023416_20200101T041546_11488_01_010000_20210128.nc;2020-01-01 02:34:16;2020-01-01 04:15:46;S5P ;OFFL ;L2__CHOCHO;11488;01 ;010000 ;2021-01-28
:
See the section on File name convention in the Product User Manual for the meaning of all parts of the filename.
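As an illustration of how such a table could be used, the sketch below parses the ';'-separated listing (stripping the padding around names and values) and keeps, for each orbit, the file with the highest processor version. The helper names are hypothetical, and preferring the highest processor_version (then the latest processing_time) is an assumption made for this example; in CSO the choice is made through the selection expressions of the conversion task.

```python
import csv
import io

def read_listing(text):
    """Parse a ';'-separated listing, stripping the padding around names and values."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    return [{key.strip(): value.strip() for key, value in row.items()} for row in reader]

def latest_per_orbit(rows):
    """For every orbit, keep the row with the highest processor version;
    ties are broken by the most recent processing time."""
    best = {}
    for row in rows:
        rank = (row["processor_version"], row["processing_time"])
        current = best.get(row["orbit"])
        if current is None or rank > (current["processor_version"], current["processing_time"]):
            best[row["orbit"]] = row
    return best

# the two records from the example listing
LISTING = (
    "filename ;start_time ;end_time ;mission;processing;product_id;orbit;collection;processor_version;processing_time\n"
    "2020/01/01/S5P_OFFL_L2__CHOCHO___20200101T005246_20200101T023416_11487_01_010000_20210128.nc;"
    "2020-01-01 00:52:46;2020-01-01 02:34:16;S5P ;OFFL ;L2__CHOCHO;11487;01 ;010000 ;2021-01-28\n"
    "2020/01/01/S5P_OFFL_L2__CHOCHO___20200101T023416_20200101T041546_11488_01_010000_20210128.nc;"
    "2020-01-01 02:34:16;2020-01-01 04:15:46;S5P ;OFFL ;L2__CHOCHO;11488;01 ;010000 ;2021-01-28\n"
)

latest = latest_per_orbit(read_listing(LISTING))
for orbit, row in sorted(latest.items()):
    print(orbit, row["processing"], row["processor_version"])
```

Comparing the zero-padded processor_version strings lexicographically gives the same order as comparing them numerically.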
A similar class, CSO_S5p_Download_Listing, is available to list the content of the downloaded GlyRetro files.
This class also produces a table file.
To visualize what is available from the portals, the CSO_Inquire_Plot class can be used to create an overview figure:

The jobtree configuration to inquire the portals and create the overview figure could look like:
! single step:
cso.s5p.chocho.inquire.class : utopya.UtopyaJobStep
! inquire downloads and archive, plot overview:
cso.s5p.chocho.inquire.tasks : table-glyretro table-pal plot
!~ inquire files downloaded from GlyRetro:
cso.s5p.chocho.inquire.table-glyretro.class : cso.CSO_S5p_Download_Listing
cso.s5p.chocho.inquire.table-glyretro.args : '${PWD}/config/Copernicus/cso-s5p-chocho.rc', \
rcbase='cso.s5p.chocho.inquire-table-glyretro'
!~ inquire files available on PAL:
cso.s5p.chocho.inquire.table-pal.class : cso.CSO_PAL_Inquire
cso.s5p.chocho.inquire.table-pal.args : '${PWD}/config/Copernicus/cso-s5p-chocho.rc', \
rcbase='cso.s5p.chocho.inquire-table-pal'
!~ create plot of available versions:
cso.s5p.chocho.inquire.plot.class : cso.CSO_Inquire_Plot
cso.s5p.chocho.inquire.plot.args : '${PWD}/config/Copernicus/cso-s5p-chocho.rc', \
rcbase='cso.s5p.chocho.inquire-plot'
Conversion to CSO format
The ‘cso.s5p.chocho.convert’ task creates netCDF files with selected pixels, for example only pixels within some region, or cloud-free pixels.
The selection criteria are defined in the settings, and are added to the ‘history’ attribute of the created files as a reminder.
The work is done by the CSO_S5p_Convert class, which is initialized using the settings file:
! task initialization:
cso.s5p.chocho.convert.class : cso.CSO_S5p_Convert
cso.s5p.chocho.convert.args : '${PWD}/config/Copernicus/cso-s5p-chocho.rc', rcbase='cso.s5p.chocho.convert'
See the class documentation for the general configuration; some specific choices are described below. The example is based on the S5p CHOCHO file whose header is available in:
Orbit file selection
Based on the inquiry, the download and conversion can be limited to files created with the most recent processor versions.
For the S5p files, a useful property is also the collection number: a 2-digit number that identifies a collection of files (or actually processor versions) that together form a continuous series. The collection number is extracted from the filename and stored as a column of the listing file.
The following setting is used to select specific files from the archive based on the properties stored in the listing file:
! Provide a ';' separated list of expressions to decide if a particular orbit file should be processed.
! If more than one file is available for a particular orbit (from "OFFL" and "RPRO" processing),
! the file with the first match will be used.
! The expressions should include templates '%{header}' for the column values.
! Example to select files from collection '03', preferably from processing 'RPRO' but otherwise from 'OFFL':
! (%{collection} == '03') and (%{processing} == 'RPRO') ; \
! (%{collection} == '03') and (%{processing} == 'OFFL')
!
cso.s5p.chocho.convert.selection : (%{collection} == '03') and (%{processing} == 'RPRO') ; \
(%{collection} == '03') and (%{processing} == 'OFFL')
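The rule "the file with the first match will be used" can be made concrete with a small sketch: the expressions are tried in order, the %{column} templates are replaced by quoted column values, and the result is evaluated as a Python expression. This is an assumption about how such expressions could be evaluated, not the CSO implementation; choose_file is a hypothetical name, the orbit number in the example is made up, and eval is only acceptable here because the settings file is trusted.

```python
import re

def choose_file(candidates, selection):
    """Pick the file to process for one orbit (hypothetical helper).

    candidates : listing rows (dicts) available for this orbit
    selection  : ';'-separated expressions with '%{column}' templates

    Expressions are tried in order; the first candidate that satisfies the
    earliest matching expression is returned, or None if nothing matches.
    """
    expressions = [expr.strip() for expr in selection.split(";") if expr.strip()]
    for expr in expressions:
        for row in candidates:
            # substitute each %{name} by the quoted column value, e.g. '03'
            code = re.sub(r"%\{(\w+)\}", lambda m: repr(row[m.group(1)]), expr)
            # evaluate without builtins; acceptable only for trusted settings files
            if eval(code, {"__builtins__": {}}, {}):
                return row
    return None

# the rc value with its line continuation joined into one string
selection = ("(%{collection} == '03') and (%{processing} == 'RPRO') ; "
             "(%{collection} == '03') and (%{processing} == 'OFFL')")
offl = {"orbit": "12345", "collection": "03", "processing": "OFFL"}
rpro = {"orbit": "12345", "collection": "03", "processing": "RPRO"}
print(choose_file([offl, rpro], selection)["processing"])
```

Looping over the expressions in the outer loop is what makes the RPRO file win even when the OFFL file is listed first.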
Pixel selection
The CSO_S5p_Convert class calls the S5p_File.SelectPixels() method to create a pixel selection mask for the input file.
The selection is done using one or more filters.
First provide a list of filter names:
cso.s5p.chocho.convert.filters : lons lats valid quality sza error_ratio ground_pixel
Then, for each filter, provide the input variable to be used for testing, as a path name in the input file.
The next setting is the type of filter to be used (see S5p_File.SelectPixels() for the supported types), followed by the other settings required by that type.
The following is an example of a selection on longitude:
cso.s5p.chocho.convert.filter.lons.var : PRODUCT/longitude
cso.s5p.chocho.convert.filter.lons.type : minmax
cso.s5p.chocho.convert.filter.lons.minmax : -30.0 45.0
cso.s5p.chocho.convert.filter.lons.units : degrees_east
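The sketch below shows how a "minmax" filter and the combination of several filters could work, assuming inclusive bounds and that a pixel is selected only if it passes every filter. It is not the S5p_File.SelectPixels() implementation: it ignores the units setting, and the latitude range is a hypothetical value chosen for the example.

```python
import numpy as np

def minmax_filter(values, vmin, vmax):
    """Pixels with vmin <= value <= vmax; undefined (NaN) values fail the test."""
    values = np.asarray(values, dtype=float)
    with np.errstate(invalid="ignore"):
        return (values >= vmin) & (values <= vmax)

def combine_filters(masks):
    """A pixel is selected only if it passes every filter."""
    selected = np.ones_like(masks[0], dtype=bool)
    for mask in masks:
        selected &= mask
    return selected

lons = np.array([-40.0, -30.0, 10.0, 45.0, 50.0])
lats = np.array([60.0, 35.0, 50.0, 80.0, 40.0])
mask = combine_filters([
    minmax_filter(lons, -30.0, 45.0),  # same range as the 'lons' filter above
    minmax_filter(lats, 30.0, 76.0),   # hypothetical latitude range
])
print(mask)
```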
Variable specification
The target file is created as a CSO_S5p_File object.
Its AddSelection method is called with the input object as argument, and copies the selected pixels for the variables specified in the settings.
The variable specification starts with a list of the variable names to be created in the target file:
cso.s5p.chocho.convert.output.vars : longitude longitude_bounds \
latitude latitude_bounds \
track_longitude track_longitude_bounds \
track_latitude track_latitude_bounds \
time \
qa_value \
pressure kernel amf_trop vmr_apri \
vcd vcd_errvar \
cloud_fraction solar_zenith_angle
For each variable, settings should be specified that describe the shape of the variable and how it should be filled from the input.
See the AddSelection description for all options; some examples are shown here.
The longitude and latitude variables are copied almost directly from the source files; the only change applied is the selection of pixels.
All original attributes are copied, except for the bounds attribute, since that would give warnings from the CF-compliance checker:
cso.s5p.chocho.convert.output.var.longitude.dims : pixel
cso.s5p.chocho.convert.output.var.longitude.from : PRODUCT/longitude
cso.s5p.chocho.convert.output.var.longitude.attrs : { 'bounds' : None }
cso.s5p.chocho.convert.output.var.latitude.dims : pixel
cso.s5p.chocho.convert.output.var.latitude.from : PRODUCT/latitude
cso.s5p.chocho.convert.output.var.latitude.attrs : { 'bounds' : None }
The pixel boundaries are necessary to know the exact footprint of a pixel, which is used, for example, when averaging over a grid or when simulating the retrieval from a model.
These are available in the input files, but without a units attribute, as the units are implied by those of the pixel center coordinates; the conversion therefore requires that the units are defined explicitly.
For the longitude_bounds, special processing is needed for pixels crossing the dateline, as the original data simply use longitudes modulo 360 degrees:
! corner longitudes; no units in file:
cso.s5p.chocho.convert.output.var.longitude_bounds.dims : pixel corner
cso.s5p.chocho.convert.output.var.longitude_bounds.from : PRODUCT/SUPPORT_DATA/GEOLOCATIONS/longitude_bounds
cso.s5p.chocho.convert.output.var.longitude_bounds.units : degrees_east
! ensure that near dateline the corners form a convex region around center
! (with some points outside [-180,+180] if necessary)
cso.s5p.chocho.convert.output.var.longitude_bounds.special : longitude_bounds
! corner latitudes, no units in file:
cso.s5p.chocho.convert.output.var.latitude_bounds.dims : pixel corner
cso.s5p.chocho.convert.output.var.latitude_bounds.from : PRODUCT/SUPPORT_DATA/GEOLOCATIONS/latitude_bounds
cso.s5p.chocho.convert.output.var.latitude_bounds.units : degrees_north
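One way to obtain the behavior described in the comment of the longitude_bounds setting (corners forming a region around the center, with some values outside [-180,+180] if necessary) is to shift every corner by a multiple of 360 degrees so that it lies within 180 degrees of its pixel center. This is a sketch of that idea under that assumption, not the code behind the 'longitude_bounds' special processing; the corner values in the example are made up.

```python
import numpy as np

def unwrap_corner_longitudes(center, corners):
    """Shift corner longitudes by multiples of 360 degrees so that each corner
    lies within 180 degrees of its pixel center.

    center  : (npix,) pixel center longitudes
    corners : (npix, ncorner) corner longitudes
    """
    center = np.asarray(center, dtype=float)[:, np.newaxis]
    corners = np.asarray(corners, dtype=float)
    # offset to the center wrapped into [-180, 180), then added back
    return center + (corners - center + 180.0) % 360.0 - 180.0

lon = np.array([179.5, 10.0])
lon_bnds = np.array([[179.0, -179.8, -179.9, 179.1],   # pixel crossing the dateline
                     [9.5, 10.5, 10.6, 9.4]])          # ordinary pixel, unchanged
print(unwrap_corner_longitudes(lon, lon_bnds))
```

For the first pixel the corners east of the dateline become 180.2 and 180.1 instead of -179.8 and -179.9, so the footprint stays compact around its center.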
The locations of the pixels in the original track are also copied, since these are useful when creating plots. These cannot be copied directly but require special processing:
cso.s5p.chocho.convert.output.var.track_longitude.dims : track_scan track_pixel
cso.s5p.chocho.convert.output.var.track_longitude.special : track_longitude
cso.s5p.chocho.convert.output.var.track_longitude.from : PRODUCT/longitude
cso.s5p.chocho.convert.output.var.track_longitude.attrs : { 'bounds' : None }
cso.s5p.chocho.convert.output.var.track_latitude.dims : track_scan track_pixel
cso.s5p.chocho.convert.output.var.track_latitude.special : track_latitude
cso.s5p.chocho.convert.output.var.track_latitude.from : PRODUCT/latitude
cso.s5p.chocho.convert.output.var.track_latitude.attrs : { 'bounds' : None }
The observation times are constructed from time steps relative to a reference time; this requires special processing too:
cso.s5p.chocho.convert.output.var.time.dims : pixel
cso.s5p.chocho.convert.output.var.time.special : time-delta
cso.s5p.chocho.convert.output.var.time.tref : PRODUCT/time
cso.s5p.chocho.convert.output.var.time.dt : PRODUCT/delta_time
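The arithmetic behind the 'time-delta' processing can be sketched as "reference time plus offset". The units are an assumption here: S5p Level-2 files typically store PRODUCT/time in seconds since 2010-01-01 and delta_time in milliseconds relative to that reference, but the units attributes of the actual files should be checked. Undefined offsets (see the remark on undefined delta_time values earlier in this chapter) become NaT. The function pixel_times is hypothetical, and delta_time, often stored per scan line, would still need to be broadcast to the pixels.

```python
import numpy as np

# Assumed reference epoch of PRODUCT/time; check the 'units' attribute in the file.
EPOCH = np.datetime64("2010-01-01T00:00:00", "ms")

def pixel_times(time_s, delta_time_ms):
    """Absolute observation times from a reference time (seconds since EPOCH)
    and offsets (milliseconds); undefined (NaN) offsets become NaT."""
    offsets = np.asarray(delta_time_ms, dtype=float)
    times = np.full(offsets.shape, np.datetime64("NaT"), dtype="datetime64[ms]")
    defined = np.isfinite(offsets)
    tref = EPOCH + np.timedelta64(int(round(float(time_s) * 1000.0)), "ms")
    times[defined] = tref + np.rint(offsets[defined]).astype("int64").astype("timedelta64[ms]")
    return times

# 252460800 s after the assumed epoch is 2018-01-01 00:00 UTC
t = pixel_times(252460800, [3600000.0, np.nan])
print(t)
```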
The observed vertical column density can be copied directly.
The target shape is (pixel,retr), where retr is the number of layers in the retrieval product (1 in this case):
! vertical column density:
cso.s5p.chocho.convert.output.var.vcd.dims : pixel retr
cso.s5p.chocho.convert.output.var.vcd.from : PRODUCT/formaldehyde_tropospheric_vertical_column
In the converted files, the retrieval error is always expressed as a (co)variance matrix, to facilitate (future) conversion of profile products. In this example, it is filled with the sum of the squares of the random (precision) and systematic (trueness) error standard deviations:
! error variance in vertical column density (after application of kernel),
! fill with square sums of random and systematic errors
! use dims with different names to avoid that cf-checker complains:
cso.s5p.chocho.convert.output.var.vcd_errvar.dims : pixel retr retr0
cso.s5p.chocho.convert.output.var.vcd_errvar.special : square_sum
cso.s5p.chocho.convert.output.var.vcd_errvar.from : PRODUCT/formaldehyde_tropospheric_vertical_column_precision
cso.s5p.chocho.convert.output.var.vcd_errvar.from2 : PRODUCT/SUPPORT_DATA/DETAILED_RESULTS/formaldehyde_tropospheric_vertical_column_kernel_trueness
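The 'square_sum' processing described in the settings comment can be sketched as follows, assuming the random and systematic errors are independent so that their variances add. The result gets the (pixel,retr,retr) shape of vcd_errvar with retr = 1. The helper name and the input values are illustrative only.

```python
import numpy as np

def square_sum_errvar(precision, trueness):
    """Error variance per pixel from random and systematic standard deviations,
    shaped (pixel, retr, retr) with retr = 1 (hypothetical helper)."""
    precision = np.asarray(precision, dtype=float)
    trueness = np.asarray(trueness, dtype=float)
    variance = precision**2 + trueness**2  # independent errors: variances add
    return variance[:, np.newaxis, np.newaxis]

errvar = square_sum_errvar([3.0e-6, 4.0e-6], [4.0e-6, 3.0e-6])
print(errvar.shape)
```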
The averaging kernel is applied on atmospheric layers, defined by pressure levels.
In this product, the a priori pressures are provided at the layer mid-points,
and converting these into pressures at the layer interfaces requires special processing:
! convert from mid-layer pressures to half-level (interface) pressures:
cso.s5p.chocho.convert.output.var.pressure.dims : pixel layeri
cso.s5p.chocho.convert.output.var.pressure.special : pmid_to_pressure
cso.s5p.chocho.convert.output.var.pressure.pmid : PRODUCT/SUPPORT_DATA/DETAILED_RESULTS/glyoxal_profile_apriori_pressure
cso.s5p.chocho.convert.output.var.pressure.units : Pa
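The names 'pmid_to_pressure' and 'layeri' suggest that pressures given at layer mid-points are converted to pressures at the layer interfaces. The sketch below shows one simple way to do that: interior interfaces are the mean of neighbouring mid-layer pressures, and the outer interfaces are linearly extrapolated. This scheme and the surface-to-top ordering are assumptions; the actual CSO implementation may differ. At least two layers are required.

```python
import numpy as np

def mid_to_interface_pressure(pmid):
    """Interface (half-level) pressures from mid-layer pressures (hypothetical helper).

    pmid : (npix, nlayer) mid-layer pressures, ordered from surface to top.
    Returns (npix, nlayer+1) interface pressures; the top value is clipped at zero.
    """
    pmid = np.asarray(pmid, dtype=float)
    inner = 0.5 * (pmid[:, :-1] + pmid[:, 1:])
    bottom = pmid[:, :1] + 0.5 * (pmid[:, :1] - pmid[:, 1:2])
    top = np.maximum(pmid[:, -1:] - 0.5 * (pmid[:, -2:-1] - pmid[:, -1:]), 0.0)
    return np.concatenate([bottom, inner, top], axis=1)

p = mid_to_interface_pressure([[95000.0, 85000.0, 70000.0]])
print(p)
```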
Averaging kernels are converted to matrices with shape (layer,retr).
! averaging kernel:
cso.s5p.chocho.convert.output.var.kernel.dims : pixel layer retr
cso.s5p.chocho.convert.output.var.kernel.from : PRODUCT/SUPPORT_DATA/DETAILED_RESULTS/averaging_kernel
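With the kernels stored per pixel as (layer,retr), that is, the transpose of the \((n_r,n_a)\) matrix \(\mathbf{A}\) used in the equations, the simulated retrievals for all pixels can be computed in one call. This sketch assumes the simulated profiles \(\mathbf{H}\mathbf{x}\) are already available as partial columns on the same layers; apply_kernels is a hypothetical name and the numbers are made up.

```python
import numpy as np

def apply_kernels(kernel, profiles):
    """Simulated retrievals for all pixels (hypothetical helper).

    kernel   : (npix, nlayer, nretr) averaging kernels, layout as in the converted file
    profiles : (npix, nlayer) simulated partial columns H x on the same layers
    Returns (npix, nretr) with y_s[p, r] = sum_l kernel[p, l, r] * profiles[p, l]
    """
    return np.einsum("plr,pl->pr", np.asarray(kernel, float), np.asarray(profiles, float))

K = np.ones((2, 3, 1))
K[1, :, 0] = [0.5, 1.0, 1.5]
hx = np.array([[1.0, 2.0, 3.0],
               [4.0, 2.0, 0.0]])
ys = apply_kernels(K, hx)
print(ys)
```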
Other variables can be copied directly:
! quality flag:
cso.s5p.chocho.convert.output.var.qa_value.dims : pixel
cso.s5p.chocho.convert.output.var.qa_value.from : PRODUCT/qa_value
!~ skip some attributes, cf-checker complains ...
cso.s5p.chocho.convert.output.var.qa_value.attrs : { 'valid_min' : None, 'valid_max' : None }
! cloud property:
cso.s5p.chocho.convert.output.var.cloud_fraction.from : PRODUCT/SUPPORT_DATA/INPUT_DATA/cloud_fraction_crb
cso.s5p.chocho.convert.output.var.cloud_fraction.units : 1
cso.s5p.chocho.convert.output.var.cloud_fraction.dims : pixel
cso.s5p.chocho.convert.output.var.solar_zenith_angle.from : PRODUCT/SUPPORT_DATA/GEOLOCATIONS/solar_zenith_angle
cso.s5p.chocho.convert.output.var.solar_zenith_angle.units : degree
cso.s5p.chocho.convert.output.var.solar_zenith_angle.dims : pixel
cso.s5p.chocho.convert.output.var.cloud_pressure_crb.from : PRODUCT/SUPPORT_DATA/INPUT_DATA/cloud_pressure_crb
cso.s5p.chocho.convert.output.var.cloud_pressure_crb.units : Pa
cso.s5p.chocho.convert.output.var.cloud_pressure_crb.dims : pixel
cso.s5p.chocho.convert.output.var.amf_troposphere.from : PRODUCT/SUPPORT_DATA/DETAILED_RESULTS/formaldehyde_tropospheric_air_mass_factor
cso.s5p.chocho.convert.output.var.amf_troposphere.units : 1
cso.s5p.chocho.convert.output.var.amf_troposphere.dims : pixel
Output files
The name of the target files should be specified with a directory and a filename; the latter can include a template for the orbit number:
! output directory and filename:
! - times are taken from mid of selection, rounded to hours
! - use '%{orbit}' for orbit number
cso.s5p.chocho.convert.output.dir : /Scratch/CSO/S5p/RPRO/CHOCHO/Europe/%Y/%m
cso.s5p.chocho.convert.output.filename : S5p_RPRO_CHOCHO_%{orbit}.nc
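Following the comments in these settings, the output path could be expanded as sketched below: the '%{orbit}' template is substituted first, and the strftime codes (%Y, %m) are then filled from the middle of the pixel time range, rounded to the nearest hour. The order of substitution and the rounding details are assumptions, and output_path is a hypothetical helper; the times in the example are taken (truncated to seconds) from the first record of the listing file shown at the end of this section.

```python
from datetime import datetime, timedelta

def output_path(directory, filename, start, end, orbit):
    """Expand '%{orbit}' and strftime codes using the middle of the pixel
    time range, rounded to the nearest hour (hypothetical helper)."""
    mid = start + (end - start) / 2
    mid = (mid + timedelta(minutes=30)).replace(minute=0, second=0, microsecond=0)
    path = directory.rstrip("/") + "/" + filename
    path = path.replace("%{orbit}", orbit)  # substitute before strftime sees the '%'
    return mid.strftime(path)

p = output_path("/Scratch/CSO/S5p/RPRO/CHOCHO/Europe/%Y/%m", "S5p_RPRO_CHOCHO_%{orbit}.nc",
                datetime(2018, 6, 1, 1, 32, 46), datetime(2018, 6, 1, 1, 36, 12), "03272")
print(p)
```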
A flag is read to decide if existing files should be renewed or kept:
cso.s5p.chocho.convert.renew : True
The Write method of the CSO_S5p_File object then creates the file.
Global attributes for the target file should be specified with:
! global attributes:
cso.s5p.chocho.convert.output.attrs : format Conventions author institution email
!
cso.s5p.chocho.convert.output.attr.format : 1.0
cso.s5p.chocho.convert.output.attr.Conventions : CF-1.7
cso.s5p.chocho.convert.output.attr.author : Your Name
cso.s5p.chocho.convert.output.attr.institution : CSO
cso.s5p.chocho.convert.output.attr.email : Your.Name@cso.org
The conversion also creates (or updates) a listing file with the names of the created files (relative to the listing file), and the time range of pixels in the file:
! csv file that will hold records per file with:
! - timerange of pixels in file
! - orbit number
cso.s5p.chocho.convert.output.listing.file : /Scratch/CSO/S5p/listing-CHOCHO-Europe.csv
This file will be used by the observation operator to select orbits with pixels valid for a desired time range. The listing is a csv file that looks something like:
filename ;start_time ;end_time ;orbit
2018/06/S5p_RPRO_CHOCHO_03272.nc;2018-06-01T01:32:46.673000000;2018-06-01T01:36:12.948000000;03272
2018/06/S5p_RPRO_CHOCHO_03273.nc;2018-06-01T03:12:53.649000000;2018-06-01T03:17:43.082000000;03273
2018/06/S5p_RPRO_CHOCHO_03274.nc;2018-06-01T04:52:43.586000000;2018-06-01T04:59:12.377000000;03274
:
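A minimal sketch of such a time-range selection, using the listing records shown here: an orbit is selected when its pixel time range overlaps the interval [t1, t2). The timestamps carry nanoseconds, which Python datetime cannot hold, so they are truncated to microseconds before parsing. The helper names are hypothetical and this is not the observation operator's own code.

```python
import csv
import io
from datetime import datetime

def _parse_time(value):
    # listing times carry nanoseconds; keep microsecond precision for datetime
    return datetime.fromisoformat(value.strip()[:26])

def orbits_in_window(listing_text, t1, t2):
    """Filenames of orbits with at least one pixel inside [t1, t2)."""
    selected = []
    for row in csv.DictReader(io.StringIO(listing_text), delimiter=";"):
        row = {key.strip(): value.strip() for key, value in row.items()}
        start, end = _parse_time(row["start_time"]), _parse_time(row["end_time"])
        if start < t2 and end >= t1:
            selected.append(row["filename"])
    return selected

LISTING = """filename ;start_time ;end_time ;orbit
2018/06/S5p_RPRO_CHOCHO_03272.nc;2018-06-01T01:32:46.673000000;2018-06-01T01:36:12.948000000;03272
2018/06/S5p_RPRO_CHOCHO_03273.nc;2018-06-01T03:12:53.649000000;2018-06-01T03:17:43.082000000;03273
2018/06/S5p_RPRO_CHOCHO_03274.nc;2018-06-01T04:52:43.586000000;2018-06-01T04:59:12.377000000;03274
"""

print(orbits_in_window(LISTING, datetime(2018, 6, 1, 3), datetime(2018, 6, 1, 4)))
```

Since the filenames are relative to the listing file, a real reader would join them with the directory of the listing file before opening them.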