Sentinel-5p HCHO data processing
This chapter describes the tasks performed for processing Sentinel-5p HCHO data.
Product description
The product guides can be found at:
Sentinel-5p / Products and Algorithms
L2__HCHO___ : PUM-HCHO, Product User Manual
Features:
The retrieval product is a column density (mol/m2), which will be treated by CSO as a profile with \(n_r=1\) layers:
\[\mathbf{y}_r\]
The simulation of a retrieval product from a model state does not require an a priori profile, and is denoted by:
\[\mathbf{y}_s ~=~ \mathbf{A}^{trop}\ \mathbf{H}\mathbf{x}\]
where:
\(\mathbf{y}_s\) is the simulated retrieval (mol/m2) defined on \(n_r=1\) layers;
\(\mathbf{A}^{trop}\) is the tropospheric averaging kernel matrix with shape \((n_r,n_a)\), where \(n_r=1\) for this product and \(n_a\) is the number of a priori layers; see also the remarks below;
\(\mathbf{x}\) is the atmospheric state, which consists of a 3D array of HCHO concentrations;
\(\mathbf{H}\) extracts a simulated profile from the state using horizontal and vertical interpolation; the result should be defined on the \(n_a\) a priori layers and have the units of the retrieval product (mol/m2).
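For the single-layer product the matrix product in the simulation reduces to a weighted sum over the \(n_a\) a priori layers, where \((\mathbf{H}\mathbf{x})_k\) denotes the simulated value in a priori layer \(k\):
\[y_s ~=~ \sum_{k=1}^{n_a} A^{trop}_{1,k}\ (\mathbf{H}\mathbf{x})_k\]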
For the true atmospheric state \(\mathbf{x}^{true}\), the retrieval error is quantified by the retrieval error covariance \(\mathbf{R}^{trop}\) (for this single-layer product, a variance):
\[\mathbf{y}_r ~-~ \mathbf{A}^{trop}\ \mathbf{H}\mathbf{x}^{true} ~\sim~ \mathcal{N}\left(\mathbf{o},\mathbf{R}^{trop}\right)\]
The retrieval status and quality are indicated by the qa_value. The recommended minimum is 0.5, which excludes cloudy scenes and other problematic retrievals.
Downloading Sentinel-5p data
Sentinel-5p data can be downloaded from the Copernicus Open Access Hub; see the cso_scihub module for a detailed description.
The cso_scihub.CSO_SciHub_Download class is available to download data from this server.
The jobtree configuration could look like:
! download task:
cso.s5p.hcho.task.class : cso.CSO_SciHub_Download
cso.s5p.hcho.task.args : '${PWD}/rc/cso-s5p-hcho.rc', rcbase='cso.s5p.hcho.download-s5phub'
See the class documentation for the general settings that define the download.
Data can be downloaded for different processing streams:
Near real time (NRTI) : available within hours after observation;
Offline (OFFL) : available within weeks after observations;
Reprocessing (RPRO) : re-processing of all previously made observations.
For the Reprocessing stream the download query looks like:
cso.s5p.hcho.download-s5phub.query : platformname:Sentinel-5 AND \
producttype:L2__HCHO__ AND \
processinglevel:L2 AND \
processingmode:Reprocessing
The target directory for downloaded files is specified to have sub-directories per year and month:
! output archive, store in subdirs per month:
cso.s5p.hcho.download-s5phub.output.dir : ${my.work}/Copernicus/S5P/RPRO/HCHO/%Y/%m
The first downloaded files are then:
Copernicus/S5P/RPRO/HCHO/2018/06/S5P_RPRO_L2__HCHO___20180601T002022_20180601T020350_03272_01_010105_20190206T213821.nc
S5P_RPRO_L2__HCHO___20180601T020152_20180601T034520_03273_01_010105_20190206T214248.nc
See the section on File name convention in the Product User Manual for the meaning of the fields.
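As an illustration of these fields, the following Python sketch splits a file name into its parts. It assumes the usual S5P layout (mission, processing stream, 10-character product type, start and end time, orbit, collection, processor version, production time); the function name parse_s5p_filename is hypothetical and not part of CSO, so check the field widths against the Product User Manual.

```python
import re
from datetime import datetime

# Assumed layout:
# S5P_<stream>_<product type, 10 chars>_<start>_<end>_<orbit>_<collection>_<processor>_<production>.nc
S5P_NAME = re.compile(
    r"^(?P<mission>S5P)_(?P<stream>NRTI|OFFL|RPRO)_(?P<product>.{10})_"
    r"(?P<start>\d{8}T\d{6})_(?P<end>\d{8}T\d{6})_(?P<orbit>\d{5})_"
    r"(?P<collection>\d{2})_(?P<processor>\d{6})_(?P<production>\d{8}T\d{6})\.nc$"
)


def parse_s5p_filename(name):
    """Split an S5P L2 file name into its fields; raise ValueError if it does not match."""
    match = S5P_NAME.match(name)
    if match is None:
        raise ValueError(f"not an S5P file name: {name}")
    fields = match.groupdict()
    # convert the three time stamps to datetime objects
    for key in ("start", "end", "production"):
        fields[key] = datetime.strptime(fields[key], "%Y%m%dT%H%M%S")
    return fields


if __name__ == "__main__":
    info = parse_s5p_filename(
        "S5P_RPRO_L2__HCHO___20180601T002022_20180601T020350_03272_01_010105_20190206T213821.nc"
    )
    print(info["product"], info["orbit"], info["start"])
```

Note that the product type is matched as a fixed-width field, so the trailing underscores of L2__HCHO__ are part of it and the following underscore is the field separator.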
Conversion to CSO format
The ‘cso.s5p.hcho.convert’ task creates netCDF files with selected pixels, for example only those within some region or only cloud-free pixels.
The selection criteria are defined in the settings, and added to the ‘history’ attribute of the created files as a reminder.
The work is done by the CSO_S5p_Convert class, which is initialized using the settings file:
! task initialization:
cso.s5p.hcho.convert.class : cso.CSO_S5p_Convert
cso.s5p.hcho.convert.args : '${PWD}/rc/cso-s5p-hcho.rc', rcbase='cso.s5p.hcho.convert'
See the class documentation for the general configuration; some specific choices are described below. The example is based on the S5p HCHO file whose header is available in:
Pixel selection
The CSO_S5p_Convert class calls the S5p_File.SelectPixels() method to create a pixel selection mask for the input file.
The selection is done using one or more filters.
First provide a list of filter names:
cso.s5p.hcho.convert.filters : lons lats valid quality sza error_ratio ground_pixel
Then provide for each filter the input variable to be used for testing, as a path name in the input file.
The next setting is the type of filter to be used (see S5p_File.SelectPixels() for supported types), followed by the other settings required by that type.
The following is an example of a selection on longitude:
cso.s5p.hcho.convert.filter.lons.var : PRODUCT/longitude
cso.s5p.hcho.convert.filter.lons.type : minmax
cso.s5p.hcho.convert.filter.lons.minmax : -30.0 45.0
cso.s5p.hcho.convert.filter.lons.units : degrees_east
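A minimal Python sketch of what a minmax filter amounts to, and how several filter masks could be combined into one selection. It assumes that a pixel is kept only if every filter accepts it; minmax_filter and combine_filters are hypothetical helpers for illustration, not the S5p_File.SelectPixels() implementation.

```python
import numpy as np


def minmax_filter(values, vmin, vmax):
    """Mask that is True where vmin <= value <= vmax; NaN values are rejected."""
    values = np.asarray(values, dtype=float)
    with np.errstate(invalid="ignore"):
        return (values >= vmin) & (values <= vmax)


def combine_filters(masks):
    """Select a pixel only if every filter accepts it (logical AND of all masks)."""
    return np.logical_and.reduce([np.asarray(m, dtype=bool) for m in masks])


# example: the longitude window from the settings above plus a qa_value threshold
lons = np.array([-40.0, -10.0, 20.0, 50.0])
qa = np.array([0.9, 0.4, 0.75, 1.0])
selected = combine_filters([minmax_filter(lons, -30.0, 45.0),
                            minmax_filter(qa, 0.5, 1.0)])
# only the third pixel passes both filters
```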
Extension to the product guide
The PUM mentions that the quality flag and assessment are incomplete. Several additional quality filters are therefore based on Vigouroux et al., from whom we quote:
Several diagnostic variables are provided together with the measurements. Quality assurance (QA) values are defined to perform a quick selection of the observations. QA > 0.5 filters out most observations presenting an error flag or a solar zenith angle (SZA) larger than 70°, a cloud radiance fraction larger than 0.6 at 340 nm, or an air mass factor smaller than 0.1. The product Readme file reports that, in the current version, the QA values are not always correctly set over snow and ice regions or above an SZA of 75°. They also need to be further checked over cloudy scenes. In the forthcoming S5P version 2, QA values will be refined and will exclude data with a surface albedo larger than 0.2 and a snow or ice warning as well as remaining SZAs larger than 75°.
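The additional criteria in the quote can be expressed as one extra selection mask. This is a sketch under assumptions: the function and argument names are hypothetical, the surface albedo and snow/ice warning are assumed to be already read from the product (this document does not name their input variables), and the snow/ice warning is assumed to be reduced to a boolean.

```python
import numpy as np


def extra_quality_mask(qa_value, sza, surface_albedo, snow_ice,
                       qa_min=0.5, sza_max=75.0, albedo_max=0.2):
    """True for pixels passing all criteria from the quoted text:
    qa_value > 0.5, SZA <= 75 degrees, surface albedo <= 0.2 and no snow/ice warning.

    `snow_ice` is assumed to be a boolean array (True = warning raised).
    """
    qa_value = np.asarray(qa_value, dtype=float)
    sza = np.asarray(sza, dtype=float)
    surface_albedo = np.asarray(surface_albedo, dtype=float)
    snow_ice = np.asarray(snow_ice, dtype=bool)
    return ((qa_value > qa_min) & (sza <= sza_max)
            & (surface_albedo <= albedo_max) & ~snow_ice)
```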

Variable specification
The target file is created as a CSO_S5p_File object.
Its AddSelection method is called with the input object as argument, and this copies the selected pixels for the variables specified in the settings.
The variable specification starts with a list of variable names to be created in the target file:
cso.s5p.hcho.convert.output.vars : longitude longitude_bounds \
latitude latitude_bounds \
track_longitude track_longitude_bounds \
track_latitude track_latitude_bounds \
time \
pressure qa_value \
vcd vcd_errvar \
cloud_fraction \
ground_pixel \
cloud_pressure_crb \
solar_zenith_angle \
amf_troposphere \
quality \
kernel
For each variable, settings should be specified that describe the shape of the variable and how it should be filled from the input.
See the AddSelection description for all options; here we show some examples.
The longitude and latitude variables are copied almost directly from the source files; the only change applied is the selection of pixels.
All original attributes are copied, except for the bounds attribute, since that would give warnings from the CF-compliance checker:
cso.s5p.hcho.convert.output.var.longitude.dims : pixel
cso.s5p.hcho.convert.output.var.longitude.from : PRODUCT/longitude
cso.s5p.hcho.convert.output.var.longitude.attrs : { 'bounds' : None }
cso.s5p.hcho.convert.output.var.latitude.dims : pixel
cso.s5p.hcho.convert.output.var.latitude.from : PRODUCT/latitude
cso.s5p.hcho.convert.output.var.latitude.attrs : { 'bounds' : None }
Also the locations of the pixels in the original track are copied, since these are useful when creating plots. These cannot be copied directly but require special processing:
cso.s5p.hcho.convert.output.var.track_longitude.dims : track_scan track_pixel
cso.s5p.hcho.convert.output.var.track_longitude.special : track_longitude
cso.s5p.hcho.convert.output.var.track_longitude.from : PRODUCT/longitude
cso.s5p.hcho.convert.output.var.track_longitude.attrs : { 'bounds' : None }
cso.s5p.hcho.convert.output.var.track_latitude.dims : track_scan track_pixel
cso.s5p.hcho.convert.output.var.track_latitude.special : track_latitude
cso.s5p.hcho.convert.output.var.track_latitude.from : PRODUCT/latitude
cso.s5p.hcho.convert.output.var.track_latitude.attrs : { 'bounds' : None }
The observation times are constructed from time steps relative to a reference time; this requires special processing too:
cso.s5p.hcho.convert.output.var.time.dims : pixel
cso.s5p.hcho.convert.output.var.time.special : time-delta
cso.s5p.hcho.convert.output.var.time.tref : PRODUCT/time
cso.s5p.hcho.convert.output.var.time.dt : PRODUCT/delta_time
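The time-delta processing can be sketched as adding the per-scan offsets to the reference time. The sketch assumes the usual S5P conventions, where PRODUCT/time holds seconds since 2010-01-01 00:00:00 UTC and PRODUCT/delta_time holds milliseconds relative to that reference; verify the units attributes in the input file. The function pixel_times is hypothetical, and its result would still need to be broadcast from scan lines to the selected pixels.

```python
import numpy as np

# Assumed reference epoch of PRODUCT/time (check the 'units' attribute in the file).
EPOCH = np.datetime64("2010-01-01T00:00:00", "ms")


def pixel_times(tref_seconds, delta_time_ms):
    """Observation times as datetime64[ms]: EPOCH + tref (seconds) + delta_time (milliseconds)."""
    tref = EPOCH + np.timedelta64(int(tref_seconds), "s")
    # round to whole milliseconds before casting to a timedelta array
    dt = np.rint(np.asarray(delta_time_ms, dtype=float)).astype(np.int64).astype("timedelta64[ms]")
    return tref + dt
```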
The observed vertical column density can be copied directly.
The target shape is (pixel,retr), where retr is the number of layers in the retrieval product (1 in this case):
! vertical column density:
cso.s5p.hcho.convert.output.var.vcd.dims : pixel retr
cso.s5p.hcho.convert.output.var.vcd.from : PRODUCT/formaldehyde_tropospheric_vertical_column
In the converted files, the retrieval error is always expressed as a (co)variance matrix, to facilitate (future) conversion of profile products. In this example, it is filled with the sum of the squares of the random (precision) and systematic (trueness) error standard deviations:
! error variance in vertical column density (after application of kernel),
! fill with square sums of random and systematic errors
! use dims with different names to avoid that cf-checker complains:
cso.s5p.hcho.convert.output.var.vcd_errvar.dims : pixel retr retr0
cso.s5p.hcho.convert.output.var.vcd_errvar.special : square_sum
cso.s5p.hcho.convert.output.var.vcd_errvar.from : PRODUCT/formaldehyde_tropospheric_vertical_column_precision
cso.s5p.hcho.convert.output.var.vcd_errvar.from2 : PRODUCT/SUPPORT_DATA/DETAILED_RESULTS/formaldehyde_tropospheric_vertical_column_kernel_trueness
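The square_sum special combines the two error terms into a variance. A minimal Python sketch, assuming both inputs are standard deviations in the same units and already reduced to the selected pixels; the function name square_sum_variance is hypothetical:

```python
import numpy as np


def square_sum_variance(precision, trueness):
    """Error variance with shape (pixel, retr, retr0) for a single-layer product.

    precision : random error standard deviation per pixel, shape (pixel,)
    trueness  : systematic error standard deviation per pixel, shape (pixel,)
    The variance precision**2 + trueness**2 is stored as a 1x1 matrix per pixel.
    """
    precision = np.asarray(precision, dtype=float)
    trueness = np.asarray(trueness, dtype=float)
    variance = precision**2 + trueness**2
    return variance.reshape(-1, 1, 1)
```

Adding the squares treats the random and systematic errors as independent.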
The averaging kernel is applied on atmospheric layers, defined by pressure levels.
In this product the pressure levels are defined using hybrid sigma-pressure coordinates, and this requires special processing:
! Convert from hybrid coefficient bounds in (2,nlev) array to 3D half level pressure:
cso.s5p.hcho.convert.output.var.pressure.dims : pixel layeri
cso.s5p.hcho.convert.output.var.pressure.special : hybounds_to_pressure_hcho
cso.s5p.hcho.convert.output.var.pressure.sp : PRODUCT/SUPPORT_DATA/INPUT_DATA/surface_pressure
cso.s5p.hcho.convert.output.var.pressure.hyab : PRODUCT/SUPPORT_DATA/INPUT_DATA/tm5_constant_a
cso.s5p.hcho.convert.output.var.pressure.hybb : PRODUCT/SUPPORT_DATA/INPUT_DATA/tm5_constant_b
cso.s5p.hcho.convert.output.var.pressure.units : Pa
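The hybrid conversion evaluates \(p = a + b\,p_s\) at every layer interface. The Python sketch below assumes that the coefficient bounds are stored with shape (nlayer,2), lower bound first, and that adjacent layers share their bounds; if the file stores them as (2,nlev), as the comment in the settings suggests, transpose them first. The function name is hypothetical and this is not the implementation of hybounds_to_pressure_hcho.

```python
import numpy as np


def hybounds_to_pressure(sp, hyab, hybb):
    """Half-level pressures with shape (pixel, nlayer+1) from hybrid coefficient bounds.

    sp   : surface pressure per pixel [Pa], shape (pixel,)
    hyab : hybrid 'a' coefficients [Pa] per layer as (lower, upper) bounds, shape (nlayer, 2)
    hybb : hybrid 'b' coefficients [1] per layer as (lower, upper) bounds, shape (nlayer, 2)
    """
    sp = np.asarray(sp, dtype=float)
    hyab = np.asarray(hyab, dtype=float)
    hybb = np.asarray(hybb, dtype=float)
    # layers must be contiguous: upper bound of layer k equals lower bound of layer k+1
    if not (np.allclose(hyab[1:, 0], hyab[:-1, 1]) and np.allclose(hybb[1:, 0], hybb[:-1, 1])):
        raise ValueError("hybrid coefficient bounds are not contiguous")
    # interface coefficients: lower bounds of all layers plus the upper bound of the last layer
    ai = np.concatenate([hyab[:, 0], hyab[-1:, 1]])
    bi = np.concatenate([hybb[:, 0], hybb[-1:, 1]])
    # p = a + b * ps, broadcast over pixels (rows) and interfaces (columns)
    return ai[np.newaxis, :] + bi[np.newaxis, :] * sp[:, np.newaxis]
```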
Averaging kernels are converted to matrices with shape (layer,retr).
! averaging kernel:
cso.s5p.hcho.convert.output.var.kernel.dims : pixel layer retr
cso.s5p.hcho.convert.output.var.kernel.from : PRODUCT/SUPPORT_DATA/DETAILED_RESULTS/averaging_kernel
Other variables can be copied directly:
! quality flag:
cso.s5p.hcho.convert.output.var.qa_value.dims : pixel
cso.s5p.hcho.convert.output.var.qa_value.from : PRODUCT/qa_value
!~ skip some attributes, cf-checker complains ...
cso.s5p.hcho.convert.output.var.qa_value.attrs : { 'valid_min' : None, 'valid_max' : None }
! cloud property:
cso.s5p.hcho.convert.output.var.cloud_fraction.from : PRODUCT/SUPPORT_DATA/INPUT_DATA/cloud_fraction_crb
cso.s5p.hcho.convert.output.var.cloud_fraction.units : 1
cso.s5p.hcho.convert.output.var.cloud_fraction.dims : pixel
cso.s5p.hcho.convert.output.var.solar_zenith_angle.from : PRODUCT/SUPPORT_DATA/GEOLOCATIONS/solar_zenith_angle
cso.s5p.hcho.convert.output.var.solar_zenith_angle.units : degree
cso.s5p.hcho.convert.output.var.solar_zenith_angle.dims : pixel
cso.s5p.hcho.convert.output.var.cloud_pressure_crb.from : PRODUCT/SUPPORT_DATA/INPUT_DATA/cloud_pressure_crb
cso.s5p.hcho.convert.output.var.cloud_pressure_crb.units : Pa
cso.s5p.hcho.convert.output.var.cloud_pressure_crb.dims : pixel
cso.s5p.hcho.convert.output.var.amf_troposphere.from : PRODUCT/SUPPORT_DATA/DETAILED_RESULTS/formaldehyde_tropospheric_air_mass_factor
cso.s5p.hcho.convert.output.var.amf_troposphere.units : 1
cso.s5p.hcho.convert.output.var.amf_troposphere.dims : pixel
Output files
The name of the target files should be specified with a directory and a filename; the latter may include a template for the orbit number:
! output directory and filename:
! - times are taken from mid of selection, rounded to hours
! - use '%{orbit}' for orbit number
cso.s5p.hcho.convert.output.dir : /Scratch/CSO/S5p/RPRO/HCHO/Europe/%Y/%m
cso.s5p.hcho.convert.output.filename : S5p_RPRO_HCHO_%{orbit}.nc
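How the directory and filename templates could be expanded is sketched below in Python. The %Y and %m keys are treated as strftime keys and %{orbit} as a plain text substitution; rounding the mid time to the nearest hour (halves rounded up) is an assumption about the 'rounded to hours' comment, and both function names are hypothetical.

```python
from datetime import datetime, timedelta


def mid_time_rounded(t_start, t_end):
    """Middle of the pixel time range, rounded to the nearest hour (halves round up)."""
    mid = t_start + (t_end - t_start) / 2
    return (mid + timedelta(minutes=30)).replace(minute=0, second=0, microsecond=0)


def expand_output_path(directory, filename, t, orbit):
    """Fill '%{orbit}' first, so that strftime only sees its own keys such as %Y and %m."""
    template = f"{directory}/{filename}".replace("%{orbit}", orbit)
    return t.strftime(template)


if __name__ == "__main__":
    t = mid_time_rounded(datetime(2018, 6, 1, 1, 32, 46, 673000),
                         datetime(2018, 6, 1, 1, 36, 12, 948000))
    print(expand_output_path("/Scratch/CSO/S5p/RPRO/HCHO/Europe/%Y/%m",
                             "S5p_RPRO_HCHO_%{orbit}.nc", t, "03272"))
```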
A flag is read to decide if existing files should be renewed or kept:
cso.s5p.hcho.convert.renew : True
The target CSO_S5p_File object is then written to this file by its Write method.
Global attributes for the target file should be specified with:
! global attributes:
cso.s5p.hcho.convert.output.attrs : format Conventions author institution email
!
cso.s5p.hcho.convert.output.attr.format : 1.0
cso.s5p.hcho.convert.output.attr.Conventions : CF-1.7
cso.s5p.hcho.convert.output.attr.author : Your Name
cso.s5p.hcho.convert.output.attr.institution : CSO
cso.s5p.hcho.convert.output.attr.email : Your.Name@cso.org
The conversion also creates (or updates) a listing file with the names of the created files (relative to the listing file), and the time range of pixels in the file:
! csv file that will hold records per file with:
! - timerange of pixels in file
! - orbit number
cso.s5p.hcho.convert.output.listing.file : /Scratch/CSO/S5p/listing-HCHO-Europe.csv
This file will be used by the observation operator to select orbits with pixels valid for a desired time range. The listing is a csv file that looks something like:
filename ;start_time ;end_time ;orbit
2018/06/S5p_RPRO_HCHO_03272.nc;2018-06-01T01:32:46.673000000;2018-06-01T01:36:12.948000000;03272
2018/06/S5p_RPRO_HCHO_03273.nc;2018-06-01T03:12:53.649000000;2018-06-01T03:17:43.082000000;03273
2018/06/S5p_RPRO_HCHO_03274.nc;2018-06-01T04:52:43.586000000;2018-06-01T04:59:12.377000000;03274
:
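A minimal sketch of how such a listing could be used to select orbits for a time window, using the Python standard library csv module. It assumes the layout shown above (';' separator, header names possibly padded with spaces, nanosecond time stamps); select_orbits is a hypothetical helper, not the observation operator's own code.

```python
import csv
import io
from datetime import datetime


def parse_listing_time(text):
    """Parse '2018-06-01T01:32:46.673000000'; digits beyond microseconds are dropped."""
    return datetime.strptime(text.strip()[:26], "%Y-%m-%dT%H:%M:%S.%f")


def select_orbits(listing, t1, t2):
    """File names from a ';'-separated listing whose pixel time range overlaps [t1, t2]."""
    reader = csv.reader(listing, delimiter=";")
    header = [name.strip() for name in next(reader)]
    col = {name: i for i, name in enumerate(header)}
    selected = []
    for row in reader:
        if len(row) < len(header):
            continue  # skip blank or incomplete lines
        start = parse_listing_time(row[col["start_time"]])
        end = parse_listing_time(row[col["end_time"]])
        if start <= t2 and end >= t1:  # the two intervals overlap
            selected.append(row[col["filename"]].strip())
    return selected


if __name__ == "__main__":
    listing = io.StringIO(
        "filename ;start_time ;end_time ;orbit\n"
        "2018/06/S5p_RPRO_HCHO_03272.nc;2018-06-01T01:32:46.673000000;2018-06-01T01:36:12.948000000;03272\n"
        "2018/06/S5p_RPRO_HCHO_03273.nc;2018-06-01T03:12:53.649000000;2018-06-01T03:17:43.082000000;03273\n"
    )
    print(select_orbits(listing, datetime(2018, 6, 1, 3, 0), datetime(2018, 6, 1, 4, 0)))
```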
Catalogue
The CSO_Catalogue class could be used to create a catalogue of images for the converted files.
Configuration could look like:
! catalogue creation task:
cso.s5p.hcho.catalogue.task.figs.class : cso.CSO_Catalogue
cso.s5p.hcho.catalogue.task.figs.args : '${PWD}/rc/cso-s5p-hcho.rc', \
rcbase='cso.s5p.hcho.catalogue'
The configuration describes where to find a listing file with orbits, which variables should be plotted, the colorbar properties, etc.
See the CSO_Catalogue class description for the general form of the settings.
The class creates figures for a list of variables:
! variables to be plotted:
cso.s5p.hcho.catalogue.vars : vcd vcd_errvar qa_value \
cloud_fraction cloud_radiance_fraction
By default the catalogue creator simply creates a map with the value of a variable on the track. Optionally, settings could be used to specify a different unit, or the value range for the colorbar:
! convert units:
cso.s5p.hcho.catalogue.var.vcd.units : 1e15 mlc/cm2
! style:
cso.s5p.hcho.catalogue.var.vcd.vmin : 0.0
cso.s5p.hcho.catalogue.var.vcd.vmax : 10.0
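For reference, the conversion from the product unit mol/m2 to 1e15 molecules/cm2 multiplies by the Avogadro constant and divides by 1e4 (m2 to cm2) and by 1e15; the first two steps give the factor 6.022141e+19 that also appears in the multiplication_factor_to_convert_to_molecules_percm2 attributes of the observation operator output. A Python sketch with a hypothetical function name:

```python
AVOGADRO = 6.02214076e23  # molecules per mol


def mol_m2_to_1e15_mlc_cm2(value):
    """Convert a column density from mol/m2 to units of 1e15 molecules/cm2."""
    molecules_per_cm2 = value * AVOGADRO / 1.0e4  # 1 m2 = 1e4 cm2
    return molecules_per_cm2 / 1.0e15
```

With this conversion, the colorbar upper limit of 10 corresponds to roughly 1.66e-4 mol/m2.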
Figures are saved to files with the basename of the original orbit file and the plotted variable:
/Scratch/CSO/catalogue/2018/06/01/S5p_RPRO_HCHO_03278__vcd.png
S5p_RPRO_HCHO_03278__qa_value.png
:

To search for interesting features in the data, the Indexer class could be used to create index pages.
Configuration could look like:
! index creation task:
cso.s5p.hcho.catalogue.task.index.class : utopya.Indexer
cso.s5p.hcho.catalogue.task.index.args : '${PWD}/rc/cso-s5p-hcho.rc', \
rcbase='cso.s5p.hcho.catalogue-index'
When successful, the index creator displays a URL that could be loaded in a browser:
Browse to:
file:///Scratch/CSO/catalogue/index.html

Configuration of observation operator
The observation operator described in the chapter ‘Observation operator’ requires settings from an rcfile.
First specify the (relative) location of the listing file with orbit file names and time ranges:
! template for listing with converted files:
<rcbase>.listing : ../S5p/RPRO/HCHO/CAMS/listing.csv
The operator should read the variables from the data files that are needed to simulate a retrieval from the model arrays. These include, for example, the pressures that define the a priori layers, the averaging kernel, and, for this product, the air mass factor and tropopause level. Specify a list of names for these variables:
! data variables:
tutorial.S5p.hcho.dvars : hp yr vr A M nla
Example settings:
! half-level pressures:
!~ dimensions, copied from data file:
tutorial.S5p.hcho.dvar.hp.dims : layeri
!~ source variable:
tutorial.S5p.hcho.dvar.hp.source : pressure
! retrieval:
!~ dimensions, copied from data file:
tutorial.S5p.hcho.dvar.yr.dims : retr
!~ source variable:
tutorial.S5p.hcho.dvar.yr.source : vcd
! retrieval error covariance:
!~ dimensions, copied from data file:
tutorial.S5p.hcho.dvar.vr.dims : retr retr
!~ source variable:
tutorial.S5p.hcho.dvar.vr.source : vcd_errvar
! kernel:
!~ dimensions, copied from data file:
tutorial.S5p.hcho.dvar.A.dims : retr layer
!~ source variable:
tutorial.S5p.hcho.dvar.A.source : kernel_trop
! number of apriori layers in retrieval layer:
!~ dimensions, copied from data file:
tutorial.S5p.hcho.dvar.nla.dims : retr
!~ source variable:
tutorial.S5p.hcho.dvar.nla.source : nla
For the simulated values, also define a list of variable names that should be created:
! state variables to be output from the model:
tutorial.S5p.hcho.vars : mod_conc mod_hp mod_tcc mod_cc hx ys shx
Example settings:
! model concentration profile:
!~ model layer dimension:
tutorial.S5p.hcho.var.mod_conc.dims : model_layer
!~ standard attributes:
tutorial.S5p.hcho.var.mod_conc.attrs : long_name units
tutorial.S5p.hcho.var.mod_conc.attr.long_name : model HCHO concentrations
tutorial.S5p.hcho.var.mod_conc.attr.units : ppb
! model half-level pressure profile:
!~ model layer interfaces:
tutorial.S5p.hcho.var.mod_hp.dims : model_layeri
!~ standard attributes:
tutorial.S5p.hcho.var.mod_hp.attrs : long_name units
tutorial.S5p.hcho.var.mod_hp.attr.long_name : model pressure at layer interfaces
tutorial.S5p.hcho.var.mod_hp.attr.units : Pa
! total cloud cover:
!~ no extra dimensions:
tutorial.S5p.hcho.var.mod_tcc.dims :
!~ standard attributes:
tutorial.S5p.hcho.var.mod_tcc.attrs : long_name units
tutorial.S5p.hcho.var.mod_tcc.attr.long_name : total cloud cover
tutorial.S5p.hcho.var.mod_tcc.attr.units : 1
! cloud cover profiles:
!~ model layer dimension:
tutorial.S5p.hcho.var.mod_cc.dims : model_layer
!~ standard attributes:
tutorial.S5p.hcho.var.mod_cc.attrs : long_name units
tutorial.S5p.hcho.var.mod_cc.attr.long_name : cloud cover
tutorial.S5p.hcho.var.mod_cc.attr.units : 1
! model concentrations at apriori layers:
!~ apriori layers:
tutorial.S5p.hcho.var.hx.dims : layer
!~ how computed:
tutorial.S5p.hcho.var.hx.formula : LayerAverage( hp, mod_hp, mod_conc )
tutorial.S5p.hcho.var.hx.formula_terms : hp: hp mod_hp: mod_hp mod_conc: mod_conc
!~ standard attributes:
tutorial.S5p.hcho.var.hx.attrs : long_name units
tutorial.S5p.hcho.var.hx.attr.long_name : model simulations at apriori layers
tutorial.S5p.hcho.var.hx.attr.units : mol m-2
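LayerAverage is not defined in this chapter. One plausible reading, sketched below in Python under that assumption, is a pressure-weighted average of the model layers over each a priori layer, using the pressure thickness that a model layer shares with the target layer as weight. The sketch returns values in the units of mod_conc; whether the CSO operator also converts to mol m-2 (as the units attribute suggests) is not shown here, and layer_average is a hypothetical name.

```python
import numpy as np


def layer_average(hp, mod_hp, mod_conc):
    """Average model layer values over target layers, weighted by pressure overlap.

    hp       : target half-level pressures, shape (n+1,)
    mod_hp   : model half-level pressures, shape (m+1,)
    mod_conc : model layer values, shape (m,)
    Returns shape (n,); NaN for target layers without any overlap.
    Works for either ordering (surface-to-top or top-to-surface).
    """
    hp = np.asarray(hp, dtype=float)
    mod_hp = np.asarray(mod_hp, dtype=float)
    mod_conc = np.asarray(mod_conc, dtype=float)
    # lower and upper pressure of each model layer
    mlo = np.minimum(mod_hp[:-1], mod_hp[1:])
    mhi = np.maximum(mod_hp[:-1], mod_hp[1:])
    out = np.full(hp.size - 1, np.nan)
    for k in range(hp.size - 1):
        lo, hi = sorted((hp[k], hp[k + 1]))
        # pressure thickness shared by target layer k and each model layer
        overlap = np.clip(np.minimum(hi, mhi) - np.maximum(lo, mlo), 0.0, None)
        if overlap.sum() > 0:
            out[k] = np.dot(overlap, mod_conc) / overlap.sum()
    return out
```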
! simulated retrievals
!~ retrieval layers:
tutorial.S5p.hcho.var.ys.dims : retr
!~ how computed:
tutorial.S5p.hcho.var.ys.formula : A x
tutorial.S5p.hcho.var.ys.formula_terms : A: A x: hx
!~ standard attributes:
tutorial.S5p.hcho.var.ys.attrs : long_name units multiplication_factor_to_convert_to_molecules_percm2
tutorial.S5p.hcho.var.ys.attr.long_name : simulated retrieval
tutorial.S5p.hcho.var.ys.attr.units : mol m-2
tutorial.S5p.hcho.var.ys.attr.multiplication_factor_to_convert_to_molecules_percm2 : float: 6.022141e+19
! partial columns as sum over apriori layers
!~ retrieval layers:
tutorial.S5p.hcho.var.shx.dims : retr
!~ how computed:
tutorial.S5p.hcho.var.shx.formula : PartialColumns( nla, x )
tutorial.S5p.hcho.var.shx.formula_terms : nla: nla x: hx
!~ standard attributes:
tutorial.S5p.hcho.var.shx.attrs : long_name units multiplication_factor_to_convert_to_molecules_percm2
tutorial.S5p.hcho.var.shx.attr.long_name : tropospheric column in local model
tutorial.S5p.hcho.var.shx.attr.units : mol m-2
tutorial.S5p.hcho.var.shx.attr.multiplication_factor_to_convert_to_molecules_percm2 : float: 6.022141e+19
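The two formulas above can be sketched as follows: ys applies the kernel as a matrix-vector product, and PartialColumns is read here as summing, for each retrieval layer, the nla consecutive a priori layers assigned to it, starting from the first layer in the array. This reading of PartialColumns and the layer ordering are assumptions, and the function names are hypothetical.

```python
import numpy as np


def simulate_retrieval(A, hx):
    """y_s = A x with A of shape (retr, layer) and hx of shape (layer,)."""
    return np.asarray(A, dtype=float) @ np.asarray(hx, dtype=float)


def partial_columns(nla, x):
    """Sum a priori layer values into retrieval layers.

    nla : number of a priori layers in each retrieval layer, shape (retr,)
    x   : values on the a priori layers, shape (layer,)
    Retrieval layer r covers the nla[r] layers that follow those of layer r-1.
    """
    x = np.asarray(x, dtype=float)
    edges = np.concatenate([[0], np.cumsum(nla)]).astype(int)
    return np.array([x[edges[r]:edges[r + 1]].sum() for r in range(len(nla))])
```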
Sim-Catalogue
The CSO_SimCatalogue class could be used to create a catalogue of images for the output of the observation operator.
Configuration could look like:
! catalogue creation task:
cso.s5p.hcho.sim-catalogue.task.class : cso.CSO_SimCatalogue
cso.s5p.hcho.sim-catalogue.task.args : '${PWD}/rc/cso-s5p-hcho.rc', \
rcbase='cso.s5p.hcho.sim-catalogue'
The configuration describes where to find a listing file with orbits, which variables should be plotted, the colorbar properties, etc.
See the CSO_SimCatalogue class description for the general form of the settings.
The class creates figures for a list of variables:
! variables to be plotted:
cso.s5p.hcho.sim-catalogue.vars : yr ys
By default the catalogue creator simply creates a map with the value of a variable on the track. Optionally, settings could be used to specify a different unit, or the value range for the colorbar:
! variable:
cso.s5p.hcho.sim-catalogue.var.yr.source : data:vcd
! convert units:
cso.s5p.hcho.sim-catalogue.var.yr.units : 1e15 mlc/cm2
! style:
cso.s5p.hcho.sim-catalogue.var.yr.vmin : 0.0
cso.s5p.hcho.sim-catalogue.var.yr.vmax : 50.0
! variable:
cso.s5p.hcho.sim-catalogue.var.ys.source : state:ys
! convert units:
cso.s5p.hcho.sim-catalogue.var.ys.units : 1e15 mlc/cm2
! style:
cso.s5p.hcho.sim-catalogue.var.ys.vmin : 0.0
cso.s5p.hcho.sim-catalogue.var.ys.vmax : 50.0
Figures are saved to files with the basename of the original orbit file and the plotted variable:
file://${my.run.base}/cso-catalogue/HCHO/2018/06/01/S5p_RPRO_HCHO_20180601_1200_yr.png
S5p_RPRO_HCHO_20180601_1200_ys.png
To search for interesting features in the data, the Indexer class could be used to create index pages.
Configuration could look like:
! index creation task:
cso.s5p.hcho.catalogue.task.index.class : utopya.Indexer
cso.s5p.hcho.catalogue.task.index.args : '${PWD}/rc/cso-s5p-hcho.rc', \
rcbase='cso.s5p.hcho.catalogue-index'
When successful, the index creator displays a URL that could be loaded in a browser:
Browse to:
file://${my.run.base}/cso-catalogue/HCHO/index__20180601.html
