......@@ -87,6 +87,10 @@ A summary of the versions and changes.
| Reformatted using 'black'.
| *(2023-08)*
* | *v2.5*
| Support new Copernicus *DataSpace* portal to download Sentinel data.
| *(2023-11)*
To be included
==============
......
.. Documentation for module.
.. Import documentation from ".py" file:
.. automodule:: cso_dataspace
......@@ -27,8 +27,8 @@ Classes used for specific tasks are implemented in the ``cso_*`` modules.
:maxdepth: 1
pymod-cso_inquire
pymod-cso_dataspace
pymod-cso_pal
pymod-cso_s5p
pymod-cso_file
pymod-cso_gridded
......
......@@ -104,6 +104,36 @@ Note that the official S5p filename formatting rules require exactly 10 characte
in the current product a 12-character key ``L2__CHOCHO__`` is used.
CSO processing
==============
*(See* :ref:`tutorial` *chapter for introduction to CSO scripts and configuration)*
An example configuration of the CSO processing of the S5p/CHOCHO data is available via
the following settings:
* `config/Copernicus/cso.rc <../../../config/Copernicus/cso.rc>`_
Top-level settings that configure the job-tree with various sub-tasks.
This is a generic file that could be used for multiple S5p products,
edit it to select the CHOCHO processing.
* `config/Copernicus/cso-user-settings.rc <../../../config/Copernicus/cso-user-settings.rc>`_
User-specific settings such as the work directory.
* `config/Copernicus/cso-s5p-chocho.rc <../../../config/Copernicus/cso-s5p-chocho.rc>`_
Specific settings for CHOCHO product.
Start the job-tree using::
./bin/cso config/Copernicus/cso.rc
Selected sub-steps in the processing are described below.
Inquire archives
================
......
......@@ -60,14 +60,46 @@ Notes:
The recommended minimum is 0.5; this excludes cloudy scenes and other problematic retrievals.
.. Label between '.. _' and ':' ; use :ref:`text <label>` for reference
CSO processing
==============
*(See* :ref:`tutorial` *chapter for introduction to CSO scripts and configuration)*
An example configuration of the CSO processing of the S5p/CO data is available via
the following settings:
* `config/Copernicus/cso.rc <../../../config/Copernicus/cso.rc>`_
Top-level settings that configure the job-tree with various sub-tasks.
This is a generic file that could be used for multiple S5p products,
edit it to select the CO processing.
* `config/Copernicus/cso-user-settings.rc <../../../config/Copernicus/cso-user-settings.rc>`_
User-specific settings such as the work directory.
* `config/Copernicus/cso-s5p-co.rc <../../../config/Copernicus/cso-s5p-co.rc>`_
Specific settings for CO product.
Start the job-tree using::
./bin/cso config/Copernicus/cso.rc
Selected sub-steps in the processing are described below.
.. _s5p-co-inquire:
Inquire Sentinel-5p/CO archive
================================
S5p/CO observations are available from the
`Copernicus DataSpace <https://dataspace.copernicus.eu/>`_;
see the :ref:`cso-dataspace` module for a detailed description.
Data is available for different processing streams, each identified by a 4-character key:
......@@ -80,8 +112,8 @@ but with different processor versions.
It is therefore necessary to first inquire the archive to see which data is available,
and what the version numbers are.
The :py:class:`CSO_DataSpace_Inquire <cso_dataspace.CSO_DataSpace_Inquire>` class is available to inquire the
*Copernicus DataSpace*. The settings used by this class allow selection
on for example time range and intersection area.
The result is a csv file with columns for keywords such as orbit number and processor version,
as well as the filename of the data and the url that should be used to actually download the data::
......@@ -105,17 +137,19 @@ To visualize what is available from the various portals, the
The jobtree configuration to inquire the portal and create the overview figure could look like::

! single step:
cso.s5p.co.inquire.class : utopya.UtopyaJobStep
! two tasks:
cso.s5p.co.inquire.tasks : table-dataspace plot
!~ inquire files available on DataSpace:
cso.s5p.co.inquire.table-dataspace.class : cso.CSO_DataSpace_Inquire
cso.s5p.co.inquire.table-dataspace.args : '${PWD}/config/Copernicus/cso-s5p-co.rc', \
    rcbase='cso.s5p.co.inquire-table-dataspace'
!~ create plot of available versions:
cso.s5p.co.inquire.plot.class : cso.CSO_Inquire_Plot
cso.s5p.co.inquire.plot.args : '${PWD}/config/Copernicus/cso-s5p-co.rc', \
    rcbase='cso.s5p.co.inquire-plot'
......
......@@ -65,6 +65,36 @@ References
| Atmos. Meas. Tech., 13, 3751-3767, `<https://doi.org/10.5194/amt-13-3751-2020>`_, 2020.
CSO processing
==============
*(See* :ref:`tutorial` *chapter for introduction to CSO scripts and configuration)*
An example configuration of the CSO processing of the S5p/HCHO data is available via
the following settings:
* `config/Copernicus/cso.rc <../../../config/Copernicus/cso.rc>`_
Top-level settings that configure the job-tree with various sub-tasks.
This is a generic file that could be used for multiple S5p products,
edit it to select the HCHO processing.
* `config/Copernicus/cso-user-settings.rc <../../../config/Copernicus/cso-user-settings.rc>`_
User-specific settings such as the work directory.
* `config/Copernicus/cso-s5p-hcho.rc <../../../config/Copernicus/cso-s5p-hcho.rc>`_
Specific settings for HCHO product.
Start the job-tree using::
./bin/cso config/Copernicus/cso.rc
Selected sub-steps in the processing are described below.
.. Label between '.. _' and ':' ; use :ref:`text <label>` for reference
.. _s5p-hcho-inquire:
......@@ -72,8 +102,8 @@ Inquire Sentinel-5p/HCHO archive
================================
S5p/HCHO observations are available from the
`Copernicus DataSpace <https://dataspace.copernicus.eu/>`_;
see the :ref:`cso-dataspace` module for a detailed description.
Data is available for different processing streams, each identified by a 4-character key:
......@@ -86,8 +116,8 @@ but with different processor versions.
It is therefore necessary to first inquire the archive to see which data is available,
and what the version numbers are.
The :py:class:`CSO_DataSpace_Inquire <cso_dataspace.CSO_DataSpace_Inquire>` class is available to inquire the
*Copernicus DataSpace*. The settings used by this class allow selection
on for example time range and intersection area.
The result is a csv file with columns for keywords such as orbit number and processor version,
as well as the filename of the data and the url that should be used to actually download the data::
......@@ -111,17 +141,19 @@ To visualize what is available from the various portals, the
The jobtree configuration to inquire the portal and create the overview figure could look like::

! single step:
cso.s5p.hcho.inquire.class : utopya.UtopyaJobStep
! two tasks:
cso.s5p.hcho.inquire.tasks : table-dataspace plot
!~ inquire files available on DataSpace:
cso.s5p.hcho.inquire.table-dataspace.class : cso.CSO_DataSpace_Inquire
cso.s5p.hcho.inquire.table-dataspace.args : '${PWD}/config/Copernicus/cso-s5p-hcho.rc', \
    rcbase='cso.s5p.hcho.inquire-table-dataspace'
!~ create plot of available versions:
cso.s5p.hcho.inquire.plot.class : cso.CSO_Inquire_Plot
cso.s5p.hcho.inquire.plot.args : '${PWD}/config/Copernicus/cso-s5p-hcho.rc', \
    rcbase='cso.s5p.hcho.inquire-plot'
......
......@@ -175,6 +175,36 @@ and simulations.
CSO processing
==============
*(See* :ref:`tutorial` *chapter for introduction to CSO scripts and configuration)*
An example configuration of the CSO processing of the S5p/NO2 data is available via
the following settings:
* `config/Copernicus/cso.rc <../../../config/Copernicus/cso.rc>`_
Top-level settings that configure the job-tree with various sub-tasks.
This is a generic file that could be used for multiple S5p products,
edit it to select the NO2 processing.
* `config/Copernicus/cso-user-settings.rc <../../../config/Copernicus/cso-user-settings.rc>`_
User-specific settings such as the work directory.
* `config/Copernicus/cso-s5p-no2.rc <../../../config/Copernicus/cso-s5p-no2.rc>`_
Specific settings for NO2 product.
Start the job-tree using::
./bin/cso config/Copernicus/cso.rc
Selected sub-steps in the processing are described below.
.. Label between '.. _' and ':' ; use :ref:`text <label>` for reference
.. _s5p-no2-inquire:
......@@ -183,8 +213,8 @@ Inquire Sentinel-5p/NO2 archives
S5p/NO2 observations from KNMI have been available from at least these sources:
* `Copernicus DataSpace <https://dataspace.copernicus.eu/>`_;
see the :ref:`cso-dataspace` module for a detailed description.
*This is the operational version.*
......@@ -206,8 +236,8 @@ The portals provide data files created with the same retrieval algorithm, but mo
It is therefore necessary to first inquire both archives to see which data is available where,
and what the version numbers are.
The :py:class:`CSO_DataSpace_Inquire <cso_dataspace.CSO_DataSpace_Inquire>` class is available to inquire the
*Copernicus DataSpace*. The settings used by this class allow selection
on for example time range and intersection area.
The result is a csv file with columns for keywords such as orbit number and processor version,
as well as the filename of the data and the url that should be used to actually download the data::
......@@ -220,7 +250,7 @@ as well as the filename of the data and the url that should be used to actually
See the section on *File name convention* in the *Product User Manual* for the meaning of all
parts of the filename.
A similar class :py:class:`CSO_PAL_Inquire <cso_pal.CSO_PAL_Inquire>` is available to list the content
of the *Product Algorithm Laboratory* portal. This will also produce a table file.
To visualize what is available from the various portals, the
......@@ -234,24 +264,23 @@ To visualize what is available from the various portals, the
The jobtree configuration to inquire the portals and create the overview figure could look like::
! single step:
cso.s5p.no2.inquire.class : utopya.UtopyaJobStep
! two tasks:
cso.s5p.no2.inquire.tasks : table-dataspace plot
!~ inquire files available on DataSpace:
cso.s5p.no2.inquire.table-dataspace.class : cso.CSO_DataSpace_Inquire
cso.s5p.no2.inquire.table-dataspace.args : '${PWD}/config/Copernicus/cso-s5p-no2.rc', \
    rcbase='cso.s5p.no2.inquire-table-dataspace'
!~ create plot of available versions:
cso.s5p.no2.inquire.plot.class : cso.CSO_Inquire_Plot
cso.s5p.no2.inquire.plot.args : '${PWD}/config/Copernicus/cso-s5p-no2.rc', \
    rcbase='cso.s5p.no2.inquire-plot'
.. Label between '.. _' and ':' ; use :ref:`text <label>` for reference
.. _s5p-no2-convert:
......
......@@ -77,6 +77,36 @@ Acknowledgements
We hereby thank D. Griffin and V. Fioletov for their valuable input.
CSO processing
==============
*(See* :ref:`tutorial` *chapter for introduction to CSO scripts and configuration)*
An example configuration of the CSO processing of the S5p/SO2 data is available via
the following settings:
* `config/Copernicus/cso.rc <../../../config/Copernicus/cso.rc>`_
Top-level settings that configure the job-tree with various sub-tasks.
This is a generic file that could be used for multiple S5p products,
edit it to select the SO2 processing.
* `config/Copernicus/cso-user-settings.rc <../../../config/Copernicus/cso-user-settings.rc>`_
User-specific settings such as the work directory.
* `config/Copernicus/cso-s5p-so2.rc <../../../config/Copernicus/cso-s5p-so2.rc>`_
Specific settings for SO2 product.
Start the job-tree using::
./bin/cso config/Copernicus/cso.rc
Selected sub-steps in the processing are described below.
.. Label between '.. _' and ':' ; use :ref:`text <label>` for reference
.. _s5p-so2-inquire:
......@@ -84,8 +114,8 @@ Inquire Sentinel-5p/SO2 archive
===============================
S5p/SO2 observations are available from the
`Copernicus DataSpace <https://dataspace.copernicus.eu/>`_;
see the :ref:`cso-dataspace` module for a detailed description.
Data is available for different processing streams, each identified by a 4-character key:
......@@ -93,13 +123,12 @@ Data is available for different processing streams, each identified by a 4-chara
* ``OFFL`` : `Offline`, available within weeks after observations;
* ``RPRO`` : re-processing of all previously made observations;
The portal provides data files created with different processor versions.
It is therefore necessary to first inquire the archive to see which data is available,
and what the version numbers are.
The :py:class:`cso_scihub.CSO_SciHub_Inquire` class is available to inquire the
*Copernicus Open Access Hub*. The settings used by this class allow selection
The :py:class:`CSO_DataSpace_Inquire <cso_dataspace.CSO_DataSpace_Inquire>` class is available to inquire the
*Copernicus DataSpace*. The settings used by this class allow selection
on for example time range and intersection area.
The result is a csv file with columns for keywords such as orbit number and processor version,
as well as the filename of the data and the url that should be used to actually download the data::
......@@ -123,17 +152,19 @@ To visualize what is available from the various portals, the
The jobtree configuration to inquire the portal and create the overview figure could look like::

! single step:
cso.s5p.so2.inquire.class : utopya.UtopyaJobStep
! two tasks:
cso.s5p.so2.inquire.tasks : table-dataspace plot
!~ inquire files available on DataSpace:
cso.s5p.so2.inquire.table-dataspace.class : cso.CSO_DataSpace_Inquire
cso.s5p.so2.inquire.table-dataspace.args : '${PWD}/config/Copernicus/cso-s5p-so2.rc', \
    rcbase='cso.s5p.so2.inquire-table-dataspace'
!~ create plot of available versions:
cso.s5p.so2.inquire.plot.class : cso.CSO_Inquire_Plot
cso.s5p.so2.inquire.plot.args : '${PWD}/config/Copernicus/cso-s5p-so2.rc', \
    rcbase='cso.s5p.so2.inquire-plot'
......
......@@ -34,7 +34,7 @@ Job tree
The configuration defines a series of jobs to be created and started.
For example, the following jobs might be defined to process Sentinel-5p data::
cso.tutorial.inquire
cso.tutorial.convert
cso.tutorial.listing
cso.tutorial.catalogue
......@@ -42,14 +42,14 @@ For example, the following jobs might be defined to process Sentinel-5p data::
This list is actually defined as a tree, using lists in which the elements could be a list too::
cso # tree with element "tutorial"
.tutorial # tree with elements "inquire", "convert", "listing", and "catalogue"
.inquire # step
.convert # step
.listing # step
.catalogue # step
For each element in the tree, the configuration file should specify
the name of a python class that takes care of the job creation.
If the element is a *tree*, use the :py:class:`utopya.UtopyaJobTree <utopya_jobtree.UtopyaJobTree>` class,
and add a line to specify the names of the sub-elements.
For the main ``cso`` job, this looks like::
......@@ -87,7 +87,7 @@ it will only print a message::
! dummy task:
cso.tutorial.inquire.task.class : utopya.UtopyaJobTask
cso.tutorial.inquire.task.args : msg='Inquire archive ...'
Running a single job step
......@@ -103,7 +103,7 @@ Job files
The job tree defined above will create a series of job files in a work directory,
with each job in a separate sub directory::
/work/yourname/CSO-Tutorial/cso/cso.jb
cso/tutorial/cso.s5p.jb
cso/tutorial/inquire/cso.tutorial.inquire.jb
......@@ -111,7 +111,7 @@ The location of the work directories is specified in the settings using::
! work directory for jobs;
! here path including subdirectories for job name elements:
*.workdir : /work/yourname/CSO-Tutorial/__NAME2PATH__
The default settings create simple job files that will run in foreground,
perform a specified task, and submit the next job in the tree.
......@@ -130,15 +130,10 @@ Step 1 - Inquire S5p archive
*(See* :ref:`s5p-no2-inquire` *section for full description of S5p/NO2 inquiries)*
The ``cso.tutorial.inquire`` job is configured in
`config/tutorial/tutorial.rc <../../../config/tutorial/tutorial.rc>`_
to crawl the `Copernicus DataSpace <https://dataspace.copernicus.eu/>`_ archive
for available Sentinel-5p NO2 observations.
To run the inquire job only, limit the element list of the ``cso.tutorial`` job to::
......@@ -150,28 +145,35 @@ observations::
! single step:
cso.tutorial.inquire.class : utopya.UtopyaJobStep
! two tasks:
cso.tutorial.inquire.tasks : table-dataspace plot
!~ task: inquire available files:
cso.tutorial.inquire.table-dataspace.class : cso.CSO_DataSpace_Inquire
cso.tutorial.inquire.table-dataspace.args : '${__filename__}', \
    rcbase='cso.tutorial.inquire-table-dataspace'
!~ task: create plot of available versions:
cso.tutorial.inquire.plot.class : cso.CSO_Inquire_Plot
cso.tutorial.inquire.plot.args : '${__filename__}', \
    rcbase='cso.tutorial.inquire-plot'
The first setting defines that this is a single job step that should do some work.
The ``tasks`` list defines keywords for the two tasks to be performed.
* For the ``table-dataspace`` task, define that the :py:class:`CSO_DataSpace_Inquire <cso_dataspace.CSO_DataSpace_Inquire>` class
should be used to do the work; the class is accessible from the :py:mod:`cso` module
(implemented in ``py/cso.py``).
The first argument that initializes the class specifies the name of an rcfile with settings;
in this case these settings are in the same file (``tutorial.rc``) that defines the job-tree,
and therefore the special keyword ``${__filename__}`` could be used.
The second argument ``rcbase`` is optional and specifies that the settings start with keywords
``'cso.tutorial.inquire-table-dataspace'``.
* Similarly, for the ``plot`` task the settings define that the
:py:class:`CSO_Inquire_Plot <cso_inquire.CSO_Inquire_Plot>` class should be used to do the work.
The tutorial settings will inquire the time range 2018-2023.
......@@ -180,10 +182,10 @@ the work directories.
This is specified with a `user defined` keyword, here chosen to start with '``my.``'::
! base location for work directories:
my.work : /work/yourname/CSO-Tutorial
This `user defined` key is for example used to specify the work directory of the
``cso.tutorial.inquire`` (and other) jobs::
! work directory for jobs;
! here path including subdirectories for job name elements:
......@@ -192,26 +194,51 @@ This `user defined` key is for example used to specify the work directory of the
This specifies that the work directory of the job should include the jobname expanded
as subdirectories. For this example, the full path becomes::
/work/yourname/CSO-Tutorial/cso/tutorial/inquire/
The base of the work directories is also used to specify where the inquired table file should be stored::
! output table, date of today:
cso.tutorial.inquire-table-dataspace.output.file : ${my.work}/Copernicus/Copernicus_S5p_NO2_dataspace__%Y-%m-%d.csv
The created table is then for example::
/work/yourname/CSO-Tutorial/Copernicus/Copernicus_S5p_NO2_dataspace__2023-08-09.csv
If not already done, run the ``cso`` script with the tutorial settings::
./bin/cso config/tutorial/tutorial.rc
It could take a long time to inquire the full time period!
Be patient, or limit the time range in the settings ...
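For example, a narrower range could look like the following sketch; the keys follow the ``timerange`` settings of the inquire class, here under the tutorial ``rcbase``::

    ! limit the inquiry to a single month:
    cso.tutorial.inquire-table-dataspace.timerange.start : 2018-06-01 00:00
    cso.tutorial.inquire-table-dataspace.timerange.end   : 2018-06-30 23:59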
When the inquiry is finished, check that the table with available orbits has indeed been created.
To visualize what is available, the :py:class:`cso_inquire.CSO_Inquire_Plot` class
should have created an overview figure next to the table file::
/work/yourname/CSO-Tutorial/Copernicus/Copernicus_S5p_NO2_dataspace__2023-08-09.png
The figure should look like:
.. figure:: figs/NO2/Copernicus_S5p_NO2.png
:scale: 50 %
:align: center
:alt: Overview of available NO2 processings on DataSpace.
For the same orbit, multiple data files could be available.
A single S5p data file is uniquely identified by:
* processor version ``x.y.z``
* processing:
* ``NRTI`` : *Near Real Time*, processed within hours after observation;
* ``OFFL`` : *Offline* data, processed within a few weeks after observations;
* ``RPRO`` : *Reproduced* data, processed long after observations using latest processor version.
The collection numbers ``01``, ``02``, etc. are used to identify a single timeseries
of the entire data set.
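As an illustration, the following Python sketch decomposes such a file name into its
identifying parts, mirroring the parsing done by the inquire classes;
the file name used here is hypothetical::

    # decompose an S5p file name into its identifying parts;
    # naming convention:
    #   S5P_<processing>_<product-type>_<start>_<end>_<orbit>_<collection>_<processor>_<prodtime>.nc
    import datetime
    import os

    # hypothetical example file name:
    filename = "S5P_RPRO_L2__NO2____20180601T103331_20180601T121501_03278_01_010202_20190513T103858.nc"
    bname = os.path.basename(filename).replace(".nc", "")
    # split off platform and processing:
    platform, processing, rest = bname.split("_", 2)
    # product type key is exactly 10 characters:
    product_type = rest[0:10]
    # remaining parts are separated by underscores:
    start_time, end_time, orbit, collection, processor_version, prod_time = rest[11:].split("_")
    # convert the time stamps:
    tfmt = "%Y%m%dT%H%M%S"
    t0 = datetime.datetime.strptime(start_time, tfmt)
    print(platform, processing, product_type, orbit, collection, processor_version, t0)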
Step 2 - Convert to CSO format
......@@ -219,9 +246,21 @@ Step 2 - Convert to CSO format
*(See* :ref:`s5p-no2-convert` *section for full description of S5p/NO2 conversion)*
The ``cso.tutorial.convert`` job is configured to convert the original S5p/NO2 data
into a common format. If the original data is not present yet, it is downloaded.
.. IMPORTANT::
Downloading data from the *Copernicus DataSpace* requires a personal login and password.
Add the login/password setting to your ``~/.netrc`` file::
machine zipper.dataspace.copernicus.eu login Your.Name@institute.org password ***********
See also the :ref:`dataspace-account` section in the description of the
:py:mod:`cso_dataspace` module.
The conversion includes filter options to select only pixels within a certain domain, and with
some minimum quality flag; this will strongly limit the data volume.
It is not necessary to keep the original data; it can simply be downloaded again when needed.
To run the conversion job only, limit in ``rc/tutorial.rc`` the element list of the ``cso.tutorial`` job to::
......@@ -233,17 +272,18 @@ The conversion job is configured with::
cso.tutorial.convert.class : utopya.UtopyaJobStep
! conversion task:
cso.tutorial.convert.task.class : cso.CSO_S5p_Convert
cso.tutorial.convert.task.args : '${__filename__}', \
rcbase='cso.tutorial.convert'
The conversion is thus done using the :py:class:`CSO_S5p_Convert <cso_s5p.CSO_S5p_Convert>` class
that can be accessed from the :py:mod:`cso` module.
The arguments that initialize the class specify the name of an rcfile with settings
(in this case the ``tutorial.rc`` that holds the job-tree definition)
and that the settings start with keywords ``'cso.tutorial.convert'``.
The result of the conversion is a set of files holding the selected pixels per orbit::
/work/yourname/CSO-Tutorial/CSO-data/S5p/RPRO/NO2/CAMS/2018/06/S5p_RPRO_NO2_03272.nc
S5p_RPRO_NO2_03273.nc
:
......@@ -271,7 +311,7 @@ The following is a demo Python code that creates an S5p/NO2 map::
import cso
# sample file:
filename = '/work/yourname/CSO-Tutorial/CSO-data/S5p/RPRO/NO2/CAMS/2018/06/S5p_RPRO_NO2_03278.nc'
# read:
orb = cso.CSO_File( filename=filename )
......@@ -327,7 +367,7 @@ This file will be used by the observation operator to select orbits with pixels
a desired time range.
Also the *catalogue* creator described below uses the listing file to select orbits.
The configuration of the job specifies the name of the listing file to be created,
and the directories to be scanned for orbit files.
......@@ -337,10 +377,10 @@ Step 5 - Catalogue of figures
*(See* :ref:`s5p-no2-catalogue` *section for main description)*
For a first impression of what the downloaded and converted satellite data look like,
the ``cso.tutorial.catalogue`` job can be used. This will create figures out of the converted files,
in particular maps that show values on the track.
To run the catalogue job only, limit in ``tutorial.rc`` the element list of the ``cso.tutorial`` job to::
cso.tutorial.elements : catalogue
......@@ -355,25 +395,26 @@ This is configured using::
! catalogue creation task:
cso.tutorial.catalogue.figs.class : cso.CSO_Catalogue
cso.tutorial.catalogue.figs.args : '${__filename__}', \
rcbase='cso.tutorial.catalogue'
! indexer task:
cso.tutorial.catalogue.index.class : utopya.Indexer
cso.tutorial.catalogue.index.args : '${__filename__}', \
rcbase='cso.tutorial.catalogue-index'
The ``figs`` task that creates the figures uses the :py:class:`CSO_Catalogue <.cso_catalogue.CSO_Catalogue>` class
that can be accessed from the :py:mod:`cso` module.
The arguments that initialize the class specify the name of an rcfile with settings
(in this case the ``tutorial.rc`` that holds the job-tree definition)
and that the settings start with keywords ``'cso.tutorial.catalogue'``.
The configuration describes where to find a *listing* file with orbits,
which variables should be plotted, the colorbar properties, etc.
The names of the created figures are composed from the base name of the converted files
and the variable that is plotted::
/work/yourname/CSO-Tutorial/CSO-data-catalogue/2018/06/01/S5p_RPRO_NO2_03278__vcd.png
S5p_RPRO_NO2_03278__qa_value.png
:
......@@ -393,7 +434,7 @@ The arguments that initialize the class specify the name of an rcfile with setti
When successful, the index creator displays an url that could be loaded in a browser::
Browse to:
file:///work/yourname/CSO-Tutorial/CSO-data-catalogue/S5p/NO2/CAMS/index.html
.. figure:: figs/NO2/CSO_NO2_catalogue.png
:scale: 50 %
......@@ -430,7 +471,7 @@ For testing, copy the entire sub directory to a work directory;
configuration assumes a location in the work directory of the pre-processor where
the converted orbit files are::
cd /work/yourname/CSO-Tutorial
cp -r ~/CSO/oper CSO-oper
cd CSO-oper
......@@ -569,11 +610,11 @@ The job for this is ``cso.tutorial.catalogue``, which is configured using::
cso.tutorial.sim-catalogue.tasks : figs index
! catalogue creation task:
cso.tutorial.sim-catalogue.figs.class : cso.CSO_SimCatalogue
cso.tutorial.sim-catalogue.figs.args : '${__filename__}', \
rcbase='cso.tutorial.sim-catalogue'
! indexer task:
cso.tutorial.sim-catalogue.index.class : utopya.Indexer
cso.tutorial.sim-catalogue.index.args : '${__filename__}', \
rcbase='cso.tutorial.sim-catalogue-index'
The ``figs`` task that creates the figures is thus done using the
......@@ -604,7 +645,7 @@ For these variables it is necessary to define whether they should be read from a
The names of the created figures are composed from the base name of the converted files
and the variable that is plotted::
/work/yourname/CSO-Tutorial/CSO-oper/sim-catalogue/2018/06/01/S5p_RPRO_NO2_20180601_1100_yr.png
S5p_RPRO_NO2_20180601_1100_ys.png
:
......@@ -625,7 +666,7 @@ The arguments that initialize the class specify the name of an rcfile with setti
When successful, the index creator displays an url that could be loaded in a browser::
Browse to:
file:///work/yourname/CSO-Tutorial/CSO-oper/sim-catalogue/index.html
.. figure:: figs/NO2/CSO_NO2_sim-catalogue.png
:scale: 50 %
......@@ -733,7 +774,7 @@ The configuration of the gridding job is::
cso.tutorial.sim-gridded.class : utopya.UtopyaJobStep
! catalogue creation task:
cso.tutorial.sim-gridded.task.class : cso.CSO_GriddedAverage
cso.tutorial.sim-gridded.task.args : '${__filename__}', \
rcbase='cso.tutorial.gridded'
It is also possible to create a catalogue of the gridded fields.
......
......@@ -14,6 +14,9 @@
! 2023-08, Arjo Segers
! Replaced `where` constructs by loops after memory errors on some systems.
!
! 2023-11, Arjo Segers
! Close files also in 'read' mode ...
!
!###############################################################################
!
#define TRACEBACK write (csol,'("in ",a," (",a,", line",i5,")")') rname, __FILE__, __LINE__; call csoErr
......@@ -2530,7 +2533,6 @@ contains
subroutine NcFile_Done( self, status )
use NetCDF , only : NF90_Close
use CSO_Comm, only : csoc
! --- in/out ---------------------------------
......@@ -2549,12 +2551,13 @@ contains
! switch:
select case ( self%rwmode )
! open:
case ( 'o' )
! open/close is managed externally,
! nothing to be done
! read, write:
case ( 'r', 'w' )
! written on root...
if ( csoc%root ) then
......
......@@ -36,7 +36,7 @@ Actual implementations can be found in submodules:
pymod-cso_file
pymod-cso_inquire
pymod-cso_dataspace
pymod-cso_pal
pymod-cso_s5p
pymod-cso_s5p_superobs
......@@ -63,7 +63,7 @@ and are defined according to the following hierarchy:
* :py:class:`.UtopyaRc`
* :py:class:`.CSO_Inquire_Plot`
* :py:class:`.CSO_DataSpace_Inquire`
* :py:class:`.CSO_PAL_Inquire`
* :py:class:`.CSO_S5p_Convert`
* :py:class:`.CSO_S5p_Listing`
......@@ -112,7 +112,7 @@ and are defined according to the following hierarchy:
from cso_file import *
from cso_inquire import *
from cso_dataspace import *
from cso_pal import *
from cso_s5p import *
from cso_s5p_superobs import *
......
#
# Changes
#
# 2023-10, Arjo Segers
# Tools to access Copernicus DataSpace.
#
########################################################################
###
### help
###
########################################################################
"""
.. _cso-dataspace:
*************
CSO DataSpace
*************
The ``cso_dataspace`` module provides classes for accessing data from the
`Copernicus DataSpace <https://dataspace.copernicus.eu/>`_.
To browse through the data, use the `Browser <https://dataspace.copernicus.eu/browser/>`_.
.. _dataspace-account:
Account setup
=============
To be able to download data from the *DataSpace*, first
`Register and create an account <https://documentation.dataspace.copernicus.eu/Registration.html>`_.
On a Linux system, login/passwords for websites can be stored in the user's ``.netrc`` file
in the home directory. Create this file if it does not exist yet, and add the following
line with the login name of the account (your email) and the chosen password::
machine zipper.dataspace.copernicus.eu login Your.Name@institute.org password ***********
The file should be readable for you only::
chmod 400 ~/.netrc
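The credentials are later retrieved from this file with the
:py:func:`requests.utils.get_netrc_auth` function; a minimal check that the entry is found
could look like (a sketch)::

    import requests

    # should print the (login,password) tuple from ~/.netrc:
    print(requests.utils.get_netrc_auth("https://zipper.dataspace.copernicus.eu"))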
.. _dataspace-api:
DataSpace APIs
==============
The *DataSpace* can be accessed via a number of different
`APIs <https://documentation.dataspace.copernicus.eu/APIs.html>`_.
Currently the `OpenSearch API <https://documentation.dataspace.copernicus.eu/APIs/OpenSearch.html>`_
is used, as it was the first one that worked as needed.
In the future the `STAC API <https://stacspec.org/>`_ might be used,
as this is becoming more and more the standard in the Earth Observation community.
Within CSO it is already used by for example the :ref:`pal-api`,
but it could not be made to work yet for the *DataSpace*.
See the `STAC product catalog <https://documentation.dataspace.copernicus.eu/APIs/STAC.html>`_
for more information.
Class hierarchy
===============

The classes are defined according to the following hierarchy:

* :py:class:`.UtopyaRc`

  * :py:class:`.CSO_DataSpace_Inquire`

* :py:class:`.CSO_DataSpace_DownloadFile`
* :py:class:`.NullAuth`
Classes
=======
"""
########################################################################
###
### modules
###
########################################################################
# modules:
import logging
import requests
# tools:
import utopya
########################################################################
###
### OpenSearch inquire
###
########################################################################
class CSO_DataSpace_Inquire(utopya.UtopyaRc):
"""
Inquire available Sentinel data from the
`Copernicus DataSpace <https://dataspace.copernicus.eu/>`_.
Before data can be downloaded from the *DataSpace*, set up your :ref:`dataspace-account`.
Currently the `OpenSearch API <https://documentation.dataspace.copernicus.eu/APIs/OpenSearch.html>`_
is used, as it was the first one that worked as needed;
in future, the `STAC product catalog <https://documentation.dataspace.copernicus.eu/APIs/STAC.html>`_
might be used.
A query is sent to search for products that are available
for a certain time and overlap with a specified region.
The result is a list with orbit files and instructions on how to download them.
In the settings, specify the time range over which files should be downloaded::
<rcbase>.timerange.start : 2018-07-01 00:00
<rcbase>.timerange.end : 2018-07-01 23:59
Specify the base url of the API::
<rcbase>.url : https://finder.creodias.eu/resto/api
Define the collection name with::
<rcbase>.collection : Sentinel5P
Provide a product type::
! product type (always 10 characters!):
<rcbase>.producttype : L2__NO2___
Optionally specify a target area; only orbits with some pixels within the defined box will be downloaded::
! target area, leave empty for globe; format: west,south,east,north
<rcbase>.area :
!<rcbase>.area : -30,30,35,76
The table also contains the urls to download the files;
specify the template that should be used::
! template for download url given "{product_id}":
<rcbase>.download_url : https://zipper.dataspace.copernicus.eu/odata/v1/Products({product_id})/$value
Name of output csv file::
! output table, here including date of today:
<rcbase>.output.file : ${my.work}/PAL_S5P_NO2_%Y-%m-%d.csv
Example records (with extra whitespace to show the columns)::
orbit;start_time ;end_time ;processing;collection;processor_version;filename ;href
11488;2020-01-01 02:34:16;2020-01-01 04:15:46;RPRO ;03 ;020400 ;S5P_RPRO_L2__CH4____20200101T023416_20200101T041546_11488_03_020400_20221120T003820.nc;https://zipper.dataspace.copernicus.eu/odata/v1/Products(b3f240e6-505d-4cae-97ea-43a8778a318d)/$value
11487;2020-01-01 00:52:46;2020-01-01 02:34:16;RPRO ;03 ;020400 ;S5P_RPRO_L2__CH4____20200101T005246_20200101T023416_11487_03_020400_20221120T003818.nc;https://zipper.dataspace.copernicus.eu/odata/v1/Products(a3d40f81-6c86-44bc-bc4b-457ff069b121)/$value
:
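The resulting table can later be read again with, for example, :py:mod:`pandas`
(a sketch, not part of this module)::

    import pandas

    # read the inquire table; columns are separated by ';':
    df = pandas.read_csv("PAL_S5P_NO2_2023-11-01.csv", sep=";")
    # download urls:
    print(df["href"].values)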
"""
def __init__(self, rcfile, rcbase="", env={}, indent=""):
"""
Inquire orbit files.
"""
# modules:
import sys
import os
import datetime
import calendar
import time
import requests
import pandas
# info ...
logging.info(f"{indent}")
logging.info(f"{indent}** Inquire files available on Copernicus DataSpace")
logging.info(f"{indent}")
# init base object:
utopya.UtopyaRc.__init__(self, rcfile=rcfile, rcbase=rcbase, env=env)
# url of API:
api_url = self.GetSetting("url")
# info ...
logging.info(f"{indent}API url : {api_url}")
# template url for downloads:
download_url = self.GetSetting("download_url")
# info ...
logging.info(f"{indent}download url : {download_url}")
# collection:
collection = self.GetSetting("collection")
# info ...
logging.info(f"{indent}collection : {collection}")
# combine into search url:
search_url = f"{api_url}/collections/{collection}/search.json"
## authorization is done by header dict:.
# headers = { "Authorization" : f"access_token {access_token}" }
# time range:
t1 = self.GetSetting("timerange.start", totype="datetime")
t2 = self.GetSetting("timerange.end", totype="datetime")
# info ...
tfmt = "%Y-%m-%d %H:%M"
logging.info(f"{indent}timerange : [{t1.strftime(tfmt)},{t2.strftime(tfmt)}")
# product type (always 10 characters!):
# L2__NO2___
producttype = self.GetSetting("producttype")
# info ...
logging.info(f"{indent}product type : {producttype}")
# area of interest: west,south:east,north
area = self.GetSetting("area")
# defined?
if len(area) > 0:
# convert from format for "dhusget.sh":
# west,south:east,north
west, south, east, north = map(float, area.replace(":", " ").replace(",", " ").split())
# info ...
logging.info(
f"{indent}area : [{west:.2f},{east:.2f}] x [{south:.2f},{north:.2f}]"
)
# box parameter, order: west,south,east,north
box = f"{west},{south},{east},{north}"
else:
# info ...
logging.info(f"{indent}area : no")
# box parameter:
box = None
# endif
# target file, might include time templates:
output_file__template = self.GetSetting("output.file")
# current time:
output_file = datetime.datetime.now().strftime(output_file__template)
# initialize output table:
output_df = pandas.DataFrame()
# info ...
logging.info(f"{indent}search all items in timerange ...")
# search query could only return a maximum number of records;
# a 'page' of records is requested using a row offset and the number of rows:
row0 = 0
nrow = 100
# initialize search parameters;
# for possible content, see:
# https://documentation.dataspace.copernicus.eu/APIs/OpenSearch.html
params = {}
# fill maximum time range:
tfmt = "%Y-%m-%dT%H:%M:%SZ"
params["startDate"] = t1.strftime(tfmt)
params["completionDate"] = t2.strftime(tfmt)
if box is not None:
params["box"] = box
# endif
# fill product type:
params["productType"] = producttype
# fill paging info:
params["maxRecords"] = nrow
# init counter:
ipage = 0
# loop over pages of query result:
while True:
# increase counter:
ipage += 1
# info ...
logging.info(f"{indent} page {ipage} (entries {row0+1},..,{row0+nrow})")
# fill page number:
params["page"] = ipage
# number of tries:
ntry = 1
maxtry = 5
# repeat a few times if necessary:
while ntry <= maxtry:
# send query to search page; no authorization is needed ...
r = requests.get(search_url, params=params)
# check status, raise error if request failed:
try:
r.raise_for_status()
except Exception as err:
msg = str(err)
logging.error(f"{indent} from query; message received:")
logging.error(f"{indent} %s" % msg)
if ntry == maxtry:
logging.error(f"{indent} tried {ntry} times now, exit ...")
raise Exception
else:
logging.error(f"{indent} wait ..")
time.sleep(10)
logging.error(f"{indent} try again ...")
ntry += 1
continue
# endif
# endtry
# no error, leave:
break
# endwhile
# While testing: save the result as a json file, and load it into a browser.
# This shows a dict with among others the fields:
#
# { ..
# 'features' : [ # list of orbits, in browser named: '0','1',...
# { 'id' : '0f318743-8bb9-55ed-b42d-7721b24f7ede', # download id
# 'properties' : {
# 'title' : "S5P_OFFL_L2__CH4____20220531T224613_20220601T002743_23999_02_020301_20220602T143707.nc",
# ...
# }
# ...
# },
# ...
# ]
# }
#
# save result?
if True:
# target file:
qfile = "query.json"
# save:
with open(qfile, "w") as f:
f.write(r.text)
# endwith
# endif
# convert response to json dict:
data = r.json()
# check ...
if type(data) != dict:
logging.error(f"request response should be a json dict, found type: {type(data)}")
raise Exception
# endif
# check ...
if "features" not in data.keys():
logging.error(f"element 'features' not found in response")
raise Exception
# endif
# count:
nrec = len(data["features"])
# loop over features:
for feature in data["features"]:
# check ...
if type(feature) != dict:
logging.error(f"feature should be a dict, found type: {type(feature)}")
raise Exception
# endif
# check ...
if "id" not in feature.keys():
logging.error(f"element 'id' not found in feature")
raise Exception
# endif
# get product id:
product_id = feature["id"]
# check ...
if "properties" not in feature.keys():
logging.error(f"element 'properties' not found in feature")
raise Exception
# endif
# check ...
if "title" not in feature["properties"].keys():
logging.error(f"element 'properties/title' not found in feature")
raise Exception
# endif
# get full filename:
filename = feature["properties"]["title"]
#
# S5P_OFFL_L2__NO2____20180701T005930_20180701T024100_03698_01_010002_20180707T022838.nc
# plt proc [product-] [starttime....] [endtime......] orbit cl procrv [prodtime.....]
#
bname = os.path.basename(filename).replace(".nc", "")
# split:
platform_name, processing, rest = bname.split("_", 2)
product_type = rest[0:10]
parts = rest[11:].split("_")
start_time, end_time, orbit, collection, processor_version, prod_time = parts
# convert:
tfmt = "%Y%m%dT%H%M%S"
ts = datetime.datetime.strptime(start_time, tfmt)
te = datetime.datetime.strptime(end_time, tfmt)
# fill download href:
href = download_url.format(product_id=product_id)
# strange, sometimes records appear twice ...
# already records present?
if len(output_df) > 0:
# same href already stored?
if href in output_df["href"].values:
## testing ...
# logging.warning(f"ignore double product_id: {product_id}")
# ignore record:
continue
# endif
# endif
# fill record, values should be lists for concatenation below:
rec = {
"orbit": [orbit],
"start_time": [ts],
"end_time": [te],
"processing": [processing],
"collection": [collection],
"processor_version": [processor_version],
"filename": [filename],
"href": [href],
}
# add record:
output_df = pandas.concat((output_df, pandas.DataFrame(rec)), ignore_index=True)
# endfor features
## testing...
# if ipage == 9 :
# logging.warning( f"break after page {ipage} ..." )
# break
## endif
# not a full page? then end is reached ...
if nrec < nrow:
# leave loop over pages:
break
# endif
# increase row offset:
row0 += nrow
# endwhile # pages
# info ..
logging.info("save to: %s ..." % output_file)
# create directory:
dirname = os.path.dirname(output_file)
if len(dirname) > 0:
if not os.path.isdir(dirname):
os.makedirs(dirname)
# endif
# endif
# write:
output_df.to_csv(output_file, sep=";", index=False)
# info ...
logging.info(f"{indent}")
logging.info(f"{indent}** end inquire")
logging.info(f"{indent}")
# enddef __init__
# endclass CSO_DataSpace_Inquire
########################################################################
###
### OpenSearch download
###
########################################################################
class NullAuth(requests.auth.AuthBase):
"""
Force requests to ignore the ``~/.netrc`` file.
Some sites do not support regular authentication, but we still
want to store credentials in the ``~/.netrc`` file and submit them
as form elements. Without this, requests would otherwise use the
``~/.netrc`` which leads, on some sites, to a 401 error.
Use with::
requests.get( url, auth=NullAuth() )
Source:
`<https://github.com/psf/requests/issues/2773#issuecomment-174312831>`_
"""
def __call__(self, r):
return r
# enddef __call__
# endclass NullAuth
# *
class CSO_DataSpace_DownloadFile(object):
"""
Download single file from *Copernicus DataSpace*.
Arguments:
* ``href`` : download url, for example::
https://zipper.dataspace.copernicus.eu/odata/v1/Products('d483baa0-3a61-4985-aa0c-5642a83c9214')/$value
* ``output_file`` : target file
Optional arguments:
* ``maxtry`` : number of times to try again if download fails
* ``timeout`` : timeout in seconds for the download request
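Usage sketch, with the ``href`` for example taken from the table created by
the :py:class:`CSO_DataSpace_Inquire` class::

    CSO_DataSpace_DownloadFile( href, output_file, maxtry=5, timeout=120 )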
"""
def __init__(self, href, output_file, maxtry=10, timeout=60, indent=""):
"""
Download file.
"""
# modules:
import os
import urllib.parse
import requests
import zipfile
import shutil
# tools:
import cso_file
#
# On linux system, login/passwords for websites and ftp can be stored in "~/.netrc" file:
# ---[~/.netrc]-----------------------------------------------
# machine zipper.dataspace.copernicus.eu login Your.Name@institute.org password ***********
# ------------------------------------------------------------
# Retrieve the login/password from ~/.netrc to avoid hardcoding them in a script.
#
# the "get_netrc_auth" function requires base of url as first argument,
# for example: https://zipper.dataspace.copernicus.eu
# extract parts from download url:
p = urllib.parse.urlparse(href)
url = f"{p.scheme}://{p.netloc}"
# get username and password from ~/.netrc file:
try:
username, password = requests.utils.get_netrc_auth(url, raise_errors=True)
except:
logging.error(f"Could not get username and password from ~/.netrc file for url:")
logging.error(f" {url}")
logging.error(f"For the Copernicus DataSpace, the file should contain:")
logging.error(f" machine {p.netloc} login **** password ****")
raise Exception
# endtry
# convert into token for dataspace website following:
# https://documentation.dataspace.copernicus.eu/APIs/Token.html
# fill data fields:
data = {
"client_id": "cdse-public",
"username": username,
"password": password,
"grant_type": "password",
}
# identity server:
domain = "identity.dataspace.copernicus.eu"
url = f"https://{domain}/auth/realms/CDSE/protocol/openid-connect/token"
try:
# send request:
r = requests.post(url, data=data)
# check status, raise error if request failed:
r.raise_for_status()
except requests.exceptions.HTTPError as err:
# info ..
msg = str(err)
logging.error(f"exception from download; message received:")
logging.error(f" {msg}")
# catch known problem ...
if msg.startswith("401 Client Error: Unauthorized for url:"):
logging.error(f"Interpretation: the (username,password) received from")
logging.error(f"your '~/.netrc' file are incorrect.")
logging.error(f"For the Copernicus DataSpace, the file should contain:")
logging.error(f" machine {p.netloc} login **** password ****")
logging.error(f"If the machine was not found, a default might have been received")
raise Exception
else:
raise Exception(f"Access token creation failed; server response: {r.json()}")
# endif
except:
raise Exception(f"Access token creation failed; server response: {r.json()}")
# endtry # get access token
# extract token from response:
access_token = r.json()["access_token"]
# retry loop ..
ntry = 0
while True:
# try to download and save:
try:
# try to download:
try:
# fill authorization token in header:
headers = {"Authorization": f"Bearer {access_token}"}
# ensure that "~/.netrc" is ignored by passing null-authorization,
# otherwise the token in the header is overwritten by a token formed
# from the login/password in the rcfile if that is found:
r = requests.get(href, auth=NullAuth(), headers=headers, timeout=timeout)
# check status, raise error if request failed:
r.raise_for_status()
# product is a zip-file:
product_file = "product.zip"
# info ..
logging.info(f"{indent} write to {product_file} ...")
# write to temporary target first ..
tmpfile = product_file + ".tmp"
# open destination file for binary write:
with open(tmpfile, "wb") as fd:
# preferred way to write content, following:
# https://docs.python-requests.org/en/master/user/quickstart/
for chunk in r.iter_content(chunk_size=128):
fd.write(chunk)
# endfor
# endwith
# rename:
os.rename(tmpfile, product_file)
# open product file:
arch = zipfile.ZipFile(product_file, mode="r")
# loop over members, probably two files in a directory:
# S5P_RPRO_L2__CH4____20200101T005246_etc/S5P_RPRO_L2__CH4____20200101T005246_etc.cdl
# S5P_RPRO_L2__CH4____20200101T005246_etc.nc
for member in arch.namelist():
# ncfile?
if member.endswith(".nc"):
# this should be the target file ..
if os.path.basename(member) != os.path.basename(output_file):
logging.error(f"member of archive file: {member}")
logging.error(f"differs from target name: {output_file}")
raise Exception
# endif
# info ..
logging.info(f"{indent} extract {member} ...")
# extract here, including leading directory:
arch.extract(member)
# info ..
logging.info(f"{indent} store ...")
# create target dir if necessary:
cso_file.CheckDir(output_file)
# move to destination:
os.rename(member, output_file)
# remove directory tree:
shutil.rmtree(os.path.dirname(member))
# only one file in package; leave loop over members
break
# endif
# endfor # members
# info ..
logging.info(f"{indent} remove product file ...")
# remove package:
os.remove(product_file)
except requests.exceptions.HTTPError as err:
# info ..
msg = str(err)
logging.error("exception from download; message received:")
logging.error(" %s" % msg)
except MemoryError as err:
logging.error("memory error from download; increase resources?")
# quit with error:
raise
except Exception as err:
# info ..
logging.error("from download; message received:")
logging.error(" %s" % str(err))
# quit with error:
raise
# endtry
# error from download or save:
except:
# increase counter:
ntry += 1
# switch:
if ntry == maxtry:
logging.warning(f"{indent} tried {maxtry} times ...")
raise Exception
else:
logging.warning(f"{indent} exception from download; try again ...")
continue # while-loop
# endif
# endtry
# leave retry loop,
# either because download was ok,
# or because maximum number of retries was reached:
break
# endwhile # retry
# enddef __init__
# endclass CSO_DataSpace_DownloadFile
########################################################################
###
### end
###
########################################################################
......@@ -14,6 +14,9 @@
# 2023-08, Arjo Segers
# Reformatted using 'black'.
#
# 2023-11, Arjo Segers
# Added "CheckDir" method.
#
########################################################################
###
......@@ -53,6 +56,34 @@ import logging
########################################################################
def CheckDir(filename):
"""
Check if ``filename`` has a directory path;
if so, create that directory if it does not exist yet.
"""
# modules:
import os
# directory name, could be empty:
dname = os.path.dirname(filename)
# directory defined?
if len(dname) > 0:
# not present yet?
if not os.path.isdir(dname):
# create including subdirs:
os.makedirs(dname)
# endif # dname present
# endif # dname defined
# enddef CheckDir
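# Example usage (sketch): ensure that the directory of an output file
# exists before writing:
#
#    CheckDir( "/path/to/output/data.nc" )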
# *
def Pack_DataArray(da, dtype="i2"):
"""
......
......@@ -532,8 +532,10 @@ class CSO_GriddedAverage(utopya.UtopyaRc):
datafiles.sort()
# info ..
logging.info(indent + " found %i file(s) matching: %s"
% (len(datafiles), infile_curr) )
logging.info(
indent
+ " found %i file(s) matching: %s" % (len(datafiles), infile_curr)
)
# endif # listing or filenames
......
......@@ -67,28 +67,29 @@ import utopya
class CSO_Inquire_Plot(utopya.UtopyaRc):
"""
Create plot of data version versus time to indicate the available orbits.
The information on orbits is taken from a csv table created by for example
the :py:class:`CSO_DataSpace_Inquire` class.
Specify the name of the table file in the settings::
! listing file:
<rcbase>.file : ${my.work}/Copernicus/Copernicus_S5p_NO2_%Y-%m-%d.csv
The date templates are by default filled for the current day.
Alternatively, specify an explicit date::
!~ specify dates ("yyyy-mm-dd") to use historic table:
<rcbase>.filedate : 2022-01-28
The plot could also be created by combining multiple tables;
use a semi-colon to separate the file names (and optionally the dates)::
! listing files:
<rcbase>.file : ${my.work}/Copernicus/Copernicus_S5p_NO2_%Y-%m-%d.csv ; \\
${my.work}/Copernicus/Copernicus_S5p_NO2_pal_%Y-%m-%d.csv
!~ specify dates ("yyyy-mm-dd") to use historic tables:
!cso.tutorial.inquire-s5phub-plot.filedate : 2022-01-28 ; 2022-01-28
!<rcbase>.filedate : 2022-01-28 ; 2022-01-28
The created plot shows a timeline with the processor versions on the vertical axis;
a bar indicates when a certain version was used to process orbits:
......@@ -108,7 +109,7 @@ class CSO_Inquire_Plot(utopya.UtopyaRc):
The following flag is used to ensure that the plot is renewed::
! renew existing plots?
cso.tutorial.inquire-s5phub-plot.renew : True
<rcbase>.renew : True
"""
......
......@@ -107,7 +107,7 @@ class CSO_PAL_Inquire(utopya.UtopyaRc):
Name of output csv file::
! output table, date of today:
cso.s5p.no2.inquire-s5phub.output.file : ${my.work}/PAL_S5P_NO2_%Y-%m-%d.csv
<rcbase>.output.file : ${my.work}/PAL_S5P_NO2_%Y-%m-%d.csv
Example records::
......@@ -238,8 +238,14 @@ class CSO_PAL_Inquire(utopya.UtopyaRc):
platform_name, rest = bname.split("_", 1)
processing = rest[0:4]
product_type = rest[5:15]
(start_time,end_time,orbit,collection,\
processor_version,production_time) = rest[16:].split("_")
(
start_time,
end_time,
orbit,
collection,
processor_version,
production_time,
) = rest[16:].split("_")
# convert:
tfmt = "%Y%m%dT%H%M%S"
......
......@@ -754,8 +754,9 @@ class ColorbarFigure(Figure):
# endif
# get red/green/blue arrays for extensions:
(red_under, green_under, blue_under) = \
matplotlib.colors.colorConverter.to_rgb(color_under)
(red_under, green_under, blue_under) = matplotlib.colors.colorConverter.to_rgb(
color_under
)
red_over, green_over, blue_over = matplotlib.colors.colorConverter.to_rgb(color_over)
# initialise color dictionary:
......@@ -1701,8 +1702,9 @@ def mid2corners(xx):
# *
def GetGrid( shp, xx=None, yy=None, x=None, y=None,
xm=None, ym=None, xxm=None, yym=None, domain=None):
def GetGrid(
shp, xx=None, yy=None, x=None, y=None, xm=None, ym=None, xxm=None, yym=None, domain=None
):
"""
Return 2D grid arrays with corner points.
......
......@@ -2118,8 +2118,10 @@ class CSO_Catalogue_RegionsTimeSeries(cso_catalogue.CSO_CatalogueBase):
# store:
if (len(reg_used) == 0) or (reg_code not in reg_used["code"].values):
reg_used = pandas.concat(
[ reg_used,
pandas.DataFrame({"code": reg_code, "name": reg_name}), ],
[
reg_used,
pandas.DataFrame({"code": reg_code, "name": reg_name}),
],
ignore_index=True,
)
# endif
......@@ -2463,15 +2465,20 @@ class CSO_Statistics_RegionsTables(utopya.UtopyaRc):
rbias_label = "(sim-obs)/obs"
# add record:
df = pandas.concat(
[ df,
pandas.DataFrame( {
[
df,
pandas.DataFrame(
{
"iso2": [reg_code2],
"iso3": [reg_code],
"name": [reg_name],
"time": [tlab],
obs_label: [obs],
sim_label: [sim],
rbias_label: [rbias], } ), ],
rbias_label: [rbias],
}
),
],
ignore_index=True,
)
......
......@@ -18,6 +18,10 @@
# 2023-08, Arjo Segers
# Reformatted using 'black'.
#
# 2023-09, Arjo Segers
# Fixed bug in definition of listing file dates from rcfile settings.
#
########################################################################
###
......@@ -692,6 +696,15 @@ class CSO_S5p_File(cso_file.CSO_File):
* ``square`` : create a variable as the square of the input; requires a ``.from`` setting.
Optionally swap layers, for example to have profiles in upward direction
(surface to top) rather than downward (top to bottom)::
<rcbase>.output.var.longitude.swap_layers : True
Optionally provide a target data type; by default the data type of the variable in the input file is used::
<rcbase>.output.var.longitude.dtype : f4
Optionally provide target units too.
In the (unlikely) case that the original variable has no ``units`` attribute,
this setting is required to define the (assumed) units.
......@@ -2030,7 +2043,7 @@ class CSO_S5p_Convert(utopya.UtopyaRc):
<rcbase>.timerange.end : 2018-06-03 23:59
The input files are searched in a table created by an *inquire* class,
for example :py:class:`CSO_SciHub_Inquire <cso_scihub.CSO_SciHub_Inquire>`
for example :py:class:`CSO_DataSpace_Inquire <cso_dataspace.CSO_DataSpace_Inquire>`
or :py:class:`CSO_PAL_Inquire <cso_pal.CSO_PAL_Inquire>`.
These have scanned the archives to examine which processing streams and versions are available,
and stored the result in a csv file.
......@@ -2038,7 +2051,7 @@ class CSO_S5p_Convert(utopya.UtopyaRc):
that is taken from another key::
! listing of available source files,
! created by 'inquire-s5phub' job:
! created by, for example, an 'inquire' job:
<rcbase>.inquire.file : /data/Copernicus/S5p/Copernicus_S5P_NO2_%Y-%m-%d.csv
!! date used in filename, leave empty for today:
!<rcbase>.inquire.filedate : 2022-01-28
......@@ -2053,7 +2066,7 @@ class CSO_S5p_Convert(utopya.UtopyaRc):
! remove downloaded input files after convert?
<rcbase>.downloads.cleanup : False
The input files keep the same name as used in the SciHub archive, for example::
The input files keep the same name as used in the *DataSpace* archive, for example::
/data/Copernicus/S5P/OFFL/NO2/2018/07/S5P_OFFL_L2__NO2____20180701T005930_20180701T024100_03698_01_010002_20180707T022838.nc
start_time end_time orbit
......@@ -2141,10 +2154,11 @@ class CSO_S5p_Convert(utopya.UtopyaRc):
import datetime
import fnmatch
import pandas
import numpy
# tools:
import cso_file
import cso_scihub
import cso_dataspace
import utopya
# info ...
......@@ -2169,7 +2183,7 @@ class CSO_S5p_Convert(utopya.UtopyaRc):
# inquire tables:
filename__templates = self.GetSetting("inquire.file").split(";")
# time stamp in file?
filedates = self.GetSetting("inquire.filedate", default="")
filedates = self.GetSetting("inquire.filedate", default="").split(";")
# empty setting? note that splitting an empty string gives [""]:
if filedates == [""]:
filedates = [""] * len(filename__templates)
elif len(filedates) != len(filename__templates):
......@@ -2454,8 +2468,13 @@ class CSO_S5p_Convert(utopya.UtopyaRc):
if not os.path.isfile(input_file):
# info ..
logging.info(" not present yet, download ...")
# check: "href" should be present and not NaN (missing value in the table);
# use "pandas.isna" since "numpy.isnan" fails on string values:
if ("href" not in rec.keys()) or pandas.isna(rec["href"]):
logging.error("cannot download, no 'href' element in record ...")
raise Exception
# endif
# download ...
cso_scihub.CSO_SciHub_DownloadFile(rec["href"], input_file)
cso_dataspace.CSO_DataSpace_DownloadFile(rec["href"], input_file)
# store name:
downloads.append(input_file)
# endif
......@@ -2532,7 +2551,7 @@ class CSO_S5p_Convert(utopya.UtopyaRc):
for key in ["orbit", "processing", "processor_version", "collection"]:
attrs[key] = rec[key]
# endfor
attrs["orbit_file"] = input_file
attrs["orbit_file"] = os.path.basename(input_file)
# write:
csf.Write(
filename=output_filename,
......@@ -2813,6 +2832,265 @@ class CSO_S5p_Listing(utopya.UtopyaRc):
# endclass CSO_S5p_Listing
########################################################################
###
### create listing file for downloaded S5P files
###
########################################################################
class CSO_S5p_Download_Listing(utopya.UtopyaRc):
"""
Create *listing* file for files downloaded from S5P data portals.
A *listing* file contains the names of the converted orbit files,
the time range of pixels in the file, and other information extracted from the filenames:
filename ;mission;processing;product_id;start_time ;end_time ;orbit;collection;processor_version;processing_time
RPRO/CH4/2018/04/S5P_RPRO_L2__CH4____20180430T001851_20180430T020219_02818_01_010301_20190513T141133.nc;S5P ;RPRO ;L2__CH4___;2018-04-30T00:18:51;2018-04-30T02:02:19;02818;01 ;010301 ;2019-05-13T14:11:33
RPRO/CH4/2018/04/S5P_RPRO_L2__CH4____20180430T020021_20180430T034349_02819_01_010301_20190513T135953.nc;S5P ;RPRO ;L2__CH4___;2018-04-30T02:00:21;2018-04-30T03:43:49;02819;01 ;010301 ;2019-05-13T13:59:53
:
This file could be used to scan for available versions and how they were produced.
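A minimal sketch of reading such a file with ``pandas``, mirroring the read call
used by the listing plot class (the filename is hypothetical)::

import pandas
df = pandas.read_csv("listing-CH4__2023-11-01.csv", sep=";", index_col="filename",
dtype="str", parse_dates=["start_time", "end_time", "processing_time"])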
In the settings, define the name of the file to be created::
! csv file that will hold records per file with:
! - timerange of pixels in file
! - orbit number
! time templates are replaced with today's date
<rcbase>.file : /Scratch/Copernicus/S5p/listing-CH4__%Y-%m-%d.csv
An existing listing file is not replaced,
unless the following flag is set::
! renew table?
<rcbase>.renew : True
Orbit files are searched within a timerange::
<rcbase>.timerange.start : 2018-06-01 00:00
<rcbase>.timerange.end : 2018-06-03 23:59
Specify filename filters to search for orbit files;
the patterns are relative to the basedir of the listing file,
and might contain templates for the time values.
Multiple patterns could be defined; if for a certain orbit number more than one
file is found, the first match is used.
This could be used, for example, to create a listing that combines reprocessed data
with near-real-time data::
<rcbase>.patterns : RPRO/CH4/%Y/%m/S5p_*.nc \
OFFL/CH4/%Y/%m/S5p_*.nc
"""
def __init__(self, rcfile, rcbase="", env={}, indent=""):
"""
Create the listing file.
"""
# modules:
import os
import datetime
import glob
import collections
# tools:
import cso_file
# info ...
logging.info(indent + "")
logging.info(indent + "** create listing file")
logging.info(indent + "")
# init base object:
utopya.UtopyaRc.__init__(self, rcfile=rcfile, rcbase=rcbase, env=env)
# renew output?
renew = self.GetSetting("renew", totype="bool")
# table file to be written:
lst_file = self.GetSetting("file")
# evaluate current time:
lst_file = datetime.datetime.now().strftime(lst_file)
# create?
if (not os.path.isfile(lst_file)) or renew:
# info ..
logging.info(indent + "create %s ..." % lst_file)
# time range:
t1 = self.GetSetting("timerange.start", totype="datetime")
t2 = self.GetSetting("timerange.end", totype="datetime")
# info ...
tfmt = "%Y-%m-%d %H:%M"
logging.info(indent + " timerange: [%s,%s]" % (t1.strftime(tfmt), t2.strftime(tfmt)))
# base directory:
bdir = os.path.dirname(lst_file)
# create?
if len(bdir) > 0:
if not os.path.isdir(bdir):
os.makedirs(bdir)
# endif
# current directory?
if len(bdir) == 0:
bdir = "."
# info ...
logging.info(indent + " base directory: %s ..." % bdir)
# initialize for (re)creation:
listing = cso_file.CSO_Listing(lst_file, indent=indent + " ")
# info ...
logging.info(indent + " cleanup records if necessary ...")
# remove entries that do not exist anymore:
listing.Cleanup(indent=indent + " ")
# filename pattern templates:
pattern_templates = self.GetSetting("patterns").split()
# collection of scanned patterns:
patterns = []
# loop over the time range in hourly steps:
t = t1
while t <= t2:
# loop over patterns:
for pattern_template in pattern_templates:
# expand time values:
pattern = t.strftime(pattern_template)
# skip if already scanned ...
if pattern in patterns:
continue
# store:
patterns.append(pattern)
# info ...
logging.info(indent + "scan %s ..." % pattern)
# list relative to basedir:
cwd = os.getcwd()
os.chdir(bdir)
fnames = glob.glob(pattern)
os.chdir(cwd)
# empty ?
if len(fnames) == 0:
logging.info(indent + " empty ..")
continue
# endif
# sort in place:
fnames.sort()
# loop over files:
for fname in fnames:
# absolute path:
filename = os.path.join(bdir, fname)
# already in table?
if fname in listing:
# info ...
logging.info(indent + " keep entry %s ..." % fname)
else:
# info ...
logging.info(indent + " add entry %s ..." % fname)
# Example filename:
# S5P_RPRO_L2__CH4____20180430T001851_20180430T020219_02818_01_010301_20190513T141133.nc
#
# Some products have incorrect product id (should be 10 characters):
# S5P_OFFL_L2__CHOCHO___20200101T005246_20200101T023416_11487_01_010000_20210128.nc
# The extracted product id is then truncated to 10 characters.
#
# basename:
bname, ext = os.path.splitext(os.path.basename(filename))
# extract:
try:
mission, processing, rest = bname.split("_", 2)
if rest.startswith("L2__CHOCHO__"):
product_id = rest[0:10]
(
start_time,
end_time,
orbit,
collection,
processor_version,
prod_time,
) = rest[13:].split("_")
else:
product_id = rest[0:10]
(
start_time,
end_time,
orbit,
collection,
processor_version,
prod_time,
) = rest[11:].split("_")
# endif
except:
logging.error("could not extract filename parts; expected format:")
logging.error(
" S5P_RPRO_L2__CH4____20180430T001851_20180430T020219_02818_01_010301_20190513T141133"
)
logging.error("found:")
logging.error(" %s" % bname)
raise
# endif
# fill data record:
data = collections.OrderedDict()
tfmt = "%Y%m%dT%H%M%S"
data["start_time"] = datetime.datetime.strptime(start_time, tfmt)
data["end_time"] = datetime.datetime.strptime(end_time, tfmt)
data["mission"] = mission
data["processing"] = processing
data["product_id"] = product_id
data["orbit"] = orbit
data["collection"] = collection
data["processor_version"] = processor_version
if len(prod_time) == 8:
data["processing_time"] = datetime.datetime.strptime(
prod_time, "%Y%m%d"
)
else:
data["processing_time"] = datetime.datetime.strptime(
prod_time, tfmt
)
# endif
# update record:
listing.UpdateRecord(fname, data, indent=indent + " ")
# endif # new record?
# endfor # filenames
# endfor # patterns
## testing ...
# break
# next hour:
t = t + datetime.timedelta(hours=1)
# endwhile
# save:
listing.Close(indent=indent + " ")
else:
# info ..
logging.info(indent + "keep %s ..." % lst_file)
# endif
# info ...
logging.info(indent + "")
logging.info(indent + "** end listing")
logging.info(indent + "")
# enddef __init__
# endclass CSO_S5p_Download_Listing
########################################################################
###
### end
......
#
# Changes
#
# 2022-09, Arjo Segers
# Updated documentation.
#
# 2023-06, Arjo Segers
# Use "pandas.concat()" instead of "df.append()" to avoid warnings.
#
# 2023-08, Arjo Segers
# Updated logging messages.
#
# 2023-08, Arjo Segers
# Reformatted using 'black'.
#
########################################################################
###
### help
###
########################################################################
"""
.. _cso-scihub:
*********************
``cso_scihub`` module
*********************
The :py:mod:`cso_scihub` module provides classes for accessing data from the
`Copernicus Open Access Hub <https://scihub.copernicus.eu/>`_.
Below, the data hub itself is described first, followed by how the CSO
pre-processor could be used for batch download of a selection.
Copernicus Open Access Hub
==========================
The `Copernicus Open Access Hub <https://scihub.copernicus.eu/>`_ is the official
portal for Copernicus satellite data.
.. figure:: figs/sci-hub.png
:scale: 50 %
:align: center
:alt: Sci Hub main page.
*Home page of Copernicus Open Access Hub* (https://scihub.copernicus.eu/)
Different hubs are provided for different data sets.
Below we describe:
* **Open Hub** Provides access to Sentinel-1,2,3 data.
* **S-5P Pre-Ops** Provides access to (pre-operational) Sentinel-5P data.
.. _SciHub-OpenHub:
Open Hub
--------
The *Open Hub* provides access to Sentinel-1,2,3 data.
Since this is the oldest data, most examples in the User Guide refer to this hub.
To download data it is necessary to register.
On the main page, select:
* `User Guide <https://scihub.copernicus.eu/userguide/>`_,
follow instructions on `Self Registration <https://scihub.copernicus.eu/userguide/>`_ .
The user name and password should be stored in your home directory in the ``~/.netrc`` file
(ensure that it has read/write permissions only for you)::
# access Copernicus Open Access Hub
machine scihub.copernicus.eu login ********** password *********
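To check which credentials ``requests`` will pick up from this file,
the following sketch could be used; this is the same call as used by
the download classes below::

import requests.utils
auth = requests.utils.get_netrc_auth("https://scihub.copernicus.eu/", raise_errors=True)
print(auth)  # (username,password) tuple, or None if no entry was found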
S-5P Pre-Ops Hub
----------------
On the main page, select *'S-5P Pre-Ops'* for the
`Sentinel-5P Pre-Operations Data Hub <https://s5phub.copernicus.eu/dhus/>`_ .
Access is open; the login and password are shown in the login pop-up.
The user name and password should be stored in your home directory in the ``~/.netrc`` file
(ensure that it has read/write permissions only for you)::
# access Copernicus Open Access Hub
machine s5phub.copernicus.eu login s5pguest password s5pguest
Data can be selected and downloaded interactively.
In the search bar, open the *'Advanced Search'* menu and specify a selection.
The figure below shows an example for Level2 NO2 data.
.. figure:: figs/s5p-hub-advanced-search.png
:scale: 75 %
:align: center
:alt: Advanced Search menu at S-5P Hub.
*Advanced Search menu at S-5P Hub.*
The result of a search is a list of product files.
Each product file contains data from one orbit.
The name of a product file contains the scan period, the orbit number,
and a processing time::
S5P_OFFL_L2__NO2____20190701T102357_20190701T120527_08882_01_010302_20190707T120219
| | \_prodid_/ \_start-time__/ \__end-time___/ orbit | | \__processed__/
| stream | processor
mission collection
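A minimal sketch of how such a name could be split into its parts,
following the parsing used by the classes below::

import datetime
bname = "S5P_OFFL_L2__NO2____20190701T102357_20190701T120527_08882_01_010302_20190707T120219"
mission, stream, rest = bname.split("_", 2)
prodid = rest[0:10]   # "L2__NO2___"
start_time, end_time, orbit, collection, processor, prodtime = rest[11:].split("_")
ts = datetime.datetime.strptime(start_time, "%Y%m%dT%H%M%S")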
The search query is shown above the product list, and contains the time
and product selections; the latter could be useful for batch download::
( beginPosition:[2019-07-01T00:00:00.000Z TO 2019-07-01T23:59:59.999Z] AND
endPosition:[2019-07-01T00:00:00.000Z TO 2019-07-01T23:59:59.999Z] )
AND ( ( platformname:Sentinel-5 AND
producttype:L2__NO2___ AND
processinglevel:L2 AND
processingmode:Offline ) )
.. _scihub-opensearch-download:
OpenSearch API
--------------
For batch processing one can use the
`OpenSearch API <https://scihub.copernicus.eu/userguide/OpenSearchAPI>`_ .
First a search query is sent to the server; the result contains links that can be used to download
the selected files.
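A minimal sketch of such a query using the ``requests`` module; the ``search``
end point and the parameters follow the calls used by the classes below::

import requests
url = "https://s5phub.copernicus.eu/dhus"
query = "beginPosition:[2019-07-01T00:00:00Z TO 2019-07-01T23:59:59Z] AND producttype:L2__NO2___"
r = requests.get(url + "/search", params=dict(q=query, start=0, rows=100), timeout=60)
r.raise_for_status()
print(r.text)  # Atom xml feed with one <entry> per matching orbit file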
.. _scihub-batch-download:
Batch download
--------------
An alternative for batch processing is to use the
`API Hub <https://scihub.copernicus.eu/twiki/do/view/SciHubWebPortal/APIHubDescription>`_ .
This page also contains a link to the `section of the User Guide <https://scihub.copernicus.eu/userguide/8BatchScripting>`_
with instructions for batch processing.
From the User Guide one can download the *'dhusget.sh'* script.
This will take care of searching the archive, downloading files, checking if the download
is complete, etc.
The call to download all NO2 data for a single day that overlaps with a longitude/latitude
box over Europe looks like::
./dhusget.sh \\
-d 'https://s5phub.copernicus.eu/dhus' \\
-S '2018-07-01T00:00:00.000Z' \\
-E '2018-07-02T00:00:00.000Z' \\
-c '-30,30:45,76' \\
-F 'platformname:Sentinel-5 AND producttype:L2__NO2___ AND processinglevel:L2 AND processingmode:Reprocessing' \\
-o product \\
-O /work/Sentinel-5P/TROPOMI/NO2/ \\
-D
The download script from the API Hub is included in the CSO package,
with a few minor modifications:
`bin/dhusget.sh <../../../bin/dhusget.sh>`_
Batch download via CSO
======================
By default the CSO download tools use the :ref:`scihub-opensearch-download`,
but the :ref:`scihub-batch-download` option could also be used.
The :py:class:`.CSO_SciHub_Download` class performs the download using the OpenSearch API.
This class is preferred since:
* existing files could be kept without a new download;
* error messages are cleaner.
See the documentation of the class for the settings to be used.
Alternatively, the :py:class:`.CSO_SciHub_Download_DHuS` class could be used
which will call the ``dhusget.sh`` script to perform the download.
This will give a lot of intermediate log files, and error messages that are not easily interpreted.
Class hierarchy
===============
The classes are defined according to the following hierarchy:
* :py:class:`.CSO_SciHub_DownloadFile`
* :py:class:`.UtopyaRc`
* :py:class:`.CSO_SciHub_Inquire`
* :py:class:`.CSO_SciHub_DownloadFile`
* :py:class:`.CSO_SciHub_Download`
Classes
=======
"""
########################################################################
###
### modules
###
########################################################################
# modules:
import logging
# tools:
import utopya
########################################################################
###
### OpenSearch inquire
###
########################################################################
class CSO_SciHub_Inquire(utopya.UtopyaRc):
"""
Inquire available Sentinel data from the `Copernicus Open Access Hub <https://scihub.copernicus.eu/>`_
using the OpenSearch API:
* `Copernicus Open Access Hub <https://scihub.copernicus.eu/>`_
* `User Guide <https://scihub.copernicus.eu/userguide/>`_
* `6 OpenSearch API <https://scihub.copernicus.eu/userguide/OpenSearchAPI>`_
A query is sent to search for products that are available
for a certain time and overlap with a specified region.
The result is a list with orbit files and instructions on how to download them.
In the settings, specify the url of the hub, which is either the Open Data hub or the S5-P hub::
! server url; provide login/password in ~/.netrc:
<rcbase>.url : https://s5phub.copernicus.eu/dhus
Specify the time range over which files should be downloaded::
<rcbase>.timerange.start : 2018-07-01 00:00
<rcbase>.timerange.end : 2018-07-01 23:59
Specify a target area, only orbits with some pixels within the defined box will be downloaded::
! target area, leave empty for globe; format: west,south:east,north
<rcbase>.area :
!<rcbase>.area : -30,30:35,76
A query is used to select the required data.
The search box on the hub could be used for inspiration on the format.
Note that the '``producttype``' should have exactly 10 characters,
with the first 3 characters for the retrieval level and the last 6 for the product;
unused positions should be filled with underscores::
! search query, obtained from interactive download:
<rcbase>.query : platformname:Sentinel-5 AND \\
producttype:L2__NO2___ AND \\
processinglevel:L2
Name of output csv file::
! output table, date of today:
<rcbase>.output.file : ${my.work}/Copernicus_S5P_NO2_%Y-%m-%d.csv
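The table is written with semicolon separators;
the header follows the records collected by this class::

orbit;start_time;end_time;processing;collection;processor_version;filename;href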
"""
def __init__(self, rcfile, rcbase="", env={}, indent=""):
"""
Inquire available orbits using the OpenSearch API.
"""
# modules:
import sys
import os
import datetime
import requests
import xml.etree.ElementTree
import pandas
# info ...
logging.info(indent + "")
logging.info(indent + "** Inquire files available on Copernicus Data Hub")
logging.info(indent + "")
# init base object:
utopya.UtopyaRc.__init__(self, rcfile=rcfile, rcbase=rcbase, env=env)
# server url:
url = self.GetSetting("url")
# info ...
logging.info(indent + "url : %s" % url)
# area of interest: west,south:east,north
area = self.GetSetting("area")
# defined?
if len(area) > 0:
# convert from format for "dhusget.sh":
# west,south:east,north
west, south, east, north = map(float, area.replace(":", " ").replace(",", " ").split())
else:
# globe:
west, south, east, north = -180, -90, 180, 90
# endif
# info ...
logging.info(
indent + "area : [%8.2f,%8.2f] x [%8.2f,%8.2f]" % (west, east, south, north)
)
# query, probably obtained from interactive download page:
product = self.GetSetting("query")
# remove excessive whitespace:
product = " ".join(product.split())
# info ...
logging.info(indent + "product : %s" % product)
# target file, might include time templates:
output_file__template = self.GetSetting("output.file")
# current time:
output_file = datetime.datetime.now().strftime(output_file__template)
# new output table:
output_df = pandas.DataFrame()
# time range:
t1 = self.GetSetting("timerange.start", totype="datetime")
t2 = self.GetSetting("timerange.end", totype="datetime")
# info ...
tfmt = "%Y-%m-%d %H:%M"
logging.info(indent + "timerange: [%s,%s]" % (t1.strftime(tfmt), t2.strftime(tfmt)))
# target file for query result:
qfile = "query-%i.xml" % os.getpid()
# timeout of requests in seconds:
timeout = 60
# a search query can only return a maximum number of records;
# a 'page' of records is requested using a row offset and the number of rows:
row0 = 0
nrow = 100 # maximum allowed is 100 ...
# time labels:
tfmt = "%Y-%m-%dT%H:%M:%SZ"
# time of interest:
toi = "beginPosition:[%s TO %s]" % (t1.strftime(tfmt), t2.strftime(tfmt))
# geographic area:
# 'POLYGON((P1Lon P1Lat, P2Lon P2Lat, ..., PnLon PnLat))'
area = "POLYGON((%f %f, %f %f, %f %f, %f %f, %f %f))" % (
west,
south,
east,
south,
east,
north,
west,
north,
west,
south,
)
# area of interest:
aoi = 'footprint:"Intersects(%s)"' % area
# info ..
logging.info(indent + " time of interest: %s" % toi)
logging.info(
indent
+ " area of interest: [%8.2f,%8.2f] x [%8.2f,%8.2f]" % (west, east, south, north)
)
logging.info(indent + " product : %s" % product)
# combine:
query = "%s AND %s AND (%s)" % (toi, product, aoi)
# init counter:
ipage = 0
# loop over pages of query result:
while True:
# increase counter:
ipage += 1
# info ...
logging.info(indent + " page %i (entries %i,..,%i)" % (ipage, row0 + 1, row0 + nrow))
# send query to search page;
# result is an xml tree with entries for matching orbit files;
# a maximum of 'nrow' entries is returned, from a zero-based 'start' index onwards:
r = requests.get(
os.path.join(url, "search"),
params=dict(q=query, start=row0, rows=nrow),
timeout=timeout,
)
# check status, raise error if request failed:
try:
r.raise_for_status()
except requests.exceptions.HTTPError as err:
msg = str(err)
logging.error("from query; message received:")
logging.error(" %s" % msg)
if msg.startswith("401 Client Error: Unauthorized for url:"):
logging.error(
'Interpretation: the (username,password) received from your "~/.netrc" file is incorrect.'
)
logging.error("For the S5p-hub, the file should contain the following entry:")
logging.error(
" machine s5phub.copernicus.eu login s5pguest password s5pguest"
)
# get authorization as test:
try:
username, password = requests.utils.get_netrc_auth(url, raise_errors=True)
logging.error('The username extracted from "~/.netrc" for this url is:')
logging.error(" %s" % username)
logging.error("Maybe this is the default entry?")
except Exception as err:
logging.error(
'The "~/.netrc" file could not be parsed correctly; the error raised from:'
)
logging.error(
' username,password = requests.utils.get_netrc_auth( "%s", raise_errors=True )'
% url
)
logging.error("is:")
logging.error(" %s" % str(err))
# endtry
# endif
except Exception as err:
msg = str(err)
logging.error("from query; message received:")
logging.error(" %s" % msg)
sys.exit(1)
# endtry
# save:
with open(qfile, "w") as f:
f.write(r.text)
# endwith
#
# Example content:
#
# <?xml version="1.0" encoding="utf-8"?>
# <feed xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/" xmlns="http://www.w3.org/2005/Atom">
# <title>Data Hub Service search results for: beginPosition:[2018-07-01T00:00:00Z TO 2018-07-02T00:00:00Z] AND platformname:Sentinel-5 AND producttype:L2__NO2___ AND processinglevel:L2 AND processingmode:Reprocessing AND (footprint:"Intersects(POLYGON((-30.000000 30.000000, 45.000000 30.000000, 45.000000 76.000000, -30.000000 76.000000, -30.000000 30.000000)))")</title>
# ...
# <entry>
# <title>S5P_RPRO_L2__NO2____20180701T024001_20180701T042329_03699_01_010202_20190211T185158</title>
# <link href="https://s5phub.copernicus.eu/dhus/odata/v1/Products('a66d4240-4999-4724-9903-aa2db410bbad')/$value"/>
# <str name="filename">S5P_RPRO_L2__NO2____20180701T024001_20180701T042329_03699_01_010202_20190211T185158.nc</str>
# <str name="uuid" >a66d4240-4999-4724-9903-aa2db410bbad</str>
# ...
# </entry>
# ...
# </feed>
#
# open query result:
tree = xml.etree.ElementTree.parse(qfile)
# get root element ("feed"):
root = tree.getroot()
# the tag contains a url prefix:
# root.tag = "{http://www.w3.org/2005/Atom}feed"
# prefixed?
if root.tag.startswith("{"):
# extract prefix:
prefix, tag = root.tag[1:].split("}")
else:
# no prefix:
prefix = ""
# endif
# create namespace to translate the "atom:" to the prefix:
ns = {"atom": prefix}
# find "entry" elements:
entries = root.findall("atom:entry", namespaces=ns)
# count:
nrec = len(entries)
# info ..
logging.info(indent + " number of records found: %i" % nrec)
# check ..
if (ipage == 1) and (nrec == 0):
logging.warning(
" no records found for this day; something wrong in query? continue with next day ..."
)
break
# endif
# loop over entries:
for entry in entries:
# get target filename, this is the content of the "str" element
# with attribute "name" equal to "filename":
# <str name="filename">S5P_RPRO_L2__NO2____20180701T024001_20180701T042329_03699_01_010202_20190211T185158.nc</str>
filename = entry.find('atom:str[@name="filename"]', ns).text
# info ...
logging.info(indent + " file : %s" % filename)
#
# filenames:
#
# S5P_OFFL_L2__NO2____20180701T005930_20180701T024100_03698_01_010002_20180707T022838.nc
# plt proc [product-] [starttime....] [endtime......] orbit cl prvers [prodtime.....]
#
bname = os.path.basename(filename).replace(".nc", "")
# split:
platform_name, processing, rest = bname.split("_", 2)
product_type = rest[0:10]
start_time, end_time, orbit, collection, processor_version, production_time = rest[
11:
].split("_")
# convert:
tfmt = "%Y%m%dT%H%M%S"
ts = datetime.datetime.strptime(start_time, tfmt)
te = datetime.datetime.strptime(end_time, tfmt)
# find first "link" element:
link = entry.find("atom:link", ns)
# extract download link:
href = link.attrib["href"]
# row:
rec = {
"orbit": [orbit],
"start_time": [ts],
"end_time": [te],
"processing": [processing],
"collection": [collection],
"processor_version": [processor_version],
"filename": [filename],
"href": [href],
}
# add record:
output_df = pandas.concat((output_df, pandas.DataFrame(rec)), ignore_index=True)
# endfor # entries
# cleanup:
os.remove(qfile)
# leave loop over pages?
if nrec < nrow:
break
# increase offset:
row0 += nrow
# endwhile # pages
# info ..
logging.info("save to: %s ..." % output_file)
# create directory:
dirname = os.path.dirname(output_file)
if len(dirname) > 0:
if not os.path.isdir(dirname):
os.makedirs(dirname)
# endif
# write:
output_df.to_csv(output_file, sep=";", index=False)
# info ...
logging.info(indent + "")
logging.info(indent + "** end SciHub inquire")
logging.info(indent + "")
# enddef __init__
# endclass CSO_SciHub_Inquire
########################################################################
###
### OpenSearch download
###
########################################################################
class CSO_SciHub_DownloadFile(object):
"""
Download single file from SciHub.
Arguments:
* ``href`` : download url: ``https://s5phub.copernicus.eu/dhus/odata/v1/Products('d483baa0-3a61-4985-aa0c-5642a83c9214')/$value``
* ``output_file`` : target file
Optional arguments:
* ``maxtry`` : number of times to try again if download fails
* ``timeout`` : time-out in seconds for the download request
"""
def __init__(self, href, output_file, maxtry=10, timeout=60, indent=""):
"""
Download file.
"""
# modules:
import os
import requests
# retry loop ..
ntry = 0
while True:
# try to download and save:
try:
# try to download:
try:
# download:
r = requests.get(href, timeout=timeout)
# check status, raise error if request failed:
r.raise_for_status()
# info ..
logging.info(indent + " write to: %s ..." % os.path.dirname(output_file))
# target dir:
dname = os.path.dirname(output_file)
if len(dname) > 0:
if not os.path.isdir(dname):
os.makedirs(dname)
# endif
# write to temporary target first ..
tmpfile = output_file + ".tmp"
# open destination file for binary write:
with open(tmpfile, "wb") as fd:
# prefered way to write content following:
# https://docs.python-requests.org/en/master/user/quickstart/
for chunk in r.iter_content(chunk_size=128):
fd.write(chunk)
# endfor
# endwith
# rename:
os.rename(tmpfile, output_file)
except requests.exceptions.HTTPError as err:
# info ..
msg = str(err)
logging.error("exception from download; message received:")
logging.error(" %s" % msg)
# catch known problem ...
if msg.startswith("401 Client Error: Unauthorized for url:"):
logging.error(
'Interpretation: the (username,password) received from your "~/.netrc" file is incorrect.'
)
logging.error(
"For the S5p-hub, the file should contain the following entry:"
)
logging.error(
" machine s5phub.copernicus.eu login s5pguest password s5pguest"
)
# try to get authorization as test:
try:
username, password = requests.utils.get_netrc_auth(
href, raise_errors=True
)
logging.error(
'The username extracted from "~/.netrc" for this url is:'
)
logging.error(" %s" % username)
logging.error("Maybe this is the default entry?")
except Exception as err:
logging.error(
'The "~/.netrc" file could not be parsed correctly; the error raised from:'
)
logging.error(
' username,password = requests.utils.get_netrc_auth( "%s", raise_errors=True )'
% href
)
logging.error("is:")
logging.error(" %s" % str(err))
# endtry
# should be solved first, leave retry loop ...
break
# endif
except MemoryError as err:
logging.error(
"memory error from download; need to increase allocated resources?"
)
# quit with error:
raise
except Exception as err:
# info ..
logging.error("from download; message received:")
logging.error(" %s" % str(err))
# quit with error:
raise
# endtry
# error from download or save:
except:
# increase counter:
ntry += 1
# switch:
if ntry == maxtry:
logging.warning(
indent + " exception from download; tried %i times ..." % maxtry
)
raise Exception
else:
logging.warning(indent + " exception from download; try again ...")
continue # while-loop
# endif
# endtry
# leave retry loop,
# either because download was ok,
# or because maximum number of retries was reached:
break
# endwhile # retry
# enddef __init__
# endclass CSO_SciHub_DownloadFile
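# Example use (illustration only; 'href' is a download url as returned by the
# inquire class, the target path is hypothetical):
#
#   CSO_SciHub_DownloadFile(href, "/data/S5P/orbit.nc", maxtry=3, timeout=30)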
# *
class CSO_SciHub_Download(utopya.UtopyaRc):
"""
Download Sentinel data from the `Copernicus Open Access Hub <https://scihub.copernicus.eu/>`_
using the OpenSearch API:
* `Copernicus Open Access Hub <https://scihub.copernicus.eu/>`_
* `User Guide <https://scihub.copernicus.eu/userguide/>`_
* `6 OpenSearch API <https://scihub.copernicus.eu/userguide/OpenSearchAPI>`_
To download orbit files, first a query is sent to search for products that are available
for a certain time and overlap with a specified region.
The result is a list with orbit files and instructions on how to download them.
In the settings, specify the url of the hub, which is either the Open Data hub or the S5-P hub::
! server url; provide login/password in ~/.netrc:
<rcbase>.url : https://s5phub.copernicus.eu/dhus
Specify the time range over which files should be downloaded::
<rcbase>.timerange.start : 2018-07-01 00:00
<rcbase>.timerange.end : 2018-07-01 23:59
Specify a target area, only orbits with some pixels within the defined box will be downloaded::
! target area, leave empty for globe; format: west,south:east,north
<rcbase>.area :
!<rcbase>.area : -30,30:35,76
A query is used to select the required data.
The search box on the hub could be used for inspiration on the format.
Note that the '``producttype``' should have exactly 10 characters,
with the first 3 characters for the retrieval level and the last 6 for the product;
unused positions should be filled with underscores::
! search query, obtained from interactive download:
<rcbase>.query : platformname:Sentinel-5 AND \\
producttype:L2__NO2___ AND \\
processinglevel:L2 AND \\
processingmode:Offline
The target directory for downloaded file could include templates for time values::
! output archive, store per month:
<rcbase>.output.dir : /data/Copernicus/S5P/OFFL/NO2/%Y/%m
Use the following flag to keep files that are already present::
! renew existing files?
<rcbase>.renew : False
"""
def __init__(self, rcfile, rcbase="", env={}, indent=""):
"""
Download orbits using OpenSearch API.
"""
# modules:
import sys
import os
import datetime
import requests
import xml.etree.ElementTree
# info ...
logging.info(indent + "")
logging.info(indent + "** Download from Copernicus Data Hub")
logging.info(indent + "")
# init base object:
utopya.UtopyaRc.__init__(self, rcfile=rcfile, rcbase=rcbase, env=env)
# server url:
url = self.GetSetting("url")
# info ...
logging.info(indent + "url : %s" % url)
# area of interest: west,south:east,north
area = self.GetSetting("area")
# convert from format for "dhusget.sh":
# west,south:east,north
west, south, east, north = map(float, area.replace(":", " ").replace(",", " ").split())
# info ...
logging.info(
indent + "area : [%8.2f,%8.2f] x [%8.2f,%8.2f]" % (west, east, south, north)
)
# query, probably obtained from interactive download page:
product = self.GetSetting("query")
# remove excessive whitespace:
product = " ".join(product.split())
# info ...
logging.info(indent + "product : %s" % product)
# output dir, might contain time templates:
output_dir__template = self.GetSetting("output.dir")
# renew existing files?
renew = self.GetSetting("renew", totype="bool")
# time range:
t1 = self.GetSetting("timerange.start", totype="datetime")
t2 = self.GetSetting("timerange.end", totype="datetime")
# info ...
tfmt = "%Y-%m-%d %H:%M"
logging.info(indent + "timerange: [%s,%s]" % (t1.strftime(tfmt), t2.strftime(tfmt)))
# target file for query result:
qfile = "query-%i.xml" % os.getpid()
# timeout of requests in seconds:
timeout = 60
# a search query can only return a maximum number of records;
# a 'page' of records is requested using a row offset and the number of rows:
row0 = 0
nrow = 100 # maximum allowed is 100 ...
# number of retries:
maxtry = 10
# loop over days:
t = t1
while t < t2:
# info ...
logging.info(indent + "%s ..." % t.strftime("%Y-%m-%d"))
# start and end time:
ts = t
# add 24h:
tx = t + datetime.timedelta(1)
# convert to midnight:
tx24 = datetime.datetime(tx.year, tx.month, tx.day, 0, 0)
# not after t2 ...
te = min(tx24, t2)
# time labels:
tfmt = "%Y-%m-%dT%H:%M:%SZ"
# time of interest:
toi = "beginPosition:[%s TO %s]" % (ts.strftime(tfmt), te.strftime(tfmt))
# geographic area:
# 'POLYGON((P1Lon P1Lat, P2Lon P2Lat, ..., PnLon PnLat))'
area = "POLYGON((%f %f, %f %f, %f %f, %f %f, %f %f))" % (
west,
south,
east,
south,
east,
north,
west,
north,
west,
south,
)
# area of interest:
aoi = 'footprint:"Intersects(%s)"' % area
# info ..
logging.info(indent + " time of interest: %s" % toi)
logging.info(
indent
+ " area of interest: [%8.2f,%8.2f] x [%8.2f,%8.2f]" % (west, east, south, north)
)
logging.info(indent + " product : %s" % product)
# combine:
query = "%s AND %s AND (%s)" % (toi, product, aoi)
# init counter:
ipage = 0
# current:
output_dir = t.strftime(output_dir__template)
# info ..
logging.info(indent + " output directory: %s" % output_dir)
# create?
if not os.path.isdir(output_dir):
os.makedirs(output_dir)
# endif
# loop over pages of query result:
while True:
# increase counter:
ipage += 1
# info ...
logging.info(
indent + " page %i (entries %i,..,%i)" % (ipage, row0 + 1, row0 + nrow)
)
# send query to search page;
# result is an xml tree with entries for matching orbit files;
# a maximum of 'nrow' entries is returned, from a zero-based 'start' index onwards:
r = requests.get(
os.path.join(url, "search"),
params=dict(q=query, start=row0, rows=nrow),
timeout=timeout,
)
# check status, raise error if request failed:
try:
r.raise_for_status()
except requests.exceptions.HTTPError as err:
msg = str(err)
logging.error("from query; message received:")
logging.error(" %s" % msg)
if msg.startswith("401 Client Error: Unauthorized for url:"):
logging.error(
'Interpretation: the (username,password) received from your "~/.netrc" file is incorrect.'
)
logging.error(
"For the S5p-hub, the file should contain the following entry:"
)
logging.error(
" machine s5phub.copernicus.eu login s5pguest password s5pguest"
)
# get authorization as test:
try:
username, password = requests.utils.get_netrc_auth(
url, raise_errors=True
)
logging.error(
'The username extracted from "~/.netrc" for this url is:'
)
logging.error(" %s" % username)
logging.error("Maybe this is the default entry?")
except Exception as err:
logging.error(
'The "~/.netrc" file could not be parsed correctly; the error raised from:'
)
logging.error(
' username,password = requests.utils.get_netrc_auth( "%s", raise_errors=True )'
% url
)
logging.error("is:")
logging.error(" %s" % str(err))
# endtry
# endif
except Exception as err:
msg = str(err)
logging.error("from query; message received:")
logging.error(" %s" % msg)
sys.exit(1)
# endtry
# save:
with open(qfile, "w") as f:
f.write(r.text)
# endwith
#
# Example content:
#
# <?xml version="1.0" encoding="utf-8"?>
# <feed xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/" xmlns="http://www.w3.org/2005/Atom">
# <title>Data Hub Service search results for: beginPosition:[2018-07-01T00:00:00Z TO 2018-07-02T00:00:00Z] AND platformname:Sentinel-5 AND producttype:L2__NO2___ AND processinglevel:L2 AND processingmode:Reprocessing AND (footprint:"Intersects(POLYGON((-30.000000 30.000000, 45.000000 30.000000, 45.000000 76.000000, -30.000000 76.000000, -30.000000 30.000000)))")</title>
# ...
# <entry>
# <title>S5P_RPRO_L2__NO2____20180701T024001_20180701T042329_03699_01_010202_20190211T185158</title>
# <link href="https://s5phub.copernicus.eu/dhus/odata/v1/Products('a66d4240-4999-4724-9903-aa2db410bbad')/$value"/>
# <str name="filename">S5P_RPRO_L2__NO2____20180701T024001_20180701T042329_03699_01_010202_20190211T185158.nc</str>
# <str name="uuid" >a66d4240-4999-4724-9903-aa2db410bbad</str>
# ...
# </entry>
# ...
# </feed>
#
# open query result:
tree = xml.etree.ElementTree.parse(qfile)
# get root element ("feed"):
root = tree.getroot()
# the tag contains a url prefix:
# root.tag = "{http://www.w3.org/2005/Atom}feed"
# prefixed?
if root.tag.startswith("{"):
# extract prefix:
prefix, tag = root.tag[1:].split("}")
else:
# no prefix:
prefix = ""
# endif
# create namespace to translate the "atom:" to the prefix:
ns = {"atom": prefix}
# find "entry" elements:
entries = root.findall("atom:entry", namespaces=ns)
# count:
nrec = len(entries)
# info ..
logging.info(indent + " number of records found: %i" % nrec)
# check ..
if (ipage == 1) and (nrec == 0):
logging.warning(
" no records found for this day; something wrong in query? continue with next day ..."
)
break
# endif
# loop over entries:
for entry in entries:
# get target filename, this is the content of the "str" element
# with attribute "name" equal to "filename":
# <str name="filename">S5P_RPRO_L2__NO2____20180701T024001_20180701T042329_03699_01_010202_20190211T185158.nc</str>
filename = entry.find('atom:str[@name="filename"]', ns).text
# full path:
output_file = os.path.join(output_dir, filename)
# info ...
logging.info(indent + " file : %s" % filename)
# (re)new?
if (not os.path.isfile(output_file)) or renew:
# find first "link" element:
link = entry.find("atom:link", ns)
# extract download link:
href = link.attrib["href"]
# info ..
logging.info(indent + " download: %s" % href)
# single file:
CSO_SciHub_DownloadFile(
href, output_file, maxtry=maxtry, timeout=timeout, indent=indent
)
else:
# info ..
logging.info(indent + " keep in: %s ..." % output_dir)
# endif
## testing ...
# break
# endfor # entries
# cleanup:
os.remove(qfile)
# leave loop over pages?
if nrec < nrow:
break
# increase offset:
row0 += nrow
# endwhile # pages
# start of next day:
t = te
# endwhile # day loop
# info ...
logging.info(indent + "")
logging.info(indent + "** end SciHub Download")
logging.info(indent + "")
# enddef __init__
# endclass CSO_SciHub_Download
########################################################################
###
### create listing file
###
########################################################################
class CSO_SciHub_Listing(utopya.UtopyaRc):
"""
Create *listing* file for files downloaded from SciHub or other portals that use equivalent filenames.
A *listing* file contains the names of the converted orbit files,
the time range of pixels in the file, and other information extracted from the filenames:
filename ;mission;processing;product_id;start_time ;end_time ;orbit;collection;processor_version;processing_time
RPRO/CH4/2018/04/S5P_RPRO_L2__CH4____20180430T001851_20180430T020219_02818_01_010301_20190513T141133.nc;S5P ;RPRO ;L2__CH4___;2018-04-30T00:18:51;2018-04-30T02:02:19;02818;01 ;010301 ;2019-05-13T14:11:33
RPRO/CH4/2018/04/S5P_RPRO_L2__CH4____20180430T020021_20180430T034349_02819_01_010301_20190513T135953.nc;S5P ;RPRO ;L2__CH4___;2018-04-30T02:00:21;2018-04-30T03:43:49;02819;01 ;010301 ;2019-05-13T13:59:53
:
This file could be used to scan for available versions and how they were produced.
In the settings, define the name of the file to be created::
! csv file that will hold records per file with:
! - timerange of pixels in file
! - orbit number
! time templates are replaced with today's date
<rcbase>.file : /Scratch/Copernicus/S5p/listing-CH4__%Y-%m-%d.csv
An existing listing file is not replaced,
unless the following flag is set::
! renew table?
<rcbase>.renew : True
Orbit files are searched within a timerange::
<rcbase>.timerange.start : 2018-06-01 00:00
<rcbase>.timerange.end : 2018-06-03 23:59
Specify filename filters to search for orbit files;
the patterns are relative to the basedir of the listing file,
and might contain templates for the time values.
Multiple patterns could be defined; if for a certain orbit number more than one
file is found, the first match is used.
This could be used, for example, to create a listing that combines reprocessed data
with near-real-time data::
<rcbase>.patterns : RPRO/CH4/%Y/%m/S5p_*.nc \
OFFL/CH4/%Y/%m/S5p_*.nc
"""
def __init__(self, rcfile, rcbase="", env={}, indent=""):
"""
Create the listing file.
"""
# modules:
import os
import datetime
import glob
import collections
# tools:
import cso_file
# info ...
logging.info(indent + "")
logging.info(indent + "** create listing file")
logging.info(indent + "")
# init base object:
utopya.UtopyaRc.__init__(self, rcfile=rcfile, rcbase=rcbase, env=env)
# renew output?
renew = self.GetSetting("renew", totype="bool")
# table file to be written:
lst_file = self.GetSetting("file")
# evaluate current time:
lst_file = datetime.datetime.now().strftime(lst_file)
# create?
if (not os.path.isfile(lst_file)) or renew:
# info ..
logging.info(indent + "create %s ..." % lst_file)
# time range:
t1 = self.GetSetting("timerange.start", totype="datetime")
t2 = self.GetSetting("timerange.end", totype="datetime")
# info ...
tfmt = "%Y-%m-%d %H:%M"
logging.info(indent + " timerange: [%s,%s]" % (t1.strftime(tfmt), t2.strftime(tfmt)))
# base directory:
bdir = os.path.dirname(lst_file)
# create?
if len(bdir) > 0:
if not os.path.isdir(bdir):
os.makedirs(bdir)
# endif
# current directory?
if len(bdir) == 0:
bdir = "."
# info ...
logging.info(indent + " base directory: %s ..." % bdir)
# initialize for (re)creation:
listing = cso_file.CSO_Listing(lst_file, indent=indent + " ")
# info ...
logging.info(indent + " cleanup records if necessary ...")
# remove entries that do not exist anymore:
listing.Cleanup(indent=indent + " ")
# filename pattern templates:
pattern_templates = self.GetSetting("patterns").split()
# collection of scanned patterns:
patterns = []
# loop over the time range in hourly steps:
t = t1
while t <= t2:
# loop over patterns:
for pattern_template in pattern_templates:
# expand time values:
pattern = t.strftime(pattern_template)
# skip if already scanned ...
if pattern in patterns:
continue
# store:
patterns.append(pattern)
# info ...
logging.info(indent + "scan %s ..." % pattern)
# list relative to basedir:
cwd = os.getcwd()
os.chdir(bdir)
fnames = glob.glob(pattern)
os.chdir(cwd)
# empty ?
if len(fnames) == 0:
logging.info(indent + " empty ..")
continue
# endif
# sort in place:
fnames.sort()
# loop over files:
for fname in fnames:
# absolute path:
filename = os.path.join(bdir, fname)
# already in table?
if fname in listing:
# info ...
logging.info(indent + " keep entry %s ..." % fname)
else:
# info ...
logging.info(indent + " add entry %s ..." % fname)
# Example filename:
# S5P_RPRO_L2__CH4____20180430T001851_20180430T020219_02818_01_010301_20190513T141133.nc
#
# Some products have incorrect product id (should be 10 characters):
# S5P_OFFL_L2__CHOCHO___20200101T005246_20200101T023416_11487_01_010000_20210128.nc
# The extracted product id is then truncated to 10 characters.
#
# basename:
bname, ext = os.path.splitext(os.path.basename(filename))
# extract:
try:
mission, processing, rest = bname.split("_", 2)
if rest.startswith("L2__CHOCHO__"):
product_id = rest[0:10]
(
start_time,
end_time,
orbit,
collection,
processor_version,
prod_time,
) = rest[13:].split("_")
else:
product_id = rest[0:10]
(
start_time,
end_time,
orbit,
collection,
processor_version,
prod_time,
) = rest[11:].split("_")
# endif
except:
logging.error("could not extract filename parts; expected format:")
logging.error(
" S5P_RPRO_L2__CH4____20180430T001851_20180430T020219_02818_01_010301_20190513T141133"
)
logging.error("found:")
logging.error(" %s" % bname)
raise
# endif
# fill data record:
data = collections.OrderedDict()
tfmt = "%Y%m%dT%H%M%S"
data["start_time"] = datetime.datetime.strptime(start_time, tfmt)
data["end_time"] = datetime.datetime.strptime(end_time, tfmt)
data["mission"] = mission
data["processing"] = processing
data["product_id"] = product_id
data["orbit"] = orbit
data["collection"] = collection
data["processor_version"] = processor_version
if len(prod_time) == 8:
data["processing_time"] = datetime.datetime.strptime(
prod_time, "%Y%m%d"
)
else:
data["processing_time"] = datetime.datetime.strptime(
prod_time, tfmt
)
# endif
# update record:
listing.UpdateRecord(fname, data, indent=indent + " ")
# endif # new record?
# endfor # filenames
# endfor # patterns
## testing ...
# break
# next hour:
t = t + datetime.timedelta(hours=1)
# endwhile
# save:
listing.Close(indent=indent + " ")
else:
# info ..
logging.info(indent + "keep %s ..." % lst_file)
# endif
# info ...
logging.info(indent + "")
logging.info(indent + "** end listing")
logging.info(indent + "")
# enddef __init__
# endclass CSO_SciHub_Listing
#
# * plot listing
#
class CSO_SciHub_ListingPlot(utopya.UtopyaRc):
"""
Create timeseries plot of number of orbits per processor version.
Information is taken from the *listing* file created by the :py:class:`CSO_SciHub_Listing` class.
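In the settings, specify the name of the *listing* file and whether an existing
figure should be renewed; a sketch following the settings of the other classes::

! listing file; the figure gets the same name with extension ".png":
<rcbase>.file : /Scratch/Copernicus/S5p/listing-CH4__%Y-%m-%d.csv
! renew existing figure?
<rcbase>.renew : True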
"""
def __init__(self, rcfile, rcbase="", env={}, indent=""):
"""
Create the listing plot.
"""
# modules:
import os
import pandas
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# info ...
logging.info(indent + "")
logging.info(indent + "** create listing plot")
logging.info(indent + "")
# init base object:
utopya.UtopyaRc.__init__(self, rcfile=rcfile, rcbase=rcbase, env=env)
# renew output?
renew = self.GetSetting("renew", totype="bool")
# table file to be written:
lst_file = self.GetSetting("file")
# info ..
logging.info(indent + "listing file: %s" % lst_file)
# target file:
fig_file = lst_file.replace(".csv", ".png")
# create?
if (not os.path.isfile(fig_file)) or renew:
# info ..
logging.info(indent + "create %s ..." % fig_file)
# frequency:
# freq = 'MS' ; freqlabel = 'month'
freq = "W"
freqlabel = "week"
# freq = 'D' ; freqlabel = 'day'
# color list:
colors = ["purple", "blue", "cyan", "gold", "green", "red", "magenta", "brown"]
# read:
df = pandas.read_csv(
lst_file,
sep=";",
index_col="filename",
dtype="str",
parse_dates=["start_time", "end_time", "processing_time"],
)
# time range:
t1 = df["start_time"].min()
t2 = df["start_time"].max()
# full years, extra space for legends:
t1 = pandas.Timestamp(year=t1.year, month=1, day=1)
t2 = max(
t2 + pandas.Timedelta(180, "days"),
pandas.Timestamp(year=t2.year + 1, month=1, day=1),
)
# streams ('OFFL', 'RPRO', ..), stored in the "processing" column of the listing:
streams = df["processing"].unique()
streams.sort()
# processor versions ('010101', ...), stored in the "processor_version" column:
procs = df["processor_version"].unique()
procs.sort()
# count:
nproc = len(procs)
# convert processor labels '010203' to version 'v1.2.3':
proclabs = {}
for proc in procs:
proclabs[proc] = "v%i.%i.%i" % (int(proc[0:2]), int(proc[2:4]), int(proc[4:6]))
# endfor
# list of labeled processors:
labeled = []
# storage for handles used for legends:
proch = {}
streamh = {}
# new:
fig = plt.figure(figsize=(12, 4))
ax = fig.add_axes([0.05, 0.07, 0.92, 0.90])
# loop:
for iproc in range(nproc):
# current:
proc = procs[iproc]
color = colors[iproc]
# loop:
for stream in streams:
# select:
df2 = df[(df["processing"] == stream) & (df["processor_version"] == proc)]
# any?
if len(df2) > 0:
# group by month, count number of orbits:
nn = (
df2.set_index("start_time")
.groupby(pandas.Grouper(freq=freq))["orbit"]
.count()
)
# annotate:
proclab = proclabs[proc]
if proc not in labeled:
label = proclab
labeled.append(proc)
else:
label = None
# endif
# style:
if freq in ["MS"]:
style = dict(color=color, linestyle="-", marker="o")
if stream == "OFFL":
style["linestyle"] = "--"
else:
style = dict(
color=color, linestyle="None", marker="o", markerfacecolor="None"
)
if stream == "RPRO":
style["markerfacecolor"] = color
# endif
# plot non-zero values:
p = ax.plot(nn[nn > 0], label=label, **style)
# store handle for legends:
if proclab not in proch.keys():
proch[proclab] = p[0]
if stream not in streamh.keys():
streamh[stream] = p[0]
# endif
# endfor # streams
# endfor # processors
# time axis:
ax.set_xlim((t1, t2))
ax.xaxis.set_minor_locator(mdates.MonthLocator())
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y"))
ax.grid(axis="x")
# y-axis:
ax.set_ylabel("# orbits / %s" % freqlabel)
ax.set_ylim((0, None))
# annotate:
# plt.legend()
# legend for streams:
ax.add_artist(
plt.legend(list(streamh.values()), list(streamh.keys()), loc="upper right")
)
ax.add_artist(plt.legend(list(proch.values()), list(proch.keys()), loc="center right"))
# save:
fig.savefig(fig_file)
else:
# info ..
logging.info(indent + "keep %s ..." % fig_file)
# endif # renew
# info ...
logging.info(indent + "")
logging.info(indent + "** end listing plot")
logging.info(indent + "")
# enddef __init__
# endclass CSO_SciHub_ListingPlot
########################################################################
###
### end
###
########################################################################