# Getting started with openEO and Xarray and Dask
As a back-end provider who wants to offer datasets, processes and infrastructure to a broader audience through a standardized interface, you may want to implement a driver for openEO.
First of all, read the getting started guide for service providers carefully.
Note
The Xarray-Dask implementation for openEO is not a full-fledged, out-of-the-box openEO back-end, but it can form the data management and processing part of the infrastructure. Specifically, it can serve as a data source for EO Data Discovery and, for example in combination with a Dask cluster, as the processing back-end for Data Processing. In any case, an HTTP REST interface must be available in front of the process implementations to properly answer openEO requests.
There are two main components involved with openEO and Xarray:
# Process Graph Parser for Python
- Repository: openeo-pg-parser-networkx
This pg-parser parses openEO process graphs from raw JSON into fully traversable networkx graph objects.
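To illustrate why a process graph maps naturally onto a graph object, here is a minimal sketch that uses only the Python standard library, not the pg-parser itself. It assumes a flat process graph in which nodes reference each other through `from_node`; the collection id, extents and band names are placeholders, and `find_dependencies` and `execution_order` are hypothetical helper names.

```python
from graphlib import TopologicalSorter

# A flat openEO process graph: every node names a process and its arguments.
# Collection id, extents and band names are placeholder values.
process_graph = {
    "load": {
        "process_id": "load_collection",
        "arguments": {
            "id": "EXAMPLE_COLLECTION",
            "spatial_extent": {"west": 11.0, "south": 46.0, "east": 11.5, "north": 46.5},
            "temporal_extent": ["2023-06-01", "2023-07-01"],
            "bands": ["red", "nir"],
        },
    },
    "ndvi": {
        "process_id": "ndvi",
        "arguments": {"data": {"from_node": "load"}, "nir": "nir", "red": "red"},
    },
    "save": {
        "process_id": "save_result",
        "arguments": {"data": {"from_node": "ndvi"}, "format": "NetCDF"},
        "result": True,
    },
}


def find_dependencies(value):
    """Collect the node ids referenced via {"from_node": ...} in an argument value."""
    if isinstance(value, dict):
        if "from_node" in value:
            return {value["from_node"]}
        if "process_graph" in value:
            # Child graphs (e.g. reducers) have their own node ids; this sketch skips them.
            return set()
        return set().union(*(find_dependencies(v) for v in value.values()))
    if isinstance(value, list):
        return set().union(*(find_dependencies(v) for v in value))
    return set()


def execution_order(graph):
    """Return the node ids so that every node comes after the nodes it depends on."""
    dependencies = {
        node_id: find_dependencies(node["arguments"]) for node_id, node in graph.items()
    }
    return list(TopologicalSorter(dependencies).static_order())


print(execution_order(process_graph))
```

The pg-parser builds a networkx graph rather than a plain dependency mapping, but the underlying idea is the same: `from_node` references become edges, and the node marked with `result` is the end of the chain.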
The ProcessRegistry can be imported from the pg-parser. It contains Process objects, each of which has:
- spec: Process definition (e.g. https://github.com/Open-EO/openeo-processes)
- implementation: Callable process implementation (https://github.com/Open-EO/openeo-processes-dask/tree/main/openeo_processes_dask/process_implementations)
- namespace
The ProcessRegistry automatically maps from the name of a process to the spec and to the implementation.
Every Process in the ProcessRegistry requires a spec, while implementation and namespace are optional.
An example of how to use the pg-parser can be found here.
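The relationship between a process name, its spec, its implementation and its namespace can be pictured with a small stand-in. The classes below are hypothetical (`ToyProcess`, `ToyRegistry`) and are not the API of openeo-pg-parser-networkx; they only mirror the rule described above that a spec is required while implementation and namespace are optional.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class ToyProcess:
    """Hypothetical stand-in for a registry entry."""

    spec: dict  # required: the process definition
    implementation: Optional[Callable[..., Any]] = None  # optional callable
    namespace: Optional[str] = None  # optional


class ToyRegistry:
    """Hypothetical registry that maps a process name to its entry."""

    def __init__(self):
        self._processes = {}

    def register(self, process):
        # The process name is taken from the "id" field of the spec.
        self._processes[process.spec["id"]] = process

    def __getitem__(self, process_id):
        return self._processes[process_id]


def add(x, y):
    """Toy implementation of the openEO `add` process."""
    return x + y


registry = ToyRegistry()
registry.register(
    ToyProcess(
        spec={"id": "add", "parameters": [{"name": "x"}, {"name": "y"}]},
        implementation=add,
        namespace="predefined",
    )
)
# An entry with a spec only: valid, but nothing can be executed for it yet.
registry.register(ToyProcess(spec={"id": "load_collection"}))

print(registry["add"].implementation(2, 3))
```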
# Python Processes for openEO
- Repository: openeo-processes-dask
This package contains implementations of openEO processes based on Xarray and Dask. Currently, the load_collection and save_result processes are not included, as their implementations can differ widely between back-ends.
The process specs are included in openeo-processes-dask as a submodule, so that the specification and the implementation are stored close to each other.
# The load_collection and save_result process
As mentioned before, the load_collection and save_result processes are back-end-specific and therefore not included in openeo-processes-dask. The load_collection process should return a raster-cube object. To be compliant with the openeo-processes-dask implementations, this should be an xarray.DataArray backed by Dask.
# Connection to ODC and STAC
For testing with DataArrays that can be loaded from a single file, the xarray.open_dataarray() function can be used to implement a basic version of load_collection.
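A minimal sketch of such a file-based loader might look as follows. The function name `load_collection_from_file` is hypothetical, and the sketch assumes that the file contains exactly one data variable, which is what `xarray.open_dataarray()` expects.

```python
def load_collection_from_file(path, chunks="auto"):
    """Hypothetical basic load_collection: open one file as a Dask-backed DataArray."""
    # Imported inside the function so that this module can be imported without xarray.
    import xarray as xr

    # Passing `chunks` makes xarray wrap the data in Dask arrays instead of
    # reading everything into memory (requires dask to be installed).
    return xr.open_dataarray(path, chunks=chunks)
```

A real load_collection would additionally apply the spatial, temporal and band filters passed by the client; they are left out here to keep the sketch short.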
Large data sets can be organised as opendatacube Products or as STAC Collections.
- opendatacube Products: The implementation of load_collection can include the opendatacube function datacube.Datacube.load(). It is recommended to use the dask_chunks parameter when loading the data. The function returns an xarray Dataset; to be compliant with openeo-processes-dask, it can be converted to a DataArray using the Dataset.to_array(dim='bands') function. A sample load_collection process using OpenDatacube can be found here.
- STAC Collections: Alternatively, the load_collection process can be implemented using the odc.stac.load() function. To make use of dask, the chunks parameter must be set. Just as in the previous case, the resulting xarray Dataset can be converted to a DataArray with Dataset.to_array(dim='bands'). A similar implementation is the one of the load_stac process available here.
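The STAC variant and the Dataset-to-DataArray conversion can be sketched as below. The function names `to_raster_cube` and `load_stac_items` are hypothetical; the sketch assumes that `items` is an iterable of pystac Items carrying enough projection metadata for `odc.stac.load()` to determine the output grid, and that the odc-stac package is installed.

```python
def to_raster_cube(dataset, band_dim="bands"):
    """Stack the data variables of an xarray Dataset into one DataArray along `band_dim`."""
    return dataset.to_array(dim=band_dim)


def load_stac_items(items, bands, chunks):
    """Hypothetical load_collection body for STAC Items."""
    # Imported inside the function so that this module can be imported without odc-stac.
    import odc.stac

    # A non-None `chunks` mapping makes odc.stac.load return a Dask-backed Dataset.
    dataset = odc.stac.load(items, bands=bands, chunks=chunks)
    return to_raster_cube(dataset)
```

The same `to_raster_cube` helper applies to the Dataset returned by `datacube.Datacube.load()`, since both routes end in an xarray Dataset with one variable per band.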
# openEO Client Side Processing
The client-side processing functionality lets you test and use openEO and its processes locally, i.e. without any connection to an openEO back-end. It relies on two projects: openeo-pg-parser-networkx, which provides an openEO process graph parsing tool, and openeo-processes-dask, which provides an Xarray and Dask implementation of most openEO processes.
You can find more information and usage examples in the openEO Python client documentation available here.
# Adding a new process
Adding a new process requires changes in openeo-processes-dask:
- Add the process spec
- Add the process implementation
The HTTP REST interface should have a processes endpoint that reflects the process specs from openeo-processes-dask.
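As a rough sketch of what the processes endpoint has to return, the function below builds a response body with the `processes` and `links` arrays from a mapping of process specs. The function name `build_processes_response` and the shortened specs are illustrative assumptions; a real back-end would load the full JSON specs from the submodule and serve the result through its web framework.

```python
import json


def build_processes_response(specs):
    """Build the body for the processes endpoint from a mapping of process id to spec."""
    return {
        # Sorted by id so that the listing is stable between requests.
        "processes": [specs[process_id] for process_id in sorted(specs)],
        "links": [],
    }


# Shortened placeholder specs; real specs carry parameters, return values, etc.
specs = {
    "ndvi": {"id": "ndvi", "description": "placeholder"},
    "add": {"id": "add", "description": "placeholder"},
}

body = json.dumps(build_processes_response(specs))
print(body)
```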
# Add the process spec
Currently, openeo-processes-dask includes the process definitions as a submodule in openeo-processes-dask/specs. The submodule points to https://github.com/eodcgmbh/openeo-processes, a fork of https://github.com/Open-EO/openeo-processes that reflects which processes (with their implementations) are actually available in openeo-processes-dask.
# Add the process implementation
- Select a process from processes.openeo.org which does not yet have an implementation in openeo-processes-dask.
- Clone openeo-processes-dask, check out a new branch, and start implementing the missing process. Make sure you properly handle all parameters defined for this process. Add a test for your process in openeo-processes-dask/tests, ideally using dask. The create_fake_rastercube from the openeo-processes-dask/tests/mockdata can be used for testing, with the backend parameter set to numpy or dask.
- Push your code and open a PR.
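The implement-and-test step could look roughly like the sketch below. It uses the openEO process normalized_difference purely as an illustration, not as a claim about which processes are still missing. The function is written with plain arithmetic operators so that the same code accepts Python floats, NumPy arrays and Dask-backed xarray objects; the test here uses floats only, whereas a test in the repository would typically run on a mock raster cube.

```python
def normalized_difference(x, y):
    """Compute (x - y) / (x + y) element-wise, following the openEO process definition."""
    return (x - y) / (x + y)


def test_normalized_difference():
    # Plain floats keep this sketch dependency-free; the same assertions can be
    # repeated on NumPy- and Dask-backed raster cubes.
    assert normalized_difference(3.0, 1.0) == 0.5
    assert normalized_difference(1.0, 1.0) == 0.0


test_normalized_difference()
```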
# HTTP REST Interface
The next step is to set up an HTTP REST interface (i.e. an implementation of the openEO HTTP API) for the new openEO environment. It must be available in front of the process implementations to properly answer openEO client requests. Currently, the EODC and Eurac Research back-ends use Xarray and Dask and are therefore the first back-end implementations to look at.
- EODC uses a Python implementation, openeo-fastapi.
- Eurac Research relies on a Java-based implementation, openeo-spring-driver.
If you have any questions, please contact us.