Cleaning methods when loading optical bands#
Let’s take a look at the cleaning methods available for optical bands and how long each of them may take.
Warning: The durations shown below may not be representative of your computer’s performance. Take them as a hint about the relative performance between methods and constellations.
To summarize:
RAW: fast and dirty
NODATA: used by default, still relatively fast; sets nodata outside the detector footprints
CLEAN: the most complete method (used before version 0.11.0), but it can be very slow. As defective pixels are relatively rare, it may be overkill for your usage.
Note that these keywords work with both the load and stack functions.
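The cells on this page pass the cleaning method with the `**{CLEAN_OPTICAL: ...}` syntax. This is plain Python dictionary unpacking: the keyword name is stored in a constant, so it is given as a dictionary key instead of being typed literally. A minimal sketch of the idea, in which `fake_load`, the `"clean_optical"` string and the local `CleanMethod` enum are stand-ins invented for illustration and may differ from EOReader's actual definitions:

```python
from enum import Enum


class CleanMethod(Enum):
    """Stand-in for eoreader.products.CleanMethod (member values are assumptions)."""
    RAW = "raw"
    NODATA = "nodata"
    CLEAN = "clean"


# Stand-in for eoreader.keywords.CLEAN_OPTICAL; the real string may differ.
CLEAN_OPTICAL = "clean_optical"


def fake_load(band, **kwargs):
    """Hypothetical loader: returns the band and the cleaning method it was asked to use."""
    # Fall back to NODATA, which this page describes as the default method.
    return band, kwargs.get(CLEAN_OPTICAL, CleanMethod.NODATA)


# Unpacking a dict whose key is the constant...
via_constant = fake_load("GREEN", **{CLEAN_OPTICAL: CleanMethod.RAW})
# ...is equivalent to spelling the keyword out by hand.
via_literal = fake_load("GREEN", clean_optical=CleanMethod.RAW)
# Without the keyword, the default method applies.
default = fake_load("GREEN")
```

Using the constant avoids typos in the keyword name, since a misspelled keyword would otherwise be silently swallowed by `**kwargs`.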
Try with Landsat-8#
Let’s open a Landsat-8 OLI Collection 2 tile. Landsat Collection 2 products manage their nodata and defective pixels through two flag files:
QA_PIXEL
QA_RADSAT
See more about these files here
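Both flag files are bit-packed integers: each bit of a pixel value carries one flag. The sketch below shows how such flags could be decoded for a single pixel. The bit positions are assumptions taken from memory of the USGS Collection 2 format (bit 0 of the pixel quality file for fill, bit 2 of the saturation file for band 3, the green band on Landsat-8) and should be checked against the official documentation; the helper names are hypothetical and are not EOReader functions.

```python
# Assumed bit positions; verify them against the USGS Collection 2 documentation.
QA_PIXEL_FILL_BIT = 0    # pixel quality file: fill (nodata) flag
QA_RADSAT_GREEN_BIT = 2  # saturation file: band 3 (green on Landsat-8) saturation flag


def bit_is_set(value: int, bit: int) -> bool:
    """Return True if the given bit of an integer flag value is 1."""
    return (value >> bit) & 1 == 1


def classify_pixel(qa_pixel: int, qa_radsat: int) -> str:
    """Label one pixel from its two flag values (illustrative only)."""
    if bit_is_set(qa_pixel, QA_PIXEL_FILL_BIT):
        return "nodata"
    if bit_is_set(qa_radsat, QA_RADSAT_GREEN_BIT):
        return "saturated"
    return "valid"


# (pixel quality value, saturation value) for three made-up pixels
samples = [(0b1, 0b0), (0b0, 0b100), (0b1000000, 0b0)]
labels = [classify_pixel(qa_pixel, qa_radsat) for qa_pixel, qa_radsat in samples]
print(labels)
```

On real rasters the same shift-and-mask test would be applied to whole arrays at once rather than pixel by pixel.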
# Imports
import os
from eoreader.reader import Reader
from eoreader.bands import GREEN
from eoreader.keywords import CLEAN_OPTICAL
from eoreader.products import CleanMethod
# Open the product
folder = os.path.join("/home", "ds2_db3", "CI", "eoreader", "optical")
path = os.path.join(folder, "LC08_L1TP_200030_20201220_20210310_02_T1.tar")
reader = Reader()
prod = reader.open(path)
There is no existing products in EOReader corresponding to /home/ds2_db3/CI/eoreader/optical/LC08_L1TP_200030_20201220_20210310_02_T1.tar.
Time the RAW method#
The RAW method is simple: just open the given tile with no pixel processing.
%%timeit
prod.load(
    GREEN,
    **{CLEAN_OPTICAL: CleanMethod.RAW}
)
prod.clean_tmp()
AttributeError: 'NoneType' object has no attribute 'load'
Time the NODATA method#
Only the detector nodata is processed by the NODATA method.
The bands will be set to nodata outside of the detector footprint (instead of keeping the raw nodata value).
%%timeit
prod.load(
    GREEN,
    **{CLEAN_OPTICAL: CleanMethod.NODATA}
)
prod.clean_tmp()
The slowest run took 9.07 times longer than the fastest. This could mean that an intermediate result is being cached.
8.87 s ± 9.71 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
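When the spread between runs is this large, the mean and standard deviation say little, and the fastest run is often a more stable figure to compare. Outside a notebook, the standard `timeit` module gives access to the individual runs. A minimal sketch under stated assumptions: `cached_load` is a hypothetical function that simulates a loader whose first call is slow and whose later calls hit a cache; it is not EOReader code.

```python
import time
import timeit

_cache = {}


def cached_load(band):
    """Hypothetical loader: slow on the first call, fast once the result is cached."""
    if band not in _cache:
        time.sleep(0.05)  # simulate reading and processing the band from disk
        _cache[band] = f"{band} array"
    return _cache[band]


# Five runs of one call each, similar in spirit to "N runs, 1 loop each".
durations = timeit.repeat(lambda: cached_load("GREEN"), repeat=5, number=1)

fastest = min(durations)
slowest = max(durations)
print(f"fastest: {fastest:.6f} s, slowest: {slowest:.6f} s")
```

Here the first run pays the full cost and the following ones do not, which is the kind of pattern the warning above hints at.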
Time the CLEAN method#
Every defective pixel flagged by the provider is processed by the CLEAN method.
These pixels will be set to nodata.
%%timeit
prod.load(
    GREEN,
    **{CLEAN_OPTICAL: CleanMethod.CLEAN}
)
prod.clean_tmp()
4.1 s ± 323 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Try another product: Sentinel-2#
Let’s open a Sentinel-2 product (processing baseline < 04.00, i.e. acquired roughly before the end of 2021, with flag files provided as vectors).
The invalid pixels are retrieved from the following files:
DETFOO: Detector footprint (nodata outside the detectors)
NODATA: Pixel nodata (inside the detectors) (QT_NODATA_PIXELS)
DEFECT: Defective pixels
SATURA: Saturated pixels
TECQUA: Technical quality mask (MSI_LOST, MSI_DEG)
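Conceptually, the three cleaning methods differ in which of these masks they apply to the band. The sketch below illustrates that idea on a six-pixel toy band. It assumes that NODATA uses only the detector footprint and that CLEAN uses the union of all masks, which is a reading of the descriptions on this page rather than EOReader's actual implementation; the values and the `apply_masks` helper are invented for illustration.

```python
NODATA_VALUE = None  # stand-in nodata marker for this sketch

# Toy band of six pixels (made-up digital numbers)
band = [120, 135, 0, 4095, 140, 0]

# One boolean mask per flag file; True means "invalid pixel"
masks = {
    "DETFOO": [False, False, False, False, False, True],
    "NODATA": [False, False, True, False, False, False],
    "DEFECT": [False, True, False, False, False, False],
    "SATURA": [False, False, False, True, False, False],
    "TECQUA": [False, False, False, False, False, False],
}


def apply_masks(values, mask_names):
    """Set to nodata every pixel flagged by at least one of the named masks."""
    invalid = [
        any(masks[name][i] for name in mask_names) for i in range(len(values))
    ]
    return [NODATA_VALUE if bad else value for value, bad in zip(values, invalid)]


raw = apply_masks(band, [])             # RAW: no mask applied
nodata = apply_masks(band, ["DETFOO"])  # NODATA: detector footprint only (assumption)
clean = apply_masks(band, list(masks))  # CLEAN: union of all masks (assumption)

print(raw)
print(nodata)
print(clean)
```

The more masks a method has to read and combine, the more work it does per band, which is where the timing differences come from.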
Note: Open the 20 m bands, to have array shapes comparable to Landsat-8.
# Open the product
path = os.path.join(folder, "S2B_MSIL2A_20200114T065229_N0213_R020_T40REQ_20200114T094749.SAFE")
prod = reader.open(path)
Time the RAW method#
The RAW method is simple: just open the given tile with no pixel processing.
%%timeit
prod.load(
    GREEN,
    pixel_size=20.,
    **{CLEAN_OPTICAL: CleanMethod.RAW}
)
prod.clean_tmp()
4.86 s ± 231 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Time the NODATA method#
Only the detector nodata is processed by the NODATA method.
The bands will be set to nodata outside of the detector footprint (instead of keeping the raw nodata value).
%%timeit
prod.load(
    GREEN,
    pixel_size=20.,
    **{CLEAN_OPTICAL: CleanMethod.NODATA}
)
prod.clean_tmp()
5.4 s ± 469 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Time the CLEAN method#
Every defective pixel flagged by the provider is processed by the CLEAN method.
These pixels will be set to nodata.
%%timeit
prod.load(
    GREEN,
    pixel_size=20.,
    **{CLEAN_OPTICAL: CleanMethod.CLEAN}
)
prod.clean_tmp()
5.62 s ± 507 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Try with the latest Sentinel-2 baseline#
Let’s open a Sentinel-2 product (processing baseline >= 04.00, i.e. acquired roughly after the end of 2021, with flag files provided as rasters).
The invalid pixels are retrieved from a single file:
QUALIT: Regroups TECQUA, DEFECT, NODATA and SATURA
The nodata pixels (outside detector footprints) are now retrieved from null pixels, as a radiometric offset has been added.
See here for more information about the processing baseline update.
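The reason null pixels can now be used as nodata is that, with the offset, a valid pixel with zero reflectance no longer has a digital number of 0. A minimal sketch of the conversion, assuming an additive offset of -1000 and a quantification value of 10000; these constants are assumptions for illustration and the real values should be read from the product metadata.

```python
import math

# Assumed values for baseline >= 04.00; read the real ones from the product metadata.
ADD_OFFSET = -1000
QUANTIFICATION_VALUE = 10000


def to_reflectance(dn: int) -> float:
    """Convert a digital number to reflectance, treating DN 0 as nodata."""
    if dn == 0:
        return math.nan  # null pixel: outside the detector footprints
    return (dn + ADD_OFFSET) / QUANTIFICATION_VALUE


dns = [0, 1000, 1500, 11000]
reflectances = [to_reflectance(dn) for dn in dns]
print(reflectances)
```

With these assumed constants, a digital number of 1000 maps to a reflectance of 0, so a digital number of 0 can only mean nodata.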
Note: Open the 20 m bands, to have array shapes comparable to Landsat-8.
# Open the product
path = os.path.join(folder, "S2B_MSIL2A_20210517T103619_N7990_R008_T30QVE_20211004T113819.SAFE")
prod = reader.open(path)
Time the RAW method#
The RAW method is simple: just open the given tile with no pixel processing.
%%timeit
prod.load(
    GREEN,
    pixel_size=20.,
    **{CLEAN_OPTICAL: CleanMethod.RAW}
)
prod.clean_tmp()
4.79 s ± 262 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Time the NODATA method#
Only the detector nodata is processed by the NODATA method.
The bands will be set to nodata outside of the detector footprint (instead of keeping the raw nodata value).
%%timeit
prod.load(
    GREEN,
    pixel_size=20.,
    **{CLEAN_OPTICAL: CleanMethod.NODATA}
)
prod.clean_tmp()
5.57 s ± 136 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Time the CLEAN method#
Every defective pixel flagged by the provider is processed by the CLEAN method.
These pixels will be set to nodata.
%%timeit
prod.load(
    GREEN,
    pixel_size=20.,
    **{CLEAN_OPTICAL: CleanMethod.CLEAN}
)
prod.clean_tmp()
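The cells on this page repeat the same pattern for each method, so the comparison could also be written as a loop. The sketch below is one possible way to do that outside a notebook. `time_methods` and `fake_load` are hypothetical helpers, the local `CleanMethod` enum is a stand-in for the EOReader one, and the sleep durations are arbitrary numbers, not measurements.

```python
import time
from enum import Enum


class CleanMethod(Enum):
    """Stand-in for eoreader.products.CleanMethod (member values are assumptions)."""
    RAW = "raw"
    NODATA = "nodata"
    CLEAN = "clean"


def time_methods(load, methods, repeat=3):
    """Return the best wall-clock duration (in seconds) of load(method) per method."""
    best = {}
    for method in methods:
        durations = []
        for _ in range(repeat):
            start = time.perf_counter()
            load(method)
            durations.append(time.perf_counter() - start)
        best[method] = min(durations)
    return best


def fake_load(method):
    """Hypothetical workload: sleeps longer for the heavier methods."""
    cost = {CleanMethod.RAW: 0.01, CleanMethod.NODATA: 0.03, CleanMethod.CLEAN: 0.06}
    time.sleep(cost[method])


timings = time_methods(fake_load, list(CleanMethod))
for method, seconds in timings.items():
    print(f"{method.name}: {seconds:.3f} s")
```

With a real product, `load` could wrap the two calls used in the cells above, `prod.load(GREEN, **{CLEAN_OPTICAL: method})` followed by `prod.clean_tmp()`, so that every run starts without temporary files.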