Downloading time series from Theia can take a while, depending on the area and the time period covered by the images. It also consumes a lot of local storage, since every archive has to be downloaded in full as a .zip file (the GeoDataHub will probably distribute Cloud Optimized GeoTIFFs). In my experience, many people download an archive, extract the files, keep only the ones they need (or sometimes compute what they want from the product and store only the compressed result), delete the archive, then move on to the next one. This approach is far from optimal!

Theia-picker is a small Python package that downloads either whole archives or individual files from a remote archive. When individual files are requested, only the bytes holding the compressed file inside the remote archive are downloaded; they are then decompressed and written to disk. This is particularly interesting when only a few files are needed, since there is no need to download the entire archive. It should improve workflows that download product archives just to grab 3 or 4 spectral bands…
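To make the "decompress only the bytes of one file" step concrete, here is a minimal sketch using only the standard library. It is not theia-picker's code: `extract_member` is a hypothetical helper, and it assumes the bytes of one member (local file header plus compressed data) are already in memory, that the member is either stored or deflated, and that the archive is not ZIP64.

```python
import struct
import zlib

# Fixed 30-byte part of a zip local file header:
# signature, version, flags, method, time, date, crc32,
# compressed size, uncompressed size, name length, extra length.
LOCAL_HEADER_FMT = "<4s5H3I2H"
LOCAL_HEADER_SIZE = struct.calcsize(LOCAL_HEADER_FMT)


def extract_member(blob, header_offset=0):
    """Decompress one zip member whose local header starts at header_offset.

    Hypothetical helper for illustration; `blob` must contain the local
    header and all the compressed bytes of the member.
    """
    (sig, _version, flags, method, _time, _date, crc, csize, _usize,
     name_len, extra_len) = struct.unpack_from(LOCAL_HEADER_FMT, blob, header_offset)
    if sig != b"PK\x03\x04":
        raise ValueError("not a local file header")
    if flags & 0x08:
        # Sizes and CRC live in a trailing data descriptor in that case;
        # a real tool would take them from the Central Directory instead.
        raise ValueError("data descriptor in use, sizes unknown here")

    start = header_offset + LOCAL_HEADER_SIZE + name_len + extra_len
    raw = blob[start:start + csize]

    if method == 0:          # stored, no compression
        data = raw
    elif method == 8:        # deflate, raw stream without zlib header
        data = zlib.decompress(raw, wbits=-15)
    else:
        raise NotImplementedError(f"compression method {method}")

    # The CRC32 in the header covers the uncompressed data.
    if zlib.crc32(data) & 0xFFFFFFFF != crc:
        raise ValueError("CRC32 mismatch")
    return data
```

The only bytes this function touches are the member's header and its compressed payload, which is why fetching that slice alone is enough.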

To ensure that downloads complete correctly, theia-picker computes checksums (MD5 for the archives, CRC32 for the individually extracted files). When a file's checksum doesn't match the expected value, the file is downloaded again.
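As an illustration of that kind of check, the sketch below computes both checksums with the standard library. The function names are hypothetical and this is not theia-picker's own implementation; it only shows how MD5 and CRC32 can be computed chunk by chunk, so that large files never have to fit in memory.

```python
import hashlib
import zlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def crc32_of_file(path, chunk_size=1 << 20):
    """Return the CRC32 of a file as an unsigned 32-bit integer."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)  # running CRC over successive chunks
    return crc & 0xFFFFFFFF


def needs_redownload(path, expected_crc32):
    """True when the file on disk does not match the expected CRC32."""
    return crc32_of_file(path) != expected_crc32
```

For an extracted file, the expected CRC32 is the one recorded in the zip archive itself, so no extra metadata is needed to validate it.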

How is it done? Zip archives describe their contents in a data block at the end of the file called the Central Directory [1]. From this data block, the information about every compressed file (name, offset in the archive, compressed size, checksum) can be retrieved.
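Here is a rough sketch of how the Central Directory can be read from just the tail of an archive. `list_entries` is a hypothetical helper, not part of theia-picker. It assumes a plain (non-ZIP64) archive and that the tail passed in is long enough to contain the whole Central Directory.

```python
import struct

EOCD_SIG = b"PK\x05\x06"     # End of Central Directory record
CDH_SIG = b"PK\x01\x02"      # Central Directory file header
EOCD_FMT = "<4s4H2IH"        # 22 bytes, fixed part
CDH_FMT = "<4s6H3I5H2I"      # 46 bytes, fixed part
CDH_SIZE = struct.calcsize(CDH_FMT)


def list_entries(tail, tail_offset=0):
    """List the files described by the Central Directory.

    `tail` holds the last bytes of the archive and `tail_offset` is the
    absolute position of tail[0] in the full archive.
    """
    eocd_pos = tail.rfind(EOCD_SIG)
    if eocd_pos < 0:
        raise ValueError("End of Central Directory record not found")
    (_sig, _disk, _cd_disk, _n_here, n_entries,
     _cd_size, cd_offset, _comment_len) = struct.unpack_from(EOCD_FMT, tail, eocd_pos)

    pos = cd_offset - tail_offset
    if pos < 0:
        raise ValueError("tail too short to contain the Central Directory")

    entries = []
    for _ in range(n_entries):
        fields = struct.unpack_from(CDH_FMT, tail, pos)
        if fields[0] != CDH_SIG:
            raise ValueError("corrupt Central Directory")
        flags, method = fields[3], fields[4]
        crc, csize, usize = fields[7], fields[8], fields[9]
        name_len, extra_len, comment_len = fields[10], fields[11], fields[12]
        header_offset = fields[16]
        # Bit 11 of the flags marks UTF-8 names; CP437 is the legacy default.
        encoding = "utf-8" if flags & 0x800 else "cp437"
        name = tail[pos + CDH_SIZE:pos + CDH_SIZE + name_len].decode(encoding)
        entries.append({
            "name": name,
            "method": method,
            "crc32": crc,
            "compressed_size": csize,
            "uncompressed_size": usize,
            "header_offset": header_offset,
        })
        pos += CDH_SIZE + name_len + extra_len + comment_len
    return entries
```

Each entry's `header_offset` and `compressed_size` say where the file's bytes sit in the archive, which is all that is needed to fetch that file alone.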

To access this data block, one can use HTTP range requests. These are HTTP GET requests with an additional header that specifies the range of bytes to access: {'Range': 'bytes=start-end'}. This is enough to retrieve the file information, and also to download and decompress the files.
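A range request of this kind can be sketched with the standard library as follows. The function names and the URL in the usage note are placeholders, and authentication (which a real Theia download requires) is left out.

```python
import urllib.request


def build_range_request(url, start, end):
    """Build a GET request for bytes start..end of url (both ends inclusive)."""
    if start < 0 or end < start:
        raise ValueError("invalid byte range")
    return urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})


def fetch_range(url, start, end, timeout=30):
    """Download bytes start..end of url, assuming the server honours ranges."""
    request = build_range_request(url, start, end)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # 206 Partial Content means the range was applied; a 200 would
        # mean the server ignored the header and is sending everything.
        if response.status != 206:
            raise RuntimeError(f"range not honoured (HTTP {response.status})")
        return response.read()
```

For instance, `fetch_range("https://example.com/archive.zip", 0, 99)` would return the first 100 bytes, since both bounds are inclusive.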

At least, this is the theory… In practice, you can try that with the Theia server: it won't work very well! I don't know exactly why, but this is the reason every existing package for remote zip retrieval fails: the server just closes the connection before sending all the requested bytes. Theia-picker's workaround consists in always requesting all the bytes from the start offset onward, using the byte-range header {'Range': 'bytes=start-'} (without 'end', which still complies with the standard [2]), and closing the connection once the desired number of bytes has been received.
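The workaround can be sketched like this. Again, this is an illustration under assumptions rather than theia-picker's actual code: `read_exactly` and `fetch_open_ended` are hypothetical names, and authentication is omitted.

```python
import urllib.request


def read_exactly(stream, length, chunk_size=64 * 1024):
    """Read exactly `length` bytes from a file-like object, then stop."""
    buffer = bytearray()
    while len(buffer) < length:
        # Never ask for more than what is still missing.
        chunk = stream.read(min(chunk_size, length - len(buffer)))
        if not chunk:
            raise EOFError(f"stream ended after {len(buffer)} of {length} bytes")
        buffer.extend(chunk)
    return bytes(buffer)


def fetch_open_ended(url, start, length, timeout=30):
    """Request every byte from `start` onward, but keep only `length` of them."""
    request = urllib.request.Request(url, headers={"Range": f"bytes={start}-"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        data = read_exactly(response, length)
    # Leaving the with-block closes the connection, so the rest of the
    # archive is never transferred.
    return data
```

The point of the design is that the client, not the server, decides when the transfer ends.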

Theia-picker is open-source (Apache-2.0 license) and anyone can open a PR on GitHub. Currently, it has not been extensively tested, and feedback is welcome. The API is also quite minimal (in particular for searching products), and contributions are welcome!

An example copied from the GitHub README.

Rémi Cresson @ INRAE



