DAS data management: exercises
Section outline
- This notebook provides a comprehensive tutorial on using the `boto3` library in Python to access and download data from S3-compatible object storage. Steps covered:
  - Installation & Imports: Setting up `boto3` and configuring it for `UNSIGNED` access, which allows you to retrieve public data without needing AWS credentials.
  - Resource vs. Client: Demonstrating the two ways to interact with S3:
    - Resource: a higher-level, object-oriented interface (e.g., using `s3r.Bucket`).
    - Client: a lower-level interface that maps closely to the actual service API (e.g., using `s3c.list_objects`).
  - Data Exploration: How to list buckets, iterate through objects, and inspect metadata such as file sizes and keys.
  - File Operations: Practical examples of downloading files (such as `README.txt` and `3u2023.json`) directly to the local environment and reading their contents.

  The examples specifically use the GFZ Potsdam S3 endpoint to explore DAS datasets.
- This notebook demonstrates a workflow for accessing and validating Distributed Acoustic Sensing (DAS) metadata. Steps covered:
  - Environment Setup: Installation of `boto3` (for S3 access) and `jsonschema` (for data validation).
  - S3 Data Retrieval: Configures a `boto3` client to access a public S3 endpoint (GFZ Potsdam) using unsigned requests, then downloads a metadata file named `3u2023.json`.
  - Schema Fetching: Uses the `requests` library to fetch the official FDSN DAS-Metadata JSON schema (v2.0) from GitHub.
  - Validation:
    - Initial validation of the downloaded metadata against the schema to ensure compliance.
    - A demonstration of how schema validation works by intentionally introducing a type error (changing an array to a string) and showing the resulting `ValidationError`.
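The validation exercise can be reproduced in miniature as below. The schema here is a toy stand-in, not the FDSN DAS-Metadata schema, and the field names are invented for illustration; the point is the same pattern: valid metadata passes silently, then an array is replaced by a string and `jsonschema` raises a `ValidationError`.

```python
from jsonschema import validate, ValidationError

# Toy schema: stand-in for the real FDSN DAS-Metadata JSON schema.
schema = {
    "type": "object",
    "properties": {
        "network_code": {"type": "string"},
        "channels": {"type": "array", "items": {"type": "integer"}},
    },
    "required": ["network_code", "channels"],
}

metadata = {"network_code": "3U", "channels": [1, 2, 3]}
validate(instance=metadata, schema=schema)  # compliant: passes silently

# Intentionally introduce a type error: array -> string.
metadata["channels"] = "1,2,3"
try:
    validate(instance=metadata, schema=schema)
except ValidationError as err:
    print("ValidationError:", err.message)
```

The `err.message` text pinpoints the offending value and the expected type, which is what makes schema validation useful for catching malformed metadata early.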
- This notebook demonstrates how to load data from an HDF5 file, convert it to a Zarr array, and then visualise it. Steps covered:
  - Setting Up: Installing the `zarr` library and importing the necessary packages.
  - Data Loading: Downloading an HDF5 data file from a URL and loading it into an in-memory buffer.
  - HDF5 Inspection: Opening the HDF5 file and inspecting its 'data' group and 'header' metadata to extract important parameters such as time, channels, and units.
  - Zarr Conversion: Creating a `zarr.storage.MemoryStore` and converting the HDF5 data into an Xarray Dataset, which is then saved to the Zarr store with specified chunking.
  - Zarr Access and Visualization: Demonstrating how to open and access the Zarr array directly or as an Xarray Dataset, then visualising slices of the data with `matplotlib.pyplot.imshow`, showing strain rate over time and channel.