Section outline

    • This notebook provides a comprehensive tutorial on using the boto3 library in Python to access and download data from S3-compatible object storage.

      Steps covered:

      1. Installation & Imports: Setting up boto3 and configuring it for UNSIGNED access, which allows you to retrieve public data without needing AWS credentials.
      2. Resource vs. Client: Demonstrating the two ways to interact with S3:
        • Resource: A higher-level, object-oriented interface (e.g., using s3r.Bucket).
        • Client: A lower-level interface that maps closely to the actual service API (e.g., using s3c.list_objects).
      3. Data Exploration: How to list buckets, iterate through objects, and inspect metadata like file sizes and keys.
      4. File Operations: Practical examples of downloading files (like README.txt and 3u2023.json) directly to the local environment and reading their contents.

      The examples specifically use the GFZ Potsdam S3 endpoint to explore DAS datasets.
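  The anonymous-access pattern above can be sketched as follows. The endpoint URL, bucket name, and region are placeholders, not the notebook's actual values; substitute the real GFZ Potsdam endpoint and DAS bucket.

  ```python
  import boto3
  from botocore import UNSIGNED
  from botocore.config import Config

  # Placeholders -- replace with the actual GFZ Potsdam endpoint and bucket.
  ENDPOINT = "https://s3.example-endpoint.org"
  BUCKET = "example-das-bucket"

  # UNSIGNED skips credential signing, so public data needs no AWS account.
  cfg = Config(signature_version=UNSIGNED)

  # Client: low-level interface mapping closely to the S3 service API.
  s3c = boto3.client("s3", endpoint_url=ENDPOINT,
                     region_name="us-east-1", config=cfg)

  # Resource: higher-level, object-oriented interface.
  s3r = boto3.resource("s3", endpoint_url=ENDPOINT,
                       region_name="us-east-1", config=cfg)

  # Data exploration (client style): list objects and inspect metadata.
  resp = s3c.list_objects(Bucket=BUCKET)
  for obj in resp.get("Contents", []):
      print(obj["Key"], obj["Size"])

  # Data exploration (resource style): iterate over a bucket's objects.
  for obj in s3r.Bucket(BUCKET).objects.limit(10):
      print(obj.key)

  # File operation: download an object to the local environment.
  s3c.download_file(BUCKET, "README.txt", "README.txt")
  ```

  `list_objects` returns at most 1000 keys per call; for larger buckets the resource-style iterator (or a client paginator) handles continuation automatically.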

    • This notebook demonstrates a workflow for accessing and validating Distributed Acoustic Sensing (DAS) metadata. 

      Steps covered:

      1. Environment Setup: Installation of boto3 (for S3 access) and jsonschema (for data validation);
      2. S3 Data Retrieval: Configuring a boto3 client to access a public S3 endpoint (GFZ Potsdam) using unsigned requests, and downloading a metadata file named 3u2023.json;
      3. Schema Fetching: Uses the requests library to fetch the official FDSN DAS-Metadata JSON schema (v2.0) from GitHub;
      4. Validation:
        • Initial validation of the downloaded metadata against the schema to ensure compliance.
        • A demonstration of how schema validation works by intentionally introducing a type error (changing an array to a string) and showing the resulting ValidationError.
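  The validation workflow can be sketched as below. The schema and metadata here are toy stand-ins for the FDSN DAS-Metadata schema (v2.0) and the downloaded 3u2023.json, chosen only to show the pass/fail mechanics.

  ```python
  import jsonschema

  # Toy schema standing in for the FDSN DAS-Metadata JSON schema.
  schema = {
      "type": "object",
      "properties": {
          "interrogators": {"type": "array"},
      },
      "required": ["interrogators"],
  }

  # Toy metadata standing in for the downloaded 3u2023.json document.
  metadata = {"interrogators": [{"interrogator_id": "A0000"}]}

  # Initial validation: returns None silently when the document conforms.
  jsonschema.validate(instance=metadata, schema=schema)

  # Intentionally introduce a type error (array -> string) to trigger
  # and inspect the resulting ValidationError.
  metadata["interrogators"] = "not-an-array"
  try:
      jsonschema.validate(instance=metadata, schema=schema)
  except jsonschema.ValidationError as err:
      print("Validation failed:", err.message)
  ```

  In the notebook, the real schema would be fetched with `requests.get(...).json()` from the FDSN GitHub repository and passed in as `schema`.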

    • This notebook demonstrates how to load data from an HDF5 file, convert it to a Zarr array, and then visualise it.  

      Steps covered:

      1. Setting up: Installing the zarr library and importing necessary packages;
      2. Data Loading: Downloading an HDF5 data file from a URL and loading it into an in-memory buffer;
      3. HDF5 Inspection: Opening the HDF5 file and inspecting its 'data' group and 'header' metadata to extract important parameters like time, channels, and units;
      4. Zarr Conversion: Creating a zarr.storage.MemoryStore and converting the HDF5 data into an Xarray Dataset, which is then saved to the Zarr store with specified chunking;
      5. Zarr Access and Visualization: Demonstrating how to open and access the Zarr array directly or as an Xarray Dataset, then visualising slices of the data with matplotlib.pyplot.imshow to show strain rate over time and channel.
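  The HDF5-to-Zarr pipeline above can be sketched as follows. The dataset layout (a `data` array plus `header` attributes), attribute names, shapes, and chunk sizes are assumptions for illustration; the notebook's real file would be downloaded into the buffer instead of being synthesised.

  ```python
  import io

  import h5py
  import numpy as np
  import xarray as xr
  import zarr
  import matplotlib.pyplot as plt

  # Synthetic stand-in for the downloaded HDF5 buffer.
  buf = io.BytesIO()
  with h5py.File(buf, "w") as f:
      f.create_dataset("data", data=np.random.randn(1000, 64).astype("f4"))
      hdr = f.create_group("header")
      hdr.attrs["dt"] = 0.001            # assumed sample interval [s]
      hdr.attrs["unit"] = "strain rate"  # assumed unit attribute

  # Inspect the HDF5 file and extract data plus header parameters.
  buf.seek(0)
  with h5py.File(buf, "r") as f:
      data = f["data"][...]
      dt = float(f["header"].attrs["dt"])

  # Build an Xarray Dataset with time/channel coordinates.
  ds = xr.Dataset(
      {"strain_rate": (("time", "channel"), data)},
      coords={"time": np.arange(data.shape[0]) * dt,
              "channel": np.arange(data.shape[1])},
  )

  # Save to an in-memory Zarr store with explicit chunking.
  store = zarr.storage.MemoryStore()
  ds.to_zarr(store, mode="w",
             encoding={"strain_rate": {"chunks": (500, 32)}})

  # Reopen from the store and visualise a slice.
  ds2 = xr.open_zarr(store)
  plt.imshow(ds2["strain_rate"].values.T, aspect="auto", origin="lower")
  plt.xlabel("time sample")
  plt.ylabel("channel")
  ```

  Writing through an in-memory store keeps the example self-contained; pointing the same `to_zarr`/`open_zarr` calls at a directory path persists the chunked array to disk instead.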