Access arbitrary raster metadata from STAC
The goal of this task is to extend geowatch.cli.prepare_ta2_dataset so that it is easier to configure which raster bands we pull from STAC.
The relevant files for the STAC part of the pipeline are:
- geowatch.stac.stac_search_builder - Helper file that lets us construct a STAC search
- geowatch.cli.stac_search - Performs the basic STAC search and produces a ".input" file.
- geowatch.cli.baseline_framework_ingress - Resolves item references in the ".input" file to produce a STAC Catalog which contains links to available rasters and metadata
- geowatch.cli.stac_to_kwcoco - Finalizes the conversion from STAC to kwcoco. Note that the output kwcoco usually references remote images via GDAL's virtual file system (VSI) feature.
Given a STAC catalog json file, relevant items will have an "assets" field, which might look like this:
{"assets": {
  "quality": {
    "href": "s3://path/to/remote/rasterQA.tif",
    "type": "image/tiff; application=geotiff",
    "title": "Quality Assurance Band",
    "description": "Quality assurance bitmask image"
  },
  "dsm": {
    "href": "s3://path/to/remote/rasterDSM.tif",
    "type": "image/tiff; application=geotiff",
    "title": "DSM",
    "description": "Digital surface model used for orthorectification"
  },
  "visual": {
    "href": "s3://path/to/remote/rasterTCI.tif",
    "type": "image/tiff; application=geotiff",
    "title": "True Color Image",
    "description": "True color image (RGB), for visualization"
  },
  "camera": {
    "href": "s3://path/to/remote/rasterCAM.json",
    "type": "application/json",
    "title": "Camera Model",
    "description": "Composite RPC camera model used for orthorectification"
  },
  "B01": {
    "href": "s3://path/to/remote/rasterB01.tif",
    "type": "image/tiff; application=geotiff",
    "title": "B01 - CoastalBlue",
    "eo:bands": [ { "name": "B01", "center_wavelength": 426, "common_name": "coastal" } ]
  },
  "B02": {
    "href": "s3://path/to/remote/rasterB02.tif",
    "type": "image/tiff; application=geotiff",
    "title": "B02 - Blue",
    "eo:bands": [ { "name": "B02", "center_wavelength": 481, "common_name": "blue" } ]
  },
  "B03": {
    "href": "s3://path/to/remote/rasterB03.tif",
    "type": "image/tiff; application=geotiff",
    "title": "B03 - Green",
    "eo:bands": [ { "name": "B03", "center_wavelength": 547, "common_name": "green" } ]
  },
  "B04": {
    "href": "s3://path/to/remote/rasterB04.tif",
    "type": "image/tiff; application=geotiff",
    "title": "B04 - Yellow",
    "eo:bands": [ { "name": "B04", "center_wavelength": 605, "common_name": "yellow" } ]
  },
  "B05": {
    "href": "s3://path/to/remote/rasterB05.tif",
    "type": "image/tiff; application=geotiff",
    "title": "B05 - Red",
    "eo:bands": [ { "name": "B05", "center_wavelength": 661, "common_name": "red" } ]
  },
  "B06": {
    "href": "s3://path/to/remote/rasterB06.tif",
    "type": "image/tiff; application=geotiff",
    "title": "B06 - RedEdge",
    "eo:bands": [ { "name": "B06", "center_wavelength": 724 } ]
  },
  "B07": {
    "href": "s3://path/to/remote/rasterB07.tif",
    "type": "image/tiff; application=geotiff",
    "title": "B07 - NIR1",
    "eo:bands": [ { "name": "B07", "center_wavelength": 832, "common_name": "nir08" } ]
  },
  "B08": {
    "href": "s3://path/to/remote/rasterB08.tif",
    "type": "image/tiff; application=geotiff",
    "title": "B08 - NIR2",
    "eo:bands": [ { "name": "B08", "center_wavelength": 948, "common_name": "nir09" } ]
  }
}
}
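To make the idea of "configuring which bands to pull" concrete, here is a rough Python sketch that picks requested assets out of an item dictionary shaped like the one above. The helper names select_assets and index_by_common_name are hypothetical and are not part of geowatch; the sketch only assumes the "assets", "href", and "eo:bands" keys shown in the sample.

```python
import json


def select_assets(item, wanted_keys):
    """Split requested asset keys into those found (key -> href) and those missing."""
    assets = item.get("assets", {})
    found = {key: assets[key]["href"] for key in wanted_keys if key in assets}
    missing = [key for key in wanted_keys if key not in assets]
    return found, missing


def index_by_common_name(item):
    """Map each eo:bands common_name to the asset key that provides it."""
    index = {}
    for key, asset in item.get("assets", {}).items():
        for band in asset.get("eo:bands", []):
            name = band.get("common_name")
            if name is not None:
                index[name] = key
    return index


# A trimmed copy of the sample item above (only two assets kept for brevity).
item = json.loads("""
{"assets": {
  "dsm": {"href": "s3://path/to/remote/rasterDSM.tif",
          "type": "image/tiff; application=geotiff"},
  "B05": {"href": "s3://path/to/remote/rasterB05.tif",
          "eo:bands": [{"name": "B05", "center_wavelength": 661, "common_name": "red"}]}
}}
""")

# "B09" is deliberately absent so the missing list is exercised.
found, missing = select_assets(item, ["dsm", "B05", "B09"])
by_name = index_by_common_name(item)
print(found)
print(missing)
print(by_name)
```

Note that assets like "dsm" carry no "eo:bands" entry, so any selection mechanism that relies only on band common names would skip them; selecting by asset key (or by both) avoids that.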
We want to ensure that kwcoco-to-geojson grabs the "dsm" band in this instance. Ideally, the stac_to_kwcoco CLI makes it easy for the user to specify which bands they are interested in (perhaps with a sensorchan spec). (Now that I'm looking at it, the kwcoco file may already contain all assets from the STAC item, but we should verify this. If so, getting the DSM data would be part of the coco-align stage of the data pipeline.)
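One way to do that verification is to look at the image dictionaries in the kwcoco file (for example the one dumped by kwcoco info) and tally which channels their assets list. The sketch below works on plain dictionaries and makes a few assumptions: assets live under an "assets" or "auxiliary" list, each asset has a "channels" string with names joined by "|", and the DSM asset ends up with the channel name "dsm". The channel name in particular is a guess and should be checked against real output; the file names are placeholders.

```python
from collections import Counter


def image_channels(img):
    """Collect the channel names listed in one kwcoco-style image dictionary."""
    names = set()
    # Single-file images may store channels directly on the image.
    if img.get("channels"):
        names.update(img["channels"].split("|"))
    # Multi-file images list one entry per raster (older files use "auxiliary").
    for asset in img.get("assets") or img.get("auxiliary") or []:
        chans = asset.get("channels")
        if chans:
            names.update(chans.split("|"))
    return names


def channel_histogram(images):
    """Count how many images provide each channel."""
    hist = Counter()
    for img in images:
        hist.update(image_channels(img))
    return hist


# Hand-written stand-ins for image dictionaries; not real geowatch output.
images = [
    {"id": 1, "auxiliary": [
        {"file_name": "rasterB05.tif", "channels": "red"},
        {"file_name": "rasterDSM.tif", "channels": "dsm"},
    ]},
    {"id": 2, "assets": [
        {"file_name": "rasterTCI.tif", "channels": "red|green|blue"},
    ]},
]

hist = channel_histogram(images)
without_dsm = [img["id"] for img in images if "dsm" not in image_channels(img)]
print(dict(hist))
print(without_dsm)
```

Running the same tally on the remote kwcoco and on the aligned kwcoco would show at which stage the DSM goes missing, if it does.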
There is example code that walks through every step: the STAC query, the STAC-to-KWCOCO conversion into a remote dataset, and the cropping/alignment into a local dataset. A few things in it were broken, which I've fixed on dev/0.14.3, so work off of that branch.
The example generates its own demo data for the KHQ building. Here are the basic steps:
# Create a demo region file
xdoctest geowatch.demo.demo_region demo_khq_region_fpath
DATASET_SUFFIX=DemoKHQ-2022-06-10-V2
DEMO_DPATH=$HOME/.cache/geowatch/demo/datasets
REGION_FPATH="$HOME/.cache/geowatch/demo/annotations/KHQ_R001.geojson"
SITE_GLOBSTR="$HOME/.cache/geowatch/demo/annotations/KHQ_R001_sites/*.geojson"
START_DATE=$(jq -r '.features[] | select(.properties.type=="region") | .properties.start_date' "$REGION_FPATH")
END_DATE=$(jq -r '.features[] | select(.properties.type=="region") | .properties.end_date' "$REGION_FPATH")
# Shrink time window to test with less data
START_DATE=2016-12-02
END_DATE=2020-12-31
REGION_ID=$(jq -r '.features[] | select(.properties.type=="region") | .properties.region_id' "$REGION_FPATH")
SEARCH_FPATH=$DEMO_DPATH/stac_search.json
RESULT_FPATH=$DEMO_DPATH/all_sensors_kit/${REGION_ID}.input
CATALOG_FPATH=$DEMO_DPATH/all_sensors_kit/${REGION_ID}_catalog.json
KWCOCO_FPATH=$DEMO_DPATH/all_sensors_kit/${REGION_ID}.kwcoco.zip
KWCOCO_FIELDED_FPATH=$DEMO_DPATH/all_sensors_kit/${REGION_ID}-fielded.kwcoco.zip
KWCOCO_ALIGNED_DPATH=$DEMO_DPATH/all_sensors_kit/cropped/
KWCOCO_ALIGNED_FPATH=$DEMO_DPATH/all_sensors_kit/cropped/${REGION_ID}-fielded.kwcoco.zip
mkdir -p "$DEMO_DPATH"
# Create the search json wrt the sensors and processing level we want
python -m geowatch.stac.stac_search_builder \
--start_date="$START_DATE" \
--end_date="$END_DATE" \
--cloud_cover=40 \
--sensors=sentinel-s2-l2a-cogs \
--out_fpath "$SEARCH_FPATH"
cat "$SEARCH_FPATH"
# Delete this to prevent duplicates
rm -f "$RESULT_FPATH"
# Create the .input file
# use max_products_per_region to keep the result small
python -m geowatch.cli.stac_search \
--region_file "$REGION_FPATH" \
--search_json "$SEARCH_FPATH" \
--mode area \
--verbose 2 \
--max_products_per_region 10 \
--outfile "${RESULT_FPATH}"
cat "$RESULT_FPATH"
python -m geowatch.cli.baseline_framework_ingress \
--input_path="$RESULT_FPATH" \
--catalog_fpath="${CATALOG_FPATH}" \
--virtual=True \
--jobs=avail \
--aws_profile=iarpa \
--requester_pays=0
AWS_DEFAULT_PROFILE=iarpa python -m geowatch.cli.stac_to_kwcoco \
--input_stac_catalog="${CATALOG_FPATH}" \
--outpath="$KWCOCO_FPATH" \
--jobs=8 \
--from_collated=False \
--ignore_duplicates=0
# Check that the resulting kwcoco has what you want in it
geowatch stats "$KWCOCO_FPATH"
# Use kwcoco info to dump a single image dictionary
kwcoco info "$KWCOCO_FPATH" -g 1 -i 0
# Prefetch header metadata from remote assets
AWS_DEFAULT_PROFILE=iarpa python -m geowatch.cli.coco_add_watch_fields \
--src="$KWCOCO_FPATH" \
--dst="$KWCOCO_FIELDED_FPATH" \
--overwrite=warp \
--workers=8 \
--enable_video_stats=False \
--target_gsd=10 \
--remove_broken=True \
--skip_populate_errors=False
# Use kwcoco info to see what info fielding added
kwcoco info "$KWCOCO_FIELDED_FPATH" -g 1 -i 0
# Perform the crop to create an aligned dataset with videos
AWS_DEFAULT_PROFILE=iarpa python -m geowatch.cli.coco_align \
--regions "$REGION_FPATH" \
--context_factor=1 \
--geo_preprop=auto \
--keep=img \
--force_nodata=None \
--include_channels="None" \
--exclude_channels="None" \
--visualize=False \
--debug_valid_regions=False \
--rpc_align_method orthorectify \
--sensor_to_time_window "None" \
--verbose=0 \
--aux_workers=0 \
--target_gsd=10 \
--force_min_gsd=None \
--workers=26 \
--tries=2 \
--asset_timeout=4hours \
--image_timeout=8hours \
--hack_lazy=False \
--src="$KWCOCO_FIELDED_FPATH" \
--dst="$KWCOCO_ALIGNED_FPATH" \
--dst_bundle_dpath="$KWCOCO_ALIGNED_DPATH"
I ran through this example myself, and I think it will try to extract the DSMs, but it might break. So the task is to verify that this works with DSMs and fix it if it doesn't.