Support Pure Multi-spectral Images
This MR got a little bigger than I originally anticipated, but I think that's OK given that it's mostly docs and bugfixes that snuck in.
I wanted to provide motivation for the design decisions I made when adding support for multispectral imagery, so I started this document (included in this MR):
The Problem
After reading this document you should have some understanding of the design decisions in the existing kwcoco 0.1.12 data structure.
This MR will extend the kwcoco schema to better handle "pure" multi-spectral image sources such as Sentinel 2 data. The use-case is where an "image" is broken up into several files, each containing its own band. For example a single S2 observation may include the following files:
imgs_1/S2A_OPER_MSI_L1C_TL_MTI__20160120T104345_A003020_T39QZG_B01.tif \
imgs_1/S2A_OPER_MSI_L1C_TL_MTI__20160120T104345_A003020_T39QZG_B02.tif \
imgs_1/S2A_OPER_MSI_L1C_TL_MTI__20160120T104345_A003020_T39QZG_B03.tif \
imgs_1/S2A_OPER_MSI_L1C_TL_MTI__20160120T104345_A003020_T39QZG_B04.tif \
imgs_1/S2A_OPER_MSI_L1C_TL_MTI__20160120T104345_A003020_T39QZG_B05.tif \
imgs_1/S2A_OPER_MSI_L1C_TL_MTI__20160120T104345_A003020_T39QZG_B06.tif \
imgs_1/S2A_OPER_MSI_L1C_TL_MTI__20160120T104345_A003020_T39QZG_B07.tif \
imgs_1/S2A_OPER_MSI_L1C_TL_MTI__20160120T104345_A003020_T39QZG_B08.tif \
imgs_1/S2A_OPER_MSI_L1C_TL_MTI__20160120T104345_A003020_T39QZG_B8A.tif \
imgs_1/S2A_OPER_MSI_L1C_TL_MTI__20160120T104345_A003020_T39QZG_B09.tif \
imgs_1/S2A_OPER_MSI_L1C_TL_MTI__20160120T104345_A003020_T39QZG_B10.tif \
imgs_1/S2A_OPER_MSI_L1C_TL_MTI__20160120T104345_A003020_T39QZG_B11.tif \
imgs_1/S2A_OPER_MSI_L1C_TL_MTI__20160120T104345_A003020_T39QZG_B12.tif \
For reference the band info is:
Sentinel 2 Band Table
=====================
Band  Resolution  Central Wavelength  Description
B1    60 m        443 nm              Ultra blue (Coastal and Aerosol)
B2    10 m        490 nm              Blue
B3    10 m        560 nm              Green
B4    10 m        665 nm              Red
B5    20 m        705 nm              Visible and Near Infrared (VNIR)
B6    20 m        740 nm              Visible and Near Infrared (VNIR)
B7    20 m        783 nm              Visible and Near Infrared (VNIR)
B8    10 m        842 nm              Visible and Near Infrared (VNIR)
B8a   20 m        865 nm              Visible and Near Infrared (VNIR)
B9    60 m        940 nm              Short Wave Infrared (SWIR)
B10   60 m        1375 nm             Short Wave Infrared (SWIR)
B11   20 m        1610 nm             Short Wave Infrared (SWIR)
B12   20 m        2190 nm             Short Wave Infrared (SWIR)
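To make the ingest problem concrete, here is a small illustrative sketch (the regex and grouping are mine, not anything built into kwcoco) of how a directory of per-band files like the one above could be grouped into a single logical observation:

>>> import re
>>> from glob import glob
>>> from collections import defaultdict
>>> # Group per-band GeoTIFFs by everything preceding the trailing band code,
>>> # so all files from one observation end up under the same key.
>>> pat = re.compile(r'(?P<key>.*)_(?P<band>B[0-9A]+)\.tif$')
>>> observations = defaultdict(dict)
>>> for path in sorted(glob('imgs_1/*.tif')):
...     match = pat.search(path)
...     if match:
...         observations[match.group('key')][match.group('band')] = path
>>> for key, bands in observations.items():
...     print(key, sorted(bands))  # one "image" made of ~13 single-band files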
To incorporate this into the existing kwcoco schema in the most natural way possible, I chose to use the existing "auxiliary" field, which I had been using in VIAME to store additional data associated with the main image, such as disparity maps. I had always planned to use "auxiliary" to handle multi-spectral imagery in the future, but in hindsight the name might be slightly awkward. That being said, bear with me; I think the rest of the design is decent.
Recap of 0.1.12 Image Spec
To recap, let's look at what already exists. Each COCO image may have some auxiliary information, so an example image dictionary might look like this (in fact we can use the kwcoco dummy data to generate an example):
>>> import kwcoco
>>> import ubelt as ub
>>> img = kwcoco.CocoDataset.demo('vidshapes1-aux').imgs[1]
>>> print(ub.repr2(img, nl=3))
{
    'auxiliary': [
        {
            'channels': 'disparity',
            'file_name': '/home/joncrall/.cache/kwcoco/demo_vidshapes/vidshapes1-aux_jplbnutgqueqmv/_assets/aux/aux_disparity/img_00001.tif',
        },
        {
            'channels': 'flowx|flowy',
            'file_name': '/home/joncrall/.cache/kwcoco/demo_vidshapes/vidshapes1-aux_jplbnutgqueqmv/_assets/aux/aux_flowx|flowy/img_00001.tif',
        },
    ],
    'channels': 'rgb',
    'file_name': '/home/joncrall/.cache/kwcoco/demo_vidshapes/vidshapes1-aux_jplbnutgqueqmv/_assets/images/img_00001.png',
    'frame_index': 0,
    'height': 600,
    'id': 1,
    'video_id': 1,
    'width': 600,
}
This indicates that the image _assets/images/img_00001.png has two auxiliary files: one contains a disparity image and the other contains 2D flow vectors.
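As a small usage sketch (my own illustration, not part of the MR), the auxiliary list in this spec is plain data, so pulling out a particular channel is just a search over that list:

>>> import kwcoco
>>> import kwimage
>>> dset = kwcoco.CocoDataset.demo('vidshapes1-aux')
>>> img = dset.imgs[1]
>>> # Pick out the auxiliary item that carries the disparity channel
>>> aux = next(a for a in img['auxiliary'] if a['channels'] == 'disparity')
>>> disparity = kwimage.imread(aux['file_name'])
>>> print(disparity.shape)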
Issues with that Spec
Using this existing structure we can't quite handle the multi-spectral case for these reasons:
- There is no concept of a "main" image.
To handle the case of "no main image", I made the decision that "file_name" is now allowed to be None. However, that causes a problem: the existing fast lookup table dset.index.file_name_to_img
requires that an image has a file name. After thinking about it for a while, I realized that "file_name" is really not the best reverse-lookup index anyway, because when you "reroot" a dataset the "file_name" attribute changes relative to your "bundle directory".
Instead, what I propose is to add a new property to each image called "name". Moving forward it will be recommended that each image specify a unique "name" in addition to a "file_name", and the "name" does not change when a dataset is rerooted. While this will often be redundant with "file_name", it gives the user the flexibility to modify file names (say, by pointing at the raw data in a dvc cache) while still preserving a semantically meaningful "key" that references the underlying image.
In this MR I add "name" to the image spec and create the corresponding dset.index.name_to_img
lookup table (a small usage sketch appears after the proposed spec example below). Images can choose not to have names (in which case the image will not be referenced in that index), but if they do have names, they must be unique. I also had to add some logic to CocoDataset.union
to handle name conflicts when merging datasets. At the moment it's a band-aid solution, but it does not lock us into anything that can't be improved in the future.
- Different bands in MS images can have different resolutions.
In S2 data the different bands might have different resolutions, and more generally bands might be slightly misaligned with one another. Towards this end, I've added "width", "height", and "transform" fields to each auxiliary item in the json spec. The first two simply specify the size of the auxiliary image. The "transform" field is a structure that warps "annotation coordinates" into the pixel space of a particular auxiliary band (currently I made up a json spec for a general 2D affine transform). Note that exactly what the "annotation coordinates" are is slightly weird, because there is no "main image file", so we can't accurately say they are in "main image pixel coordinates". However, this is just a technicality. Even though there isn't a main image, we can still assign the image a "height" and "width" that specify the window the annotations operate in, and each "transform" just needs to respect that window. Typically it will be the size of the largest auxiliary image, and that particular auxiliary image will be given the identity transform. By inverting all of the transforms and projecting the data onto a canvas with this "main height/width", we can create a HxWxC matrix that can be passed to a CV algorithm (see the sketch below).
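To make the coordinate bookkeeping concrete, here is a minimal sketch of the intended math. It is my own illustration rather than code from this MR, and it assumes an auxiliary item shaped like the B8 entry in the demo output below (a 100x100 band whose transform is a 1/6 scale relative to a 600x600 annotation window).

>>> import numpy as np
>>> import cv2  # used only for the warp; any warping routine would do
>>> # Hypothetical auxiliary item, shaped like the proposed spec
>>> aux = {
...     'channels': 'B8', 'width': 100, 'height': 100,
...     'transform': {'type': 'affine',
...                   'matrix': [[1 / 6, 0, 0], [0, 1 / 6, 0], [0, 0, 1]]},
... }
>>> main_dsize = (600, 600)  # the image-level "width" / "height" window
>>> # 1) Warp annotation coordinates (given in the main window) into band pixels.
>>> M = np.array(aux['transform']['matrix'])
>>> box_xy = np.array([[120, 240], [180, 300]])      # tl / br corners in main coords
>>> box_xy_h = np.hstack([box_xy, np.ones((2, 1))])  # homogeneous coordinates
>>> band_xy = (M @ box_xy_h.T).T[:, :2]              # approx [[20, 40], [30, 50]]
>>> # 2) Invert the transform to project the band onto the main canvas;
>>> #    doing this for every band yields an aligned HxWxC stack.
>>> band_im = np.random.rand(aux['height'], aux['width']).astype(np.float32)
>>> canvas = cv2.warpAffine(band_im, np.linalg.inv(M)[0:2], dsize=main_dsize)
>>> canvas.shape
(600, 600)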
Proposed Spec
I've extended the demo data to generate something inspired by Sentinel 2, and this should give a rough idea of what this new spec involves.
>>> import kwcoco
>>> import ubelt as ub
>>> img = kwcoco.CocoDataset.demo('vidshapes1-multispectral').imgs[1]
>>> print(ub.repr2(img, nl=3))
{
    'id': 1,
    'file_name': None,
    'width': 600,
    'height': 600,
    'frame_index': 0,
    'video_id': 1,
    'name': 'generated-1-0',
    'auxiliary': [
        {
            'channels': 'B1',
            'file_name': '/home/joncrall/.cache/kwcoco/demo_vidshapes/vidshapes1-multispectral_qjooadgqnnbjbl/_assets/aux/aux_B1/img_00001.tif',
            'height': 600,
            'transform': {'matrix': [[1.0, 0, 0], [0, 1.0, 0], [0, 0, 1]], 'type': 'affine'},
            'width': 600,
        },
        {
            'channels': 'B8',
            'file_name': '/home/joncrall/.cache/kwcoco/demo_vidshapes/vidshapes1-multispectral_qjooadgqnnbjbl/_assets/aux/aux_B8/img_00001.tif',
            'height': 100,
            'transform': {'matrix': [[0.16666666666666666, 0, 0], [0, 0.16666666666666666, 0], [0, 0, 1]], 'type': 'affine'},
            'width': 100,
        },
        {
            'channels': 'B8a',
            'file_name': '/home/joncrall/.cache/kwcoco/demo_vidshapes/vidshapes1-multispectral_qjooadgqnnbjbl/_assets/aux/aux_B8a/img_00001.tif',
            'height': 200,
            'transform': {'matrix': [[0.3333333333333333, 0, 0], [0, 0.3333333333333333, 0], [0, 0, 1]], 'type': 'affine'},
            'width': 200,
        },
        {
            'channels': 'B10',
            'file_name': '/home/joncrall/.cache/kwcoco/demo_vidshapes/vidshapes1-multispectral_qjooadgqnnbjbl/_assets/aux/aux_B10/img_00001.tif',
            'height': 600,
            'transform': {'matrix': [[1.0, 0, 0], [0, 1.0, 0], [0, 0, 1]], 'type': 'affine'},
            'width': 600,
        },
        {
            'channels': 'B11',
            'file_name': '/home/joncrall/.cache/kwcoco/demo_vidshapes/vidshapes1-multispectral_qjooadgqnnbjbl/_assets/aux/aux_B11/img_00001.tif',
            'height': 200,
            'transform': {'matrix': [[0.3333333333333333, 0, 0], [0, 0.3333333333333333, 0], [0, 0, 1]], 'type': 'affine'},
            'width': 200,
        },
    ],
}
Notice how the main image does not have a file_name, but it does have a "name", "height", and "width". Note that I can also imagine leaving the "transform" field unpopulated, and then giving it the ability to populate itself from GeoTIFF header metadata if that exists.
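For completeness, here is a minimal sketch of how the new "name" field and the dset.index.name_to_img lookup added in this MR can be used with the multispectral demo data (the specific name string is just whatever the toydata generator produced above):

>>> import kwcoco
>>> dset = kwcoco.CocoDataset.demo('vidshapes1-multispectral')
>>> # The new reverse lookup is keyed on the reroot-stable "name" field
>>> img = dset.index.name_to_img['generated-1-0']
>>> assert img['file_name'] is None
>>> # Each auxiliary item carries its own channels, size, and transform
>>> for aux in img['auxiliary']:
...     print(aux['channels'], aux['width'], aux['height'])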
TODO list
- Add name to the schema
- Implement the name_to_img lookup index.
- Implement "multispectral" toydata that only contains auxiliary channels
- Test the new name_to_img lookup index.
- Handle kwcoco show? (probably won't do this here)