Support Pure Multi-spectral Images (!18) · Merge requests · computer-vision / kwcoco

Jon Crall requested to merge dev/image_name into dev/0.1.13 Mar 24, 2021

This MR got a little bigger than I originally anticipated, but I think that's OK given that it's mostly docs and bugfixes that snuck in.

I wanted to provide motivation for the design decisions I made when adding support for multispectral imagery, and thus I started this document (included in this MR):

https://gitlab.kitware.com/computer-vision/kwcoco/-/blob/dev/image_name/docs/source/getting_started.md

The Problem

After reading this document you should have some understanding of design decisions in the existing kwcoco 0.1.12 data structure.

This MR will extend the kwcoco schema to better handle "pure" multi-spectral image sources such as Sentinel 2 data. The use-case is one where an "image" is broken up into several files, each containing its own band. For example, a single S2 observation may include the following files:

imgs_1/S2A_OPER_MSI_L1C_TL_MTI__20160120T104345_A003020_T39QZG_B01.tif \
imgs_1/S2A_OPER_MSI_L1C_TL_MTI__20160120T104345_A003020_T39QZG_B02.tif \
imgs_1/S2A_OPER_MSI_L1C_TL_MTI__20160120T104345_A003020_T39QZG_B03.tif \
imgs_1/S2A_OPER_MSI_L1C_TL_MTI__20160120T104345_A003020_T39QZG_B04.tif \
imgs_1/S2A_OPER_MSI_L1C_TL_MTI__20160120T104345_A003020_T39QZG_B05.tif \
imgs_1/S2A_OPER_MSI_L1C_TL_MTI__20160120T104345_A003020_T39QZG_B06.tif \
imgs_1/S2A_OPER_MSI_L1C_TL_MTI__20160120T104345_A003020_T39QZG_B07.tif \
imgs_1/S2A_OPER_MSI_L1C_TL_MTI__20160120T104345_A003020_T39QZG_B08.tif \
imgs_1/S2A_OPER_MSI_L1C_TL_MTI__20160120T104345_A003020_T39QZG_B8A.tif \
imgs_1/S2A_OPER_MSI_L1C_TL_MTI__20160120T104345_A003020_T39QZG_B09.tif \
imgs_1/S2A_OPER_MSI_L1C_TL_MTI__20160120T104345_A003020_T39QZG_B10.tif \
imgs_1/S2A_OPER_MSI_L1C_TL_MTI__20160120T104345_A003020_T39QZG_B11.tif \
imgs_1/S2A_OPER_MSI_L1C_TL_MTI__20160120T104345_A003020_T39QZG_B12.tif \
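As a rough illustration of the use-case, the sketch below groups per-band files like these into one logical image. It assumes file names follow a `<prefix>_<band>.tif` pattern; the helper `group_band_files` and the regular expression are hypothetical and not part of kwcoco.

```python
import re
from collections import defaultdict

# Assumed naming pattern: everything before the final "_B??" suffix identifies
# the observation, and the suffix identifies the band (e.g. B01, B8A, B12).
BAND_SUFFIX = re.compile(r'^(?P<prefix>.+)_(?P<band>B[0-9]{1,2}A?)\.tif$', re.IGNORECASE)


def group_band_files(paths):
    """Map each observation prefix to a list of per-band entries."""
    groups = defaultdict(list)
    for path in paths:
        match = BAND_SUFFIX.match(path)
        if match is None:
            continue  # not a per-band file under the assumed pattern
        groups[match.group('prefix')].append({
            'file_name': path,
            'channels': match.group('band'),
        })
    return dict(groups)


paths = [
    'imgs_1/S2A_OPER_MSI_L1C_TL_MTI__20160120T104345_A003020_T39QZG_B01.tif',
    'imgs_1/S2A_OPER_MSI_L1C_TL_MTI__20160120T104345_A003020_T39QZG_B8A.tif',
]
for prefix, bands in group_band_files(paths).items():
    print(prefix, [band['channels'] for band in bands])
```

Each group corresponds to what this MR treats as a single image with no main file and one entry per band.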

For reference the band info is:

    Sentinel 2 Band Table
    =====================
    Band    Resolution    Central Wavelength    Description
    B1            60 m                443 nm    Ultra blue (Coastal and Aerosol)
    B2            10 m                490 nm    Blue
    B3            10 m                560 nm    Green
    B4            10 m                665 nm    Red
    B5            20 m                705 nm    Visible and Near Infrared (VNIR)
    B6            20 m                740 nm    Visible and Near Infrared (VNIR)
    B7            20 m                783 nm    Visible and Near Infrared (VNIR)
    B8            10 m                842 nm    Visible and Near Infrared (VNIR)
    B8a           20 m                865 nm    Visible and Near Infrared (VNIR)
    B9            60 m                940 nm    Short Wave Infrared (SWIR)
    B10           60 m               1375 nm    Short Wave Infrared (SWIR)
    B11           20 m               1610 nm    Short Wave Infrared (SWIR)
    B12           20 m               2190 nm    Short Wave Infrared (SWIR)

To incorporate this into the existing kwcoco schema in the most natural way possible, I chose to use the existing "auxiliary" field that I had been using in VIAME to store additional data, like disparity maps, associated with the main image. I had always planned to use "auxiliary" to handle multi-spectral imagery in the future, but in hindsight the name might be slightly awkward. That being said, bear with me; I think the rest of the design is decent.

Recap of 0.1.12 Image Spec

To recap, let's look at what already exists. Each COCO image might have some auxiliary information, so an example image dictionary might look like this (in fact, we can use the kwcoco dummy data to generate an example):

>>> import kwcoco
>>> import ubelt as ub
>>> img = kwcoco.CocoDataset.demo('vidshapes1-aux').imgs[1]
>>> print(ub.repr2(img, nl=3))
{
    'auxiliary': [
        {
            'channels': 'disparity',
            'file_name': '/home/joncrall/.cache/kwcoco/demo_vidshapes/vidshapes1-aux_jplbnutgqueqmv/_assets/aux/aux_disparity/img_00001.tif',
        },
        {
            'channels': 'flowx|flowy',
            'file_name': '/home/joncrall/.cache/kwcoco/demo_vidshapes/vidshapes1-aux_jplbnutgqueqmv/_assets/aux/aux_flowx|flowy/img_00001.tif',
        },
    ],
    'channels': 'rgb',
    'file_name': '/home/joncrall/.cache/kwcoco/demo_vidshapes/vidshapes1-aux_jplbnutgqueqmv/_assets/images/img_00001.png',
    'frame_index': 0,
    'height': 600,
    'id': 1,
    'video_id': 1,
    'width': 600,
}

This indicates that the image _assets/images/img_00001.png has two auxiliary files: one is a disparity image and the other holds 2D flow vectors.
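To make the structure concrete, here is a minimal sketch that walks a plain image dictionary shaped like the one above and collects the channel specs it offers. The helper `available_channels` is hypothetical (not a kwcoco function), and the file paths are shortened for readability.

```python
def available_channels(img):
    """Return channel specs for the main file (if any) and each auxiliary file."""
    specs = []
    # The main file contributes channels only when it actually exists.
    if img.get('file_name') is not None and img.get('channels') is not None:
        specs.append(img['channels'])
    for aux in img.get('auxiliary', []):
        specs.append(aux['channels'])
    return specs


img = {
    'id': 1,
    'file_name': '_assets/images/img_00001.png',
    'channels': 'rgb',
    'auxiliary': [
        {'channels': 'disparity', 'file_name': '_assets/aux/aux_disparity/img_00001.tif'},
        {'channels': 'flowx|flowy', 'file_name': '_assets/aux/aux_flowx|flowy/img_00001.tif'},
    ],
}
print(available_channels(img))
```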

Issues with that Spec

Using this existing structure we can't quite handle the multi-spectral case for these reasons:

There is no concept of a "main" image.

To handle the case of "no main image", I made the decision that "file_name" is now allowed to be None. However, that causes a problem: the existing fast lookup table dset.index.file_name_to_img requires that an image has a file name. After thinking about it for a while, I realized that "file_name" is really not the best reverse lookup index anyway. This is because when you "reroot" a dataset, the "file_name" attribute changes relative to your "bundle directory".

Instead, what I propose to do is add a new property to each image called "name". Moving forward, it will be recommended that each image specify, in addition to a "file_name", a unique "name" that does not change when a dataset is rerooted. While this might often be redundant with "file_name", it gives the user the flexibility to modify file names (say by pointing to the raw data in a dvc cache?) while still preserving a semantically meaningful "key" to reference the underlying image.
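The point about rerooting can be sketched with plain path arithmetic. This is only an illustration of why a relative "file_name" is an unstable key; the helper `reroot_file_name` and the example paths are hypothetical and do not reflect how kwcoco implements rerooting.

```python
import posixpath


def reroot_file_name(file_name, old_bundle, new_bundle):
    """Re-express a bundle-relative path relative to a different bundle directory."""
    abs_path = posixpath.normpath(posixpath.join(old_bundle, file_name))
    return posixpath.relpath(abs_path, new_bundle)


img = {'name': 'img_00001', 'file_name': 'images/img_00001.png'}

# Move the bundle directory one level up: the file itself has not moved,
# but the relative "file_name" changes while "name" stays the same.
rerooted = dict(img, file_name=reroot_file_name(img['file_name'], '/data/bundle', '/data'))
print(img['file_name'], '->', rerooted['file_name'])
print(img['name'], '->', rerooted['name'])
```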

In this MR I add the "name" to the image spec, and I create the corresponding dset.index.name_to_img lookup table. Images can choose to not have names (in which case the image will not be referenced in that index), but if they do have names, they must be unique. I also had to add some logic to CocoDataset.union to handle name conflicts when merging datasets. At the moment it's a band-aid solution, but it does not lock us into anything that can't be improved in the future.
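A minimal sketch of the indexing rules described above, using plain dictionaries: unnamed images are skipped, and duplicate names are rejected. The conflict policy shown for merging (appending a numeric suffix) is just one possible approach chosen for illustration; it is an assumption and not necessarily what CocoDataset.union does. Both helper names are hypothetical.

```python
def build_name_to_img(imgs):
    """Index images by their optional, unique "name"."""
    name_to_img = {}
    for img in imgs:
        name = img.get('name')
        if name is None:
            continue  # unnamed images are simply not referenced in the index
        if name in name_to_img:
            raise ValueError(f'duplicate image name: {name!r}')
        name_to_img[name] = img
    return name_to_img


def merge_with_unique_names(imgs_a, imgs_b):
    """Concatenate two image lists, renaming clashing names from the second.

    Image id remapping, which a real union also needs, is omitted here.
    """
    seen = {img['name'] for img in imgs_a if img.get('name') is not None}
    merged = [dict(img) for img in imgs_a]
    for img in imgs_b:
        img = dict(img)
        name = img.get('name')
        if name is not None:
            candidate, count = name, 1
            while candidate in seen:
                candidate = f'{name}-{count}'
                count += 1
            img['name'] = candidate
            seen.add(candidate)
        merged.append(img)
    return merged


imgs_a = [{'id': 1, 'name': 'x'}, {'id': 2}]
imgs_b = [{'id': 1, 'name': 'x'}, {'id': 2, 'name': 'y'}]
merged = merge_with_unique_names(imgs_a, imgs_b)
print(sorted(build_name_to_img(merged)))
```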

Different bands in MS images can have different resolutions.

In S2 data the different bands might have different resolutions. More generally, bands might be slightly misaligned with one another. Towards this end, I've added "width", "height", and "transform" fields to each auxiliary item in the json spec. The first two simply specify the size of the auxiliary image. The "transform" field is a structure that warps "annotation coordinates" into the space of a particular auxiliary band (currently I made up a json-spec for a general 2D affine transform).

Note that exactly what the "annotation coordinates" are is slightly weird: because there is no "main image file", we can't accurately say they are in "main image pixel coordinates". However, this is just a technicality. Even though there isn't a main image, we can still assign the main image a "height" and "width" that specify the window the annotations can operate in. Then each "transform" just needs to respect this. Typically this will just be the size of the largest auxiliary image, and that particular auxiliary image will be given the identity transform. By inverting all of the transforms and projecting the data onto a canvas with this "main height/width", we can create an HxWxC array that can be passed to a CV algorithm.
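The coordinate logic can be sketched in pure Python. The functions below are hypothetical illustrations: they treat the "transform" matrix as mapping annotation (canvas) coordinates into auxiliary-band coordinates, as described above, and use nearest-neighbour lookup to bring one band onto the main canvas. A real implementation would more likely rely on an image library's warp routine.

```python
def apply_affine(matrix, x, y):
    """Warp a point with a 3x3 affine matrix given as nested lists."""
    (a, b, tx), (c, d, ty), _ = matrix
    return (a * x + b * y + tx, c * x + d * y + ty)


def invert_affine(matrix):
    """Invert a 3x3 affine matrix (last row assumed to be [0, 0, 1])."""
    (a, b, tx), (c, d, ty), _ = matrix
    det = a * d - b * c
    if det == 0:
        raise ValueError('transform is not invertible')
    ia, ib = d / det, -b / det
    ic, id_ = -c / det, a / det
    return [[ia, ib, -(ia * tx + ib * ty)],
            [ic, id_, -(ic * tx + id_ * ty)],
            [0, 0, 1]]


def resample_to_canvas(band, matrix, width, height):
    """Nearest-neighbour resampling of one band (a list of rows) onto the canvas.

    For each canvas pixel we look up the matching band pixel with the
    annotation-to-auxiliary transform, which has the same effect as
    projecting the band through the inverted transform.
    """
    aux_h, aux_w = len(band), len(band[0])
    canvas = []
    for y in range(height):
        row = []
        for x in range(width):
            ax, ay = apply_affine(matrix, x, y)
            # Clamp to the band extent so edge pixels stay in bounds.
            ax = min(max(int(ax), 0), aux_w - 1)
            ay = min(max(int(ay), 0), aux_h - 1)
            row.append(band[ay][ax])
        canvas.append(row)
    return canvas


# A 2x2 band that is half the resolution of a 4x4 canvas.
half_scale = [[0.5, 0, 0], [0, 0.5, 0], [0, 0, 1]]
band = [[1, 2],
        [3, 4]]
for row in resample_to_canvas(band, half_scale, width=4, height=4):
    print(row)
```

Repeating this for every auxiliary band and stacking the results along a channel axis would give the HxWxC array mentioned above.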

Proposed Spec

I've extended the demo data to generate something inspired by Sentinel 2, and this should give a rough idea of what this new spec involves.

>>> import kwcoco
>>> import ubelt as ub
>>> img = kwcoco.CocoDataset.demo('vidshapes1-multispectral').imgs[1]
>>> print(ub.repr2(img, nl=3))
{
    'id': 1,
    'file_name': None,
    'width': 600,
    'height': 600,
    'frame_index': 0,
    'video_id': 1,
    'name': 'generated-1-0',
    'auxiliary': [
        {
            'channels': 'B1',
            'file_name': '/home/joncrall/.cache/kwcoco/demo_vidshapes/vidshapes1-multispectral_qjooadgqnnbjbl/_assets/aux/aux_B1/img_00001.tif',
            'height': 600,
            'transform': {'matrix': [[1.0, 0, 0], [0, 1.0, 0], [0, 0, 1]], 'type': 'affine'},
            'width': 600,
        },
        {
            'channels': 'B8',
            'file_name': '/home/joncrall/.cache/kwcoco/demo_vidshapes/vidshapes1-multispectral_qjooadgqnnbjbl/_assets/aux/aux_B8/img_00001.tif',
            'height': 100,
            'transform': {'matrix': [[0.16666666666666666, 0, 0], [0, 0.16666666666666666, 0], [0, 0, 1]], 'type': 'affine'},
            'width': 100,
        },
        {
            'channels': 'B8a',
            'file_name': '/home/joncrall/.cache/kwcoco/demo_vidshapes/vidshapes1-multispectral_qjooadgqnnbjbl/_assets/aux/aux_B8a/img_00001.tif',
            'height': 200,
            'transform': {'matrix': [[0.3333333333333333, 0, 0], [0, 0.3333333333333333, 0], [0, 0, 1]], 'type': 'affine'},
            'width': 200,
        },
        {
            'channels': 'B10',
            'file_name': '/home/joncrall/.cache/kwcoco/demo_vidshapes/vidshapes1-multispectral_qjooadgqnnbjbl/_assets/aux/aux_B10/img_00001.tif',
            'height': 600,
            'transform': {'matrix': [[1.0, 0, 0], [0, 1.0, 0], [0, 0, 1]], 'type': 'affine'},
            'width': 600,
        },
        {
            'channels': 'B11',
            'file_name': '/home/joncrall/.cache/kwcoco/demo_vidshapes/vidshapes1-multispectral_qjooadgqnnbjbl/_assets/aux/aux_B11/img_00001.tif',
            'height': 200,
            'transform': {'matrix': [[0.3333333333333333, 0, 0], [0, 0.3333333333333333, 0], [0, 0, 1]], 'type': 'affine'},
            'width': 200,
        },
    ],
}

Notice how the main image does not have a file_name, but it does have a "name", "height", and "width". I can also imagine leaving the 'transform' field unpopulated, and then giving it the ability to populate itself from geotiff header metadata if it exists.
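One way to sanity-check an image dictionary in this spec is to confirm that each auxiliary size agrees with the main size scaled by its transform. The sketch below assumes axis-aligned scale transforms like those in the demo output (only the diagonal entries are used); `check_aux_sizes` is a hypothetical helper, and the dictionary is a trimmed copy of the example above.

```python
def check_aux_sizes(img, tol=1.0):
    """Return channel names whose size disagrees with the scaled main size."""
    problems = []
    for aux in img.get('auxiliary', []):
        (sx, _, _), (_, sy, _), _ = aux['transform']['matrix']
        expect_w = img['width'] * sx
        expect_h = img['height'] * sy
        if abs(expect_w - aux['width']) > tol or abs(expect_h - aux['height']) > tol:
            problems.append(aux['channels'])
    return problems


def scale(s):
    """Build an affine transform entry for a uniform scale factor."""
    return {'type': 'affine', 'matrix': [[s, 0, 0], [0, s, 0], [0, 0, 1]]}


img = {
    'id': 1,
    'file_name': None,
    'name': 'generated-1-0',
    'width': 600,
    'height': 600,
    'auxiliary': [
        {'channels': 'B1', 'width': 600, 'height': 600, 'transform': scale(1.0)},
        {'channels': 'B8', 'width': 100, 'height': 100, 'transform': scale(1 / 6)},
        {'channels': 'B8a', 'width': 200, 'height': 200, 'transform': scale(1 / 3)},
    ],
}
print(check_aux_sizes(img))
```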

TODO list

Add name to the schema
Implement the name_to_img lookup index.
Implement "multispectral" toydata that only contains auxiliary channels
Test the new name_to_img lookup index.
Handle kwcoco show? - (probably won't do this here)

Edited Mar 28, 2021 by Matthew Bernstein
