Bounding Box Head for MultimodalTransformer
The dataloader can already generate bounding boxes, so the main task is to add a head to the geowatch.tasks.fusion.methods.channelwise_transformer.MultimodalTransformer class (similar to how the class and saliency heads are implemented) and hook up the forward pass so that the predicted boxes are fed to a loss function. I have a proof of concept for this in geowatch/dev/poc/devcheck_mmdet_head.py. It uses the mmdetection library to leverage existing, standardized work.
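To make the shape of the task concrete, here is a minimal plain-PyTorch sketch of a box head whose predictions are fed to a loss. It is not the proof of concept mentioned above and it does not use mmdetection or any geowatch API. The names `ToyBoxHead` and `toy_box_loss`, the token layout `(batch, tokens, features)`, the normalized `cxcywh` box format, and the index-aligned matching of predictions to targets are all assumptions made for illustration. A real detection head would use proper target assignment (for example anchor-based or Hungarian matching), which is the kind of standardized work a detection library is expected to supply.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyBoxHead(nn.Module):
    """Hypothetical head: pooled token features -> a fixed number of boxes."""

    def __init__(self, in_features, num_queries=4):
        super().__init__()
        self.num_queries = num_queries
        # Regresses 4 box coordinates per query slot.
        self.box_mlp = nn.Sequential(
            nn.Linear(in_features, in_features),
            nn.ReLU(),
            nn.Linear(in_features, num_queries * 4),
        )
        # One objectness logit per query slot.
        self.score_fc = nn.Linear(in_features, num_queries)

    def forward(self, tokens):
        # tokens: (B, T, D). Mean-pool over the token axis to get (B, D).
        pooled = tokens.mean(dim=1)
        # Sigmoid keeps coordinates in [0, 1] (assumed normalized cxcywh).
        boxes = self.box_mlp(pooled).sigmoid().view(-1, self.num_queries, 4)
        logits = self.score_fc(pooled)
        return boxes, logits


def toy_box_loss(pred_boxes, pred_logits, true_boxes, valid):
    """L1 box loss on valid slots plus BCE objectness loss.

    Assumes targets are padded to the number of query slots and already
    aligned with predictions by index (a simplification, see above).
    """
    valid = valid.float()
    l1 = F.l1_loss(pred_boxes, true_boxes, reduction="none").sum(dim=-1)
    box_loss = (l1 * valid).sum() / valid.sum().clamp(min=1)
    obj_loss = F.binary_cross_entropy_with_logits(pred_logits, valid)
    return box_loss + obj_loss


torch.manual_seed(0)
head = ToyBoxHead(in_features=16, num_queries=4)
tokens = torch.rand(2, 10, 16)      # stand-in for backbone output tokens
true_boxes = torch.rand(2, 4, 4)    # padded target boxes
valid = torch.tensor([[True, True, False, False],
                      [True, False, False, False]])

pred_boxes, pred_logits = head(tokens)
loss = toy_box_loss(pred_boxes, pred_logits, true_boxes, valid)
loss.backward()  # gradients flow back through the head
```

In the actual model, the head would presumably be registered alongside the existing class and saliency heads, and its loss term added to the total returned by the forward pass. How that wiring looks depends on the MultimodalTransformer code and is not shown here.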