Add hierarchical pretraining algorithm

Colorado Reed requested to merge cjrd-add-hpt-algo into master

This PR adds hierarchical pretraining (HPT) capabilities, where the pipeline looks something like: pretrained source_network => additional [self-supervised] pretraining on {source, source+target, source_then_target, target} => domain_adaptation.
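As a rough illustration of those stages, here is a minimal config sketch; the names (`HPTConfig`, `PretrainData`, the field names) are illustrative assumptions, not the actual LEARN/HPT schema:

```python
from dataclasses import dataclass
from typing import Literal

# The four options for the additional self-supervised pretraining data,
# matching the set listed in the pipeline description above.
PretrainData = Literal["source", "source+target", "source_then_target", "target"]

@dataclass
class HPTConfig:
    source_checkpoint: str       # path to the pretrained source_network weights
    pretrain_data: PretrainData  # which data the additional SSL pretraining uses
    output_dir: str              # where the HPT child process writes updated weights
```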

HPT operates by calling out to a child process (using torch distributed). To do this, it dynamically generates the HPT configs and writes them to disk, along with the source network weights and the appropriate data lists; it then executes the HPT process, rereads the updated network weights from disk, and applies them back to the source network.
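For reviewers, here is a minimal sketch of that orchestration flow, under assumed names: `run_hpt`, the file names, and the `hpt_train.py` entry point are all hypothetical, not the actual implementation:

```python
import json
import subprocess
import torch

def run_hpt(source_network, data_lists, work_dir):
    """Run one HPT pass in a child process and fold the weights back in (sketch)."""
    # 1. Dynamically generate the HPT config and write it to disk.
    config = {"pretrain_data": "source_then_target", "work_dir": work_dir}
    config_path = f"{work_dir}/hpt_config.json"
    with open(config_path, "w") as f:
        json.dump(config, f)

    # 2. Write the source network weights to disk.
    weights_in = f"{work_dir}/source_network.pth"
    torch.save(source_network.state_dict(), weights_in)

    # 3. Write the appropriate data lists to disk.
    for name, paths in data_lists.items():
        with open(f"{work_dir}/{name}_list.txt", "w") as f:
            f.write("\n".join(paths))

    # 4. Execute the HPT child process via torch distributed
    #    (hpt_train.py is a hypothetical training entry point).
    subprocess.run(
        ["python", "-m", "torch.distributed.run", "--nproc_per_node=1",
         "hpt_train.py", "--config", config_path, "--weights", weights_in],
        check=True,
    )

    # 5. Reread the updated weights and apply them back to the source network.
    weights_out = f"{work_dir}/hpt_updated.pth"
    source_network.load_state_dict(torch.load(weights_out))
    return source_network
```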

In terms of the "correct" way to do this, I followed BU-NLP as closely as I could.

I'm still doing some testing across the various ways of using the LEARN pipeline, but I think the PR is in good enough shape for an initial review =)
