
Transfer learning (load weights) fails with same model

Posted: Mon Nov 08, 2021 10:08 am
by acaint

Hello,

After training a model quite far on faces A to B, I wanted to see if I could give a C-to-B model a boost with transfer learning.
I loaded the A-to-B project and changed only the input from A to C and the model directory.

Then I set Load Weights to point to the model directory of A to B.

Training could not start, however, due to this error:
11/08/2021 12:00:42 CRITICAL Error caught! Exiting...
11/08/2021 12:00:42 ERROR Caught exception in thread: '_training_0'
11/08/2021 12:00:42 ERROR You are attempting to load weights from a 'dfl_sae_df' model into a 'dfl_sae' model. This is not supported.

I wish to point out that both model configurations are exactly the same, with DF as the architecture in both.
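
To illustrate the mismatch, the check appears to compare nothing more than the stored model names. Here is a quick illustrative sketch in Python (not the actual Faceswap code; the variable names are just made up for the example):

Code: Select all

# Illustrative sketch of the name comparison implied by the error message.
# 'stored_name' and 'current_name' are hypothetical, not Faceswap internals.
stored_name = "dfl_sae_df"   # name saved alongside the weights being loaded
current_name = "dfl_sae"     # name of the model being trained now

if stored_name != current_name:
    raise ValueError(
        f"You are attempting to load weights from a '{stored_name}' model "
        f"into a '{current_name}' model. This is not supported.")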

Any instructions? Thanks in advance.


Re: Transfer learning (load weights) fails with same model

Posted: Mon Nov 08, 2021 11:47 am
by torzdf

Possibly an oversight on my part re: the dfl-sae model. Is it possible for you to zip up and share your model for me to analyze? You can link it to me via private message.


Re: Transfer learning (load weights) fails with same model

Posted: Mon Nov 08, 2021 3:27 pm
by acaint

Hi, thanks for the quick reply. The model file is pretty large, an .h5 file of over 2GB, and it does not zip much smaller.

I'll see if I can reproduce the issue with a smaller config.

Here is the dfl_sae_state.json; I am not sure if it is helpful.

Code: Select all

{
  "name": "dfl_sae",
  "sessions": {
    "1": {
      "timestamp": 1636302930.190047,
      "no_logs": false,
      "loss_names": [
        "total",
        "face_a_0",
        "face_a_1",
        "face_a_2",
        "mask_a",
        "face_b_0",
        "face_b_1",
        "face_b_2",
        "mask_b"
      ],
      "batchsize": 16,
      "iterations": 9501,
      "config": {
        "learning_rate": 5e-05,
        "epsilon_exponent": -7,
        "allow_growth": true,
        "nan_protection": true,
        "convert_batchsize": 4,
        "eye_multiplier": 3,
        "mouth_multiplier": 2,
        "clipnorm": true
      }
    },
    "2": {
      "timestamp": 1636313397.1599169,
      "no_logs": false,
      "loss_names": [
        "total",
        "face_a_0",
        "face_a_1",
        "face_a_2",
        "mask_a",
        "face_b_0",
        "face_b_1",
        "face_b_2",
        "mask_b"
      ],
      "batchsize": 16,
      "iterations": 35532,
      "config": {
        "learning_rate": 5e-05,
        "epsilon_exponent": -7,
        "allow_growth": true,
        "nan_protection": true,
        "convert_batchsize": 4,
        "eye_multiplier": 3,
        "mouth_multiplier": 2,
        "clipnorm": true
      }
    }
  },
  "lowest_avg_loss": {
    "a": 0.06255769795097876,
    "b": 0.06906740904378239
  },
  "iterations": 45033,
  "config": {
    "centering": "face",
    "coverage": 90.0,
    "optimizer": "adam",
    "learning_rate": 5e-05,
    "epsilon_exponent": -7,
    "allow_growth": true,
    "mixed_precision": true,
    "nan_protection": true,
    "convert_batchsize": 16,
    "loss_function": "ssim",
    "mask_loss_function": "mse",
    "l2_reg_term": 100,
    "eye_multiplier": 3,
    "mouth_multiplier": 2,
    "penalized_mask_loss": true,
    "mask_type": "bisenet-fp_face",
    "mask_blur_kernel": 3,
    "mask_threshold": 4,
    "learn_mask": true,
    "input_size": 256,
    "clipnorm": true,
    "architecture": "df",
    "autoencoder_dims": 640,
    "encoder_dims": 64,
    "decoder_dims": 32,
    "multiscale_decoder": true
  }
}
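
If it is useful, the two state files can be compared with a quick standalone script like this (not part of Faceswap; the directory names below are just examples standing in for my two model folders):

Code: Select all

import json

# Load the saved config from each model's state file (example paths).
with open("old_model_dir/dfl_sae_state.json") as f:
    old_cfg = json.load(f)["config"]
with open("new_model_dir/dfl_sae_state.json") as f:
    new_cfg = json.load(f)["config"]

# Print any setting that differs between the two models.
for key in sorted(set(old_cfg) | set(new_cfg)):
    if old_cfg.get(key) != new_cfg.get(key):
        print(f"{key}: {old_cfg.get(key)!r} != {new_cfg.get(key)!r}")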

Re: Transfer learning (load weights) fails with same model

Posted: Tue Nov 09, 2021 12:33 pm
by torzdf

This should be fixed in the latest update. Please try updating.


Re: Transfer learning (load weights) fails with same model

Posted: Wed Nov 10, 2021 11:37 am
by torzdf

I had to roll back this fix. Will look for a different way. See here for reference:

viewtopic.php?f=6&t=1805


Re: Transfer learning (load weights) fails with same model

Posted: Wed Nov 10, 2021 6:38 pm
by acaint

Hi,

Yes, thank you for the quick action.
I was able to get it working by starting training with transfer learning enabled. It then crashed after 1 iteration, as described in the other thread. But when I made copies of the generated 1-iteration model and its JSON and renamed them to the old filenames, training could proceed again.

However, the graph and analysis pages froze, but I use TensorBoard to get around that.

In any case, the attempt is very much appreciated. I got transfer learning working, and the model is now training with some filename shenanigans.

Code: Select all

2021-11-09 21:01 2 477 014 592 dfl_sae.h5 <-- same file as dfl_sae_df.h5 but renamed
2021-11-10 20:31 2 477 014 592 dfl_sae_df.h5
2021-11-10 19:22 2 477 014 592 dfl_sae_df.h5.bk
2021-11-10 20:31 2 170 dfl_sae_df_state.json
2021-11-10 19:22 2 169 dfl_sae_df_state.json.bk
2021-11-09 21:01 1 564 dfl_sae_state.json <-- same file as dfl_sae_df_state.json but renamed

It seems to only use them at the start; it does not write to them...
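
For reference, the renaming step amounts to roughly this (a rough sketch of what I did; the copy direction follows the arrows in the listing above, and everything is inside the new model folder):

Code: Select all

import shutil

# Copy the architecture-suffixed files to the un-suffixed names that training
# seems to read at start-up; it then keeps writing only to the _df files.
shutil.copy2("dfl_sae_df.h5", "dfl_sae.h5")
shutil.copy2("dfl_sae_df_state.json", "dfl_sae_state.json")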


Re: Transfer learning (load weights) fails with same model

Posted: Sun Nov 14, 2021 12:13 pm
by torzdf

Ok, this should now be fixed in a way which doesn't break backwards compatibility.


Re: Transfer learning (load weights) fails with same model

Posted: Thu Nov 18, 2021 6:46 pm
by acaint

Thank you, it is working brilliantly! =)