I can't upload the .json file, so the preset is pasted below.
I call this Phaze-A setting "Max-512", because it's about as far as you can go on a 24GB card with Mixed Precision turned on.
Size: 23.25GB (according to System Output)
Batch: 1
Recommended Learning Rate: 4.625e-6/-7 (with MS_SSIM@100; Logcosh@50; FFL@100; LPIPS_VGG16@25); EGs/sec ~2.3
Alt Learning Rate: 6.25e-6/-7 (with MS_SSIM@100; Logcosh@50)
Code:
{
    "output_size": 512,
    "shared_fc": "none",
    "enable_gblock": true,
    "split_fc": true,
    "split_gblock": false,
    "split_decoders": true,
    "enc_architecture": "efficientnet_v2_l",
    "enc_scaling": 100,
    "enc_load_weights": true,
    "bottleneck_type": "dense",
    "bottleneck_norm": "none",
    "bottleneck_size": 512,
    "bottleneck_in_encoder": true,
    "fc_depth": 1,
    "fc_min_filters": 1280,
    "fc_max_filters": 1280,
    "fc_dimensions": 8,
    "fc_filter_slope": -0.5,
    "fc_dropout": 0.0,
    "fc_upsampler": "upscale_hybrid",
    "fc_upsamples": 1,
    "fc_upsample_filters": 512,
    "fc_gblock_depth": 3,
    "fc_gblock_min_nodes": 512,
    "fc_gblock_max_nodes": 512,
    "fc_gblock_filter_slope": -0.5,
    "fc_gblock_dropout": 0.0,
    "dec_upscale_method": "upscale_hybrid",
    "dec_upscales_in_fc": 0,
    "dec_norm": "none",
    "dec_min_filters": 160,
    "dec_max_filters": 640,
    "dec_slope_mode": "full",
    "dec_filter_slope": -0.33,
    "dec_res_blocks": 1,
    "dec_output_kernel": 3,
    "dec_gaussian": true,
    "dec_skip_last_residual": false,
    "freeze_layers": "keras_encoder",
    "load_layers": "encoder",
    "fs_original_depth": 4,
    "fs_original_min_filters": 128,
    "fs_original_max_filters": 1024,
    "fs_original_use_alt": false,
    "mobilenet_width": 1.0,
    "mobilenet_depth": 1,
    "mobilenet_dropout": 0.001,
    "mobilenet_minimalistic": false,
    "__filetype": "faceswap_preset",
    "__section": "train|model|phaze_a"
}
Explanation and a few thoughts:
This was based on the STOJO setting with some of @Icarus's modifications, plus some modifications of my own. It uses EfficientNetV2-L @100, and instead of the Subpixel upscaler that Icarus likes to use, I used Upscale Hybrid to save some VRAM.
The learning rate was based on a ratio formula suggested by @couleurs and @torzdf's original 5e-5. If you would like to use it as a basis for your own learning rates, it looks like this: √(batch size) ÷ 8 × 5 (the square root of your batch size, divided by 8, times 5), in units of 1e-5. So, in this instance, √1 ÷ 8 × 5 = 0.625, i.e. 0.625e-5, or 6.25e-6.
I don't know if I'd have this peer reviewed, but it worked for me. For further reading, see viewtopic.php?t=2083&start=20.
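For anyone who prefers it spelled out in code, here is a minimal sketch of that formula (my own illustration; the function name is made up and this isn't Faceswap code):

Code:
from math import sqrt

# Sketch of the learning-rate formula above: sqrt(batch size) / 8 * 5,
# in units of 1e-5 (the 5 comes from the original 5e-5 baseline).
def base_learning_rate(batch_size):
    return sqrt(batch_size) / 8 * 5 * 1e-5

print(base_learning_rate(1))  # ~6.25e-06, the Alt Learning Rate above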
After finding your base learning rate, you then adjust it by the percent difference in your EGs/sec when adding different losses.
For example, when I added FFL and LPIPS_VGG16 (on top of my MS_SSIM & Logcosh learning rate), there was a rough 26% drop in EGs/sec, so I subtracted 26% from 6.25e-6, which is how I arrived at 4.625e-6. I am not saying this is ideal, just that it's stable on my RTX 3090. It's possible you can push this learning rate higher. (Please report back if you've found better learning rates so everyone can benefit.)
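Put as code, that adjustment looks something like this (again, just an illustrative sketch with a made-up function name):

Code:
# Reduce the base learning rate by the percent drop in EGs/sec observed
# after adding extra loss functions. Illustrative only.
def adjusted_learning_rate(base_lr, eg_drop_percent):
    return base_lr * (1 - eg_drop_percent / 100)

print(adjusted_learning_rate(6.25e-6, 26))  # ~4.625e-06, the Recommended Learning Rate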
Again, I'm not sure this formula is something I'd bring to a PhD in computer science, but it worked for this mathematically challenged person. Maybe it will help you.
I had the G-Block split originally but turned it off due to "googly eyes". Let me know if you have the same problem, and/or how you fixed it.
As a comparison, it took me around 2.1 million iterations (over 9 million EGs) with a slightly modified DNY512 using fs_original, at a (struggling) LR of 1e-5, to reach losses of face_a: 0.05342 / face_b: 0.03503. (The last 100K had No Warp.)
This setting, with EfficientNetV2-L @100, took roughly 700K iterations (1.75 million EGs) to reach the same losses, with IMHO better visual fidelity (the last 100K had No Warp), and the LR gave no warnings, OOMs, or other problems.
If anyone has suggestions for the settings or learning rate, please post a reply. Nothing is set in stone, we are all learning, and we're all building off each other's suggestions. What may seem obvious and silly to you may save hours for newbies and others.