[Guide] Introducing - Phaze-A


torzdf
Posts: 2649
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 128 times
Been thanked: 622 times

Re: [Guide] Introducing - Phaze-A

Post by torzdf »

RisingZen wrote: Fri Mar 04, 2022 11:59 am

I can't wrap my head around how the number of upscales is determined for the decoder. The Google Docs sheet allows the user to change the number of data points (upscales), but Phaze-A does not offer an option to set it. Am I missing something here?

You can calculate the number of upscales required; it is not something that you have direct control over. It is basically determined by the spatial size of the input to the decoder (i.e. the size of the output of the fully connected layers) and the size of the final image. E.g. if the output of the fully connected layers is 8x8[x1024] and the final image is 64x64[x3], then there will be 3 upscales (8 > 16 > 32 > 64).
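If you want to sanity-check the maths, here is a minimal sketch (illustrative Python, not Faceswap's actual code): each upscale doubles the spatial dimension, so the count is just a base-2 logarithm.

Code: Select all

import math

decoder_input = 8   # spatial size of the fully connected layers' output
final_output = 64   # spatial size of the final swap image

upscales = int(math.log2(final_output / decoder_input))
print(upscales)  # 3, i.e. 8 > 16 > 32 > 64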

An easier method, though, is to just run the model in "summary" mode; you can then count the number of upscales that will be created.

Is there a specific reason for giving the G-Block three hidden layers?

This was the number that StyleGAN used in their original implementation.

Thanks for testing. I wish I had more feedback/guidance, but testing is pretty much exactly what this model was created for. As you can see, testing all possible combinations and knowing what impact they will have is a difficult and time-consuming task... especially for one person.

My word is final


RisingZen
Posts: 10
Joined: Wed Oct 14, 2020 6:01 pm
Has thanked: 7 times
Been thanked: 4 times

Re: [Guide] Introducing - Phaze-A

Post by RisingZen »

I have moved over to refactoring the models' code so as to have more flexibility over where specific layers are called. Sadly, it's not often the case that observations from custom models can be translated into Phaze-A settings, but some improvements, such as EfficientNetV2, will definitely find their way into the previous post.

Glad to see that more people have started to experiment with Phaze-A. It has been a fun time working with it, and if it hadn't been for that, I wouldn't have delved deeper into ML.

Last edited by RisingZen on Sun Nov 06, 2022 1:33 pm, edited 5 times in total.
RisingZen
Posts: 10
Joined: Wed Oct 14, 2020 6:01 pm
Has thanked: 7 times
Been thanked: 4 times

Re: [Guide] Introducing - Phaze-A

Post by RisingZen »

Merged the posts

Last edited by RisingZen on Sun Nov 06, 2022 12:36 pm, edited 1 time in total.
filou_7
Posts: 1
Joined: Sat Mar 26, 2022 9:26 pm

Re: [Guide] Introducing - Phaze-A

Post by filou_7 »

Did I understand correctly that I can create an Original model at 128px with Phaze-A, instead of 64px, if I load the Original preset? That would be great. I get a crash with many trainers and settings, even though I'm working with a Mac Pro 6,1 and an AMD RX 580 8GB.

torzdf
Posts: 2649
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 128 times
Been thanked: 622 times

Re: [Guide] Introducing - Phaze-A

Post by torzdf »

filou_7 wrote: Sat Mar 26, 2022 9:40 pm

Did I understand correctly that I can create an Original model at 128px with Phaze-A, instead of 64px, if I load the Original preset? That would be great. I get a crash with many trainers and settings, even though I'm working with a Mac Pro 6,1 and an AMD RX 580 8GB.

Yes, you can. Load the Original preset, then tweak it to how you want it. Make sure you understand the part about upscalers (and the upscale curve) in the decoders, because that is ultimately going to control your final output resolution.

Comparing the Dfaker and Original presets may also help, as Dfaker is just Original with an extra upscaler to take it to 128px.

My word is final

adam_macchiato
Posts: 16
Joined: Tue Jul 26, 2022 5:26 am
Has thanked: 4 times

Re: Loss can't go down, spent over 48 hrs?

Post by adam_macchiato »

Thank you for your advice. Would you please suggest some settings for 1080p video when using V2 with an RTX 3080 and 3090? For now, with V2_S and the StoJo preset on 1080p video, the face goes blurry in some close-up shots, but V2_L with the StoJo preset is much better.

Suggestions like: extraction px, Phaze-A output px, enc scaling... etc.?

Thank you very much.

torzdf
Posts: 2649
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 128 times
Been thanked: 622 times

Re: [Guide] Introducing - Phaze-A

Post by torzdf »

adam_macchiato wrote: Sat Aug 06, 2022 12:56 pm

Thank you for your advice. Would you please suggest some settings for 1080p video when using V2 with an RTX 3080 and 3090? For now, with V2_S and the StoJo preset on 1080p video, the face goes blurry in some close-up shots, but V2_L with the StoJo preset is much better.

Suggestions like: extraction px, Phaze-A output px, enc scaling... etc.?

Thank you very much.

Not really. I don't have a card of those specs, nor all the answers about what works and what doesn't.

My word is final

Yaboyscotty
Posts: 3
Joined: Thu Jul 18, 2019 8:23 am

Re: [Guide] Introducing - Phaze-A

Post by Yaboyscotty »

What kind of batch size should I be looking at while using Phaze-A?

I have a 3090

torzdf
Posts: 2649
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 128 times
Been thanked: 622 times

Re: [Guide] Introducing - Phaze-A

Post by torzdf »

I couldn't tell you the batch size just based on settings. See here for how to find the best batch size yourself:

viewtopic.php?p=388

My word is final

MaxHunter
Posts: 193
Joined: Thu May 26, 2022 6:02 am
Has thanked: 176 times
Been thanked: 13 times

Re: [Guide] Introducing - Phaze-A

Post by MaxHunter »

Wow!! I went this entire time without knowing there were presets?!!! Is this in the tutorial?

torzdf
Posts: 2649
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 128 times
Been thanked: 622 times

Re: [Guide] Introducing - Phaze-A

Post by torzdf »

MaxHunter wrote: Thu Aug 11, 2022 1:40 am

Wow!! I went this entire time without knowing there were presets?!!! Is this in the tutorial?

Yes. Linked from the very first post ;)
viewtopic.php?p=5367#p5367

My word is final

Ryzen1988
Posts: 57
Joined: Thu Aug 11, 2022 8:31 am
Location: Netherlands
Has thanked: 8 times
Been thanked: 27 times

Re: [Guide] Introducing - Phaze-A

Post by Ryzen1988 »

Hey guys, I have been tinkering and tweaking a lot with Phaze-A, and I seem to have found a combination that has a fairly high output resolution, is good at swapping, and for its size is still sort of manageable.
I'm currently only at 60,000+ training iterations, but with a high-quality, general-purpose face dataset of around 7,000 images of individual people, the facial reconstruction already looks very good, and the model has also already started to partially swap faces.
Normally this does not really happen with a general training dataset, because every picture is of a different person. The fact that it is already starting to partially swap faces at this early stage of training is very exciting.

When starting training from scratch, it would hit NaN very quickly, within about 50 iterations.
After a lot of tweaking, I found that starting with AdaBelief, the learning rate down at 2.5e-5 and the epsilon exponent up at -8, was the trick to get it learning slowly and steadily. After around 2,000 iterations I could raise the learning rate and epsilon exponent pretty quickly, in steps, back up to 5.5e-5 and -16.
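To make the effect concrete, some back-of-envelope arithmetic (an illustrative sketch, not Faceswap code): the AdaBelief/Adam-style update is roughly lr / (sqrt(s) + eps), so a larger epsilon caps the step size when the variance estimate s collapses towards zero early in training.

Code: Select all

# Worst-case step when the variance estimate s is ~0 is roughly lr / eps.
for lr, eps in [(5.5e-5, 1e-16), (2.5e-5, 1e-8)]:
    print(f"lr={lr}, eps={eps}: worst-case step ~ {lr / eps:.3g}")

# lr=5.5e-05, eps=1e-16: worst-case step ~ 5.5e+11  (explodes -> NaN)
# lr=2.5e-05, eps=1e-08: worst-case step ~ 2.5e+03  (bounded)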

My choice was EfficientNetV2 B3 as the base, since after reading up on all the papers it looked like the most efficient and modern encoder. MobileNetV3 was also cool, except for its low maximum resolution cap.
Input was at 300px with 100% encoder scaling, and the generated output was 512px. Training started at a batch size of 16, with SSIM and LogCosh losses.
At around 40k iterations I switched to a batch size of 4, with LPIPS-VGG16 as a third loss for accelerated sharpness and feature development in the faces.
The goal is for the network to be good enough after 200k iterations of general training to be used for specific swapping with as little retraining as possible (that is why it is a fairly large model, with more splitting and layering than otherwise necessary).
One of the choices was to not put the bottleneck in the encoder but to keep it in the FC. This results in the G-block and split FC bloating the model's parameter count, but in return it gives the inputs to the G-block and split layers far more data to work with.
I try to avoid upscales and just use filters.

Code: Select all

{
    "output_size": 512,
    "shared_fc": "full",
    "enable_gblock": true,
    "split_fc": true,
    "split_gblock": false,
    "split_decoders": true,
    "enc_architecture": "efficientnet_v2_b3",
    "enc_scaling": 100,
    "enc_load_weights": false,
    "bottleneck_type": "dense",
    "bottleneck_norm": null,
    "bottleneck_size": 1536,
    "bottleneck_in_encoder": false,
    "fc_depth": 1,
    "fc_min_filters": 512,
    "fc_max_filters": 512,
    "fc_dimensions": 4,
    "fc_filter_slope": -0.5,
    "fc_dropout": 0.07,
    "fc_upsampler": "subpixel",
    "fc_upsamples": 0,
    "fc_upsample_filters": 1024,
    "fc_gblock_depth": 3,
    "fc_gblock_min_nodes": 512,
    "fc_gblock_max_nodes": 512,
    "fc_gblock_filter_slope": -0.5,
    "fc_gblock_dropout": 0.05,
    "dec_upscale_method": "subpixel",
    "dec_upscales_in_fc": 0,
    "dec_norm": null,
    "dec_min_filters": 16,
    "dec_max_filters": 512,
    "dec_slope_mode": "full",
    "dec_filter_slope": -0.45,
    "dec_res_blocks": 1,
    "dec_output_kernel": 5,
    "dec_gaussian": true,
    "dec_skip_last_residual": true
}

Pictures will follow when the target training iteration count is reached.

torzdf
Posts: 2649
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 128 times
Been thanked: 622 times

Re: [Guide] Introducing - Phaze-A

Post by torzdf »

Thanks for this post! This is exactly the kind of information and research that I would like to see from users.

I look forward to seeing how your model gets on :)

My word is final

adam_macchiato
Posts: 16
Joined: Tue Jul 26, 2022 5:26 am
Has thanked: 4 times

Output at 512px with V2 is a really difficult task

Post by adam_macchiato »

Just want to share my case, and I hope for any suggestions.

I have 2 PCs, one with a 3090 and the other with a 3080.
At 256px output, the 3090 can use the StoJo preset with V2_L, and the quality is very good and impressive. The 3080 can use StoJo with V2_S.

Because the face goes blurry in some close-up shots with 1080p+ video, I started to try 512px output, but some settings run out of VRAM.
I share my simple settings below:

3090, extraction 1024px, 512px output

Model    | Encoder | Scaling | Batch size | Result
Stojo    | V2_M    | 100     | 2/4/8      | Fail
Stojo    | V2_S    | 100     | 8          | Success
Stojo    | V2_B3   | 100     | 12         | Success

3080, extraction 780px, 512px output

Model    | Encoder | Scaling | Batch size | Result
Stojo    | V2_S    | 100     | 2/4/8      | Fail
Stojo    | V2_B3   | 100     | 2/4/8      | Fail
DNY512   | V2_S    | 100     | 2/4/8      | Fail
DNY512   | V2_B3   | 100     | 2/4/8      | Fail
DNY512   | -       | -       | 8          | Success
DNY1024  | -       | -       | 8          | Success
SAEDF-HD | V2_S    | 100     | 2/4/8      | Fail
SAEDF-HD | V2_B3   | 100     | 2/4/8      | Fail

Any good suggestions for V2 settings that can reach 512px?

Thanks, all

Edit by @torzdf: Inserted tables

torzdf
Posts: 2649
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 128 times
Been thanked: 622 times

Re: [Guide] Introducing - Phaze-A

Post by torzdf »

Unfortunately my larger GPU is not available for testing at the moment.

The DNY preset is fairly simple/lightweight, so I would definitely start from that (StoJo, whilst lower-res, is much more complex).

Generally speaking (all things being equal), doubling the resolution needs 4x as much VRAM, as twice the width and twice the height means four times the pixels. I would start with the V2_B0 encoder and go from there.

You may also be able to save some VRAM by changing the bottleneck to a pooling layer rather than Dense. Again, this will reduce complexity, but it is worth a try.
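To give a sense of why pooling saves VRAM: a Dense bottleneck carries a weight matrix over the whole flattened encoder output, whereas a global pooling layer has no weights at all. A back-of-envelope sketch (the 16x16x1536 encoder output shape here is a made-up example, not taken from any particular preset):

Code: Select all

height, width, channels = 16, 16, 1536   # hypothetical encoder output shape
bottleneck = 512

dense_weights = height * width * channels * bottleneck
print(f"Dense bottleneck: {dense_weights / 1e6:.0f}M weights")  # ~201M
# Global average pooling has zero weights: the 16x16 grid is simply
# averaged down to a single 1536-long vector.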

Also, extraction size does not impact VRAM usage (you may know this already, but I saw you mentioned extraction size in your post, so I thought I'd mention it).

My word is final

Icarus
Posts: 8
Joined: Mon Aug 15, 2022 9:18 pm
Has thanked: 10 times
Been thanked: 8 times

Notes on Phaze A model architecture and settings

Post by Icarus »

I've been experimenting with Phaze-A for a year now using Nvidia A100 cloud GPUs, have tried a few common and one not-so-common setup, and wanted to share some of my notes on how different model architectures affect results.

split fc layer, gblock enabled (not split), shared decoders:
This is probably the most popular setup, and it is the best choice if your A data has a lot of poses/angles that your B data lacks. The shared decoder is really good at filling in the blanks; however, there tends to be a fair amount of identity bleed.

shared fc layer, gblock enabled (not split), split decoder:
This has produced the worst results and, in my experience, has been the only setup to cause discoloration of the forehead when the hairlines differ.

split fc layer, gblock enabled (not split), split decoder:
This is the least common setup, but it's my personal favorite when you have a good amount of B data. It results in strikingly accurate detail and is the closest thing to actually swapping the face, with zero identity bleed. The downside to this setup shows when you don't have enough B data to fill in the blanks: the model does some frightening things when it only has the G-block (a GAN) to fall back on.

I did a few experiments with a split G-block, but I didn't notice any significant improvement or degradation either way...

A few notes on what I've found to be ideal settings:
Encoder: EfficientNetV2_L has been amazing, and I've noticed a huge improvement over v1. I usually try to match the scaling to the output.

Bottleneck: Always go with Dense. I've tried both pooling options and they result in streaks of color and poor detail. Using the 512 size has never let me down.

FC layer: Overcranking this can do more harm than good. With autoencoder models, you generally don't want this to be more detailed than the encoder feeding it. I've noticed better results with a dimension of 8 and 1280 nodes than with a dimension of 12 or 16. On that note, making this deeper (increasing the depth over 1) is unnecessary and a waste of VRAM; at least in my experience, it did nothing to improve the results and may have made them worse.
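To illustrate why overcranking the dimension explodes VRAM, a back-of-envelope parameter count (assuming a single Dense layer fed from a 512-unit bottleneck; the exact layer layout in any given config may differ):

Code: Select all

bottleneck = 512
for dims, nodes in [(8, 1280), (12, 1280), (16, 1280)]:
    outputs = dims * dims * nodes      # reshaped to dims x dims x nodes
    weights = bottleneck * outputs     # Dense weight matrix size
    print(f"{dims}x{dims}x{nodes}: {weights / 1e6:.0f}M weights")

# 8x8x1280:   ~42M weights
# 12x12x1280: ~94M weights
# 16x16x1280: ~168M weights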

fc_dropout: I hardly ever use this, but the one time I did, it surprisingly sped up training massively (which seems counterintuitive).

Upsamplers: Since I was doing most of the training on a powerful GPU, I used subpixel for both upsamplers. I would say 512 is probably a decent number of filters; I had this at 1280 but eventually dropped it to 512 and didn't notice any degradation in the results.

Decoders: Allocating more VRAM to these parameters will give you the most bang for your buck in terms of detail in the results. I noticed a huge increase in detail and quality from increasing both the first and final number of filters. If you run into VRAM issues, adjusting the slope of the filter curve (making it steeper) can save you VRAM. Adding an additional residual block (or two) also made a huge difference. I go with a kernel size of 3, but I have also used the default of 5 a few times; it's hard to say whether it made much of a positive difference, because other parameters were also changed.

Loss functions:
As it says in the Training Guide, the choice you make here will have an outsized impact on your entire model. I've tried them all, and a combination of MS-SSIM and MAE (L1) at 100% has produced the best results. The weird quirk with MS-SSIM is that whenever I've tried to start a model using it, the model crashes (which I honestly can't explain), so I usually start with SSIM and then swap it out for MS-SSIM after 1k iterations. I also add a third loss function, FFL, at either 25% or 50%, and I think it has made a positive impact. I've tried the LPIPS losses as tertiary losses, and they completely ruined everything with the moiré pattern described in the settings. I get that, in theory, using one of those as a supplementary loss is supposed to help, but I have no idea how much weight to give it.
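As a rough illustration of that weighting, a standalone TensorFlow sketch (not Faceswap's own loss code; the weights are just the ones described above):

Code: Select all

import tensorflow as tf

def combined_loss(y_true, y_pred):
    # MS-SSIM at 100%, expressed as a distance (1 - similarity)
    ms_ssim = 1.0 - tf.reduce_mean(
        tf.image.ssim_multiscale(y_true, y_pred, max_val=1.0))
    # MAE (L1) at 100%
    mae = tf.reduce_mean(tf.abs(y_true - y_pred))
    # A third loss (e.g. FFL) would be added here at a 0.25-0.5 weight.
    return 1.0 * ms_ssim + 1.0 * mae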

Mixed precision:
Last but not least, mixed precision: you love it and you hate it. It makes a huge difference in training speed and VRAM usage, but it is the frequent culprit behind NaNs. I did some research on Nvidia's website regarding this and found the holy grail of hidden information, which has cured me of the downside of using it. It all comes down to the epsilon. Nvidia recommends increasing your epsilon by a factor of 1e3 when training with mixed precision. So instead of the default 1e-07, I use 1e-04, and this has made a world of difference, with zero downside in terms of the model's ability to learn and, most importantly, no more NaNs.
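In standalone Keras terms, the tweak looks something like this (a minimal sketch of the idea; in Faceswap itself the epsilon exponent is a settings option rather than code):

Code: Select all

import tensorflow as tf

# float16 compute keeps the speed/VRAM benefits; the larger epsilon keeps
# 1 / (sqrt(v) + eps) representable so updates don't overflow to NaN.
tf.keras.mixed_precision.set_global_policy("mixed_float16")
optimizer = tf.keras.optimizers.Adam(learning_rate=3e-5, epsilon=1e-4)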

These are just a few things I've noticed after experimenting a bit through trial and error; these findings are by no means scientific and would never pass a peer review :P

I usually train until loss convergence, to around 600k-800k iterations, with a batch size of 8 and a learning rate of 3e-05.

torzdf
Posts: 2649
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 128 times
Been thanked: 622 times

Re: [Guide] Introducing - Phaze-A

Post by torzdf »

@Icarus Thanks for this extensive post. Hugely useful.

Could I ask that you split/duplicate your mixed precision and loss notes into separate posts (duplicating the info is fine), as I think both of those items are worthy of their own independent discussion? I'm happy to c+p the info myself, but I thought I would give you the opportunity to start the threads.

As for the Phaze-A info, this is great! I have linked it from the User/Presets + Experimentation posts.

My word is final

MaxHunter
Posts: 193
Joined: Thu May 26, 2022 6:02 am
Has thanked: 176 times
Been thanked: 13 times

Re: [Guide] Introducing - Phaze-A

Post by MaxHunter »

@Icarus

Thanks for the review. I've been having a hell of a time trying to get StoJo working, as I keep getting NaNs, and it's driving me bonkers. Going to try your suggestions. 😉

MaxHunter
Posts: 193
Joined: Thu May 26, 2022 6:02 am
Has thanked: 176 times
Been thanked: 13 times

Re: [Guide] Introducing - Phaze-A

Post by MaxHunter »

I have to mention that I've been experimenting with the StoJo preset for the past several days, mixing it with Icarus' settings but adding a third loss of LPIPS (AlexNet) at 5%, and the results, in my opinion, are outstanding. No NaNs at all, whereas before I couldn't get StoJo above 50,000 iterations without a NaN error. If you're looking for a general model for random "A" faces, this might be your jumping-off point; the results after 400,000 iterations at 0.08 loss are pretty good. I expect it to get even better as the loss slowly drops. I'd like to see it get down to 0.03 loss.

zany6669
Posts: 2
Joined: Fri Sep 18, 2020 10:50 pm

Re: [Guide] Introducing - Phaze-A

Post by zany6669 »

Thanks for the great observations. Very helpful. I was wondering, is it possible to change the G-block, split layer settings, etc. mid-training, or does a new model need to be created?
