Same missing alignments problem while training


Locked
cosmico
Posts: 95
Joined: Sat Jan 18, 2020 6:32 pm
Has thanked: 13 times
Been thanked: 35 times

Same missing alignments problem while training

Post by cosmico »

08/29/2020 15:14:28 ERROR Caught exception in thread: '_training_0'
08/29/2020 15:14:28 ERROR Alignments file does not exist: D:\Nueral Network programs\Input A frames\Training frames\alignments.fsa

It's the same error I had last time, only last time deselecting penalized mask loss made it work again. This time it doesn't work regardless of whether I have it on or off.
I was running a DFLSAE model when I decided to update, and then this happened. Randomly adjusting the loss function, mask loss function, L2 reg, eye or mouth multiplier, mask type, learn mask, and penalized mask loss all seems to do nothing. Other projects with different models and training sets don't work either, and neither does starting a brand new model.

by torzdf » Sun Aug 30, 2020 10:26 am

torzdf
Posts: 2649
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 128 times
Been thanked: 622 times

Re: Same missing alignments problem while training

Post by torzdf »

Eye Multiplier and Mouth Multiplier both need the alignments file.

Reduce these values to 1.

You are really going to start missing out on benefits in training if you don't use an alignments file though.

My word is final


Re: Same missing alignments problem while training

Post by cosmico »

Thanks again for helping me out.

You are really going to start missing out on benefits in training

They seem like great features, and I would honestly love to use them, but every time I start a new project and a new model, there's just no alignments file there, so I hit this problem after every update. What am I doing wrong that every new project defaults to not having one? And can I add this to a model that's already halfway trained without it? Is it too late for me to fix this and use these features on my DFLSAE at 400k iterations?

Edit: I get that every time I extract I create an alignments file, but I use multiple clips and thus have multiple alignments files, and each one is named after its video clip. The only time I get a single alignments file named "alignments.fsa" is when I extract from images.
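
For anyone hitting the same thing, a quick way to see what is actually sitting in a training folder is a small check like the one below. This is only a sketch using the Python standard library, not part of Faceswap: the function name `find_alignments` is made up for illustration, and the only thing taken from the error message above is the expected file name `alignments.fsa`.

```python
from pathlib import Path


def find_alignments(folder, expected_name="alignments.fsa"):
    """Return (expected_file_exists, sorted names of all .fsa files in folder).

    `expected_name` defaults to the file name shown in the error message;
    everything else here is a hypothetical helper, not a Faceswap API.
    """
    folder = Path(folder)
    if not folder.is_dir():
        # A missing folder is reported the same way as an empty one.
        return False, []
    fsa_files = sorted(p.name for p in folder.glob("*.fsa"))
    return expected_name in fsa_files, fsa_files
```

Calling `find_alignments` on the folder from the error message returns a flag for whether `alignments.fsa` is present, plus the names of any clip-named `.fsa` files that are there instead.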


Re: Same missing alignments problem while training

Post by cosmico »

I figured it out, but boy did it give me plenty of issues. At first it threw a KeyError on the first picture in my B set, and if I removed that picture it would fail on the second picture, and on and on. It turns out that taking out all the paths under the timelapse settings (there was nothing wrong with them) solved that. Now I'm dealing with an illegal address issue.
2020-08-30 22:51:43.060518: E tensorflow/stream_executor/cuda/cuda_event.cc:29] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
2020-08-30 22:51:43.061058: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:273] Unexpected Event status: 1
GeForce Experience says my drivers are up to date, and reinstalling Miniconda and Faceswap didn't solve it.


Re: Same missing alignments problem while training

Post by torzdf »

That looks like an Out of Memory error. Try lowering your batch size.



Re: Same missing alignments problem while training

Post by cosmico »

If by out of memory you mean it runs out of my computer's memory, perhaps. It does seem to work best (best meaning it works for an hour or 2) when I restart my computer, and it tends to crash when I open up even light applications.

A batch size of 2 is what I'm using, as anything larger crashes after 30 seconds. Turning it down to 1 didn't seem to help much. I would have thought my 16GB of RAM and RTX 2060 would handle DFLSAE at 144 pixels with growth, mixed precision, multiscale decoder, 512 autoencoder, 42 encoder and 30 decoder a lot better than this.

In the info boxes, the encoder settings state that lower values can free up VRAM. Would turning off growth, mixed precision, or multiscale decoder also help free up VRAM? Also, would turning down the learning rate help in any way?


Re: Same missing alignments problem while training

Post by torzdf »

Honestly, I don't know the specific settings on that model too well, so I don't know what the expected requirements are. @abigflea may be able to tell you whether those settings seem sensible.

By Out of Memory, I'm referring to GPU memory. System RAM shouldn't be an issue.


abigflea
Posts: 182
Joined: Sat Feb 22, 2020 10:59 pm
Answers: 2
Has thanked: 20 times
Been thanked: 62 times

Re: Same missing alignments problem while training

Post by abigflea »

nnifj wrote: Mon Aug 31, 2020 7:38 pm

If by out of memory you mean it runs out of my computer's memory, perhaps. It does seem to work best (best meaning it works for an hour or 2) when I restart my computer, and it tends to crash when I open up even light applications.

If your computer is crashing after an hour or 2, I personally would start thinking some component is getting hot and unstable.

nnifj wrote: Mon Aug 31, 2020 7:38 pm

A batch size of 2 is what I'm using, as anything larger crashes after 30 seconds. Turning it down to 1 didn't seem to help much. I would have thought my 16GB of RAM and RTX 2060 would handle DFLSAE at 144 pixels with growth, mixed precision, multiscale decoder, 512 autoencoder, 42 encoder and 30 decoder a lot better than this.

DFL uses a lot of VRAM. I have a model with 42 Enc, 22 Dec, at 128px and get a batch of 12 on my RTX 2070 8GB.
I can run a batch of 16 but get OOM after a few thousand iterations.
With your 6GB card and those settings, you are not going to get a high batch.
Maybe tone that decoder back down to 21 (default).
Big disclaimer here: every model will use slightly different amounts of memory, so the batch may go up or down a notch.
Windows will take more for itself and knock down your maximum batch size.
I suspect other running programs will use more VRAM. I run my card without any monitor connected to keep every megabyte as free as possible.

If your system RAM is filling up, you have some other program(s) stealing your memory away. FS doesn't use much (thanks, torzdf). I can run 3 separate instances, all training, and it used about 14GB in Linux.
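
To see how much VRAM is already taken before training starts, something like the sketch below can help. It is not part of Faceswap; it assumes the `nvidia-smi` tool that ships with the NVIDIA driver is on the PATH, and the function names `parse_gpu_memory` and `query_gpu_memory` are made up for illustration.

```python
import subprocess


def parse_gpu_memory(csv_text):
    """Parse lines of 'used, total' (in MiB) into a list of (used, total) ints."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def query_gpu_memory():
    """Ask nvidia-smi for per-GPU memory use. Needs an NVIDIA driver installed."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,  # raise if nvidia-smi exits with an error
    )
    return parse_gpu_memory(result.stdout)
```

Running `query_gpu_memory()` with nothing else open, and again with a browser or other programs running, gives a rough idea of how much VRAM they take away from training.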

nnifj wrote: Mon Aug 31, 2020 7:38 pm

In the info boxes, the encoder settings state that lower values can free up VRAM. Would turning off growth, mixed precision, or multiscale decoder also help free up VRAM? Also, would turning down the learning rate help in any way?

Mixed Precision should increase your available memory so leave that on.
Allow growth shouldn't make much difference with this issue. I have to leave it on because my 2070 won't behave without it.
I haven't really tried turning Multi-scale decoder on/off. It may help a little, try it out.

:o I dunno what I'm doing :shock:
2X RTX 3090 : RTX 3080 : RTX: 2060 : 2x RTX 2080 Super : Ghetto 1060
