Training stopped without error message

If training is failing to start, and you are not receiving an error message telling you what to do, tell us about it here


Forum rules

Read the FAQs and search the forum before posting a new topic.

This forum is for reporting errors with the Training process. If you want to get tips, or better understand the Training process, then you should look in the Training Discussion forum.

Please mark any answers that fixed your problems so others can find the solutions.

Locked
elecbub
Posts: 3
Joined: Fri Feb 14, 2020 10:16 pm
Has thanked: 2 times

Training stopped without error message

Post by elecbub »

I have prepared a CPU-only Docker environment (Python 3.6, TensorFlow 1.12, and the other libraries listed in requirements.txt) for Faceswap (https://github.com/deepfakes/faceswap/). I prepared the image data (around 600 and 850 PNG files) and tried to start training a new model with the command below.

Code:

python3.6 faceswap.py train -A "mydir/faces/b_4" -B "mydir/faces/a" -m "mydir/model"

But the training seemed to stop without any error message or crash report. Is there any clue as to what I should do next?

Below is the message in the terminal after running the command.

Code:

Setting Faceswap backend to CPU
02/14/2020 22:10:17 INFO Log level set to: INFO
Using TensorFlow backend.
02/14/2020 22:10:22 INFO Model A Directory: /srv/mydir/faces/b_4
02/14/2020 22:10:22 INFO Model B Directory: /srv/mydir/faces/a
02/14/2020 22:10:22 INFO Training data directory: /srv/mydir/model
02/14/2020 22:10:22 INFO ===================================================
02/14/2020 22:10:22 INFO Starting
02/14/2020 22:10:22 INFO Press 'ENTER' to save and quit
02/14/2020 22:10:22 INFO Press 'S' to save model weights immediately
02/14/2020 22:10:22 INFO ===================================================
02/14/2020 22:10:23 INFO Loading data, this may take a while...
02/14/2020 22:10:23 INFO Loading Model from Original plugin...
02/14/2020 22:10:23 INFO No existing state file found. Generating.
02/14/2020 22:10:25 INFO Creating new 'original' model in folder: '/srv/mydir/model'
02/14/2020 22:10:25 INFO Loading Trainer from Original plugin...
02/14/2020 22:10:29 INFO Enabled TensorBoard Logging
Killed
deephomage
Posts: 33
Joined: Fri Jul 12, 2019 6:09 pm
Answers: 1
Has thanked: 2 times
Been thanked: 8 times

Re: Training stopped without error message

Post by deephomage »

Docker isn't officially supported. You're welcome to try Windows, Linux or macOS.

bryanlyon
Site Admin
Posts: 793
Joined: Fri Jul 12, 2019 12:49 am
Answers: 44
Location: San Francisco
Has thanked: 4 times
Been thanked: 218 times

Re: Training stopped without error message

Post by bryanlyon »

The "Killed" message implies something else killed the operation. You can see if anything caused it to be killed by checking the docker system logs. Unfortunately, I don't know what that is and I doubt you'll find anything in Faceswap's logs. You may TRY setting Faceswap to a high logging level while training and see if it gives you any information in the faceswap.log file.

That said, CPU training (especially in a docker) is going to be an absolute nightmare. I'd fully expect it to take you around a month or two to get even a basic result. We HIGHLY recommend an Nvidia graphics card for training.
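One thing worth checking, as an assumption rather than a diagnosis: on Linux, a bare "Killed" with no traceback is usually what the shell prints when a process receives SIGKILL, and the kernel's out-of-memory killer is a common sender. Docker records related information in the container state reported by `docker inspect` (the `State.OOMKilled` and `State.ExitCode` fields). The sketch below reads those two fields. The helper names `diagnose_state` and `inspect_container` and the returned labels are made up for illustration, and the container name is a placeholder.

```python
import json
import subprocess


def diagnose_state(state: dict) -> str:
    """Interpret the "State" object taken from `docker inspect` output."""
    # Docker sets OOMKilled when the container hit an out-of-memory kill.
    if state.get("OOMKilled"):
        return "oom-killed"
    # 137 = 128 + 9, meaning the main process ended on SIGKILL.
    if state.get("ExitCode") == 137:
        return "sigkill"
    if state.get("Running"):
        return "running"
    return "exited-%s" % state.get("ExitCode")


def inspect_container(name: str) -> str:
    """Run `docker inspect` on the host and diagnose the container's state.

    `name` is a placeholder for your own container name or ID.
    """
    out = subprocess.run(
        ["docker", "inspect", name],
        check=True, capture_output=True, text=True,
    ).stdout
    # `docker inspect` prints a JSON array with one object per container.
    return diagnose_state(json.loads(out)[0]["State"])
```

Calling `inspect_container("my_faceswap_container")` on the host (not inside the container) would return one of the labels above. If training was started from an interactive shell inside the container, the container itself may still be running, so the exit code may not be 137; in that case `OOMKilled` and the host's kernel log are the more useful places to look, and raising the memory available to Docker is a reasonable thing to try.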

elecbub
Posts: 3
Joined: Fri Feb 14, 2020 10:16 pm
Has thanked: 2 times

Re: Training stopped without error message

Post by elecbub »

deephomage wrote: Fri Feb 14, 2020 11:07 pm

Docker isn't officially supported. You're welcome to try Windows, Linux or macOS.

Thank you for your reply! The Docker environment that I am using is based on Ubuntu 16.04 and runs on macOS. I will try another environment someday ^^

elecbub
Posts: 3
Joined: Fri Feb 14, 2020 10:16 pm
Has thanked: 2 times

Re: Training stopped without error message

Post by elecbub »

bryanlyon wrote: Sat Feb 15, 2020 3:26 am

The "Killed" message implies something else killed the operation. You can see if anything caused it to be killed by checking the docker system logs. Unfortunately, I don't know what that is and I doubt you'll find anything in Faceswap's logs. You may TRY setting Faceswap to a high logging level while training and see if it gives you any information in the faceswap.log file.

That said, CPU training (especially in a docker) is going to be an absolute nightmare. I'd fully expect it to take you around a month or two to get even a basic result. We HIGHLY recommend an Nvidia graphics card for training.

Thank you for your reply! As you said, faceswap.log does not seem to contain any more information than the messages in the terminal. I will take a look at the Docker system logs and try setting the logging level.

I understand that training needs a lot of computation and time to get good results. Right now I do not have a good environment, so I want to start by preparing the data myself as practice.

torzdf
Posts: 2649
Joined: Fri Jul 12, 2019 12:53 am
Answers: 159
Has thanked: 128 times
Been thanked: 622 times

Re: Training stopped without error message

Post by torzdf »

I would recommend setting up in Anaconda on Mac.

My word is final
