Training stopped without error message

Locked
elecbub
Posts: 3
Joined: Fri Feb 14, 2020 10:16 pm
Has thanked: 2 times


Post by elecbub »

I have set up a CPU-only Docker environment for Faceswap (https://github.com/deepfakes/faceswap/) with Python 3.6, TensorFlow 1.12, and the other libraries listed in requirements.txt. I prepared the image data (around 600 and 850 PNG files for the two face sets) and tried to start training a new model with the command below.


python3.6 faceswap.py train -A "mydir/faces/b_4" -B "mydir/faces/a" -m "mydir/model"

However, the training stopped without any error message or crash report. Does anyone have a clue what I should do next?

Below is the terminal output after running the command.


Setting Faceswap backend to CPU
02/14/2020 22:10:17 INFO Log level set to: INFO
Using TensorFlow backend.
02/14/2020 22:10:22 INFO Model A Directory: /srv/mydir/faces/b_4
02/14/2020 22:10:22 INFO Model B Directory: /srv/mydir/faces/a
02/14/2020 22:10:22 INFO Training data directory: /srv/mydir/model
02/14/2020 22:10:22 INFO ===================================================
02/14/2020 22:10:22 INFO Starting
02/14/2020 22:10:22 INFO Press 'ENTER' to save and quit
02/14/2020 22:10:22 INFO Press 'S' to save model weights immediately
02/14/2020 22:10:22 INFO ===================================================
02/14/2020 22:10:23 INFO Loading data, this may take a while...
02/14/2020 22:10:23 INFO Loading Model from Original plugin...
02/14/2020 22:10:23 INFO No existing state file found. Generating.
02/14/2020 22:10:25 INFO Creating new 'original' model in folder: '/srv/mydir/model'
02/14/2020 22:10:25 INFO Loading Trainer from Original plugin...
02/14/2020 22:10:29 INFO Enabled TensorBoard Logging
Killed
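When a shell prints a bare `Killed`, the process received SIGKILL. That signal cannot be caught, so Python never gets the chance to print a traceback or write a crash report, which would explain the missing error message. A common sender on Linux is the kernel's out-of-memory (OOM) killer, but that is an assumption here; the log above does not prove it. The minimal sketch below (the helper names `describe_exit` and `run_and_report` are hypothetical) shows how the same event looks from Python's `subprocess` module and from a shell:

```python
import signal
import subprocess
import sys
from typing import List


def describe_exit(returncode: int) -> str:
    """Explain a subprocess return code the way a POSIX shell would report it."""
    if returncode < 0:
        # subprocess.run reports "terminated by signal N" as returncode -N
        sig = signal.Signals(-returncode)
        # shells report the same event as exit status 128 + N (137 for SIGKILL)
        return f"terminated by {sig.name} (shell exit status {128 - returncode})"
    return f"exited with status {returncode}"


def run_and_report(cmd: List[str]) -> str:
    """Run a command to completion and describe how it ended."""
    result = subprocess.run(cmd)
    return describe_exit(result.returncode)


if __name__ == "__main__":
    # Stand-in for the training command: a child process that sends itself
    # SIGKILL, the same signal the OOM killer or `kill -9` would send.
    child = [sys.executable, "-c",
             "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"]
    print(run_and_report(child))
```

You could swap `child` for the training command from the post. In an interactive bash shell, running `echo $?` right after the `Killed` line should print 137 for the same reason.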

deephomage
Posts: 27
Joined: Fri Jul 12, 2019 6:09 pm
Answers: 1
Has thanked: 2 times
Been thanked: 6 times

Re: Training stopped without error message

Post by deephomage »

Docker isn't officially supported. You're welcome to try Windows, Linux or macOS.


bryanlyon
Site Admin
Posts: 496
Joined: Fri Jul 12, 2019 12:49 am
Answers: 41
Location: San Francisco
Has thanked: 3 times
Been thanked: 120 times

Re: Training stopped without error message

Post by bryanlyon »

The "Killed" message implies something else killed the operation. You can see if anything caused it to be killed by checking the docker system logs. Unfortunately, I don't know what that is and I doubt you'll find anything in Faceswap's logs. You may TRY setting Faceswap to a high logging level while training and see if it gives you any information in the faceswap.log file.
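If the kill came from the kernel's OOM killer, it is recorded in the kernel log (readable with `dmesg`) rather than in Faceswap's own log. Below is a minimal sketch that scans that log. It assumes you can read the log from where you run the script: under Docker Desktop the kernel belongs to the Linux VM, and `dmesg` inside a container may need extra privileges. The regex targets the usual "Killed process <pid> (<name>)" wording, which varies between kernel versions. `find_oom_kills` is a hypothetical helper name.

```python
import re
import subprocess
from typing import List, Tuple

# The OOM killer's kernel message usually contains "Killed process <pid> (<name>)";
# the surrounding wording differs between kernel versions, so only this part is matched.
OOM_PATTERN = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(kernel_log: str) -> List[Tuple[int, str]]:
    """Return (pid, process name) for every OOM-killer entry in the log text."""
    return [(int(pid), name) for pid, name in OOM_PATTERN.findall(kernel_log)]


if __name__ == "__main__":
    try:
        # dmesg may need extra privileges inside a container; its errors go to stderr
        log = subprocess.run(["dmesg"], stdout=subprocess.PIPE,
                             universal_newlines=True).stdout
    except OSError as exc:  # e.g. dmesg is not installed
        log = ""
        print(f"could not run dmesg: {exc}")
    kills = find_oom_kills(log)
    for pid, name in kills:
        print(f"OOM killer terminated PID {pid} ({name})")
    if not kills:
        print("no OOM-killer entries found in the kernel log")
```

An entry naming the `python3.6` process at the time training died would point strongly to running out of memory. An empty result does not rule it out if the log was not readable.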

That said, CPU training (especially in a docker) is going to be an absolute nightmare. I'd fully expect it to take you around a month or two to get even a basic result. We HIGHLY recommend an Nvidia graphics card for training.


elecbub
Posts: 3
Joined: Fri Feb 14, 2020 10:16 pm
Has thanked: 2 times

Re: Training stopped without error message

Post by elecbub »

deephomage wrote: Fri Feb 14, 2020 11:07 pm

Docker isn't officially supported. You're welcome to try Windows, Linux or macOS.

Thank you for your reply! The Docker environment I am using is based on Ubuntu 16.04, running on macOS. I will try another environment someday ^^
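On macOS, Docker runs containers inside a Linux virtual machine, and Docker Desktop caps that VM's memory in its resource settings. A container can therefore see much less RAM than the Mac actually has. The minimal sketch below checks how much memory training can use. It assumes it runs inside the Linux container. It reads `/proc/meminfo` and the usual cgroup v2/v1 limit files, and those paths may differ on your setup. The helper names are hypothetical.

```python
from pathlib import Path
from typing import Dict, Optional


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}; sizes are in KiB."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def cgroup_memory_limit() -> Optional[int]:
    """Return the container's memory limit in bytes, or None if none is found."""
    # Usual cgroup v2 and cgroup v1 locations; they may differ on your system.
    for path in ("/sys/fs/cgroup/memory.max",
                 "/sys/fs/cgroup/memory/memory.limit_in_bytes"):
        p = Path(path)
        if p.exists():
            value = p.read_text().strip()
            # cgroup v2 writes "max" when no limit is set
            return None if value == "max" else int(value)
    return None


if __name__ == "__main__":
    meminfo_path = Path("/proc/meminfo")
    if meminfo_path.exists():
        mem = parse_meminfo(meminfo_path.read_text())
        print(f"MemTotal:     {mem.get('MemTotal', 0) / 1024**2:.1f} GiB")
        print(f"MemAvailable: {mem.get('MemAvailable', 0) / 1024**2:.1f} GiB")
    limit = cgroup_memory_limit()
    print("cgroup limit:", "none found" if limit is None else f"{limit / 1024**3:.1f} GiB")
```

If these numbers are only a few GiB, raising the memory allocated to Docker Desktop may be worth trying before training again. A cgroup v1 value far larger than the physical RAM effectively means no limit.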


elecbub
Posts: 3
Joined: Fri Feb 14, 2020 10:16 pm
Has thanked: 2 times

Re: Training stopped without error message

Post by elecbub »

bryanlyon wrote: Sat Feb 15, 2020 3:26 am

The "Killed" message implies something else killed the operation. You can see if anything caused it to be killed by checking the docker system logs. Unfortunately, I don't know what that is and I doubt you'll find anything in Faceswap's logs. You may TRY setting Faceswap to a high logging level while training and see if it gives you any information in the faceswap.log file.

That said, CPU training (especially in a docker) is going to be an absolute nightmare. I'd fully expect it to take you around a month or two to get even a basic result. We HIGHLY recommend an Nvidia graphics card for training.

Thank you for your reply! As you said, faceswap.log does not seem to contain any more information than the terminal output. I will take a look at the Docker system logs and try raising the logging level.

I understand that training takes a lot of computation and time to get good results. Right now I don't have a good environment, so I want to start by preparing my own data as practice.


torzdf
Posts: 1011
Joined: Fri Jul 12, 2019 12:53 am
Answers: 128
Has thanked: 28 times
Been thanked: 193 times

Re: Training stopped without error message

Post by torzdf »

I would recommend setting up Faceswap in Anaconda directly on the Mac.

My word is final

