Training stopped without error message

Posted: Fri Feb 14, 2020 10:52 pm
by elecbub

I have set up a CPU-only Docker environment (Python 3.6, TensorFlow 1.12, and the other libraries listed in requirements.txt) for faceswap (https://github.com/deepfakes/faceswap/). I prepared the image data (around 600 and 850 PNG files) and tried to start training a new model with the command below.

python3.6 faceswap.py train -A "mydir/faces/b_4" -B "mydir/faces/a" -m "mydir/model "

But the training seemed to stop without any error message or crash report. Is there any clue as to what I should do next?

Below is the message in the terminal after running the command.

Setting Faceswap backend to CPU
02/14/2020 22:10:17 INFO Log level set to: INFO
Using TensorFlow backend.
02/14/2020 22:10:22 INFO Model A Directory: /srv/mydir/faces/b_4
02/14/2020 22:10:22 INFO Model B Directory: /srv/mydir/faces/a
02/14/2020 22:10:22 INFO Training data directory: /srv/mydir/model
02/14/2020 22:10:22 INFO ===================================================
02/14/2020 22:10:22 INFO Starting
02/14/2020 22:10:22 INFO Press 'ENTER' to save and quit
02/14/2020 22:10:22 INFO Press 'S' to save model weights immediately
02/14/2020 22:10:22 INFO ===================================================
02/14/2020 22:10:23 INFO Loading data, this may take a while...
02/14/2020 22:10:23 INFO Loading Model from Original plugin...
02/14/2020 22:10:23 INFO No existing state file found. Generating.
02/14/2020 22:10:25 INFO Creating new 'original' model in folder: '/srv/mydir/model'
02/14/2020 22:10:25 INFO Loading Trainer from Original plugin...
02/14/2020 22:10:29 INFO Enabled TensorBoard Logging
Killed

Re: Training stopped without error message

Posted: Fri Feb 14, 2020 11:07 pm
by deephomage

Docker isn't officially supported. You're welcome to try Windows, Linux or macOS.


Re: Training stopped without error message

Posted: Sat Feb 15, 2020 3:26 am
by bryanlyon

The "Killed" message implies something else killed the operation. You can see if anything caused it to be killed by checking the docker system logs. Unfortunately, I don't know what that is and I doubt you'll find anything in Faceswap's logs. You may TRY setting Faceswap to a high logging level while training and see if it gives you any information in the faceswap.log file.
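
One thing worth ruling out is memory: a bare "Killed" in the shell generally means the process received SIGKILL, and on Linux the kernel's out-of-memory killer is a common sender. As a minimal sketch (not part of Faceswap), the Python helper below reads the JSON printed by `docker inspect <container>` and reports the `State.OOMKilled` flag and the exit code. The function name `diagnose_container` and the sample JSON are made up for illustration, and it assumes the container itself has stopped; if only the training process inside a still-running container was killed, these fields may not reflect that.

```python
import json

# 128 + 9: the conventional exit code for a process ended by SIGKILL
SIGKILL_EXIT_CODE = 137


def diagnose_container(inspect_json: str) -> str:
    """Summarise why a container stopped, given `docker inspect` JSON output."""
    entries = json.loads(inspect_json)  # `docker inspect` prints a JSON array
    state = entries[0]["State"]
    if state.get("OOMKilled"):
        return "container was killed for running out of memory"
    if state.get("ExitCode") == SIGKILL_EXIT_CODE:
        return "container exited on SIGKILL (exit code 137), cause not recorded"
    return f"container exit code: {state.get('ExitCode')}"


if __name__ == "__main__":
    # Hypothetical, heavily trimmed sample of `docker inspect` output
    sample = '[{"State": {"Status": "exited", "OOMKilled": true, "ExitCode": 137}}]'
    print(diagnose_container(sample))
```

If the flag comes back true, raising the memory available to Docker (or to the container) would be the first thing to try before digging further.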

That said, CPU training (especially in a docker) is going to be an absolute nightmare. I'd fully expect it to take you around a month or two to get even a basic result. We HIGHLY recommend an Nvidia graphics card for training.


Re: Training stopped without error message

Posted: Sat Feb 15, 2020 7:32 am
by elecbub
deephomage wrote: Fri Feb 14, 2020 11:07 pm

Docker isn't officially supported. You're welcome to try Windows, Linux or macOS.

Thank you for your reply! The Docker environment that I am using is based on Ubuntu 16.04, running on macOS. I will try another environment someday ^^


Re: Training stopped without error message

Posted: Sat Feb 15, 2020 7:45 am
by elecbub
bryanlyon wrote: Sat Feb 15, 2020 3:26 am

The "Killed" message implies something else killed the operation. You can see if anything caused it to be killed by checking the docker system logs. Unfortunately, I don't know what that is and I doubt you'll find anything in Faceswap's logs. You may TRY setting Faceswap to a high logging level while training and see if it gives you any information in the faceswap.log file.

That said, CPU training (especially in a docker) is going to be an absolute nightmare. I'd fully expect it to take you around a month or two to get even a basic result. We HIGHLY recommend an Nvidia graphics card for training.

Thank you for your reply! As you said, faceswap.log does not seem to contain any more information than the messages in the terminal. I will take a look at the Docker system logs and try raising the logging level.

I understand that training needs a lot of computation and time to get good results. Right now I don't have a good environment, so I want to start by preparing my own data as practice.


Re: Training stopped without error message

Posted: Sat Feb 15, 2020 10:11 am
by torzdf

I would recommend setting up in Anaconda on Mac.