Training stopped without error message


Post by elecbub »

I have prepared a CPU-only Docker environment (Python 3.6, TensorFlow 1.12, and the other libraries listed in requirements.txt) for Faceswap (https://github.com/deepfakes/faceswap/). I prepared the image data (around 600 and 850 PNG files) and tried to start training a new model with the command below.

python3.6 faceswap.py train -A "mydir/faces/b_4" -B "mydir/faces/a" -m "mydir/model"

But the training seemed to stop without any error message or crash report. Are there any clues as to what I should do next?

Below is the message in the terminal after running the command.

Setting Faceswap backend to CPU                                                                                                            
02/14/2020 22:10:17 INFO     Log level set to: INFO                                                                                        
Using TensorFlow backend.                                                                                                                  
02/14/2020 22:10:22 INFO     Model A Directory: /srv/mydir/faces/b_4                                                                       
02/14/2020 22:10:22 INFO     Model B Directory: /srv/mydir/faces/a                                                                         
02/14/2020 22:10:22 INFO     Training data directory: /srv/mydir/model                                                                     
02/14/2020 22:10:22 INFO     ===================================================                                                           
02/14/2020 22:10:22 INFO       Starting                                                                                                    
02/14/2020 22:10:22 INFO       Press 'ENTER' to save and quit                                                                              
02/14/2020 22:10:22 INFO       Press 'S' to save model weights immediately                                                                 
02/14/2020 22:10:22 INFO     ===================================================                                                           
02/14/2020 22:10:23 INFO     Loading data, this may take a while...                                                                        
02/14/2020 22:10:23 INFO     Loading Model from Original plugin...                                                                         
02/14/2020 22:10:23 INFO     No existing state file found. Generating.                                                                     
02/14/2020 22:10:25 INFO     Creating new 'original' model in folder: '/srv/mydir/model'                                                   
02/14/2020 22:10:25 INFO     Loading Trainer from Original plugin... 
02/14/2020 22:10:29 INFO     Enabled TensorBoard Logging                                                                                   
Killed                        


Post by deephomage »

Docker isn't officially supported. You're welcome to try Windows, Linux or macOS.


Post by bryanlyon »

The "Killed" message implies something else killed the operation. You can see if anything caused it to be killed by checking the docker system logs. Unfortunately, I don't know what that is and I doubt you'll find anything in Faceswap's logs. You may *TRY* setting Faceswap to a high logging level while training and see if it gives you any information in the faceswap.log file.

That said, CPU training (especially in a docker) is going to be an absolute nightmare. I'd fully expect it to take you around a month or two to get even a basic result. We *HIGHLY* recommend an Nvidia graphics card for training.
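
A bare "Killed" line is what a shell prints when a process receives SIGKILL. One way to confirm that from the outside is to launch training through a small wrapper and decode the return code. The following is a minimal sketch, not part of Faceswap: the helper names `describe_exit` and `run_and_report` are hypothetical, and it assumes a POSIX system (Linux or macOS) where signal 9 is SIGKILL.

```python
import signal
import subprocess
from typing import Sequence


def describe_exit(returncode: int) -> str:
    """Explain a process return code in plain words."""
    # Python's subprocess reports death by signal as a negative number
    # (-9 for SIGKILL); shells report the same event as 128 + signal (137).
    if returncode < 0:
        signum = -returncode
    elif returncode > 128:
        signum = returncode - 128
    else:
        signum = None
    if signum is not None:
        try:
            return "terminated by " + signal.Signals(signum).name
        except ValueError:
            pass  # not a known signal number, treat as an ordinary status
    return "exited with status {}".format(returncode)


def run_and_report(command: Sequence[str]) -> str:
    """Run a command, wait for it to finish, and describe how it ended."""
    result = subprocess.run(list(command))
    return describe_exit(result.returncode)
```

Calling `run_and_report(["python3.6", "faceswap.py", "train", ...])` with the original arguments would then report `terminated by SIGKILL` if the process was killed from outside. It does not say who sent the signal; on Linux the kernel's out-of-memory killer is a common sender, but that is only one possibility to check.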


Post by elecbub »

deephomage wrote:
Fri Feb 14, 2020 11:07 pm
Docker isn't officially supported. You're welcome to try Windows, Linux or macOS.

Thank you for your reply! The Docker environment I am using is based on Ubuntu 16.04 and runs on macOS. I will try another environment someday ^^


Post by elecbub »

bryanlyon wrote:
Sat Feb 15, 2020 3:26 am
The "Killed" message implies something else killed the operation. You can see if anything caused it to be killed by checking the docker system logs. Unfortunately, I don't know what that is and I doubt you'll find anything in Faceswap's logs. You may *TRY* setting Faceswap to a high logging level while training and see if it gives you any information in the faceswap.log file.

That said, CPU training (especially in a docker) is going to be an absolute nightmare. I'd fully expect it to take you around a month or two to get even a basic result. We *HIGHLY* recommend an Nvidia graphics card for training.

Thank you for your reply! As you said, faceswap.log does not seem to contain any more information than the terminal output. I will take a look at the Docker system logs and try raising the logging level.

I understand that training takes a lot of computation and time to produce good results. Right now I don't have a good environment, so I want to start by preparing my own data as practice.
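
Alongside the Docker system logs, it may be worth checking how much memory the container can actually see, since running out of memory is one common reason for a process to be killed without a Python traceback. This is a guess about the cause, not a diagnosis. The sketch below reads `/proc/meminfo` inside a Linux container; the function names are hypothetical, and the figure it reports does not reflect a per-container limit set with `docker run --memory`.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field name: numeric value}."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        # Most fields look like "MemTotal:  2046748 kB"; a few have no unit.
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def available_gib(meminfo: Dict[str, int]) -> float:
    """Return the memory the kernel considers available, in GiB."""
    # MemAvailable is the kernel's own estimate; fall back to MemFree
    # on kernels that do not report it.
    kib = meminfo.get("MemAvailable", meminfo.get("MemFree", 0))
    return kib / (1024 * 1024)


def read_available_gib(path: str = "/proc/meminfo") -> float:
    """Read the file at `path` and return the available memory in GiB."""
    with open(path) as handle:
        return available_gib(parse_meminfo(handle.read()))
```

Running `print(read_available_gib())` inside the container just before starting training gives a rough figure to compare against what the model and the loaded images are likely to need.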


Post by torzdf »

I would recommend setting up in Anaconda on Mac.
