Invalid device ordinal value (1). Valid range is [0, 0]

Posted: Thu May 21, 2020 7:50 am
by koroep

I tried to continue training my first model with multiple GPUs on an AWS p2.8xlarge instance. The same setup ran without trouble on a single-GPU p2.xlarge instance. I tried toggling the -o and -msg flags and changing the batch size, but neither helped.

(faceswap) ubuntu@ip-172-31-47-38:~/faceswap$ python /home/ubuntu/faceswap/faceswap.py train \
-A /home/ubuntu/myfolder/faceswap-project/face1/output \
-ala /home/ubuntu/myfolder/faceswap-project/face1/face1.fsa \
-B /home/ubuntu/myfolder/faceswap-project/face2/output \
-alb /home/ubuntu/myfolder/faceswap-project/face2/face2.fsa \
-m /home/ubuntu/myfolder/faceswap-project/models/face1face2 \
-t villain -bs 100 -it 1000000 -g 1 -s 50 -ss 25000 -ps 50 -ag -wl -L INFO -w
Setting Faceswap backend to NVIDIA
05/21/2020 07:28:20 INFO     Log level set to: INFO
Using TensorFlow backend.
05/21/2020 07:28:22 INFO     Model A Directory: /home/ubuntu/myfolder/faceswap-project/face1/output
05/21/2020 07:28:22 INFO     Model B Directory: /home/ubuntu/myfolder/faceswap-project/face2/output
05/21/2020 07:28:22 INFO     Training data directory: /home/ubuntu/myfolder/faceswap-project/models/face1face2
05/21/2020 07:28:22 WARNING  `-wl`, ``--warp-to-landmarks``  has been deprecated and will be removed from a future update. This option will be available within training config settings (/config/train.ini).
05/21/2020 07:28:22 INFO     ===================================================
05/21/2020 07:28:22 INFO       Starting
05/21/2020 07:28:22 INFO       Press 'ENTER' to save and quit
05/21/2020 07:28:22 INFO       Press 'S' to save model weights immediately
05/21/2020 07:28:22 INFO     ===================================================
05/21/2020 07:28:23 INFO     Loading data, this may take a while...
05/21/2020 07:28:23 INFO     Loading Model from Villain plugin...
05/21/2020 07:28:23 INFO     Using configuration saved in state file
05/21/2020 07:28:28 CRITICAL Error caught! Exiting...
05/21/2020 07:28:28 ERROR    Caught exception in thread: '_training_0'
05/21/2020 07:28:30 ERROR    Got Exception on main handler:
Traceback (most recent call last):
  File "/home/ubuntu/faceswap/lib/cli/launcher.py", line 155, in execute_script
    process.process()
  File "/home/ubuntu/faceswap/scripts/train.py", line 161, in process
    self._end_thread(thread, err)
  File "/home/ubuntu/faceswap/scripts/train.py", line 201, in _end_thread
    thread.join()
  File "/home/ubuntu/faceswap/lib/multithreading.py", line 121, in join
    raise thread.err[1].with_traceback(thread.err[2])
  File "/home/ubuntu/faceswap/lib/multithreading.py", line 37, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/faceswap/scripts/train.py", line 226, in _training
    raise err
  File "/home/ubuntu/faceswap/scripts/train.py", line 214, in _training
    model = self._load_model()
  File "/home/ubuntu/faceswap/scripts/train.py", line 255, in _load_model
    predict=False)
  File "/home/ubuntu/faceswap/plugins/train/model/villain.py", line 25, in __init__
    super().__init__(*args, **kwargs)
  File "/home/ubuntu/faceswap/plugins/train/model/original.py", line 25, in __init__
    super().__init__(*args, **kwargs)
  File "/home/ubuntu/faceswap/plugins/train/model/_base.py", line 125, in __init__
    self.build()
  File "/home/ubuntu/faceswap/plugins/train/model/_base.py", line 244, in build
    self.load_models(swapped=False)
  File "/home/ubuntu/faceswap/plugins/train/model/_base.py", line 456, in load_models
    is_loaded = network.load(fullpath=model_mapping[network.side][network.type])
  File "/home/ubuntu/faceswap/plugins/train/model/_base.py", line 834, in load
    network = load_model(self.filename, custom_objects=get_custom_objects())
  File "/home/ubuntu/anaconda3/envs/faceswap/lib/python3.7/site-packages/keras/engine/saving.py", line 419, in load_model
    model = _deserialize_model(f, custom_objects, compile)
  File "/home/ubuntu/anaconda3/envs/faceswap/lib/python3.7/site-packages/keras/engine/saving.py", line 287, in _deserialize_model
    K.batch_set_value(weight_value_tuples)
  File "/home/ubuntu/anaconda3/envs/faceswap/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py", line 2470, in batch_set_value
    get_session().run(assign_ops, feed_dict=feed_dict)
  File "/home/ubuntu/anaconda3/envs/faceswap/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py", line 186, in get_session
    _SESSION = tf.Session(config=config)
  File "/home/ubuntu/anaconda3/envs/faceswap/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1585, in __init__
    super(Session, self).__init__(target, graph, config=config)
  File "/home/ubuntu/anaconda3/envs/faceswap/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 699, in __init__
    self._session = tf_session.TF_NewSessionRef(self._graph._c_graph, opts)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Invalid device ordinal value (1). Valid range is [0, 0].
        while setting up XLA_GPU_JIT device number 1
05/21/2020 07:28:30 CRITICAL An unexpected crash has occurred. Crash report written to '/home/ubuntu/faceswap/crash_report.2020.05.21.072828229340.log'. You MUST provide this file if seeking assistance. Please verify you are running the latest version of faceswap before reporting
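
For reading the error above: the message names the device ordinal that was requested (1) and the range TensorFlow considered valid ([0, 0]), which suggests the session counted only one usable GPU when it was created. The sketch below is a minimal diagnostic under that reading, not a fix. The helper names `parse_ordinal_error` and `visible_gpu_ids` are hypothetical, and the idea that `CUDA_VISIBLE_DEVICES` is involved is an assumption the log does not confirm.

```python
import os
import re

# Pattern for the TensorFlow message seen in the traceback above.
_ORDINAL_RE = re.compile(
    r"Invalid device ordinal value \((\d+)\)\. Valid range is \[(\d+), (\d+)\]"
)


def parse_ordinal_error(message):
    """Return (requested, low, high) from the error text, or None if absent."""
    match = _ORDINAL_RE.search(message)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


def visible_gpu_ids(env=None):
    """Return the ids listed in CUDA_VISIBLE_DEVICES.

    None means the variable is unset, so CUDA does not restrict the devices.
    An empty list means the variable is set but hides every GPU.
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None
    return [item.strip() for item in raw.split(",") if item.strip()]


if __name__ == "__main__":
    error = "Invalid device ordinal value (1). Valid range is [0, 0]."
    requested, low, high = parse_ordinal_error(error)
    print(f"requested ordinal {requested}, valid ordinals {low}..{high}")
    print("CUDA_VISIBLE_DEVICES ids:", visible_gpu_ids())
```

Running this in the same shell and conda environment as the failing command shows whether the process environment already limits the visible devices. If the variable is unset, the restriction would have to come from somewhere else, such as how the session is configured.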

The crash log:

05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       _set_default_initializer  DEBUG    Using model specified initializer: <keras.initializers.RandomNormal object at 0x7f4148532150>
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       _switch_kernel_initializer DEBUG    Switched kernel_initializer from <keras.initializers.RandomNormal object at 0x7f4148532150> to <keras.initializers.VarianceScaling object at 0x7f40b8474110>
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       conv2d                    DEBUG    input_tensor: Tensor("residual_64_12_leakyrelu_1/LeakyRelu:0", shape=(?, 64, 64, 128), dtype=float32), filters: 128, kernel_size: 3, strides: (1, 1), padding: same, kwargs: {'kernel_initializer': <keras.initializers.VarianceScaling object at 0x7f40b8474110>})
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       _get_name                 DEBUG    Generating block name: conv2d_64_12
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       _set_default_initializer  DEBUG    Using model specified initializer: <keras.initializers.VarianceScaling object at 0x7f40b8474110>
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       _switch_kernel_initializer DEBUG    Switched kernel_initializer from <keras.initializers.VarianceScaling object at 0x7f40b8474110> to <keras.initializers.RandomNormal object at 0x7f4148532150>
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       res_block                 DEBUG    input_tensor: Tensor("residual_64_12_leakyrelu_3/LeakyRelu:0", shape=(?, 64, 64, 128), dtype=float32), filters: 128, kernel_size: 3, kwargs: {'kernel_initializer': <keras.initializers.RandomNormal object at 0x7f4148532150>})
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       _get_name                 DEBUG    Generating block name: residual_64_13
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       conv2d                    DEBUG    input_tensor: Tensor("residual_64_13_leakyrelu_0/LeakyRelu:0", shape=(?, 64, 64, 128), dtype=float32), filters: 128, kernel_size: 3, strides: (1, 1), padding: same, kwargs: {'name': 'residual_64_13_conv2d_0', 'kernel_initializer': <keras.initializers.RandomNormal object at 0x7f4148532150>})
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       _set_default_initializer  DEBUG    Using model specified initializer: <keras.initializers.RandomNormal object at 0x7f4148532150>
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       _switch_kernel_initializer DEBUG    Switched kernel_initializer from <keras.initializers.RandomNormal object at 0x7f4148532150> to <keras.initializers.VarianceScaling object at 0x7f40b848d050>
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       conv2d                    DEBUG    input_tensor: Tensor("residual_64_13_leakyrelu_1/LeakyRelu:0", shape=(?, 64, 64, 128), dtype=float32), filters: 128, kernel_size: 3, strides: (1, 1), padding: same, kwargs: {'kernel_initializer': <keras.initializers.VarianceScaling object at 0x7f40b848d050>})
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       _get_name                 DEBUG    Generating block name: conv2d_64_13
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       _set_default_initializer  DEBUG    Using model specified initializer: <keras.initializers.VarianceScaling object at 0x7f40b848d050>
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       _switch_kernel_initializer DEBUG    Switched kernel_initializer from <keras.initializers.VarianceScaling object at 0x7f40b848d050> to <keras.initializers.RandomNormal object at 0x7f4148532150>
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       res_block                 DEBUG    input_tensor: Tensor("residual_64_13_leakyrelu_3/LeakyRelu:0", shape=(?, 64, 64, 128), dtype=float32), filters: 128, kernel_size: 3, kwargs: {'kernel_initializer': <keras.initializers.RandomNormal object at 0x7f4148532150>})
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       _get_name                 DEBUG    Generating block name: residual_64_14
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       conv2d                    DEBUG    input_tensor: Tensor("residual_64_14_leakyrelu_0/LeakyRelu:0", shape=(?, 64, 64, 128), dtype=float32), filters: 128, kernel_size: 3, strides: (1, 1), padding: same, kwargs: {'name': 'residual_64_14_conv2d_0', 'kernel_initializer': <keras.initializers.RandomNormal object at 0x7f4148532150>})
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       _set_default_initializer  DEBUG    Using model specified initializer: <keras.initializers.RandomNormal object at 0x7f4148532150>
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       _switch_kernel_initializer DEBUG    Switched kernel_initializer from <keras.initializers.RandomNormal object at 0x7f4148532150> to <keras.initializers.VarianceScaling object at 0x7f40b84a8050>
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       conv2d                    DEBUG    input_tensor: Tensor("residual_64_14_leakyrelu_1/LeakyRelu:0", shape=(?, 64, 64, 128), dtype=float32), filters: 128, kernel_size: 3, strides: (1, 1), padding: same, kwargs: {'kernel_initializer': <keras.initializers.VarianceScaling object at 0x7f40b84a8050>})
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       _get_name                 DEBUG    Generating block name: conv2d_64_14
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       _set_default_initializer  DEBUG    Using model specified initializer: <keras.initializers.VarianceScaling object at 0x7f40b84a8050>
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       _switch_kernel_initializer DEBUG    Switched kernel_initializer from <keras.initializers.VarianceScaling object at 0x7f40b84a8050> to <keras.initializers.RandomNormal object at 0x7f4148532150>
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       res_block                 DEBUG    input_tensor: Tensor("residual_64_14_leakyrelu_3/LeakyRelu:0", shape=(?, 64, 64, 128), dtype=float32), filters: 128, kernel_size: 3, kwargs: {'kernel_initializer': <keras.initializers.RandomNormal object at 0x7f4148532150>})
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       _get_name                 DEBUG    Generating block name: residual_64_15
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       conv2d                    DEBUG    input_tensor: Tensor("residual_64_15_leakyrelu_0/LeakyRelu:0", shape=(?, 64, 64, 128), dtype=float32), filters: 128, kernel_size: 3, strides: (1, 1), padding: same, kwargs: {'name': 'residual_64_15_conv2d_0', 'kernel_initializer': <keras.initializers.RandomNormal object at 0x7f4148532150>})
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       _set_default_initializer  DEBUG    Using model specified initializer: <keras.initializers.RandomNormal object at 0x7f4148532150>
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       _switch_kernel_initializer DEBUG    Switched kernel_initializer from <keras.initializers.RandomNormal object at 0x7f4148532150> to <keras.initializers.VarianceScaling object at 0x7f40b84420d0>
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       conv2d                    DEBUG    input_tensor: Tensor("residual_64_15_leakyrelu_1/LeakyRelu:0", shape=(?, 64, 64, 128), dtype=float32), filters: 128, kernel_size: 3, strides: (1, 1), padding: same, kwargs: {'kernel_initializer': <keras.initializers.VarianceScaling object at 0x7f40b84420d0>})
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       _get_name                 DEBUG    Generating block name: conv2d_64_15
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       _set_default_initializer  DEBUG    Using model specified initializer: <keras.initializers.VarianceScaling object at 0x7f40b84420d0>
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       _switch_kernel_initializer DEBUG    Switched kernel_initializer from <keras.initializers.VarianceScaling object at 0x7f40b84420d0> to <keras.initializers.RandomNormal object at 0x7f4148532150>
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       res_block                 DEBUG    input_tensor: Tensor("residual_64_15_leakyrelu_3/LeakyRelu:0", shape=(?, 64, 64, 128), dtype=float32), filters: 128, kernel_size: 3, kwargs: {'kernel_initializer': <keras.initializers.RandomNormal object at 0x7f4148532150>})
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       _get_name                 DEBUG    Generating block name: residual_64_16
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       conv2d                    DEBUG    input_tensor: Tensor("residual_64_16_leakyrelu_0/LeakyRelu:0", shape=(?, 64, 64, 128), dtype=float32), filters: 128, kernel_size: 3, strides: (1, 1), padding: same, kwargs: {'name': 'residual_64_16_conv2d_0', 'kernel_initializer': <keras.initializers.RandomNormal object at 0x7f4148532150>})
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       _set_default_initializer  DEBUG    Using model specified initializer: <keras.initializers.RandomNormal object at 0x7f4148532150>
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       _switch_kernel_initializer DEBUG    Switched kernel_initializer from <keras.initializers.RandomNormal object at 0x7f4148532150> to <keras.initializers.VarianceScaling object at 0x7f40b845b0d0>
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       conv2d                    DEBUG    input_tensor: Tensor("residual_64_16_leakyrelu_1/LeakyRelu:0", shape=(?, 64, 64, 128), dtype=float32), filters: 128, kernel_size: 3, strides: (1, 1), padding: same, kwargs: {'kernel_initializer': <keras.initializers.VarianceScaling object at 0x7f40b845b0d0>})
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       _get_name                 DEBUG    Generating block name: conv2d_64_16
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       _set_default_initializer  DEBUG    Using model specified initializer: <keras.initializers.VarianceScaling object at 0x7f40b845b0d0>
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       _switch_kernel_initializer DEBUG    Switched kernel_initializer from <keras.initializers.VarianceScaling object at 0x7f40b845b0d0> to <keras.initializers.RandomNormal object at 0x7f4148532150>
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       res_block                 DEBUG    input_tensor: Tensor("residual_64_16_leakyrelu_3/LeakyRelu:0", shape=(?, 64, 64, 128), dtype=float32), filters: 128, kernel_size: 3, kwargs: {'kernel_initializer': <keras.initializers.RandomNormal object at 0x7f4148532150>})
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       _get_name                 DEBUG    Generating block name: residual_64_17
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       conv2d                    DEBUG    input_tensor: Tensor("residual_64_17_leakyrelu_0/LeakyRelu:0", shape=(?, 64, 64, 128), dtype=float32), filters: 128, kernel_size: 3, strides: (1, 1), padding: same, kwargs: {'name': 'residual_64_17_conv2d_0', 'kernel_initializer': <keras.initializers.RandomNormal object at 0x7f4148532150>})
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       _set_default_initializer  DEBUG    Using model specified initializer: <keras.initializers.RandomNormal object at 0x7f4148532150>
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       _switch_kernel_initializer DEBUG    Switched kernel_initializer from <keras.initializers.RandomNormal object at 0x7f4148532150> to <keras.initializers.VarianceScaling object at 0x7f40b83f8050>
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       conv2d                    DEBUG    input_tensor: Tensor("residual_64_17_leakyrelu_1/LeakyRelu:0", shape=(?, 64, 64, 128), dtype=float32), filters: 128, kernel_size: 3, strides: (1, 1), padding: same, kwargs: {'kernel_initializer': <keras.initializers.VarianceScaling object at 0x7f40b83f8050>})
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       _get_name                 DEBUG    Generating block name: conv2d_64_17
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       _set_default_initializer  DEBUG    Using model specified initializer: <keras.initializers.VarianceScaling object at 0x7f40b83f8050>
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       _switch_kernel_initializer DEBUG    Switched kernel_initializer from <keras.initializers.VarianceScaling object at 0x7f40b83f8050> to <keras.initializers.RandomNormal object at 0x7f4148532150>
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       conv                      DEBUG    input_tensor: Tensor("add_23/add:0", shape=(?, 64, 64, 128), dtype=float32), filters: 128, kernel_size: 5, strides: 2, use_instance_norm: False, kwargs: {'kernel_initializer': <keras.initializers.RandomNormal object at 0x7f4148532150>})
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       _get_name                 DEBUG    Generating block name: conv_64_0
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       conv2d                    DEBUG    input_tensor: Tensor("add_23/add:0", shape=(?, 64, 64, 128), dtype=float32), filters: 128, kernel_size: 5, strides: 2, padding: same, kwargs: {'name': 'conv_64_0_conv2d', 'kernel_initializer': <keras.initializers.RandomNormal object at 0x7f4148532150>})
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       _set_default_initializer  DEBUG    Using model specified initializer: <keras.initializers.RandomNormal object at 0x7f4148532150>
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       conv                      DEBUG    input_tensor: Tensor("pixel_shuffler_1/Reshape_1:0", shape=(?, 64, 64, 32), dtype=float32), filters: 128, kernel_size: 5, strides: 2, use_instance_norm: False, kwargs: {'kernel_initializer': <keras.initializers.RandomNormal object at 0x7f4148532150>})
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       _get_name                 DEBUG    Generating block name: conv_64_1
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       conv2d                    DEBUG    input_tensor: Tensor("pixel_shuffler_1/Reshape_1:0", shape=(?, 64, 64, 32), dtype=float32), filters: 128, kernel_size: 5, strides: 2, padding: same, kwargs: {'name': 'conv_64_1_conv2d', 'kernel_initializer': <keras.initializers.RandomNormal object at 0x7f4148532150>})
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       _set_default_initializer  DEBUG    Using model specified initializer: <keras.initializers.RandomNormal object at 0x7f4148532150>
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       conv                      DEBUG    input_tensor: Tensor("pixel_shuffler_2/Reshape_1:0", shape=(?, 64, 64, 32), dtype=float32), filters: 128, kernel_size: 5, strides: 2, use_instance_norm: False, kwargs: {'kernel_initializer': <keras.initializers.RandomNormal object at 0x7f4148532150>})
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       _get_name                 DEBUG    Generating block name: conv_64_2
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       conv2d                    DEBUG    input_tensor: Tensor("pixel_shuffler_2/Reshape_1:0", shape=(?, 64, 64, 32), dtype=float32), filters: 128, kernel_size: 5, strides: 2, padding: same, kwargs: {'name': 'conv_64_2_conv2d', 'kernel_initializer': <keras.initializers.RandomNormal object at 0x7f4148532150>})
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       _set_default_initializer  DEBUG    Using model specified initializer: <keras.initializers.RandomNormal object at 0x7f4148532150>
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       conv_sep                  DEBUG    input_tensor: Tensor("conv_64_2_leakyrelu/LeakyRelu:0", shape=(?, 32, 32, 128), dtype=float32), filters: 256, kernel_size: 5, strides: 2, kwargs: {'kernel_initializer': <keras.initializers.RandomNormal object at 0x7f4148532150>})
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       _get_name                 DEBUG    Generating block name: separableconv2d_32_0
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       _set_default_initializer  DEBUG    Using model specified initializer: <keras.initializers.RandomNormal object at 0x7f4148532150>
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       conv                      DEBUG    input_tensor: Tensor("separableconv2d_32_0_relu/Relu:0", shape=(?, 16, 16, 256), dtype=float32), filters: 512, kernel_size: 5, strides: 2, use_instance_norm: False, kwargs: {'kernel_initializer': <keras.initializers.RandomNormal object at 0x7f4148532150>})
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       _get_name                 DEBUG    Generating block name: conv_16_0
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       conv2d                    DEBUG    input_tensor: Tensor("separableconv2d_32_0_relu/Relu:0", shape=(?, 16, 16, 256), dtype=float32), filters: 512, kernel_size: 5, strides: 2, padding: same, kwargs: {'name': 'conv_16_0_conv2d', 'kernel_initializer': <keras.initializers.RandomNormal object at 0x7f4148532150>})
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       _set_default_initializer  DEBUG    Using model specified initializer: <keras.initializers.RandomNormal object at 0x7f4148532150>
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       conv_sep                  DEBUG    input_tensor: Tensor("conv_16_0_leakyrelu/LeakyRelu:0", shape=(?, 8, 8, 512), dtype=float32), filters: 1024, kernel_size: 5, strides: 2, kwargs: {'kernel_initializer': <keras.initializers.RandomNormal object at 0x7f4148532150>})
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       _get_name                 DEBUG    Generating block name: separableconv2d_8_0
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       _set_default_initializer  DEBUG    Using model specified initializer: <keras.initializers.RandomNormal object at 0x7f4148532150>
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       upscale                   DEBUG    input_tensor: Tensor("reshape_1/Reshape:0", shape=(?, 8, 8, 1024), dtype=float32), filters: 512, kernel_size: 3, use_instance_norm: False, kwargs: {'kernel_initializer': <keras.initializers.RandomNormal object at 0x7f4148532150>})
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       _get_name                 DEBUG    Generating block name: upscale_8_0
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       _set_default_initializer  DEBUG    Using model specified initializer: <keras.initializers.RandomNormal object at 0x7f4148532150>
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       conv2d                    DEBUG    input_tensor: Tensor("reshape_1/Reshape:0", shape=(?, 8, 8, 1024), dtype=float32), filters: 2048, kernel_size: 3, strides: (1, 1), padding: same, kwargs: {'name': 'upscale_8_0_conv2d', 'kernel_initializer': <keras.initializers.RandomNormal object at 0x7f4148532150>})
05/21/2020 07:28:25 MainProcess     _training_0     nn_blocks       _set_default_initializer  DEBUG    Using model specified initializer: <keras.initializers.RandomNormal object at 0x7f4148532150>
05/21/2020 07:28:25 MainProcess     _training_0     _base           add_network               DEBUG    network_type: 'encoder', side: 'None', network: '<keras.engine.training.Model object at 0x7f40b8388d90>', is_output: False
05/21/2020 07:28:25 MainProcess     _training_0     _base           name                      DEBUG    model name: 'villain'
05/21/2020 07:28:25 MainProcess     _training_0     _base           add_network               DEBUG    name: 'encoder', filename: 'villain_encoder.h5'
05/21/2020 07:28:25 MainProcess     _training_0     _base           __init__                  DEBUG    Initializing NNMeta: (filename: '/home/ubuntu/myfolder/faceswap-project/models/face1face2/villain_encoder.h5', network_type: 'encoder', side: 'None', network: <keras.engine.training.Model object at 0x7f40b8388d90>, is_output: False
05/21/2020 07:28:26 MainProcess     _training_0     _base           __init__                  DEBUG    Initialized NNMeta
05/21/2020 07:28:26 MainProcess     _training_0     original        add_networks              DEBUG    Added networks
05/21/2020 07:28:26 MainProcess     _training_0     _base           load_models               DEBUG    Load model: (swapped: False)
05/21/2020 07:28:26 MainProcess     _training_0     _base           models_exist              DEBUG    Pre-existing models exist: True
05/21/2020 07:28:26 MainProcess     _training_0     _base           models_exist              DEBUG    Pre-existing models exist: True
05/21/2020 07:28:26 MainProcess     _training_0     module_wrapper  _tfmw_add_deprecation_warning DEBUG    From /home/ubuntu/anaconda3/envs/faceswap/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:95: The name tf.reset_default_graph is deprecated. Please use tf.compat.v1.reset_default_graph instead.\n
05/21/2020 07:28:26 MainProcess     _training_0     module_wrapper  _tfmw_add_deprecation_warning DEBUG    From /home/ubuntu/anaconda3/envs/faceswap/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:98: The name tf.placeholder_with_default is deprecated. Please use tf.compat.v1.placeholder_with_default instead.\n
05/21/2020 07:28:26 MainProcess     _training_0     _base           map_models                DEBUG    Map models: (swapped: False)
05/21/2020 07:28:26 MainProcess     _training_0     _base           map_models                DEBUG    Mapped models: (models_map: {'a': {'decoder': '/home/ubuntu/myfolder/faceswap-project/models/face1face2/villain_decoder_A.h5'}, 'b': {'decoder': '/home/ubuntu/myfolder/faceswap-project/models/face1face2/villain_decoder_B.h5'}})
05/21/2020 07:28:26 MainProcess     _training_0     _base           load                      DEBUG    Loading model: '/home/ubuntu/myfolder/faceswap-project/models/face1face2/villain_decoder_A.h5'
05/21/2020 07:28:27 MainProcess     _training_0     multithreading  run                       DEBUG    Error in thread (_training_0): Invalid device ordinal value (1). Valid range is [0, 0].\n	while setting up XLA_GPU_JIT device number 1
05/21/2020 07:28:28 MainProcess     MainThread      train           _monitor                  DEBUG    Thread error detected
05/21/2020 07:28:28 MainProcess     MainThread      train           _monitor                  DEBUG    Closed Monitor
05/21/2020 07:28:28 MainProcess     MainThread      train           _end_thread               DEBUG    Ending Training thread
05/21/2020 07:28:28 MainProcess     MainThread      train           _end_thread               CRITICAL Error caught! Exiting...
05/21/2020 07:28:28 MainProcess     MainThread      multithreading  join                      DEBUG    Joining Threads: '_training'
05/21/2020 07:28:28 MainProcess     MainThread      multithreading  join                      DEBUG    Joining Thread: '_training_0'
05/21/2020 07:28:28 MainProcess     MainThread      multithreading  join                      ERROR    Caught exception in thread: '_training_0'
Traceback (most recent call last):
  File "/home/ubuntu/faceswap/lib/cli/launcher.py", line 155, in execute_script
    process.process()
  File "/home/ubuntu/faceswap/scripts/train.py", line 161, in process
    self._end_thread(thread, err)
  File "/home/ubuntu/faceswap/scripts/train.py", line 201, in _end_thread
    thread.join()
  File "/home/ubuntu/faceswap/lib/multithreading.py", line 121, in join
    raise thread.err[1].with_traceback(thread.err[2])
  File "/home/ubuntu/faceswap/lib/multithreading.py", line 37, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/faceswap/scripts/train.py", line 226, in _training
    raise err
  File "/home/ubuntu/faceswap/scripts/train.py", line 214, in _training
    model = self._load_model()
  File "/home/ubuntu/faceswap/scripts/train.py", line 255, in _load_model
    predict=False)
  File "/home/ubuntu/faceswap/plugins/train/model/villain.py", line 25, in __init__
    super().__init__(*args, **kwargs)
  File "/home/ubuntu/faceswap/plugins/train/model/original.py", line 25, in __init__
    super().__init__(*args, **kwargs)
  File "/home/ubuntu/faceswap/plugins/train/model/_base.py", line 125, in __init__
    self.build()
  File "/home/ubuntu/faceswap/plugins/train/model/_base.py", line 244, in build
    self.load_models(swapped=False)
  File "/home/ubuntu/faceswap/plugins/train/model/_base.py", line 456, in load_models
    is_loaded = network.load(fullpath=model_mapping[network.side][network.type])
  File "/home/ubuntu/faceswap/plugins/train/model/_base.py", line 834, in load
    network = load_model(self.filename, custom_objects=get_custom_objects())
  File "/home/ubuntu/anaconda3/envs/faceswap/lib/python3.7/site-packages/keras/engine/saving.py", line 419, in load_model
    model = _deserialize_model(f, custom_objects, compile)
  File "/home/ubuntu/anaconda3/envs/faceswap/lib/python3.7/site-packages/keras/engine/saving.py", line 287, in _deserialize_model
    K.batch_set_value(weight_value_tuples)
  File "/home/ubuntu/anaconda3/envs/faceswap/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py", line 2470, in batch_set_value
    get_session().run(assign_ops, feed_dict=feed_dict)
  File "/home/ubuntu/anaconda3/envs/faceswap/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py", line 186, in get_session
    _SESSION = tf.Session(config=config)
  File "/home/ubuntu/anaconda3/envs/faceswap/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1585, in __init__
    super(Session, self).__init__(target, graph, config=config)
  File "/home/ubuntu/anaconda3/envs/faceswap/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 699, in __init__
    self._session = tf_session.TF_NewSessionRef(self._graph._c_graph, opts)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Invalid device ordinal value (1). Valid range is [0, 0].
	while setting up XLA_GPU_JIT device number 1

============ System Information ============
encoding:            UTF-8
git_branch:          master
git_commits:         ac40b0f Remove subpixel upscaling option (#1024)
gpu_cuda:            10.0
gpu_cudnn:           7.6.5
gpu_devices:         GPU_0: Tesla K80, GPU_1: Tesla K80, GPU_2: Tesla K80, GPU_3: Tesla K80, GPU_4: Tesla K80, GPU_5: Tesla K80, GPU_6: Tesla K80, GPU_7: Tesla K80
gpu_devices_active:  GPU_0, GPU_1, GPU_2, GPU_3, GPU_4, GPU_5, GPU_6, GPU_7
gpu_driver:          440.33.01
gpu_vram:            GPU_0: 11441MB, GPU_1: 11441MB, GPU_2: 11441MB, GPU_3: 11441MB, GPU_4: 11441MB, GPU_5: 11441MB, GPU_6: 11441MB, GPU_7: 11441MB
os_machine:          x86_64
os_platform:         Linux-5.3.0-1017-aws-x86_64-with-debian-buster-sid
os_release:          5.3.0-1017-aws
py_command:          /home/ubuntu/faceswap/faceswap.py train -A /home/ubuntu/myfolder/faceswap-project/face1/output -ala /home/ubuntu/myfolder/faceswap-project/face1/face1.fsa -B /home/ubuntu/myfolder/faceswap-project/face2/output -alb /home/ubuntu/myfolder/faceswap-project/face2/face2.fsa -m /home/ubuntu/myfolder/faceswap-project/models/face1face2 -t villain -bs 100 -it 1000000 -g 1 -s 50 -ss 25000 -ps 50 -ag -wl -L INFO -w
py_conda_version:    conda 4.8.3
py_implementation:   CPython
py_version:          3.7.7
py_virtual_env:      True
sys_cores:           32
sys_processor:       x86_64
sys_ram:             Total: 491594MB, Available: 485096MB, Used: 1709MB, Free: 481857MB

=============== Pip Packages ===============
absl-py==0.9.0
astor==0.8.0
certifi==2020.4.5.1
cloudpickle==1.4.1
cycler==0.10.0
cytoolz==0.10.1
dask==2.16.0
decorator==4.4.2
fastcluster==1.1.26
ffmpy==0.2.2
gast==0.2.2
google-pasta==0.2.0
grpcio==1.27.2
h5py==2.9.0
imageio==2.6.1
imageio-ffmpeg==0.4.2
joblib==0.14.1
Keras==2.2.4
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.0
kiwisolver==1.2.0
Markdown==3.1.1
matplotlib==3.1.3
mkl-fft==1.0.15
mkl-random==1.1.0
mkl-service==2.3.0
networkx==2.4
numpy==1.17.4
nvidia-ml-py3==7.352.1
olefile==0.46
opencv-python==4.1.2.30
opt-einsum==3.1.0
pathlib==1.0.1
Pillow==6.2.1
protobuf==3.11.4
psutil==5.7.0
pyparsing==2.4.7
python-dateutil==2.8.1
pytz==2020.1
PyWavelets==1.1.1
PyYAML==5.3.1
scikit-image==0.16.2
scikit-learn==0.22.1
scipy==1.4.1
six==1.14.0
tensorboard==1.15.0
tensorflow==1.15.0
tensorflow-estimator==1.15.1
termcolor==1.1.0
toolz==0.10.0
toposort==1.5
tornado==6.0.4
tqdm==4.46.0
webencodings==0.5.1
Werkzeug==0.16.1
wrapt==1.12.1

============== Conda Packages ==============
# packages in environment at /home/ubuntu/anaconda3/envs/faceswap:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
_tflow_select 2.1.0 gpu
absl-py 0.9.0 py37_0
astor 0.8.0 py37_0
blas 1.0 mkl
bzip2 1.0.8 h516909a_2 conda-forge
c-ares 1.15.0 h7b6447c_1001
ca-certificates 2020.1.1 0
certifi 2020.4.5.1 py37_0
cloudpickle 1.4.1 py_0
cudatoolkit 10.0.130 0
cudnn 7.6.5 cuda10.0_0
cupti 10.0.130 0
cycler 0.10.0 py37_0
cytoolz 0.10.1 py37h7b6447c_0
dask-core 2.16.0 py_0
dbus 1.13.14 hb2f20db_0
decorator 4.4.2 py_0
expat 2.2.6 he6710b0_0
fastcluster 1.1.26 py37hb3f55d8_0 conda-forge
ffmpeg 4.2 h167e202_0 conda-forge
ffmpy 0.2.2 pypi_0 pypi
fontconfig 2.13.0 h9420a91_0
freetype 2.9.1 h8a8886c_1
gast 0.2.2 py37_0
git 2.23.0 pl526hacde149_0
glib 2.63.1 h3eb4bd4_1
gmp 6.2.0 he1b5a44_2 conda-forge
gnutls 3.6.5 hd3a4fd2_1002 conda-forge
google-pasta 0.2.0 py_0
grpcio 1.27.2 py37hf8bcb03_0
gst-plugins-base 1.14.0 hbbd80ab_1
gstreamer 1.14.0 hb31296c_0
h5py 2.9.0 py37h7918eee_0
hdf5 1.10.4 hb1b8bf9_0
icu 58.2 he6710b0_3
imageio 2.6.1 py37_0
imageio-ffmpeg 0.4.2 py_0 conda-forge
intel-openmp 2020.1 217
joblib 0.14.1 py_0
jpeg 9b h024ee3a_2
keras 2.2.4 0
keras-applications 1.0.8 py_0
keras-base 2.2.4 py37_0
keras-preprocessing 1.1.0 py_1
kiwisolver 1.2.0 py37hfd86e86_0
krb5 1.17.1 h173b8e3_0
lame 3.100 h14c3975_1001 conda-forge
ld_impl_linux-64 2.33.1 h53a641e_7
libcurl 7.69.1 h20c2e04_0
libedit 3.1.20181209 hc058e9b_0
libffi 3.3 he6710b0_1
libgcc-ng 9.1.0 hdf63c60_0
libgfortran-ng 7.3.0 hdf63c60_0
libiconv 1.15 h516909a_1006 conda-forge
libpng 1.6.37 hbc83047_0
libprotobuf 3.11.4 hd408876_0
libssh2 1.9.0 h1ba5d50_1
libstdcxx-ng 9.1.0 hdf63c60_0
libtiff 4.1.0 h2733197_0
libuuid 1.0.3 h1bed415_2
libxcb 1.13 h1bed415_1
libxml2 2.9.9 hea5a465_1
markdown 3.1.1 py37_0
matplotlib 3.1.1 py37h5429711_0
matplotlib-base 3.1.3 py37hef1b27d_0
mkl 2020.1 217
mkl-service 2.3.0 py37he904b0f_0
mkl_fft 1.0.15 py37ha843d7b_0
mkl_random 1.1.0 py37hd6b4f25_0
ncurses 6.2 he6710b0_1
nettle 3.4.1 h1bed415_1002 conda-forge
networkx 2.4 py_0
numpy 1.17.4 py37hc1035e2_0
numpy-base 1.17.4 py37hde5b4d6_0
nvidia-ml-py3 7.352.1 pypi_0 pypi
olefile 0.46 py37_0
opencv-python 4.1.2.30 pypi_0 pypi
openh264 1.8.0 hdbcaa40_1000 conda-forge
openssl 1.1.1g h7b6447c_0
opt_einsum 3.1.0 py_0
pathlib 1.0.1 py37_1
pcre 8.43 he6710b0_0
perl 5.26.2 h14c3975_0
pillow 6.2.1 py37h34e0f95_0
pip 20.0.2 py37_3
protobuf 3.11.4 py37he6710b0_0
psutil 5.7.0 py37h7b6447c_0
pyparsing 2.4.7 py_0
pyqt 5.9.2 py37h05f1152_2
python 3.7.7 hcff3b4d_5
python-dateutil 2.8.1 py_0
python_abi 3.7 1_cp37m conda-forge
pytz 2020.1 py_0
pywavelets 1.1.1 py37h7b6447c_0
pyyaml 5.3.1 py37h7b6447c_0
qt 5.9.7 h5867ecd_1
readline 8.0 h7b6447c_0
scikit-image 0.16.2 py37h0573a6f_0
scikit-learn 0.22.1 py37hd81dba3_0
scipy 1.4.1 py37h0b6359f_0
setuptools 46.4.0 py37_0
sip 4.19.8 py37hf484d3e_0
six 1.14.0 py37_0
sqlite 3.31.1 h62c20be_1
tensorboard 1.15.0 pyhb230dea_0
tensorflow 1.15.0 gpu_py37h0f0df58_0
tensorflow-base 1.15.0 gpu_py37h9dcbed7_0
tensorflow-estimator 1.15.1 pyh2649769_0
tensorflow-gpu 1.15.0 h0d30ee6_0
termcolor 1.1.0 py37_1
tk 8.6.8 hbc83047_0
toolz 0.10.0 py_0
toposort 1.5 py_3 conda-forge
tornado 6.0.4 py37h7b6447c_1
tqdm 4.46.0 py_0
webencodings 0.5.1 py37_1
werkzeug 0.16.1 py_0
wheel 0.34.2 py37_0
wrapt 1.12.1 py37h7b6447c_1
x264 1!152.20180806 h14c3975_0 conda-forge
xz 5.2.5 h7b6447c_0
yaml 0.1.7 had09818_2
zlib 1.2.11 h7b6447c_3
zstd 1.3.7 h0b5b093_0

=============== State File =================
{
  "name": "villain",
  "sessions": {
    "1": {
      "timestamp": 1589747179.0678897,
      "no_logs": false,
      "pingpong": false,
      "loss_names": {
        "a": [ "face_loss" ],
        "b": [ "face_loss" ]
      },
      "batchsize": 32,
      "iterations": 617,
      "config": {
        "learning_rate": 5e-05
      }
    },
    "2": {
      "timestamp": 1589752564.8719282,
      "no_logs": false,
      "pingpong": false,
      "loss_names": {
        "a": [ "face_loss" ],
        "b": [ "face_loss" ]
      },
      "batchsize": 32,
      "iterations": 15701,
      "config": {
        "learning_rate": 5e-05
      }
    },
    "3": {
      "timestamp": 1589915467.228661,
      "no_logs": false,
      "pingpong": false,
      "loss_names": {
        "a": [ "face_loss" ],
        "b": [ "face_loss" ]
      },
      "batchsize": 32,
      "iterations": 8451,
      "config": {
        "learning_rate": 5e-05
      }
    }
  },
  "lowest_avg_loss": {
    "a": 0.011524430494755506,
    "b": 0.013505328968167305
  },
  "iterations": 24769,
  "inputs": {
    "face_in:0": [ 128, 128, 3 ],
    "mask_in:0": [ 128, 128, 1 ]
  },
  "training_size": 256,
  "config": {
    "coverage": 100.0,
    "mask_type": "vgg-clear",
    "mask_blur_kernel": 3,
    "mask_threshold": 4,
    "learn_mask": false,
    "icnr_init": false,
    "conv_aware_init": false,
    "reflect_padding": false,
    "penalized_mask_loss": true,
    "loss_function": "mae",
    "learning_rate": 5e-05,
    "lowmem": false
  }
}

================= Configs ==================
--------- convert.ini ---------

[mask.mask_blend]
type: normalized
kernel_size: 3
passes: 4
threshold: 4
erosion: 0.0

[mask.box_blend]
type: gaussian
distance: 11.0
radius: 5.0
passes: 1

[color.color_transfer]
clip: True
preserve_paper: True

[color.manual_balance]
colorspace: HSV
balance_1: 0.0
balance_2: 0.0
balance_3: 0.0
contrast: 0.0
brightness: 0.0

[color.match_hist]
threshold: 99.0

[scaling.sharpen]
method: unsharp_mask
amount: 150
radius: 0.3
threshold: 5.0

[writer.ffmpeg]
container: mp4
codec: libx264
crf: 23
preset: medium
tune: none
profile: auto
level: auto

[writer.gif]
fps: 25
loop: 0
palettesize: 256
subrectangles: False

[writer.opencv]
format: png
draw_transparent: False
jpg_quality: 75
png_compress_level: 3

[writer.pillow]
format: png
draw_transparent: False
optimize: False
gif_interlace: True
jpg_quality: 75
png_compress_level: 3
tif_compression: tiff_deflate

--------- .faceswap ---------
backend: nvidia

--------- extract.ini ---------

[global]
allow_growth: False

[mask.vgg_obstructed]
batch-size: 2

[mask.vgg_clear]
batch-size: 6

[mask.unet_dfl]
batch-size: 8

[align.fan]
batch-size: 12

[detect.mtcnn]
minsize: 20
threshold_1: 0.6
threshold_2: 0.7
threshold_3: 0.7
scalefactor: 0.709
batch-size: 8

[detect.s3fd]
confidence: 70
batch-size: 4

[detect.cv2_dnn]
confidence: 50

--------- train.ini ---------

[global]
coverage: 100
mask_type: vgg-clear
mask_blur_kernel: 3
mask_threshold: 4
learn_mask: True
icnr_init: False
conv_aware_init: False
reflect_padding: False
penalized_mask_loss: True
loss_function: mae
learning_rate: 5e-05

[trainer.original]
preview_images: 14
zoom_amount: 5
rotation_range: 10
shift_range: 5
flip_chance: 50
color_lightness: 30
color_ab: 8
color_clahe_chance: 50
color_clahe_max_size: 4

[model.dfl_sae]
input_size: 128
clipnorm: True
architecture: df
autoencoder_dims: 0
encoder_dims: 42
decoder_dims: 21
multiscale_decoder: False

[model.dfl_h128]
lowmem: False

[model.realface]
input_size: 64
output_size: 128
dense_nodes: 1536
complexity_encoder: 128
complexity_decoder: 512

[model.villain]
lowmem: False

[model.original]
lowmem: False

[model.unbalanced]
input_size: 128
lowmem: False
clipnorm: True
nodes: 1024
complexity_encoder: 128
complexity_decoder_a: 384
complexity_decoder_b: 512

[model.dlight]
features: best
details: good
output_size: 256

Re: Invalid device ordinal value (1). Valid range is [0, 0]

Posted: Thu May 21, 2020 9:49 am
by torzdf

Without knowing the ins and outs of how AWS builds its VM images, I can't diagnose this.

However, this is a TensorFlow issue, so searching for the error message will hopefully turn up a solution. You can start here:
https://github.com/tensorflow/tensorflow/issues/32793
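
For anyone reading the traceback above, here is a minimal sketch of what the message itself encodes. It only parses the error text quoted in this thread; it does not query TensorFlow or the GPUs, and the helper names (`parse_ordinal_error`, `describe`) are made up for illustration. The interpretation is an assumption based purely on the wording: the component that raised the error was asked for device index 1 while accepting only the ordinals in the stated range.

```python
import re

# Pattern for the message text quoted in the traceback in this thread.
_ORDINAL_RE = re.compile(
    r"Invalid device ordinal value \((\d+)\)\. Valid range is \[(\d+), (\d+)\]"
)


def parse_ordinal_error(message):
    """Return (requested, lowest, highest) ordinals, or None if no match."""
    match = _ORDINAL_RE.search(message)
    if match is None:
        return None
    requested, lowest, highest = (int(group) for group in match.groups())
    return requested, lowest, highest


def describe(message):
    """Summarise the error in plain words (hypothetical helper)."""
    parsed = parse_ordinal_error(message)
    if parsed is None:
        return "not a device-ordinal error"
    requested, lowest, highest = parsed
    accepted = highest - lowest + 1
    return (
        f"requested device {requested}, but only {accepted} device(s) "
        f"accepted (ordinals {lowest}..{highest})"
    )


if __name__ == "__main__":
    print(describe("Invalid device ordinal value (1). Valid range is [0, 0]."))
```

Read this way, the report says eight K80s are listed under gpu_devices, yet the failing call accepted only ordinal 0, so the mismatch is between what the system exposes and what that part of TensorFlow was willing to use.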