Hello, I tried the whole pipeline on an Apple M1 using the CLI, and it worked very well.
I then moved everything to an AWS Linux GPU instance and encountered this error:
The error occurs during training, at Iter 20 every time.
Please let me know if there is a place to upload the crash_log. Thanks.