Search found 2 matches

by kimjamess
Mon Nov 21, 2022 5:05 pm
Forum: Training Discussion
Topic: Training freezes after 1100 iterations
Replies: 2
Views: 971

Re: Training freezes after 1100 iterations

Thank you for the reply. I'm now trying your suggestions, but I don't actually understand the "storage access problem" you mentioned. Do you think the storage it is using is not enough? I tried "bigger storage" by adjusting the parameter (volume_size) and a higher save interval (5000), but nei...

by kimjamess
Sun Nov 20, 2022 3:13 pm
Forum: Training Discussion
Topic: Training freezes after 1100 iterations
Replies: 2
Views: 971

Training freezes after 1100 iterations

Hi, I am trying to train a Phaze-A model for 30k iterations, but it stops (freezes) after around 1100 iterations. After that point, training halts (no updates in "phaze_a_state.json") with no messages, no logs, and no crash. I also checked CPU/GPU memory utilization during training, a...