... while I am building my new pee-see...
1) What are the consequences of using a small batch size (i.e. less than 6) compared to a large one (e.g. 64)? Does this affect convergence or training time? I have understood that working with small batches usually leads to better quality, but I suspect that there is more to it than that...
2) Is it possible that using "more detailed trainers" (for example: Dlight with best features and good quality) can lead to a "standoff" in learning (i.e. the score does not go below 0.030), while a simple DFL-H128 reaches 0.016 without issues on the same training set? And yes, I have tried swapping A and B to see if this was due to the unbalanced decoders.
3) Is the "face loss" a "universal" score? (Is a score of 0.010 for Villain as good as 0.010 for Realface?)