device=cuda:0, depth=4, d_hidden=256, epochs=100, seed=123
eval buffer: torch.Size([1024, 3072])
=== DFA training ===
[DFA] Ep 0: ||h||_med=[8.690464973449707, 42.59831619262695, 44.3453369140625, 42.22452926635742, 42.9206428527832] ||g||_med=[0.001107031712308526, 0.00020295626018196344, 0.00014012052270118147, 0.00011155186803080142, 9.737444634083658e-05] acc=0.1025
[DFA] Ep 1: ||h_L||=4.633e+03 ||g_2||=6.313e-07 acc=0.0889 gamma_dfa=0.0291
[DFA] Ep 2: ||h_L||=2.086e+04 ||g_2||=1.040e-07 acc=0.0850 gamma_dfa=0.0262
[DFA] Ep 3: ||h_L||=4.860e+04 ||g_2||=3.898e-08 acc=0.1299 gamma_dfa=0.0251
[DFA] Ep 4: ||h_L||=9.117e+04 ||g_2||=2.015e-08 acc=0.1250 gamma_dfa=0.0267
[DFA] Ep 5: ||h_L||=1.457e+05 ||g_2||=1.234e-08 acc=0.1152 gamma_dfa=0.0230
[DFA] Ep 6: ||h_L||=2.115e+05 ||g_2||=8.721e-09 acc=0.1094 gamma_dfa=0.0167
[DFA] Ep 7: ||h_L||=2.991e+05 ||g_2||=6.664e-09 acc=0.1104 gamma_dfa=0.0098
[DFA] Ep 8: ||h_L||=4.055e+05 ||g_2||=5.241e-09 acc=0.1084 gamma_dfa=0.0026
[DFA] Ep 9: ||h_L||=5.148e+05 ||g_2||=4.316e-09 acc=0.0879 gamma_dfa=-0.0046
[DFA] Ep 10: ||h_L||=6.453e+05 ||g_2||=3.792e-09 acc=0.0771 gamma_dfa=-0.0082
[DFA] Ep 11: ||h_L||=8.114e+05 ||g_2||=3.485e-09 acc=0.1074 gamma_dfa=-0.0135
[DFA] Ep 12: ||h_L||=9.868e+05 ||g_2||=3.308e-09 acc=0.1006 gamma_dfa=-0.0171
[DFA] Ep 13: ||h_L||=1.184e+06 ||g_2||=3.072e-09 acc=0.0889 gamma_dfa=-0.0225
[DFA] Ep 14: ||h_L||=1.390e+06 ||g_2||=3.024e-09 acc=0.0830 gamma_dfa=-0.0248
[DFA] Ep 15: ||h_L||=1.619e+06 ||g_2||=2.885e-09 acc=0.0977 gamma_dfa=-0.0281
[DFA] Ep 16: ||h_L||=1.861e+06 ||g_2||=2.708e-09 acc=0.1055 gamma_dfa=-0.0306
[DFA] Ep 17: ||h_L||=2.124e+06 ||g_2||=2.520e-09 acc=0.1064 gamma_dfa=-0.0322
[DFA] Ep 18: ||h_L||=2.399e+06 ||g_2||=2.367e-09 acc=0.1064 gamma_dfa=-0.0333
[DFA] Ep 19: ||h_L||=2.733e+06 ||g_2||=2.217e-09 acc=0.1064 gamma_dfa=-0.0335
[DFA] Ep 20: ||h_L||=3.105e+06 ||g_2||=2.069e-09 acc=0.1045 gamma_dfa=-0.0344
[DFA] Ep 21: ||h_L||=3.456e+06 ||g_2||=1.932e-09 acc=0.1045 gamma_dfa=-0.0352
[DFA] Ep 22: ||h_L||=3.835e+06 ||g_2||=1.813e-09 acc=0.1045 gamma_dfa=-0.0362
[DFA] Ep 23: ||h_L||=4.269e+06 ||g_2||=1.670e-09 acc=0.1045 gamma_dfa=-0.0372
[DFA] Ep 24: ||h_L||=4.714e+06 ||g_2||=1.561e-09 acc=0.1045 gamma_dfa=-0.0378
[DFA] Ep 25: ||h_L||=5.140e+06 ||g_2||=1.458e-09 acc=0.1045 gamma_dfa=-0.0386
[DFA] Ep 26: ||h_L||=5.621e+06 ||g_2||=1.359e-09 acc=0.1045 gamma_dfa=-0.0396
[DFA] Ep 27: ||h_L||=6.045e+06 ||g_2||=1.279e-09 acc=0.1045 gamma_dfa=-0.0402
[DFA] Ep 28: ||h_L||=6.541e+06 ||g_2||=1.201e-09 acc=0.1045 gamma_dfa=-0.0409
[DFA] Ep 29: ||h_L||=6.999e+06 ||g_2||=1.132e-09 acc=0.1045 gamma_dfa=-0.0414
[DFA] Ep 30: ||h_L||=7.506e+06 ||g_2||=1.061e-09 acc=0.1045 gamma_dfa=-0.0423
[DFA] Ep 31: ||h_L||=7.924e+06 ||g_2||=1.003e-09 acc=0.1045 gamma_dfa=-0.0428
[DFA] Ep 32: ||h_L||=8.527e+06 ||g_2||=9.494e-10 acc=0.1045 gamma_dfa=-0.0438
[DFA] Ep 33: ||h_L||=9.051e+06 ||g_2||=8.977e-10 acc=0.1045 gamma_dfa=-0.0442
[DFA] Ep 34: ||h_L||=9.560e+06 ||g_2||=8.499e-10 acc=0.1064 gamma_dfa=-0.0447
[DFA] Ep 35: ||h_L||=1.015e+07 ||g_2||=8.131e-10 acc=0.1064 gamma_dfa=-0.0450
[DFA] Ep 36: ||h_L||=1.072e+07 ||g_2||=7.740e-10 acc=0.1035 gamma_dfa=-0.0455
[DFA] Ep 37: ||h_L||=1.138e+07 ||g_2||=7.381e-10 acc=0.1055 gamma_dfa=-0.0458
[DFA] Ep 38: ||h_L||=1.195e+07 ||g_2||=7.013e-10 acc=0.1055 gamma_dfa=-0.0464
[DFA] Ep 39: ||h_L||=1.254e+07 ||g_2||=6.774e-10 acc=0.1055 gamma_dfa=-0.0464
[DFA] Ep 40: ||h_L||=1.310e+07 ||g_2||=6.487e-10 acc=0.1055 gamma_dfa=-0.0468
[DFA] Ep 41: ||h_L||=1.374e+07 ||g_2||=6.225e-10 acc=0.1055 gamma_dfa=-0.0472
[DFA] Ep 42: ||h_L||=1.425e+07 ||g_2||=5.996e-10 acc=0.1045 gamma_dfa=-0.0476
[DFA] Ep 43: ||h_L||=1.481e+07 ||g_2||=5.769e-10 acc=0.1064 gamma_dfa=-0.0479
[DFA] Ep 44: ||h_L||=1.541e+07 ||g_2||=5.581e-10 acc=0.1055 gamma_dfa=-0.0478
[DFA] Ep 45: ||h_L||=1.599e+07 ||g_2||=5.421e-10 acc=0.0996 gamma_dfa=-0.0483
[DFA] Ep 46: ||h_L||=1.666e+07 ||g_2||=5.243e-10 acc=0.0986 gamma_dfa=-0.0485
[DFA] Ep 47: ||h_L||=1.729e+07 ||g_2||=5.089e-10 acc=0.1016 gamma_dfa=-0.0487
[DFA] Ep 48: ||h_L||=1.781e+07 ||g_2||=4.924e-10 acc=0.1025 gamma_dfa=-0.0493
[DFA] Ep 49: ||h_L||=1.843e+07 ||g_2||=4.781e-10 acc=0.0986 gamma_dfa=-0.0491
[DFA] Ep 50: ||h_L||=1.904e+07 ||g_2||=4.645e-10 acc=0.0967 gamma_dfa=-0.0492
[DFA] Ep 51: ||h_L||=1.963e+07 ||g_2||=4.525e-10 acc=0.0986 gamma_dfa=-0.0493
[DFA] Ep 52: ||h_L||=2.018e+07 ||g_2||=4.412e-10 acc=0.1016 gamma_dfa=-0.0493
[DFA] Ep 53: ||h_L||=2.075e+07 ||g_2||=4.318e-10 acc=0.1016 gamma_dfa=-0.0495
[DFA] Ep 54: ||h_L||=2.129e+07 ||g_2||=4.220e-10 acc=0.1025 gamma_dfa=-0.0497
[DFA] Ep 55: ||h_L||=2.177e+07 ||g_2||=4.106e-10 acc=0.0986 gamma_dfa=-0.0498
[DFA] Ep 56: ||h_L||=2.230e+07 ||g_2||=4.025e-10 acc=0.0977 gamma_dfa=-0.0499
[DFA] Ep 57: ||h_L||=2.282e+07 ||g_2||=3.926e-10 acc=0.0986 gamma_dfa=-0.0499
[DFA] Ep 58: ||h_L||=2.332e+07 ||g_2||=3.848e-10 acc=0.1016 gamma_dfa=-0.0502
[DFA] Ep 59: ||h_L||=2.377e+07 ||g_2||=3.788e-10 acc=0.1016 gamma_dfa=-0.0503
[DFA] Ep 60: ||h_L||=2.424e+07 ||g_2||=3.706e-10 acc=0.0957 gamma_dfa=-0.0505
[DFA] Ep 61: ||h_L||=2.478e+07 ||g_2||=3.645e-10 acc=0.0986 gamma_dfa=-0.0505
[DFA] Ep 62: ||h_L||=2.525e+07 ||g_2||=3.598e-10 acc=0.0918 gamma_dfa=-0.0508
[DFA] Ep 63: ||h_L||=2.563e+07 ||g_2||=3.525e-10 acc=0.0957 gamma_dfa=-0.0507
[DFA] Ep 64: ||h_L||=2.602e+07 ||g_2||=3.496e-10 acc=0.0908 gamma_dfa=-0.0509
[DFA] Ep 65: ||h_L||=2.647e+07 ||g_2||=3.439e-10 acc=0.0889 gamma_dfa=-0.0510
[DFA] Ep 66: ||h_L||=2.684e+07 ||g_2||=3.379e-10 acc=0.0918 gamma_dfa=-0.0510
[DFA] Ep 67: ||h_L||=2.722e+07 ||g_2||=3.349e-10 acc=0.0889 gamma_dfa=-0.0510
[DFA] Ep 68: ||h_L||=2.761e+07 ||g_2||=3.320e-10 acc=0.0879 gamma_dfa=-0.0509
[DFA] Ep 69: ||h_L||=2.793e+07 ||g_2||=3.263e-10 acc=0.0918 gamma_dfa=-0.0509
[DFA] Ep 70: ||h_L||=2.820e+07 ||g_2||=3.249e-10 acc=0.0889 gamma_dfa=-0.0508
[DFA] Ep 71: ||h_L||=2.855e+07 ||g_2||=3.204e-10 acc=0.0850 gamma_dfa=-0.0510
[DFA] Ep 72: ||h_L||=2.879e+07 ||g_2||=3.177e-10 acc=0.0879 gamma_dfa=-0.0509
[DFA] Ep 73: ||h_L||=2.909e+07 ||g_2||=3.147e-10 acc=0.0859 gamma_dfa=-0.0510
[DFA] Ep 74: ||h_L||=2.932e+07 ||g_2||=3.129e-10 acc=0.0850 gamma_dfa=-0.0510
[DFA] Ep 75: ||h_L||=2.950e+07 ||g_2||=3.108e-10 acc=0.0840 gamma_dfa=-0.0510
[DFA] Ep 76: ||h_L||=2.974e+07 ||g_2||=3.089e-10 acc=0.0879 gamma_dfa=-0.0510
[DFA] Ep 77: ||h_L||=2.992e+07 ||g_2||=3.063e-10 acc=0.0889 gamma_dfa=-0.0509
[DFA] Ep 78: ||h_L||=3.012e+07 ||g_2||=3.051e-10 acc=0.0889 gamma_dfa=-0.0510
[DFA] Ep 79: ||h_L||=3.032e+07 ||g_2||=3.036e-10 acc=0.0879 gamma_dfa=-0.0511
[DFA] Ep 80: ||h_L||=3.045e+07 ||g_2||=3.023e-10 acc=0.0889 gamma_dfa=-0.0511
[DFA] Ep 81: ||h_L||=3.059e+07 ||g_2||=3.010e-10 acc=0.0879 gamma_dfa=-0.0511
[DFA] Ep 82: ||h_L||=3.073e+07 ||g_2||=3.001e-10 acc=0.0889 gamma_dfa=-0.0511
[DFA] Ep 83: ||h_L||=3.085e+07 ||g_2||=2.991e-10 acc=0.0850 gamma_dfa=-0.0512
[DFA] Ep 84: ||h_L||=3.096e+07 ||g_2||=2.982e-10 acc=0.0840 gamma_dfa=-0.0512
[DFA] Ep 85: ||h_L||=3.104e+07 ||g_2||=2.976e-10 acc=0.0889 gamma_dfa=-0.0512
[DFA] Ep 86: ||h_L||=3.113e+07 ||g_2||=2.967e-10 acc=0.0850 gamma_dfa=-0.0512
[DFA] Ep 87: ||h_L||=3.120e+07 ||g_2||=2.966e-10 acc=0.0889 gamma_dfa=-0.0512
[DFA] Ep 88: ||h_L||=3.126e+07 ||g_2||=2.957e-10 acc=0.0879 gamma_dfa=-0.0512
[DFA] Ep 89: ||h_L||=3.132e+07 ||g_2||=2.956e-10 acc=0.0889 gamma_dfa=-0.0512
[DFA] Ep 90: ||h_L||=3.137e+07 ||g_2||=2.949e-10 acc=0.0879 gamma_dfa=-0.0512
[DFA] Ep 91: ||h_L||=3.140e+07 ||g_2||=2.945e-10 acc=0.0879 gamma_dfa=-0.0512
[DFA] Ep 92: ||h_L||=3.144e+07 ||g_2||=2.947e-10 acc=0.0879 gamma_dfa=-0.0512
[DFA] Ep 93: ||h_L||=3.146e+07 ||g_2||=2.945e-10 acc=0.0889 gamma_dfa=-0.0512
[DFA] Ep 94: ||h_L||=3.148e+07 ||g_2||=2.944e-10 acc=0.0879 gamma_dfa=-0.0512
[DFA] Ep 95: ||h_L||=3.149e+07 ||g_2||=2.942e-10 acc=0.0879 gamma_dfa=-0.0512
[DFA] Ep 96: ||h_L||=3.150e+07 ||g_2||=2.942e-10 acc=0.0879 gamma_dfa=-0.0512
[DFA] Ep 97: ||h_L||=3.151e+07 ||g_2||=2.941e-10 acc=0.0879 gamma_dfa=-0.0512
[DFA] Ep 98: ||h_L||=3.151e+07 ||g_2||=2.941e-10 acc=0.0879 gamma_dfa=-0.0512
[DFA] Ep 99: ||h_L||=3.151e+07 ||g_2||=2.941e-10 acc=0.0879 gamma_dfa=-0.0512
[DFA] Ep 100: ||h_L||=3.151e+07 ||g_2||=2.941e-10 acc=0.0879 gamma_dfa=-0.0512
Saved results/h2_no_residual_full_s123/snapshot_evolution_s123.json