NO-OUT_LN VARIANT: depth=4, d_hidden=256, epochs=100, seed=42

=== BP training (NO out_ln) ===
[BP-noLN] Ep 0: ||h_L||=8.893e+00 ||g||=5.483e-04 acc=0.1123
[BP-noLN] Ep 1: ||h_L||=3.310e+01 ||g||=1.421e-04 acc=0.4229
[BP-noLN] Ep 5: ||h_L||=5.238e+01 ||g||=9.984e-05 acc=0.5010
[BP-noLN] Ep 10: ||h_L||=6.111e+01 ||g||=8.972e-05 acc=0.5518
[BP-noLN] Ep 15: ||h_L||=6.640e+01 ||g||=8.892e-05 acc=0.5195
[BP-noLN] Ep 20: ||h_L||=6.977e+01 ||g||=8.730e-05 acc=0.5488
[BP-noLN] Ep 25: ||h_L||=7.080e+01 ||g||=8.598e-05 acc=0.5547
[BP-noLN] Ep 30: ||h_L||=7.441e+01 ||g||=8.621e-05 acc=0.5723
[BP-noLN] Ep 35: ||h_L||=7.537e+01 ||g||=7.991e-05 acc=0.6025
[BP-noLN] Ep 40: ||h_L||=7.552e+01 ||g||=8.747e-05 acc=0.5859
[BP-noLN] Ep 45: ||h_L||=7.571e+01 ||g||=8.227e-05 acc=0.5918
[BP-noLN] Ep 50: ||h_L||=7.514e+01 ||g||=9.716e-05 acc=0.5811
[BP-noLN] Ep 55: ||h_L||=7.693e+01 ||g||=9.600e-05 acc=0.6006
[BP-noLN] Ep 60: ||h_L||=7.581e+01 ||g||=9.903e-05 acc=0.6113
[BP-noLN] Ep 65: ||h_L||=7.549e+01 ||g||=1.009e-04 acc=0.6221
[BP-noLN] Ep 70: ||h_L||=7.530e+01 ||g||=1.088e-04 acc=0.6074
[BP-noLN] Ep 75: ||h_L||=7.454e+01 ||g||=1.082e-04 acc=0.6143
[BP-noLN] Ep 80: ||h_L||=7.426e+01 ||g||=1.162e-04 acc=0.6123
[BP-noLN] Ep 85: ||h_L||=7.353e+01 ||g||=1.161e-04 acc=0.6084
[BP-noLN] Ep 90: ||h_L||=7.339e+01 ||g||=1.168e-04 acc=0.6123
[BP-noLN] Ep 95: ||h_L||=7.308e+01 ||g||=1.164e-04 acc=0.6143
[BP-noLN] Ep 100: ||h_L||=7.297e+01 ||g||=1.158e-04 acc=0.6162

=== DFA training (NO out_ln) ===
[DFA-noLN] Ep 0: ||h_L||=8.893e+00 ||g||=5.483e-04 acc=0.1123
[DFA-noLN] Ep 1: ||h_L||=1.560e+03 ||g||=6.859e-04 acc=0.1494 γ=0.0084
[DFA-noLN] Ep 5: ||h_L||=1.050e+04 ||g||=7.522e-04 acc=0.1748 γ=-0.0063
[DFA-noLN] Ep 10: ||h_L||=2.200e+04 ||g||=7.641e-04 acc=0.1445 γ=-0.0167
[DFA-noLN] Ep 15: ||h_L||=1.004e+05 ||g||=7.608e-04 acc=0.1738 γ=-0.0118
[DFA-noLN] Ep 20: ||h_L||=3.150e+05 ||g||=7.782e-04 acc=0.2070 γ=0.0027
[DFA-noLN] Ep 25: ||h_L||=6.817e+05 ||g||=7.884e-04 acc=0.1572 γ=0.0340
[DFA-noLN] Ep 30: ||h_L||=1.298e+06 ||g||=7.771e-04 acc=0.1299 γ=0.0393
[DFA-noLN] Ep 35: ||h_L||=2.143e+06 ||g||=7.980e-04 acc=0.0996 γ=0.0196
[DFA-noLN] Ep 40: ||h_L||=3.180e+06 ||g||=7.691e-04 acc=0.1016 γ=-0.0085
[DFA-noLN] Ep 45: ||h_L||=4.347e+06 ||g||=7.934e-04 acc=0.1582 γ=0.0262
[DFA-noLN] Ep 50: ||h_L||=5.552e+06 ||g||=7.869e-04 acc=0.2197 γ=0.0165
[DFA-noLN] Ep 55: ||h_L||=6.742e+06 ||g||=7.851e-04 acc=0.1885 γ=0.0046
[DFA-noLN] Ep 60: ||h_L||=7.801e+06 ||g||=7.600e-04 acc=0.1572 γ=0.0045
[DFA-noLN] Ep 65: ||h_L||=8.775e+06 ||g||=7.795e-04 acc=0.2031 γ=0.0088
[DFA-noLN] Ep 70: ||h_L||=9.556e+06 ||g||=7.968e-04 acc=0.1836 γ=0.0093
[DFA-noLN] Ep 75: ||h_L||=1.016e+07 ||g||=7.656e-04 acc=0.2490 γ=0.0168
[DFA-noLN] Ep 80: ||h_L||=1.064e+07 ||g||=7.633e-04 acc=0.2764 γ=0.0198
[DFA-noLN] Ep 85: ||h_L||=1.095e+07 ||g||=7.466e-04 acc=0.2773 γ=0.0195
[DFA-noLN] Ep 90: ||h_L||=1.107e+07 ||g||=7.453e-04 acc=0.2695 γ=0.0131
[DFA-noLN] Ep 95: ||h_L||=1.113e+07 ||g||=7.446e-04 acc=0.3105 γ=0.0146
[DFA-noLN] Ep 100: ||h_L||=1.113e+07 ||g||=7.392e-04 acc=0.3320 γ=0.0164

Saved results/snapshot_no_outln_v1/snapshot_noLN_s42.json