============================================================
WEAK REGULARIZATION Experiment (lambda_reg=0.01)
Job ID: 15112871 | Node: gpub023
Start: Thu Jan 1 12:26:50 CST 2026
============================================================
NVIDIA A40, 46068 MiB
============================================================
================================================================================
DEPTH SCALING BENCHMARK
================================================================================
Dataset: cifar100
Depths: [4, 8, 12, 16]
Timesteps: 4
Epochs: 150
λ_reg: 0.01, λ_target: -0.1
Reg type: squared, Warmup epochs: 20
Device: cuda
================================================================================
Loading cifar100...
Classes: 100, Input: (3, 32, 32)
Train: 50000, Test: 10000
Depth configurations: [(4, '4×1'), (8, '4×2'), (12, '4×3'), (16, '4×4')]
Regularization type: squared
Warmup epochs: 20
Stable init: False
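For reference, the "squared" regularizer echoed above (λ_reg=0.01, λ_target=-0.1, 20 warmup epochs) can be sketched as a penalty on the estimated Lyapunov exponent added to the task loss. This is a hedged reconstruction from the logged hyperparameters, not the training script itself; the function name and the linear warmup schedule are assumptions:

```python
def lyapunov_reg_loss(task_loss, lam_est, lam_target=-0.1,
                      lam_reg=0.01, epoch=0, warmup_epochs=20):
    """Sketch of the logged 'squared' regularizer (assumed form):
    total = task_loss + warmup * lam_reg * (lam_est - lam_target)**2,
    with the penalty ramped linearly over the first `warmup_epochs`."""
    warmup = min(1.0, epoch / warmup_epochs) if warmup_epochs > 0 else 1.0
    penalty = lam_reg * (lam_est - lam_target) ** 2
    return task_loss + warmup * penalty
```

Note that with λ_target = -0.1 the penalty pushes the estimated exponent toward mild contraction, while the per-run λ values logged below stay near 1.5–1.9, so the penalty term remains active throughout training.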
============================================================
Depth = 4 conv layers (4 stages × 1 block)
============================================================
Vanilla: depth=4, params=1,756,836
Epoch 10: train=0.498 test=0.419 σ=9.41e-01/3.52e-08
Epoch 20: train=0.628 test=0.476 σ=5.85e-01/2.43e-08
Epoch 30: train=0.704 test=0.536 σ=4.86e-01/2.02e-08
Epoch 40: train=0.756 test=0.544 σ=4.13e-01/1.73e-08
Epoch 50: train=0.800 test=0.569 σ=3.81e-01/1.57e-08
Epoch 60: train=0.833 test=0.560 σ=3.37e-01/1.37e-08
Epoch 70: train=0.863 test=0.585 σ=3.17e-01/1.29e-08
Epoch 80: train=0.885 test=0.595 σ=3.04e-01/1.22e-08
Epoch 90: train=0.904 test=0.601 σ=2.80e-01/1.08e-08
Epoch 100: train=0.923 test=0.599 σ=2.68e-01/1.02e-08
Epoch 110: train=0.935 test=0.613 σ=2.64e-01/9.79e-09
Epoch 120: train=0.945 test=0.606 σ=2.43e-01/8.88e-09
Epoch 130: train=0.948 test=0.612 σ=2.48e-01/9.01e-09
Epoch 140: train=0.952 test=0.616 σ=2.24e-01/8.47e-09
Epoch 150: train=0.952 test=0.616 σ=2.31e-01/8.63e-09
Best test acc: 0.618
Lyapunov: depth=4, params=1,756,836
Epoch 10: train=0.461 test=0.286 λ=1.949 σ=9.11e-01/3.46e-08
Epoch 20: train=0.458 test=0.010 λ=1.465 σ=5.22e-01/2.10e-08
Epoch 30: train=0.513 test=0.017 λ=1.736 σ=4.33e-01/1.78e-08
Epoch 40: train=0.558 test=0.010 λ=1.767 σ=3.64e-01/1.59e-08
Epoch 50: train=0.592 test=0.010 λ=1.791 σ=3.31e-01/1.49e-08
Epoch 60: train=0.627 test=0.016 λ=1.766 σ=3.16e-01/1.43e-08
Epoch 70: train=0.658 test=0.011 λ=1.765 σ=3.10e-01/1.37e-08
Epoch 80: train=0.681 test=0.015 λ=1.770 σ=2.97e-01/1.33e-08
Epoch 90: train=0.705 test=0.012 λ=1.784 σ=2.85e-01/1.28e-08
Epoch 100: train=0.730 test=0.012 λ=1.784 σ=2.86e-01/1.27e-08
Epoch 110: train=0.747 test=0.013 λ=1.797 σ=2.87e-01/1.25e-08
Epoch 120: train=0.757 test=0.014 λ=1.823 σ=2.73e-01/1.21e-08
Epoch 130: train=0.771 test=0.013 λ=1.854 σ=2.70e-01/1.19e-08
Epoch 140: train=0.772 test=0.013 λ=1.873 σ=2.67e-01/1.19e-08
Epoch 150: train=0.777 test=0.012 λ=1.882 σ=2.76e-01/1.20e-08
Best test acc: 0.333
============================================================
Depth = 8 conv layers (4 stages × 2 blocks)
============================================================
Vanilla: depth=8, params=4,892,196
Epoch 10: train=0.382 test=0.338 σ=9.40e-01/3.24e-08
Epoch 20: train=0.545 test=0.436 σ=4.81e-01/2.17e-08
Epoch 30: train=0.636 test=0.464 σ=3.88e-01/1.80e-08
Epoch 40: train=0.695 test=0.507 σ=3.33e-01/1.58e-08
Epoch 50: train=0.752 test=0.506 σ=3.07e-01/1.39e-08
Epoch 60: train=0.793 test=0.520 σ=2.96e-01/1.29e-08
Epoch 70: train=0.834 test=0.517 σ=2.68e-01/1.16e-08
Epoch 80: train=0.870 test=0.524 σ=2.49e-01/1.06e-08
Epoch 90: train=0.899 test=0.526 σ=2.41e-01/9.69e-09
Epoch 100: train=0.917 test=0.527 σ=2.36e-01/9.43e-09
Epoch 110: train=0.931 test=0.534 σ=2.25e-01/8.64e-09
Epoch 120: train=0.945 test=0.535 σ=2.08e-01/7.82e-09
Epoch 130: train=0.951 test=0.530 σ=2.02e-01/7.38e-09
Epoch 140: train=0.954 test=0.535 σ=2.02e-01/7.62e-09
Epoch 150: train=0.957 test=0.520 σ=2.01e-01/7.60e-09
Best test acc: 0.543
Lyapunov: depth=8, params=4,892,196
Epoch 10: train=0.046 test=0.010 λ=1.570 σ=4.09e-01/1.23e-08
Epoch 20: train=0.062 test=0.010 λ=1.569 σ=2.46e-01/7.84e-09
Epoch 30: train=0.069 test=0.010 λ=1.534 σ=1.81e-01/6.62e-09
Epoch 40: train=0.046 test=0.010 λ=1.562 σ=1.49e-01/4.37e-09
Epoch 50: train=0.057 test=0.010 λ=1.531 σ=1.53e-01/4.61e-09
Epoch 60: train=0.040 test=0.010 λ=1.538 σ=1.53e-01/3.35e-09
Epoch 70: train=0.046 test=0.010 λ=1.536 σ=1.19e-01/1.75e-09
Epoch 80: train=0.050 test=0.010 λ=1.534 σ=1.19e-01/2.22e-09
Epoch 90: train=0.062 test=0.010 λ=1.556 σ=1.18e-01/3.98e-09
Epoch 100: train=0.048 test=0.010 λ=1.530 σ=1.14e-01/1.46e-09
Epoch 110: train=0.055 test=0.010 λ=1.534 σ=1.11e-01/3.03e-09
Epoch 120: train=0.075 test=0.010 λ=1.539 σ=1.12e-01/4.79e-09
Epoch 130: train=0.079 test=0.010 λ=1.593 σ=1.20e-01/4.96e-09
Epoch 140: train=0.076 test=0.010 λ=1.584 σ=1.13e-01/4.96e-09
Epoch 150: train=0.077 test=0.010 λ=1.583 σ=1.15e-01/4.98e-09
Best test acc: 0.014
============================================================
Depth = 12 conv layers (4 stages × 3 blocks)
============================================================
Vanilla: depth=12, params=8,027,556
Epoch 10: train=0.216 test=0.059 σ=7.22e-01/2.38e-08
Epoch 20: train=0.291 test=0.044 σ=3.35e-01/1.60e-08
Epoch 30: train=0.339 test=0.048 σ=2.71e-01/1.39e-08
Epoch 40: train=0.377 test=0.055 σ=2.37e-01/1.27e-08
Epoch 50: train=0.412 test=0.040 σ=2.25e-01/1.23e-08
Epoch 60: train=0.440 test=0.044 σ=2.24e-01/1.23e-08
Epoch 70: train=0.471 test=0.048 σ=2.28e-01/1.19e-08
Epoch 80: train=0.497 test=0.060 σ=2.25e-01/1.23e-08
Epoch 90: train=0.533 test=0.069 σ=2.24e-01/1.19e-08
Epoch 100: train=0.563 test=0.079 σ=2.24e-01/1.20e-08
Epoch 110: train=0.580 test=0.058 σ=2.28e-01/1.19e-08
Epoch 120: train=0.602 test=0.056 σ=2.30e-01/1.19e-08
Epoch 130: train=0.608 test=0.070 σ=2.29e-01/1.18e-08
Epoch 140: train=0.616 test=0.068 σ=2.27e-01/1.18e-08
Epoch 150: train=0.620 test=0.064 σ=2.28e-01/1.22e-08
Best test acc: 0.079
Lyapunov: depth=12, params=8,027,556
Epoch 10: train=0.017 test=0.010 λ=1.584 σ=2.89e-01/5.97e-12
Epoch 20: train=0.012 test=0.010 λ=1.566 σ=2.21e-01/1.75e-20
Epoch 30: train=0.012 test=0.010 λ=1.567 σ=3.65e-01/7.23e-20
Epoch 40: train=0.021 test=0.010 λ=1.623 σ=2.45e-01/8.70e-13
Epoch 50: train=0.022 test=0.010 λ=1.660 σ=1.84e-01/9.38e-13
Epoch 60: train=0.020 test=0.010 λ=1.695 σ=1.61e-01/5.37e-13
Epoch 70: train=0.019 test=0.010 λ=1.635 σ=1.40e-01/1.78e-12
Epoch 80: train=0.018 test=0.010 λ=1.641 σ=1.37e-01/2.32e-12
Epoch 90: train=0.025 test=0.010 λ=1.637 σ=1.37e-01/1.13e-09
Epoch 100: train=0.027 test=0.010 λ=1.684 σ=1.29e-01/1.39e-09
Epoch 110: train=0.022 test=0.010 λ=1.779 σ=1.13e-01/1.11e-10
Epoch 120: train=0.022 test=0.010 λ=1.769 σ=1.08e-01/1.12e-11
Epoch 130: train=0.021 test=0.010 λ=1.888 σ=9.60e-02/3.75e-12
Epoch 140: train=0.021 test=0.010 λ=1.788 σ=1.00e-01/9.24e-12
Epoch 150: train=0.022 test=0.010 λ=1.799 σ=9.76e-02/4.48e-12
Best test acc: 0.010
============================================================
Depth = 16 conv layers (4 stages × 4 blocks)
============================================================
Vanilla: depth=16, params=11,162,916
Epoch 10: train=0.091 test=0.011 σ=4.40e-01/1.32e-08
Epoch 20: train=0.133 test=0.015 σ=2.83e-01/1.07e-08
Epoch 30: train=0.156 test=0.018 σ=2.23e-01/9.48e-09
Epoch 40: train=0.177 test=0.022 σ=2.04e-01/9.14e-09
Epoch 50: train=0.191 test=0.024 σ=1.78e-01/8.86e-09
Epoch 60: train=0.203 test=0.031 σ=1.74e-01/9.04e-09
Epoch 70: train=0.219 test=0.026 σ=1.62e-01/8.97e-09
Epoch 80: train=0.229 test=0.032 σ=1.63e-01/8.94e-09
Epoch 90: train=0.242 test=0.031 σ=1.60e-01/9.16e-09
Epoch 100: train=0.251 test=0.027 σ=1.62e-01/9.14e-09
Epoch 110: train=0.259 test=0.032 σ=1.58e-01/9.11e-09
Epoch 120: train=0.264 test=0.028 σ=1.64e-01/9.10e-09
Epoch 130: train=0.271 test=0.029 σ=1.61e-01/9.33e-09
Epoch 140: train=0.272 test=0.031 σ=1.64e-01/9.34e-09
Epoch 150: train=0.272 test=0.028 σ=1.66e-01/9.31e-09
Best test acc: 0.035
Lyapunov: depth=16, params=11,162,916
Epoch 10: train=0.014 test=0.010 λ=1.722 σ=2.76e-01/4.41e-13
Epoch 20: train=0.010 test=0.010 λ=1.723 σ=3.64e-01/5.20e-17
Epoch 30: train=0.011 test=0.010 λ=1.721 σ=8.95e-02/2.45e-17
Epoch 40: train=0.012 test=0.010 λ=1.787 σ=1.74e-01/5.48e-14
Epoch 50: train=0.014 test=0.010 λ=1.672 σ=1.88e-01/1.05e-14
Epoch 60: train=0.011 test=0.010 λ=1.976 σ=9.53e-02/1.33e-14
Epoch 70: train=0.011 test=0.010 λ=1.787 σ=9.06e-02/1.54e-14
Epoch 80: train=0.012 test=0.011 λ=1.825 σ=1.01e-01/4.31e-14
Epoch 90: train=0.010 test=0.010 λ=1.829 σ=1.48e-01/4.61e-13
Epoch 100: train=0.010 test=0.010 λ=1.605 σ=1.04e-01/1.42e-13
Epoch 110: train=0.010 test=0.010 λ=1.615 σ=1.21e-01/1.69e-14
Epoch 120: train=0.009 test=0.010 λ=1.613 σ=1.09e-01/1.04e-14
Epoch 130: train=0.010 test=0.010 λ=1.604 σ=5.06e-02/2.83e-24
Epoch 140: train=0.010 test=0.010 λ=1.622 σ=5.64e-02/0.00e+00
Epoch 150: train=0.010 test=0.010 λ=1.584 σ=2.54e-02/0.00e+00
Best test acc: 0.014
====================================================================================================
DEPTH SCALING RESULTS: CIFAR100
====================================================================================================
Depth Vanilla Acc Lyapunov Acc Δ Acc Lyap λ Van ∇norm Lyap ∇norm Van κ
----------------------------------------------------------------------------------------------------
4 0.616 0.012 -0.603 1.882 4.59e-01 6.55e-01 2.2e+08
8 0.520 0.010 -0.510 1.583 3.83e-01 3.29e-01 2.8e+08
12 0.064 0.010 -0.054 1.799 6.38e-01 2.04e-01 2.3e+07
16 0.028 0.010 -0.018 1.584 5.05e-01 3.21e-01 2.1e+07
====================================================================================================
GRADIENT HEALTH ANALYSIS:
Depth 4: ⚠️ Vanilla has ill-conditioned gradients (κ > 1e6)
Depth 8: ⚠️ Vanilla has ill-conditioned gradients (κ > 1e6)
Depth 12: ⚠️ Vanilla has ill-conditioned gradients (κ > 1e6)
Depth 16: ⚠️ Vanilla has ill-conditioned gradients (κ > 1e6)
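The κ column appears consistent with the σ=max/min pairs printed each epoch: the ratio of the largest to the smallest per-layer gradient norm (e.g. depth 4, 4.59e-01 / ~2e-09 ≈ 2.2e+08). A minimal sketch of that check, assuming this definition of κ (the actual metric in the script may differ):

```python
def gradient_condition_number(grad_norms):
    """Assumed definition of the logged kappa: largest per-layer gradient
    norm divided by the smallest. kappa > 1e6 is the threshold flagged
    above as 'ill-conditioned' (some layers see ~a-million-times-weaker
    gradient signal than others)."""
    sigma_max, sigma_min = max(grad_norms), min(grad_norms)
    return float('inf') if sigma_min == 0.0 else sigma_max / sigma_min
```

Under this reading, all four vanilla runs are flagged because their smallest per-layer gradient norms sit around 1e-8 to 1e-9 while the largest stay near 1e-1.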
KEY OBSERVATIONS:
Vanilla 4→16 layers: -0.588 accuracy change
Lyapunov 4→16 layers: -0.002 accuracy change
⚠️ Caveat: the Lyapunov runs sit at near-chance accuracy (~0.01 on 100 classes) at every depth, so the flat 4→16 curve reflects training collapse under this regularization strength, not better depth scaling.
Results saved to runs/depth_scaling_weak_reg/cifar100_20260102-133933
============================================================
Finished: Fri Jan 2 13:39:37 CST 2026
============================================================