============================================================
STABLE INITIALIZATION Experiment
Job ID: 15112873 | Node: gpub011
Start: Thu Jan 1 12:26:50 CST 2026
============================================================
NVIDIA A40, 46068 MiB
============================================================
================================================================================
DEPTH SCALING BENCHMARK
================================================================================
Dataset: cifar100
Depths: [4, 8, 12, 16]
Timesteps: 4
Epochs: 150
λ_reg: 0.1, λ_target: -0.1
Reg type: squared, Warmup epochs: 20
Device: cuda
================================================================================
Loading cifar100...
Classes: 100, Input: (3, 32, 32)
Train: 50000, Test: 10000
Depth configurations: [(4, '4×1'), (8, '4×2'), (12, '4×3'), (16, '4×4')]
Regularization type: squared
Warmup epochs: 20
Stable init: True
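(Editor's note: the configuration above reports a "squared" regularization with λ_reg = 0.1, λ_target = -0.1, and 20 warmup epochs. A minimal sketch of what such a penalty term could look like follows; the function name, the linear warmup ramp, and the exact form are assumptions for illustration, not taken from the experiment code.)

```python
def lyapunov_penalty(lyap_estimate, epoch, lam_reg=0.1,
                     lam_target=-0.1, warmup_epochs=20):
    """Squared penalty pulling an estimated Lyapunov exponent toward
    lam_target, weighted by lam_reg and ramped in linearly over the
    warmup epochs. Added to the task loss (e.g. cross-entropy)."""
    ramp = min(1.0, epoch / warmup_epochs)       # 0 -> 1 during warmup
    return lam_reg * ramp * (lyap_estimate - lam_target) ** 2

# e.g. the depth-4 run ends at λ ≈ 1.473, far from the target of -0.1,
# so the penalty stays active throughout training.
```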
============================================================
Depth = 4 conv layers (4 stages × 1 blocks)
============================================================
Vanilla: depth=4, params=1,756,836
Epoch 10: train=0.516 test=0.431 σ=9.10e-01/3.50e-08
Epoch 20: train=0.640 test=0.517 σ=5.84e-01/2.47e-08
Epoch 30: train=0.712 test=0.558 σ=4.82e-01/2.04e-08
Epoch 40: train=0.761 test=0.558 σ=4.07e-01/1.72e-08
Epoch 50: train=0.800 test=0.577 σ=3.76e-01/1.54e-08
Epoch 60: train=0.837 test=0.581 σ=3.34e-01/1.38e-08
Epoch 70: train=0.864 test=0.579 σ=3.25e-01/1.29e-08
Epoch 80: train=0.888 test=0.592 σ=2.91e-01/1.17e-08
Epoch 90: train=0.907 test=0.602 σ=2.89e-01/1.10e-08
Epoch 100: train=0.921 test=0.604 σ=2.71e-01/1.05e-08
Epoch 110: train=0.935 test=0.606 σ=2.64e-01/9.89e-09
Epoch 120: train=0.943 test=0.617 σ=2.46e-01/9.41e-09
Epoch 130: train=0.950 test=0.615 σ=2.45e-01/8.86e-09
Epoch 140: train=0.951 test=0.615 σ=2.29e-01/8.67e-09
Epoch 150: train=0.953 test=0.615 σ=2.36e-01/8.51e-09
Best test acc: 0.620
Lyapunov: depth=4, params=1,756,836
Epoch 10: train=0.195 test=0.014 λ=1.549 σ=7.07e-01/2.57e-08
Epoch 20: train=0.135 test=0.012 λ=1.570 σ=4.14e-01/1.49e-08
Epoch 30: train=0.057 test=0.010 λ=1.488 σ=2.46e-01/6.99e-09
Epoch 40: train=0.067 test=0.010 λ=1.481 σ=2.03e-01/6.20e-09
Epoch 50: train=0.048 test=0.010 λ=1.877 σ=1.80e-01/4.00e-09
Epoch 60: train=0.009 test=0.010 λ=1.462 σ=4.58e-02/0.00e+00
Epoch 70: train=0.010 test=0.010 λ=1.467 σ=3.57e-02/0.00e+00
Epoch 80: train=0.010 test=0.010 λ=1.471 σ=1.33e-02/0.00e+00
Epoch 90: train=0.009 test=0.010 λ=1.471 σ=4.82e-03/0.00e+00
Epoch 100: train=0.009 test=0.010 λ=1.471 σ=1.18e-03/0.00e+00
Epoch 110: train=0.009 test=0.010 λ=1.471 σ=4.32e-03/0.00e+00
Epoch 120: train=0.009 test=0.010 λ=1.472
Epoch 130: train=0.010 test=0.010 λ=1.472
Epoch 140: train=0.010 test=0.010 λ=1.471
Epoch 150: train=0.010 test=0.010 λ=1.473
Best test acc: 0.106
============================================================
Depth = 8 conv layers (4 stages × 2 blocks)
============================================================
Vanilla: depth=8, params=4,892,196
Epoch 10: train=0.451 test=0.402 σ=7.31e-01/2.99e-08
Epoch 20: train=0.587 test=0.471 σ=4.69e-01/2.12e-08
Epoch 30: train=0.666 test=0.493 σ=3.81e-01/1.75e-08
Epoch 40: train=0.728 test=0.505 σ=3.27e-01/1.53e-08
Epoch 50: train=0.774 test=0.533 σ=3.18e-01/1.40e-08
Epoch 60: train=0.812 test=0.521 σ=2.93e-01/1.28e-08
Epoch 70: train=0.852 test=0.547 σ=2.81e-01/1.17e-08
Epoch 80: train=0.884 test=0.531 σ=2.48e-01/1.02e-08
Epoch 90: train=0.906 test=0.537 σ=2.35e-01/9.46e-09
Epoch 100: train=0.927 test=0.553 σ=2.24e-01/8.84e-09
Epoch 110: train=0.941 test=0.552 σ=2.09e-01/8.04e-09
Epoch 120: train=0.951 test=0.553 σ=2.09e-01/7.55e-09
Epoch 130: train=0.959 test=0.553 σ=2.10e-01/7.39e-09
Epoch 140: train=0.959 test=0.561 σ=1.95e-01/7.19e-09
Epoch 150: train=0.961 test=0.551 σ=1.94e-01/6.97e-09
Best test acc: 0.564
Lyapunov: depth=8, params=4,892,196
Epoch 10: train=0.046 test=0.010 λ=1.543 σ=3.90e-01/9.92e-09
Epoch 20: train=0.038 test=0.010 λ=1.533 σ=2.42e-01/4.88e-09
Epoch 30: train=0.038 test=0.010 λ=1.623 σ=1.93e-01/3.39e-09
Epoch 40: train=0.028 test=0.010 λ=1.706 σ=1.66e-01/2.06e-09
Epoch 50: train=0.009 test=0.010 λ=1.532 σ=7.89e-02/1.54e-17
Epoch 60: train=0.010 test=0.010 λ=1.540 σ=4.28e-02/5.11e-27
Epoch 70: train=0.009 test=0.010 λ=1.544 σ=4.22e-02/0.00e+00
Epoch 80: train=0.010 test=0.010 λ=1.548 σ=3.81e-02/0.00e+00
Epoch 90: train=0.011 test=0.010 λ=1.554 σ=3.03e-02/0.00e+00
Epoch 100: train=0.010 test=0.010 λ=1.549 σ=9.40e-03/0.00e+00
Epoch 110: train=0.010 test=0.010 λ=1.549 σ=5.91e-03/0.00e+00
Epoch 120: train=0.010 test=0.010 λ=1.548 σ=3.83e-03/0.00e+00
Epoch 130: train=0.010 test=0.010 λ=1.549 σ=7.81e-03/0.00e+00
Epoch 140: train=0.010 test=0.010 λ=1.549 σ=1.37e-02/0.00e+00
Epoch 150: train=0.010 test=0.010 λ=1.546 σ=8.69e-03/0.00e+00
Best test acc: 0.021
============================================================
Depth = 12 conv layers (4 stages × 3 blocks)
============================================================
Vanilla: depth=12, params=8,027,556
Epoch 10: train=0.253 test=0.046 σ=4.96e-01/2.03e-08
Epoch 20: train=0.322 test=0.044 σ=3.35e-01/1.58e-08
Epoch 30: train=0.364 test=0.054 σ=2.77e-01/1.38e-08
Epoch 40: train=0.404 test=0.046 σ=2.49e-01/1.30e-08
Epoch 50: train=0.439 test=0.062 σ=2.30e-01/1.24e-08
Epoch 60: train=0.469 test=0.040 σ=2.30e-01/1.24e-08
Epoch 70: train=0.498 test=0.054 σ=2.35e-01/1.21e-08
Epoch 80: train=0.532 test=0.058 σ=2.26e-01/1.20e-08
Epoch 90: train=0.565 test=0.072 σ=2.26e-01/1.18e-08
Epoch 100: train=0.276 test=0.099 σ=1.92e-01/1.10e-08
Epoch 110: train=0.409 test=0.123 σ=2.13e-01/1.20e-08
Epoch 120: train=0.470 test=0.124 σ=2.27e-01/1.20e-08
Epoch 130: train=0.495 test=0.146 σ=2.19e-01/1.22e-08
Epoch 140: train=0.510 test=0.138 σ=2.15e-01/1.17e-08
Epoch 150: train=0.512 test=0.118 σ=2.18e-01/1.17e-08
Best test acc: 0.146
Lyapunov: depth=12, params=8,027,556
Epoch 10: train=0.011 test=0.010 λ=1.563 σ=5.46e-01/7.17e-09
Epoch 20: train=0.010 test=0.010 λ=1.556 σ=8.74e-02/8.70e-15
Epoch 30: train=0.010 test=0.010 λ=1.554 σ=9.58e-02/3.05e-15
Epoch 40: train=0.009 test=0.010 λ=1.566 σ=6.06e-02/2.31e-34
Epoch 50: train=0.010 test=0.010 λ=1.566 σ=3.46e-02/0.00e+00
Epoch 60: train=0.009 test=0.010 λ=1.573 σ=4.50e-02/0.00e+00
Epoch 70: train=0.010 test=0.010 λ=1.572 σ=1.34e-02/0.00e+00
Epoch 80: train=0.009 test=0.010 λ=1.575 σ=6.32e-04/0.00e+00
Epoch 90: train=0.009 test=0.010 λ=1.576 σ=5.51e-02/0.00e+00
Epoch 100: train=0.010 test=0.010 λ=1.579 σ=2.74e-02/0.00e+00
Epoch 110: train=0.009 test=0.010 λ=1.575 σ=2.56e-02/0.00e+00
Epoch 120: train=0.010 test=0.010 λ=1.576 σ=3.61e-02/0.00e+00
Epoch 130: train=0.010 test=0.010 λ=1.576
Epoch 140: train=0.010 test=0.010 λ=1.574 σ=5.40e-03/0.00e+00
Epoch 150: train=0.010 test=0.010 λ=1.569
Best test acc: 0.011
============================================================
Depth = 16 conv layers (4 stages × 4 blocks)
============================================================
Vanilla: depth=16, params=11,162,916
Epoch 10: train=0.120 test=0.020 σ=4.06e-01/1.45e-08
Epoch 20: train=0.158 test=0.011 σ=2.71e-01/1.13e-08
Epoch 30: train=0.182 test=0.016 σ=2.16e-01/1.00e-08
Epoch 40: train=0.203 test=0.029 σ=2.01e-01/9.74e-09
Epoch 50: train=0.220 test=0.025 σ=1.83e-01/9.59e-09
Epoch 60: train=0.237 test=0.025 σ=1.78e-01/9.64e-09
Epoch 70: train=0.250 test=0.029 σ=1.67e-01/9.64e-09
Epoch 80: train=0.259 test=0.026 σ=1.65e-01/9.31e-09
Epoch 90: train=0.273 test=0.022 σ=1.63e-01/9.65e-09
Epoch 100: train=0.229 test=0.019 σ=1.52e-01/9.12e-09
Epoch 110: train=0.256 test=0.024 σ=1.54e-01/9.41e-09
Epoch 120: train=0.266 test=0.025 σ=1.60e-01/9.49e-09
Epoch 130: train=0.277 test=0.025 σ=1.57e-01/9.48e-09
Epoch 140: train=0.283 test=0.025 σ=1.61e-01/9.66e-09
Epoch 150: train=0.283 test=0.024 σ=1.63e-01/9.63e-09
Best test acc: 0.036
Lyapunov: depth=16, params=11,162,916
Epoch 10: train=0.011 test=0.010 λ=1.695 σ=3.65e-01/1.28e-13
Epoch 20: train=0.011 test=0.010 λ=1.668 σ=3.46e-01/1.58e-14
Epoch 30: train=0.011 test=0.010 λ=1.632 σ=1.93e-01/2.02e-20
Epoch 40: train=0.009 test=0.010 λ=1.610 σ=2.17e-01/1.62e-12
Epoch 50: train=0.010 test=0.010 λ=1.620 σ=1.54e-01/1.56e-15
Epoch 60: train=0.011 test=0.010 λ=1.621 σ=5.15e-02/0.00e+00
Epoch 70: train=0.009 test=0.010 λ=1.606 σ=1.16e-02/0.00e+00
Epoch 80: train=0.009 test=0.010 λ=1.605 σ=1.80e-02/0.00e+00
Epoch 90: train=0.009 test=0.010 λ=1.609
Epoch 100: train=0.009 test=0.010 λ=1.618 σ=5.85e-04/0.00e+00
Epoch 110: train=0.009 test=0.010 λ=1.610 σ=5.90e-04/0.00e+00
Epoch 120: train=0.009 test=0.010 λ=1.608
Epoch 130: train=0.009 test=0.010 λ=1.603
Epoch 140: train=0.010 test=0.010 λ=1.606
Epoch 150: train=0.010 test=0.010 λ=1.596
Best test acc: 0.016
====================================================================================================
DEPTH SCALING RESULTS: CIFAR100
====================================================================================================
Depth Vanilla Acc Lyapunov Acc Δ Acc Lyap λ Van ∇norm Lyap ∇norm Van κ
----------------------------------------------------------------------------------------------------
4 0.615 0.010 -0.605 1.473 4.63e-01 8.84e-02 3.7e+08
8 0.551 0.010 -0.541 1.546 3.64e-01 1.64e-01 2.7e+08
12 0.118 0.010 -0.108 1.569 6.43e-01 6.98e-01 4.1e+07
16 0.024 0.010 -0.014 1.596 5.19e-01 3.22e-01 2.7e+07
====================================================================================================
GRADIENT HEALTH ANALYSIS:
Depth 4: ⚠️ Vanilla has ill-conditioned gradients (κ > 1e6)
Depth 8: ⚠️ Vanilla has ill-conditioned gradients (κ > 1e6)
Depth 12: ⚠️ Vanilla has ill-conditioned gradients (κ > 1e6)
Depth 16: ⚠️ Vanilla has ill-conditioned gradients (κ > 1e6)
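(Editor's note: the log does not define κ. One plausible reading, assumed here for illustration only, is the ratio of the largest to smallest gradient norms, consistent with the σ=max/min pairs printed each epoch. A hypothetical helper:)

```python
def gradient_condition_number(grad_norms, eps=1e-30):
    """Ratio of the largest to smallest gradient norm in a list.
    Large ratios (e.g. > 1e6) indicate ill-conditioned gradient
    flow; eps guards against division by an exact zero."""
    return max(grad_norms) / max(min(grad_norms), eps)

# e.g. the depth-4 vanilla run logs sigma pairs like 2.36e-01/8.51e-09,
# a ratio well above the 1e6 threshold flagged in the analysis above.
```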
KEY OBSERVATIONS:
Vanilla 4→16 layers: -0.591 accuracy change
Lyapunov 4→16 layers: +0.000 accuracy change
⚠ Note: the flat Lyapunov curve does not demonstrate better depth scaling — Lyapunov test accuracy sits at chance level (0.010 ≈ 1/100 classes) at every depth, so the +0.000 change reflects a failure to train, not robustness to depth.
Results saved to runs/depth_scaling_stable_init/cifar100_20260102-133755
============================================================
Finished: Fri Jan 2 13:37:56 CST 2026
============================================================