============================================================
CIFAR-100 Depth Scaling Benchmark
Job ID: 14363508 | Node: gpub039
Start: Mon Dec 29 09:14:19 CST 2025
============================================================
NVIDIA A40, 46068 MiB
============================================================
================================================================================
DEPTH SCALING BENCHMARK
================================================================================
Dataset: cifar100
Depths: [4, 8, 12, 16, 20]
Timesteps: 4
Epochs: 150
λ_reg: 0.3, λ_target: -0.1
Device: cuda
================================================================================
Loading cifar100...
Classes: 100, Input: (3, 32, 32)
Train: 50000, Test: 10000
Depth configurations: [(4, '4×1'), (8, '4×2'), (12, '4×3'), (16, '4×4'), (20, '4×5')]
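
[Editor's sketch] A minimal, self-contained illustration (an assumption, not the benchmark's actual code) of how the λ_reg = 0.3 / λ_target = -0.1 settings above could enter the training loss: a finite-difference estimate of the largest Lyapunov exponent over the 4 timesteps, pulled toward the slightly contractive target by a quadratic penalty. The names estimate_lyapunov, training_loss, ToyIteratedNet, and the per-timestep model.step() interface are hypothetical and introduced here only for illustration.

import torch
import torch.nn.functional as F

def estimate_lyapunov(model, x, eps=1e-3, timesteps=4):
    """Finite-difference (Benettin-style) estimate of the largest Lyapunov exponent."""
    # Start from a perturbation of norm eps per sample.
    delta = torch.randn_like(x)
    delta = eps * delta / delta.flatten(1).norm(dim=1).view(-1, 1, 1, 1)
    h, h_pert = x, x + delta
    log_growth = torch.zeros(x.shape[0], device=x.device)
    for _ in range(timesteps):
        h, h_pert = model.step(h), model.step(h_pert)      # assumed per-timestep forward
        sep = (h_pert - h).flatten(1).norm(dim=1).clamp_min(1e-12)
        log_growth = log_growth + torch.log(sep / eps)     # per-step log growth factor
        # Renormalise the perturbation back to norm eps so it stays in the linear regime.
        h_pert = h + eps * (h_pert - h) / sep.view(-1, 1, 1, 1)
    return (log_growth / timesteps).mean()

def training_loss(model, x, y, lambda_reg=0.3, lambda_target=-0.1):
    """Cross-entropy plus a quadratic pull of the estimated exponent toward lambda_target."""
    ce = F.cross_entropy(model(x), y)
    lyap = estimate_lyapunov(model, x)
    return ce + lambda_reg * (lyap - lambda_target) ** 2, lyap

if __name__ == "__main__":
    class ToyIteratedNet(torch.nn.Module):
        """Tiny stand-in model with a per-timestep `step` and a classifier head."""
        def __init__(self, classes=100):
            super().__init__()
            self.conv = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)
            self.head = torch.nn.Linear(3 * 32 * 32, classes)
        def step(self, h):
            return torch.tanh(self.conv(h))
        def forward(self, x):
            h = x
            for _ in range(4):
                h = self.step(h)
            return self.head(h.flatten(1))

    model = ToyIteratedNet()
    x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 100, (8,))
    loss, lyap = training_loss(model, x, y)
    print(f"loss={loss.item():.3f}  lyapunov≈{lyap.item():.3f}")
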
============================================================
Depth = 4 conv layers (4 stages × 1 blocks)
============================================================
Vanilla: depth=4, params=1,756,836
Epoch 10: train=0.494 test=0.423 σ=9.41e-01/3.56e-08
Epoch 20: train=0.626 test=0.503 σ=5.87e-01/2.43e-08
Epoch 30: train=0.703 test=0.550 σ=4.74e-01/1.97e-08
Epoch 40: train=0.755 test=0.564 σ=4.13e-01/1.70e-08
Epoch 50: train=0.797 test=0.542 σ=3.68e-01/1.50e-08
Epoch 60: train=0.830 test=0.581 σ=3.41e-01/1.42e-08
Epoch 70: train=0.862 test=0.583 σ=3.14e-01/1.29e-08
Epoch 80: train=0.883 test=0.599 σ=3.02e-01/1.21e-08
Epoch 90: train=0.905 test=0.594 σ=2.83e-01/1.09e-08
Epoch 100: train=0.920 test=0.607 σ=2.60e-01/9.94e-09
Epoch 110: train=0.932 test=0.611 σ=2.57e-01/9.62e-09
Epoch 120: train=0.941 test=0.610 σ=2.45e-01/9.26e-09
Epoch 130: train=0.949 test=0.616 σ=2.44e-01/8.78e-09
Epoch 140: train=0.951 test=0.613 σ=2.30e-01/8.58e-09
Epoch 150: train=0.952 test=0.614 σ=2.32e-01/8.66e-09
Best test acc: 0.621
Lyapunov: depth=4, params=1,756,836
Epoch 10: train=0.012 test=0.010 λ=1.940 σ=9.87e-02/2.73e-13
Epoch 20: train=0.010 test=0.010 λ=1.930 σ=3.70e-02/0.00e+00
Epoch 30: train=0.009 test=0.010 λ=1.920 σ=4.74e-03/0.00e+00
Epoch 40: train=0.009 test=0.010 λ=1.920 σ=2.81e-03/0.00e+00
Epoch 50: train=0.008 test=0.010 λ=1.920
Epoch 60: train=0.008 test=0.010 λ=1.921
Epoch 70: train=0.010 test=0.010 λ=1.922
Epoch 80: train=0.009 test=0.010 λ=1.923
Epoch 90: train=0.009 test=0.010 λ=1.919
Epoch 100: train=0.009 test=0.010 λ=1.923
Epoch 110: train=0.009 test=0.010 λ=1.921
Epoch 120: train=0.009 test=0.010 λ=1.923
Epoch 130: train=0.009 test=0.010 λ=1.923
Epoch 140: train=0.009 test=0.010 λ=1.922
Epoch 150: train=0.010 test=0.010 λ=1.921
Best test acc: 0.054
============================================================
Depth = 8 conv layers (4 stages × 2 blocks)
============================================================
Vanilla: depth=8, params=4,892,196
Epoch 10: train=0.390 test=0.338 σ=8.34e-01/3.11e-08
Epoch 20: train=0.547 test=0.438 σ=4.73e-01/2.15e-08
Epoch 30: train=0.633 test=0.454 σ=3.91e-01/1.80e-08
Epoch 40: train=0.699 test=0.489 σ=3.30e-01/1.55e-08
Epoch 50: train=0.754 test=0.509 σ=3.13e-01/1.41e-08
Epoch 60: train=0.795 test=0.503 σ=2.84e-01/1.27e-08
Epoch 70: train=0.836 test=0.511 σ=2.72e-01/1.18e-08
Epoch 80: train=0.869 test=0.517 σ=2.46e-01/1.02e-08
Epoch 90: train=0.897 test=0.523 σ=2.46e-01/1.00e-08
Epoch 100: train=0.917 test=0.519 σ=2.33e-01/9.01e-09
Epoch 110: train=0.932 test=0.528 σ=2.26e-01/8.65e-09
Epoch 120: train=0.947 test=0.537 σ=2.11e-01/8.16e-09
Epoch 130: train=0.953 test=0.526 σ=2.06e-01/7.74e-09
Epoch 140: train=0.954 test=0.538 σ=1.98e-01/7.35e-09
Epoch 150: train=0.956 test=0.522 σ=2.00e-01/7.34e-09
Best test acc: 0.541
Lyapunov: depth=8, params=4,892,196
Epoch 10: train=0.010 test=0.010 λ=2.704 σ=1.07e-01/7.42e-16
Epoch 20: train=0.009 test=0.010 λ=2.262
Epoch 30: train=0.009 test=0.010 λ=2.272
Epoch 40: train=0.009 test=0.010 λ=2.264 σ=6.25e-03/0.00e+00
Epoch 50: train=0.008 test=0.010 λ=2.281
Epoch 60: train=0.009 test=0.010 λ=2.273
Epoch 70: train=0.008 test=0.010 λ=2.267
Epoch 80: train=0.009 test=0.010 λ=2.263
Epoch 90: train=0.009 test=0.010 λ=2.264
Epoch 100: train=0.009 test=0.010 λ=2.261
Epoch 110: train=0.008 test=0.010 λ=2.264
Epoch 120: train=0.009 test=0.010 λ=2.261
Epoch 130: train=0.009 test=0.010 λ=2.264
Epoch 140: train=0.009 test=0.010 λ=2.263
Epoch 150: train=0.010 test=0.010 λ=2.262
Best test acc: 0.028
============================================================
Depth = 12 conv layers (4 stages × 3 blocks)
============================================================
Vanilla: depth=12, params=8,027,556
Epoch 10: train=0.215 test=0.064 σ=6.68e-01/2.31e-08
Epoch 20: train=0.286 test=0.052 σ=3.31e-01/1.58e-08
Epoch 30: train=0.336 test=0.081 σ=2.75e-01/1.39e-08
Epoch 40: train=0.369 test=0.069 σ=2.34e-01/1.27e-08
Epoch 50: train=0.410 test=0.064 σ=2.31e-01/1.24e-08
Epoch 60: train=0.435 test=0.059 σ=2.20e-01/1.22e-08
Epoch 70: train=0.262 test=0.108 σ=2.07e-01/1.12e-08
Epoch 80: train=0.390 test=0.110 σ=2.14e-01/1.20e-08
Epoch 90: train=0.437 test=0.106 σ=2.23e-01/1.22e-08
Epoch 100: train=0.473 test=0.124 σ=2.28e-01/1.25e-08
Epoch 110: train=0.500 test=0.103 σ=2.28e-01/1.24e-08
Epoch 120: train=0.527 test=0.095 σ=2.35e-01/1.25e-08
Epoch 130: train=0.536 test=0.107 σ=2.38e-01/1.28e-08
Epoch 140: train=0.545 test=0.111 σ=2.40e-01/1.26e-08
Epoch 150: train=0.547 test=0.102 σ=2.40e-01/1.29e-08
Best test acc: 0.126
Lyapunov: depth=12, params=8,027,556
Epoch 10: train=0.013 test=0.010 λ=2.873 σ=2.57e-01/2.15e-13
Epoch 20: train=0.010 test=0.010 λ=2.629 σ=2.81e-02/0.00e+00
Epoch 30: train=0.009 test=0.010 λ=2.465 σ=6.68e-03/0.00e+00
Epoch 40: train=0.009 test=0.010 λ=2.480
Epoch 50: train=0.009 test=0.010 λ=2.470
Epoch 60: train=0.009 test=0.010 λ=2.482
Epoch 70: train=0.008 test=0.010 λ=2.473
Epoch 80: train=0.008 test=0.010 λ=2.463
Epoch 90: train=0.008 test=0.010 λ=2.465
Epoch 100: train=0.009 test=0.010 λ=2.463
Epoch 110: train=0.008 test=0.010 λ=2.470
Epoch 120: train=0.009 test=0.010 λ=2.468
Epoch 130: train=0.010 test=0.010 λ=2.470
Epoch 140: train=0.009 test=0.010 λ=2.463
Epoch 150: train=0.010 test=0.010 λ=2.462
Best test acc: 0.011
============================================================
Depth = 16 conv layers (4 stages × 4 blocks)
============================================================
Vanilla: depth=16, params=11,162,916
Epoch 10: train=0.094 test=0.011 σ=4.41e-01/1.38e-08
Epoch 20: train=0.134 test=0.020 σ=2.83e-01/1.10e-08
Epoch 30: train=0.156 test=0.022 σ=2.27e-01/9.71e-09
Epoch 40: train=0.174 test=0.022 σ=1.97e-01/9.00e-09
Epoch 50: train=0.184 test=0.022 σ=1.79e-01/8.89e-09
Epoch 60: train=0.198 test=0.021 σ=1.70e-01/8.88e-09
Epoch 70: train=0.212 test=0.022 σ=1.60e-01/8.82e-09
Epoch 80: train=0.224 test=0.027 σ=1.63e-01/8.93e-09
Epoch 90: train=0.235 test=0.031 σ=1.57e-01/8.95e-09
Epoch 100: train=0.241 test=0.032 σ=1.60e-01/9.14e-09
Epoch 110: train=0.255 test=0.037 σ=1.58e-01/9.23e-09
Epoch 120: train=0.259 test=0.034 σ=1.61e-01/9.22e-09
Epoch 130: train=0.263 test=0.038 σ=1.61e-01/9.35e-09
Epoch 140: train=0.265 test=0.032 σ=1.63e-01/9.35e-09
Epoch 150: train=0.269 test=0.037 σ=1.65e-01/9.36e-09
Best test acc: 0.040
Lyapunov: depth=16, params=11,162,916
Epoch 10: train=0.013 test=0.010 λ=2.901 σ=2.73e-01/2.05e-13
Epoch 20: train=0.009 test=0.010 λ=3.238 σ=1.03e-02/0.00e+00
Epoch 30: train=0.009 test=0.010 λ=2.605 σ=3.07e-03/0.00e+00
Epoch 40: train=0.008 test=0.010 λ=2.603
Epoch 50: train=0.008 test=0.010 λ=2.610
Epoch 60: train=0.009 test=0.010 λ=2.627
Epoch 70: train=0.009 test=0.010 λ=2.609
Epoch 80: train=0.009 test=0.010 λ=2.607
Epoch 90: train=0.009 test=0.010 λ=2.622
Epoch 100: train=0.009 test=0.010 λ=2.614
Epoch 110: train=0.009 test=0.010 λ=2.606
Epoch 120: train=0.009 test=0.010 λ=2.602
Epoch 130: train=0.009 test=0.010 λ=2.615
Epoch 140: train=0.010 test=0.010 λ=2.602
Epoch 150: train=0.010 test=0.010 λ=2.603
Best test acc: 0.011
============================================================
Depth = 20 conv layers (4 stages × 5 blocks)
============================================================
Vanilla: depth=20, params=14,298,276
Epoch 10: train=0.010 test=0.011 σ=3.06e+00/4.22e-08
Epoch 20: train=0.010 test=0.010 σ=2.15e+00/2.95e-08
Epoch 30: train=0.010 test=0.010 σ=7.74e-01/2.37e-11
Epoch 40: train=0.009 test=0.010 σ=1.44e-01/0.00e+00
Epoch 50: train=0.009 test=0.010 σ=1.51e-02/0.00e+00
Epoch 60: train=0.025 test=0.010 σ=2.05e-01/1.31e-11
Epoch 70: train=0.032 test=0.010 σ=1.80e-01/1.69e-09
Epoch 80: train=0.040 test=0.010 σ=1.61e-01/1.82e-09
Epoch 90: train=0.043 test=0.010 σ=1.51e-01/2.04e-09
Epoch 100: train=0.046 test=0.011 σ=1.49e-01/2.28e-09
Epoch 110: train=0.050 test=0.011 σ=1.56e-01/2.59e-09
Epoch 120: train=0.049 test=0.012 σ=1.53e-01/2.89e-09
Epoch 130: train=0.053 test=0.010 σ=1.51e-01/3.14e-09
Epoch 140: train=0.055 test=0.010 σ=1.49e-01/3.28e-09
Epoch 150: train=0.053 test=0.010 σ=1.51e-01/3.29e-09
Best test acc: 0.012
Lyapunov: depth=20, params=14,298,276
Epoch 10: train=0.013 test=0.010 λ=2.968 σ=3.33e-01/5.32e-13
Epoch 20: train=0.011 test=0.010 λ=2.969 σ=5.00e-02/2.54e-43
Epoch 30: train=0.008 test=0.010 λ=2.719 σ=1.06e-02/0.00e+00
Epoch 40: train=0.009 test=0.010 λ=2.737
Epoch 50: train=0.009 test=0.010 λ=2.729
Epoch 60: train=0.009 test=0.010 λ=2.748
Epoch 70: train=0.009 test=0.010 λ=2.740
Epoch 80: train=0.008 test=0.010 λ=2.721
Epoch 90: train=0.009 test=0.010 λ=2.763
Epoch 100: train=0.008 test=0.010 λ=2.735
Epoch 110: train=0.009 test=0.010 λ=2.716
Epoch 120: train=0.009 test=0.010 λ=2.718
Epoch 130: train=0.009 test=0.010 λ=2.756
Epoch 140: train=0.009 test=0.010 λ=2.726
Epoch 150: train=0.010 test=0.010 λ=2.714
Best test acc: 0.016
====================================================================================================
DEPTH SCALING RESULTS: CIFAR100
====================================================================================================
Depth Vanilla Acc Lyapunov Acc Δ Acc Lyap λ Van ∇norm Lyap ∇norm Van κ
----------------------------------------------------------------------------------------------------
4 0.614 0.010 -0.604 1.921 4.57e-01 8.82e-02 1.2e+09
8 0.522 0.010 -0.512 2.262 3.86e-01 8.73e-02 1.4e+09
12 0.102 0.010 -0.092 2.462 6.74e-01 8.77e-02 2.5e+07
16 0.037 0.010 -0.027 2.603 5.04e-01 8.77e-02 2.4e+07
20 0.010 0.010 -0.000 2.714 2.96e-01 8.80e-02 6.5e+07
====================================================================================================
GRADIENT HEALTH ANALYSIS:
Depth 4: ⚠️ Vanilla has ill-conditioned gradients (κ > 1e6)
Depth 8: ⚠️ Vanilla has ill-conditioned gradients (κ > 1e6)
Depth 12: ⚠️ Vanilla has ill-conditioned gradients (κ > 1e6)
Depth 16: ⚠️ Vanilla has ill-conditioned gradients (κ > 1e6)
Depth 20: ⚠️ Vanilla has ill-conditioned gradients (κ > 1e6)
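
[Editor's sketch] The ⚠️ flags above report κ > 1e6, but the log does not show how κ is defined; the sketch below assumes a common proxy, the ratio of the largest to the smallest per-parameter-tensor gradient norm after a backward pass. gradient_condition_number and report_gradient_health are hypothetical helper names, not the benchmark's own functions.

import torch

def gradient_condition_number(model: torch.nn.Module, eps: float = 1e-12) -> float:
    """Ratio of the largest to the smallest per-tensor gradient norm (conditioning proxy)."""
    norms = [p.grad.norm().item() for p in model.parameters() if p.grad is not None]
    if not norms:
        return float("nan")
    return max(norms) / max(min(norms), eps)

def report_gradient_health(model, threshold=1e6):
    kappa = gradient_condition_number(model)
    flag = "⚠️ ill-conditioned gradients" if kappa > threshold else "✓ healthy gradients"
    print(f"κ = {kappa:.1e}  ({flag}, threshold {threshold:.0e})")

if __name__ == "__main__":
    # One backward pass on random CIFAR-sized data, then check conditioning.
    model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 100))
    logits = model(torch.randn(8, 3, 32, 32))
    torch.nn.functional.cross_entropy(logits, torch.randint(0, 100, (8,))).backward()
    report_gradient_health(model)
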
KEY OBSERVATIONS:
Vanilla 4→20 layers: -0.604 accuracy change
Lyapunov 4→20 layers: +0.000 accuracy change
✓ Lyapunov regularization enables better depth scaling!
Results saved to runs/depth_scaling/cifar100_20251230-213033
============================================================
Finished: Tue Dec 30 21:30:34 CST 2025
============================================================