============================================================
EXTREME-ONLY PENALTY Experiment (lambda > 2.0)
Job ID: 15112874 | Node: gpub032
Start: Thu Jan 1 12:26:50 CST 2026
============================================================
NVIDIA A40, 46068 MiB
============================================================
================================================================================
DEPTH SCALING BENCHMARK
================================================================================
Dataset: cifar100
Depths: [4, 8, 12, 16]
Timesteps: 4
Epochs: 150
λ_reg: 0.3, λ_target: -0.1
Reg type: extreme, Warmup epochs: 10
Device: cuda
================================================================================
Loading cifar100...
Classes: 100, Input: (3, 32, 32)
Train: 50000, Test: 10000
Depth configurations: [(4, '4×1'), (8, '4×2'), (12, '4×3'), (16, '4×4')]
Regularization type: extreme
Warmup epochs: 10
Stable init: False
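For context, one plausible form of the "extreme-only" penalty implied by the job header (lambda > 2.0) and the settings above is sketched here. The training code is not part of this log, so the function and argument names below are illustrative assumptions rather than the project's API; whether the warmup ramps the weight or simply delays the penalty is likewise assumed.

import torch

def extreme_only_penalty(lyap_estimates: torch.Tensor,
                         lam_target: float = -0.1,
                         threshold: float = 2.0,
                         lam_reg: float = 0.3,
                         epoch: int = 0,
                         warmup_epochs: int = 10) -> torch.Tensor:
    # Hypothetical sketch: penalize only layers whose estimated Lyapunov
    # exponent exceeds `threshold`, pulling them toward `lam_target`,
    # with the penalty weight ramped linearly over the warmup epochs.
    warmup_scale = min(1.0, epoch / max(1, warmup_epochs))
    mask = (lyap_estimates > threshold).float()           # "extreme" layers only
    excess = (lyap_estimates - lam_target) * mask         # distance to target where extreme
    return lam_reg * warmup_scale * excess.pow(2).mean()

# Usage (illustrative): loss = cross_entropy + extreme_only_penalty(per_layer_lyap, epoch=e)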
============================================================
Depth = 4 conv layers (4 stages × 1 block)
============================================================
Vanilla: depth=4, params=1,756,836
Epoch 10: train=0.498 test=0.436 σ=9.55e-01/3.56e-08
Epoch 20: train=0.629 test=0.527 σ=5.80e-01/2.40e-08
Epoch 30: train=0.701 test=0.546 σ=4.83e-01/2.01e-08
Epoch 40: train=0.756 test=0.566 σ=4.24e-01/1.76e-08
Epoch 50: train=0.799 test=0.566 σ=3.67e-01/1.52e-08
Epoch 60: train=0.832 test=0.580 σ=3.40e-01/1.41e-08
Epoch 70: train=0.858 test=0.563 σ=3.15e-01/1.28e-08
Epoch 80: train=0.883 test=0.584 σ=2.98e-01/1.18e-08
Epoch 90: train=0.906 test=0.595 σ=2.79e-01/1.07e-08
Epoch 100: train=0.920 test=0.597 σ=2.63e-01/1.02e-08
Epoch 110: train=0.932 test=0.612 σ=2.50e-01/9.41e-09
Epoch 120: train=0.941 test=0.610 σ=2.47e-01/8.94e-09
Epoch 130: train=0.947 test=0.614 σ=2.45e-01/9.05e-09
Epoch 140: train=0.949 test=0.614 σ=2.41e-01/8.80e-09
Epoch 150: train=0.953 test=0.615 σ=2.35e-01/8.52e-09
Best test acc: 0.619
Lyapunov: depth=4, params=1,756,836
Epoch 10: train=0.415 test=0.120 λ=1.974 σ=8.95e-01/3.40e-08
Epoch 20: train=0.551 test=0.414 λ=1.943 σ=5.70e-01/2.44e-08
Epoch 30: train=0.631 test=0.479 λ=1.922 σ=4.64e-01/2.05e-08
Epoch 40: train=0.692 test=0.394 λ=1.908 σ=4.18e-01/1.83e-08
Epoch 50: train=0.739 test=0.418 λ=1.909 σ=3.71e-01/1.60e-08
Epoch 60: train=0.780 test=0.446 λ=1.917 σ=3.56e-01/1.52e-08
Epoch 70: train=0.815 test=0.458 λ=1.914 σ=3.28e-01/1.36e-08
Epoch 80: train=0.845 test=0.480 λ=1.923 σ=3.10e-01/1.32e-08
Epoch 90: train=0.868 test=0.486 λ=1.919 σ=2.93e-01/1.20e-08
Epoch 100: train=0.887 test=0.480 λ=1.923 σ=2.79e-01/1.15e-08
Epoch 110: train=0.902 test=0.489 λ=1.929 σ=2.75e-01/1.08e-08
Epoch 120: train=0.913 test=0.467 λ=1.926 σ=2.66e-01/1.05e-08
Epoch 130: train=0.920 test=0.479 λ=1.931 σ=2.59e-01/1.03e-08
Epoch 140: train=0.924 test=0.483 λ=1.928 σ=2.51e-01/9.81e-09
Epoch 150: train=0.925 test=0.475 λ=1.937 σ=2.47e-01/9.86e-09
Best test acc: 0.508
============================================================
Depth = 8 conv layers (4 stages × 2 blocks)
============================================================
Vanilla: depth=8, params=4,892,196
Epoch 10: train=0.390 test=0.350 σ=8.10e-01/3.04e-08
Epoch 20: train=0.546 test=0.435 σ=4.82e-01/2.15e-08
Epoch 30: train=0.632 test=0.473 σ=3.79e-01/1.78e-08
Epoch 40: train=0.697 test=0.513 σ=3.29e-01/1.55e-08
Epoch 50: train=0.752 test=0.512 σ=3.12e-01/1.42e-08
Epoch 60: train=0.795 test=0.520 σ=2.97e-01/1.31e-08
Epoch 70: train=0.836 test=0.526 σ=2.73e-01/1.18e-08
Epoch 80: train=0.869 test=0.533 σ=2.55e-01/1.10e-08
Epoch 90: train=0.897 test=0.525 σ=2.44e-01/9.77e-09
Epoch 100: train=0.916 test=0.530 σ=2.35e-01/9.36e-09
Epoch 110: train=0.933 test=0.539 σ=2.27e-01/8.60e-09
Epoch 120: train=0.943 test=0.537 σ=2.18e-01/8.20e-09
Epoch 130: train=0.952 test=0.541 σ=2.05e-01/7.82e-09
Epoch 140: train=0.956 test=0.538 σ=2.12e-01/7.90e-09
Epoch 150: train=0.956 test=0.534 σ=1.94e-01/7.48e-09
Best test acc: 0.547
Lyapunov: depth=8, params=4,892,196
Epoch 10: train=0.078 test=0.016 λ=1.777 σ=5.17e-01/1.78e-08
Epoch 20: train=0.121 test=0.016 λ=1.693 σ=2.63e-01/1.25e-08
Epoch 30: train=0.143 test=0.020 λ=1.696 σ=2.23e-01/1.14e-08
Epoch 40: train=0.147 test=0.013 λ=1.657 σ=1.86e-01/1.06e-08
Epoch 50: train=0.129 test=0.012 λ=1.659 σ=1.68e-01/8.87e-09
Epoch 60: train=0.137 test=0.011 λ=1.625 σ=1.54e-01/8.80e-09
Epoch 70: train=0.082 test=0.009 λ=1.589 σ=1.32e-01/6.54e-09
Epoch 80: train=0.127 test=0.011 λ=1.590 σ=1.42e-01/7.63e-09
Epoch 90: train=0.142 test=0.009 λ=1.609 σ=1.45e-01/8.25e-09
Epoch 100: train=0.147 test=0.012 λ=1.590 σ=1.41e-01/8.09e-09
Epoch 110: train=0.152 test=0.010 λ=1.598 σ=1.43e-01/8.06e-09
Epoch 120: train=0.156 test=0.010 λ=1.592 σ=1.40e-01/8.22e-09
Epoch 130: train=0.162 test=0.010 λ=1.589 σ=1.43e-01/8.35e-09
Epoch 140: train=0.163 test=0.010 λ=1.584 σ=1.40e-01/8.47e-09
Epoch 150: train=0.163 test=0.010 λ=1.583 σ=1.38e-01/8.27e-09
Best test acc: 0.025
============================================================
Depth = 12 conv layers (4 stages × 3 blocks)
============================================================
Vanilla: depth=12, params=8,027,556
Epoch 10: train=0.214 test=0.048 σ=6.28e-01/2.26e-08
Epoch 20: train=0.293 test=0.067 σ=3.29e-01/1.56e-08
Epoch 30: train=0.342 test=0.086 σ=2.69e-01/1.36e-08
Epoch 40: train=0.383 test=0.097 σ=2.47e-01/1.30e-08
Epoch 50: train=0.420 test=0.080 σ=2.44e-01/1.30e-08
Epoch 60: train=0.451 test=0.116 σ=2.32e-01/1.26e-08
Epoch 70: train=0.484 test=0.102 σ=2.35e-01/1.23e-08
Epoch 80: train=0.518 test=0.114 σ=2.31e-01/1.26e-08
Epoch 90: train=0.547 test=0.118 σ=2.30e-01/1.23e-08
Epoch 100: train=0.576 test=0.118 σ=2.30e-01/1.23e-08
Epoch 110: train=0.598 test=0.114 σ=2.37e-01/1.23e-08
Epoch 120: train=0.619 test=0.110 σ=2.33e-01/1.22e-08
Epoch 130: train=0.629 test=0.121 σ=2.33e-01/1.23e-08
Epoch 140: train=0.637 test=0.116 σ=2.31e-01/1.20e-08
Epoch 150: train=0.638 test=0.116 σ=2.34e-01/1.22e-08
Best test acc: 0.136
Lyapunov: depth=12, params=8,027,556
Epoch 10: train=0.031 test=0.012 λ=1.794 σ=4.39e-01/1.15e-08
Epoch 20: train=0.028 test=0.009 λ=1.711 σ=1.99e-01/4.47e-09
Epoch 30: train=0.028 test=0.010 λ=1.684 σ=1.36e-01/2.73e-09
Epoch 40: train=0.023 test=0.006 λ=1.653 σ=1.19e-01/4.52e-12
Epoch 50: train=0.037 test=0.010 λ=1.668 σ=1.13e-01/2.71e-09
Epoch 60: train=0.029 test=0.010 λ=1.646 σ=1.11e-01/8.32e-12
Epoch 70: train=0.021 test=0.010 λ=1.727 σ=1.30e-01/5.03e-13
Epoch 80: train=0.024 test=0.010 λ=1.749 σ=1.01e-01/9.40e-13
Epoch 90: train=0.022 test=0.010 λ=1.665 σ=8.78e-02/9.05e-13
Epoch 100: train=0.022 test=0.010 λ=1.676 σ=7.62e-02/9.14e-13
Epoch 110: train=0.025 test=0.010 λ=1.660 σ=8.45e-02/1.40e-12
Epoch 120: train=0.024 test=0.010 λ=1.627 σ=8.26e-02/1.30e-12
Epoch 130: train=0.024 test=0.010 λ=1.663 σ=8.21e-02/7.96e-13
Epoch 140: train=0.028 test=0.010 λ=1.644 σ=9.22e-02/3.67e-12
Epoch 150: train=0.029 test=0.010 λ=1.647 σ=9.05e-02/2.90e-12
Best test acc: 0.014
============================================================
Depth = 16 conv layers (4 stages × 4 blocks)
============================================================
Vanilla: depth=16, params=11,162,916
Epoch 10: train=0.091 test=0.011 σ=4.40e-01/1.32e-08
Epoch 20: train=0.135 test=0.014 σ=2.84e-01/1.06e-08
Epoch 30: train=0.157 test=0.017 σ=2.21e-01/9.39e-09
Epoch 40: train=0.174 test=0.021 σ=2.00e-01/9.09e-09
Epoch 50: train=0.190 test=0.021 σ=1.78e-01/8.83e-09
Epoch 60: train=0.201 test=0.023 σ=1.72e-01/8.80e-09
Epoch 70: train=0.214 test=0.026 σ=1.62e-01/8.89e-09
Epoch 80: train=0.228 test=0.025 σ=1.63e-01/8.94e-09
Epoch 90: train=0.238 test=0.027 σ=1.58e-01/9.07e-09
Epoch 100: train=0.249 test=0.025 σ=1.61e-01/9.11e-09
Epoch 110: train=0.256 test=0.029 σ=1.59e-01/9.10e-09
Epoch 120: train=0.261 test=0.027 σ=1.63e-01/9.11e-09
Epoch 130: train=0.270 test=0.027 σ=1.60e-01/9.22e-09
Epoch 140: train=0.270 test=0.027 σ=1.63e-01/9.32e-09
Epoch 150: train=0.272 test=0.027 σ=1.65e-01/9.27e-09
Best test acc: 0.033
Lyapunov: depth=16, params=11,162,916
Epoch 10: train=0.019 test=0.010 λ=1.891 σ=4.08e-01/7.96e-09
Epoch 20: train=0.018 test=0.010 λ=1.853 σ=1.49e-01/4.73e-11
Epoch 30: train=0.016 test=0.010 λ=2.038 σ=1.09e-01/1.08e-12
Epoch 40: train=0.016 test=0.007 λ=1.845 σ=9.66e-02/4.94e-14
Epoch 50: train=0.012 test=0.010 λ=1.807 σ=1.11e-01/3.35e-27
Epoch 60: train=0.013 test=0.009 λ=1.801 σ=1.01e-01/2.59e-28
Epoch 70: train=0.013 test=0.010 λ=2.064 σ=1.36e-01/9.48e-16
Epoch 80: train=0.020 test=0.010 λ=2.055 σ=1.11e-01/7.37e-14
Epoch 90: train=0.017 test=0.010 λ=1.959 σ=1.20e-01/1.56e-13
Epoch 100: train=0.022 test=0.010 λ=1.887 σ=1.01e-01/4.19e-13
Epoch 110: train=0.019 test=0.010 λ=1.881 σ=9.46e-02/4.77e-13
Epoch 120: train=0.018 test=0.010 λ=1.889 σ=8.10e-02/9.50e-14
Epoch 130: train=0.014 test=0.010 λ=1.892 σ=7.23e-02/1.42e-14
Epoch 140: train=0.015 test=0.010 λ=1.898 σ=7.02e-02/1.63e-14
Epoch 150: train=0.015 test=0.010 λ=1.899 σ=7.15e-02/1.18e-14
Best test acc: 0.012
====================================================================================================
DEPTH SCALING RESULTS: CIFAR100
====================================================================================================
Depth Vanilla Acc Lyapunov Acc Δ Acc Lyap λ Van ∇norm Lyap ∇norm Van κ
----------------------------------------------------------------------------------------------------
4 0.615 0.475 -0.140 1.937 4.58e-01 5.38e-01 5.0e+08
8 0.534 0.010 -0.524 1.583 3.88e-01 4.57e-01 3.6e+08
12 0.116 0.010 -0.106 1.647 6.51e-01 2.14e-01 5.8e+08
16 0.027 0.010 -0.017 1.899 5.07e-01 1.38e-01 3.8e+07
====================================================================================================
GRADIENT HEALTH ANALYSIS:
Depth 4: ⚠️ Vanilla has ill-conditioned gradients (κ > 1e6)
Depth 8: ⚠️ Vanilla has ill-conditioned gradients (κ > 1e6)
Depth 12: ⚠️ Vanilla has ill-conditioned gradients (κ > 1e6)
Depth 16: ⚠️ Vanilla has ill-conditioned gradients (κ > 1e6)
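The κ column and the "κ > 1e6" warnings are not defined anywhere in this log; one plausible proxy, assumed purely for illustration, is the ratio of the largest to the smallest per-parameter gradient norm:

import torch

def grad_condition_number(model: torch.nn.Module, eps: float = 1e-30) -> float:
    # Hypothetical proxy for the reported κ (the log does not define it):
    # ratio of the largest to the smallest per-parameter gradient norm.
    norms = [p.grad.norm().item() for p in model.parameters() if p.grad is not None]
    if not norms:
        return float("nan")
    return max(norms) / max(min(norms), eps)

# Flagging rule matching the warnings above:
# if grad_condition_number(model) > 1e6: print("ill-conditioned gradients")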
KEY OBSERVATIONS:
Vanilla 4→16 layers: -0.588 accuracy change
Lyapunov 4→16 layers: -0.465 accuracy change
Note: the smaller 4→16 drop for Lyapunov is largely a floor effect; Lyapunov accuracy is already at chance level (≈0.01 on 100 classes) from depth 8 onward, so this run does not by itself demonstrate better depth scaling.
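The 4→16 figures follow directly from the summary table; a quick check:

# Accuracy change from depth 4 to depth 16, using the table above.
vanilla_4, vanilla_16 = 0.615, 0.027
lyap_4, lyap_16 = 0.475, 0.010
print(f"Vanilla  4->16: {vanilla_16 - vanilla_4:+.3f}")   # -0.588
print(f"Lyapunov 4->16: {lyap_16 - lyap_4:+.3f}")         # -0.465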
Results saved to runs/depth_scaling_extreme/cifar100_20260102-133536
============================================================
Finished: Fri Jan 2 13:35:39 CST 2026
============================================================