Device: cuda:0, seed=123, epochs=30
=== BP frozen-blocks baseline (4 random-init transformer blocks, frozen), seed=123 ===
BP-frozen-blocks: 16266/809354 params trainable
BP-frozen ep 1: test_acc=0.3805
BP-frozen ep 5: test_acc=0.4832
BP-frozen ep 10: test_acc=0.5225
BP-frozen ep 15: test_acc=0.5236
BP-frozen ep 20: test_acc=0.5381
BP-frozen ep 25: test_acc=0.5519
BP-frozen ep 30: test_acc=0.5521
FINAL BP-frozen-blocks acc: 0.5521
=== DFA frozen-blocks baseline, seed=123 ===
DFA-frozen-blocks: 16266/809354 params trainable
DFA-frozen ep 1: test_acc=0.2587
DFA-frozen ep 5: test_acc=0.2585
DFA-frozen ep 10: test_acc=0.2597
DFA-frozen ep 15: test_acc=0.2508
DFA-frozen ep 20: test_acc=0.2578
DFA-frozen ep 25: test_acc=0.2553
DFA-frozen ep 30: test_acc=0.2605
FINAL DFA-frozen-blocks acc: 0.2605
=== Summary ===
BP-frozen-blocks: 0.5521 (chance=0.10)
DFA-frozen-blocks: 0.2605
Compare to ViT-Mini 4-block trainable (3-seed avg): BP=0.792, DFA=0.237
Compare to ViT-Mini 0-block (shallow baseline): BP=0.10, DFA=0.10
Interpretation:
If DFA-frozen-blocks ≈ 0.237: blocks are passengers, DFA is just learning patch_embed+head
If DFA-frozen-blocks << 0.237: trainable blocks ARE doing learned work
If DFA-frozen-blocks ~ 0.10: untrained blocks add no useful mixing (less informative)
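
The three cases above can be written as a small decision rule. This is only a sketch, not part of the script that produced this log: the function name `interpret` and the tolerance `tol=0.03` are assumptions made for illustration (the log does not say how close "≈" has to be), and a single-seed result should be read with that in mind.

```python
def interpret(dfa_frozen, dfa_trainable=0.237, chance=0.10, tol=0.03):
    """Map a DFA frozen-blocks accuracy onto the three cases listed in the log.

    dfa_trainable and chance are the reference values printed in the summary.
    tol is a hypothetical tolerance chosen for illustration, not from the log.
    """
    if abs(dfa_frozen - chance) <= tol:
        return "near chance: untrained blocks add no useful mixing"
    if abs(dfa_frozen - dfa_trainable) <= tol:
        return "blocks are passengers: DFA is learning patch_embed+head only"
    if dfa_frozen < dfa_trainable - tol:
        return "trainable blocks are doing learned work"
    return "outside the three listed cases"


if __name__ == "__main__":
    # Value reported above for seed=123.
    print(interpret(0.2605))
```

Under this assumed tolerance, the seed=123 value of 0.2605 lands in the first case (within 0.03 of 0.237). A tighter tolerance would put it outside all three cases, since it is slightly above the trainable-block average.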