Device: cuda:0, seed=456, epochs=30

=== BP frozen-blocks baseline (4 random-init transformer blocks, frozen), seed=456 ===
BP-frozen-blocks: 16266/809354 params trainable
  BP-frozen ep 1: test_acc=0.3755
  BP-frozen ep 5: test_acc=0.4748
  BP-frozen ep 10: test_acc=0.5053
  BP-frozen ep 15: test_acc=0.5210
  BP-frozen ep 20: test_acc=0.5304
  BP-frozen ep 25: test_acc=0.5443
  BP-frozen ep 30: test_acc=0.5410
FINAL BP-frozen-blocks acc: 0.5410

=== DFA frozen-blocks baseline, seed=456 ===
DFA-frozen-blocks: 16266/809354 params trainable
  DFA-frozen ep 1: test_acc=0.2538
  DFA-frozen ep 5: test_acc=0.2617
  DFA-frozen ep 10: test_acc=0.2537
  DFA-frozen ep 15: test_acc=0.2571
  DFA-frozen ep 20: test_acc=0.2540
  DFA-frozen ep 25: test_acc=0.2540
  DFA-frozen ep 30: test_acc=0.2540
FINAL DFA-frozen-blocks acc: 0.2540

=== Summary ===
BP-frozen-blocks: 0.5410  (chance=0.10)
DFA-frozen-blocks: 0.2540
Compare to ViT-Mini 4-block trainable (3-seed avg): BP=0.792, DFA=0.237
Compare to ViT-Mini 0-block (shallow baseline): BP=0.10, DFA=0.10

Interpretation:
  If DFA-frozen-blocks ≈ 0.237: blocks are passengers; DFA is only learning patch_embed+head
  If DFA-frozen-blocks << 0.237: trainable blocks ARE doing learned work
  If DFA-frozen-blocks ≈ 0.10 (chance): untrained blocks add no useful mixing (less informative)
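
The three cases above can be written as an explicit decision rule. This is a minimal sketch, not part of the original script. The function name `interpret_dfa_frozen` and the tolerance `tol` are hypothetical. The log does not say how close counts as "≈", so the 0.02 default is an assumption to adjust. The reference values 0.237 and 0.10 come from the Summary lines.

```python
def interpret_dfa_frozen(acc_frozen, acc_trainable=0.237, chance=0.10, tol=0.02):
    """Map a DFA frozen-blocks test accuracy onto the three listed cases.

    acc_trainable: DFA accuracy with trainable blocks (3-seed avg from the log).
    chance:        chance-level accuracy from the log.
    tol:           hypothetical tolerance for "approximately equal" (assumption).
    """
    # Case 3: indistinguishable from chance.
    if abs(acc_frozen - chance) <= tol:
        return "untrained blocks add no useful mixing"
    # Case 1: matches the trainable-blocks DFA result.
    if abs(acc_frozen - acc_trainable) <= tol:
        return "blocks are passengers"
    # Case 2: clearly below the trainable-blocks DFA result.
    if acc_frozen < acc_trainable - tol:
        return "trainable blocks are doing learned work"
    # Anything else (e.g. well above 0.237) is not covered by the three cases.
    return "outside the three listed cases"


if __name__ == "__main__":
    print(interpret_dfa_frozen(0.2540))
```

Under the assumed `tol=0.02`, the logged 0.2540 lands in the "passengers" case. A tighter tolerance would put it outside the three cases, and the frozen result is from a single seed while 0.237 is a 3-seed average, so the classification should be read with that in mind.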