Diffstat (limited to 'results/vit_frozen_blocks_s123.log')
| -rw-r--r-- | results/vit_frozen_blocks_s123.log | 34 |
1 files changed, 34 insertions, 0 deletions
diff --git a/results/vit_frozen_blocks_s123.log b/results/vit_frozen_blocks_s123.log
new file mode 100644
index 0000000..6a4f6c3
--- /dev/null
+++ b/results/vit_frozen_blocks_s123.log
@@ -0,0 +1,34 @@
+Device: cuda:0, seed=123, epochs=30
+
+=== BP frozen-blocks baseline (4 random-init transformer blocks, frozen), seed=123 ===
+BP-frozen-blocks: 16266/809354 params trainable
+  BP-frozen ep 1: test_acc=0.3805
+  BP-frozen ep 5: test_acc=0.4832
+  BP-frozen ep 10: test_acc=0.5225
+  BP-frozen ep 15: test_acc=0.5236
+  BP-frozen ep 20: test_acc=0.5381
+  BP-frozen ep 25: test_acc=0.5519
+  BP-frozen ep 30: test_acc=0.5521
+FINAL BP-frozen-blocks acc: 0.5521
+
+=== DFA frozen-blocks baseline, seed=123 ===
+DFA-frozen-blocks: 16266/809354 params trainable
+  DFA-frozen ep 1: test_acc=0.2587
+  DFA-frozen ep 5: test_acc=0.2585
+  DFA-frozen ep 10: test_acc=0.2597
+  DFA-frozen ep 15: test_acc=0.2508
+  DFA-frozen ep 20: test_acc=0.2578
+  DFA-frozen ep 25: test_acc=0.2553
+  DFA-frozen ep 30: test_acc=0.2605
+FINAL DFA-frozen-blocks acc: 0.2605
+
+=== Summary ===
+BP-frozen-blocks: 0.5521 (chance=0.10)
+DFA-frozen-blocks: 0.2605
+Compare to ViT-Mini 4-block trainable (3-seed avg): BP=0.792, DFA=0.237
+Compare to ViT-Mini 0-block (shallow baseline): BP=0.10, DFA=0.10
+
+Interpretation:
+  If DFA-frozen-blocks ≈ 0.237: blocks are passengers, DFA is just learning patch_embed+head
+  If DFA-frozen-blocks << 0.237: trainable blocks ARE doing learned work
+  If DFA-frozen-blocks ~ 0.10: untrained blocks add no useful mixing (less informative)
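The three-way "Interpretation" rule at the end of the log could be applied mechanically to the `FINAL ... acc:` lines. The following is a minimal sketch, not part of the commit: the function names `final_accuracies` and `interpret_dfa_frozen` are hypothetical, and the tolerance `tol=0.03` used to decide "≈" is an assumption, since the log does not say how close counts as close. The reference values 0.237 and 0.10 are taken from the log's summary.

```python
import re

# Two lines copied from the log above, used as sample input.
LOG_EXCERPT = """\
FINAL BP-frozen-blocks acc: 0.5521
FINAL DFA-frozen-blocks acc: 0.2605
"""


def final_accuracies(log_text):
    """Return {run_name: accuracy} from 'FINAL <name> acc: <value>' lines.

    A leading '+' (diff marker) is tolerated so the diff text can be fed in directly.
    """
    pattern = re.compile(r"^\+?\s*FINAL (\S+) acc: ([0-9.]+)\s*$", re.MULTILINE)
    return {name: float(acc) for name, acc in pattern.findall(log_text)}


def interpret_dfa_frozen(acc, trainable_ref=0.237, chance=0.10, tol=0.03):
    """Apply the log's interpretation rules; `tol` is an assumed tolerance."""
    if abs(acc - chance) <= tol:
        return "untrained blocks add no useful mixing"
    if abs(acc - trainable_ref) <= tol:
        return "blocks are passengers"
    if acc < trainable_ref - tol:
        return "trainable blocks are doing learned work"
    # The log's rules do not cover accuracies well above the trainable reference.
    return "above trainable reference (not covered by the log's rules)"


if __name__ == "__main__":
    accs = final_accuracies(LOG_EXCERPT)
    print(interpret_dfa_frozen(accs["DFA-frozen-blocks"]))
```

Under the assumed tolerance, the logged DFA-frozen value of 0.2605 falls in the first case of the log's rule (close to 0.237); a tighter tolerance would leave it outside all three cases, so the threshold choice matters.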
