|
WARNING: All methods, including BP, show near-zero BP hidden gradients (~1e-14 to 1e-12)
when computed via a manual forward pass with detached hidden states. This is inconsistent
with the earlier first-priority analysis, which reported 2.86e-04 for BP. Investigation needed.
T1: 40 rows (4 methods × 10 seeds) - full metrics
T2: 800 rows (support sparsity, 5 thresholds × 4 methods × 10 seeds × 4 layers)
T3: 48 rows (gradient norm distributions, 3 seeds × 4 methods × 4 layers)
T4: 100 rows (active-subset Gamma, 5 thresholds × 2 methods × 10 seeds)
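
The row counts above follow from multiplying the listed factors. A minimal sanity check, assuming only the numbers stated in this message (the dictionary layout and the helper name `mismatched_tables` are illustrative and not taken from the analysis code):

```python
from math import prod

# Stated row count and factor sizes for each table, copied from the summary above.
# The structure and names here are hypothetical, chosen only for this check.
TABLES = {
    "T1": (40, [4, 10]),         # methods x seeds
    "T2": (800, [5, 4, 10, 4]),  # thresholds x methods x seeds x layers
    "T3": (48, [3, 4, 4]),       # seeds x methods x layers
    "T4": (100, [5, 2, 10]),     # thresholds x methods x seeds
}

def mismatched_tables(tables):
    """Return the names of tables whose stated row count differs from the product of their factors."""
    return [name for name, (rows, factors) in tables.items() if prod(factors) != rows]

print(mismatched_tables(TABLES))  # an empty list means every count is consistent
```
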
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|