Diffstat (limited to 'paper/main.tex')
| -rw-r--r-- | paper/main.tex | 4 |
1 files changed, 2 insertions, 2 deletions
diff --git a/paper/main.tex b/paper/main.tex
index e5e5e5a..184d054 100644
--- a/paper/main.tex
+++ b/paper/main.tex
@@ -309,9 +309,9 @@ Max Jaderberg, Wojciech~M. Czarnecki, Simon Osindero, Oriol Vinyals, Alex
 \section{Reference Implementation}
 \label{app:reference_impl}
 
-We will release a reference implementation at \url{https://github.com/REPO-URL-TO-BE-INSERTED}. The release is intended to make the evaluation protocol easy to run and difficult to misreport: it contains one command path for training or loading checkpoints, one command path for computing the four diagnostics, and one command path for rendering the audit tables and figures used in the paper. The reference code should be treated as part of the evaluation artifact rather than as an auxiliary convenience, because several of the failure cases in this paper arise from seemingly minor choices in how gradients, layers, and baselines are measured.
+\paragraph{Release scope.} We will release a reference implementation at \url{https://github.com/REPO-URL-TO-BE-INSERTED}. The release is intended to make the evaluation protocol easy to run and difficult to misreport: it contains one command path for training or loading checkpoints, one command path for computing the four diagnostics, and one command path for rendering the audit tables and figures used in the paper. The reference code should be treated as part of the evaluation artifact rather than as an auxiliary convenience, because several of the failure cases in this paper arise from seemingly minor choices in how gradients, layers, and baselines are measured.
 
-The repository is organized around the claims in the paper rather than around model classes. A minimal run should expose: (i) architecture-matched trainable-block and random-block baselines, (ii) per-layer residual-scale and BP-gradient measurements at fixed checkpoints, (iii) deep-layer cosine computations with the exact batch and masking conventions used by the audit, and (iv) summary scripts that emit the tables underlying \autoref{tab:main_audit}, \autoref{tab:mode_validation}, and \autoref{tab:protocol_def}. The goal is that an outside reader can reproduce both the verdict and the reason for the verdict from a single checkpoint bundle without reverse-engineering hidden notebook logic.
+\paragraph{Repository organization.} The repository is organized around the claims in the paper rather than around model classes. A minimal run should expose: (i) architecture-matched trainable-block and random-block baselines, (ii) per-layer residual-scale and BP-gradient measurements at fixed checkpoints, (iii) deep-layer cosine computations with the exact batch and masking conventions used by the audit, and (iv) summary scripts that emit the tables underlying \autoref{tab:main_audit}, \autoref{tab:mode_validation}, and \autoref{tab:protocol_def}. The goal is that an outside reader can reproduce both the verdict and the reason for the verdict from a single checkpoint bundle without reverse-engineering hidden notebook logic.
 
 \section{Pipeline Pitfalls Catalog}
 \label{app:pipeline_pitfalls}
