dagformer.git

author	YurenHao0426 <blackhao0426@gmail.com>	2026-02-09 11:23:15 -0600
committer	YurenHao0426 <blackhao0426@gmail.com>	2026-02-09 11:23:15 -0600
commit	93d77b197d457b1fdfa7341ecd59fc460b20d6b1 (patch)
tree	0becc0a9c122ddd80a2f88431546a59b3915e0e3 /src/model/__init__.py
parent	13ddc8dc583d8b1355909970cb8c27f85b7d3c8b (diff)

Fix init state: add logit_bias so A≈1 at init (dense connectivity)

- Add learnable logit_bias=15.0 to PredictorMLP, so σ(15/τ_init) ≈ 0.95 at init,
  reproducing dense connectivity instead of random A≈0.25
- Fix dtype mismatch: cast A to model dtype (bfloat16) in DAGFormerOLMo.forward
- Fix YAML lr parsing: add type coercion in TrainConfig.from_yaml
- Fix device mismatch: call self.to(device) in StructurePredictor.__init__
- Add python -u for unbuffered SLURM output, TOKENIZERS_PARALLELISM=false
- Delete stale eval_cache.pt (built with buggy MLP input code)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
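
The σ(15/τ_init) ≈ 0.95 figure in the first bullet can be checked with a few lines of plain Python. This is only a sketch. The commit message does not state τ_init, so `TAU_INIT = 5.0` is an assumption chosen because it reproduces the quoted value. The helper `init_gate` and the form σ((logit + bias)/τ) are likewise hypothetical stand-ins and are not taken from the PredictorMLP code.

```python
import math

LOGIT_BIAS = 15.0  # value quoted in the commit message
TAU_INIT = 5.0     # ASSUMPTION: not stated in the commit; chosen to match ≈ 0.95


def sigmoid(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def init_gate(raw_logit: float = 0.0,
              bias: float = LOGIT_BIAS,
              tau: float = TAU_INIT) -> float:
    """Hypothetical gate value: sigmoid of the biased logit divided by temperature.

    With a raw logit near zero at initialisation, this reduces to
    sigmoid(bias / tau), the expression quoted in the commit message.
    """
    return sigmoid((raw_logit + bias) / tau)


if __name__ == "__main__":
    print(round(init_gate(), 4))  # 0.9526
```

Under that assumed temperature the gate starts at about 0.95, which matches the dense-connectivity initialisation the commit describes. A different τ_init would shift the value, so the number should be read against the actual training config.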

Diffstat (limited to 'src/model/__init__.py')

0 files changed, 0 insertions, 0 deletions

