- Auto-resume: find latest checkpoint in save_dir on startup
- SIGUSR1 handler: save checkpoint before SLURM timeout (auto-resume lookup and handler sketched after this message)
- S1 config (constant tau=5, identity init verification)
- S2 config (constant tau=2, gradient flow check)
- Experiment results tracker with S0/S1 data
- Speed estimates and experiment plan
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
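A minimal sketch of the auto-resume lookup and the SIGUSR1 flow above, assuming checkpoints are named step_<N>.pt under save_dir and that the handler only sets a flag the training loop polls; none of these names, nor the SBATCH signal timing, come from the repo.

```python
# Assumptions: checkpoints are written as step_<N>.pt in save_dir, and the
# SLURM job requests an early warning signal, e.g.  #SBATCH --signal=B:USR1@120
import re
import signal
from pathlib import Path

def find_latest_checkpoint(save_dir):
    """Return the highest-step checkpoint in save_dir, or None if there is none."""
    ckpts = list(Path(save_dir).glob("step_*.pt"))
    if not ckpts:
        return None
    return max(ckpts, key=lambda p: int(re.search(r"step_(\d+)", p.name).group(1)))

# The handler only sets a flag; the training loop checks it once per step and
# saves a checkpoint there, which avoids doing file I/O inside the handler itself.
should_checkpoint = False

def _on_sigusr1(signum, frame):
    global should_checkpoint
    should_checkpoint = True

signal.signal(signal.SIGUSR1, _on_sigusr1)
```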
- Add learnable logit_bias=15.0 to PredictorMLP, so σ(15/τ_init) ≈ 0.95
at init, reproducing dense connectivity instead of random A≈0.25 (sketched after this message)
- Fix dtype mismatch: cast A to model dtype (bfloat16) in DAGFormerOLMo.forward
- Fix YAML lr parsing: add type coercion in TrainConfig.from_yaml
- Fix device mismatch: call self.to(device) in StructurePredictor.__init__
- Add python -u for unbuffered SLURM output and set TOKENIZERS_PARALLELISM=false
- Delete stale eval_cache.pt (built with buggy MLP input code)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
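A sketch of the dense-at-init bias from the first bullet of this commit: a learnable scalar added to the edge logits so that σ(15/τ) ≈ 0.95 when τ_init = 5. The class name matches the commit message, but the layer sizes and the forward signature are illustrative placeholders.

```python
import torch
import torch.nn as nn

class PredictorMLP(nn.Module):
    """Edge-logit head; hidden width and edge count are placeholders."""

    def __init__(self, d_in, d_hidden, n_edges, logit_bias=15.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_in, d_hidden),
            nn.GELU(),
            nn.Linear(d_hidden, n_edges),
        )
        # Learnable scalar bias, initialised so the adjacency starts near-dense.
        self.logit_bias = nn.Parameter(torch.tensor(logit_bias))

    def forward(self, h, tau=5.0):
        logits = self.net(h) + self.logit_bias
        return torch.sigmoid(logits / tau)  # sigmoid(15 / 5) ≈ 0.95 per edge at init
```

Starting the adjacency near 1 presumably keeps the routed model close to the unmodified baseline at step 0, which is what the "identity init verification" run in the later commit appears to check. The YAML lr fix in the same commit is most likely the usual PyYAML quirk where lr: 1e-4 loads as a string, hence the type coercion in TrainConfig.from_yaml.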
- olmo_graph.py: Modified OLMo2-1B forward with per-head routing via 256x256 adjacency matrix A
- Proportional attribution for post-norm decomposition
- All 6 GPU sanity checks pass (baseline diff = 0.000001)
- predictor.py: Qwen3-Embedding encoder + MLP decoder + Gumbel-Sigmoid + cascading gate (Gumbel-Sigmoid sketched after this message)
- pipeline.py: End-to-end glue (predictor -> A -> OLMo -> NLL)
- trainer.py: Full training loop with DDP, gradient accumulation, eval, checkpointing
- dolma.py: Streaming Dolma v1.7 with sequence packing
- 43/43 unit tests pass
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
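A sketch of the Gumbel-Sigmoid relaxation named in the predictor.py bullet: logistic noise on the edge logits, a temperature-scaled sigmoid, and an optional straight-through hard sample. The function name, signature, and the default tau=5 (borrowed from the S1 config above) are assumptions, and the cascading gate is not shown.

```python
import torch

def gumbel_sigmoid(logits, tau=5.0, hard=False):
    """Differentiable relaxation of Bernoulli(sigmoid(logits)) edge samples."""
    u = torch.rand_like(logits).clamp_(1e-6, 1.0 - 1e-6)
    noise = torch.log(u) - torch.log1p(-u)        # logistic(0, 1) noise
    soft = torch.sigmoid((logits + noise) / tau)  # relaxed sample in (0, 1)
    if hard:
        # Straight-through: forward pass uses the 0/1 threshold, backward uses soft.
        hard_sample = (soft > 0.5).to(soft.dtype)
        return hard_sample + soft - soft.detach()
    return soft
```

Lower tau makes the relaxation sharper, which is presumably what the constant tau=5 (S1) vs tau=2 (S2) runs in the later commit compare.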