<feed xmlns='http://www.w3.org/2005/Atom'>
<title>dagformer.git/configs, branch main</title>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/dagformer.git/'/>
<entry>
<title>A12-A14 init_logit ablation: confirm frozen OLMo cannot benefit from sparse topology</title>
<updated>2026-02-11T20:21:11+00:00</updated>
<author>
<name>YurenHao0426</name>
<email>blackhao0426@gmail.com</email>
</author>
<published>2026-02-11T20:21:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/dagformer.git/commit/?id=c69a4c6e3596f75bd392c27d3c072adc825ce497'/>
<id>c69a4c6e3596f75bd392c27d3c072adc825ce497</id>
<content type='text'>
- A12 (logit=3): NLL 2.76, A13 (logit=0): NLL 3.51, A14 (logit=1): NLL 3.26
- All worse than baseline (NLL 2.46). Lower init_logit = gates start further from A=1 = worse NLL
- Confirms: gradient flows (gates move), but A=1 is global optimum for frozen model
- Added Dolma streaming retry logic (max 10 retries, 30s wait)
- Phase 1 frozen approach has fundamental limitation; Phase 2 (unfreeze) needed
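
Sketch of the Dolma retry bullet, for illustration only. The limits (10 retries,
30s wait) come from this commit; `with_retries` and `open_stream` are hypothetical
names, and treating "max 10 retries" as one first attempt plus up to 10 retries is
an assumption. The actual code in dolma.py may differ.

```python
import time

MAX_RETRIES = 10     # from the commit message
WAIT_SECONDS = 30    # from the commit message


def with_retries(open_stream, max_retries=MAX_RETRIES, wait=WAIT_SECONDS, sleep=time.sleep):
    """Call open_stream(); on an I/O failure wait, then retry up to max_retries times.

    open_stream is a hypothetical zero-argument callable standing in for
    whatever opens the Dolma stream; it is not a name from the repository.
    """
    last_error = None
    for attempt in range(max_retries + 1):   # first try plus max_retries retries
        try:
            return open_stream()
        except OSError as err:               # also covers ConnectionError and TimeoutError
            last_error = err
            if attempt == max_retries:
                break                        # out of retries, do not sleep again
            sleep(wait)
    raise RuntimeError(f"stream failed after {max_retries} retries") from last_error
```

Passing `sleep` as a parameter keeps the sketch testable without real 30s waits.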

Co-Authored-By: Claude Opus 4.6 &lt;noreply@anthropic.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
- A12 (logit=3): NLL 2.76, A13 (logit=0): NLL 3.51, A14 (logit=1): NLL 3.26
- All worse than baseline (NLL 2.46). Lower init_logit = gates start further from A=1 = worse NLL
- Confirms: gradient flows (gates move), but A=1 is global optimum for frozen model
- Added Dolma streaming retry logic (max 10 retries, 30s wait)
- Phase 1 frozen approach has fundamental limitation; Phase 2 (unfreeze) needed

Co-Authored-By: Claude Opus 4.6 &lt;noreply@anthropic.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Add auto-resume checkpointing, S1/S2 configs, and experiment results</title>
<updated>2026-02-10T15:50:33+00:00</updated>
<author>
<name>YurenHao0426</name>
<email>blackhao0426@gmail.com</email>
</author>
<published>2026-02-10T15:50:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/dagformer.git/commit/?id=039c12d3cf7178db6a7d80b02cf022d67231014e'/>
<id>039c12d3cf7178db6a7d80b02cf022d67231014e</id>
<content type='text'>
- Auto-resume: find latest checkpoint in save_dir on startup
- SIGUSR1 handler: save checkpoint before SLURM timeout
- S1 config (constant tau=5, identity init verification)
- S2 config (constant tau=2, gradient flow check)
- Experiment results tracker with S0/S1 data
- Speed estimates and experiment plan
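
Sketch of the auto-resume lookup, for illustration only. The file naming scheme
(`checkpoint_step1200.pt`) and the name `find_latest_checkpoint` are assumptions,
not taken from the repository; only the idea of scanning save_dir on startup comes
from this commit.

```python
import re
from pathlib import Path

# Hypothetical naming scheme, e.g. "checkpoint_step1200.pt"; the repository may use another.
STEP_PATTERN = re.compile(r"checkpoint_step(\d+)\.pt")


def find_latest_checkpoint(save_dir):
    """Return the checkpoint path with the highest step number in save_dir, or None."""
    root = Path(save_dir)
    if not root.is_dir():
        return None                          # nothing to resume from
    candidates = []
    for path in root.iterdir():
        match = STEP_PATTERN.fullmatch(path.name)
        if match:
            candidates.append((int(match.group(1)), path))
    if not candidates:
        return None
    # Compare by numeric step so that step 100 beats step 20.
    return max(candidates, key=lambda item: item[0])[1]
```

A trainer could call this once at startup and fall back to a fresh run when it
returns None.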

Co-Authored-By: Claude Opus 4.6 &lt;noreply@anthropic.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
- Auto-resume: find latest checkpoint in save_dir on startup
- SIGUSR1 handler: save checkpoint before SLURM timeout
- S1 config (constant tau=5, identity init verification)
- S2 config (constant tau=2, gradient flow check)
- Experiment results tracker with S0/S1 data
- Speed estimates and experiment plan

Co-Authored-By: Claude Opus 4.6 &lt;noreply@anthropic.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Initial implementation: DAGFormer Phase 1</title>
<updated>2026-02-09T17:00:39+00:00</updated>
<author>
<name>YurenHao0426</name>
<email>blackhao0426@gmail.com</email>
</author>
<published>2026-02-09T17:00:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/dagformer.git/commit/?id=13ddc8dc583d8b1355909970cb8c27f85b7d3c8b'/>
<id>13ddc8dc583d8b1355909970cb8c27f85b7d3c8b</id>
<content type='text'>
- olmo_graph.py: Modified OLMo2-1B forward with per-head routing via 256x256 adjacency matrix A
  - Proportional attribution for post-norm decomposition
  - All 6 GPU sanity checks pass (baseline diff = 0.000001)
- predictor.py: Qwen3-Embedding encoder + MLP decoder + Gumbel-Sigmoid + cascading gate
- pipeline.py: End-to-end glue (predictor -&gt; A -&gt; OLMo -&gt; NLL)
- trainer.py: Full training loop with DDP, gradient accumulation, eval, checkpointing
- dolma.py: Streaming Dolma v1.7 with sequence packing
- 43/43 unit tests pass
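
Sketch of a scalar Gumbel-Sigmoid sample, for illustration only. This is the
textbook relaxation (logistic noise added to a logit, divided by a temperature
tau, passed through a sigmoid); the function names are hypothetical and
predictor.py may implement it differently, for example on tensors.

```python
import math
import random


def sigmoid(x):
    # Equivalent to 1 / (1 + exp(-x)), written with tanh so large |x| cannot overflow.
    return 0.5 * (1.0 + math.tanh(0.5 * x))


def gumbel_sigmoid(logit, tau=1.0, rng=random):
    """Relaxed Bernoulli sample: sigmoid((logit + logistic noise) / tau).

    rng is any object with a random() method returning a float in [0, 1).
    """
    u = rng.random()
    u = min(max(u, 1e-9), 1.0 - 1e-9)        # keep both logs finite
    # The difference of two Gumbel(0, 1) samples is Logistic(0, 1) distributed.
    noise = math.log(u) - math.log(1.0 - u)
    return sigmoid((logit + noise) / tau)
```

With the noise removed, the gate value is just sigmoid(logit / tau); a smaller tau
pushes samples toward 0 or 1.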

Co-Authored-By: Claude Opus 4.6 &lt;noreply@anthropic.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
- olmo_graph.py: Modified OLMo2-1B forward with per-head routing via 256x256 adjacency matrix A
  - Proportional attribution for post-norm decomposition
  - All 6 GPU sanity checks pass (baseline diff = 0.000001)
- predictor.py: Qwen3-Embedding encoder + MLP decoder + Gumbel-Sigmoid + cascading gate
- pipeline.py: End-to-end glue (predictor -&gt; A -&gt; OLMo -&gt; NLL)
- trainer.py: Full training loop with DDP, gradient accumulation, eval, checkpointing
- dolma.py: Streaming Dolma v1.7 with sequence packing
- 43/43 unit tests pass

Co-Authored-By: Claude Opus 4.6 &lt;noreply@anthropic.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
