<feed xmlns='http://www.w3.org/2005/Atom'>
<title>dagformer.git/tests, branch main</title>
<subtitle>Unnamed repository; edit this file 'description' to name the repository.
</subtitle>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/dagformer.git/'/>
<entry>
<title>Fix cascading gate: exempt layer 0 from disconnection check</title>
<updated>2026-02-09T20:40:31+00:00</updated>
<author>
<name>YurenHao0426</name>
<email>blackhao0426@gmail.com</email>
</author>
<published>2026-02-09T20:40:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/dagformer.git/commit/?id=80579d6cc254d337a23e71404ae7ecab1849d1e5'/>
<id>80579d6cc254d337a23e71404ae7ecab1849d1e5</id>
<content type='text'>
Layer 0 has no incoming edges structurally (no prior layers), but
receives the embedding as input. The cascading gate was killing its
outgoing edges (hard: g=0, soft: g=0.5), causing nll_hard to be
~2x worse than baseline. Fix: set g=1 for layer 0 nodes.
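
A minimal sketch of the fix (hard-gate variant). The function name, adjacency layout, and layer indexing below are illustrative assumptions, not the repository's actual API:

```python
import numpy as np

def cascading_gate(A, layer_of_node):
    """Illustrative cascading gate, hard variant.

    A[i, j] is the (assumed) edge weight from node j into node i.
    A node with no active incoming edges is treated as disconnected,
    so its outgoing edges are gated off (g = 0). Layer-0 nodes have
    no prior layers to receive edges from, but they still get the
    embedding as input, so they must be exempted with g = 1.
    """
    incoming = A.sum(axis=1)                  # incoming edge mass per node
    g = (incoming > 0).astype(A.dtype)        # hard gate: 0 if disconnected
    g[layer_of_node == 0] = 1.0               # the fix: never gate off layer 0
    return g
```

Without the last line, every layer-0 node reads as disconnected and its outgoing edges are killed, which is the nll_hard regression described above.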

Co-Authored-By: Claude Opus 4.6 &lt;noreply@anthropic.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Layer 0 has no incoming edges structurally (no prior layers), but
receives the embedding as input. The cascading gate was killing its
outgoing edges (hard: g=0, soft: g=0.5), causing nll_hard to be
~2x worse than baseline. Fix: set g=1 for layer 0 nodes.

Co-Authored-By: Claude Opus 4.6 &lt;noreply@anthropic.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Fix NLL double-shift bug and head weight init</title>
<updated>2026-02-09T18:28:55+00:00</updated>
<author>
<name>YurenHao0426</name>
<email>blackhao0426@gmail.com</email>
</author>
<published>2026-02-09T18:28:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/dagformer.git/commit/?id=ef678d2e1ba70b1a9dadb78c73ed372f986aea13'/>
<id>ef678d2e1ba70b1a9dadb78c73ed372f986aea13</id>
<content type='text'>
- NLL loss was shifting labels twice (olmo_labels already shifted,
  then code did logits[:,:-1] vs labels[:,1:]). Fixed in 9 locations:
  trainer, pipeline, olmo_graph, sanity_check, eval.
- Head U/V weights init with std=0.01 (was Kaiming ~5.7 std) so
  UV^T≈0 at init, ensuring Z≈logit_bias=15 and A≈0.953.
- Updated SVD rank test to subtract logit_bias before checking.
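
A sketch of the first bullet's corrected loss, assuming (per the commit) that labels arrive already shifted by one position, i.e. labels[t] is the target for logits[t]; names and shapes are illustrative:

```python
import numpy as np

def nll_pre_shifted(logits, labels, ignore_index=-100):
    """NLL for labels that are ALREADY shifted one position.

    The bug was applying a second shift (logits[:, :-1] vs labels[:, 1:])
    on top of pre-shifted labels, so each prediction was scored against
    the token two steps ahead instead of the next token.
    """
    mask = (labels != ignore_index)
    z = logits - logits.max(axis=-1, keepdims=True)   # stable log-softmax
    logp = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    safe = np.maximum(labels, 0)                       # avoid indexing with -100
    picked = np.take_along_axis(logp, safe[..., None], axis=-1)[..., 0]
    return -(picked * mask).sum() / mask.sum()         # mean over unmasked tokens
```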

Co-Authored-By: Claude Opus 4.6 &lt;noreply@anthropic.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
- NLL loss was shifting labels twice (olmo_labels already shifted,
  then code did logits[:,:-1] vs labels[:,1:]). Fixed in 9 locations:
  trainer, pipeline, olmo_graph, sanity_check, eval.
- Head U/V weights init with std=0.01 (was Kaiming ~5.7 std) so
  UV^T≈0 at init, ensuring Z≈logit_bias=15 and A≈0.953.
- Updated SVD rank test to subtract logit_bias before checking.

Co-Authored-By: Claude Opus 4.6 &lt;noreply@anthropic.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Initial implementation: DAGFormer Phase 1</title>
<updated>2026-02-09T17:00:39+00:00</updated>
<author>
<name>YurenHao0426</name>
<email>blackhao0426@gmail.com</email>
</author>
<published>2026-02-09T17:00:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/dagformer.git/commit/?id=13ddc8dc583d8b1355909970cb8c27f85b7d3c8b'/>
<id>13ddc8dc583d8b1355909970cb8c27f85b7d3c8b</id>
<content type='text'>
- olmo_graph.py: Modified OLMo2-1B forward with per-head routing via 256x256 adjacency matrix A
  - Proportional attribution for post-norm decomposition
  - All 6 GPU sanity checks pass (baseline diff = 0.000001)
- predictor.py: Qwen3-Embedding encoder + MLP decoder + Gumbel-Sigmoid + cascading gate
- pipeline.py: End-to-end glue (predictor -&gt; A -&gt; OLMo -&gt; NLL)
- trainer.py: Full training loop with DDP, gradient accumulation, eval, checkpointing
- dolma.py: Streaming Dolma v1.7 with sequence packing
- 43/43 unit tests pass
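
The pipeline glue above can be sketched at a very high level. Everything below (the 256-node count from the commit, the Gumbel-Sigmoid formulation, the stub components) is an assumption for illustration, not the repository's implementation:

```python
import numpy as np

N_NODES = 256  # assumed: one node per (layer, head), giving a 256x256 adjacency

def gumbel_sigmoid(logits, rng, tau=1.0):
    """One common Bernoulli relaxation: sigmoid of logits plus logistic noise."""
    u = rng.uniform(1e-6, 1.0 - 1e-6, size=logits.shape)
    noise = np.log(u) - np.log1p(-u)                   # logistic noise sample
    return 1.0 / (1.0 + np.exp(-(logits + noise) / tau))

def pipeline_step(prompt_emb, edge_head, run_olmo, labels, rng):
    """End-to-end glue: predictor output, then A, then OLMo forward, then NLL."""
    edge_logits = prompt_emb @ edge_head               # MLP decoder stand-in
    A = gumbel_sigmoid(edge_logits, rng).reshape(N_NODES, N_NODES)
    logits, nll = run_olmo(A, labels)                  # routed forward + loss stub
    return A, nll
```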

Co-Authored-By: Claude Opus 4.6 &lt;noreply@anthropic.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
- olmo_graph.py: Modified OLMo2-1B forward with per-head routing via 256x256 adjacency matrix A
  - Proportional attribution for post-norm decomposition
  - All 6 GPU sanity checks pass (baseline diff = 0.000001)
- predictor.py: Qwen3-Embedding encoder + MLP decoder + Gumbel-Sigmoid + cascading gate
- pipeline.py: End-to-end glue (predictor -&gt; A -&gt; OLMo -&gt; NLL)
- trainer.py: Full training loop with DDP, gradient accumulation, eval, checkpointing
- dolma.py: Streaming Dolma v1.7 with sequence packing
- 43/43 unit tests pass

Co-Authored-By: Claude Opus 4.6 &lt;noreply@anthropic.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
