Per-token context-conditioned DAG · Head-level 256×256 upper-triangular adjacency · Cascading activation gate
⚡ Topology predicted before each forward pass · Fully differentiable via continuous relaxation
Cascading gate enforces: no incoming edges → no outgoing edges · Differentiable via soft sigmoid gate
Future: Phase 1 data → train diffusion decoder to capture multi-modal optimal topologies
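
The relaxed adjacency and cascading gate summarized above can be illustrated with a minimal pure-Python sketch. It is not the actual implementation: the function names (`relaxed_adjacency`, `cascading_gate`), the `sources`, `sharpness`, and `threshold` parameters, and the choice of a thresholded sum of incoming edge weights are all assumptions made for illustration, and a 4-node graph stands in for the 256×256 head-level one. The sketch assumes edges run from lower to higher index, so the strictly upper-triangular mask alone guarantees acyclicity.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def relaxed_adjacency(logits):
    """Map edge logits to soft weights in (0, 1) for i < j, and 0 elsewhere.

    Keeping only entries above the diagonal means every edge goes from a
    lower index to a higher one, so the graph cannot contain a cycle.
    """
    n = len(logits)
    return [
        [sigmoid(logits[i][j]) if i < j else 0.0 for j in range(n)]
        for i in range(n)
    ]


def cascading_gate(adj, sources, sharpness=10.0, threshold=0.5):
    """Soft version of 'no incoming edges -> no outgoing edges'.

    Nodes are visited in index order, which is a topological order for an
    upper-triangular adjacency. `sources` (hypothetical) marks nodes that
    are treated as always active, e.g. the input.
    """
    n = len(adj)
    act = [0.0] * n
    for j in range(n):
        if j in sources:
            act[j] = 1.0
            continue
        # Incoming mass counts only edges whose source node is itself active.
        incoming = sum(act[i] * adj[i][j] for i in range(j))
        # Soft sigmoid gate: near 1 with enough incoming mass, near 0 without.
        act[j] = sigmoid(sharpness * (incoming - threshold))
    # Each outgoing edge is scaled by the activation of its source node.
    gated = [[act[i] * adj[i][j] for j in range(n)] for i in range(n)]
    return act, gated


# Toy example: edges 0->1 and 2->3 are "on", everything else is "off".
OFF, ON = -8.0, 8.0
logits = [
    [0.0, ON, OFF, OFF],
    [0.0, 0.0, OFF, OFF],
    [0.0, 0.0, 0.0, ON],
    [0.0, 0.0, 0.0, 0.0],
]
adj = relaxed_adjacency(logits)
act, gated = cascading_gate(adj, sources={0})
```

In this toy graph node 2 receives almost no incoming mass, so its activation stays near zero and its raw edge 2→3, although strongly predicted, is suppressed after gating. Because every step is a sum, product, or sigmoid, the same computation stays differentiable when written in an autodiff framework.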