author Yuren Hao <yurenh2@illinois.edu> 2026-07-03 05:56:50 -0500
committer Yuren Hao <yurenh2@illinois.edu> 2026-07-03 05:56:50 -0500
commit b83947778e2c776f757a07d4719b7ce961d7ed55 (patch)
tree b9cc01d7adda691d9156d9d04f4fb2f644674e96 /docs/hardware/COLLABORATOR_BRIEF.md
Initial commit: ept — backprop-free equilibrium transformer (EP)
Code (ep_run/), organized docs (docs/{method,campaign,hardware,outreach,paper}), analysis scripts (scripts/), ONBOARDING.md entry point. Large data/checkpoints git-ignored (share separately).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014FAPDWQ49M5Ye3NpTndTpn
Diffstat (limited to 'docs/hardware/COLLABORATOR_BRIEF.md')
-rw-r--r-- docs/hardware/COLLABORATOR_BRIEF.md 46
1 files changed, 46 insertions, 0 deletions
diff --git a/docs/hardware/COLLABORATOR_BRIEF.md b/docs/hardware/COLLABORATOR_BRIEF.md
new file mode 100644
index 0000000..3728657
--- /dev/null
+++ b/docs/hardware/COLLABORATOR_BRIEF.md
@@ -0,0 +1,46 @@
+# Backprop-free analog training of a transformer — collaboration brief
+**One-page ask for hardware-side collaborators · 2026-06-21 · Yuren Hao (UIUC)**
+
+## The idea in three sentences
+We train a **transformer block as a physical equilibrium (fixed-point) system** using **Equilibrium Propagation
+(EP)** — no backpropagation. The forward pass is a damped relaxation `z ← z + ε·F(z)` that **settles** to a fixed
+point (on analog hardware, the settling *is* the physics — nearly free); the weight update is **local**, computed
+from the contrast between a free settle and a slightly-nudged settle. This is exactly the computation an analog
+in-memory / memristive array is good at — and unlike every shipping analog-AI chip (all inference-only), it needs
+**in-situ weight update**, which is the open opportunity.
+
+## Why now / why it's real (not speculative)
+- **Algorithm side (ours, in simulation):** EP's gradient matches true backprop (cosine ≈ 0.99–1.0 per component);
+ the equilibrium transformer trains stably and **matches or beats a same-parameter BP transformer** on language modeling.
+ We are currently scaling the recipe; a fix for the one known instability (a residual-defense term) is under validation.
+- **Hardware precedent exists:** local contrastive/EP learning has been physically demonstrated (self-learning analog
+ resistor networks, ~1 µs settling, on-chip weight update from a local free-vs-clamped difference; EP on a D-Wave
+ Ising machine). **But nobody has built an EP-trained *transformer* in analog hardware — that is the first-mover demo.**
+- **Endurance clears the bar:** HfOx-class RRAM survives ~10^10 write cycles; a training run needs ≤10^8 device writes
+ (fewer with digital-accumulate-then-threshold-program). Endurance is not the blocker — update linearity/symmetry is
+ the real device challenge.
+
+## What a hardware demo needs (three hardware layers plus EP control) and the UIUC ECE fit
+| Layer | What it does | Closest collaborator |
+|---|---|---|
+| **Trainable device** | in-situ-updatable analog weights (RRAM/FeFET/ECRAM) — *the part you cannot buy* | **Wenjuan Zhu** (UIUC ECE, memristor/RRAM/FeFET/2D devices) |
+| **In-memory MVM circuit** | analog matrix-vector multiply + on-chip weight write-back | **Naresh Shanbhag** (UIUC ECE) — his JSSC-2018 DIMA chip *already* does analog MVM **+ on-chip SGD weight write-back** in 65nm; nearest existing substrate |
+| **Mixed-signal glue / control loop** | ADC/DAC to read settled states + apply the nudge; switched-cap integrators = relaxation primitives | **Pavan Hanumolu** (UIUC ECE, data converters / PLL / switched-cap) |
+| **EP control + sim** | the settle→nudge→settle→local-Δθ loop, noise/endurance de-risk in simulation | **us** (FPGA + the trained model + analog-noise sim already built) |
+
+**Escalation / device frontier:** **H.-S. Philip Wong (黄汉森, Stanford EE / TSMC Chief Scientist)**: NeuRRAM (Nature
+2022) is the most EP-relevant analog-MVM substrate (inference-only today). Wong is an RRAM-device heavyweight and offers
+a TSMC-foundry path; he is reachable via a Stanford student contact.
+
+## The concrete ask (staged, modular — stitch existing capabilities, no startup-scale custom fab)
+- **Phase 1:** put ONE equilibrium-transformer block on an existing in-situ-trainable substrate (Shanbhag's DIMA-class
+ chip + Hanumolu converter/integrator glue; Zhu devices) + our FPGA EP-control loop → prove end-to-end analog EP training.
+- **Phase 2:** scale weights (foundry RRAM MPW — e.g. SkyWater S130 + Weebit ReRAM IP — or a fixed-weight inference array
+ for the forward path with the trainable layer in-situ).
+- **What we bring:** the validated algorithm, the trained model + scaling data, the EP control logic, and a simulator
+ that already models analog non-idealities (device noise / quantization / asymmetric update) to de-risk before tape-out.
+
+**Bottom line:** the algorithm is validated in simulation and the hardware pieces all exist in-house at UIUC ECE, so this
+is a stitching + first-demo opportunity, not a multi-year custom-silicon program.
+
+*(Backing detail + citations: HW_RESEARCH_FINDINGS.md; method: ept_method_intro.pdf)*
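
The settle, nudge, settle, local-update loop described under "The idea in three sentences" can be illustrated on a toy model. The sketch below is not the code in ep_run/ and not the equilibrium transformer; it assumes a hypothetical linear energy `E(z) = 0.5*|z|^2 - z^T W x` and a squared-error cost `C(z) = 0.5*|z - y|^2`, and every name in it (`settle`, `ep_gradient`, `reference_gradient`, `W0`, `X`, `Y`) is invented for illustration. It shows the damped relaxation `z <- z + eps*F(z)` and a weight update built only from the difference between the free and the nudged fixed points.

```python
import math

def settle(W, x, y=None, beta=0.0, eps=0.1, steps=500):
    """Damped relaxation z <- z + eps * F(z), with F = -d(E + beta*C)/dz.

    Toy energy (an assumption): E = 0.5*|z|^2 - z^T W x, C = 0.5*|z - y|^2.
    """
    drive = [sum(w * xj for w, xj in zip(row, x)) for row in W]  # (W x)_i
    z = [0.0] * len(W)
    for _ in range(steps):
        for i in range(len(z)):
            force = drive[i] - z[i]              # -dE/dz_i
            if y is not None:
                force -= beta * (z[i] - y[i])    # -beta * dC/dz_i (the nudge)
            z[i] += eps * force
    return z

def ep_gradient(W, x, y, beta=0.01):
    """EP estimate: (dE/dW at nudged state - dE/dW at free state) / beta."""
    z_free = settle(W, x)
    z_nudged = settle(W, x, y, beta)
    # dE/dW_ij = -z_i * x_j, so entry (i, j) needs only z_i and x_j (local).
    return [[-(zn - zf) * xj / beta for xj in x]
            for zf, zn in zip(z_free, z_nudged)]

def reference_gradient(W, x, y):
    """Analytic gradient of 0.5*|W x - y|^2 with respect to W, for comparison."""
    z = [sum(w * xj for w, xj in zip(row, x)) for row in W]
    return [[(zi - yi) * xj for xj in x] for zi, yi in zip(z, y)]

def cosine(A, B):
    """Cosine similarity between two matrices, treated as flat vectors."""
    a = [v for row in A for v in row]
    b = [v for row in B for v in row]
    dot = sum(p * q for p, q in zip(a, b))
    return dot / (math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b)))

def loss(W, x, y):
    """Cost evaluated at the free (un-nudged) fixed point."""
    z = settle(W, x)
    return 0.5 * sum((zi - yi) ** 2 for zi, yi in zip(z, y))

def train(W, x, y, lr=0.5, epochs=20):
    """Repeat settle -> nudge -> settle -> local weight update on a copy of W."""
    W = [row[:] for row in W]
    for _ in range(epochs):
        grad = ep_gradient(W, x, y)
        for i, row in enumerate(grad):
            for j, g in enumerate(row):
                W[i][j] -= lr * g
    return W

# Hypothetical toy data: 2 state units, 3 inputs.
W0 = [[0.1, 0.2, -0.1], [0.0, -0.3, 0.2]]
X = [1.0, -0.5, 0.25]
Y = [0.5, -1.0]

if __name__ == "__main__":
    print("cosine(EP, reference):", cosine(ep_gradient(W0, X, Y), reference_gradient(W0, X, Y)))
    print("loss before:", loss(W0, X, Y))
    print("loss after: ", loss(train(W0, X, Y), X, Y))
```

In this linear toy the EP estimate is parallel to the analytic gradient by construction, so it says nothing about the cosine figures reported for the transformer; it only makes the structure of the loop concrete. The point relevant to hardware is that each weight's update depends on one state difference and one input, which is the quantity an in-situ write-back circuit would have to produce.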