# Backprop-free analog training of a transformer — collaboration brief
**One-page ask for hardware-side collaborators · 2026-06-21 · Yuren Hao (UIUC)**

## The idea in three sentences
We train a **transformer block as a physical equilibrium (fixed-point) system** using **Equilibrium Propagation
(EP)** — no backpropagation. The forward pass is a damped relaxation `z ← z + ε·F(z)` that **settles** to a fixed
point (on analog hardware, the settling *is* the physics — nearly free); the weight update is **local**, computed
from the contrast between a free settle and a slightly-nudged settle. This is exactly the computation an analog
in-memory / memristive array is good at. Unlike the shipping analog-AI chips we know of (all inference-only), it needs
**in-situ weight update**, which is the open opportunity.
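
To make the settle / nudge / local-update loop concrete, here is a minimal sketch on a toy model, not the equilibrium transformer itself. Everything in it is an illustrative assumption: the quadratic energy `E(z) = 0.5*|z|^2 - z.(W x)`, the squared-error loss, the nudge strength `beta`, and the helper names `settle`, `ep_gradient`, `true_gradient`. For this toy the EP estimate equals the true gradient scaled by `1/(1+beta)`, so it is only a sanity check of the mechanism.

```python
import math

def matvec(W, x):
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]

def settle(W, x, y=None, beta=0.0, eps=0.1, steps=500):
    """Damped relaxation z <- z + eps * F(z), with F = -dE/dz for the toy
    energy E(z) = 0.5*|z|^2 - z.(W x). If y is given, the energy is nudged
    by beta * 0.5*|z - y|^2 (hypothetical loss, chosen for simplicity)."""
    drive = matvec(W, x)
    z = [0.0] * len(drive)
    for _ in range(steps):
        for i in range(len(z)):
            force = drive[i] - z[i]              # free dynamics
            if y is not None:
                force -= beta * (z[i] - y[i])    # weak pull toward the target
            z[i] += eps * force
    return z

def ep_gradient(W, x, y, beta=0.01):
    """EP estimate: (1/beta) * (dE/dW at nudged point - dE/dW at free point).
    Here dE/dW_ij = -z_i * x_j, so each weight only needs its own z_i and x_j."""
    z_free = settle(W, x)
    z_nudged = settle(W, x, y, beta)
    return [[-(zn - zf) * xj / beta for xj in x]
            for zn, zf in zip(z_nudged, z_free)]

def true_gradient(W, x, y):
    """Analytic gradient of 0.5*|z* - y|^2 with z* = W x (toy model only)."""
    z_star = matvec(W, x)
    return [[(zi - yi) * xj for xj in x] for zi, yi in zip(z_star, y)]

def cosine(A, B):
    a = [v for row in A for v in row]
    b = [v for row in B for v in row]
    dot = sum(p * q for p, q in zip(a, b))
    return dot / (math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b)))

W = [[0.5, -0.2], [0.1, 0.3]]
x = [1.0, 2.0]
y = [0.0, 1.0]
print(cosine(ep_gradient(W, x, y), true_gradient(W, x, y)))
```

On hardware, the two calls to `settle` would be physical relaxations and the contrast in `ep_gradient` would be formed locally at each weight; the software loop here only stands in for that.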

## Why now / why it's real (not speculative)
- **Algorithm side (ours, in simulation):** EP's gradient matches true backprop (cosine ≈ 0.99–1.0 per component);
  the equilibrium transformer trains stably and **matches/beats a same-parameter BP transformer** on language modeling.
  Currently scaling the recipe; a fix for the one known instability (a residual-defense term) is under validation.
- **Hardware precedent exists:** local contrastive/EP learning has been physically demonstrated (self-learning analog
  resistor networks, ~1 µs settling, on-chip weight update from a local free-vs-clamped difference; EP on a D-Wave
  Ising machine). **But nobody has built an EP-trained *transformer* in analog hardware — that is the first-mover demo.**
- **Endurance clears the bar:** HfOx-class RRAM survives ~10^10 write cycles; a training run needs ≤10^8 device writes
  (fewer with digital-accumulate-then-threshold-program). Endurance is not the blocker — update linearity/symmetry is
  the real device challenge.
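
The endurance argument and the digital-accumulate-then-threshold-program idea can be sketched as follows. This is an illustrative sketch, not our control logic: the function name `accumulate_then_program`, the unit conductance `step`, and the integer device `level` are assumptions, and the headroom line treats both figures quoted above as per-device counts.

```python
# Figures quoted in the bullet above; treated here as per-device counts (assumption).
ENDURANCE_CYCLES = 1e10   # ~write cycles an HfOx-class RRAM cell survives
WRITES_PER_RUN = 1e8      # upper bound on device writes in one training run
headroom = ENDURANCE_CYCLES / WRITES_PER_RUN   # ratio of the two quoted figures

def accumulate_then_program(updates, step):
    """Stream small weight updates into a digital accumulator and pulse the
    device only when the accumulated magnitude reaches one conductance step.
    Returns (device_level_in_steps, residual_in_accumulator, n_device_writes)."""
    level = 0     # device state, counted in programmable steps
    acc = 0.0     # digital accumulator (costs no device endurance)
    writes = 0
    for u in updates:
        acc += u
        while abs(acc) >= step:
            direction = 1 if acc > 0 else -1
            level += direction          # one physical program pulse
            acc -= direction * step
            writes += 1
    return level, acc, writes

# Ten small same-sign updates cost two device writes instead of ten;
# updates that cancel in the accumulator cost none.
print(accumulate_then_program([0.25] * 10, 1.0))
print(accumulate_then_program([0.6, -0.6] * 5, 1.0))
```

The design point is that noisy or cancelling updates are absorbed digitally, so the device sees far fewer writes than the number of gradient steps.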

## What a hardware demo needs (four layers), and the UIUC ECE fit
| Layer | What it does | Closest collaborator |
|---|---|---|
| **Trainable device** | in-situ-updatable analog weights (RRAM/FeFET/ECRAM) — *the part you cannot buy* | **Wenjuan Zhu** (UIUC ECE, memristor/RRAM/FeFET/2D devices) |
| **In-memory MVM circuit** | analog matrix-vector multiply + on-chip weight write-back | **Naresh Shanbhag** (UIUC ECE) — his JSSC-2018 DIMA chip *already* does analog MVM **+ on-chip SGD weight write-back** in 65nm; nearest existing substrate |
| **Mixed-signal glue / control loop** | ADC/DAC to read settled states + apply the nudge; switched-cap integrators = relaxation primitives | **Pavan Hanumolu** (UIUC ECE, data converters / PLL / switched-cap) |
| **EP control + sim** | the settle→nudge→settle→local-Δθ loop, noise/endurance de-risk in simulation | **us** (FPGA + the trained model + analog-noise sim already built) |

**Escalation / device frontier:** **H.-S. Philip Wong (黄汉森, Stanford EE / TSMC Chief Scientist)** — NeuRRAM (Nature
2022) is the most EP-relevant analog-MVM substrate (inference-only today); the RRAM-device heavyweight + a TSMC-foundry
path, reachable via a Stanford student contact.

## The concrete ask (staged, modular — stitch existing capabilities, no startup-scale custom fab)
- **Phase 1:** put ONE equilibrium-transformer block on an existing in-situ-trainable substrate (Shanbhag's DIMA-class
  chip + Hanumolu converter/integrator glue; Zhu devices) + our FPGA EP-control loop → prove end-to-end analog EP training.
- **Phase 2:** scale weights (foundry RRAM MPW — e.g. SkyWater S130 + Weebit ReRAM IP — or a fixed-weight inference array
  for the forward path with the trainable layer in-situ).
- **What we bring:** the validated algorithm, the trained model + scaling data, the EP control logic, and a simulator
  that already models analog non-idealities (device noise / quantization / asymmetric update) to de-risk before tape-out.
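
As a rough picture of what modeling those non-idealities means, the sketch below pushes a requested weight change through a toy device model with asymmetric update, write noise, clipping, and quantization. It is not our simulator: the function `nonideal_update` and every parameter (`levels`, `up_gain`, `down_gain`, `write_noise`, the weight range) are hypothetical placeholders.

```python
import random

def nonideal_update(w, delta, *, w_min=-1.0, w_max=1.0, levels=32,
                    up_gain=1.0, down_gain=0.7, write_noise=0.0, rng=random):
    """Apply a requested change `delta` to weight `w` through a toy device model.
    All parameters are illustrative, not measured device values."""
    gain = up_gain if delta > 0 else down_gain           # asymmetric potentiation/depression
    noise = rng.gauss(0.0, write_noise) if write_noise > 0 else 0.0
    target = w + gain * delta + noise                    # noisy analog write
    target = min(w_max, max(w_min, target))              # conductance range limit
    step = (w_max - w_min) / (levels - 1)
    return w_min + round((target - w_min) / step) * step # snap to nearest level

# Equal-and-opposite requests do not cancel on an asymmetric device:
w = 0.0
for _ in range(3):
    w = nonideal_update(w, +0.1, levels=201)
    w = nonideal_update(w, -0.1, levels=201)
print(w)  # drifts upward because depression is weaker than potentiation here
```

Running a training loop through a model of this kind before tape-out shows how much asymmetry or quantization the learning rule tolerates.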

**Bottom line:** the core algorithm is validated in simulation and the Phase 1 hardware pieces exist in-house at
UIUC ECE, so this is a stitching + first-demo opportunity, not a multi-year custom-silicon program.

*(Backing detail + citations: HW_RESEARCH_FINDINGS.md; method: ept_method_intro.pdf)*