=== full + component toggles (ms/step, B=24, C512) ===
/home/yurenh2/miniconda3/lib/python3.13/site-packages/torch/autograd/graph.py:865: UserWarning: Attempting to run cuBLAS, but there was no current CUDA context! Attempting to set the primary context... (Triggered internally at /pytorch/aten/src/ATen/cuda/CublasHandlePool.cpp:330.)
  return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
FULL ep_step: 7266
-jacreg: 7242
-resreg: 7312
-t1max(no refine): 5886
t2sel=80: 7384
t2sel=40: 4485
plain nudge holo=0 T2=20: 3179
free relax T1=150 alone: 740
free relax T1=300 alone: 1480
=== batch sweep (full) ===
B=8: 2353 ms (294.1/sample)
B=24: 7405 ms (308.5/sample)
B=48: 14496 ms (302.0/sample)
=== compile free relax ===
free relax T1=150 COMPILED: 507
=== bf16 full ===
full bf16: ERR RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
DONE