From cd99d6b874d9d09b3bb87b8485cc787885af71f1 Mon Sep 17 00:00:00 2001
From: YurenHao0426
Date: Tue, 13 Jan 2026 23:49:05 -0600
Subject: init commit

---
 runs/slurm_logs/14632859_speedup.out | 55 ++++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)
 create mode 100644 runs/slurm_logs/14632859_speedup.out

diff --git a/runs/slurm_logs/14632859_speedup.out b/runs/slurm_logs/14632859_speedup.out
new file mode 100644
index 0000000..8589010
--- /dev/null
+++ b/runs/slurm_logs/14632859_speedup.out
@@ -0,0 +1,55 @@
+============================================================
+Lyapunov Speedup Benchmark
+Job ID: 14632859 | Node: gpub073
+Start: Tue Dec 30 06:43:26 CST 2025
+============================================================
+NVIDIA A40, 46068 MiB
+============================================================
+================================================================================
+LYAPUNOV COMPUTATION SPEEDUP BENCHMARK
+================================================================================
+Batch size: 64
+Timesteps: 4
+Hidden dims: [64, 128, 256]
+Device: cuda
+================================================================================
+
+[1/6] Benchmarking Baseline...
+[2/6] Benchmarking Approach A (batched)...
+[3/6] Benchmarking Approach B (global renorm)...
+[4/6] Benchmarking Approach A+B (combined)...
+[5/6] Benchmarking Approach C (compiled baseline)...
+[6/6] Benchmarking A+B+C (all optimizations)...
+  torch.compile failed: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [2, 64, 256]], which is output 0 of torch::autograd::CopySlices, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
+
+================================================================================
+RESULTS
+================================================================================
+Baseline                 | Fwd: 10.28ms   | Bwd: 7.75ms    | Total: 18.03ms    | λ: +1.0693 | Mem: 22.6MB
+A: Batched trajectories  | Fwd: 7.20ms    | Bwd: 7.43ms    | Total: 14.63ms    | λ: +1.0849 | Mem: 23.7MB
+B: Global renorm         | Fwd: 9.49ms    | Bwd: 6.99ms    | Total: 16.48ms    | λ: +0.6573 | Mem: 25.0MB
+A+B: Combined            | Fwd: 6.55ms    | Bwd: 6.76ms    | Total: 13.30ms    | λ: +0.6575 | Mem: 26.1MB
+C: Compiled baseline     | Fwd: 8150.22ms | Bwd: 7502.86ms | Total: 15653.07ms | λ: +1.0758 | Mem: 44.5MB
+A+B+C: All optimized     | Fwd: 0.00ms    | Bwd: 0.00ms    | Total: 0.00ms     | λ: +0.0000 | Mem: 0.0MB
+
+--------------------------------------------------------------------------------
+SPEEDUP vs BASELINE:
+--------------------------------------------------------------------------------
+  A: Batched trajectories  : 1.23x
+  B: Global renorm         : 1.09x
+  A+B: Combined            : 1.36x
+  C: Compiled baseline     : 0.00x
+
+--------------------------------------------------------------------------------
+LYAPUNOV VALUE CONSISTENCY CHECK:
+--------------------------------------------------------------------------------
+  A: Batched trajectories  : λ=+1.0849 (diff=0.0156) ✓
+  B: Global renorm         : λ=+0.6573 (diff=0.4119) ✗
+  A+B: Combined            : λ=+0.6575 (diff=0.4117) ✗
+  C: Compiled baseline     : λ=+1.0758 (diff=0.0065) ✓
+
+================================================================================
+SCALING TESTS
+================================================================================
+Config                    | Baseline   | A+B        | Speedup
+--------------------------------------------------------------------------------
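
The benchmark harness that produced the Fwd/Bwd/Mem columns is not part of this
log, so the following is only a sketch of the usual CUDA-event pattern such
numbers come from; the function name and its assumptions (fn returns a scalar
loss, a CUDA device is available, as in the log) are mine, not the repo's.

import torch

def time_fwd_bwd(fn, *args, iters=50, warmup=10):
    # Per-phase GPU timing with CUDA events; fn(*args) must return a
    # scalar loss so that .backward() is well defined.
    start = torch.cuda.Event(enable_timing=True)
    mid = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    torch.cuda.reset_peak_memory_stats()
    fwd_ms = bwd_ms = 0.0
    for i in range(warmup + iters):
        start.record()
        loss = fn(*args)            # forward
        mid.record()
        loss.backward()             # backward (fresh graph each iteration)
        end.record()
        torch.cuda.synchronize()    # events are asynchronous; sync first
        if i >= warmup:
            fwd_ms += start.elapsed_time(mid)
            bwd_ms += mid.elapsed_time(end)
    peak_mb = torch.cuda.max_memory_allocated() / 2**20
    return fwd_ms / iters, bwd_ms / iters, peak_mb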
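The C row (8150 ms forward against 10 ms eager) looks like compilation latency
being measured rather than steady-state execution: torch.compile pays its
Dynamo tracing and backend compilation cost on the first calls. If that guess
is right, the fix is to warm the compiled function up before timing. A minimal
sketch, with compute_loss as a toy stand-in for whatever entry point the
benchmark actually compiles:

import torch
import torch.nn as nn

# Toy stand-in; the benchmark's real computation is not in this log, so the
# shapes and the loss below are assumptions for illustration only.
model = nn.Sequential(nn.Linear(256, 256), nn.Tanh()).cuda()
x = torch.randn(64, 256, device="cuda")

def compute_loss(m, inp):
    return m(inp).pow(2).mean()

compiled = torch.compile(compute_loss)

# First calls pay tracing + compilation (plus possible recompiles for new
# shapes); timing should start only after warm-up passes like these,
# otherwise compile latency dominates the measurement.
for _ in range(3):
    compiled(model, x).backward()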
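On the [6/6] failure: "output 0 of torch::autograd::CopySlices ... is at
version 1" is the signature of an indexed in-place write (buf[t] = h) into a
tensor that autograd saved for backward, a pattern torch.compile's AOT tracing
is often stricter about than eager mode. The log's own hint applies: run once
under anomaly detection to get a forward-pass traceback for the failing op,
and the usual repair is to accumulate per-step states in a list and stack
them. A self-contained sketch of both, with a toy recurrent step standing in
for the real model:

import torch
import torch.nn as nn

T, B, H = 4, 64, 256            # timesteps / batch / hidden, as in the log
step = nn.Linear(H, H)
h0 = torch.randn(B, H)

# Pattern that creates CopySlices nodes: indexed writes into a buffer.
# Eager autograd often tolerates it; compiled autograd may not.
def forward_inplace(h):
    states = h.new_empty(T, B, H)
    for t in range(T):
        h = torch.tanh(step(h))
        states[t] = h           # in-place indexed assignment
    return states

# Out-of-place repair: collect the steps in a list, stack once at the end.
def forward_stacked(h):
    outs = []
    for t in range(T):
        h = torch.tanh(step(h))
        outs.append(h)
    return torch.stack(outs)    # same (T, B, H) result, no mutation

# The hint from the log: anomaly mode attaches a forward traceback to the
# op whose backward fails, pinpointing the offending write.
with torch.autograd.set_detect_anomaly(True):
    loss = forward_stacked(h0).pow(2).mean()
    loss.backward()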
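On the consistency check: for a single tangent vector, per-step and end-only
renormalization agree in exact arithmetic, because the step norms telescope:
J_T...J_1 v_0 = (r_1 ... r_T) v_hat_T, so (1/T) * sum_t log r_t equals
(1/T) * log ||J_T...J_1 v_0||. A stable 0.41 gap at T=4 is therefore unlikely
to be round-off; it suggests approach B changes the estimator itself (what
gets renormalized, or over which axes), which is worth resolving before
trusting the A+B timings. A sketch of the two textbook forms, with
jvp_step(t, v) ~= J_t @ v as a hypothetical interface (not the repo's API):

import torch

def lyapunov_per_step(jvp_step, v0, T):
    # Largest-exponent estimate, renormalizing the tangent every step
    # (the numerically safe textbook form).
    v = v0 / v0.norm()
    log_growth = v.new_zeros(())
    for t in range(T):
        v = jvp_step(t, v)
        r = v.norm()
        log_growth = log_growth + torch.log(r)
        v = v / r
    return log_growth / T

def lyapunov_end_only(jvp_step, v0, T):
    # Renormalize once at the end; identical in exact arithmetic because
    # the step norms telescope, but prone to over/underflow for large T.
    v = v0 / v0.norm()
    for t in range(T):
        v = jvp_step(t, v)
    return torch.log(v.norm()) / T

# Quick agreement check with a fixed random linear map as the "dynamics":
J = torch.randn(256, 256) / 16
jvp = lambda t, v: J @ v
v0 = torch.randn(256)
print(lyapunov_per_step(jvp, v0, 4), lyapunov_end_only(jvp, v0, 4))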