============================================================
Lyapunov Speedup Benchmark
Job ID: 14632859 | Node: gpub073
Start: Tue Dec 30 06:43:26 CST 2025
============================================================
NVIDIA A40, 46068 MiB
============================================================
================================================================================
LYAPUNOV COMPUTATION SPEEDUP BENCHMARK
================================================================================
Batch size: 64
Timesteps: 4
Hidden dims: [64, 128, 256]
Device: cuda
================================================================================
[1/6] Benchmarking Baseline...
[2/6] Benchmarking Approach A (batched)...
[3/6] Benchmarking Approach B (global renorm)...
[4/6] Benchmarking Approach A+B (combined)...
[5/6] Benchmarking Approach C (compiled baseline)...
[6/6] Benchmarking A+B+C (all optimizations)...
torch.compile failed: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [2, 64, 256]], which is output 0 of torch::autograd::CopySlices, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
================================================================================
RESULTS
================================================================================
Baseline                | Fwd:   10.28ms | Bwd:    7.75ms | Total:    18.03ms | λ: +1.0693 | Mem: 22.6MB
A: Batched trajectories | Fwd:    7.20ms | Bwd:    7.43ms | Total:    14.63ms | λ: +1.0849 | Mem: 23.7MB
B: Global renorm        | Fwd:    9.49ms | Bwd:    6.99ms | Total:    16.48ms | λ: +0.6573 | Mem: 25.0MB
A+B: Combined           | Fwd:    6.55ms | Bwd:    6.76ms | Total:    13.30ms | λ: +0.6575 | Mem: 26.1MB
C: Compiled baseline    | Fwd: 8150.22ms | Bwd: 7502.86ms | Total: 15653.07ms | λ: +1.0758 | Mem: 44.5MB
A+B+C: All optimized    | Fwd:    0.00ms | Bwd:    0.00ms | Total:     0.00ms | λ: +0.0000 | Mem:  0.0MB
--------------------------------------------------------------------------------
SPEEDUP vs BASELINE:
--------------------------------------------------------------------------------
A: Batched trajectories : 1.23x
B: Global renorm        : 1.09x
A+B: Combined           : 1.36x
C: Compiled baseline    : 0.00x
--------------------------------------------------------------------------------
LYAPUNOV VALUE CONSISTENCY CHECK:
--------------------------------------------------------------------------------
A: Batched trajectories : λ=+1.0849 (diff=0.0156) ✓
B: Global renorm        : λ=+0.6573 (diff=0.4119) ✗
A+B: Combined           : λ=+0.6575 (diff=0.4117) ✗
C: Compiled baseline    : λ=+1.0758 (diff=0.0065) ✓
================================================================================
SCALING TESTS
================================================================================
Config | Baseline | A+B | Speedup
--------------------------------------------------------------------------------