# Flossing Suite

This directory is a reproducible wrapper around the existing flossing code.
It separates three questions:

1. Does Engelken's flossing algorithm itself reproduce on a vanilla-RNN toy task?
2. Does a faithful prefloss/interfloss analogue help TRM/HRM when the flossing phase is kept separate from the task loss?
3. Do our one-sided variants (`top1_cf`, `spectrum_cf`, `volume_cf`) behave differently from Engelken's two-sided L2 target?

## Important Algorithmic Distinctions

- `engelken_python_flossing.py` is the faithful port for the toy RNN. It keeps the paper's separate flossing phase (no mixed task/floss objective), flosses only the input, recurrent, and bias parameters, uses a differentiable QR decomposition, and optimizes `mean((lambda_i - lambda_star)^2)`.
- `step7_interfloss.py` is the HRM/TRM analogue. It likewise uses separate floss-only episodes; ordinary training steps use only the supervised ACT loss.
- `step7_interfloss.py --floss-mode engelken_l2` selects the Rainer-style two-sided L2 target.
- `top1_cf`, `spectrum_cf`, and `volume_cf` are our one-sided contractive variants, not the paper method.
- KL preservation is optional and only applies during floss-only episodes.
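The two-sided objective from the first bullet can be illustrated in a few lines. This is a minimal, dependency-free sketch, not the code in `engelken_python_flossing.py`: it assumes an autonomous vanilla RNN `h_{t+1} = tanh(W h_t + b)` with no input, estimates the Lyapunov exponents by repeatedly QR-orthonormalizing Jacobian products (modified Gram-Schmidt here), and computes forward values only; the actual port uses a differentiable QR so the loss can be backpropagated. All function names are hypothetical, and `one_sided_loss` is an illustrative hinge-style contrast, not the definition of `top1_cf`, `spectrum_cf`, or `volume_cf`.

```python
import math
import random


def matvec(M, v):
    """Matrix (list of rows) times vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def matmul(A, B):
    """Matrix product A @ B for lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def qr_gram_schmidt(A):
    """Modified Gram-Schmidt QR of a square matrix; returns (Q, diag(R))."""
    q_cols, r_diag = [], []
    for col in zip(*A):
        v = list(col)
        for q in q_cols:
            proj = sum(qi * vi for qi, vi in zip(q, v))
            v = [vi - proj * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        r_diag.append(norm)
        q_cols.append([vi / norm for vi in v])
    Q = [list(row) for row in zip(*q_cols)]
    return Q, r_diag


def lyapunov_exponents(W, b, h0, steps):
    """Estimate the full Lyapunov spectrum of h_{t+1} = tanh(W h_t + b) (assumed toy dynamics)."""
    n = len(W)
    h = list(h0)
    Q = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    log_sums = [0.0] * n
    for _ in range(steps):
        h = [math.tanh(p + bi) for p, bi in zip(matvec(W, h), b)]
        # Jacobian dh_{t+1}/dh_t = diag(1 - h_{t+1}^2) @ W
        J = [[(1.0 - h[i] ** 2) * W[i][j] for j in range(n)] for i in range(n)]
        # Re-orthonormalize the tangent vectors; diag(R) holds per-step stretch factors
        Q, r_diag = qr_gram_schmidt(matmul(J, Q))
        log_sums = [s + math.log(r) for s, r in zip(log_sums, r_diag)]
    return [s / steps for s in log_sums]


def engelken_l2_loss(lams, lam_star):
    """Two-sided target: mean((lambda_i - lambda_star)^2)."""
    return sum((lam - lam_star) ** 2 for lam in lams) / len(lams)


def one_sided_loss(lams, lam_star):
    """HYPOTHETICAL one-sided contrast: only exponents above lambda_star are penalized."""
    return sum(max(0.0, lam - lam_star) ** 2 for lam in lams) / len(lams)


if __name__ == "__main__":
    random.seed(0)
    n, gain = 8, 1.5
    W = [[random.gauss(0.0, gain / math.sqrt(n)) for _ in range(n)] for _ in range(n)]
    b = [0.0] * n
    h0 = [random.uniform(-1.0, 1.0) for _ in range(n)]
    lams = lyapunov_exponents(W, b, h0, steps=500)
    print("exponents:", [round(x, 3) for x in lams])
    print("L2 loss to 0:", engelken_l2_loss(lams, 0.0))
    print("one-sided loss to 0:", one_sided_loss(lams, 0.0))
```

As a quick check of the QR accumulation: with `W = diag(0.5, 2.0)`, `b = 0`, and `h0 = 0`, the state stays at zero, so the estimated exponents are exactly `log 0.5` and `log 2`. The difference between the two losses is that the two-sided L2 target also pulls exponents that are *below* `lambda_star` back up, whereas a one-sided penalty leaves them alone.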

## Current Known Sanity Check

Existing toy RNN result:

- Baseline no floss: final eval accuracy about `0.777`.
- Prefloss: final eval accuracy about `0.997`.

This shows that the Python port can reproduce a positive toy result. Negative HRM/TRM results should therefore be read as problems in transferring the method to those models and tasks, not simply as evidence that "the flossing code cannot work."

## Recommended Workflow

Run smoke tests:

```bash
bash research/flossing/flossing_suite/smoke_test.sh 0
```

Launch toy RNN paper-style suite:

```bash
bash research/flossing/flossing_suite/launch_toy_official_suite.sh
```

Launch TRM faithful Rainer-style suite:

```bash
GPU_BASE=0 GPU_PREFLOSS=1 GPU_INTER=3 bash research/flossing/flossing_suite/launch_trm_faithful_suite.sh
```

Launch TRM CF/volume variants:

```bash
GPU_TOP1=0 GPU_VOLUME=1 GPU_KL=3 bash research/flossing/flossing_suite/launch_trm_variant_suite.sh
```

Summarize all available flossing logs:

```bash
/home/yurenh2/miniconda3/envs/rrm/bin/python research/flossing/flossing_suite/summarize_flossing.py
```

Check active jobs:

```bash
bash research/flossing/flossing_suite/status.sh
```

Wait for the current TRM faithful jobs to finish, then refresh the summary:

```bash
bash research/flossing/flossing_suite/watch_and_summarize.sh
```

Outputs are written to these subdirectories of `research/flossing/flossing_suite/`:

- `results/toy_rnn/`
- `results/trm_faithful/`
- `results/trm_variants/`
- `results/smoke/`
- `results/summary/`

## Existing Historical Results Included By Summarizer

The summarizer also scans:

- `research/flossing/engelken_python/*.json`
- `research/flossing/engelken_paper_faithful/*.json`
- `research/flossing/step6_*.json`
- `research/flossing/step7_*.json`
- `research/flossing/flossing_suite/results/**/*.json`

This keeps older positive and negative evidence visible without rerunning every experiment.
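The scanning step can be sketched with the standard library. This is a hypothetical illustration, not the logic of `summarize_flossing.py` (which we have not reproduced here): it assumes the patterns above are resolved relative to the repository root, and that unreadable or partially written JSON (for example, from a job that is still running) should be skipped rather than abort the summary. `collect_result_files` and `load_results` are made-up names.

```python
import glob
import json
import os
import tempfile

# Patterns copied from the list above; assumed to be relative to the repository root.
HISTORICAL_PATTERNS = [
    "research/flossing/engelken_python/*.json",
    "research/flossing/engelken_paper_faithful/*.json",
    "research/flossing/step6_*.json",
    "research/flossing/step7_*.json",
    "research/flossing/flossing_suite/results/**/*.json",
]


def collect_result_files(repo_root, patterns=HISTORICAL_PATTERNS):
    """Return matching JSON paths in pattern order, de-duplicated in case patterns overlap."""
    seen, files = set(), []
    for pattern in patterns:
        # recursive=True lets "**" match zero or more nested directories
        for path in sorted(glob.glob(os.path.join(repo_root, pattern), recursive=True)):
            real = os.path.realpath(path)
            if real not in seen:
                seen.add(real)
                files.append(path)
    return files


def load_results(repo_root):
    """Map repo-relative path -> parsed JSON, skipping files that fail to parse."""
    results = {}
    for path in collect_result_files(repo_root):
        try:
            with open(path) as f:
                results[os.path.relpath(path, repo_root)] = json.load(f)
        except (OSError, ValueError):
            # json.JSONDecodeError is a ValueError; skip half-written or corrupt logs
            continue
    return results


if __name__ == "__main__":
    # Demo on a throwaway directory tree instead of the real repository.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "research/flossing"))
        with open(os.path.join(root, "research/flossing/step7_demo.json"), "w") as f:
            json.dump({"note": "demo"}, f)
        print(load_results(root))
```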