path: root/scripts/build_memory_bank.py
author    YurenHao0426 <Blackhao0426@gmail.com>  2026-02-16 14:44:42 -0600
committer YurenHao0426 <Blackhao0426@gmail.com>  2026-02-16 14:44:42 -0600
commit    09d50e47860da0035e178a442dc936028808a0b3 (patch)
tree      9d651b0c7d289a9a0405953f2da989a3c431f147  /scripts/build_memory_bank.py
parent    c90b48e3f8da9dd0f8d2ae82ddf977436bb0cfc3 (diff)
Add memory centering, grid search experiments, and energy visualizations (HEAD, master)
- Add centering support to MemoryBank (center_query, apply_centering, mean persistence in save/load) to remove the centroid attractor in Hopfield dynamics
- Add center flag to MemoryBankConfig, device field to PipelineConfig
- Grid search scripts: initial (β≤8), residual, high-β, and centered grids with dedup-based LLM caching (89-91% call savings)
- Energy landscape visualization: 2D contour, 1D profile, UMAP, PCA heatmap comparing centered vs uncentered dynamics
- Experiment log (note.md) documenting 4 rounds of results and root-cause analysis of the centroid attractor problem
- Key finding: β_critical ≈ 37.6 for centered memory; best configs beat the FAISS baseline by +3-4% F1

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
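The centering the commit message describes can be sketched in a few lines. This is a hypothetical, minimal version of the idea (the real MemoryBank class, its `center_query`/`apply_centering` methods, and its save/load persistence are not shown in this diff): subtract the mean memory vector from both the stored patterns and the query before running softmax-attention Hopfield updates, so the centroid of the bank stops acting as a spurious attractor.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()

class CenteredMemoryBank:
    """Minimal sketch of centered Hopfield-style retrieval (hypothetical API,
    not the repo's actual MemoryBank)."""

    def __init__(self, memories: np.ndarray, center: bool = True):
        # Persisting self.mean is what makes save/load round-trips consistent.
        self.mean = memories.mean(axis=0) if center else np.zeros(memories.shape[1])
        self.memories = memories - self.mean  # store centered patterns

    def retrieve(self, query: np.ndarray, beta: float = 40.0, steps: int = 3) -> np.ndarray:
        q = query - self.mean  # analogous to center_query
        for _ in range(steps):
            attn = softmax(beta * (self.memories @ q))  # (N,) attention over patterns
            q = self.memories.T @ attn                  # weighted combination, still centered
        return q + self.mean  # map back to the original embedding space

rng = np.random.default_rng(0)
M = rng.normal(size=(5, 16))
M /= np.linalg.norm(M, axis=1, keepdims=True)
bank = CenteredMemoryBank(M)
out = bank.retrieve(M[0] + 0.05 * rng.normal(size=16), beta=40.0)
```

Without centering, a low β lets the softmax spread mass across all patterns and the iteration collapses toward the bank's centroid, which is consistent with the commit's finding that a fairly high β (≈ 37.6 critical) is needed even after centering.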
Diffstat (limited to 'scripts/build_memory_bank.py')
-rw-r--r--  scripts/build_memory_bank.py  4
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/scripts/build_memory_bank.py b/scripts/build_memory_bank.py
index 2aff828..0fc1c51 100644
--- a/scripts/build_memory_bank.py
+++ b/scripts/build_memory_bank.py
@@ -50,13 +50,13 @@ def main() -> None:
     logger.info("Loaded %d passages", len(passages))
     # Encode passages in batches
-    encoder = Encoder(encoder_config)
+    encoder = Encoder(encoder_config, device=args.device)
     all_embeddings = []
     for i in tqdm(range(0, len(passages), encoder_config.batch_size), desc="Encoding"):
         batch = passages[i : i + encoder_config.batch_size]
         emb = encoder.encode(batch)  # (batch_size, d)
-        all_embeddings.append(emb.cpu())
+        all_embeddings.append(emb.cpu())  # Always store on CPU for saving
     embeddings = torch.cat(all_embeddings, dim=0)  # (N, d)
     logger.info("Encoded %d passages -> embeddings shape: %s", len(passages), embeddings.shape)
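The `device=args.device` change implies a command-line flag defined elsewhere in the script (the flag definition is outside this hunk, so the sketch below is an assumption about how it is likely wired with argparse, not the script's actual code):

```python
import argparse

# Hypothetical reconstruction of the --device flag that args.device implies;
# the real definition is not shown in this diff hunk.
parser = argparse.ArgumentParser(description="Build the memory bank")
parser.add_argument(
    "--device",
    default="cpu",
    help="torch device for the encoder, e.g. cpu, cuda, cuda:0",
)
args = parser.parse_args(["--device", "cuda:0"])  # simulate CLI input
```

Keeping the flag's default at `"cpu"` preserves the old behavior for existing callers, while the `emb.cpu()` call in the loop ensures embeddings are moved off the GPU before they are concatenated and saved, regardless of which device was selected.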