<feed xmlns='http://www.w3.org/2005/Atom'>
<title>blazing8.git, branch main</title>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/blazing8.git/'/>
<entry>
<title>Add 2-player PPO training log (500k episodes, 60.4% vs greedy)</title>
<updated>2026-02-22T21:32:53+00:00</updated>
<author>
<name>YurenHao0426</name>
<email>blackhao0426@gmail.com</email>
</author>
<published>2026-02-22T21:32:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/blazing8.git/commit/?id=3d397d15dda5a6ff52f2c9a3244e6772d06a21a0'/>
<id>3d397d15dda5a6ff52f2c9a3244e6772d06a21a0</id>
<content type='text'>
Co-Authored-By: Claude Opus 4.6 &lt;noreply@anthropic.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Co-Authored-By: Claude Opus 4.6 &lt;noreply@anthropic.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Raise entropy floor to 0.02, increase eval games to 2000</title>
<updated>2026-02-22T18:55:03+00:00</updated>
<author>
<name>haoyuren</name>
<email>13851610112@163.com</email>
</author>
<published>2026-02-22T18:55:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/blazing8.git/commit/?id=1f4ee77d131649b674a80f3e43804acf323b462c'/>
<id>1f4ee77d131649b674a80f3e43804acf323b462c</id>
<content type='text'>
A higher entropy floor prevents premature convergence, and evaluating
on 4x as many games (2000) reduces the variance of the eval win rate.
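
Why 4x more games helps: if each evaluation game is treated as an
independent win/loss trial (an assumption), the standard error of a
measured win rate p over n games is sqrt(p(1 - p) / n), so quadrupling n
halves it. The sketch assumes the previous count was 500 (2000 / 4) and
uses p = 0.6 only as an illustrative win rate. win_rate_stderr is a
hypothetical helper, not part of the repository.

```python
import math


def win_rate_stderr(p: float, n_games: int) -> float:
    """Binomial standard error of an observed win rate p over n_games."""
    return math.sqrt(p * (1.0 - p) / n_games)


# Quadrupling the number of eval games halves the standard error.
before = win_rate_stderr(0.6, 500)   # assumed previous game count
after = win_rate_stderr(0.6, 2000)   # new game count from this commit
print(f"{before:.4f} -> {after:.4f}")  # prints 0.0219 -> 0.0110
```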

Co-Authored-By: Claude Opus 4.6 &lt;noreply@anthropic.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
A higher entropy floor prevents premature convergence, and evaluating
on 4x as many games (2000) reduces the variance of the eval win rate.

Co-Authored-By: Claude Opus 4.6 &lt;noreply@anthropic.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Change default eval_every from 10000 to 2500</title>
<updated>2026-02-22T18:19:57+00:00</updated>
<author>
<name>haoyuren</name>
<email>13851610112@163.com</email>
</author>
<published>2026-02-22T18:19:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/blazing8.git/commit/?id=ba4715c767d6538fc6f0c6c7e92073938ea03f2c'/>
<id>ba4715c767d6538fc6f0c6c7e92073938ea03f2c</id>
<content type='text'>
Co-Authored-By: Claude Opus 4.6 &lt;noreply@anthropic.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Co-Authored-By: Claude Opus 4.6 &lt;noreply@anthropic.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Use auto-calibrated collect_batch in Colab notebook</title>
<updated>2026-02-22T18:17:41+00:00</updated>
<author>
<name>haoyuren</name>
<email>13851610112@163.com</email>
</author>
<published>2026-02-22T18:17:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/blazing8.git/commit/?id=7345f23d69fe40313907c9eac3094c0f05673166'/>
<id>7345f23d69fe40313907c9eac3094c0f05673166</id>
<content type='text'>
Co-Authored-By: Claude Opus 4.6 &lt;noreply@anthropic.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Co-Authored-By: Claude Opus 4.6 &lt;noreply@anthropic.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Add training curve plots to Colab notebook</title>
<updated>2026-02-22T18:14:42+00:00</updated>
<author>
<name>haoyuren</name>
<email>13851610112@163.com</email>
</author>
<published>2026-02-22T18:14:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/blazing8.git/commit/?id=c41ea8629ce1351e415cce1551a7b52260a66790'/>
<id>c41ea8629ce1351e415cce1551a7b52260a66790</id>
<content type='text'>
Co-Authored-By: Claude Opus 4.6 &lt;noreply@anthropic.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Co-Authored-By: Claude Opus 4.6 &lt;noreply@anthropic.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Add entropy annealing to escape greedy local minimum after warmup</title>
<updated>2026-02-22T18:09:01+00:00</updated>
<author>
<name>haoyuren</name>
<email>13851610112@163.com</email>
</author>
<published>2026-02-22T18:09:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/blazing8.git/commit/?id=0735c68037566ae6731ac5dd349329b1c8d44851'/>
<id>0735c68037566ae6731ac5dd349329b1c8d44851</id>
<content type='text'>
After the behavioral cloning warmup, the policy is sharply peaked on
greedy actions. Start with a higher entropy coefficient (default: 5x
ent_coef) and decay it linearly to the target, encouraging exploration
of non-greedy strategies early in training.

New arg: --ent_start (default: 5x --ent_coef)
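
A minimal sketch of the linear schedule described above. The commit does
not state the decay horizon, so anneal_steps is a hypothetical parameter,
as is the function name. Only the 5x default for ent_start comes from
this message.

```python
from typing import Optional


def annealed_ent_coef(step: int, anneal_steps: int, ent_coef: float,
                      ent_start: Optional[float] = None) -> float:
    """Linearly decay the entropy coefficient from ent_start to ent_coef.

    anneal_steps is a hypothetical horizon; after it, the target is held.
    """
    if ent_start is None:
        ent_start = 5.0 * ent_coef  # default stated in the commit: 5x --ent_coef
    frac = min(step / max(anneal_steps, 1), 1.0)  # progress, clamped at 1.0
    return ent_start + frac * (ent_coef - ent_start)


# With ent_coef=0.01 the coefficient starts near 0.05 and settles at 0.01.
for s in (0, 250, 500, 1000, 2000):
    print(s, round(annealed_ent_coef(s, anneal_steps=1000, ent_coef=0.01), 4))
```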

Co-Authored-By: Claude Opus 4.6 &lt;noreply@anthropic.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
After the behavioral cloning warmup, the policy is sharply peaked on
greedy actions. Start with a higher entropy coefficient (default: 5x
ent_coef) and decay it linearly to the target, encouraging exploration
of non-greedy strategies early in training.

New arg: --ent_start (default: 5x --ent_coef)

Co-Authored-By: Claude Opus 4.6 &lt;noreply@anthropic.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Auto-calibrate collect_batch when not specified</title>
<updated>2026-02-22T18:06:23+00:00</updated>
<author>
<name>haoyuren</name>
<email>13851610112@163.com</email>
</author>
<published>2026-02-22T18:06:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/blazing8.git/commit/?id=800e1f1f33d93cb7a1812dff1dc0ef85289ef075'/>
<id>800e1f1f33d93cb7a1812dff1dc0ef85289ef075</id>
<content type='text'>
Benchmarks batch sizes [64, 128, 256, 512] and picks the smallest one
within 10% of peak throughput. Smaller batches mean more frequent PPO
updates, which gives better training quality at similar speed.
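
A minimal sketch of the calibration rule above. It assumes a
hypothetical run_batch(b) callback that plays b games in parallel and
returns the number of environment steps taken. Selection is split into
its own function so it can be checked without timing. Neither function
name comes from the repository.

```python
import time
from typing import Callable, Dict, Sequence


def pick_smallest_near_peak(throughput: Dict[int, float], tolerance: float = 0.10) -> int:
    """Smallest batch size whose throughput is within `tolerance` of the peak."""
    peak = max(throughput.values())
    return min(b for b, t in throughput.items() if t >= (1.0 - tolerance) * peak)


def calibrate_collect_batch(run_batch: Callable[[int], int],
                            candidates: Sequence[int] = (64, 128, 256, 512),
                            tolerance: float = 0.10) -> int:
    """Time each candidate batch size, then apply pick_smallest_near_peak."""
    throughput = {}
    for b in candidates:
        start = time.perf_counter()
        steps = run_batch(b)  # hypothetical: plays b games, returns steps taken
        elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
        throughput[b] = steps / elapsed
    return pick_smallest_near_peak(throughput, tolerance)


# Throughput (steps/s) that saturates: 128 is the smallest size within 10% of 1000.
print(pick_smallest_near_peak({64: 800.0, 128: 950.0, 256: 1000.0, 512: 990.0}))  # prints 128
```

Measuring steps per second rather than games per second keeps the
comparison fair if game lengths vary. That choice is an assumption of
this sketch.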

Co-Authored-By: Claude Opus 4.6 &lt;noreply@anthropic.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Benchmarks batch sizes [64, 128, 256, 512] and picks the smallest one
within 10% of peak throughput. Smaller batches mean more frequent PPO
updates, which gives better training quality at similar speed.

Co-Authored-By: Claude Opus 4.6 &lt;noreply@anthropic.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Fix total_mem → total_memory in Colab GPU check</title>
<updated>2026-02-22T18:01:17+00:00</updated>
<author>
<name>haoyuren</name>
<email>13851610112@163.com</email>
</author>
<published>2026-02-22T18:01:17+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/blazing8.git/commit/?id=dda6db0777620f8139bd476e27e6b275c0679358'/>
<id>dda6db0777620f8139bd476e27e6b275c0679358</id>
<content type='text'>
Co-Authored-By: Claude Opus 4.6 &lt;noreply@anthropic.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Co-Authored-By: Claude Opus 4.6 &lt;noreply@anthropic.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Fix invalid notebook cell schema (markdown with execution_count)</title>
<updated>2026-02-22T17:59:01+00:00</updated>
<author>
<name>haoyuren</name>
<email>13851610112@163.com</email>
</author>
<published>2026-02-22T17:59:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/blazing8.git/commit/?id=7e15218730fe86b88ac0a53cc84bf929416a0687'/>
<id>7e15218730fe86b88ac0a53cc84bf929416a0687</id>
<content type='text'>
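In nbformat v4, markdown cells only allow keys such as cell_type,
metadata, source and attachments (plus id in v4.5). execution_count and
outputs belong to code cells. The following is a minimal sketch of the
kind of cleanup this fix implies, operating on the notebook JSON with
the standard library only. The function names are hypothetical, and
this is not the commit's actual change.

```python
import json
from typing import Any, Dict

CODE_ONLY_KEYS = ("execution_count", "outputs")


def strip_code_keys_from_markdown(nb: Dict[str, Any]) -> int:
    """Drop code-cell-only keys from markdown cells; return the number of cells fixed."""
    fixed = 0
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue
        present = [k for k in CODE_ONLY_KEYS if k in cell]
        for k in present:
            del cell[k]
        if present:
            fixed += 1
    return fixed


def fix_notebook_file(path: str) -> int:
    """Rewrite the notebook at path in place (path is supplied by the caller)."""
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    fixed = strip_code_keys_from_markdown(nb)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(nb, f, indent=1, ensure_ascii=False)
        f.write("\n")
    return fixed


nb = {"cells": [
    {"cell_type": "markdown", "execution_count": None, "metadata": {}, "source": "# Title"},
    {"cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": "x = 1"},
]}
print(strip_code_keys_from_markdown(nb))  # prints 1
```
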
Co-Authored-By: Claude Opus 4.6 &lt;noreply@anthropic.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Co-Authored-By: Claude Opus 4.6 &lt;noreply@anthropic.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Batched game collection for ~7x training speedup</title>
<updated>2026-02-22T17:56:48+00:00</updated>
<author>
<name>haoyuren</name>
<email>13851610112@163.com</email>
</author>
<published>2026-02-22T17:56:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/blazing8.git/commit/?id=8392bdfc10f92e61303e39bb356522ee491ce97c'/>
<id>8392bdfc10f92e61303e39bb356522ee491ce97c</id>
<content type='text'>
- collect_games_batch(): run N games in parallel with a single batched forward pass per step
- evaluate_vs_greedy_batch(): batched evaluation replacing sequential eval
- Add --collect_batch CLI arg for configurable parallel game count
- Use torch.inference_mode() for faster collection
- Update Colab notebook: GPU info, --collect_batch, log download cell
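
The core idea of collect_games_batch() can be sketched without PyTorch.
At every step, gather the observations of all unfinished games, make one
policy call on the whole batch, and route each action back to its game.
ToyGame, collect_batch and the list-in/list-out policy are hypothetical
stand-ins. In the real trainer the policy call would be a batched
network forward pass, which this commit runs under
torch.inference_mode().

```python
import random
from typing import Callable, List, Sequence, Tuple


class ToyGame:
    """Hypothetical stand-in for a game that ends after 3 to 8 turns."""

    def __init__(self, rng: random.Random) -> None:
        self.turns_left = rng.randint(3, 8)
        self.done = False

    def observation(self) -> List[float]:
        return [float(self.turns_left)]

    def step(self, action: int) -> None:
        self.turns_left -= 1
        self.done = self.turns_left == 0


Policy = Callable[[Sequence[List[float]]], List[int]]


def collect_batch(n_games: int, policy: Policy, seed: int = 0) -> Tuple[List[list], int]:
    """Play n_games in lockstep with one batched policy call per step."""
    rng = random.Random(seed)
    games = [ToyGame(rng) for _ in range(n_games)]
    trajectories: List[list] = [[] for _ in range(n_games)]
    policy_calls = 0
    while True:
        active = [i for i, g in enumerate(games) if not g.done]
        if not active:
            break
        obs_batch = [games[i].observation() for i in active]
        actions = policy(obs_batch)  # one call covers every active game
        policy_calls += 1
        for i, obs, action in zip(active, obs_batch, actions):
            trajectories[i].append((obs, action))
            games[i].step(action)
    return trajectories, policy_calls


trajs, calls = collect_batch(16, lambda obs: [0] * len(obs))
print(len(trajs), calls, sum(len(t) for t in trajs))
```

Because each loop iteration serves every active game at once, the number
of policy calls equals the length of the longest game rather than the
total number of moves played.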

Co-Authored-By: Claude Opus 4.6 &lt;noreply@anthropic.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
- collect_games_batch(): run N games in parallel with a single batched forward pass per step
- evaluate_vs_greedy_batch(): batched evaluation replacing sequential eval
- Add --collect_batch CLI arg for configurable parallel game count
- Use torch.inference_mode() for faster collection
- Update Colab notebook: GPU info, --collect_batch, log download cell

Co-Authored-By: Claude Opus 4.6 &lt;noreply@anthropic.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
