Population-based neuroevolution and policy-gradient / value-based RL in one canvas: GA, ES, adaptive ES, REINFORCE, A2C, DQN, and a clipped PPO-style update. Organic rendering, human play, and a split-screen duel against the learned champion — designed to read like a serious research instrument, not a toy.
The policy never sees the raw grid. It receives normalized ray distances to walls and body, food direction, heading, and auxiliary scalars — encouraging generalization over memorization.
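The observation layout described above can be sketched as follows. This is a minimal, illustrative reconstruction, not the demo's actual code: the grid size, the 8-direction ray fan, and the choice of auxiliary scalars (`hunger`, normalized length) are assumptions, chosen so the vector comes out 24-dimensional to match the 24→32→4 MLP mentioned below.

```python
GRID = 12  # hypothetical board size; the demo's actual grid is not specified here

DIRS = [(1,0),(1,1),(0,1),(-1,1),(-1,0),(-1,-1),(0,-1),(1,-1)]

def cast_ray(head, d, body):
    # Walk from the head along direction d until a wall is hit; record the
    # first body segment seen on the way. Distances are normalized by GRID.
    x, y = head
    dist_body = 1.0  # 1.0 means "no body segment along this ray"
    steps = 0
    while True:
        x += d[0]; y += d[1]; steps += 1
        if not (0 <= x < GRID and 0 <= y < GRID):
            return steps / GRID, dist_body
        if (x, y) in body and dist_body == 1.0:
            dist_body = steps / GRID

def observe(head, body, food, heading, hunger, length):
    obs = []
    for d in DIRS:                      # 8 rays x (wall, body) = 16 values
        obs.extend(cast_ray(head, d, body))
    obs += [(food[0] - head[0]) / GRID, # food direction as signed offsets = 2
            (food[1] - head[1]) / GRID]
    obs += [1.0 if i == heading else 0.0 for i in range(4)]  # heading one-hot = 4
    obs += [hunger, length / (GRID * GRID)]                  # auxiliary scalars = 2
    return obs                          # 24-dim input for a 24->32->4 MLP
```

Because every entry is a normalized distance or direction rather than a cell index, the same policy transfers across board positions — the point of the "generalization over memorization" claim.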
Fitness blends score², survival, and food-distance shaping. GA: tournament selection + crossover + Gaussian mutation. ES: elite parents + mutation-only offspring. Adaptive ES adjusts σ using a 1/5-style success heuristic on generation-to-generation improvement.
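The evolutionary operators above can be sketched on flat weight vectors. This is a hedged sketch, not the repository's implementation: the tournament size `k=3`, uniform crossover, the 1/5-rule target of 0.2, and the adaptation factor 1.2 are conventional illustrative choices, not values taken from the source.

```python
import random

def tournament(pop, fits, k=3):
    # Tournament selection: best of k randomly sampled individuals (k assumed).
    best = max(random.sample(range(len(pop)), k), key=lambda i: fits[i])
    return pop[best]

def crossover(a, b):
    # Uniform crossover over flat weight vectors: each gene from either parent.
    return [ai if random.random() < 0.5 else bi for ai, bi in zip(a, b)]

def mutate(w, sigma):
    # Gaussian mutation with step size sigma.
    return [wi + random.gauss(0.0, sigma) for wi in w]

def adapt_sigma(sigma, success_rate, target=0.2, factor=1.2):
    # 1/5-style rule: if generation-to-generation improvements are frequent,
    # widen the search (larger sigma); if rare, narrow it.
    return sigma * factor if success_rate > target else sigma / factor
```

The 1/5 heuristic is the self-tuning knob: it keeps mutation large while the population is still finding better snakes, then anneals automatically once improvement stalls.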
Shared MLP 24→32→4 (+ value head or target Q-network). A2C: TD(0) advantage with joint policy–value backprop. DQN: experience replay + periodic target sync. PPO-style: clipped importance ratio on on-policy updates. REINFORCE: Monte Carlo returns with a learned baseline.
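Two of the update rules named above fit in a few lines each. The sketch below shows the standard per-transition forms — the A2C TD(0) advantage and the PPO clipped surrogate — with conventional defaults (γ = 0.99, clip ε = 0.2) that are assumptions, not values confirmed by the source.

```python
import math

def td0_advantage(r, v_s, v_next, gamma=0.99, done=False):
    # A2C's TD(0) advantage: r + gamma * V(s') - V(s),
    # with bootstrapping cut off at episode termination.
    target = r + (0.0 if done else gamma * v_next)
    return target - v_s

def clipped_surrogate(logp_new, logp_old, advantage, eps=0.2):
    # PPO-style clipped objective for one transition: take the pessimistic
    # minimum of the unclipped and the ratio-clipped surrogate.
    ratio = math.exp(logp_new - logp_old)
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps)  # clamp to [1-eps, 1+eps]
    return min(ratio * advantage, clipped * advantage)
```

The `min` is the whole trick: when the new policy already moved far in the advantageous direction, the clipped term caps the gradient incentive, which is what keeps the on-policy updates stable without a trust-region solver.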
No server, no dataset download, no hidden API keys. What you see training is what runs in your tab — suitable for reproducible demos and open review.
Tip: switch the Optimization mode to compare evolution vs RL without changing the environment.