Data-Regularized Q-Learning for Snake
Implemented DrQ framework with random shift augmentation in a custom Snake environment, demonstrating 30% faster convergence and reduced overfitting compared to baseline DQN.
Key Highlights
- 30% faster convergence in long training runs (500k+ steps) compared to baseline DQN
- Reduced overfitting through random shift data augmentation
- 25% higher score stability with DrQ versus baselines in harsh reward environments
- Published results in an ICML-style research paper with comprehensive experimental analysis
Implemented the DrQ framework with random shift augmentation and double Q-learning on a custom Snake environment that emits 84×84 pixel observations. Trained convolutional Q-networks from an augmented replay buffer, and authored an ICML-style research paper comparing DrQ against a baseline DQN under both harsh and forgiving reward conditions. In long training runs (500k+ steps), DrQ converged roughly 30% faster and overfit less than the baseline.
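The random shift augmentation described above can be sketched as follows: pad each side of the observation by a few pixels (edge replication) and crop a random window back to the original size. This is a minimal NumPy sketch of the general technique, not the project's actual implementation; the function name, pad size, and use of a seeded generator are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded for reproducibility (illustrative choice)

def random_shift(obs: np.ndarray, pad: int = 4) -> np.ndarray:
    """DrQ-style random shift: pad spatial edges by `pad` pixels
    (edge replication), then crop a random window of the original size,
    shifting the image by up to +/- `pad` pixels in each direction."""
    h, w = obs.shape[:2]
    # Pad only the two spatial dimensions; leave any channel axis alone.
    pad_width = ((pad, pad), (pad, pad)) + ((0, 0),) * (obs.ndim - 2)
    padded = np.pad(obs, pad_width, mode="edge")
    top = rng.integers(0, 2 * pad + 1)
    left = rng.integers(0, 2 * pad + 1)
    return padded[top:top + h, left:left + w]

frame = rng.random((84, 84), dtype=np.float32)  # one grayscale 84×84 observation
shifted = random_shift(frame)
```

In DrQ this augmentation is applied to observations sampled from the replay buffer, so the Q-network sees slightly shifted copies of each frame and cannot latch onto exact pixel positions.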

DrQ in a forgiving environment

DQN (Baseline) in a forgiving environment
AI/ML · Reinforcement Learning · Research · Computer Vision
Tech Stack
Python · PyTorch · OpenAI Gym · Reinforcement Learning · Computer Vision · CNNs
Impact & Results
- Created custom Snake environment with pixel-based observations for vision-based RL research
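The write-up above also mentions double Q-learning. As a hedged sketch of that general technique (the function name, shapes, and numbers below are illustrative, not taken from the project code): the online network selects the greedy next action, while the target network evaluates it, which reduces the overestimation bias of vanilla DQN targets.

```python
import numpy as np

def double_q_target(q_online_next, q_target_next, reward, done, gamma=0.99):
    """Double Q-learning bootstrap target for a batch of transitions.
    q_online_next / q_target_next: (batch, num_actions) Q-values at s'."""
    # Online network *selects* the greedy next action...
    a_star = np.argmax(q_online_next, axis=1)
    # ...target network *evaluates* it, decoupling selection from evaluation.
    q_eval = q_target_next[np.arange(len(a_star)), a_star]
    # Terminal transitions (done = 1) get no bootstrapped value.
    return reward + gamma * (1.0 - done) * q_eval

# Tiny illustrative batch of two transitions with two actions each.
q_online = np.array([[1.0, 2.0], [3.0, 0.0]])
q_target = np.array([[0.5, 1.5], [2.0, 0.0]])
targets = double_q_target(q_online, q_target,
                          reward=np.array([1.0, 1.0]),
                          done=np.array([0.0, 1.0]))
# targets -> [1 + 0.99 * 1.5, 1.0] = [2.485, 1.0]
```

Combined with random shift augmentation, this target is what the convolutional Q-network regresses toward during training.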