Data-Regularized Q-Learning for Snake

Implemented the DrQ (Data-regularized Q) framework with random-shift image augmentation in a custom Snake environment, demonstrating 30% faster convergence and reduced overfitting compared to a baseline DQN.
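
The random-shift augmentation at the core of DrQ can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name is hypothetical, and the 4-pixel pad is an assumption (the DrQ paper uses a ±4-pixel shift for 84×84 inputs), implemented as replicate padding followed by a random crop back to the original size.

```python
import numpy as np

def random_shift(obs: np.ndarray, pad: int = 4) -> np.ndarray:
    """Random-shift augmentation, DrQ-style (illustrative sketch).

    Replicate-pads the observation by `pad` pixels on each side, then
    crops back to the original size at a random offset, producing a
    shift of up to +/- `pad` pixels in each spatial dimension.

    obs: (H, W) or (H, W, C) pixel observation.
    """
    h, w = obs.shape[:2]
    pad_width = ((pad, pad), (pad, pad)) + ((0, 0),) * (obs.ndim - 2)
    padded = np.pad(obs, pad_width, mode="edge")  # replicate border pixels
    top = np.random.randint(0, 2 * pad + 1)
    left = np.random.randint(0, 2 * pad + 1)
    return padded[top:top + h, left:left + w]

# Example: augmenting an 84x84 grayscale observation
obs = np.zeros((84, 84), dtype=np.uint8)
aug = random_shift(obs)
assert aug.shape == obs.shape
```

Because the shift is applied independently each time an observation is sampled from the replay buffer, the Q-network rarely sees the exact same pixels twice, which is what regularizes training.
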

Key Highlights

  • 30% faster convergence in long training runs (500k+ steps) compared to baseline DQN
  • Reduced overfitting through data augmentation with random shift techniques
  • Demonstrated 25% higher score stability with DrQ versus baselines in harsh reward environments
  • Published results in ICML-style research paper with comprehensive experimental analysis

Implemented the DrQ framework with random-shift augmentation, double Q-learning, and a custom Snake environment producing 84×84 pixel observations. Trained convolutional Q-networks from an augmented replay buffer, and authored an ICML-style research paper comparing DrQ against a baseline DQN under harsh and forgiving reward conditions. In long training runs (500k+ steps), DrQ converged 30% faster and overfit less.
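
A hedged sketch of how the two pieces mentioned above, augmented replay sampling and double Q-learning, might combine when computing TD targets. The helper name, signatures, and default hyperparameters here are hypothetical; the augmentation is passed in as a callable, and the bootstrap value is averaged over K augmented copies of the next observation, as in DrQ.

```python
import numpy as np

def drq_double_q_targets(q_online, q_target, augment,
                         next_obs, rewards, dones, gamma=0.99, k=2):
    """DrQ-style double-Q TD targets (illustrative sketch).

    q_online / q_target: callables mapping an (N, H, W) observation
    batch to (N, n_actions) Q-values; augment: a random-shift
    augmentation applied to the batch. The bootstrap value is averaged
    over k augmented copies of next_obs; the online network selects the
    action and the target network evaluates it (double Q-learning,
    which curbs value overestimation).
    """
    n = len(next_obs)
    boot = np.zeros(n)
    for _ in range(k):
        aug = augment(next_obs)
        a_star = q_online(aug).argmax(axis=1)        # action selection
        boot += q_target(aug)[np.arange(n), a_star]  # action evaluation
    return rewards + gamma * (1.0 - dones) * boot / k
```

Averaging the target over several augmentations reduces the variance of the regression target, which is one of the mechanisms behind DrQ's faster, more stable convergence.
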
Figure: DrQ training results in a forgiving environment

Figure: DQN (baseline) training results in a forgiving environment

AI/ML · Reinforcement Learning · Research · Computer Vision

Tech Stack

Python · PyTorch · OpenAI Gym · Reinforcement Learning · Computer Vision · CNNs

Impact & Results

  • Created custom Snake environment with pixel-based observations for vision-based RL research
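
To make the pixel-observation setup concrete, here is a minimal, self-contained sketch of what such a Snake environment could look like. Everything here is illustrative, not the project's actual code: the class name and reward values are assumptions, the API loosely follows the Gym reset/step convention, and a 12×12 grid is upscaled by 7× to yield the 84×84 grayscale observations mentioned above.

```python
import numpy as np
from collections import deque

class SnakePixelEnv:
    """Illustrative pixel-observation Snake environment (hypothetical).

    A 12x12 logical grid is rendered to an 84x84 uint8 image (12 * 7 = 84).
    Rewards are placeholders: +1 for food, -1 for crashing into a wall
    or the snake's own body (a "harsh"-style penalty).
    """

    SIZE, SCALE = 12, 7
    MOVES = [(-1, 0), (0, 1), (1, 0), (0, -1)]  # up, right, down, left

    def __init__(self, seed=0):
        self.rng = np.random.default_rng(seed)

    def reset(self):
        self.snake = deque([(self.SIZE // 2, self.SIZE // 2)])
        self.food = self._place_food()
        return self._render()

    def step(self, action):
        dr, dc = self.MOVES[action]
        head = (self.snake[0][0] + dr, self.snake[0][1] + dc)
        # Crashing into a wall or the body ends the episode
        if not (0 <= head[0] < self.SIZE and 0 <= head[1] < self.SIZE) \
                or head in self.snake:
            return self._render(), -1.0, True, {}
        self.snake.appendleft(head)
        if head == tuple(self.food):
            reward, self.food = 1.0, self._place_food()  # grow by one cell
        else:
            reward = 0.0
            self.snake.pop()  # no food eaten: tail advances
        return self._render(), reward, False, {}

    def _place_food(self):
        while True:
            cell = tuple(int(v) for v in self.rng.integers(0, self.SIZE, size=2))
            if cell not in self.snake:
                return cell

    def _render(self):
        grid = np.zeros((self.SIZE, self.SIZE), dtype=np.uint8)
        for r, c in self.snake:
            grid[r, c] = 128          # body
        grid[self.snake[0]] = 255     # head
        grid[self.food] = 200         # food
        # Upscale each cell to SCALE x SCALE pixels -> 84x84 observation
        return np.kron(grid, np.ones((self.SCALE, self.SCALE), dtype=np.uint8))
```

Rendering the state as pixels (rather than exposing grid coordinates directly) is what forces the agent to learn visual features with a CNN, and is what makes image augmentations like random shift applicable in the first place.
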