Data-Regularized Q-Learning for Snake
Implemented DrQ framework with random shift augmentation in a custom Snake environment, demonstrating 30% faster convergence and reduced overfitting compared to baseline DQN.
Key Highlights
- 30% faster convergence in long training runs (500k+ steps) compared to baseline DQN
- Reduced overfitting through random shift data augmentation
- 25% higher score stability with DrQ versus baselines in harsh reward environments
- Published results in an ICML-style research paper with comprehensive experimental analysis
Implemented the DrQ framework with random shift augmentation and double Q-learning on a custom Snake environment that emits 84×84 pixel observations. Trained convolutional Q-networks from an augmented replay buffer, and authored an ICML-style research paper comparing DrQ against a baseline DQN under both harsh and forgiving reward conditions. In long training runs (500k+ steps), DrQ converged roughly 30% faster and overfit less than the baseline.
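The random shift augmentation described above can be sketched as follows: pad each side of the observation by a few pixels (edge replication) and crop a random window back to the original size. This is a minimal NumPy sketch of the general technique, not the project's actual implementation; the function name, pad size, and use of a seeded generator are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded for reproducibility (illustrative choice)

def random_shift(obs: np.ndarray, pad: int = 4) -> np.ndarray:
    """DrQ-style random shift: pad spatial edges by `pad` pixels
    (edge replication), then crop a random window of the original size,
    shifting the image by up to +/- `pad` pixels in each direction."""
    h, w = obs.shape[:2]
    # Pad only the two spatial dimensions; leave any channel axis alone.
    pad_width = ((pad, pad), (pad, pad)) + ((0, 0),) * (obs.ndim - 2)
    padded = np.pad(obs, pad_width, mode="edge")
    top = rng.integers(0, 2 * pad + 1)
    left = rng.integers(0, 2 * pad + 1)
    return padded[top:top + h, left:left + w]

frame = rng.random((84, 84), dtype=np.float32)  # one grayscale 84×84 observation
shifted = random_shift(frame)
```

In DrQ this augmentation is applied to observations sampled from the replay buffer, so the Q-network sees slightly shifted copies of each frame and cannot latch onto exact pixel positions.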

DrQ in a forgiving environment

DQN (Baseline) in a forgiving environment
AI/ML · Reinforcement Learning · Research · Computer Vision
Tech Stack
Python · PyTorch · OpenAI Gym · Reinforcement Learning · Computer Vision · CNNs
Impact & Results
- Created custom Snake environment with pixel-based observations for vision-based RL research
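The write-up above also mentions double Q-learning. As a hedged sketch of that general technique (the function name, shapes, and numbers below are illustrative, not taken from the project code): the online network selects the greedy next action, while the target network evaluates it, which reduces the overestimation bias of vanilla DQN targets.

```python
import numpy as np

def double_q_target(q_online_next, q_target_next, reward, done, gamma=0.99):
    """Double Q-learning bootstrap target for a batch of transitions.
    q_online_next / q_target_next: (batch, num_actions) Q-values at s'."""
    # Online network *selects* the greedy next action...
    a_star = np.argmax(q_online_next, axis=1)
    # ...target network *evaluates* it, decoupling selection from evaluation.
    q_eval = q_target_next[np.arange(len(a_star)), a_star]
    # Terminal transitions (done = 1) get no bootstrapped value.
    return reward + gamma * (1.0 - done) * q_eval

# Tiny illustrative batch of two transitions with two actions each.
q_online = np.array([[1.0, 2.0], [3.0, 0.0]])
q_target = np.array([[0.5, 1.5], [2.0, 0.0]])
targets = double_q_target(q_online, q_target,
                          reward=np.array([1.0, 1.0]),
                          done=np.array([0.0, 1.0]))
# targets -> [1 + 0.99 * 1.5, 1.0] = [2.485, 1.0]
```

Combined with random shift augmentation, this target is what the convolutional Q-network regresses toward during training.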