This document surveys recent deep learning research for computer vision and natural language processing. It summarizes key papers on neural image caption generation with visual attention, a neural network with an external memory component, and explaining and harnessing adversarial examples. It also lists several datasets for video description, visual question answering, and video summarization.