The Microsoft Kinect depth sensor produces depth maps of indoor environments that contain gaps of unknown depth, caused by the infrared (IR) reflectance properties of objects in the scene and by occlusion. In this paper, we propose a grid-based hierarchical learning algorithm for predicting the depth values of the gaps in the Kinect's depth map. The scene is divided into hierarchical grids, and the depth of each grid is modeled using supervised learning. The learned models can be applied directly to subsequent frames without repeating the learning procedure, and occluded regions in the background can be recovered. The proposed algorithm quantitatively outperforms an inpainting-based approach and successfully recovers objects that are occluded in the background.
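To make the grid-based idea concrete, here is a minimal illustrative sketch, not the paper's actual method: the depth map is divided into fixed-size grid cells, each cell's depth is modeled by a least-squares plane fit (a simple supervised regressor) over its valid pixels, and cells with too few valid pixels fall back to a coarser-level (global) model, mimicking the hierarchical fallback. All function and parameter names here are hypothetical.

```python
import numpy as np

def fill_depth_grid(depth, cell=8, min_valid=10):
    """Fill zero-valued gaps in a depth map by fitting a plane
    z = a*x + b*y + c per grid cell; cells with fewer than
    `min_valid` valid pixels fall back to a global (coarser) plane.
    Illustrative sketch only, not the paper's exact algorithm."""
    H, W = depth.shape
    out = depth.astype(float).copy()
    ys, xs = np.mgrid[0:H, 0:W]
    valid = depth > 0  # Kinect reports 0 at unknown-depth gaps

    # Coarse-level model: one plane fit over all valid pixels.
    A = np.c_[xs[valid], ys[valid], np.ones(valid.sum())]
    g_coef, *_ = np.linalg.lstsq(A, out[valid], rcond=None)

    for y0 in range(0, H, cell):
        for x0 in range(0, W, cell):
            sl = (slice(y0, y0 + cell), slice(x0, x0 + cell))
            v = valid[sl]
            gaps = ~v
            if not gaps.any():
                continue
            if v.sum() >= min_valid:
                # Fine-level model: plane fit on this cell's valid pixels.
                Ac = np.c_[xs[sl][v], ys[sl][v], np.ones(v.sum())]
                coef, *_ = np.linalg.lstsq(Ac, out[sl][v], rcond=None)
            else:
                coef = g_coef  # hierarchical fallback to the coarser model
            out[sl][gaps] = (coef[0] * xs[sl][gaps]
                             + coef[1] * ys[sl][gaps]
                             + coef[2])
    return out
```

Because the per-cell models are just fitted coefficients, they could be cached and reused on subsequent frames, which is the property the abstract highlights.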