In this paper, we propose to jointly learn optical flow and stereo matching from unlabeled stereoscopic videos. Our first insight is that stereo matching can be viewed as a special case of optical flow, so the 3D geometry behind stereo videos can be used to guide the learning of both forms of correspondence. We incorporate this knowledge into a state-of-the-art self-supervised learning framework and train a single network to estimate both flow and stereo. Second, we identify the bottlenecks in prior self-supervised learning approaches and propose a new set of challenging proxy tasks to boost performance. Together, these two insights yield a single model that achieves the highest accuracy among all existing unsupervised flow and stereo methods on the KITTI 2012 and 2015 benchmarks. More remarkably, our self-supervised method even outperforms several state-of-the-art fully supervised optical flow learning methods, including PWC-Net and FlowNet2, on KITTI 2012.
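The view of stereo matching as a special case of optical flow can be illustrated with a minimal sketch. It assumes a rectified stereo pair with the left image as reference, so that a pixel at column x in the left view appears at column x - d in the right view, where d is the disparity. Under that assumption, disparity is a flow field whose vertical component is zero. The function names `disparity_to_flow` and `warp_coords` are hypothetical and are not taken from the paper.

```python
def disparity_to_flow(disparity):
    """Convert a left-view disparity map into a left-to-right flow field.

    Assumes a rectified pair (hypothetical convention: x_right = x_left - d),
    so each flow vector is (u, v) = (-d, 0).
    """
    return [[(-d, 0.0) for d in row] for row in disparity]


def warp_coords(flow):
    """Return the target coordinates (x + u, y + v) for every pixel (x, y)."""
    return [
        [(x + u, y + v) for x, (u, v) in enumerate(row)]
        for y, row in enumerate(flow)
    ]


# Toy single-row disparity map (illustrative values only).
disparity = [[0.0, 1.0, 1.0, 2.0]]
flow = disparity_to_flow(disparity)
targets = warp_coords(flow)
```

Because every flow vector produced this way has a zero vertical component, a network that predicts general 2D flow could in principle handle stereo pairs as a constrained instance of the same correspondence problem.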