Abstract: Environmental perception is a vital part of autonomous driving. The radar sensors in wide use today suffer from high cost and provide only a single type of information. Based on deep learning, a jointly trained network referred to as the spatial semantic network (SSN) is proposed, which performs image segmentation and stereo estimation simultaneously. Through spatial mapping, the SSN takes binocular images as input and outputs semantic point clouds. The SSN was trained on the KITTI dataset and then validated on the KITTI test set; the results show that the image segmentation accuracy reaches 82.5%, and for near points the stereo estimation accuracy reaches 95.5%, where an error within 5% is counted as accurate. Moreover, the processing speed reaches 0.135 s per frame, generating around 48,000 semantic point cloud coordinates per frame, which approaches the real-time requirement under low-speed conditions and has strong practical application value.
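As one illustration of the spatial mapping the abstract describes, the sketch below shows how a predicted disparity map and a per-pixel segmentation map could be fused into a semantic point cloud using the standard pinhole stereo model. This is a minimal sketch under assumed conventions, not the paper's actual implementation; the function name and all parameters are hypothetical.

```python
import numpy as np

def disparity_to_semantic_cloud(disparity, labels, fx, fy, cx, cy, baseline):
    """Back-project a disparity map to 3D and attach per-pixel class labels.

    disparity : (H, W) float array of stereo disparities in pixels
    labels    : (H, W) int array of semantic class ids
    fx, fy    : focal lengths in pixels; cx, cy: principal point
    baseline  : stereo baseline in metres

    Returns an (N, 4) array of [X, Y, Z, label] rows, skipping pixels
    with non-positive disparity (no valid depth estimate).
    """
    h, w = disparity.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    valid = disparity > 0
    z = fx * baseline / disparity[valid]   # depth from the pinhole stereo model
    x = (us[valid] - cx) * z / fx          # back-project to camera coordinates
    y = (vs[valid] - cy) * z / fy
    return np.column_stack([x, y, z, labels[valid].astype(float)])
```

Each output row is one semantic point: a 3D coordinate in the camera frame plus the class predicted by the segmentation head, which matches the abstract's notion of a "semantic point cloud" produced from binocular input.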