Sensor fusion is the process of acquiring and combining data from two or more sensors for environment perception. As the level of automation required of almost all systems increases, depth perception has become an integral part of environment perception. Various sensors are available for this purpose, but most are expensive and provide insufficient data. This paper describes in detail the fusion of two cameras for depth perception. Using cameras not only allows depth to be calculated but also opens up the possibility of applying machine learning algorithms, since image frames are readily available. The depth of an object is calculated using the triangulation method. The paper focuses on triangulation-based depth analysis, with an ultrasonic sensor used alongside the cameras to quantify the error. A comparative analysis of the results obtained from the cameras, the ultrasonic sensor, and the actual distance quantifies this error. Using cameras for depth perception eliminates the need for additional sensors, thereby reducing complexity and cost and increasing efficiency. © 2021 IEEE.
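
As a rough sketch of the triangulation approach mentioned above (not the paper's actual implementation), the standard rectified-stereo relation Z = f·B / d can be expressed as follows; the function names, parameters, and example numbers are illustrative assumptions only.

```python
# Minimal sketch of stereo triangulation, assuming a rectified stereo pair with
# known focal length (in pixels) and baseline (in metres). All names and
# values here are hypothetical and not taken from the paper.

def stereo_depth(focal_length_px: float, baseline_m: float, disparity_px: float) -> float:
    """Estimate depth Z (metres) of a point seen by a rectified stereo pair.

    Z = f * B / d, where
      f = focal length in pixels,
      B = baseline (distance between the two camera centres) in metres,
      d = disparity (horizontal pixel shift of the point between the two images).
    """
    if disparity_px <= 0:
        raise ValueError("Disparity must be positive for a point in front of the cameras.")
    return focal_length_px * baseline_m / disparity_px


def depth_error(estimated_m: float, reference_m: float) -> float:
    """Absolute error of the camera estimate against a reference measurement
    (e.g. an ultrasonic reading or the measured actual distance)."""
    return abs(estimated_m - reference_m)


if __name__ == "__main__":
    # Hypothetical numbers: 700 px focal length, 6 cm baseline, 35 px disparity.
    z = stereo_depth(focal_length_px=700.0, baseline_m=0.06, disparity_px=35.0)
    print(f"Estimated depth: {z:.2f} m")                      # 1.20 m
    print(f"Error vs 1.18 m reference: {depth_error(z, 1.18):.2f} m")
```

The sketch mirrors the comparison described in the abstract: the camera-derived depth is set against a reference reading so the error can be quantified.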