Problems with monocular SLAM
- Scale ambiguity: a single camera cannot observe the absolute scale of the scene, so the map and trajectory are only recovered up to an unknown scale factor.
- Scale drift: because scale is never directly measured, the estimated scale slowly changes over time and the error accumulates along the trajectory.
- Pure rotation: monocular SLAM breaks down under pure rotation. With a single camera, the baseline (the translation between two camera positions) is what allows the depth of observed points to be estimated, a process called triangulation. If the camera only rotates, the baseline is zero and no new points can be triangulated. What makes things worse is that the apparent motion on the image plane is larger under rotation than under translation, so the points whose depth is already known quickly whizz out of the field of view while no new points can be estimated. The result is tracking failure (see the sketch below).
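As a rough illustration of this degeneracy (a minimal sketch, not from the original article; the intrinsics, poses, and test point are made up), the snippet below triangulates a point from two views with a linear DLT solver. With a sideways translation the depth is recovered correctly; under a pure rotation the two viewing rays coincide, the linear system becomes degenerate, and the recovered point is meaningless.

```python
import numpy as np

# Assumed pinhole intrinsics (made up for the example).
K = np.array([[500., 0., 320.],
              [0., 500., 240.],
              [0., 0., 1.]])

def project(X, R, t):
    """Project a 3D point into a camera with pose (R, t): x = K (R X + t)."""
    p = K @ (R @ X + t)
    return p[:2] / p[2]

def triangulate(x1, x2, P1, P2):
    """Linear (DLT) triangulation of a point observed in two views."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

X_true = np.array([0.5, -0.2, 4.0])              # a point 4 m in front of the camera

# Case 1: the camera translates 30 cm sideways -> non-zero baseline, depth is recoverable.
R1, t1 = np.eye(3), np.zeros(3)
R2, t2 = np.eye(3), np.array([-0.3, 0.0, 0.0])
P1 = K @ np.hstack([R1, t1[:, None]])
P2 = K @ np.hstack([R2, t2[:, None]])
print("with baseline:",
      triangulate(project(X_true, R1, t1), project(X_true, R2, t2), P1, P2))  # ~[0.5, -0.2, 4.0]

# Case 2: the camera only rotates (5 degrees about the y axis) -> zero baseline.
# Both observations define the same ray through the camera center, so the
# system is degenerate and the "triangulated" point is arbitrary.
theta = np.deg2rad(5.0)
R2 = np.array([[np.cos(theta), 0., np.sin(theta)],
               [0., 1., 0.],
               [-np.sin(theta), 0., np.cos(theta)]])
t2 = np.zeros(3)
P2 = K @ np.hstack([R2, t2[:, None]])
print("pure rotation:",
      triangulate(project(X_true, R1, t1), project(X_true, R2, t2), P1, P2))  # meaningless result
```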
State of the (SLAM) art
A majority of SLAM systems share several common components:
- a feature detector that finds points of interest within the image (features),
- a feature descriptor that matches and tracks features from one image to the next (see the sketch after this list),
- an optimization backend that uses these correspondences to build the geometry of the scene (the map) and estimate the position of the robot,
- a loop closure detection algorithm that recognizes previously visited areas and adds constraints to the map.
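To make the first two components concrete, here is a minimal front-end sketch using OpenCV's ORB detector and descriptor with a brute-force matcher. The frame filenames and parameter values are placeholders, not taken from the referenced articles.

```python
import cv2

# Hypothetical input: two consecutive frames from a monocular sequence.
img1 = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

# Feature detector + descriptor (ORB binary features).
orb = cv2.ORB_create(nfeatures=1000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Match descriptors between frames (brute force, Hamming distance for binary descriptors).
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

# These 2D-2D correspondences are what the optimization backend consumes to
# estimate the relative pose and triangulate new map points.
pts1 = [kp1[m.queryIdx].pt for m in matches]
pts2 = [kp2[m.trainIdx].pt for m in matches]
print(f"{len(matches)} correspondences between the two frames")
```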
Due to its similarities to well-studied image classification and retrieval problems, loop closure has the most potential to be solved with DL techniques. It is also an important problem: correct loop closures enforce global consistency of the SLAM map and improve overall accuracy. Computational efficiency and robustness to false positives are the most important characteristics of a successful loop closure subsystem.
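One common deep-learning take on loop closure is to treat it as image retrieval: describe each keyframe with a global CNN embedding and search past keyframes for highly similar ones. The sketch below assumes a pretrained ResNet-18 from torchvision as the descriptor; the keyframe filenames and the similarity threshold are made up for illustration, and this is not the specific method from the referenced articles.

```python
import torch
import torch.nn.functional as F
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Global image descriptor: a pretrained ResNet-18 with its classifier removed,
# so each image maps to a 512-D pooled feature vector.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
encoder = torch.nn.Sequential(*list(backbone.children())[:-1]).eval()

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def describe(path):
    """L2-normalized global descriptor for one keyframe image (path is a placeholder)."""
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    return F.normalize(encoder(img).flatten(1), dim=1)

# Compare the current keyframe against previously stored keyframe descriptors.
database = torch.cat([describe(p) for p in ["kf_000.png", "kf_001.png", "kf_002.png"]])
query = describe("kf_current.png")
similarity = (database @ query.T).squeeze(1)      # cosine similarity (vectors are unit-norm)
candidate = int(similarity.argmax())
if similarity[candidate] > 0.85:                  # threshold chosen arbitrarily for illustration
    print(f"loop closure candidate: keyframe {candidate} (score {similarity[candidate]:.2f})")
```

In a full system a candidate found this way would still be verified geometrically (e.g., by feature matching and pose estimation) before a loop closure constraint is added to the map, which is how false positives are kept out.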
References
https://towardsdatascience.com/slam-in-the-era-of-deep-learning-e8a15e0d16f3
https://nicolovaligi.com/articles/deep-learning-robotics-slam/ (I did not take everything from this link)
Content:
Introduction to capturing point clouds
Depth Estimation on Camera Images using DenseNets
Generate a point cloud
ICP
Improving VSLAM with transfer learning (LIFT-SLAM)