AUT Journal of Mathematics and Computing

Loop closure detection in visual appearance-based SLAM using deep autoencoders

Document Type : Original Article

Authors
1 Department of Mathematics and Computer Science, Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran
2 Staffordshire University, School of Digital, Technologies and Arts, College Rd, Stoke-on-Trent ST4 2DE, United Kingdom
Abstract
Loop closure detection (LCD) and trajectory generation are critical components of visual simultaneous localization and mapping (vSLAM). In this paper, we address the LCD and trajectory generation problem in vSLAM using a newly devised vector quantization (VQ) algorithm built on a self-supervised deep convolutional autoencoder (AE). The new VQ step is incorporated into two well-known SLAM algorithms, fast appearance-based mapping (FABMAP) and ORB-SLAM; we call the resulting systems AE-FABMAP and AE-ORB-SLAM, respectively. Experiments show that using self-supervised autoencoders in the VQ step is considerably more efficient in speed and memory consumption than other methods such as graph convolutional neural networks. Furthermore, AE-ORB-SLAM and AE-FABMAP outperform the standard FABMAP2 and ORB-SLAM, and in large-scale SLAM the new approaches improve the accuracy and recall of LCD.
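As a rough illustration of what a VQ step does in appearance-based loop closure detection, the sketch below maps each feature descriptor to a latent code, assigns the code to its nearest codeword, and compares images by their visual-word histograms. It is a minimal sketch under stated assumptions, not the method of the paper: `encode` is a hypothetical single-layer stand-in for a trained convolutional encoder, `WEIGHTS` and `CODEBOOK` are toy values, and cosine similarity replaces the probabilistic and geometric matching used by FABMAP and ORB-SLAM.

```python
import math
from collections import Counter


def encode(descriptor, weights):
    """Hypothetical stand-in for a trained encoder: one linear layer plus tanh."""
    return [math.tanh(sum(w * x for w, x in zip(row, descriptor)))
            for row in weights]


def nearest_codeword(code, codebook):
    """Index of the codeword closest to `code` (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((c - q) ** 2 for c, q in zip(codebook[k], code)),
    )


def bag_of_words(descriptors, weights, codebook):
    """Quantize every descriptor of one image and count visual words."""
    words = [nearest_codeword(encode(d, weights), codebook) for d in descriptors]
    counts = Counter(words)
    return [counts.get(k, 0) for k in range(len(codebook))]


def cosine_similarity(a, b):
    """Cosine similarity of two histograms (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy values (assumptions): 3-D descriptors, 2-D codes, 3 codewords.
WEIGHTS = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
CODEBOOK = [[0.0, 0.0], [0.76, 0.0], [0.0, 0.76]]

image_a = [[1.0, 0.0, 0.3], [0.0, 1.0, 0.1], [0.0, 0.0, 0.9]]
image_b = [[0.9, 0.1, 0.0], [0.1, 1.1, 0.5], [0.0, 0.1, 0.2]]  # revisit of a
image_c = [[0.0, 0.0, 0.1], [0.1, 0.0, 0.4], [0.0, 0.1, 0.0]]  # different place

hist_a = bag_of_words(image_a, WEIGHTS, CODEBOOK)
hist_b = bag_of_words(image_b, WEIGHTS, CODEBOOK)
hist_c = bag_of_words(image_c, WEIGHTS, CODEBOOK)

sim_ab = cosine_similarity(hist_a, hist_b)
sim_ac = cosine_similarity(hist_a, hist_c)
print(hist_a, hist_b, hist_c)
print(sim_ab > sim_ac)  # the revisited place scores higher
```

In this toy setting a loop closure candidate would be any earlier image whose similarity exceeds a chosen threshold; a real system would follow this with probabilistic or geometric verification.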
References

[1] M. Abouzahir, A. Elouardi, R. Latif, S. Bouaziz, and A. Tajer, Embedding SLAM algorithms: Has it come of age?, Robotics and Autonomous Systems, 100 (2018), pp. 14–26.
[2] S. Aldegheri, N. Bombieri, D. D. Bloisi, and A. Farinelli, Data flow ORB-SLAM for real-time performance on embedded GPU boards, in 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2019, pp. 5370–5375.
[3] Y. Bar-Shalom, X.-R. Li, and T. Kirubarajan, Estimation with Applications to Tracking and Navigation: Theory, Algorithms and Software, John Wiley & Sons, Inc., 2001.
[4] K. Boikos and C.-S. Bouganis, A high-performance system-on-chip architecture for direct tracking for SLAM, in 2017 27th International Conference on Field Programmable Logic and Applications (FPL), 2017, pp. 1–7.
[5] M. Bosse, P. Newman, J. Leonard, and S. Teller, SLAM in large-scale cyclic environments using the Atlas framework, International Journal of Robotic Research - IJRR, (2003).
[6] H. I. Christensen and O. Khatib, eds., Robotics Research : The 15th International Symposium ISRR, Springer International Publishing, 2017.
[7] M. Cummins, Probabilistic Localization and Mapping in Appearance Space, PhD thesis, University of Oxford (United Kingdom), 2009.
[8] M. Cummins and P. Newman, FAB-MAP: Probabilistic localization and mapping in the space of appearance, The International Journal of Robotics Research, 27 (2008), pp. 647–665.
[9] A. Davison, DSO: Direct sparse odometry. Available online: https://github.com/JakobEngel/dso, accessed on 21 January 2022.
[10] A. Davison, Oxford-PTAM. Available online: https://github.com/Oxford-PTAM/PTAM-GPL, accessed on 21 January 2022.
[11] A. Davison, SceneLib 1.0. 2006. Available online: https://www.doc.ic.ac.uk/~ajd/Scene/index.html, accessed on 21 January 2022.
[12] A. Davison, SVO. Available online: https://github.com/uzh-rpg/rpg_svo, accessed on 21 January 2022.
[13] N. de Freitas and S. J. Godsill, Rao-Blackwellised particle filtering for dynamic Bayesian networks, in Advances in Neural Information Processing Systems, vol. 16, 2003, pp. 489–496.
[14] J. Engel, LSD-SLAM: Large-scale direct monocular SLAM. https://github.com/tum-vision/lsd_slam, accessed on 21 January 2022.
[15] J. Engel, T. Schöps, and D. Cremers, LSD-SLAM: Large-scale direct monocular SLAM, in Computer Vision - ECCV 2014, Cham, 2014, Springer International Publishing, pp. 834–849.
[16] C. Estrada, J. Neira, and J. Tardós, Hierarchical SLAM: real-time accurate mapping of large environments, IEEE Transactions on Robotics, 21 (2005), pp. 588–596.
[17] C. Forster, M. Pizzoli, and D. Scaramuzza, SVO: Fast semi-direct monocular visual odometry, in 2014 IEEE International Conference on Robotics and Automation (ICRA), 2014, pp. 15–22.
[18] P. Foster, OpenDTAM. https://github.com/anuranbaka/OpenDTAM, accessed on 21 January 2022.
[19] D. Gálvez-López and J. D. Tardós, Bags of binary words for fast place recognition in image sequences, IEEE Transactions on Robotics, 28 (2012), pp. 1188–1197.
[20] E. Garcia-Fidalgo and A. Ortiz, ibow-lcd: An appearance-based loop closure detection approach using incremental bags of binary words, CoRR, abs/1802.05909 (2018).
[21] A. Geiger, P. Lenz, and R. Urtasun, Are we ready for autonomous driving? The KITTI vision benchmark suite, in Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
[22] A. Glover, W. Maddern, M. Warren, S. Reid, M. Milford, and G. Wyeth, OpenFABMAP: An open source toolbox for appearance-based loop closure detection, in The International Conference on Robotics and Automation, Saint Paul, MN, USA, 2012, IEEE, pp. 4730–4735.
[23] K. Hajebi, Efficient Visual Search in Appearance-based SLAM, PhD thesis, University of Alberta, 2015.
[24] K. Hajebi and H. Zhang, An efficient index for visual search in appearance-based SLAM, in 2014 IEEE International Conference on Robotics and Automation (ICRA), 2014, pp. 353–358.
[25] A. Huang, A. Bachrach, P. Henry, M. Krainin, D. Maturana, D. Fox, and N. Roy, Visual odometry and mapping for autonomous flight using an RGB-D camera, (2011).
[26] A. S. Huang, A. Bachrach, P. Henry, M. Krainin, D. Maturana, D. Fox, and N. Roy, Visual odometry and mapping for autonomous flight using an RGB-D camera, in Christensen and Khatib [6].
[27] N. Kejriwal, S. Kumar, and T. Shibata, High performance loop closure detection using bag of word pairs, Robotics and Autonomous Systems, 77 (2016), pp. 55–65.
[28] S. Khan and D. Wollherr, Ibuild: Incremental bag of binary words for appearance based loop closure detection, in 2015 IEEE International Conference on Robotics and Automation (ICRA), 2015, pp. 5441–5447.
[29] A. Kim and R. M. Eustice, Combined visually and geometrically informative link hypothesis for pose-graph visual SLAM using bag-of-words, in 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2011, pp. 1647–1654.
[30] G. Klein and D. Murray, Parallel tracking and mapping for small AR workspaces, in 2007 6th IEEE and ACM International Symposium on Mixed and Augmented Reality, 2007, pp. 225–234.
[31] G. Klein and D. Murray, Parallel tracking and mapping on a camera phone, in 2009 8th IEEE International Symposium on Mixed and Augmented Reality, 2009, pp. 83–86.
[32] J. Leonard and P. Newman, Consistent, convergent, and constant-time SLAM, in IJCAI’03: Proceedings of the 18th international joint conference on Artificial intelligence, IJCAI’03, San Francisco, CA, USA, 2003, Morgan Kaufmann Publishers Inc., pp. 1143–1150.
[33] B. Liu, F. Tang, Y. Fu, Y. Yang, and Y. Wu, A flexible and efficient loop closure detection based on motion knowledge, in 2021 IEEE International Conference on Robotics and Automation (ICRA), 2021, pp. 11241–11247.
[34] A. Mostafanasab, M. B. Menhaj, M. Shamshirsaz, and R. Fesharakifard, A novel mobile robot path planning method based on neuro-fuzzy controller, AUT Journal of Mathematics and Computing, 6 (2025), pp. 41–53.
[35] R. Mur-Artal, J. M. M. Montiel, and J. D. Tardós, ORB-SLAM: A versatile and accurate monocular SLAM system, IEEE Transactions on Robotics, 31 (2015), pp. 1147–1163.
[36] R. Mur-Artal, J. M. M. Montiel, and J. D. Tardós, ORB-SLAM: a versatile and accurate monocular SLAM system, (2015).
[37] R. Mur-Artal and J. D. Tardós, ORB-SLAM2: An open-source SLAM system for monocular, stereo, and RGB-D cameras, IEEE Transactions on Robotics, 33 (2017), pp. 1255–1262.
[38] P. Newman, M. Chandran-Ramesh, D. Cole, M. Cummins, A. Harrison, I. Posner, and D. Schroeter, Describing, navigating and recognising urban spaces - building an end-to-end SLAM system, in Proc. of the Int. Symposium of Robotics Research (ISRR), Hiroshima, Japan, 2007.
[39] P. Ondruska, P. Kohli, and S. Izadi, MobileFusion: Real-time volumetric surface reconstruction and dense tracking on mobile phones, IEEE Transactions on Visualization and Computer Graphics, 21 (2015), pp. 1251–1258.
[40] L. Paz, P. Jensfelt, J. Tardós, and J. Neira, EKF SLAM updates in O(n) with divide and conquer SLAM, in Proceedings 2007 IEEE International Conference on Robotics and Automation, IEEE International Conference on Robotics and Automation ICRA, 2007, pp. 1657–1663.
[41] T. Pire, T. Fischer, G. Castro, P. D. Cristóforis, J. Civera, and J. J. Berlles, S-PTAM: Stereo parallel tracking and mapping, Robotics and Autonomous Systems, 93 (2017), pp. 27–42.
[42] S. Se, D. G. Lowe, and J. Little, Vision-based global localization and mapping for mobile robots, IEEE Transactions on Robotics, 21 (2005), pp. 364–375.
[43] J. Soares and M. Meggiolaro, Keyframe-based RGB-D SLAM for mobile robots with visual odometry in indoor environments using graph optimization, in 2018 Latin American Robotic Symposium, 2018 Brazilian Symposium on Robotics (SBR) and 2018 Workshop on Robotics in Education (WRE), 2018, pp. 94–99.
[44] J. Sturm, N. Engelhard, F. Endres, W. Burgard, and D. Cremers, A benchmark for the evaluation of RGB-D SLAM systems, in 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2012, pp. 573–580.
[45] V. Sundar, CNN SLAM. Available online: https://github.com/iitmcvg/CNN_SLAM, accessed on 21 January 2022.
[46] V. Turchenko, E. Chalmers, and A. Luczak, A deep convolutional auto-encoder with pooling-unpooling layers in Caffe, CoRR, abs/1701.04949 (2017).
[47] V. Turchenko, E. Chalmers, and A. Luczak, A deep convolutional auto-encoder with pooling-unpooling layers in Caffe, International Journal of Computing, 18 (2019), pp. 8–31.
[48] B. Vincke, A. Elouardi, and A. Lambert, Design and evaluation of an embedded system based SLAM applications, in 2010 IEEE/SICE International Symposium on System Integration, 2010, pp. 224–229. 
[49] B. Vincke, A. Elouardi, A. Lambert, and A. Merigot, Efficient implementation of EKF-SLAM on a multi-core embedded system, in IECON 2012 - 38th Annual Conference on IEEE Industrial Electronics Society, 2012, pp. 3049–3054.
[50] T. Ying, H. Yan, Z. Li, K. Shi, and X. Feng, Loop closure detection based on image covariance matrix matching for visual SLAM, International Journal of Control, Automation and Systems, 19 (2021).
[51] J. Yu, F. Gao, J. Cao, C. Yu, Z. Zhang, Z. Huang, Y. Wang, and H. Yang, CNN-based monocular decentralized SLAM on embedded FPGA, in 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2020, pp. 66–73.
[52] A. Zarringhalam, AE-FABMAP. https://github.com/amir-1992/AE-FABMAP-SLAM, 2024.
[53] A. Zarringhalam, AE-ORB. https://github.com/amir-1992/AE-ORB-SLAM, 2024.
[54] A. Zarringhalam, S. S. Ghidary, and A. M. Khorasani, Self-supervised vector-quantization in visual SLAM using deep convolutional autoencoders, 2022.
[55] A. Zarringhalam, S. Shiry Ghidary, A. Mohades, and S.-A. Sadegh-Zadeh, CUDA and OpenMP implementation of Boolean matrix product with applications in visual SLAM, Algorithms, 16 (2023).
[56] A. Zarringhalam, S. Shiry Ghidary, A. Mohades, and S.-A. Sadegh-Zadeh, Semisupervised vector quantization in visual SLAM using HGCN, International Journal of Intelligent Systems, 2024 (2024), p. 9992159.
[57] Q. Zhong and X. Fang, A BigBiGAN-based loop closure detection algorithm for indoor visual SLAM, Journal of Electrical and Computer Engineering, 2021 (2021).